Retrieval-Governed Context: Scope-Gated Selection of Instructions and Tools for LLMs and Intelligent Agents
Description
Dynamically assembling system prompts (selecting instructions, tools, and safety policies at runtime) is increasingly common in production LLM systems (Chase, 2022; Mikinka, 2025). We present retrieval-governed context, an architecture that attaches structured governance metadata (scope gates, priority weights, conflict resolution, and mandatory injection) directly to instructions and tools as first-class schema fields, then layers this governance on top of standard retrieval backends. Instructions and tools are unified in a single typed corpus, enabling a shared pipeline for scope-gated retrieval, composition, and safety enforcement. We position our implementation, BEAR, as a systematic substrate for context engineering, comparable in behavioral output to careful hand-authored prompt engineering, but organized around scalable authoring, governance, and unified tool selection.
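To make the architecture concrete, the sketch below shows what a governance-annotated corpus entry and a scope-gated selection step might look like. Field names (`scope_tags`, `priority`, `mandatory`) and the `assemble_context` helper are illustrative assumptions, not the paper's actual schema: retrieval scores are gated by tag intersection, conflicts are resolved by priority, and mandatory entries (e.g. safety policies) are always injected.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a governance-annotated corpus entry; the field
# names are illustrative, not the schema used in the paper.
@dataclass
class CorpusEntry:
    kind: str                  # "instruction" or "tool"
    text: str                  # content injected into the prompt if selected
    scope_tags: set = field(default_factory=set)  # scope gates
    priority: float = 0.0      # weight used for conflict resolution
    mandatory: bool = False    # injected regardless of retrieval score

def assemble_context(query_tags, entries, scored, k=5):
    """Scope-gate retrieval results, then add mandatory entries.

    `scored` is a list of (retrieval_score, entry) pairs from any backend.
    """
    # Gate: keep entries whose scope tags intersect the query's tags
    # (entries with no tags pass ungated).
    gated = [(s, e) for s, e in scored
             if not e.scope_tags or e.scope_tags & query_tags]
    # Resolve conflicts by priority first, then by retrieval score.
    gated.sort(key=lambda se: (se[1].priority, se[0]), reverse=True)
    selected = [e for _, e in gated[:k]]
    # Mandatory injection: safety policies always enter the context.
    selected += [e for e in entries if e.mandatory and e not in selected]
    return selected

# Usage: a web-scoped query gates out the math tool but keeps the
# mandatory safety instruction.
search = CorpusEntry("tool", "web_search", {"web"}, 1.0)
calc = CorpusEntry("tool", "calculator", {"math"}, 0.5)
safety = CorpusEntry("instruction", "refuse unsafe requests", set(), 0.0, True)
entries = [search, calc, safety]
ctx = assemble_context({"web"}, entries, [(0.9, search), (0.8, calc)])
```

The governance layer is backend-agnostic here by construction: any retriever that yields `(score, entry)` pairs can feed `assemble_context` unchanged.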
We evaluate on three corpora of increasing metadata richness. On the ToolBench benchmark (3,225 APIs from the benchmark evaluation split, 47 categories), governance significantly improves every retrieval backend tested (p<0.0001), with effect sizes inversely related to embedding quality (Cohen's d = 0.50 for BM25 down to d = 0.21 for the strongest dense model). On MetaTool (199 tools without category metadata), governance has negligible effect on recall (d<0.08, n.s.), confirming that the benefit scales with metadata richness. When an LLM generates category tags from tool descriptions in a single offline pass (MetaTool+Tags), governance improves Recall@5 by 18–30 percentage points across backends (all p<0.0001). When tags are instead inferred from query text alone at runtime, governance hurts retrieval (d=-0.20 to -0.59), demonstrating that tag alignment between corpus and query is required.
On a purpose-built 58-instruction behavioral corpus with rich multi-tag metadata, governance is the dominant factor. Removing scope gates degrades F1 by 33%, and retrieval-governed prompting matches conditional prompt assembly on anticipated contexts while achieving 0.917–1.000 recall on semantically discoverable instructions (vs. 0.000; McNemar p<0.003), with 90% token savings on tool retrieval and zero cross-domain tool leakage.
Our contribution is not a new retrieval algorithm but a backend-agnostic governance layer that makes context engineering systematic, with evidence that governance metadata quality and embedding quality are orthogonal contributors to retrieval-based context engineering.
Files
- Hwang_Retrieval-Governed-Context.pdf (626.8 kB), md5:18361302d8aeb49175a876c2c0520f29
Additional details
Dates
- Submitted: 2026-04-27 (submitted to ACM Transactions on Intelligent Systems and Technology)
Software
- Repository URL: https://github.com/snhwang/bear
- Development Status: Active