Retrieval-Governed Context: Scope-Gated Selection of Instructions and Tools for LLMs and Intelligent Agents
Description
Dynamically assembling system prompts (selecting instructions, tools, and safety policies at runtime) is increasingly common in production LLM systems (Chase, 2022; Mikinka, 2025). We present retrieval-governed context, an architecture that attaches structured governance metadata (scope gates, priority weights, conflict resolution, and mandatory injection) directly to instructions and tools as first-class schema fields, then layers this governance on top of standard retrieval backends. Instructions and tools are unified in a single typed corpus, enabling a shared pipeline for scope-gated retrieval, composition, and safety enforcement. We position our implementation, BEAR, as a systematic substrate for context engineering, comparable in behavioral output to careful hand-authored prompt engineering, but organized around scalable authoring, governance, and unified tool selection.
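To make the architecture concrete, the sketch below shows what a governance-annotated corpus entry and a scope-gated selection step might look like. Field names (`scope_tags`, `priority`, `mandatory`) and the `assemble_context` helper are illustrative assumptions, not the paper's actual schema: retrieval scores are gated by tag intersection, conflicts are resolved by priority, and mandatory entries (e.g. safety policies) are always injected.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a governance-annotated corpus entry; the field
# names are illustrative, not the schema used in the paper.
@dataclass
class CorpusEntry:
    kind: str                  # "instruction" or "tool"
    text: str                  # content injected into the prompt if selected
    scope_tags: set = field(default_factory=set)  # scope gates
    priority: float = 0.0      # weight used for conflict resolution
    mandatory: bool = False    # injected regardless of retrieval score

def assemble_context(query_tags, entries, scored, k=5):
    """Scope-gate retrieval results, then add mandatory entries.

    `scored` is a list of (retrieval_score, entry) pairs from any backend.
    """
    # Gate: keep entries whose scope tags intersect the query's tags
    # (entries with no tags pass ungated).
    gated = [(s, e) for s, e in scored
             if not e.scope_tags or e.scope_tags & query_tags]
    # Resolve conflicts by priority first, then by retrieval score.
    gated.sort(key=lambda se: (se[1].priority, se[0]), reverse=True)
    selected = [e for _, e in gated[:k]]
    # Mandatory injection: safety policies always enter the context.
    selected += [e for e in entries if e.mandatory and e not in selected]
    return selected

# Usage: a web-scoped query gates out the math tool but keeps the
# mandatory safety instruction.
search = CorpusEntry("tool", "web_search", {"web"}, 1.0)
calc = CorpusEntry("tool", "calculator", {"math"}, 0.5)
safety = CorpusEntry("instruction", "refuse unsafe requests", set(), 0.0, True)
entries = [search, calc, safety]
ctx = assemble_context({"web"}, entries, [(0.9, search), (0.8, calc)])
```

The governance layer is backend-agnostic here by construction: any retriever that yields `(score, entry)` pairs can feed `assemble_context` unchanged.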
We evaluate on three corpora of increasing metadata richness. On the ToolBench benchmark (3,225 APIs from the benchmark evaluation split, 47 categories), governance significantly improves every retrieval backend tested (p<0.0001), with effect sizes inversely related to embedding quality (Cohen's d = 0.50 for BM25 down to d = 0.21 for the strongest dense model). On MetaTool (199 tools without category metadata), governance has negligible effect on recall (d<0.08, n.s.), confirming that the benefit scales with metadata richness. When an LLM generates category tags from tool descriptions in a single offline pass (MetaTool+Tags), governance improves Recall@5 by 18–30 percentage points across backends (all p<0.0001). When tags are instead inferred from query text alone at runtime, governance hurts retrieval (d=-0.20 to -0.59), demonstrating that tag alignment between corpus and query is required.
On a purpose-built 58-instruction behavioral corpus with rich multi-tag metadata, governance is the dominant factor. Removing scope gates degrades F1 by 33%, and retrieval-governed prompting matches conditional prompt assembly on anticipated contexts while achieving 0.917–1.000 recall on semantically discoverable instructions (vs. 0.000; McNemar p<0.003), with 90% token savings on tool retrieval and zero cross-domain tool leakage.
Our contribution is not a new retrieval algorithm but a backend-agnostic governance layer that makes context engineering systematic, with evidence that governance metadata quality and embedding quality are orthogonal contributors to retrieval-based context engineering.
Files
- Hwang_Retrieval-Governed-Context.pdf (626.8 kB), md5:18361302d8aeb49175a876c2c0520f29
Additional details
Dates
- Submitted: 2026-04-27 (submitted to ACM Transactions on Intelligent Systems and Technology)
Software
- Repository URL: https://github.com/snhwang/bear
- Development Status: Active