Published April 15, 2026 | Version v1
Preprint | Open Access

Retrieval-Governed Context: Scope-Gated Selection of Instructions and Tools for LLMs and Intelligent Agents

Authors/Creators

  • The Pennsylvania State University College of Medicine

Description

Dynamically assembling system prompts (selecting instructions, tools, and safety policies at runtime) is increasingly common in production LLM systems (Chase, 2022; Mikinka, 2025). We present retrieval-governed context, an architecture that attaches structured governance metadata (scope gates, priority weights, conflict resolution, and mandatory injection) directly to instructions and tools as first-class schema fields, then layers this governance on top of standard retrieval backends. Instructions and tools are unified in a single typed corpus, enabling a shared pipeline for scope-gated retrieval, composition, and safety enforcement. We position the resulting system, BEAR, as a systematic substrate for context engineering: comparable in behavioral output to careful hand-authored prompt engineering, but organized around scalable authoring, governance, and unified tool selection.
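The governance fields described above can be sketched as a minimal schema plus a selection step. This is an illustrative sketch only: the field names (`scopes`, `priority`, `mandatory`), the `GovernedItem` type, and the `assemble_context` function are hypothetical and not taken from the paper or the BEAR repository.

```python
from dataclasses import dataclass, field

@dataclass
class GovernedItem:
    # Hypothetical schema; field names are illustrative, not from the paper.
    name: str
    text: str
    scopes: set = field(default_factory=set)  # scope gates: contexts where the item is eligible
    priority: float = 0.0                     # priority weight used to order selections
    mandatory: bool = False                   # mandatory injection, regardless of retrieval score

def assemble_context(corpus, retrieval_scores, active_scopes, k=5):
    """Scope-gated selection layered on top of an arbitrary retrieval backend:
    filter candidates by scope, force-inject mandatory items, rank the rest
    by (priority weight, retrieval score), and truncate to a budget of k."""
    eligible = [
        (item, retrieval_scores.get(item.name, 0.0))
        for item in corpus
        # An empty scope set means the item is globally eligible.
        if not item.scopes or item.scopes & active_scopes
    ]
    # Mandatory items bypass the retrieval cutoff entirely.
    mandatory = [pair for pair in eligible if pair[0].mandatory]
    optional = sorted(
        (pair for pair in eligible if not pair[0].mandatory),
        key=lambda pair: (pair[0].priority, pair[1]),
        reverse=True,
    )[: max(0, k - len(mandatory))]
    return [item.name for item, _ in mandatory + optional]
```

Out-of-scope items are excluded even when the backend scores them highly, which is one way the "zero cross-domain tool leakage" property reported below could be enforced.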

We evaluate on three corpora of increasing metadata richness. On the ToolBench benchmark (3,225 APIs from the benchmark evaluation split, 47 categories), governance significantly improves every retrieval backend tested (p < 0.0001), with effect sizes inversely related to embedding quality (Cohen's d = 0.50 for BM25 down to d = 0.21 for the strongest dense model). On MetaTool (199 tools without category metadata), governance has negligible effect on recall (d < 0.08, n.s.), confirming that the benefit scales with metadata richness. When an LLM generates category tags from tool descriptions in a single offline pass (MetaTool+Tags), governance improves Recall@5 by 18–30 percentage points across backends (all p < 0.0001). When tags are instead inferred from query text alone at runtime, governance hurts retrieval (d = −0.20 to −0.59), demonstrating that tag alignment between corpus and query is required.
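The tag-alignment requirement can be illustrated with a toy sketch. Everything here is hypothetical (the tag vocabulary, the `offline_tag` heuristic, and the tool names are invented for illustration; the paper's actual LLM tagging pass is not shown): gates only help when query-side tags are drawn from the same vocabulary that tagged the corpus.

```python
# Toy shared vocabulary; an offline LLM pass would produce tags like these
# from tool descriptions. All names here are invented for illustration.
TAG_VOCAB = {"weather", "finance", "medical"}

def offline_tag(description: str) -> set:
    """Offline pass: derive category tags from the tool's own description."""
    return {tag for tag in TAG_VOCAB if tag in description.lower()}

corpus = {
    "get_forecast": "Returns a weather forecast for a city.",
    "stock_quote": "Fetches a finance market quote for a ticker.",
}
corpus_tags = {name: offline_tag(desc) for name, desc in corpus.items()}

def scope_gated(query_tags: set) -> list:
    """Admit only tools whose corpus tags overlap the query-side tags."""
    return [name for name, tags in corpus_tags.items() if tags & query_tags]

# Aligned: query tags come from the same vocabulary as the corpus tags.
aligned = scope_gated({"weather"})
# Misaligned: runtime-inferred tags outside the shared vocabulary gate
# out every tool, so governance can only hurt retrieval.
misaligned = scope_gated({"meteorology"})
```

The misaligned case returns nothing even though a relevant tool exists, mirroring the negative effect sizes reported for runtime query-only tagging.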

On a purpose-built 58-instruction behavioral corpus with rich multi-tag metadata, governance is the dominant factor. Removing scope gates degrades F1 by 33%, and retrieval-governed prompting matches conditional prompt assembly on anticipated contexts while achieving 0.917–1.000 recall on semantically discoverable instructions (vs. 0.000; McNemar p < 0.003), with 90% token savings on tool retrieval and zero cross-domain tool leakage.

Our contribution is not a new retrieval algorithm but a backend-agnostic governance layer that makes context engineering systematic, with evidence that governance metadata quality and embedding quality are orthogonal contributors to retrieval-based context engineering.

Files

Hwang_Retrieval-Governed-Context.pdf (626.8 kB)
md5:18361302d8aeb49175a876c2c0520f29

Additional details

Dates

Submitted
2026-04-27
Submitted to ACM Transactions on Intelligent Systems and Technology

Software

Repository URL
https://github.com/snhwang/bear
Development Status
Active