On-Policy Oracle Injection for Fine-Tuning Tool-Using Agents

Katz, Shahar

doi:10.5281/zenodo.20502390

Published June 1, 2026 | Version v1

Publication Open

Katz, Shahar (Researcher)¹

1. ZyG Edge Ltd

Fine-tuning a tool-using AI agent is fundamentally different from fine-tuning a general-purpose LLM, and it is identified as an open problem in the on-policy distillation literature (Song & Zheng, 2026). The agent operates within a narrow distribution defined by its system prompt, available tools, and data access, a distribution the base model was never trained on. When a human expert corrects the agent’s decision, the standard approach of having a corrector LLM generate the corrected output fails to teach the agent’s distribution, because the corrector is not the agent. For a tool-using agent, even the same underlying LLM is off-policy if the correction is not generated by the agent itself.

We introduce On-Policy Oracle Injection (OPOI): inject the expert’s verdict into the agent’s prompt as a minimal steering signal, run the actual agent end-to-end with its real tools, capture the oracle-guided trace, and remove the oracle references from the trace before training. We identify three failure modes of the off-policy alternative that are specific to tool-using agents (vocabulary mismatch, information leakage, and conflated optimisation targets) and show how OPOI addresses each.
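The four steps named above (inject, run, capture, clean) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `[ORACLE]` marker, the `Step` record, the `stub_agent` function, and the line-based cleaning rule are all hypothetical stand-ins, and a real agent would call its actual tools rather than return canned text.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

ORACLE_TAG = "[ORACLE]"  # hypothetical marker for the injected verdict
# Hypothetical pattern for text that refers back to the oracle.
ORACLE_REF = re.compile(r"\[ORACLE\]|expert verdict", re.IGNORECASE)


@dataclass
class Step:
    role: str      # "system", "user", "assistant", or "tool"
    content: str


def inject_oracle(system_prompt: str, verdict: str) -> str:
    """Step 1: append the expert's verdict as a minimal steering signal."""
    return f"{system_prompt}\n{ORACLE_TAG} Expert verdict: {verdict}"


def clean_trace(trace: List[Step], original_system_prompt: str) -> List[Step]:
    """Step 4: restore the original prompt and drop lines mentioning the oracle."""
    cleaned = []
    for step in trace:
        if step.role == "system":
            cleaned.append(Step("system", original_system_prompt))
            continue
        kept = [ln for ln in step.content.splitlines() if not ORACLE_REF.search(ln)]
        if kept:  # skip steps that consisted only of oracle references
            cleaned.append(Step(step.role, "\n".join(kept)))
    return cleaned


def opoi_example(agent: Callable[[str, str], List[Step]],
                 system_prompt: str, user_message: str, verdict: str) -> List[Step]:
    """Steps 2 and 3: run the agent with the steered prompt and capture its trace."""
    steered_prompt = inject_oracle(system_prompt, verdict)
    trace = agent(steered_prompt, user_message)
    return clean_trace(trace, system_prompt)


def stub_agent(system_prompt: str, user_message: str) -> List[Step]:
    """Stand-in for the real agent loop; the tool call and output are canned."""
    decision = "escalate" if ORACLE_TAG in system_prompt else "ignore"
    return [
        Step("system", system_prompt),
        Step("user", user_message),
        Step("assistant", "call lookup_ticket(id=7)"),
        Step("tool", "ticket 7: repeated login failures"),
        Step("assistant", f"Per the expert verdict, this matters.\nDecision: {decision}"),
    ]


trace = opoi_example(stub_agent, "You triage tickets.", "What about ticket 7?", "escalate")
for step in trace:
    print(f"{step.role}: {step.content}")
```

The point of the sketch is that the training example is produced by the agent's own loop, with its own prompt and tool outputs, and that only the steering text is stripped afterwards. How thoroughly a real trace must be cleaned is not something this sketch settles.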

Empirically, we compare two production models fine-tuned on the same task: one trained on OPOI data, the other on Off-Policy Corrector Distillation (OPCD) data. On identical validation prompts, the OPOI model generates with 16% higher confidence, closing 49% of the gap between base-model confidence and perfect confidence, versus 37% for OPCD. Both achieve comparable task accuracy (+3% over baseline), but the OPOI model is significantly more consistent across runs (consensus 0.894 vs 0.854). Both methods teach the right answers; only on-policy training teaches the agent’s distribution.
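One plausible reading of the "gap closed" figure is the fraction of the distance from base-model confidence to perfect confidence that a fine-tuned model recovers. The sketch below assumes that definition and assumes "perfect" means a confidence of 1.0; the abstract does not confirm either. The input confidences are hypothetical values chosen only so the arithmetic lands on the reported fractions, and they are not measurements from the paper.

```python
def gap_closed(base: float, tuned: float, perfect: float = 1.0) -> float:
    """Fraction of the base-to-perfect confidence gap recovered by a tuned model.

    Assumed definition: (tuned - base) / (perfect - base).
    """
    if not base < perfect:
        raise ValueError("base confidence must be below the perfect value")
    return (tuned - base) / (perfect - base)


# Hypothetical confidences, for illustration only.
base_conf = 0.50
opoi_conf = 0.745
opcd_conf = 0.685

print(f"OPOI closes {gap_closed(base_conf, opoi_conf):.0%} of the gap")
print(f"OPCD closes {gap_closed(base_conf, opcd_conf):.0%} of the gap")
```

Under this definition a model that matches the base scores 0 and one that reaches perfect confidence scores 1, which makes the two methods comparable regardless of where the base model starts.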

Files

On Policy Oracle Injection for Fine Tuning Tool Using Agents.pdf (411.0 kB, md5:94eeaf3bfbbdff9e34e66709a5d00f6a)
