Published March 30, 2026 | Version v1
Report | Open Access

Agent Learning: Execution-Aware Policy Optimization for LLM Tool Systems

Description

Large language model (LLM) agents increasingly rely on external tools and structured workflows to accomplish complex tasks. While recent work has emphasized improving reasoning quality and prompting strategies, the orchestration layer responsible for tool selection, execution sequencing, escalation, and constraint handling remains largely heuristic.

This work introduces Agent Learning, a framework that formalizes orchestration in tool-augmented LLM systems as an execution-aware policy optimization problem. An agent is modeled as a triplet consisting of a fixed stochastic reasoning module, a set of external tools, and an orchestration policy mapping system states to actions.
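The triplet above can be sketched minimally in code. This is an illustrative reconstruction from the description only, not the released implementation; all class and field names (`Tool`, `Agent`, `step`, etc.) are assumptions:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol

class Tool(Protocol):
    """External tool interface (hypothetical)."""
    name: str
    def run(self, state: dict) -> dict: ...

@dataclass
class Agent:
    # Fixed stochastic reasoning module: prompt -> completion (held fixed)
    reason: Callable[[str], str]
    # External tool environment (held fixed)
    tools: Dict[str, Tool]
    # Orchestration policy: system state -> action; the learned component
    policy: Callable[[dict], str]

    def step(self, state: dict) -> dict:
        """One orchestration step: pick an action, execute if it names a tool."""
        action = self.policy(state)
        if action in self.tools:
            return self.tools[action].run(state)
        return state  # e.g. stop/escalate actions leave the state to the caller
```

Only `policy` is optimized; `reason` and `tools` stay fixed, matching the problem formulation in the text.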

An execution-aware cost functional captures latency, monetary cost, constraint violations, execution failures, and divergence between planned and observed outcomes. The orchestration policy is optimized with respect to this cost while keeping the reasoning module and tool environment fixed.
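The five cost terms listed above can be combined, in the simplest case, as a weighted sum over an execution trace. A minimal sketch, assuming scalar per-trace measurements and illustrative weights (the paper's actual functional form and weighting are not specified here):

```python
def execution_cost(trace: dict, w=(1.0, 1.0, 10.0, 5.0, 2.0)) -> float:
    """Weighted sum of the five execution-aware cost terms (illustrative)."""
    w_lat, w_usd, w_viol, w_fail, w_div = w
    return (
        w_lat * trace["latency_s"]           # wall-clock latency
        + w_usd * trace["dollar_cost"]       # monetary cost of tool/API calls
        + w_viol * trace["violations"]       # constraint violations
        + w_fail * trace["failures"]         # execution failures
        + w_div * trace["plan_divergence"]   # planned vs. observed divergence
    )
```

Under this framing, optimizing the orchestration policy means minimizing the expected value of this cost over traces generated by the fixed reasoning module and tool environment.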

Empirical evaluation in a controlled synthetic tool environment demonstrates that learned policies achieve consistently lower execution cost and higher constraint satisfaction than heuristic baselines across multiple random seeds.

Code: https://github.com/OBA-Research/agent-learning

Files

agent-learning.pdf (239.0 kB)
md5:cdc68311f35e1ddaa9117f26c52fd339