Agent Learning: Execution-Aware Policy Optimization for LLM Tool Systems
Description
Large language model (LLM) agents increasingly rely on external tools and structured workflows to accomplish complex tasks. While recent work has emphasized improving reasoning quality and prompting strategies, the orchestration layer responsible for tool selection, execution sequencing, escalation, and constraint handling remains largely heuristic.
This work introduces Agent Learning, a framework that formalizes orchestration in tool-augmented LLM systems as an execution-aware policy optimization problem. An agent is modeled as a triplet consisting of a fixed stochastic reasoning module, a set of external tools, and an orchestration policy mapping system states to actions.
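The triplet structure can be sketched in a few lines of Python. This is an illustrative model only, with hypothetical names not drawn from the released code: a fixed reasoning callable, a dictionary of tools, and a swappable policy that maps a system state to a tool-selection action.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical sketch of the agent triplet described above.
State = Dict[str, float]   # e.g. per-tool estimated cost in the current state
Action = str               # name of the tool to invoke next

@dataclass
class Agent:
    reason: Callable[[str], str]          # fixed stochastic reasoning module
    tools: Dict[str, Callable[..., str]]  # external tool environment
    policy: Callable[[State], Action]     # orchestration policy (the learned part)

def greedy_policy(state: State) -> Action:
    # Baseline heuristic: pick the tool with the lowest estimated cost.
    return min(state, key=state.get)

agent = Agent(
    reason=lambda prompt: prompt.upper(),            # stand-in reasoning module
    tools={"search": lambda q: f"results for {q}"},
    policy=greedy_policy,
)
print(agent.policy({"search": 0.2, "calculator": 0.9}))  # -> search
```

Keeping the reasoning module and tools fixed while only `policy` varies is what makes the optimization problem well-posed: learning acts on the orchestration layer alone.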
An execution-aware cost functional captures latency, monetary cost, constraint violations, execution failures, and divergence between planned and observed outcomes. The orchestration policy is optimized with respect to this cost while keeping the reasoning module and tool environment fixed.
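A minimal sketch of such a cost functional, assuming a simple weighted sum over the five terms named above. The trace fields and weights are illustrative assumptions, not the paper's actual formulation:

```python
from dataclasses import dataclass

@dataclass
class Trace:
    latency_s: float        # wall-clock latency of the episode
    dollars: float          # monetary cost of tool/API calls
    violations: int         # constraint violations observed
    failures: int           # tool execution failures
    plan_divergence: float  # gap between planned and observed outcomes

def execution_cost(t: Trace,
                   w_lat: float = 1.0, w_cost: float = 10.0,
                   w_viol: float = 5.0, w_fail: float = 5.0,
                   w_div: float = 2.0) -> float:
    # Weighted sum of the five cost terms; weights are hypothetical.
    return (w_lat * t.latency_s + w_cost * t.dollars
            + w_viol * t.violations + w_fail * t.failures
            + w_div * t.plan_divergence)

print(execution_cost(Trace(1.5, 0.02, 0, 1, 0.3)))
```

The orchestration policy would then be trained to minimize the expectation of this quantity over execution traces, with the reasoning module and tools held fixed.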
Empirical evaluation in a controlled synthetic tool environment demonstrates that learned policies consistently achieve lower execution cost and higher constraint satisfaction than heuristic baselines across multiple random seeds.
Code: https://github.com/OBA-Research/agent-learning
Files
agent-learning.pdf (239.0 kB)
md5:cdc68311f35e1ddaa9117f26c52fd339