Published March 30, 2026 | Version v1
Report | Open Access

Agent Learning: Execution-Aware Policy Optimization for LLM Tool Systems

Description

Large language model (LLM) agents increasingly rely on external tools and structured workflows to accomplish complex tasks. While recent work has emphasized improving reasoning quality and prompting strategies, the orchestration layer responsible for tool selection, execution sequencing, escalation, and constraint handling remains largely heuristic.

This work introduces Agent Learning, a framework that formalizes orchestration in tool-augmented LLM systems as an execution-aware policy optimization problem. An agent is modeled as a triplet consisting of a fixed stochastic reasoning module, a set of external tools, and an orchestration policy mapping system states to actions.
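The triplet above can be sketched minimally in code. This is an illustrative reconstruction from the description only, not the released implementation; all class and field names (`Tool`, `Agent`, `step`, etc.) are assumptions:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol

class Tool(Protocol):
    """External tool interface (hypothetical)."""
    name: str
    def run(self, state: dict) -> dict: ...

@dataclass
class Agent:
    # Fixed stochastic reasoning module: prompt -> completion (held fixed)
    reason: Callable[[str], str]
    # External tool environment (held fixed)
    tools: Dict[str, Tool]
    # Orchestration policy: system state -> action; the learned component
    policy: Callable[[dict], str]

    def step(self, state: dict) -> dict:
        """One orchestration step: pick an action, execute if it names a tool."""
        action = self.policy(state)
        if action in self.tools:
            return self.tools[action].run(state)
        return state  # e.g. stop/escalate actions leave the state to the caller
```

Only `policy` is optimized; `reason` and `tools` stay fixed, matching the problem formulation in the text.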

An execution-aware cost functional captures latency, monetary cost, constraint violations, execution failures, and divergence between planned and observed outcomes. The orchestration policy is optimized with respect to this cost while keeping the reasoning module and tool environment fixed.
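The five cost terms listed above can be combined, in the simplest case, as a weighted sum over an execution trace. A minimal sketch, assuming scalar per-trace measurements and illustrative weights (the paper's actual functional form and weighting are not specified here):

```python
def execution_cost(trace: dict, w=(1.0, 1.0, 10.0, 5.0, 2.0)) -> float:
    """Weighted sum of the five execution-aware cost terms (illustrative)."""
    w_lat, w_usd, w_viol, w_fail, w_div = w
    return (
        w_lat * trace["latency_s"]           # wall-clock latency
        + w_usd * trace["dollar_cost"]       # monetary cost of tool/API calls
        + w_viol * trace["violations"]       # constraint violations
        + w_fail * trace["failures"]         # execution failures
        + w_div * trace["plan_divergence"]   # planned vs. observed divergence
    )
```

Under this framing, optimizing the orchestration policy means minimizing the expected value of this cost over traces generated by the fixed reasoning module and tool environment.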

Empirical evaluation in a controlled synthetic tool environment demonstrates that learned policies achieve consistently lower execution cost and higher constraint satisfaction than heuristic baselines across multiple random seeds.

Code: https://github.com/OBA-Research/agent-learning

Files

agent-learning.pdf (239.0 kB)
md5:cdc68311f35e1ddaa9117f26c52fd339