AgentAssay: Token-Efficient Regression Testing for Non-Deterministic AI Agent Workflows

Published March 3, 2026 | Version v1

Preprint Open

AgentAssay is the first token-efficient framework for regression testing non-deterministic AI agent workflows. Autonomous AI agents are deployed at unprecedented scale, yet no principled methodology has existed for verifying that an agent has not regressed after changes to its prompts, tools, models, or orchestration logic. AgentAssay introduces stochastic three-valued verdicts (PASS/FAIL/INCONCLUSIVE) grounded in statistical hypothesis testing, five-dimensional agent coverage metrics, agent-specific mutation testing operators, and a token-efficient testing pipeline that achieves 78-100% cost reduction while maintaining rigorous statistical guarantees.
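
To make the idea of a three-valued verdict concrete, the following is a minimal sketch, not AgentAssay's actual API. It assumes each trial of the agent yields a binary pass/fail outcome and that the verdict is derived from a Wilson score confidence interval on the pass rate. The names `Verdict`, `wilson_interval`, and `three_valued_verdict`, the 0.8 threshold, and the 95% confidence level are all illustrative assumptions.

```python
import math
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    INCONCLUSIVE = "INCONCLUSIVE"


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


def three_valued_verdict(successes: int, trials: int,
                         threshold: float = 0.8, z: float = 1.96) -> Verdict:
    """Hypothetical rule: decide only when the whole interval clears the threshold."""
    low, high = wilson_interval(successes, trials, z)
    if low >= threshold:
        return Verdict.PASS          # pass rate is confidently at or above the threshold
    if high < threshold:
        return Verdict.FAIL          # pass rate is confidently below the threshold
    return Verdict.INCONCLUSIVE      # interval straddles the threshold: more trials needed


if __name__ == "__main__":
    print(three_valued_verdict(98, 100))  # Verdict.PASS
    print(three_valued_verdict(50, 100))  # Verdict.FAIL
    print(three_valued_verdict(8, 10))    # Verdict.INCONCLUSIVE
```

Under these assumptions, a small sample such as 8 passes out of 10 is reported as INCONCLUSIVE rather than forced into a binary outcome, which is the behavior a three-valued verdict is meant to capture.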

Key results from experiments across 5 models (GPT-5.2, Claude Sonnet 4.6, Mistral-Large-3, Llama-4-Maverick, Phi-4), 3 scenarios, and 6,500 trials ($59.64 total cost):

- SPRT (sequential probability ratio test) achieves 78% trial savings across all scenarios
- Behavioral fingerprinting achieves 79% detection power where binary pass/fail testing has 0%
- Full token-efficient pipeline achieves 100% cost savings through trace-first offline analysis
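
The trial savings attributed to SPRT come from stopping as soon as the evidence is decisive instead of running a fixed number of trials. The sketch below shows Wald's classic SPRT for binary trial outcomes. It is an illustration under stated assumptions, not the report's implementation: the function name `sprt_verdict`, the two hypothesized pass rates (0.7 for a regressed agent, 0.9 for a healthy one), the error rates, and the trial budget are all hypothetical.

```python
import math
from typing import Iterable


def sprt_verdict(outcomes: Iterable[bool],
                 p_regressed: float = 0.7,   # assumed pass rate under H0 (regression)
                 p_healthy: float = 0.9,     # assumed pass rate under H1 (no regression)
                 alpha: float = 0.05,        # tolerated false PASS rate
                 beta: float = 0.05,         # tolerated false FAIL rate
                 max_trials: int = 100) -> tuple[str, int]:
    """Wald's SPRT over a stream of pass/fail trials; returns (verdict, trials used)."""
    upper = math.log((1 - beta) / alpha)   # crossing this accepts H1 -> PASS
    lower = math.log(beta / (1 - alpha))   # crossing this accepts H0 -> FAIL
    llr = 0.0                              # cumulative log-likelihood ratio of H1 vs H0
    used = 0
    for passed in outcomes:
        if used >= max_trials:
            break
        used += 1
        if passed:
            llr += math.log(p_healthy / p_regressed)
        else:
            llr += math.log((1 - p_healthy) / (1 - p_regressed))
        if llr >= upper:
            return "PASS", used
        if llr <= lower:
            return "FAIL", used
    # Budget or data exhausted before either boundary was crossed.
    return "INCONCLUSIVE", used


if __name__ == "__main__":
    print(sprt_verdict([True] * 50))    # ('PASS', 12)
    print(sprt_verdict([False] * 50))   # ('FAIL', 3)
    print(sprt_verdict([True] * 5))     # ('INCONCLUSIVE', 5)
```

With these illustrative parameters, a run of consistent passes is accepted after 12 trials and a run of consistent failures is rejected after 3, rather than spending the full budget in either case. Mapping an exhausted budget to INCONCLUSIVE is a design choice made here to mirror the three-valued verdicts described above.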

The implementation comprises ~20,000 lines of Python with 751 tests and adapters for 10 agent frameworks (LangGraph, CrewAI, AutoGen, OpenAI, smolagents, Semantic Kernel, Bedrock, MCP, Vertex AI, and generic).

Technical Report. 52 pages, 5 figures, 9 theorems, 42 formal definitions.

Files

Name	Size
main.pdf md5:7415fbeb3d0e2682015434df01b3fec8	469.4 kB

Cites: Preprint: arXiv:2602.22302 (arXiv)
Is compiled by: Other: https://www.varunpratap.com/products/agentassay (URL)
Is documented by: Other: https://qualixar.com/products/agentassay (URL)

References: arXiv:2602.22302 (Agent Behavioral Contracts; prior work by the same author)