State-Diff as the Universal Agent Score

Figurelli, Rogério

doi:10.5281/zenodo.18645449

Published February 15, 2026 | Version v1

Preprint Open

Figurelli, Rogério (Researcher)

Tool-using agents increasingly act across heterogeneous systems (tickets, codebases, documents, databases, calendars, CRMs), yet their evaluation often remains narrative: a judge model reads an agent’s explanation and decides whether the task “sounds done.” This creates a systematic failure mode: false positives in which the narrative is coherent but the world-state remains unchanged, partially changed, or changed incorrectly. We propose State-Diff (SD) as a universal outcome contract and score for agents: success is defined as a measurable, typed change between a pre-state S₀ and a target state S★, and it is scored by a field-aligned distance on the resulting diff. SD is designed to be portable across tools, replayable for governance, and resistant to ambiguity laundering and post-hoc rationalization. We formalize state schemas, a diff operator, and a universal scoring function U(S₀, S_T, S★) that supports abstention under partial observability. We argue that SD reduces narrative-induced false positives by replacing “story completion” with “state completion,” which aligns evaluation with audit requirements and risk management.
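
The abstract names a diff operator and a scoring function U(S₀, S_T, S★) but does not give their definitions here. The sketch below is one possible reading, not the paper's formalization. It rests on these assumptions: states are flat field-to-value maps, S_T is the state observed after the agent acts, the distance is a count of target fields that still mismatch, and abstention is signalled by returning None when a target field cannot be observed. The names `diff` and `score` are hypothetical.

```python
from typing import Any, Optional

# Assumption: a state is a flat mapping from field name to value.
State = dict[str, Any]


def diff(a: State, b: State) -> dict[str, tuple[Any, Any]]:
    """Field-aligned diff: every field whose value differs between a and b.

    A field missing on one side is reported with None on that side.
    """
    keys = a.keys() | b.keys()
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}


def score(s0: State, s_t: State, s_star: State) -> Optional[float]:
    """Illustrative U(S0, S_T, S*): fraction of the required change achieved.

    Returns None (abstains) if any target field is unobserved in s_t.
    """
    # Abstain under partial observability instead of guessing.
    if any(k not in s_t for k in s_star):
        return None
    # Target fields that still mismatch after the agent acted.
    remaining = sum(1 for k, v in s_star.items() if s_t[k] != v)
    # Target fields that needed to change in the first place.
    required = sum(1 for k, v in s_star.items() if s0.get(k) != v)
    if required == 0:
        # Nothing had to change; any mismatch means the agent broke something.
        return 1.0 if remaining == 0 else 0.0
    return max(0.0, 1.0 - remaining / required)


# Hypothetical ticket example.
s0 = {"status": "open", "assignee": None, "priority": "low"}
s_star = {"status": "closed", "assignee": "ana"}

partial = {"status": "closed", "assignee": None, "priority": "low"}
print(diff(s0, partial))           # only "status" changed
print(score(s0, partial, s_star))  # half of the required change was made
print(score(s0, s0, s_star))       # coherent narrative, unchanged state
print(score(s0, {"status": "closed"}, s_star))  # "assignee" unobserved
```

In this sketch an agent that reports success without changing the ticket scores 0.0 however convincing its explanation is, which is the false-positive case the abstract targets. The sketch does not penalize unintended changes to fields outside the target, and a fuller scoring function would likely need to.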

Files

State-Diff as the Universal Agent Score.pdf (82.1 kB, md5:b348f5e2a666acf950e7da1f39a4da41)

