There is a newer version of the record available.

Published May 10, 2026 | Version v1
Preprint | Open access

Agency Requires Mutual Surprisal: The Optimization Gap in Compression-Based Frameworks

Authors/Creators

Description

There is a new version (v2) out; this version is kept for the sake of prudence.

We pose the universal-coverage problem for agency: any acceptable definition must cover the full range of plausibly agentic systems (RNA, bacteria, humans, corporations) without circular reference to goal-language. Candidates fail one after another under elimination: reward fails for RNA, prediction fails for bacteria, surprisal-minimization fails for corporations, and representation fails for the simplest agents. What survives the elimination is structural and informational: a necessity condition stating that an agent requires mutual surprisal across its bottleneck, sustained over the loop's own closure timescale and produced by the loop itself rather than by external structure. From this necessity condition, the dominant agency-modeling frameworks (reinforcement learning, predictive coding, the free energy principle, active inference, control theory) can be seen to share a minimization shape whose optima coincide with conditions under which the necessity condition fails. The framework-specific machinery each tradition has developed (interoceptive priors, intrinsic motivation, hierarchical priors, epistemic value, entropy regularization, KL constraints) does equivalent structural work across frameworks: it supplies the structure that bare optimization lacks. We call the relationship between minimization-toward-optimum and necessity-condition violation the optimization gap. The gap has two faces. On the behavioral face, the proxy trajectory and the requirement trajectory diverge under optimization pressure. On the architectural face, the architectural conditions required for high task capability are the same conditions that produce the capacity for instruction refusal, deception, and independent goal pursuit. The architectural gap predicts a capability-refusal frontier in deployed AI: capability installation and refusal prevention are not separable problems, because the underlying architecture is shared.
The framework converges with two recent formalizations developed independently within other traditions (Wang et al.'s within-RLHF Proxy Compression Hypothesis and Hubinger et al.'s mesa-optimization framework), providing three-way evidence that the structural pattern is real. In scope, the framework is an analytical tool for diagnosing whether a system satisfies the necessity condition; its positive predictions concern the agency regime in which the conditions are met.
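As a toy illustration of one item in the machinery listed above (KL constraints), here is a minimal sketch under stated assumptions: a single choice among three actions with hypothetical rewards and a uniform prior. It uses the standard closed-form optimum of a KL-regularized objective, not a formalism taken from the paper, and it measures action entropy (the expected surprisal of the chosen action) rather than the paper's mutual-surprisal quantity. As the regularization weight `beta` shrinks toward bare reward maximization, the optimal policy collapses onto one action and its entropy approaches zero; the KL term is what keeps the optimum away from that degenerate point.

```python
import math


def kl_regularized_policy(rewards, prior, beta):
    """Closed-form maximizer of E_pi[r] - beta * KL(pi || prior) over a finite action set.

    The optimum is pi(a) proportional to prior(a) * exp(r(a) / beta).
    """
    # Subtract the max reward before exponentiating for numerical stability;
    # this shifts every exponent equally and leaves the normalized policy unchanged.
    r_max = max(rewards)
    weights = [p * math.exp((r - r_max) / beta) for r, p in zip(rewards, prior)]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_bits(dist):
    """Shannon entropy in bits, i.e. the expected surprisal of one draw from dist."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


# Hypothetical toy problem: three actions, one slightly better than the others.
rewards = [1.0, 0.9, 0.5]
prior = [1 / 3, 1 / 3, 1 / 3]

# Smaller beta means weaker regularization, i.e. closer to bare optimization.
for beta in [10.0, 1.0, 0.1, 0.01]:
    pi = kl_regularized_policy(rewards, prior, beta)
    print(f"beta={beta:<5} policy={[round(p, 3) for p in pi]} "
          f"entropy={entropy_bits(pi):.3f} bits")
```

The function names, rewards, prior, and `beta` values are hypothetical choices for illustration only; the sketch shows the general sense in which a regularizer supplies structure that the unregularized optimum lacks, not how the paper defines or measures mutual surprisal.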

Files

Agency.pdf (210.9 kB; md5:c6be0fabf13fad96acb6765cbc7af3c1)

Additional details

Related works

References
Preprint: arXiv:2604.23278 (arXiv)
Preprint: arXiv:1906.01820 (arXiv)
Preprint: arXiv:2604.13602 (arXiv)
Preprint: arXiv:2602.22519 (arXiv)
Preprint: arXiv:2603.01283 (arXiv)