Published June 2, 2026 | Version v1
Other Open

No Better Than Behavioral: A Residual Velocity-Freezing Fingerprint Predicts Agent WANDERING No Better Than the Cheap Tool-Entropy Detector

Authors/Creators

  • 1. OpenInterpretability

Description

Companion note to the four-paper WANDERING arc, reporting a pre-registered negative result. Motivated by the context-rot literature (long-context degradation is representational, not a retrieval failure; arXiv:2510.05381), we ask whether the residual stream carries an earlier or better detector of long-horizon agent WANDERING than the cheap, probe-free tool-entropy signal: does the geometry rot before the behavior does? All analyses reuse the same 99 Qwen3.6-27B SWE-bench Pro trajectories (CPU re-analysis, no new compute).

Stage 1 (raw residual geometry, no SAE) finds a real but weak fingerprint, representational velocity-freezing: WANDERING trajectories settle toward an attractor sooner, showing a smaller early per-turn state change. The effect is directionally consistent across all five layers (4/5 at raw p<0.05, length-controlled), and one mid-network layer (L31) clears a pre-registered trend-and-divergence conjunction (p=0.015), but nothing survives multiple-comparison correction over the 4x5 metric-layer grid.

Stage 2 (the decisive predictive test) shows that the fingerprint adds nothing. Early velocity at L31 reaches AUROC 0.695, statistically indistinguishable from the fair early behavioral baseline (tool_entropy_first10, AUROC 0.688; paired bootstrap delta=+0.008, 95% CI [-0.170,+0.211]) and clearly below the deployed late detector (AUROC 0.888). Used as a sharp alarm at a false-positive rate of <=5%, it catches only 1-3 of the 20 WANDERING trajectories, far fewer than the deployed detector, and with too few overlapping detections to measure a lead-time advantage.

The residual fingerprint of context rot is real but downstream-redundant: it carries no predictive information beyond the cheap behavioral signal. This strengthens the arc's conclusion that, for this failure mode, watching the cheap behavior is as good as or better than reading the residual stream. This is a statement about prediction and redundancy, not causation. The pre-registration, both stage results, and the analysis code are in the GitHub repository under paper/context_rot/.
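The Stage 2 comparison rests on three quantities: a per-detector AUROC, a paired bootstrap interval on the AUROC difference, and recall under a false-positive cap. The sketch below shows one plausible way to compute them with the Python standard library only. It is not the repository's analysis code; the function names, the percentile-interval choice, and the threshold rule are assumptions made for illustration. Scores are assumed to be oriented so that higher means more WANDERING-like (for a velocity-freezing signal that would mean negating the early velocity), and labels are 1 for WANDERING and 0 otherwise.

```python
import random


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUROC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_delta(scores_a, scores_b, labels, n_boot=2000, seed=0):
    """Point estimate and 95% percentile interval for AUROC(a) - AUROC(b).

    "Paired" means both detectors are evaluated on the same resampled
    trajectories in every bootstrap replicate.
    """
    rng = random.Random(seed)
    n = len(labels)
    deltas = []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # skip resamples that lost one of the two classes
        a = auroc([scores_a[i] for i in idx], y)
        b = auroc([scores_b[i] for i in idx], y)
        deltas.append(a - b)
    deltas.sort()
    lo = deltas[int(0.025 * n_boot)]
    hi = deltas[int(0.975 * n_boot) - 1]
    point = auroc(scores_a, labels) - auroc(scores_b, labels)
    return point, lo, hi


def recall_at_fpr(scores, labels, max_fpr=0.05):
    """Count positives caught by the loosest alarm with FPR <= max_fpr.

    The alarm fires when score > threshold; the threshold is set at a
    negative's score so that at most floor(max_fpr * n_neg) negatives fire.
    """
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    if not neg:
        raise ValueError("need at least one negative")
    allowed = int(max_fpr * len(neg))  # negatives permitted above threshold
    threshold = neg[allowed]
    caught = sum(1 for p in pos if p > threshold)
    return caught, len(pos)
```

Calling `paired_bootstrap_delta(velocity_scores, entropy_scores, labels)` on two hypothetical per-trajectory score lists returns the point difference and an interval; an interval that straddles zero, as reported above, means the two detectors cannot be separated on that sample. With only 20 positives among 99 trajectories, such intervals are expected to be wide.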

Files

no_better_than_behavioral.pdf

Files (219.2 kB)

md5:07a690c6a91414cbc2ad0a46f80bb45f