Published May 24, 2026 | Version v0.9
Working paper Open

Tool-Entropy Collapse: A Cross-Architecture Signature of Agent WANDERING Failure

Authors/Creators

  • 1. OpenInterpretability

Description

v0.9: calibration revision. "Breakthrough" language has been removed throughout (abstract, §9.1, §10, §13 conclusion) in favor of more measured phrasing ("most promising candidate signal in this work"). The conclusion now flags the W/S ≈ 0.41 ratio match between Qwen and Llama as the most suggestive pattern, one that merits independent replication on additional models before being treated as a discovery. It also notes the N=20 sample size of the Qwen primary dataset and the false-positive (FP) cost implications at production scale. The mid-layer ablation claim is now hedged between two interpretations: edge-layer specificity and a layer-count effect. The 6-detector arc, the numeric results, and the scope (multi-turn code-execution agent tasks with rich action spaces) are unchanged. The original v1 PDF remains accessible at doi.org/10.5281/zenodo.20368601.

Paper summary: We identify a 34% blind spot in probe-based LLM agent failure monitoring for Qwen3.6-27B on SWE-bench Pro: the WANDERING sub-class, in which the probe says "success" but the agent never emits finish_tool. We test six detector designs across three signal channels (text, residual cross-layer, action entropy). The most promising candidate is tool-use entropy collapse: WANDERING agents collapse onto a small set of repeated tool calls (W/S median ratio ≈ 0.41 in Qwen and Llama, ≈ 0.71 in GPT-5). This enables a Tier-3 autonomous-termination detector that reaches 70% recall at a 5% false-positive rate on the primary dataset.
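The entropy signal summarized above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that each trajectory is available as a list of tool names, that entropy is the Shannon entropy (in bits) of the empirical tool-name distribution, that "W/S" denotes the WANDERING median divided by the successful-run median, and that a trajectory is flagged when its entropy falls below a threshold calibrated on successful runs. The function names and the toy trajectories are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def tool_entropy(tool_calls):
    """Shannon entropy (bits) of the tool-name distribution in one trajectory."""
    if not tool_calls:
        return 0.0
    total = len(tool_calls)
    counts = Counter(tool_calls)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def median_ratio(wandering, success):
    """Median entropy of WANDERING runs over median entropy of successful runs."""
    w = median(tool_entropy(t) for t in wandering)
    s = median(tool_entropy(t) for t in success)
    return w / s


def threshold_at_fpr(success_entropies, target_fpr=0.05):
    """Entropy threshold whose flag rule (entropy < threshold) keeps the
    false-positive rate on successful runs at or below target_fpr."""
    ordered = sorted(success_entropies)
    allowed = math.floor(target_fpr * len(ordered))  # tolerated false positives
    return ordered[allowed]


def recall(wandering_entropies, threshold):
    """Share of WANDERING runs that the flag rule catches."""
    flagged = sum(1 for e in wandering_entropies if e < threshold)
    return flagged / len(wandering_entropies)


# Toy trajectories (hypothetical tool names, not taken from the dataset).
wandering = [
    ["bash", "bash", "bash", "bash"],
    ["bash", "edit", "bash", "edit"],
    ["edit", "bash", "edit", "bash"],
]
success = [
    ["bash", "edit", "test", "search"],
    ["search", "bash", "edit", "test"],
    ["bash", "edit", "test", "test"],
]

ratio = median_ratio(wandering, success)
thr = threshold_at_fpr([tool_entropy(t) for t in success])
rec = recall([tool_entropy(t) for t in wandering], thr)
```

On this toy data the ratio is 0.5 and every WANDERING run falls below the threshold. The numbers illustrate the mechanics only and say nothing about the reported results.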

Cross-architecture validation: Llama-70b (n=2,315, p<10⁻¹⁵, ratio ≈ 0.41) and the GPT-5 router (n=1,419, p=8.9×10⁻³⁵, ratio ≈ 0.71) confirm the direction of the effect. Cross-task validation on METR MALT (15+ task families) yields a null result (p=0.81), which scopes the claim to multi-turn code-execution agent tasks with rich action spaces.
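A directional check of this kind can be sketched with a one-sided permutation test on the difference in median entropy. This record does not state which statistical test the paper uses, so the test, the function name, and the toy entropy values below are assumptions made for illustration. A resampling test with a finite number of shuffles also cannot resolve p-values as small as those reported above.

```python
import random
from statistics import median


def permutation_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """One-sided p-value for 'median(group_a) is lower than median(group_b)'.

    Labels are shuffled n_resamples times; the p-value is the share of
    shuffles whose median gap is at least as large as the observed gap.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = median(group_b) - median(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        gap = median(pooled[n_a:]) - median(pooled[:n_a])
        if gap >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Toy per-trajectory entropies (hypothetical values).
wandering_entropy = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
success_entropy = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6]

p_separated = permutation_p_value(wandering_entropy, success_entropy)
p_identical = permutation_p_value(success_entropy, success_entropy)
```

With clearly separated groups the p-value is small; when both groups are identical it is large, which is the pattern a null result such as the METR MALT one would show.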

Reproducibility: all code, per-trajectory output JSONs, and figure-generation scripts are available on GitHub under Apache-2.0. The OpenInterp Phase 6 dataset (99 trajectories × per-turn residuals at L11/L23/L31/L43/L55, stored as bf16 safetensors) will be released on Hugging Face upon paper acceptance.

Notes

v0.9 (2026-05-24): calibration revision of v0.8 (10.5281/zenodo.20368601). The empirical results are unchanged. This version replaces the 'breakthrough' framing with 'most promising candidate', adds an explicit hedge on the W/S 0.41 ratio match (it merits independent replication), and flags the N=20 primary-dataset sample size and the production-scale FP cost in the deployment guidance.

Files

fig1_cross_arch_entropy.pdf

Files (680.8 kB)

Checksum Size
md5:ebbc4bdc7896bceee0373c9a7044f407 41.2 kB
md5:ec2f1e081e3a0bc46f09e768805075dd 23.8 kB
md5:9e0b8c098a6cfc875c44d60b71c34404 32.2 kB
md5:79c298298f0738a9dbea96f9733c180d 24.3 kB
md5:259a9bc2e820a43b61d7c9b45855f887 17.0 kB
md5:bd0741266e41c02be063dc6eceb65113 512.2 kB
md5:012cf245746e5a08786f08378b804cf5 30.1 kB

Additional details

Related works

Is new version of
Working paper: 10.5281/zenodo.20368601 (DOI)
Is supplement to
Software: https://github.com/OpenInterpretability/openinterp-swebench-harness (URL)