Tool-Entropy Collapse: A Cross-Architecture Signature of Agent WANDERING Failure
Description
v0.9 — calibration revision. Removed "breakthrough" language throughout (abstract, §9.1, §10, §13 conclusion) in favor of more measured phrasing ("most promising candidate signal in this work"). Conclusion now explicitly flags the W/S ≈ 0.41 ratio match between Qwen and Llama as the most suggestive pattern that merits independent replication on additional models before being treated as a discovery, and explicitly notes the N=20 sample size on the Qwen primary dataset and production-scale FP cost implications. Mid-layer ablation claim hedged between edge-layer specificity and layer-count effect interpretations. Same 6-detector arc, same numeric results, same scope (multi-turn code-execution agent tasks with rich action spaces). Original v1 PDF remains accessible at doi.org/10.5281/zenodo.20368601.
Paper summary: We identify a 34% blind spot in probe-based LLM agent failure monitoring on Qwen3.6-27B SWE-bench Pro: the WANDERING sub-class, in which the probe predicts "success" but the agent never emits finish_tool. We test six detector designs across three signal channels (text, residual cross-layer, action entropy). The most promising candidate is tool-use entropy collapse: WANDERING agents collapse onto a small set of repeated tool calls (W/S median ratio ≈ 0.41 in Qwen and Llama, 0.71 in GPT-5), enabling a Tier-3 autonomous-termination detector with 70% recall at a 5% false-positive rate on the primary dataset.
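To make the entropy-collapse signal concrete, here is a minimal sketch that assumes the signal is the Shannon entropy of the empirical distribution of tool names within one trajectory. The exact definition used in the paper (windowing, normalization, log base) is not stated in this summary, so the function `tool_entropy` and the tool names below are illustrative assumptions rather than the authors' implementation.

```python
import math
from collections import Counter


def tool_entropy(tool_calls):
    """Shannon entropy, in bits, of the tool-name frequencies in one trajectory.

    Assumption: entropy is computed over the whole trajectory at once.
    """
    if not tool_calls:
        return 0.0
    counts = Counter(tool_calls)
    total = len(tool_calls)
    # p * log2(1/p) with p = c / total, summed over distinct tools
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical tool names, for illustration only.
diverse = ["read_file", "bash", "edit_file", "run_tests"] * 3
collapsed = ["bash"] * 11 + ["read_file"]

print("diverse trajectory:  ", tool_entropy(diverse))
print("collapsed trajectory:", tool_entropy(collapsed))
```

A trajectory that keeps repeating one or two calls scores far lower than one that spreads its calls across the action space, which is the direction of effect the summary describes for WANDERING agents.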
Cross-architecture validation: Llama-70b (n=2,315, p<10⁻¹⁵, ratio ≈ 0.41) and GPT-5 router (n=1,419, p=8.9×10⁻³⁵, ratio ≈ 0.71) confirm the direction of the effect. Cross-task validation on METR MALT (15+ task families) yields a null result (p=0.81), which scopes the claim to multi-turn code-execution agent tasks with rich action spaces.
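The ratio and the recall/false-positive figures quoted above can be illustrated with a short sketch. It assumes that "W/S" denotes the median per-trajectory entropy of WANDERING runs divided by that of successful runs, and that the detector flags a trajectory when its entropy falls below a threshold. Both readings are assumptions, and the numbers below are toy values, not data from the paper.

```python
from statistics import median


def median_ratio(wandering, success):
    """Median entropy of WANDERING runs divided by median entropy of successful runs."""
    return median(wandering) / median(success)


def recall_and_fpr(wandering, success, threshold):
    """Flag a trajectory when entropy < threshold; return (recall, false-positive rate)."""
    recall = sum(e < threshold for e in wandering) / len(wandering)
    fpr = sum(e < threshold for e in success) / len(success)
    return recall, fpr


# Toy per-trajectory entropies (bits), invented for illustration only.
success_entropy = [2.0, 1.8, 2.2, 1.9, 2.1]
wandering_entropy = [1.0, 0.5, 1.2, 2.1, 0.9]

ratio = median_ratio(wandering_entropy, success_entropy)
recall, fpr = recall_and_fpr(wandering_entropy, success_entropy, threshold=1.5)
print(f"W/S median ratio: {ratio:.2f}, recall: {recall:.2f}, FPR: {fpr:.2f}")
```

In practice the threshold would be chosen on held-out successful trajectories to hit a target false-positive rate, and recall would then be read off on the WANDERING set.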
Reproducibility: all code, per-trajectory output JSONs, and figure-generation scripts are available on GitHub under Apache-2.0. The OpenInterp Phase 6 dataset (99 trajectories × per-turn residuals at L11/L23/L31/L43/L55 in bf16 safetensors) will be released on HuggingFace upon paper acceptance.
Files
fig1_cross_arch_entropy.pdf
Total size: 680.8 kB

| md5 | Size |
|---|---|
| ebbc4bdc7896bceee0373c9a7044f407 | 41.2 kB |
| ec2f1e081e3a0bc46f09e768805075dd | 23.8 kB |
| 9e0b8c098a6cfc875c44d60b71c34404 | 32.2 kB |
| 79c298298f0738a9dbea96f9733c180d | 24.3 kB |
| 259a9bc2e820a43b61d7c9b45855f887 | 17.0 kB |
| bd0741266e41c02be063dc6eceb65113 | 512.2 kB |
| 012cf245746e5a08786f08378b804cf5 | 30.1 kB |
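The md5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch using only Python's standard `hashlib`; the helper names `md5_of` and `matches_listing` are hypothetical, and the mapping from each checksum to a file name has to be taken from the record itself.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed checksum such as 'md5:<hex>' or bare hex."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of(path) == expected
```

For example, `matches_listing(local_path, "md5:<hex from the listing>")` returns `True` only when the local copy matches the published digest, where `local_path` is wherever the file was saved.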
Additional details
Related works
- Is new version of
  - Working paper: 10.5281/zenodo.20368601 (DOI)
- Is supplement to
  - Software: https://github.com/OpenInterpretability/openinterp-swebench-harness (URL)