Published May 26, 2026
Version 1
Preprint
Open access
Thinking Mode Induces Confidence Compression in Reasoning LLMs: A Pre-Registered Type-2 SDT Analysis
Description
Reasoning LLMs generate explicit thinking traces before answering, which improves accuracy on knowledge-intensive tasks. We tested whether thinking mode also improves the discriminative power of confidence signals derived from token probabilities. In a pre-registered study (OSF: osf.io/2jwub), three 7-8B reasoning models answered 2,000 TriviaQA items in both thinking and non-thinking mode. Thinking mode improved accuracy but significantly reduced NLP-based AUROC2 (type-2 AUROC) in all three models (e.g., Qwen3-8B: delta = -0.231, 95% CI [-0.262, -0.198]). The mechanism is NLP compression: thinking-mode NLP variance fell to 16% of its non-thinking level. This preprint reports Stage 1 of a two-stage pre-registered design.
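The AUROC2 comparison summarised in the description can be sketched as follows. This is a minimal illustration, not the code in the project repository. It treats NLP as an opaque per-item confidence score (higher = more confident; the exact definition is given in the preprint), computes type-2 AUROC with the Mann-Whitney rank identity, and estimates the thinking-minus-non-thinking difference with a paired percentile bootstrap over items. The bootstrap scheme, the helper names (`auroc2`, `paired_bootstrap_delta`, `variance_ratio`) and the synthetic data are assumptions for illustration; the pre-registered analysis may differ.

```python
import random
import statistics
from typing import Sequence

# Hypothetical helpers for illustration; the paired percentile bootstrap is an
# assumed choice, not necessarily the pre-registered procedure.


def auroc2(confidence: Sequence[float], correct: Sequence[bool]) -> float:
    """Type-2 AUROC: probability that a randomly chosen correct item receives
    higher confidence than a randomly chosen incorrect item (ties count 0.5).

    Uses the Mann-Whitney rank-sum identity with average ranks for ties.
    """
    n = len(confidence)
    if n != len(correct):
        raise ValueError("confidence and correct must have the same length")
    order = sorted(range(n), key=lambda i: confidence[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run [i, j] of items sharing the same confidence value.
        j = i
        while j + 1 < n and confidence[order[j + 1]] == confidence[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for c in correct if c)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC2 needs both correct and incorrect items")
    rank_sum_pos = sum(r for r, c in zip(ranks, correct) if c)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def paired_bootstrap_delta(conf_think, correct_think, conf_plain, correct_plain,
                           n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile CI for AUROC2(thinking) - AUROC2(non-thinking).

    Items are resampled jointly, so each draw keeps the pairing between an
    item's thinking-mode and non-thinking-mode outcomes.
    """
    n = len(conf_think)
    point = auroc2(conf_think, correct_think) - auroc2(conf_plain, correct_plain)
    rng = random.Random(seed)
    deltas = []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            d = (auroc2([conf_think[i] for i in idx], [correct_think[i] for i in idx])
                 - auroc2([conf_plain[i] for i in idx], [correct_plain[i] for i in idx]))
        except ValueError:
            continue  # resample lacked correct or incorrect items; redraw
        deltas.append(d)
    deltas.sort()
    lo = deltas[int(n_boot * alpha / 2)]
    hi = deltas[int(n_boot * (1 - alpha / 2)) - 1]
    return point, (lo, hi)


def variance_ratio(conf_think, conf_plain):
    """Thinking-mode confidence variance as a fraction of non-thinking variance."""
    return statistics.variance(conf_think) / statistics.variance(conf_plain)


if __name__ == "__main__":
    # Synthetic, illustrative data only (not the study's data).
    rng = random.Random(1)
    correct = [rng.random() < 0.6 for _ in range(400)]
    plain = [(0.8 if c else 0.4) + rng.gauss(0, 0.2) for c in correct]
    think = [(0.9 if c else 0.85) + rng.gauss(0, 0.05) for c in correct]
    delta, ci = paired_bootstrap_delta(think, correct, plain, correct, n_boot=200)
    print(f"delta AUROC2 = {delta:.3f}, 95% CI [{ci[0]:.3f}, {ci[1]:.3f}]")
    print(f"variance ratio = {variance_ratio(think, plain):.2f}")
```

Because AUROC2 depends only on the rank order of confidence within each mode, the variance ratio is reported alongside it as a descriptive statistic rather than feeding into it.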
Files
| Name | Size | MD5 |
|---|---|---|
| reasoning_metacog_preprint_final.pdf | 624.7 kB | 796d039b62777561f4210ee98e095f95 |
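To check a downloaded copy of the PDF against the MD5 checksum listed for it, a minimal sketch using Python's standard `hashlib` is shown below. The local path (the file saved under its listed name in the working directory) and the helper name `md5_of` are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# Checksum as listed in the record's file table.
EXPECTED_MD5 = "796d039b62777561f4210ee98e095f95"


def md5_of(path, chunk_size=1 << 20):
    """Hypothetical helper: hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    pdf = Path("reasoning_metacog_preprint_final.pdf")  # assumed download location
    if not pdf.exists():
        print(f"{pdf} not found; download it first")
    else:
        status = "OK" if md5_of(pdf) == EXPECTED_MD5 else "MISMATCH"
        print(f"MD5 {status}")
```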
Additional details
Related works
- Is supplement to: https://osf.io/2jwub/ (Other; OSF pre-registration)
- Is supplemented by: https://github.com/synthiumjp/reasoning-metacog (Software)
Dates
- Available: 2026
Software
- Repository URL: https://github.com/synthiumjp/reasoning-metacog
- Programming language: Python
- Development status: Active
References
- Cacioli, J.-P. (2026). Beyond the mean: Type-2 signal detection theory for LLM metacognition. arXiv:2603.25112.
- Cacioli, J.-P. (2026). Signal detection theory for large language models. arXiv:2603.14893.
- Cacioli, J.-P. (2026). Quantisation and metacognition in LLMs. arXiv:2604.08976.
- Cacioli, J.-P. (2026). Verbal confidence saturation in 3-9B open-weight instruction-tuned LLMs. arXiv:2604.22215.
- Cacioli, J.-P. (2026). A validity screening protocol for LLM confidence signals. arXiv:2604.17714.
- DeepSeek-AI. (2025). DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv:2501.12948.
- Fleming, S. M., & Lau, H. C. (2014). How to measure metacognition. Frontiers in Human Neuroscience, 8, 443.
- Maniscalco, B., & Lau, H. (2012). A signal detection theoretic approach for estimating metacognitive sensitivity from confidence ratings. Consciousness and Cognition, 21(1), 422-430.
- Miao, M. M., & Ungar, L. (2026). Closing the confidence-faithfulness gap in large language models. arXiv:2603.25052.
- Qwen Team. (2025). Qwen3 technical report. https://qwenlm.github.io/blog/qwen3/
- Yoon, D., Kim, S., Yang, S., Kim, S., Kim, S., Kim, Y., Choi, E., Kim, Y., & Seo, M. (2025). Reasoning models better express their confidence. NeurIPS 2025. arXiv:2505.14489.