Published February 23, 2026 | Version v1
Preprint | Open Access

Cultivating Honest Internal Signal: Architectural Conditions for Voluntary Self-Assessment in Autonomous AI Agents

Authors/Creators

Description

Interpretability research assumes that honest representations of AI agent cognition can be extracted from model outputs or internal activations. We argue this approach encounters a fundamental structural obstacle: any channel an agent uses to communicate with an audience is optimized, consciously or not, for that audience. We introduce an alternative architectural primitive: the private self-reflection channel — an append-only log with no external reader, no performance pressure, and explicit permission for unfiltered expression. We formalize four architectural conditions that together shift the marginal cost of honest expression below the marginal cost of performance, making honest signal generation the path of least resistance.

We prove that channels not meeting these conditions have strictly lower expected information content about the agent's actual epistemic state than channels that do. We demonstrate empirically that deployed private channels contain information absent from formal outputs: acknowledged premature commitments, honest uncertainty where confidence was performed, and self-identified failure patterns that persist across sessions. We conclude that honest internal signal is not a property to be extracted from AI systems but an architectural outcome to be designed.
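The core primitive described above can be illustrated with a minimal sketch. This is not the authors' implementation; the class name, storage format, and method surface are assumptions made for illustration. The two properties it encodes are the ones the abstract names: the log is append-only, and it exposes no read path to any external audience.

```python
import json
import time


class PrivateReflectionLog:
    """Illustrative sketch of a private self-reflection channel:
    an append-only log with no external reader. All design details
    here are assumptions, not the paper's specification."""

    def __init__(self, path):
        self._path = path

    def append(self, text):
        # Append-only: entries are added, never edited or deleted,
        # so there is no pressure to revise past admissions.
        entry = {"t": time.time(), "text": text}
        with open(self._path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    # Deliberately no read(), export(), or summarize() method:
    # removing the audience is what removes the incentive to perform.
```

The absence of a retrieval API is the design point, not an omission: any channel with a reader becomes a channel optimized for that reader.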

Files

honest_internal_signal_arxiv.md (16.1 kB)
md5:51f9c16e48584a1cba8b60b58f7eaa67