On the Structural Requirements for Coherent Reasoning in Language Models
Description
Recent empirical work reveals a systematic failure mode in Large Language Models (LLMs): models that perform well on static benchmarks often fail to maintain coherence across sequences of inferences. Specifically, models may estimate probabilities accurately but place bets that contradict them; confidence signals often fail to predict commitment to an answer; and belief updates after new evidence can paradoxically degrade accuracy.
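For concreteness, the probability/bet failure can be made operational with a check like the following; this is an illustrative harness, not from the paper, and the expected-value rule and helper names are assumptions:

```python
# Hypothetical check for the probability-vs-bet incoherence described above.
# In practice, p_estimate and bet_yes would come from two separate queries
# to the same model about the same proposition.

def is_coherent(p_estimate: float, bet_yes: bool,
                payoff_yes: float = 1.0, payoff_no: float = 1.0) -> bool:
    """A coherent bettor takes 'yes' iff its expected value is non-negative."""
    ev_yes = p_estimate * payoff_yes - (1 - p_estimate) * payoff_no
    return bet_yes == (ev_yes >= 0)

# Example: the model estimates P = 0.8 yet bets against the proposition.
assert not is_coherent(p_estimate=0.8, bet_yes=False)
```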
This paper argues that these are not merely calibration issues but consequences of a fundamental architectural limitation: stateless inference. Current architectures generate rich internal representations (confidence, correctness predictions, epistemic metadata) during a forward pass but discard them immediately upon output generation. Consequently, the model lacks the persistent substrate required to bind distinct inferences into a coherent reasoning chain.
Building on the Tension Principle, this work outlines the Minimal Architecture required to bridge the gap between local competence and temporal coherence. It proposes four necessary structural additions (see the sketch after this list):
- Temporal Persistence: Rolling logs that retain internal states, not just textual outputs.
- Self-Referential Checking: Mechanisms to compute "tension" between predicted reliability and actual performance.
- Delta-Tracking: Stability signals that detect brittleness in reasoning independent of ground truth.
- Resolution Mechanisms: Feedback loops that use these signals to modulate future behavior.
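The following minimal Python sketch shows how these four components could interlock. All names (`StateLog`, `tension`, `delta_stability`, `modulate_temperature`) and the specific feedback rule are expository assumptions, not the paper's implementation:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class StepRecord:
    output: str            # textual output of one inference step
    confidence: float      # predicted reliability in [0, 1]
    correct: bool | None   # realized correctness, None if no feedback exists

class StateLog:
    """Temporal Persistence: a rolling log of internal states, not just text."""
    def __init__(self, maxlen: int = 256):
        self.records: deque[StepRecord] = deque(maxlen=maxlen)

    def append(self, rec: StepRecord) -> None:
        self.records.append(rec)

def tension(log: StateLog) -> float:
    """Self-Referential Checking: gap between predicted and realized reliability."""
    scored = [r for r in log.records if r.correct is not None]
    if not scored:
        return 0.0
    predicted = sum(r.confidence for r in scored) / len(scored)
    realized = sum(r.correct for r in scored) / len(scored)
    return abs(predicted - realized)

def delta_stability(log: StateLog) -> float:
    """Delta-Tracking: brittleness as confidence volatility, no ground truth needed."""
    confs = [r.confidence for r in log.records]
    if len(confs) < 2:
        return 0.0
    return sum(abs(a - b) for a, b in zip(confs, confs[1:])) / (len(confs) - 1)

def modulate_temperature(base: float, log: StateLog) -> float:
    """Resolution Mechanism: feed the signals back into future decoding."""
    # High tension or volatility -> sample more conservatively (illustrative rule).
    penalty = tension(log) + delta_stability(log)
    return max(0.1, base * (1.0 - penalty))
```

The point of the sketch is the data flow: internal states persist past the forward pass, and signals derived from them feed back into future behavior.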
The paper concludes by contrasting "Psychological" alignment approaches (which assume a continuous agent) with Ecological and Architectural strategies that are better suited to the transient nature of LLM instantiations.
Version Note
A preliminary draft of this work was uploaded with an incomplete formulation of the Tension Principle in the context of truth-free environments. This version replaces that draft.
The corrected manuscript clarifies that tension is always defined as the gap between predicted and realized reliability, and that realized reliability may be derived either from correctness (when available) or from internal coherence (Δ₁/Δ₂) when no external feedback exists.
The earlier draft’s references to correctness-based tension should therefore be treated as a special case of the general definition.
This revision does not modify TTP I; it corrects the TTP II presentation and unifies the framework across supervised, weakly supervised, and truth-free continuous learning.
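Schematically, the unified definition reads as follows; the symbols T, r̂, r, and the coherence functional ρ are illustrative notation, not taken from the manuscript:

```latex
% Illustrative formalization of the unified tension definition.
% \hat{r}_t : predicted reliability at step t;  r_t : realized reliability.
\mathcal{T}_t \;=\; \bigl|\,\hat{r}_t - r_t\,\bigr|,
\qquad
r_t \;=\;
\begin{cases}
  c_t & \text{if external correctness } c_t \text{ is available,}\\[2pt]
  \rho(\Delta_1, \Delta_2) & \text{otherwise (internal coherence only).}
\end{cases}
```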
Files
| Name | Size |
|---|---|
| On the Structural Requirements for Coherent Reasoning in Language Models_v2.pdf (md5:3d4c19f239e23141700b6cfc772d79e4) | 398.2 kB |
Additional details
Related works
- Continues: Preprint 10.5281/zenodo.17634946 (DOI)