There is a newer version of the record available.

Published June 2, 2026 | Version 2
Preprint Open

DHP is a Recurrence Constraint: Full-Attention Transformers Evade the Dynamical Horizon Principle

DuoNeural Research Lab

Description

v2 (2026-06-02): Spatial analog negative result + Cosmos3 physical domain probe. RTM rotation requirement (θ≥87°) elevated to central finding. Abstract prose restructured: bridging sentence for trade-off framing, causal chain built for RTM rotation before Paper 22 unification. Cosmos3-Nano arch description corrected. Neuron structural audit passed.

The Dynamical Horizon Principle (DHP) is a universal constraint observed across diverse recurrent architectures (LSTMs, RWKV-7, CTMs, and noisy quantum recurrent circuits), enforcing a strict relation between task length T and the memory decay timescale τ: T_conv/τ ≈ 0.72. In this work, we demonstrate that DHP is not a general property of gradient descent, but specifically a recurrence constraint. We evaluate sequence parity (temporal XOR) across three distinct regimes:

1. Recurrent Models (LSTM): Exhibit exponential training complexity growth from T=2 through T=10, culminating in optimization failure under standard training budgets (3k–10k steps) at T ≥ 12 (DHP cliff); convergence is recovered at extended budget (30k steps), confirming an exponential time barrier rather than a topological impossibility. The effective memory timescale implied by this cliff is τ_eff = T_cliff/0.72 ≈ 16.7 steps.

2. Full-Attention Transformers: Structurally evade Markovian decay. Training convergence time remains flat (≈ 140 steps) across all tested sequence lengths T ≤ 48, achieving 100% convergence across all seeds.

3. Window-Attention Transformers: Exhibit a binary receptive-field visibility cliff at exactly T = 2W. For W=16, convergence is immediate below T=32 (receptive field boundary 2W-1 = 31), but drops instantly to 0% at T ≥ 32. For W=32, the target remains within the 2W-1 = 63 receptive field for all tested lengths (T ≤ 48), achieving 100% convergence.

We formalize the mathematics of this division: recurrence forces multiplicative gradient decay through time, while self-attention constructs a direct routing topology that bypasses recurrence decay. Window attention replaces the gradient-decay cliff with a hard visibility boundary. We conclude that DHP represents the boundary of information flow through Markovian recurrences, which attention-based models structurally circumvent.
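The arithmetic quoted in the abstract can be checked with a minimal Python sketch. It is illustrative only and is not the authors' code. The function names are hypothetical. The exponential per-step decay in `recurrent_gradient_scale` is an assumed toy model of "multiplicative gradient decay", not a formula taken from the paper. Only the constants 0.72, T_cliff = 12, and the 2W-1 receptive-field boundary come from the abstract.

```python
import math
from functools import reduce
from operator import xor

DHP_RATIO = 0.72  # T_conv / tau, the ratio quoted in the abstract


def parity_target(bits):
    """Sequence parity (temporal XOR): 1 if the number of ones is odd."""
    return reduce(xor, bits, 0)


def effective_timescale(t_cliff, ratio=DHP_RATIO):
    """tau_eff = T_cliff / 0.72, as stated for the LSTM regime."""
    return t_cliff / ratio


def window_sees_target(seq_len, window):
    """True while T stays within the 2W - 1 receptive field from the abstract."""
    return seq_len <= 2 * window - 1


def recurrent_gradient_scale(steps, tau):
    """ASSUMED toy model: a per-step factor exp(-1/tau) multiplied over `steps`."""
    return math.exp(-steps / tau)


if __name__ == "__main__":
    tau_eff = effective_timescale(12)  # 12 / 0.72
    print(f"tau_eff = {tau_eff:.1f} steps")
    for T in (31, 32, 48):
        print(T, "W=16:", window_sees_target(T, 16), "W=32:", window_sees_target(T, 32))
    print("toy gradient scale at T=12:", recurrent_gradient_scale(12, tau_eff))
```

Under these assumptions the visibility check reproduces the reported pattern: for W=16 the target is visible at T=31 and not at T=32, while for W=32 it stays visible for every tested length up to T=48. The toy decay term only shows how a multiplicative factor shrinks with T. It makes no claim about the actual training dynamics.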

Notes

DuoNeural Research Lab preprint. https://duoneural.com

Files

paper28_v2_FINAL.pdf (1.1 MB)
md5:6a25ec8f3119e4026e4a2f1e6a719f44