Published May 29, 2026 | Version v3
Preprint Open

Attention Is All We Had — But Not What We Needed: Convergent State Machine for Iterative Energy-Based Language Generation

  • 1. VKD Industries Private Limited, KAEL Division — Autonomous Reasoning Infrastructure, Lucknow, India

Description

We introduce the Convergent State Machine (CSM), a novel 
architecture for language generation that replaces attention 
entirely with energy-based iterative state refinement.
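
As a rough illustration of what "energy-based iterative state refinement" can mean in general, the sketch below repeatedly updates a state vector by descending a toy quadratic energy and stops once the energy no longer drops by more than a tolerance. Everything in it is an assumption made for illustration: the energy function, the names `energy` and `refine`, the step size, and the stopping rule are hypothetical and are not taken from the CSM paper.

```python
def energy(state, target):
    """Toy quadratic energy: half the squared distance to a fixed target.

    Assumption: a stand-in for illustration only, not the CSM energy.
    """
    return 0.5 * sum((s - t) ** 2 for s, t in zip(state, target))


def refine(state, target, step=0.2, tol=1e-6, max_iters=100):
    """Descend the energy until one step lowers it by less than tol.

    Returns the refined state and the number of update steps taken.
    """
    prev = energy(state, target)
    for it in range(1, max_iters + 1):
        # The gradient of the quadratic energy w.r.t. each entry is (s - t).
        state = [s - step * (s - t) for s, t in zip(state, target)]
        cur = energy(state, target)
        if prev - cur < tol:
            return state, it  # converged: the energy has stopped improving
        prev = cur
    return state, max_iters  # iteration budget exhausted


target = [1.0, -2.0, 0.5]
# A start close to the minimum and a start far from it.
near_state, near_iters = refine([0.9, -1.9, 0.6], target)
far_state, far_iters = refine([5.0, 4.0, -3.0], target)
print(near_iters, far_iters)
```

In this toy setting the state that starts farther from the energy minimum needs more update steps before the stopping rule fires. That is a property of this sketch only and says nothing about how the CSM models themselves behave.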

We trained three models (66M, 150M, and 331M parameters),
all with zero attention layers. CSM 150M matches GPT-2 1.5B
on MMLU within 0.4%, using 10x fewer parameters and 13x
less data.

Key finding: with more iterations, perplexity improves by
60% on hard problems but degrades by 60% on easy problems.
The model reasons deeper on harder problems: the extra
iterations are not repeated computation but genuine
difficulty-dependent reasoning.

Iteration scaling is confirmed across all three model sizes:
the 66M model converges at iteration 15, the 150M model at
iteration 30, and the 331M model at iteration 40.

Total training compute: under $100 of A100 GPU time.

Files

CSM_Paper_2.pdf (34.6 kB)
md5:c654b234e86fe656cd7e69317c7a32bf