There is a newer version of the record available.

Published May 27, 2026 | Version v1
Preprint Open

Attention Is All We Had — But Not What We Needed: Convergent State Machine for Iterative Energy-Based Language Generation

  • 1. VKD Industries Private Limited, KAEL Division — Autonomous Reasoning Infrastructure, Lucknow, India

Description

We introduce the Convergent State Machine (CSM), an architecture
for language generation that replaces attention entirely with
energy-based iterative state refinement.

A 150M-parameter CSM with zero attention layers scores within
0.3% of GPT-2 1.5B on MMLU while using 10x fewer parameters
and 13x less training data. State convergence dynamics scale
with model size: the 66M model converges by iteration 15, whereas
the 150M model sustains refinement to iteration 30 and beyond.
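
The description above does not specify the energy function, the update rule, or the convergence criterion that CSM uses. As a rough illustration of what "energy-based iterative state refinement" can mean in general, the sketch below runs plain gradient descent on a toy quadratic energy and stops once the state stops changing. Everything in it (the names `energy`, `energy_grad`, `refine`, the quadratic energy itself, and the `step`, `tol`, and `coupling` values) is a hypothetical choice made for this example and is not taken from the paper.

```python
# Toy illustration only: NOT the CSM architecture. The energy function,
# step size, and stopping rule are assumptions made for this sketch.

def energy(state, inputs, coupling=0.5):
    """Quadratic energy: stay close to the inputs, keep neighbours similar."""
    fit = sum((s - x) ** 2 for s, x in zip(state, inputs))
    smooth = sum((b - a) ** 2 for a, b in zip(state, state[1:]))
    return 0.5 * fit + 0.5 * coupling * smooth


def energy_grad(state, inputs, coupling=0.5):
    """Gradient of `energy` with respect to each state component."""
    n = len(state)
    grad = []
    for i in range(n):
        g = state[i] - inputs[i]
        if i > 0:
            g += coupling * (state[i] - state[i - 1])
        if i < n - 1:
            g += coupling * (state[i] - state[i + 1])
        grad.append(g)
    return grad


def refine(inputs, step=0.2, tol=1e-4, max_iters=100):
    """Refine a zero-initialised state by gradient descent on the energy.

    Returns the final state and the number of iterations performed.
    Iteration stops early once the largest per-component change in the
    state falls below `tol`.
    """
    state = [0.0] * len(inputs)
    for iteration in range(1, max_iters + 1):
        grad = energy_grad(state, inputs)
        new_state = [s - step * g for s, g in zip(state, grad)]
        delta = max(abs(a - b) for a, b in zip(new_state, state))
        state = new_state
        if delta < tol:
            return state, iteration
    return state, max_iters


inputs = [1.0, -2.0, 0.5, 3.0]
final_state, iters_used = refine(inputs)
print(f"stopped after {iters_used} iterations, energy = {energy(final_state, inputs):.4f}")
```

In this toy setting, the iteration at which the state stops changing plays the role of the convergence point mentioned in the description; how CSM actually measures convergence is not stated here.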

Total training cost: under $50 of A100 GPU time.

Keywords: attention-free, iterative reasoning, energy-based, 
state machine, language model, scaling law

Files

CSM_Paper.pdf (30.8 kB)
md5:d54c425d8e05bddfc0f2608f233075a7