Attention Is All We Had — But Not What We Needed: Convergent State Machine for Iterative Energy-Based Language Generation
Authors/Creators
1. VKD Industries Private Limited, KAEL Division (Autonomous Reasoning Infrastructure), Lucknow, India
Description
We introduce the Convergent State Machine (CSM), a novel
architecture for language generation that replaces attention
entirely with energy-based iterative state refinement.
A 150M-parameter CSM with zero attention layers matches
GPT-2 1.5B on MMLU to within 0.3%, using 10x fewer parameters
and 13x less training data. State convergence dynamics scale
with model size: the 66M model converges by iteration 15, while
the 150M model keeps refining its state through iteration 30 and beyond.
Total training compute: under $50 of A100 GPU time.
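The description does not specify CSM's energy function or update rule, so the sketch below is illustrative only. It assumes a fixed quadratic energy E(s) = 0.5 sᵀAs - bᵀs and plain gradient-descent updates; both are hypothetical stand-ins, not the paper's method, and the names `energy`, `refine_state`, `step`, `tol` and `max_iters` are invented for this example. What it shows is what "iterative state refinement" with a convergence iteration means: the state is updated repeatedly to lower the energy, and the iteration at which the update becomes smaller than a tolerance is reported.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product A @ v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def energy(a: Matrix, b: Vector, s: Vector) -> float:
    """Hypothetical quadratic energy E(s) = 0.5 * s^T A s - b^T s.

    A is assumed symmetric positive definite, so E has a unique minimum at s* = A^{-1} b.
    """
    a_s = matvec(a, s)
    quad = sum(s_i * as_i for s_i, as_i in zip(s, a_s))
    lin = sum(b_i * s_i for b_i, s_i in zip(b, s))
    return 0.5 * quad - lin


def refine_state(
    a: Matrix,
    b: Vector,
    s0: Vector,
    step: float = 0.5,
    tol: float = 1e-6,
    max_iters: int = 100,
) -> Tuple[Vector, int]:
    """Refine the state by gradient descent on E until the update is smaller than tol.

    Returns the final state and the iteration at which it converged
    (or max_iters if it never did).
    """
    s = list(s0)
    for it in range(1, max_iters + 1):
        # Gradient of the quadratic energy: dE/ds = A s - b.
        grad = [as_i - b_i for as_i, b_i in zip(matvec(a, s), b)]
        s = [s_i - step * g_i for s_i, g_i in zip(s, grad)]
        # Euclidean norm of the update just applied.
        delta = step * math.sqrt(sum(g * g for g in grad))
        if delta < tol:
            return s, it
    return s, max_iters


if __name__ == "__main__":
    A = [[2.0, 0.0], [0.0, 1.0]]
    b = [2.0, 1.0]
    state, iters = refine_state(A, b, [0.0, 0.0])
    print(f"converged at iteration {iters}: state={state}, energy={energy(A, b, state):.6f}")
```

Running the script reports how many iterations the toy state needed to settle, which is the analogue of the abstract's "converges by iteration 15" (66M) versus "iteration 30+" (150M). A quadratic energy is used only because its minimum has a closed form (A⁻¹b), which makes convergence easy to check; the energy in an actual CSM is presumably learned and need not be convex.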
Keywords: attention-free, iterative reasoning, energy-based,
state machine, language model, scaling law
Files
| Name | Size | MD5 |
|---|---|---|
| CSM_Paper.pdf | 30.8 kB | d54c425d8e05bddfc0f2608f233075a7 |