Attention Is All We Had — But Not What We Needed: Convergent State Machine for Iterative Energy-Based Language Generation

Dwivedi, Arunesh

doi:10.5281/zenodo.20442828

Published May 29, 2026 | Version v3

Preprint | Open access


Affiliation: Dwivedi, Arunesh (Researcher), VKD Industries Private Limited, KAEL Division (Autonomous Reasoning Infrastructure), Lucknow, India

We introduce the Convergent State Machine (CSM), a novel
architecture for language generation that replaces attention
entirely with energy-based iterative state refinement.
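The record does not describe CSM's energy function or update rule. The following is a minimal sketch of the general idea of energy-based iterative state refinement, under loudly stated assumptions: a hypothetical quadratic energy E(s) = ½ sᵀAs − bᵀs (A symmetric positive definite) stands in for a learned energy, and plain gradient descent stands in for whatever refinement step CSM actually uses. The names `energy`, `refine`, `lr`, `tol`, and `max_iters` are illustrative only, not taken from the paper.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, s: Vector) -> Vector:
    """Dense matrix-vector product A @ s."""
    return [sum(a_ij * s_j for a_ij, s_j in zip(row, s)) for row in a]


def energy(a: Matrix, b: Vector, s: Vector) -> float:
    """Hypothetical quadratic energy E(s) = 0.5 * s^T A s - b^T s (not CSM's energy)."""
    a_s = matvec(a, s)
    quad = sum(s_i * as_i for s_i, as_i in zip(s, a_s))
    lin = sum(b_i * s_i for b_i, s_i in zip(b, s))
    return 0.5 * quad - lin


def refine(a: Matrix, b: Vector, s0: Vector, lr: float = 0.2,
           tol: float = 1e-6, max_iters: int = 200) -> Tuple[Vector, int]:
    """Refine the state by gradient descent on E until the update falls below tol.

    Returns the refined state and the number of iterations actually used.
    """
    s = list(s0)
    for it in range(1, max_iters + 1):
        # Gradient of the quadratic energy: dE/ds = A s - b
        grad = [as_i - b_i for as_i, b_i in zip(matvec(a, s), b)]
        step = [lr * g for g in grad]
        s = [s_i - d_i for s_i, d_i in zip(s, step)]
        # Stop once the largest per-coordinate update is negligible
        if max(abs(d) for d in step) < tol:
            return s, it
    return s, max_iters


# Toy problem: the energy minimum is A^{-1} b = [1.0, 1.0]
A = [[2.0, 0.0], [0.0, 4.0]]
b = [2.0, 4.0]
state, iters = refine(A, b, s0=[0.0, 0.0])
print(state, iters)
```

Descent converges in this toy case because `lr` is below 2 divided by the largest eigenvalue of A. The returned iteration count is only a toy analogue of the per-model convergence iterations reported in the abstract.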

We train three models (66M, 150M, and 331M parameters),
none of which contains an attention layer. The 150M CSM
comes within 0.4% of GPT-2 1.5B on MMLU while using 10x
fewer parameters and 13x less training data.

Key finding: with more refinement iterations, perplexity
on hard problems improves by 60%, while perplexity on easy
problems degrades by 60%. The model reasons more deeply on
harder problems: the extra iterations reflect
difficulty-dependent reasoning rather than repeated
computation.
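The abstract does not state how the ±60% figures are computed. Assuming they are relative perplexity changes between the smallest and largest iteration budgets, grouped by problem difficulty, the calculation might look like the sketch below. The helpers `relative_ppl_change` and `summarize` are hypothetical, and the perplexity curves are toy values chosen only to reproduce ±60%, not measurements from the paper.

```python
from typing import Dict, List


def relative_ppl_change(ppl_by_budget: List[float]) -> float:
    """Relative perplexity change from the smallest to the largest iteration budget.

    Positive means perplexity fell (improved); negative means it rose (degraded).
    """
    first, last = ppl_by_budget[0], ppl_by_budget[-1]
    return (first - last) / first


def summarize(buckets: Dict[str, List[float]]) -> Dict[str, float]:
    """Compute the relative change for each difficulty bucket."""
    return {name: relative_ppl_change(curve) for name, curve in buckets.items()}


# Toy perplexity curves (NOT data from the paper), ordered by increasing iteration budget
toy_curves = {
    "hard": [50.0, 35.0, 20.0],
    "easy": [10.0, 12.0, 16.0],
}
print(summarize(toy_curves))  # {'hard': 0.6, 'easy': -0.6}
```

Measuring the change relative to the low-budget perplexity keeps buckets with very different absolute perplexities comparable.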

Iteration scaling holds across all three model sizes: the
66M model converges at iteration 15, the 150M model at
iteration 30, and the 331M model at iteration 40.
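The record does not define the convergence criterion behind these iteration counts. One plausible reading, assumed here purely for illustration, is the first iteration after which every further iteration improves perplexity by less than a relative tolerance. The function `convergence_iter`, its `rel_tol` parameter, and the toy curve are hypothetical.

```python
from typing import List


def convergence_iter(ppl: List[float], rel_tol: float = 0.01) -> int:
    """Return the 1-based iteration after which no further step helps much.

    ppl[k] is the perplexity measured after k + 1 refinement iterations.
    An iteration counts as converged if every later step improves perplexity
    by less than rel_tol (relative); a step that makes perplexity worse
    also counts as "no meaningful improvement".
    """
    conv = len(ppl)
    # Walk backwards from the last step and stop at the last significant gain
    for k in range(len(ppl) - 1, 0, -1):
        gain = (ppl[k - 1] - ppl[k]) / ppl[k - 1]
        if gain >= rel_tol:
            break
        conv = k
    return conv


# Toy curve (NOT data from the paper): large early gains, then a plateau
toy_curve = [100.0, 60.0, 40.0, 30.0, 29.9, 29.85, 29.84]
print(convergence_iter(toy_curve))  # 4
```

Requiring all later steps to stay below the tolerance, rather than stopping at the first small step, avoids declaring convergence on a temporary plateau.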

Total training cost: under $100 of A100 GPU time.

Files

CSM_Paper_2.pdf (34.6 kB, md5:c654b234e86fe656cd7e69317c7a32bf)


Resource type: Preprint
Publisher: Zenodo
Language: English
License: Creative Commons Attribution 4.0 International (CC BY 4.0). The license allows redistribution and reuse of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: May 29, 2026
Modified: May 29, 2026
