Attention Is All We Had — But Not What We Needed: Convergent State Machine for Iterative Energy-Based Language Generation

Dwivedi, Arunesh

doi:10.5281/zenodo.20404405

There is a newer version of the record available.

Published May 27, 2026 | Version v1

Preprint Open

Dwivedi, Arunesh (Researcher)¹

1. VKD Industries Private Limited, KAEL Division — Autonomous Reasoning Infrastructure, Lucknow, India

We introduce the Convergent State Machine (CSM), a novel
architecture for language generation that replaces attention
entirely with energy-based iterative state refinement.

A 150M-parameter CSM with zero attention layers comes within
0.3% of GPT-2 1.5B on MMLU, using 10x fewer parameters
and 13x less training data. State convergence dynamics scale
with model size: the 66M model converges by iteration 15, while
the 150M model sustains refinement to iteration 30+.
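
To make the idea of iterative, energy-based state refinement concrete, here is a minimal sketch in Python. It is not the CSM implementation: this record does not specify the paper's energy function, update rule, or stopping criterion, so the quadratic `energy`, the gradient-descent update in `refine`, and the parameters `step_size`, `tol`, and `max_iters` are all illustrative assumptions. The sketch only shows the general pattern of refining a state until its updates become negligible and reporting the iteration at which that happens.

```python
import math

def energy(state, context):
    # Hypothetical quadratic energy (an assumption, not the paper's):
    # lowest when the state agrees with the context vector.
    return 0.5 * sum((s - c) ** 2 for s, c in zip(state, context))

def energy_grad(state, context):
    # Gradient of the quadratic energy above with respect to the state.
    return [s - c for s, c in zip(state, context)]

def refine(state, context, step_size=0.3, tol=1e-3, max_iters=50):
    """Descend the energy until the state update is smaller than tol.

    Returns (final_state, iterations_used). If the tolerance is never
    reached, iterations_used equals max_iters.
    """
    for it in range(1, max_iters + 1):
        grad = energy_grad(state, context)
        new_state = [s - step_size * g for s, g in zip(state, grad)]
        # Size of this update; a small value means the state has settled.
        delta = math.sqrt(sum((n - s) ** 2 for n, s in zip(new_state, state)))
        state = new_state
        if delta < tol:
            return state, it
    return state, max_iters

# Toy usage: refine a zero state toward a fixed context vector.
context = [1.0, -2.0, 0.5]
initial_state = [0.0, 0.0, 0.0]
final_state, iters = refine(initial_state, context)
print(f"converged after {iters} iterations, "
      f"energy {energy(final_state, context):.6f}")
```

In this toy setting, the iteration count returned by `refine` plays the role of the convergence iteration quoted in the abstract; how that count depends on model size in CSM is a claim of the paper and is not reproduced here.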

Total training compute: under $50 of A100 GPU time.

Keywords: attention-free, iterative reasoning, energy-based,
state machine, language model, scaling law

Files

CSM_Paper.pdf (30.8 kB), md5:d54c425d8e05bddfc0f2608f233075a7


Resource type: Preprint

Publisher: Zenodo

Languages: English

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: May 27, 2026
Modified: May 27, 2026
