Who Needs Attention? Spiking Language Modeling via Synaptogenic Adaptive Processing Units
Description
A spiking neural network generates coherent multi-turn conversation from pure next-token prediction, without attention, without RLHF, and without filtering — running on a $290 used GPU.
We introduce the Synaptogenic Adaptive Processing Unit Language Model (SAPU-LM), a multi-timescale spiking reservoir architecture that replaces attention entirely with trained recurrent dynamics in leaky integrate-and-fire neurons. The chatbot "Nemo" emerges from freezing the learned spiking topology and retraining only 8.5% of parameters on conversational data, achieving 38.05 test perplexity on DailyDialog.
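The core dynamics the abstract describes can be sketched with a minimal discrete-time leaky integrate-and-fire reservoir. This is an illustrative sketch only: the function and parameter names (`lif_step`, `tau`, `v_th`) are assumptions, and the actual SAPU-LM update rule, reset scheme, and surrogate-gradient training are defined in the released code.

```python
import numpy as np

def lif_step(v, spikes, x, W_in, W_rec, tau=10.0, v_th=1.0, dt=1.0):
    """One leaky integrate-and-fire update (illustrative, not the paper's exact rule).

    v       : membrane potentials, shape (n,)
    spikes  : previous binary spike vector, shape (n,)
    x       : input features at this step, shape (d,)
    tau     : membrane time constant; larger tau = slower leak = longer memory
    """
    decay = np.exp(-dt / tau)                 # exponential leak toward zero
    v = decay * v + W_rec @ spikes + W_in @ x # recurrent + input drive
    new_spikes = (v >= v_th).astype(v.dtype)  # threshold crossing emits a spike
    v = np.where(new_spikes > 0, 0.0, v)      # hard reset after spiking
    return v, new_spikes

# Toy run with random weights (the real model trains W_rec via surrogate gradients).
rng = np.random.default_rng(0)
n, d = 512, 64
W_rec = rng.normal(0, 0.05, (n, n))
W_in = rng.normal(0, 0.5, (n, d))
v, s = np.zeros(n), np.zeros(n)
for _ in range(20):
    v, s = lif_step(v, s, rng.normal(size=d), W_in, W_rec)
print(int(s.sum()))  # number of neurons spiking at the final step
```

A multi-timescale variant would run several such reservoirs with different `tau` values, which is how (per the abstract) the tiling variant differentiates fast and slow populations without separate weight matrices.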
The architecture spans a lineage from a frozen Echo State Network (~19,500 perplexity) to 84.15 perplexity (M-SAPU-LM) on a WikiText-103 10M-token subsample — an ~80× improvement from training reservoir weights via surrogate gradients. A Tiling Parallel SAPU (TPSAPU) shares a single 512×512 recurrent weight matrix across three timescales and recovers to 84.67 perplexity after L1 pruning, suggesting that membrane time constant τ alone creates functional differentiation. Ternary quantization compresses the learned recurrent core to ~45 KB at 93.6% sparsity.
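The ~45 KB figure follows from storing only ternary-valued nonzeros of the 512×512 recurrent core. A rough sketch of ternary quantization and the resulting storage estimate, assuming a common magnitude-threshold scheme (the paper's exact quantizer and on-disk format may differ):

```python
import numpy as np

def ternarize(W, threshold_scale=0.7):
    """Quantize weights to {-1, 0, +1}.

    The 0.7 * mean|W| threshold is a standard ternary-weight heuristic,
    assumed here for illustration; it is not taken from the paper.
    """
    thr = threshold_scale * np.abs(W).mean()
    T = np.zeros_like(W, dtype=np.int8)
    T[W > thr] = 1
    T[W < -thr] = -1
    return T

rng = np.random.default_rng(0)
W = rng.laplace(0, 0.02, (512, 512))   # stand-in for the trained recurrent core
T = ternarize(W)

sparsity = (T == 0).mean()
nnz = int((T != 0).sum())
# Sparse storage estimate: ~2 bytes of index plus 1 sign bit per nonzero.
approx_kb = nnz * (2 + 1 / 8) / 1024
print(f"sparsity={sparsity:.1%}, nonzeros={nnz}, ~{approx_kb:.1f} KB")
```

At the reported 93.6% sparsity, a 512×512 ternary matrix has roughly 16.8k nonzeros, so a sparse index-plus-sign encoding lands in the tens of kilobytes, consistent with the ~45 KB figure.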
L1 pruning reveals timescale-dependent topology emergence: fast reservoirs maintain distributed connectivity while slow reservoirs self-organize into diagonal self-excitatory memory cells — a structure discovered by the network, not imposed by design. The trained ternary spiking core maps directly to analog resistor-capacitor-comparator circuits; a proof-of-concept hardware exporter has been developed.
To our knowledge, this is the first demonstration of open-ended next-token prediction using a trained spiking reservoir with no attention mechanism. Code and checkpoints: https://gitlab.com/AntonioGCGonzalez/synaptogenic-adaptive-processing-unit-language-models
This is a preliminary technical report. Several experimental configurations are still running; results will be updated in subsequent revisions.
Files

| Name | Size | MD5 |
|---|---|---|
| WhoNeedsAttention.pdf | 1.7 MB | 89d3cf1348284e217163df4a3aa89f7d |
Additional details
Dates
- Submitted
- 2023-03-01 (pre-print submitted to guarantee priority rights)
Software
- Repository URL
- https://gitlab.com/AntonioGCGonzalez/synaptogenic-adaptive-processing-unit-language-models
- Programming language
- Python, HTML, JSON
- Development Status
- Active