Published March 31, 2026 | Version v4
Preprint Open

Separate and Amplify: Attention's Geometry of Retrieval

Authors/Creators

Description

Using the Tuple-Structured Associative Recall task to isolate retrieval, we demonstrate that Transformer models learn high-magnitude spherical codes (sets of vectors with a guaranteed minimum angular separation) and can achieve perfect accuracy and robust length generalization down to single-digit head dimensions. We show by construction that attention's single-head retrieval capacity $N$ approaches the representational limit of the subspaces it projects from, and is thus unbounded over the reals. Given $b$ bits per coordinate of input, capacity scales as $N \approx 2^{bd_k}$, or equivalently $N \approx 2^B$, where $B = b d_k$ is the total bit budget. Head dimension $d_k \geq 2$ does not increase capacity, but it influences how efficiently a given spherical code can approach this representational limit.
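A minimal sketch of the retrieval mechanism the abstract describes (an illustration, not the paper's code): with head dimension $d_k = 2$, place $N$ keys evenly on a circle, a 2-D spherical code with minimum angular separation $2\pi/N$, and scale them to high magnitude so that softmax attention over the dot products retrieves the matching one-hot value nearly exactly. The names, the scale value, and the choice $N = 64$ are all illustrative assumptions.

```python
import numpy as np

d_k = 2        # head dimension
N = 64         # number of stored (key, value) pairs
scale = 40.0   # high-magnitude codes sharpen the softmax (illustrative value)

# A 2-D spherical code: N unit vectors with minimum angular
# separation 2*pi/N, scaled up to high magnitude.
angles = 2 * np.pi * np.arange(N) / N
keys = scale * np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (N, d_k)
values = np.eye(N)  # one-hot value ids for easy readout

def attend(query, keys, values):
    """Single-head dot-product attention: softmax(K q) @ V."""
    logits = keys @ query
    weights = np.exp(logits - logits.max())  # subtract max for stability
    weights /= weights.sum()
    return weights @ values

# Query with each stored key in turn; retrieval is correct for all
# N = 64 slots even though d_k = 2.
retrieved = [int(np.argmax(attend(k, keys, values))) for k in keys]

# Because values are one-hot, the output entries are the attention
# weights; the winning weight is close to 1 (near-hard retrieval).
top_weight = attend(keys[0], keys, values).max()
```

The key point the sketch mirrors is that capacity here is limited by how finely the key coordinates can be resolved (the bit budget $B$), not by the head dimension: shrinking the magnitude or packing keys closer than the available precision allows is what eventually breaks retrieval.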

Files

attn_capacity_paper.pdf (6.6 MB, md5:0f4b0e544933e573d4dd17d1e79bd7b5)

Additional details

Software

Repository URL
https://github.com/tmaselko/paper-attncap
Programming language
Python