FELA: Fourier Encoder with Linear Attention for Efficient Long Context Language Modeling
Authors/Creators
Description
We introduce FELA (Fourier Encoder with Linear Attention), a hybrid sequence mixer that replaces quadratic self-attention with two components: a Fourier Neural Operator (FNO) mixer, which captures global frequency structure at O(N log N) cost, and a Gated Linear Attention (GLA) mixer, which maintains a bounded recurrent state at O(N) cost. Motivated by an analogy to the Schrödinger and Heisenberg pictures of quantum mechanics, FELA evolves a compact spectral state rather than materializing the full N×N interaction matrix.
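The two mixers can be illustrated with a minimal NumPy sketch. This is not the FELA implementation: the function names, the low-mode truncation, and the per-channel complex filter are assumptions chosen for illustration, and the Fourier mixer shown here is non-causal (how a language model avoids leaking future tokens through the FFT is not covered).

```python
import numpy as np

def fourier_mix(x, filt):
    """Global mixing along the sequence axis via FFT, O(N log N).

    x:    (N, d) real hidden states
    filt: (K, d) complex weights on the K lowest frequency modes (assumed form)
    """
    n = x.shape[0]
    spec = np.fft.rfft(x, axis=0)              # (N//2 + 1, d) complex spectrum
    k = min(filt.shape[0], spec.shape[0])
    out = np.zeros_like(spec)
    out[:k] = spec[:k] * filt[:k]              # reweight low modes, drop the rest
    return np.fft.irfft(out, n=n, axis=0)      # back to (N, d)

def gated_linear_attention(q, k, v, g):
    """Recurrent form of gated linear attention, O(N) time.

    q, k: (N, dk); v: (N, dv); g: (N, dk) forget gates in (0, 1).
    The state has shape (dk, dv) regardless of sequence length.
    """
    n, dk = k.shape
    dv = v.shape[1]
    state = np.zeros((dk, dv))
    out = np.empty((n, dv))
    for t in range(n):
        # Decay the old state per key channel, then write the new key/value pair.
        state = g[t][:, None] * state + np.outer(k[t], v[t])
        out[t] = q[t] @ state
    return out
```

The recurrent state never grows with N, which is the property the description refers to as a bounded recurrent state; no N×N matrix is formed in either function.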
The 1.13B-parameter model, trained on 22B tokens (Chinchilla-optimal), achieves 1.49 bits per byte (BPB) on WikiText-103 at 32K context, an improvement of 0.29 BPB (16.5%) as context grows from 256 to 32,768 tokens. Empirical measurements confirm sublinear VRAM scaling: the 1.13B model uses 4.5 GB at 16K tokens, whereas standard, unoptimized scaled dot-product attention (SDPA) would require over 1,200 GB (extrapolated). FELA is 2.18x faster than standard SDPA at 16K tokens; SDPA runs out of memory at 32K, while FELA runs up to 131K. The 1.13B model matches GPT-2 XL on BoolQ (61.4% vs. 61.3%) using 2.4x fewer training FLOPs.
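To see why unoptimized attention runs out of memory as context grows, the size of the materialized score matrices can be estimated directly. The sketch below is a back-of-the-envelope calculation only: the head count, batch size, and 2-byte element size are hypothetical, and it does not attempt to reproduce the extrapolated 1,200 GB figure, which depends on model details not given here.

```python
def attention_scores_bytes(seq_len, num_heads, batch=1, bytes_per_elem=2):
    """Bytes needed to hold one layer's full N x N attention score matrices."""
    return batch * num_heads * seq_len * seq_len * bytes_per_elem

GIB = 1024 ** 3
for n in (256, 4096, 16384, 32768):
    # 16 heads is a hypothetical value used only to show the scaling.
    gib = attention_scores_bytes(n, num_heads=16) / GIB
    print(f"N={n:>6}: {gib:10.4f} GiB per layer")
```

Doubling the sequence length quadruples this term, while a recurrent state of fixed size contributes nothing that grows with N.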
Interpretability analysis reveals that FELA spontaneously develops a coarse-to-fine frequency hierarchy: early FNO layers learn low-frequency global filters, while later layers specialize in high-frequency local patterns. GLA gate values exhibit a depth-dependent memory gradient: early layers forget aggressively (ḡ = 0.65 to 0.72), while deep layers retain nearly the full context (ḡ = 0.91). Logit lens analysis on a 109M-parameter proxy model shows that predictions commit by layer 6 of 12. Experiments demonstrate that FELA trains directly on raw UTF-8 bytes (vocabulary size 261, no tokenizer) at a 16,384-token context, a regime inaccessible to standard O(N²) attention.
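Tokenizer-free byte-level input can be sketched as follows. The description states only that the vocabulary has 261 entries; the split into 256 byte values plus 5 special tokens, the names of those tokens, and their placement at ids 256 to 260 are assumptions made for this example.

```python
NUM_BYTES = 256
# Hypothetical special tokens; the source gives only the total vocabulary size.
SPECIALS = ["<pad>", "<bos>", "<eos>", "<unk>", "<mask>"]
SPECIAL_IDS = {name: NUM_BYTES + i for i, name in enumerate(SPECIALS)}
VOCAB_SIZE = NUM_BYTES + len(SPECIALS)

def encode(text):
    """Map text to ids: one id per UTF-8 byte, wrapped in <bos>/<eos>."""
    return [SPECIAL_IDS["<bos>"]] + list(text.encode("utf-8")) + [SPECIAL_IDS["<eos>"]]

def decode(ids):
    """Drop special ids and decode the remaining bytes as UTF-8."""
    return bytes(i for i in ids if i < NUM_BYTES).decode("utf-8", errors="replace")
```

Because every character costs at least one id and non-ASCII characters cost several, byte-level sequences are much longer than subword sequences for the same text, which is why long context matters in this setting.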
Files
| Name | Size |
|---|---|
| FELA-ACML-FINAL_SPRINGER.pdf (md5:12b4ab31f6385f463948b122c62ad5e1) | 1.4 MB |
Additional details
Dates
- Created: 2026
Software
- Repository URL: https://huggingface.co/itstheraj/fela-acml2026
- Programming language: Python
- Development Status: Active