Published February 7, 2026 | Version v1.0.0 | Software | Open
Routed Attention: Learning When to Think Hard
Description
Routed attention learns to dynamically select between O(N) causal convolution and O(N²) softmax attention on a per-position basis. A lightweight router network examines each position and routes it to the appropriate computational pathway. Using curriculum learning (first train with no attention penalty, then gradually increase it), routed attention achieves 100% accuracy with only 0.3% attention usage at distance 126 (99.7% compute savings), and 100% accuracy with 25% attention usage at distance 510 (75% compute savings).
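The routing idea and the curriculum described above can be illustrated with a minimal sketch. It is not the repository's implementation: the scalar features, the shared query/key/value, the hard 0.5 gate threshold, the linear penalty ramp, and all function and parameter names (`routed_layer`, `attention_penalty`, `router_w`, `warmup_steps`, and so on) are assumptions made for illustration only.

```python
import math


def causal_conv(x, kernel):
    """O(N) path: y[t] = sum_k kernel[k] * x[t - k], using past inputs only."""
    return [
        sum(w * x[t - k] for k, w in enumerate(kernel) if t - k >= 0)
        for t in range(len(x))
    ]


def attend(x, t):
    """Causal softmax attention for one position t (O(t) work per call).

    Assumption: scalar features with query = key = value = x.
    """
    scores = [x[t] * x[s] for s in range(t + 1)]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(sc - m) for sc in scores]
    return sum(w * x[s] for s, w in enumerate(weights)) / sum(weights)


def routed_layer(x, router_w, router_b, kernel):
    """Route each position to attention (gate > 0.5) or to the convolution.

    Returns the outputs and the fraction of positions that used attention.
    The hard threshold is an assumption; a trained router may use a soft
    or straight-through gate instead.
    """
    conv = causal_conv(x, kernel)
    outputs, n_attn = [], 0
    for t, v in enumerate(x):
        gate = 1.0 / (1.0 + math.exp(-(router_w * v + router_b)))
        if gate > 0.5:
            outputs.append(attend(x, t))  # expensive path, only where routed
            n_attn += 1
        else:
            outputs.append(conv[t])
    return outputs, n_attn / len(x)


def attention_penalty(step, warmup_steps, ramp_steps, max_penalty):
    """Curriculum weight on attention usage: zero during warmup, then a
    linear ramp up to max_penalty (ramp shape assumed; ramp_steps > 0)."""
    if step < warmup_steps:
        return 0.0
    return max_penalty * min(1.0, (step - warmup_steps) / ramp_steps)


if __name__ == "__main__":
    y, usage = routed_layer([0.0, 1.0, 0.0, 3.0], 1.0, -2.0, [0.5, 0.5])
    print("outputs:", y)
    print("attention usage:", usage)
    print("penalty at step 150:", attention_penalty(150, 100, 100, 0.1))
```

In a training loop, a term such as `attention_penalty(step, ...) * usage` would be added to the task loss, so the router is free to use attention early and is pushed toward the cheaper path later. Because `attend` is called only for routed positions, the attention cost in this sketch scales with the attention usage fraction.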
Files
| Name | Size | Checksum |
|---|---|---|
| MikeyBeez/DifferentialLR-v1.0.0.zip | 362.4 kB | md5:b57db5fc585797f023132771eba0c98a |
Additional details
Related works
- Is supplement to: Software: https://github.com/MikeyBeez/DifferentialLR/tree/v1.0.0
Software
- Repository URL: https://github.com/MikeyBeez/DifferentialLR