Published February 7, 2026 | Version v1.0.0
Software Open

Routed Attention: Learning When to Think Hard

Authors/Creators

Description

Routed attention learns to choose, for each position, between O(N) causal convolution and O(N²) softmax attention. A lightweight router network examines each position and sends it down one of the two computational pathways. With curriculum learning (training first with no attention penalty, then gradually increasing it), routed attention reaches 100% accuracy with only 0.3% attention usage at distance 126 (99.7% savings in attention compute), and 100% accuracy with 25% attention usage at distance 510 (75% savings in attention compute).
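The routing idea described above can be illustrated with a minimal NumPy sketch. This is not the released implementation: the function names (`causal_conv`, `causal_attention`, `routed_attention`, `penalty_weight`), the linear router, the hard threshold at zero, the use of the raw input as queries, keys, and values, and the linear penalty ramp are all assumptions made for illustration. A trainable version would presumably need a differentiable gate rather than the hard decision shown here.

```python
import numpy as np


def causal_conv(x, kernel):
    """O(N) path: depthwise causal convolution along the sequence axis.

    x: (N, D) inputs, kernel: (K, D) weights. Output at position t depends
    only on x[t-K+1 .. t].
    """
    n, d = x.shape
    k = kernel.shape[0]
    padded = np.concatenate([np.zeros((k - 1, d)), x], axis=0)
    out = np.zeros_like(x)
    for j in range(k):
        # kernel[j] weights the input j steps in the past.
        start = k - 1 - j
        out += kernel[j] * padded[start:start + n]
    return out


def causal_attention(x, query_rows):
    """O(N^2) path, evaluated only for the selected query positions.

    Assumption: x serves as queries, keys, and values (no learned projections).
    """
    n, d = x.shape
    q = x[query_rows]                                   # (M, D)
    scores = q @ x.T / np.sqrt(d)                       # (M, N)
    future = np.arange(n)[None, :] > query_rows[:, None]
    scores = np.where(future, -np.inf, scores)          # mask future keys
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x                                  # (M, D)


def routed_attention(x, kernel, router_w, router_b=0.0):
    """Route each position to convolution or attention (hypothetical router)."""
    logits = x @ router_w + router_b                    # (N,) one score per position
    use_attn = logits > 0                               # hard per-position decision
    out = causal_conv(x, kernel)                        # cheap path everywhere
    rows = np.nonzero(use_attn)[0]
    if rows.size:
        out[rows] = causal_attention(x, rows)           # expensive path where routed
    return out, use_attn


def penalty_weight(step, warmup_steps, ramp_steps, max_weight):
    """Curriculum schedule: no attention penalty during warmup, then a linear
    ramp up to max_weight (the linear shape is an assumption)."""
    if step < warmup_steps:
        return 0.0
    return max_weight * min(1.0, (step - warmup_steps) / ramp_steps)
```

In this sketch, the fraction of positions routed to attention is `use_attn.mean()`. One plausible training objective would add `penalty_weight(step, ...) * use_attn.mean()` (or a soft version of it) to the task loss, so that the router is pushed toward the convolution path only after the task has been learned.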

Notes

If you use this software, please cite it as below.

Files

MikeyBeez/DifferentialLR-v1.0.0.zip

Files (362.4 kB)

md5:b57db5fc585797f023132771eba0c98a

Additional details

Related works