Routed Attention: Learning When to Think Hard

Bonsignore, Mike

doi:10.5281/zenodo.18518956

Published February 7, 2026 | Version v1.0.0

Software Open

Routed attention learns to select, per position, between O(N) causal convolution and O(N²) softmax attention. A lightweight router network examines each position and routes it to the appropriate computational pathway. With curriculum learning (training first with no attention penalty, then gradually increasing it), routed attention achieves 100% accuracy with only 0.3% attention usage at distance 126 (99.7% compute savings), and 100% accuracy with 25% attention usage at distance 510 (75% compute savings).
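The routing idea in the description can be illustrated with a minimal NumPy sketch. This is not the code from the repository: the single-layer sigmoid router, the hard 0.5 threshold, the shared convolution taps, and the linear penalty ramp are all assumptions chosen for brevity, and every function name here (`causal_conv`, `causal_attention_rows`, `routed_attention`, `attention_penalty`) is hypothetical.

```python
import numpy as np

def causal_conv(x, kernel):
    """Cheap path: out[t] = sum_k kernel[k] * x[t - k], zero-padded on the left."""
    n, d = x.shape
    k_len = len(kernel)
    padded = np.vstack([np.zeros((k_len - 1, d)), x])
    out = np.zeros_like(x)
    for k, tap in enumerate(kernel):
        start = k_len - 1 - k
        out += tap * padded[start:start + n]
    return out

def causal_attention_rows(x, rows):
    """Expensive path: causal softmax attention, computed only for the given rows."""
    n, d = x.shape
    scores = x[rows] @ x.T / np.sqrt(d)               # shape (len(rows), n)
    future = np.arange(n)[None, :] > rows[:, None]    # keys later than the query
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

def routed_attention(x, router_w, router_b, kernel, threshold=0.5):
    """Route each position to convolution or attention (assumed hard gate)."""
    gate = 1.0 / (1.0 + np.exp(-(x @ router_w + router_b)))  # one score per position
    use_attn = gate > threshold
    out = causal_conv(x, kernel)
    rows = np.flatnonzero(use_attn)
    if rows.size:
        out[rows] = causal_attention_rows(x, rows)
    return out, use_attn

def attention_penalty(step, warmup_steps, ramp_steps, max_penalty):
    """Assumed curriculum: zero penalty during warmup, then a linear ramp."""
    if step < warmup_steps:
        return 0.0
    return max_penalty * min(1.0, (step - warmup_steps) / ramp_steps)

# Assumed training objective: task_loss + attention_penalty(step, ...) * gate.mean()
```

Under these assumptions, `use_attn.mean()` is the attention usage fraction, and the savings figures quoted above correspond to one minus that fraction (for example, 0.3% usage gives 99.7% savings), counting only the attention pathway.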

Notes

If you use this software, please cite it as below.

Files

Files (362.4 kB)

Name	Size
MikeyBeez/DifferentialLR-v1.0.0.zip (md5:b57db5fc585797f023132771eba0c98a)	362.4 kB

Additional details

Is supplement to: Software: https://github.com/MikeyBeez/DifferentialLR/tree/v1.0.0 (URL)

Repository URL: https://github.com/MikeyBeez/DifferentialLR

	All versions	This version
Views	649	649
Downloads	27	27
Data volume	10.1 MB	10.1 MB


DOI: 10.5281/zenodo.18518956

Resource type: Software

Publisher: Zenodo

License: MIT License

A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.

Technical metadata

Created: February 7, 2026
Modified: February 7, 2026
