K-Operators: A Linear-Time Sequence Mixer with Learned Decayed Positional Kernels
Description
We introduce K-Operators, a sequence modeling architecture that achieves linear-time
execution by combining learned exponential decay with learnable causal positional kernels. The
core K2 layer decomposes sequence mixing into two complementary paths: (1) a low-rank
gamma-decayed recurrent interaction with per-channel learned decay rates spanning short to long
memory, and (2) a learnable causal base kernel Kbase that provides the asymmetric local
correction exponential decay alone cannot express.
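As a rough illustration of the two-path decomposition, the mixing can be sketched as below. This is not the released implementation: the names, shapes, and dense per-channel recurrence are simplifying assumptions (the paper's recurrent path is low-rank).

```python
import numpy as np

def k2_mix(x, gamma, k_base):
    """Sketch of a K2-style two-path mixer (hypothetical names/shapes).

    x:      (T, D) input sequence
    gamma:  (D,) per-channel decay rates in (0, 1)
    k_base: (K, D) causal base kernel with K local taps
    """
    T, D = x.shape
    # Path 1: gamma-decayed recurrence, h_t = gamma * h_{t-1} + x_t
    # (linear in T; the low-rank structure from the paper is omitted here)
    h = np.zeros((T, D))
    state = np.zeros(D)
    for t in range(T):
        state = gamma * state + x[t]
        h[t] = state
    # Path 2: causal convolution with the base kernel — an asymmetric
    # local correction that the exponential-decay path cannot express
    c = np.zeros((T, D))
    for t in range(T):
        for k in range(k_base.shape[0]):
            if t - k >= 0:
                c[t] += k_base[k] * x[t - k]
    return h + c
```

Both paths cost O(T) per channel, which is what keeps the overall mixer linear in sequence length.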
Systematic ablation across tokenization granularities reveals that removing either component
degrades performance even under equal parameter budgets: on WikiText-2 (subword), the full
architecture achieves 19.99 ± 0.09 PPL at 4.08M parameters (5-seed sweep) vs. 20.99 PPL
for an equal-capacity model without Kbase; on Tiny Shakespeare (character-level), 4.41 ± 0.01
PPL at 0.81M parameters (5-seed sweep) vs. 4.78 PPL without Kbase, with the full model within
0.06 PPL of a 10.65M-parameter Transformer baseline. The optimal contribution of Kbase scales inversely
with token granularity—∼4% for character-level, ∼0.5% for subword—but is never zero. This
ratio is discovered automatically via gradient descent with a sigmoid floor that acts as implicit
architectural regularization.
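A gate of this kind can be realized as a sigmoid with an additive floor, so gradient descent can shrink the Kbase contribution toward the floor but never to exactly zero. The function name and floor value below are hypothetical, not taken from the release:

```python
import numpy as np

def kbase_weight(alpha_logit, floor=0.005):
    # Sigmoid gate with a fixed floor: the learned mixing ratio for the
    # Kbase path stays strictly above `floor`, acting as an implicit
    # architectural regularizer that keeps the path alive during training.
    s = 1.0 / (1.0 + np.exp(-alpha_logit))
    return floor + (1.0 - floor) * s
```

Even when the optimizer pushes `alpha_logit` strongly negative, the Kbase path retains a small, non-zero share of the output, matching the observation that its optimal contribution is small but never zero.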
Uncapping the gamma decay range from [0.85, 0.995] to [0.15, 0.995] yields substantial gains:
the model learns to use the full spectrum, with some channels selecting γ ≈ 0.15 (2-token effective
window) while others maintain γ > 0.99 (100+ token memory). The architecture does not require
explicit positional encodings; positional information is instead captured implicitly through the
learned causal kernel structure.
We also describe an iterative equilibrium refinement loop with a learned step size η. While
the refinement is mathematically motivated, ablations show it consistently hurts performance in
our experiments; we document it for completeness and future investigation.
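For concreteness, a damped fixed-point loop of the kind described can be sketched as follows. Here η is a plain float rather than a learned parameter, and the specific update rule is our assumption of the standard form:

```python
def refine(z0, f, eta=0.5, n_steps=3):
    # Damped fixed-point iteration toward an equilibrium z* = f(z*):
    #   z <- z + eta * (f(z) - z)
    # eta = 1 recovers plain iteration z <- f(z); smaller eta damps the update.
    z = z0
    for _ in range(n_steps):
        z = z + eta * (f(z) - z)
    return z
```

For a contractive `f` this converges to the fixed point, which is the mathematical motivation; the ablation result reported above is that this refinement nonetheless degrades language-modeling performance in practice.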
Files (2.1 MB total)

1hg260.png

| Name | Size |
|---|---|
| md5:7f56683965e5e4c43a16b07e95e2d503 | 218.4 kB |
| md5:17cafc80886ab54aa2de0cc08ec47b97 | 166.1 kB |
| md5:ca5b892ea1f054e0afa227f9df288cca | 35.1 kB |
| md5:eddfa111acd20ca07cd58cefa11f40cd | 1.1 MB |
| md5:2aa4547684562a19776922ef2a82cd0a | 176.3 kB |
| md5:69a46d2d5a453f3d7e794e33eded17f8 | 366.4 kB |
Additional details

Related works
- Is new version of: Preprint 10.5281/zenodo.19004569 (DOI)
- Is supplemented by: https://github.com/AileenKoneko/K-language-model (URL)

Software
- Repository URL: https://github.com/AileenKoneko/K-language-model
- Programming language: Python
- Development status: Active