Published March 20, 2026 | Version v2
Preprint | Open Access

K-Operators: A Linear-Time Sequence Mixer with Learned Decayed Positional Kernels

  • Independent Researcher

Description

We introduce K-Operators, a sequence modeling architecture designed for linear-time
execution, combining learned exponential decay with learnable positional kernels. The core K2
layer decomposes sequence mixing into two complementary paths: (1) a low-rank gamma-decayed
recurrent interaction with per-channel learned decay rates spanning short to long memory, and
(2) a learnable causal base kernel Kbase providing asymmetric local correction that exponential
decay alone cannot express.
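As a rough illustration, the two-path K2 mixing described above can be sketched as follows. All shapes, the rank, the kernel width, and the projection matrices here are hypothetical placeholders chosen for the sketch, not the preprint's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not taken from the paper).
T, D, R = 16, 8, 4      # sequence length, model dim, low-rank state dim
K = 5                   # causal base-kernel width (assumed)

x = rng.standard_normal((T, D))

# Path 1: low-rank gamma-decayed recurrence.
# Per-channel decay rates span short to long memory.
gamma = rng.uniform(0.15, 0.995, size=R)
W_in = rng.standard_normal((D, R)) / np.sqrt(D)   # project input to rank-R state
W_out = rng.standard_normal((R, D)) / np.sqrt(R)  # project state back to model dim

state = np.zeros(R)
path1 = np.zeros((T, D))
for t in range(T):
    state = gamma * state + x[t] @ W_in   # exponentially decayed recurrence
    path1[t] = state @ W_out

# Path 2: learnable causal base kernel (depthwise, per feature).
# Supplies the asymmetric local correction that pure exponential
# decay cannot express.
k_base = rng.standard_normal((K, D)) * 0.1
path2 = np.zeros((T, D))
for t in range(T):
    for j in range(K):
        if t - j >= 0:
            path2[t] += k_base[j] * x[t - j]

y = path1 + path2   # combined two-path mixing output
```

The sketch is O(T·R + T·K) per feature, i.e. linear in sequence length, which is the property the architecture is built around.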
Systematic ablation across tokenization granularities reveals that removing either component
degrades performance even under equal parameter budgets: on WikiText-2 (subword), the full
architecture achieves 19.99 ± 0.09 PPL at 4.08M parameters (5-seed sweep) vs. 20.99 PPL
for an equal-capacity model without Kbase; on Tiny Shakespeare (character-level), 4.41 ± 0.01
PPL at 0.81M parameters (5-seed sweep) vs. 4.78 PPL without Kbase—within 0.06 PPL of
a 10.65M parameter Transformer baseline. The optimal contribution of Kbase scales inversely
with token granularity—∼4% for character-level, ∼0.5% for subword—but is never zero. This
ratio is discovered automatically via gradient descent with a sigmoid floor that acts as implicit
architectural regularization.
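One plausible reading of the "sigmoid floor" is a learned gate whose output is bounded away from zero, so gradient descent can shrink the Kbase contribution into the sub-percent range but can never prune it entirely. A minimal sketch, with the parameterization and the floor value assumed rather than taken from the paper:

```python
import numpy as np

def kbase_mix(alpha_raw, floor=0.005):
    """Hypothetical sigmoid-with-floor gate for the Kbase path's share.

    The raw parameter alpha_raw is unconstrained; the output lies in
    [floor, 1], so the Kbase contribution never reaches exactly zero.
    Both the parameterization and floor=0.005 are illustrative guesses.
    """
    s = 1.0 / (1.0 + np.exp(-alpha_raw))
    return floor + (1.0 - floor) * s
```

Under this reading, the floor is what acts as implicit architectural regularization: even when the optimizer pushes `alpha_raw` strongly negative, the local-correction path retains a small fixed share of the output.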
Uncapping the gamma decay range from [0.85, 0.995] to [0.15, 0.995] yields substantial gains:
the model learns to use the full spectrum, with some channels selecting γ ≈ 0.15 (2-token effective
window) while others maintain γ > 0.99 (100+ token memory). The architecture does not require
explicit positional encodings; positional information is instead captured implicitly through the
learned causal kernel structure.
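The quoted effective windows can be checked with a simple back-of-the-envelope definition: count the steps until a gamma-decayed weight falls below a small threshold. Both the formula and the threshold are our illustrative choices, not the preprint's:

```python
import math

def effective_window(gamma, threshold=0.01):
    # Steps until gamma**n drops below `threshold`; one illustrative
    # definition of "effective memory window" (not the paper's metric).
    return math.ceil(math.log(threshold) / math.log(gamma))
```

With this definition, γ = 0.15 gives an effective window of 3 tokens, the same order as the quoted 2-token window, while γ = 0.995 gives over 900 tokens, comfortably in the "100+ token" regime.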
We also describe an iterative equilibrium refinement loop with a learned step-size η.
Although the loop is mathematically motivated, our ablations show that refinement
consistently hurts performance; we document it for completeness and future investigation.
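For concreteness, a damped fixed-point iteration of the kind described might look like the following sketch. The function name, the layer map `f`, and the fixed scalar `eta` are illustrative assumptions; in the preprint the step size η is learned:

```python
def refine(y0, f, eta=0.5, n_steps=3):
    # Damped fixed-point iteration toward an equilibrium y = f(y).
    # Each step moves y a fraction eta of the way toward f(y).
    y = y0
    for _ in range(n_steps):
        y = y + eta * (f(y) - y)
    return y
```

For a contractive map such as f(y) = 0.5·y + 1 (fixed point y = 2), the iterate converges geometrically; the paper's finding is that applying such refinement on top of the K2 layer nonetheless degraded language-modeling perplexity.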

Files (2.1 MB)

1hg260.png
md5:7f56683965e5e4c43a16b07e95e2d503 (218.4 kB)
md5:17cafc80886ab54aa2de0cc08ec47b97 (166.1 kB)
md5:ca5b892ea1f054e0afa227f9df288cca (35.1 kB)
md5:eddfa111acd20ca07cd58cefa11f40cd (1.1 MB)
md5:2aa4547684562a19776922ef2a82cd0a (176.3 kB)
md5:69a46d2d5a453f3d7e794e33eded17f8 (366.4 kB)

Additional details

Related works

Is new version of
Preprint: 10.5281/zenodo.19004569 (DOI)
Is supplemented by
https://github.com/AileenKoneko/K-language-model (URL)

Software

Repository URL
https://github.com/AileenKoneko/K-language-model
Programming language
Python
Development Status
Active