
Published March 13, 2026 | Version 0.1
Preprint | Open Access

Kernels Might Be What You Need: Efficient Sequence Modeling with K-Operators

  • Independent Researcher

Description

We introduce K-Operators, a kernel-decomposed sequence modeling architecture that replaces attention entirely with structured causal kernel operators. On Tiny Shakespeare character-level modeling, a 1.14M-parameter K-Operators model achieves 4.43 ± 0.05 validation perplexity across 7 seeds, approaching the 4.35 PPL of a 10.65M-parameter Transformer baseline (nanoGPT) while using 9.3× fewer parameters and requiring no positional encodings.

The architecture decomposes sequence mixing into a hierarchy of operators: K1 layers for position-wise feature mixing, K2 layers for causal sequence interaction via a learned base kernel combined with low-rank gamma-decayed recurrence, and a K0 layer for final rescaling. These are composed into a K-Stack backbone (K1 → K2 (×N) → K1 → K0) and refined through a learned iterative equilibrium loop governed by a scalar step-size η. Two interchangeable gamma-decay backends (mask and block) offer different memory/speed trade-offs. Diagnostic analysis reveals interpretable learned dynamics: the model progressively transfers sequence mixing from the initialized base kernel to the adaptive recurrent path, develops per-layer functional specialization, and learns to self-regulate the refinement loop, including robustness to 10× learning rate misspecification via automatic η suppression.
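The operator hierarchy above can be sketched in a few lines of NumPy. This is a minimal illustration of the K1 → K2 (×N) → K1 → K0 composition under stated assumptions: the function names, shapes, ReLU nonlinearity, and parameter layout are all illustrative choices, not the released implementation. The K2 sketch uses the "mask" backend idea, an explicit T×T lower-triangular matrix of γ^(t−s) weights.

```python
import numpy as np

def k1(x, W):
    """K1: position-wise feature mixing; no interaction across time.
    (ReLU is an assumed stand-in for the actual mixer.)"""
    return np.maximum(x @ W, 0.0)

def k2(x, base, U, V, gamma):
    """K2: causal sequence interaction.

    Combines a learned base kernel applied per position with a low-rank,
    gamma-decayed recurrent path. 'Mask' backend: build an explicit
    T x T lower-triangular matrix whose (t, s) entry is gamma^(t-s).
    """
    T = x.shape[0]
    i = np.arange(T)
    decay = np.tril(gamma ** np.maximum(i[:, None] - i[None, :], 0))
    return x @ base + decay @ (x @ U @ V)

def k0(x, scale):
    """K0: final per-feature rescaling."""
    return x * scale

def k_stack(x, params):
    """K-Stack backbone: K1 -> K2 (x N) -> K1 -> K0."""
    h = k1(x, params["W_in"])
    for p in params["k2"]:
        h = k2(h, p["base"], p["U"], p["V"], p["gamma"])
    h = k1(h, params["W_out"])
    return k0(h, params["scale"])

# Toy forward pass: T=8 positions, d=4 features, rank-2 recurrence, N=2.
rng = np.random.default_rng(0)
T, d, r = 8, 4, 2
params = {
    "W_in": rng.normal(size=(d, d)) * 0.1,
    "W_out": rng.normal(size=(d, d)) * 0.1,
    "scale": np.ones(d),
    "k2": [
        {"base": np.eye(d),
         "U": rng.normal(size=(d, r)) * 0.1,
         "V": rng.normal(size=(r, d)) * 0.1,
         "gamma": 0.9}
        for _ in range(2)
    ],
}
x = rng.normal(size=(T, d))
y = k_stack(x, params)
```

Because every component is either position-wise (K1, K0) or lower-triangular in time (K2), the stack is causal by construction: perturbing the last position leaves all earlier outputs unchanged, which is why no positional encoding is required for ordering information to flow.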
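The refinement loop can be pictured as a damped fixed-point iteration. The sketch below is an assumption about its general form (the abstract specifies only a learned scalar step-size η): each iterate moves a fraction η of the way toward f(z), so shrinking η stabilizes the loop, which is one plausible mechanism behind the η-suppression behavior reported under learning-rate misspecification.

```python
import numpy as np

def equilibrium_refine(f, x, eta=0.5, n_steps=8):
    """Damped fixed-point refinement: z <- z + eta * (f(z) - z).

    Illustrative sketch only: in the paper eta is learned, and f would
    be the K-Stack backbone; here f is any map with a fixed point.
    """
    z = np.asarray(x, dtype=float).copy()
    for _ in range(n_steps):
        z = z + eta * (f(z) - z)
    return z

# Toy contraction with fixed point z* = 2: f(z) = 0.5 * z + 1.
f = lambda z: 0.5 * z + 1.0
z = equilibrium_refine(f, np.zeros(4), eta=0.8, n_steps=50)
```

With η = 0.8 the update is z ← 0.6 z + 0.8, a contraction that converges geometrically to 2; with η too large the same map can oscillate or diverge, so a model that learns to lower η is effectively learning to damp its own refinement dynamics.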

Files (422.4 kB)

k2_dynamics.pdf

md5:b57abc5b849f3fdf3586371484b3b34b (30.8 kB)
md5:448cd1829847418bc23bae5f4ca56c33 (350.9 kB)
md5:aea0663186e2540b58b7a7eb9f6071f8 (40.6 kB)

Additional details

Software

Repository URL
https://github.com/AileenKoneko/K-language-model
Programming language
Python
Development Status
Active