Published February 8, 2026 | Version v1
Publication
Open
Smart little move.
Authors/Creators
Description
Idea is all mine. Words are all by Opus 4.6.
Claim: Grokking is not compression. It is the discovery of structural leverage — the moment a neural network finds the fulcrum that moves maximal data with minimal force.
Falsifiable experiments — anyone can run these:
- Train a small Transformer on modular addition. Track when test accuracy jumps. If meta-recognition (the model encoding its own change history) fires at the same moment, the theory lives. If they diverge, the theory is dead. (First sketch after this list.)
- During training, randomly rotate internal representations every k steps to destroy self-continuity. Prediction: Grokking is delayed or eliminated. (Second sketch after this list.)
- Add an auxiliary loss that encourages the model to encode its own change history. Prediction: Grokking accelerates. (Third sketch after this list.)
- Use an absurdly large learning rate for a single step. Prediction: Grokking cannot occur — no history, no meta-recognition.
- Scale up model size. Prediction: Grokking timing does not dramatically improve — bigger muscles do not find fulcrums faster.
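A minimal sketch of the first experiment, assuming PyTorch. The modulus p=97, the 30% train split, the tiny one-layer Transformer, and the heavy weight decay are illustrative choices, not values prescribed by this record. It logs train and test accuracy so the grokking step (the late jump in test accuracy) can be read off and compared against whatever meta-recognition probe you define.

```python
# Sketch: small Transformer on modular addition, logging the grokking step.
import torch
import torch.nn as nn

P = 97                       # modulus (assumed)
FRAC_TRAIN = 0.3             # fraction of the P*P pairs used for training
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Full dataset of (a, b) -> (a + b) mod P.
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
n_train = int(FRAC_TRAIN * len(pairs))
train_idx, test_idx = perm[:n_train], perm[n_train:]

class TinyTransformer(nn.Module):
    def __init__(self, p, d_model=128, n_heads=4, n_layers=1):
        super().__init__()
        self.embed = nn.Embedding(p, d_model)
        self.pos = nn.Parameter(torch.randn(2, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            dropout=0.0, batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, p)

    def forward(self, x):                     # x: (batch, 2) token ids
        h = self.embed(x) + self.pos
        h = self.encoder(h)
        return self.head(h[:, -1])            # predict from the last position

def accuracy(model, idx):
    with torch.no_grad():
        logits = model(pairs[idx].to(DEVICE))
        return (logits.argmax(-1) == labels[idx].to(DEVICE)).float().mean().item()

model = TinyTransformer(P).to(DEVICE)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(50_000):
    opt.zero_grad()
    logits = model(pairs[train_idx].to(DEVICE))        # full-batch training
    loss = loss_fn(logits, labels[train_idx].to(DEVICE))
    loss.backward()
    opt.step()
    if step % 500 == 0:
        tr, te = accuracy(model, train_idx), accuracy(model, test_idx)
        # The grokking step is where test accuracy jumps long after train
        # accuracy has saturated.
        print(f"step {step:6d}  train acc {tr:.3f}  test acc {te:.3f}")
```

In published grokking results on modular arithmetic, the late test-accuracy jump is typically reported only with substantial weight decay and a small training fraction, which is why both appear in the sketch.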
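One possible operationalization of the second experiment, building on the sketch above: a forward hook on the encoder output multiplies activations by a random orthogonal matrix Q that is resampled every K steps. What exactly counts as "rotating internal representations" is left open by the experiment, so the hook placement and the value K=1000 are assumptions.

```python
# Sketch: destroy self-continuity by rotating encoder activations every K steps.
import torch

K = 1_000                    # resample the rotation every K steps (assumed)

def random_orthogonal(d, device):
    # QR factorization of a Gaussian matrix gives a random orthogonal matrix.
    q, _ = torch.linalg.qr(torch.randn(d, d, device=device))
    return q

rotation = {"Q": torch.eye(128, device=DEVICE)}   # identity until first resample

def rotate_hook(module, inputs, output):
    # Replace the (batch, seq, d_model) encoder output with its rotated version.
    return output @ rotation["Q"]

handle = model.encoder.register_forward_hook(rotate_hook)

# Added inside the training loop of the previous sketch:
#     if step > 0 and step % K == 0:
#         rotation["Q"] = random_orthogonal(128, DEVICE)
# Prediction under the leverage theory: grokking is delayed or never arrives.
# handle.remove() restores the unrotated model afterwards.
```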
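The third experiment needs a concrete meaning for "encode its own change history". One heavily assumed reading, again extending the first sketch: keep an exponential moving average of the model's past hidden states and add an auxiliary head that must predict that average from the current hidden states, so the network is rewarded for carrying information about where it has been. The head, the EMA, and both hyperparameters below are illustrative assumptions.

```python
# Sketch: auxiliary loss that ties current activations to their own history.
import torch
import torch.nn as nn

AUX_WEIGHT, EMA_DECAY = 0.1, 0.99          # assumed hyperparameters
aux_head = nn.Linear(128, 128).to(DEVICE)  # predicts the activation history
opt.add_param_group({"params": aux_head.parameters()})
ema_hidden = None                          # EMA of past activations, per example

def hidden_states(x):
    # Re-expose the (batch, 2, d_model) encoder activations of the model above.
    h = model.embed(x) + model.pos
    return model.encoder(h)

# Added inside the training loop, after the task loss is computed
# (full-batch training keeps example order fixed, so the EMA lines up):
#     h = hidden_states(pairs[train_idx].to(DEVICE))
#     if ema_hidden is not None:
#         aux_loss = ((aux_head(h) - ema_hidden) ** 2).mean()
#         loss = loss + AUX_WEIGHT * aux_loss
#     ema_hidden = h.detach() if ema_hidden is None else \
#         EMA_DECAY * ema_hidden + (1 - EMA_DECAY) * h.detach()
# Prediction under the leverage theory: the test-accuracy jump arrives earlier.
```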
LIVELLM
Files (113.1 kB)

SMART_LITTLE_MOVE_EN_v1.pdf

| Name | Size | md5 |
|---|---|---|
|  | 4.5 kB | 5a309e919388bbc0ad43610e102e25c7 |
|  | 108.6 kB | a8bbbbb72981ad2843a8f37fa840f743 |