Grokking Beyond Addition: Circuit-Level Analysis of Algebraic Learning in Transformers

Pal, Mani

doi:10.5281/zenodo.19256207

Published March 27, 2026 | Version v1.0

Preprint | Open access

Pal, Mani (Researcher)¹

1. Independent Researcher

This paper investigates grokking in transformers across a class of algebraic structures broader than modular addition. Prior mechanistic interpretability work has shown that transformers trained on modular addition learn Fourier-based clock circuits and exhibit delayed generalisation (grokking).
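As a rough illustration of the Fourier-based clock mechanism that prior work reports for modular addition, the sketch below hand-codes the computation instead of training a model. The modulus p = 113, the frequency set (1, 5, 17), and the function names are assumptions chosen for illustration; none of them is taken from this paper.

```python
import numpy as np

def clock_logits(a, b, p, freqs):
    """Score every candidate answer c with sum_k cos(2*pi*k*(a + b - c) / p)."""
    c = np.arange(p)
    logits = np.zeros(p)
    for k in freqs:
        w = 2 * np.pi * k / p
        # Angle-addition identities combine the two "clock hands" for a and b.
        cos_ab = np.cos(w * a) * np.cos(w * b) - np.sin(w * a) * np.sin(w * b)
        sin_ab = np.sin(w * a) * np.cos(w * b) + np.cos(w * a) * np.sin(w * b)
        # cos(w*(a+b-c)) = cos(w*(a+b))*cos(w*c) + sin(w*(a+b))*sin(w*c)
        logits += cos_ab * np.cos(w * c) + sin_ab * np.sin(w * c)
    return logits

def clock_predict(a, b, p=113, freqs=(1, 5, 17)):
    """Return the candidate c with the largest clock logit."""
    return int(np.argmax(clock_logits(a, b, p, freqs)))
```

Every cosine term reaches its maximum of 1 only when c equals (a + b) mod p, so the argmax recovers modular addition. A trained network would have to learn the frequencies and the trigonometric products; here they are written out by hand.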

We extend this analysis to eight algebraic operations spanning abelian groups, a composite ring, and non-abelian groups (S3, D5, A4, S4), using 1-layer transformers at d_model = 64.
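To make the abelian versus non-abelian distinction concrete, the following minimal sketch builds Cayley (operation) tables for a cyclic group and a symmetric group and tests commutativity. The integer encoding of group elements and the function names are assumptions for illustration; the paper's actual tokenisation is not described here, and D5 and A4 are omitted for brevity.

```python
from itertools import permutations

def cyclic_table(n):
    """Cayley table of addition modulo n (an abelian group)."""
    return [[(a + b) % n for b in range(n)] for a in range(n)]

def symmetric_table(n):
    """Cayley table of the symmetric group S_n under composition."""
    elems = list(permutations(range(n)))
    index = {g: i for i, g in enumerate(elems)}

    def compose(g, h):
        # (g o h)(x) = g(h(x)): apply h first, then g.
        return tuple(g[h[x]] for x in range(n))

    return [[index[compose(g, h)] for h in elems] for g in elems]

def is_abelian(table):
    """A finite group is abelian iff its Cayley table is symmetric."""
    n = len(table)
    return all(table[a][b] == table[b][a] for a in range(n) for b in range(n))
```

Each table entry `table[a][b]` gives the label for the input pair (a, b), so a table with n elements yields n² examples to split into train and test sets. For instance, `is_abelian(cyclic_table(7))` holds while `is_abelian(symmetric_table(3))` does not.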

Our key findings are:

1. A clear abelian vs non-abelian grokking boundary: all abelian operations achieve 100% test accuracy, while non-abelian groups fail to generalise despite perfect training accuracy.
2. Discrete-log re-indexing improves Fourier concentration for modular multiplication by a factor of 2.14, supporting the discrete logarithm representation hypothesis.
3. Non-abelian models exhibit partial circuit formation via Peter–Weyl decomposition even without grokking.
4. Cross-operation embedding similarity (CKA ≥ 0.80 across all pairs) suggests a shared representational substrate.
5. A capacity-dependent interpretation: abelian tasks rely on 1D irreducible representations, while non-abelian tasks require higher-dimensional irreps exceeding model capacity at d_model = 64.
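The discrete-log re-indexing in finding 2 rests on a standard number-theoretic fact: for a prime p with primitive root g, multiplying nonzero residues modulo p is the same as adding their discrete logarithms modulo p − 1. The sketch below checks this identity by brute force. The choice p = 11 in the usage note and the function names are assumptions for illustration, not the paper's configuration.

```python
def primitive_root(p):
    """Smallest primitive root of an odd prime p, found by brute force."""
    for g in range(2, p):
        # g is a primitive root iff its powers cover all p - 1 nonzero residues.
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; p must be an odd prime")

def dlog_table(p, g):
    """Map each nonzero residue a to the exponent k with g**k % p == a."""
    return {pow(g, k, p): k for k in range(p - 1)}

def multiply_via_logs(a, b, p, g, log):
    """Compute (a * b) % p by adding discrete logs modulo p - 1."""
    return pow(g, (log[a] + log[b]) % (p - 1), p)
```

With p = 11, `primitive_root(11)` is 2, and `multiply_via_logs(a, b, 11, 2, dlog_table(11, 2))` agrees with `(a * b) % 11` for every nonzero a and b. Relabelling each residue by its logarithm turns the multiplication table into an addition table modulo p − 1, which is the setting where Fourier features are expected to concentrate.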

All experiments are reproducible via the provided code and checkpoint-resume pipelines, and run on a free Colab T4 GPU in about 3 hours.

This work contributes new empirical evidence toward understanding the role of algebraic structure and representation theory in neural network generalisation.

Code repository: https://github.com/justbytecode/grokking-beyond-addition

Files

Name	Size
Grokking_Beyond_Addition.pdf (md5:e2e1f1a5635f9f7c838e62f588fb054c)	721.2 kB

Additional details

Repository URL: https://github.com/groot-code24/Grokking-Circuit-Level-Analysis-of-Algebraic-Learning-in-Transformers.git
Programming language: Python
Development Status: Active

