Intelligence as Predictive Compression: Evidence from GPT-2 Analysis and Learned Concept Bottlenecks
Description
We present a mathematical framework connecting intelligence to predictive compression through ε-machines (minimal sufficient statistics of the past for predicting the future) and demonstrate that modern transformer language models implicitly implement this compression. Through systematic reverse-engineering of GPT-2, we reveal a three-phase "V-shape" crystallization pattern: tokens compress into ~200 predictive equivalence classes by layer 2, undergo controlled semantic disambiguation in middle layers, and recrystallize into context-specific representations by layer 11. We validate this theory by training a learned discrete bottleneck model that routes tokens through 512 concepts using Gumbel-softmax, achieving 2.3× better validation loss (1.60 vs 3.30) and producing coherent text compared to static pre-clustered baselines that collapse during training. We further compare our architecture against standard models (char-RNN, small GPT, GPT-2 124M), showing that enforced compression achieves competitive performance with 19% fewer parameters and dramatically better interpretability. Our results suggest that intelligence emerges from compression into minimal predictive representations, with practical implications for reducing training costs through enforced discrete bottlenecks.
9 pages, 3 figures, 12 tables. Code available upon request.
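Since the code is stated to be available only upon request, the following is a minimal, hypothetical PyTorch sketch of the kind of learned discrete bottleneck the abstract describes: routing each token's hidden state through one of 512 concepts via a straight-through Gumbel-softmax assignment. The module name, dimensions, temperature, and layer placement are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptBottleneck(nn.Module):
    """Hypothetical sketch of a discrete concept bottleneck.

    Each token's hidden state is assigned to one of `n_concepts` learned
    concept vectors using Gumbel-softmax; details differ from the paper's code.
    """
    def __init__(self, d_model: int = 768, n_concepts: int = 512, tau: float = 1.0):
        super().__init__()
        self.to_logits = nn.Linear(d_model, n_concepts)   # token -> concept scores
        self.concepts = nn.Embedding(n_concepts, d_model)  # learned concept codebook
        self.tau = tau

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, d_model) hidden states from a transformer block
        logits = self.to_logits(h)
        # Straight-through Gumbel-softmax: hard one-hot routing in the forward
        # pass, soft differentiable assignment in the backward pass.
        assign = F.gumbel_softmax(logits, tau=self.tau, hard=True, dim=-1)
        # Replace each token's representation with its assigned concept vector.
        return assign @ self.concepts.weight

# Usage: hidden states in, concept-quantized representations out.
bottleneck = ConceptBottleneck()
h = torch.randn(2, 16, 768)
z = bottleneck(h)  # (2, 16, 768); every row is one of the 512 concept vectors
```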
Files (733.6 kB)

| Name | Size |
|---|---|
| Intelligence_as_Predictive_Compression.pdf (md5:491fbc0a1638f006c10d1950d214dce0) | 733.6 kB |