Self-Supervised Learning as Constrained Free-Energy Systems
Description
This paper proposes that self-supervised learning methods are physical systems minimizing free energy under representational constraints, explaining why diverse approaches (VICReg, DINO, SimCLR, BYOL, Barlow Twins, JEPA) converge on similar hyperparameter ranges despite different theoretical motivations. The framework decomposes the total free-energy deviation as F[q_t] - F* = κ + CD(t), where κ represents irreducible structural costs from architectural constraints and CD(t) measures dynamic misalignment. Training succeeds when gradient flow reduces CD(t) faster than constraints inflate κ. Organizational overhead η, the fraction of capacity consumed by coherence maintenance, must remain below a critical threshold η_c for stable representations.

Documented empirical phenomena receive a unified interpretation:
- Variance-collapse universality: all SSL methods fail when embedding variance approaches zero.
- Momentum convergence: BYOL, DINO, and MoCo independently discover m ≈ 0.996, creating a timescale τ = 1/(1-m) ≈ 250 steps that matches characteristic relaxation times.
- Batch-size scaling: SimCLR requires ~4096 samples for manifold percolation.
- Depth thresholds: transformers exhibit emergent capabilities around 10-12 layers.

These narrow ranges suggest underlying constraint boundaries. Each method implements the same physics through a different mechanism:
- VICReg's variance/covariance terms maintain dimensional spread.
- DINO's momentum creates timescale separation for stable reference tracking.
- SimCLR's negative samples ensure manifold coverage.
- BYOL's predictor breaks symmetry.
- Barlow Twins' decorrelation reduces redundancy.
- JEPA's prediction horizon enables recursive temporal coherence.

All keep organizational overhead subcritical. The paper connects to the constraint eigenvalue framework's triplet architecture, proposing that SSL systems realize some eigenbranch configuration.
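Two of the quantities cited above are simple enough to check numerically. The sketch below is illustrative only: the function names and the collapse threshold `eps` are assumptions, not definitions from the paper.

```python
# Hedged sketch of two quantities from the abstract; names and the
# eps threshold are illustrative choices, not taken from the paper.
import numpy as np


def ema_timescale(m: float) -> float:
    """Characteristic relaxation time tau = 1/(1 - m), in optimizer steps,
    of an exponential-moving-average teacher with momentum m."""
    return 1.0 / (1.0 - m)


def is_collapsed(z: np.ndarray, eps: float = 1e-4) -> bool:
    """Crude variance-collapse check: True when every embedding dimension
    is numerically constant across the batch (rows = samples)."""
    return bool(np.all(z.std(axis=0) < eps))


# m ~ 0.996 yields tau ~ 250 steps, the value BYOL, DINO, and MoCo
# are reported to converge on independently.
print(round(ema_timescale(0.996)))  # 250
```

A batch of identical embedding vectors makes `is_collapsed` return `True`, which is the failure mode the abstract calls variance-collapse universality.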
Physical and biological systems typically follow the decagonal eigenbranch (π, φ, 10), but whether SSL matches this or discovers architecture-specific values remains an open question the framework helps sharpen. The convergence of independent research groups on similar thresholds suggests they discovered the same underlying constraint geometry through different optimization paths.
Files
self-supervised-learning-as-constrained-free-energy-systems.pdf (116.7 kB)
md5:e0f9e345fb5152ab8a71f1cda37c0983
Additional details
Related works
- Is identical to: https://scienceandmathematics.com/self-supervised-learning-as-constrained-free-energy-systems/ (Other, URL)
Dates
- Available: 2025-11-17 (first published on scienceandmathematics.com)
- Updated: 2026-01-18 (clarifies that SSL systems may realize eigenbranch configurations distinct from the physical decagonal branch)