
Published January 18, 2026 | Version v2

Self-Supervised Learning as Constrained Free-Energy Systems


Description

This paper proposes that self-supervised learning methods are physical systems minimizing free energy under representational constraints, explaining why diverse approaches (VICReg, DINO, SimCLR, BYOL, Barlow Twins, JEPA) converge on similar hyperparameter ranges despite different theoretical motivations. The framework decomposes total free-energy deviation as F[q_t] - F* = κ + CD(t), where κ represents irreducible structural costs from architectural constraints and CD(t) measures dynamic misalignment. Training succeeds when gradient flow reduces CD(t) faster than constraints inflate κ. Organizational overhead η (the fraction of capacity consumed by coherence maintenance) must remain below a critical threshold η_c for stable representations.

Documented empirical phenomena receive a unified interpretation: variance-collapse universality (all SSL methods fail when embedding variance approaches zero); momentum convergence (BYOL, DINO, and MoCo independently discover m ≈ 0.996, creating a timescale τ = 1/(1-m) ≈ 250 steps that matches characteristic relaxation times); batch-size scaling (SimCLR requires ~4096 samples for manifold percolation); and depth thresholds (transformers exhibit emergent capabilities around 10-12 layers). These narrow ranges suggest underlying constraint boundaries.

Each method implements the same physics through different mechanisms: VICReg's variance/covariance terms maintain dimensional spread; DINO's momentum creates timescale separation for stable reference tracking; SimCLR's negative samples ensure manifold coverage; BYOL's predictor breaks symmetry; Barlow Twins' decorrelation reduces redundancy; and JEPA's prediction horizon enables recursive temporal coherence. All keep organizational overhead subcritical.

The paper connects to the constraint eigenvalue framework's triplet architecture, proposing that SSL systems realize some eigenbranch configuration.
Physical and biological systems typically follow the decagonal eigenbranch (π, φ, 10), but whether SSL matches this or discovers architecture-specific values remains an open question the framework helps sharpen. The convergence of independent research groups on similar thresholds suggests they discovered the same underlying constraint geometry through different optimization paths.
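The momentum-convergence claim in the abstract can be made concrete with a short numeric sketch. Nothing below comes from the paper's implementation; the function names are illustrative, and the code only demonstrates the τ = 1/(1-m) relation and its interpretation as a relaxation time for a BYOL/DINO/MoCo-style EMA teacher.

```python
def momentum_timescale(m: float) -> float:
    """Characteristic relaxation time (in steps) of an EMA with momentum m."""
    return 1.0 / (1.0 - m)

def ema_gap_closed(m: float, steps: int) -> float:
    """Fraction of the student-teacher gap closed after `steps` EMA updates,
    for a scalar teacher tracking a fixed student value (equals 1 - m**steps)."""
    teacher, student = 0.0, 1.0
    for _ in range(steps):
        teacher = m * teacher + (1.0 - m) * student
    return teacher

tau = momentum_timescale(0.996)           # ≈ 250 steps, as cited in the abstract
closed = ema_gap_closed(0.996, round(tau))  # ≈ 1 - 1/e ≈ 0.63 after one timescale
```

Under this reading, m ≈ 0.996 is equivalent to saying the teacher needs on the order of 250 updates to forget its past, which is what makes τ comparable to the characteristic relaxation times the abstract mentions.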

Files (116.7 kB)

self-supervised-learning-as-constrained-free-energy-systems.pdf

Additional details

Dates

Available: 2025-11-17 (first published on scienceandmathematics.com)
Updated: 2026-01-18 (clarifies that SSL systems may realize eigenbranch configurations distinct from the physical decagonal branch)