Published June 2, 2026 | Version v1
Preprint Open

Boundaries of Stationary Feature Learning: A Minimax Barrier for Scaling Laws and the Necessity of Compositional Structure

Authors/Creators

Description

I do not derive the Chinchilla scaling law; I map the boundaries of the regime in which such a derivation could even be attempted. Working in the μP feature-learning setting on a Sobolev-on-manifold data model, I establish what the stationary limit of feature learning can and cannot do.

(i) A barrier. The classical Sobolev minimax lower bound makes β₀ = 2s/(2s+d*) an unconditional ceiling on the data-scaling exponent β attainable by any estimator built from D samples; feature learning is a special case, so no stationary first-order method can exceed it.
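As a quick numerical illustration of the barrier in (i), the sketch below evaluates β₀ = 2s/(2s+d*) exactly for a few (s, d*) pairs. The function name `beta0` and the sample values are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def beta0(s, d_star):
    """Barrier exponent beta_0 = 2s / (2s + d*) for smoothness s and intrinsic dimension d*."""
    s, d_star = Fraction(s), Fraction(d_star)
    if s <= 0 or d_star <= 0:
        raise ValueError("s and d* must be positive")
    return 2 * s / (2 * s + d_star)


if __name__ == "__main__":
    # Hypothetical (s, d*) pairs, chosen only to show the trend.
    for s, d_star in [(1, 4), (2, 4), (2, 8)]:
        print(f"s={s}, d*={d_star}: beta_0 = {beta0(s, d_star)}")
```

For positive s and d* the value always lies strictly between 0 and 1; it rises with the smoothness s and falls with the intrinsic dimension d*.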

(ii) Self-organised criticality. Treating the target's intrinsic smoothness as a free parameter t, the variational attractor realises source exponent r(ν) = t(ν+1)/(1+2t) relative to its own kernel; this is monotone in the capacity exponent ν, equals exactly r = 1/2 at ν = 1/(2t), and the barrier forbids the corresponding β > β₀ for over-aligned ν.
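The two algebraic claims in (ii) can be checked directly: r(ν) = t(ν+1)/(1+2t) is increasing in ν for t > 0, and substituting ν = 1/(2t) gives (t + 1/2)/(1 + 2t) = 1/2. A minimal sketch with exact rational arithmetic follows; the helper names `source_exponent` and `critical_nu` are hypothetical.

```python
from fractions import Fraction


def source_exponent(nu, t):
    """r(nu) = t * (nu + 1) / (1 + 2t), the formula quoted in item (ii)."""
    nu, t = Fraction(nu), Fraction(t)
    return t * (nu + 1) / (1 + 2 * t)


def critical_nu(t):
    """Capacity exponent nu = 1 / (2t) at which r equals 1/2."""
    return 1 / (2 * Fraction(t))


if __name__ == "__main__":
    # Illustrative smoothness values t; any positive t should behave the same way.
    for t in [Fraction(1, 4), Fraction(1, 2), 1, 3]:
        nu_c = critical_nu(t)
        print(f"t={t}: nu_c={nu_c}, r(nu_c)={source_exponent(nu_c, t)}")
```

This confirms only the arithmetic of the stated formula, not the variational argument that produces it.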

(iii) H1 and H2 as objects, not assumptions. I derive the capacity penalty Σₖ λₖ^ν as the rich-regime implicit bias of a depth-L diagonal/deep-linear network, with ν = 1/L, and identify the load-bearing value ν = d*/(2s) as the depth–smoothness matching at which the attractor saturates the barrier. The correct approximation exponent is α = 2s/d*, and α > β₀ holds unconditionally.
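The inequality α > β₀ in (iii) follows from the identity β₀ = 2s/(2s+d*) = α/(1+α) with α = 2s/d* > 0. Combining ν = 1/L with the matching value ν = d*/(2s) gives, formally, L = 2s/d*. The sketch below tabulates these quantities; the function name `exponents` and the sample inputs are assumptions, and the formal depth need not be an integer.

```python
from fractions import Fraction


def exponents(s, d_star):
    """Return (alpha, beta_0, nu_match, depth_match) for smoothness s and dimension d*."""
    s, d_star = Fraction(s), Fraction(d_star)
    if s <= 0 or d_star <= 0:
        raise ValueError("s and d* must be positive")
    alpha = 2 * s / d_star              # approximation exponent from item (iii)
    beta_0 = 2 * s / (2 * s + d_star)   # barrier exponent from item (i)
    nu_match = d_star / (2 * s)         # matching capacity exponent
    depth_match = 1 / nu_match          # formal depth from nu = 1/L (may be non-integer)
    return alpha, beta_0, nu_match, depth_match


if __name__ == "__main__":
    # Hypothetical (s, d*) pairs for illustration.
    for s, d_star in [(2, 4), (4, 2), (1, 3)]:
        alpha, beta_0, nu, depth = exponents(s, d_star)
        assert beta_0 == alpha / (1 + alpha) and alpha > beta_0
        print(f"s={s}, d*={d_star}: alpha={alpha}, beta_0={beta_0}, nu={nu}, L={depth}")
```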

(iv) Where deviations come from. Any empirical β > β₀ must originate outside stationary Sobolev learning: from reduced effective dimension d_loc ≪ d (compositional/Besov data) or from transient non-stationary kernel alignment. This complements the dynamical feature-learning models of Bordelon, Atanasov and Pehlevan.

Files

chinchilla.pdf (650.2 kB)
md5:b09e3350c7c31690a3a92ad8b05b3220