Boundaries of Stationary Feature Learning: A Minimax Barrier for Scaling Laws and the Necessity of Compositional Structure
Description
I do not derive the Chinchilla scaling law; I map the boundaries of the regime in which such a derivation could even be attempted. Working in the μP feature-learning setting on a Sobolev-on-manifold data model, I establish what the stationary limit of feature learning can and cannot do.
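(i) A barrier. The classical Sobolev minimax lower bound makes β₀ = 2s/(2s+d*), with s the Sobolev smoothness and d* the intrinsic dimension of the data manifold, an unconditional ceiling on the data-scaling exponent: no estimator trained on D samples can drive the worst-case excess risk down faster than D^(−β₀). A feature-learning network is one such estimator, so no stationary first-order method can exceed this exponent.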
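A minimal sketch of the arithmetic behind the barrier, assuming the loss behaves as a pure power law L(D) ∝ D^(−β) with constants ignored. The helper names `minimax_exponent` and `best_case_loss_ratio` are hypothetical, chosen for illustration; `fractions.Fraction` keeps β₀ exact.

```python
from fractions import Fraction


def minimax_exponent(s, d_star):
    """Sobolev minimax data-scaling exponent beta_0 = 2s / (2s + d*)."""
    s, d_star = Fraction(s), Fraction(d_star)
    if s <= 0 or d_star <= 0:
        raise ValueError("smoothness s and intrinsic dimension d* must be positive")
    return 2 * s / (2 * s + d_star)


def best_case_loss_ratio(beta, k):
    """Ratio L(kD) / L(D) under L(D) ∝ D**(-beta), constants ignored.

    With beta capped at beta_0, this is the largest improvement from
    multiplying the data by k that the minimax bound allows.
    """
    return float(k) ** (-float(beta))


beta0 = minimax_exponent(1, 2)  # hypothetical s = 1, d* = 2
print(beta0)  # 1/2
print(best_case_loss_ratio(beta0, 100))
```

With β₀ = 1/2, a hundredfold increase in data can at best cut the worst-case loss by a factor of about ten.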
(ii) Self-organised criticality. When the target's intrinsic smoothness is treated as a free parameter t, the variational attractor realises the source exponent r(ν) = t(ν+1)/(1+2t) relative to its own kernel. This exponent is monotonically increasing in the capacity exponent ν (its slope is t/(1+2t) > 0) and equals exactly r = 1/2 at ν = 1/(2t); for over-aligned ν, the barrier forbids the corresponding β > β₀.
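The two algebraic claims in (ii), r = 1/2 at ν = 1/(2t) and monotonicity in ν, can be checked exactly with rational arithmetic; symbolically, r(1/(2t)) = t·((1+2t)/(2t))/(1+2t) = 1/2. This is a minimal sketch: `source_exponent` and `critical_capacity` are hypothetical helpers that simply encode the stated formula, and the grid of t values is arbitrary.

```python
from fractions import Fraction


def source_exponent(nu, t):
    """r(nu) = t(nu + 1) / (1 + 2t), the source exponent stated in (ii)."""
    nu, t = Fraction(nu), Fraction(t)
    return t * (nu + 1) / (1 + 2 * t)


def critical_capacity(t):
    """Capacity exponent nu = 1/(2t) at which r(nu) = 1/2."""
    return 1 / (2 * Fraction(t))


# Arbitrary smoothness values t > 0; print r just below, at and just above the critical nu.
step = Fraction(1, 10)
for t in [Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(5)]:
    nu_c = critical_capacity(t)
    r_below, r_at, r_above = (source_exponent(nu_c + d, t) for d in (-step, 0, step))
    print(f"t={t}: nu_c={nu_c}, r below/at/above = {r_below}, {r_at}, {r_above}")
```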
(iii) H1 and H2 as objects, not assumptions. I derive the capacity penalty Σₖ λₖ^ν as the rich-regime implicit bias of a depth-L diagonal/deep-linear network, with ν = 1/L, and identify the load-bearing value ν = d*/(2s) as the depth–smoothness matching at which the attractor saturates the barrier. The correct approximation exponent is α = 2s/d*, and α > β₀ holds unconditionally, since α/β₀ = (2s+d*)/d* > 1 for every s > 0.
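A minimal sketch of the exponent bookkeeping in (iii), assuming only the stated formulas α = 2s/d*, β₀ = 2s/(2s+d*) and ν = 1/L; all helper names are hypothetical. It checks the identity α/β₀ = 1 + α, which gives α > β₀ for every s > 0, and reports the depth L = 1/ν at the load-bearing value ν = d*/(2s). The (s, d*) pairs are illustrative and chosen so that this depth is an integer.

```python
from fractions import Fraction


def approximation_exponent(s, d_star):
    """alpha = 2s / d*."""
    return 2 * Fraction(s) / Fraction(d_star)


def minimax_exponent(s, d_star):
    """beta_0 = 2s / (2s + d*)."""
    s, d_star = Fraction(s), Fraction(d_star)
    return 2 * s / (2 * s + d_star)


def matched_depth(s, d_star):
    """Depth L with nu = 1/L equal to the load-bearing value nu = d*/(2s)."""
    nu = Fraction(d_star) / (2 * Fraction(s))
    return 1 / nu


for s, d_star in [(1, 2), (2, 2), (3, 2), (4, 4)]:
    alpha = approximation_exponent(s, d_star)
    beta0 = minimax_exponent(s, d_star)
    # alpha / beta_0 = (2s + d*) / d* = 1 + alpha, which exceeds 1 whenever s > 0
    assert alpha / beta0 == 1 + alpha
    print(f"s={s}, d*={d_star}: alpha={alpha}, beta0={beta0}, matched L={matched_depth(s, d_star)}")
```

Algebraically the matched depth 2s/d* coincides with α, so the printout shows the same value in both columns.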
(iv) Where deviations come from. Any empirical β > β₀ must originate outside stationary Sobolev learning: from a reduced effective dimension d_loc ≪ d (compositional or Besov data) or from transient, non-stationary kernel alignment. This complements the dynamical feature-learning models of Bordelon, Atanasov and Pehlevan.
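To see how a reduced effective dimension could lift the observed exponent, one can, as an illustrative assumption rather than the paper's derivation, substitute d_loc for the dimension in the barrier formula 2s/(2s+d). The values below are hypothetical; the sketch only shows that the resulting exponent rises as d_loc shrinks, not how d_loc arises from compositional or Besov structure.

```python
from fractions import Fraction


def dimension_limited_exponent(s, d_eff):
    """2s / (2s + d_eff): the barrier formula with an effective dimension plugged in.

    Substituting d_loc for d* is an illustrative assumption.
    """
    s, d_eff = Fraction(s), Fraction(d_eff)
    return 2 * s / (2 * s + d_eff)


s = 2        # hypothetical smoothness
d_star = 16  # hypothetical intrinsic dimension
baseline = dimension_limited_exponent(s, d_star)
for d_loc in [16, 8, 4, 2, 1]:  # hypothetical reduced effective dimensions
    beta = dimension_limited_exponent(s, d_loc)
    print(f"d_loc={d_loc:>2}: exponent={beta} ({float(beta):.3f}), gain over d*: {beta - baseline}")
```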
Files
chinchilla.pdf (650.2 kB, md5:b09e3350c7c31690a3a92ad8b05b3220)