Published February 5, 2026 | Version 1.0
Publication | Open Access

Fractal–Hyperbolic Degeneracy in Overparameterized Learning Manifolds v1.0

Description

Why do modern overparameterized neural networks train so efficiently despite massive degeneracy, wide flat minima, and exponentially many redundant paths?
Why does training consistently collapse to a low‑intrinsic‑dimension core, display hyperbolic curvature signatures, exhibit fractal roughness in loss boundaries, and undergo sharp phase transitions such as grokking?

These features are well‑documented but theoretically fragmented. Random‑matrix explanations account for low‑rank spectra; tangent‑kernel limits describe early training; hyperbolic embeddings explain hierarchy; and mode connectivity explains flat valleys. But none of these explains why they all co‑occur, or why overparameterization seems to help optimization rather than hinder it.

This paper introduces three minimal primitives that unify these observations into a single geometric account:

  1. Gradient erosion — training subtractively carves away low‑friction (near‑null) directions, leaving a resistant, low‑dimensional core.
  2. Fisher metric as parametric friction — curvature defines a direction‑wise friction field ϕ(θ; u) = uᵀF(θ)u that shapes flow after erosion (see the sketch after this list).
  3. Overparameterization as degeneracy amplifier — extra parameters enlarge the negative space available for erosion, accelerating convergence.
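As a concrete illustration of the second primitive, the sketch below evaluates ϕ(θ; u) = uᵀF(θ)u with F approximated by an empirical Fisher built from per‑example gradients. This is a minimal sketch under stated assumptions: the toy gradients, shapes, and helper names are illustrative, not the paper's implementation.

```python
# Minimal sketch (illustrative): direction-wise "friction" phi(theta; u) = u^T F(theta) u,
# with F approximated by the empirical Fisher built from per-example gradients.
import numpy as np

def empirical_fisher(per_example_grads):
    """Empirical Fisher F ~ (1/n) sum_i g_i g_i^T from per-example gradients (n x p)."""
    g = np.asarray(per_example_grads)      # shape (n, p)
    return g.T @ g / g.shape[0]            # shape (p, p)

def friction(F, u):
    """phi(theta; u) = u^T F(theta) u for a unit direction u."""
    u = u / np.linalg.norm(u)
    return float(u @ F @ u)

# Toy usage: random per-example gradients stand in for a real model's gradients.
rng = np.random.default_rng(0)
grads = rng.normal(size=(256, 10))         # n = 256 examples, p = 10 parameters
F = empirical_fisher(grads)
u = rng.normal(size=10)
print("friction along u:", friction(F, u))  # directions in F's null space give values near 0
```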

These primitives causally explain:

  • Fractal roughness → multiscale carving of redundant structure
  • Hyperbolic curvature → exponential rarity of high‑friction directions
  • Low intrinsic dimension → collapse onto the resistant core
  • Low‑rank Fisher spectrum → friction‑resolved degeneracy
  • Flat minima → symmetry as the fixed point of negative‑space collapse
  • Grokking‑like transitions → friction‑field phase changes at negative closure

From this geometry, a three‑phase optimization protocol emerges (Acquisition → Re‑Ask → Execution), with phase changes triggered by intrinsic‑dimension stall and Fisher‑rank concentration. These geometric signals drive learning‑rate and metric updates that track the evolving curvature, reducing the number of steps to matched accuracy by 15–35% in a quadratic toy model and in a small Vision Transformer on CIFAR‑10, under equal compute.
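To make the phase logic concrete, the sketch below shows one way the two trigger signals could be monitored — intrinsic dimension via a participation ratio of recent gradients, and Fisher‑rank concentration via an effective‑rank statistic — together with the resulting Acquisition → Re‑Ask → Execution switch. This is a minimal sketch, not the protocol as specified in the paper; all thresholds, window choices, and learning‑rate values are illustrative assumptions.

```python
# Minimal sketch (illustrative): phase switching driven by two geometric signals,
# an intrinsic-dimension proxy and a Fisher-rank-concentration proxy.
import numpy as np

def participation_ratio(grad_window):
    """Intrinsic-dimension proxy: (sum lambda_i)^2 / sum lambda_i^2 of the gradient covariance."""
    G = np.asarray(grad_window)                  # shape (window, p)
    cov = G.T @ G / G.shape[0]
    lam = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return (lam.sum() ** 2) / (np.square(lam).sum() + 1e-12)

def effective_rank(F):
    """Fisher-rank-concentration proxy: entropy-based effective rank of F's spectrum."""
    lam = np.clip(np.linalg.eigvalsh(F), 0.0, None)
    p = lam / (lam.sum() + 1e-12)
    return float(np.exp(-(p * np.log(p + 1e-12)).sum()))

def next_phase(phase, id_history, eff_rank, p_total,
               stall_tol=0.05, concentration_frac=0.25):
    """Advance Acquisition -> Re-Ask -> Execution on intrinsic-dimension stall
    and Fisher-rank concentration (thresholds are illustrative)."""
    if phase == "acquisition" and len(id_history) >= 2:
        if abs(id_history[-1] - id_history[-2]) < stall_tol * id_history[-2]:
            return "re-ask"
    if phase == "re-ask" and eff_rank < concentration_frac * p_total:
        return "execution"
    return phase

# Example per-phase learning rates (purely illustrative values).
LR = {"acquisition": 1e-3, "re-ask": 3e-4, "execution": 1e-4}
```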

This work is foundational and reductionist. It offers no new architectures or benchmarks; instead, it provides precise, domain‑general objects for understanding and engineering overparameterized training flow. It exposes deep structural kinship between erosion, degeneracy, curvature, and symmetry—and presents an actionable systems‑level lens for future optimization design.

Files

Fractal Hyperbolic Degeneracy in Overparameterized Learning Manifolds v1.0.pdf

Additional details

Additional titles

Subtitle (English)
A Reductionist Framework for Manifold Geometry, Gradient Erosion, and Phased Optimization Dynamics