Published February 6, 2026 | Version 1.0
Publication | Open Access

Friction-Guided Optimization: Negative Tomography in Overparameterized Learning

Description

Why do overparameterized networks converge efficiently despite massive redundancy, and why does their training exhibit flat basins, low intrinsic dimension, low-rank curvature spectra, fractal roughness, hyperbolic geometric traits, and abrupt phase transitions such as grokking?
Current theory explains these empirical behaviors only piecemeal: random matrix theory accounts for low-rank spectra; tangent-kernel limits describe early training; mode connectivity explains flat basins; hyperbolic embeddings capture hierarchical curvature; intrinsic-dimension collapse has been linked to grokking. Yet no minimal set of primitives explains why they all co-occur, or how training actually organizes itself under high-dimensional degeneracy.

This working paper reframes training as erosive negative tomography: an architecture-level perspective in which gradient erosion collapses the low-friction nullspace, the empirical Fisher information acts as a parametric friction field ϕ(θ; u) = uᵀF(θ)u, and overparameterization serves as a degeneracy amplifier that creates vast corridors of near-equivalence. Early optimization does not “descend” so much as carve: it removes degrees of freedom that fail to affect predictions. As low-friction directions are eroded, friction anisotropy sharpens, creating an apparent “gravity” and forcing the flow onto a resistant, low-dimensional core. Flat minima emerge as the symmetric fixed point of this negative-space collapse.
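
As a concrete illustration (a minimal sketch, not code from the paper): because F(θ) is the empirical Fisher E[ggᵀ] over per-example gradients g, the friction ϕ(θ; u) = uᵀF(θ)u along a direction u equals the mean squared projection of those gradients onto u and can be estimated without ever forming F. The PyTorch sketch below assumes a model, a loss_fn, an iterable of (x, y) examples, and a flat direction vector matching the parameter count; all names are illustrative.

    import torch

    def friction(model, loss_fn, data, direction):
        # Estimate phi(theta; u) = u^T F(theta) u with F = E[g g^T] (empirical Fisher)
        # by averaging the squared projection (g_i . u)^2 of per-example gradients onto u.
        params = [p for p in model.parameters() if p.requires_grad]
        u = direction / direction.norm()              # unit direction in parameter space
        total, n = 0.0, 0
        for x, y in data:
            model.zero_grad()
            loss_fn(model(x), y).backward()           # per-example gradient
            g = torch.cat([p.grad.flatten() for p in params])
            total += (g @ u).item() ** 2
            n += 1
        return total / max(n, 1)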

This view unifies major empirical observations:

  • Flat minima → symmetry under saturated negatives
  • Low intrinsic dimension → erosion of nullspace into a minimal core
  • Low‑rank Fisher → data‑constrained directions survive erosion
  • Fractal roughness → multiscale boundary carving
  • Hyperbolic curvature → rarity of high‑friction directions
  • Grokking → friction‑field phase transitions at negative closure

From these primitives, a three-phase protocol (Friction-Guided Optimization, FGO) emerges; a schematic sketch follows the list:

  1. Acquisition (High‑LR scour) – rapid carving of low‑friction subspace.
  2. Re‑Ask (Fisher‑aware refinement) – updates aligned to anisotropic curvature.
  3. Execution (Low‑LR polish) – symmetry‑aligned convergence in carved valleys.
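
A schematic of the three phases as a single update rule (illustrative only: the learning rates and the damped diagonal-Fisher preconditioner are assumptions standing in for the paper's Fisher-aware rule, and fgo_step is a hypothetical name):

    def fgo_step(theta, grad, fisher_diag, phase):
        # One schematic FGO update on a flat parameter vector theta (NumPy or PyTorch).
        if phase == "acquisition":        # 1. high-LR scour of the low-friction subspace
            return theta - 1e-1 * grad
        if phase == "re_ask":             # 2. Fisher-aware refinement: natural-gradient-style
            return theta - 1e-2 * grad / (fisher_diag + 1e-8)   # damping of high-friction axes
        return theta - 1e-3 * grad        # 3. low-LR polish in the carved valley

A full loop would switch phase using a geometry-aware trigger (sketched after the next paragraph) rather than a fixed step budget.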

Phase transitions are triggered by intrinsic‑dimension stall and Fisher-rank concentration, replacing calendar schedules with geometry‑aware control laws. In toy quadratics and a small transformer, FGO reduces steps to matched accuracy by 15–35% under equal compute.
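
One way to realize such a trigger (a sketch under assumptions: spectral-entropy effective rank as the Fisher-concentration proxy and illustrative thresholds; should_transition, id_history, and the parameter names are hypothetical):

    import numpy as np

    def effective_rank(eigvals, eps=1e-12):
        # Entropy-based effective rank of a Fisher spectrum: exp(H(p)), p_i = lambda_i / sum(lambda).
        lam = np.clip(np.asarray(eigvals, dtype=float), eps, None)
        p = lam / lam.sum()
        return float(np.exp(-(p * np.log(p)).sum()))

    def should_transition(id_history, fisher_eigvals, stall_tol=0.01, rank_frac=0.05, window=5):
        # Advance to the next FGO phase when the intrinsic-dimension estimate has stalled
        # over the last `window` checks and the Fisher spectrum has concentrated.
        if len(id_history) < window:
            return False
        recent = id_history[-window:]
        stalled = (max(recent) - min(recent)) / max(recent[0], 1e-8) < stall_tol
        concentrated = effective_rank(fisher_eigvals) < rank_frac * len(fisher_eigvals)
        return stalled and concentrated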

This paper extends Negative Tomography (DOI: 10.5281/zenodo.18510535) into ML, treating training not as construction but as structured removal. The aim is not to propose a new optimizer, but to give systems designers a minimal, domain‑general vocabulary for reasoning about friction fields, phase transitions, and degeneracy‑driven dynamics in the scaling regime.

Files

Friction-Guided-Optimization-v1.0.pdf (776.5 kB)
md5:bbda77554f26f2338afc5f4d46abd62e
Additional details

Additional titles

Subtitle (English)
A Systems-Theoretic Framework for Gradient Erosion as Negative Tomography, Fisher-Defined Parametric Friction Fields, Degeneracy Amplification, and Friction-Guided Phase Transitions in Overparameterized Optimization Manifolds