There is a newer version of the record available.

Published March 4, 2026 | Version v6
Publication | Open Access

Neural Null Cones: Zero-Curvature Channels in Loss Landscapes from Symplectic Hessian Decomposition

Authors/Creators

Description

We identify a previously unrecognized geometric structure in neural network loss landscapes: zero-curvature channels along which gradient updates incur no second-order cost with respect to the regularized Hessian metric $H_{\mathrm{reg}}$. These null directions are not eigenvectors of the Hessian $H$ itself---the object studied by prior spectral analyses---but eigenvectors of $H_{\mathrm{reg}}^{-1}J$, a symplectic operator coupling the regularized Hessian with a pairing matrix $J$. This yields a geometric invariant invisible to standard spectral methods.

The \textbf{Spectral Null Cone Theorem} provides the algebraic guarantee: \emph{if} $M = H_{\mathrm{reg}}^{-1}J$ admits real eigenpairs, then every such eigenvector is $H_{\mathrm{reg}}$-null. The theorem is conditional on real eigenpairs existing; in neural networks, both prerequisite conditions (indefinite Hessian, real spectrum of $M$) are empirically frequent---observed in 100\% of sampled training steps for a 22-parameter MLP, in 6/7 layer types for GPT-2 (null residuals as small as $10^{-26}$), and broadly across LeNet-5, ViT-Base, and BERT-Base.
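The theorem's mechanism can be checked numerically in a toy setting: if $Jv = \lambda H_{\mathrm{reg}} v$ with $J$ skew-symmetric and $\lambda$ real and nonzero, then $v^\top H_{\mathrm{reg}} v = 0$. A minimal sketch with hypothetical $2\times 2$ values (an indefinite toy Hessian and the standard symplectic pairing; not the paper's experimental setup):

```python
import numpy as np

# Toy indefinite "regularized Hessian" and skew-symmetric pairing matrix.
# These values are illustrative only.
H_reg = np.array([[1.0, 0.0],
                  [0.0, -1.0]])   # indefinite, invertible
J = np.array([[0.0, 1.0],
              [-1.0, 0.0]])       # skew-symmetric

M = np.linalg.inv(H_reg) @ J      # the coupled operator M = H_reg^{-1} J
eigvals, eigvecs = np.linalg.eig(M)

for lam, v in zip(eigvals, eigvecs.T):
    # Real, nonzero eigenpairs should be H_reg-null.
    if abs(complex(lam).imag) < 1e-12 and abs(complex(lam).real) > 1e-12:
        v = v.real
        null_residual = v @ H_reg @ v   # expected to vanish
        print(f"lambda = {complex(lam).real:+.3f}, v^T H_reg v = {null_residual:.2e}")
```

For this example both eigenvalues of $M$ are real ($\pm 1$) and both eigenvectors have vanishing $H_{\mathrm{reg}}$-quadratic form, illustrating the null-cone property the theorem guarantees.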

These null directions are not merely present but \textbf{exploitable}. In optimization, boosting gradients along null directions---while leaving curvature-bearing components unchanged---yields a $+5.7\%$ accuracy gain on ViT-Tiny (CIFAR-100), converging $6.7\times$ faster than SGD; an identical boost along random directions \emph{hurts} performance ($-2.8\%$), confirming the null-specific mechanism. In continual learning, projecting backbone updates onto the null subspace of a prior task's Hessian reduces catastrophic forgetting to $2.0\%$ (versus $7.9\%$ naive, $8.8\%$ EWC) while retaining $6.1\%$ more new-task accuracy than a frozen backbone; the principle composes across tasks and generalizes to language models (GPT-2, WikiText-2 $\to$ LAMBADA: 40\% lower forgetting).
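Both applications reduce to one primitive: an orthogonal projection of an update vector onto a basis for the (approximate) null subspace. A minimal sketch, with illustrative function names and boost factor (the paper's exact hyperparameters and basis-extraction procedure are not reproduced here):

```python
import numpy as np

def boost_null_components(grad, null_basis, boost=2.0):
    """Amplify the gradient along null directions, leaving the
    curvature-bearing component unchanged.

    null_basis: (d, k) matrix whose orthonormal columns span the
    approximate H_reg-null subspace at the current point.
    """
    null_part = null_basis @ (null_basis.T @ grad)  # projection onto span
    return grad + (boost - 1.0) * null_part         # boosted update

def project_onto_null(update, null_basis):
    """Continual-learning variant: keep only the component of the update
    lying in the null subspace of a prior task's Hessian."""
    return null_basis @ (null_basis.T @ update)

# Toy usage: null subspace = first two coordinate axes in d = 4.
null_basis = np.eye(4)[:, :2]
grad = np.ones(4)
print(boost_null_components(grad, null_basis, boost=2.0))  # [2. 2. 1. 1.]
print(project_onto_null(grad, null_basis))                 # [1. 1. 0. 0.]
```

The boost leaves non-null components untouched, matching the paper's observation that amplifying random directions instead degrades performance; the projection discards exactly the components that carry second-order cost for the prior task.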

Across 437 numerical tests spanning five architectures, 397 pass strict verification; the 40 non-passing cases correspond to boundary conditions outside the theorem's assumptions, not counterexamples to it. Taken together, these results establish null cone structure as a practically exploitable geometric feature of deep learning loss landscapes.

Files

Neural Null Cones.pdf (291.5 kB)
md5:9c1a947e6cc373514de0990f2712c016

Additional details

Additional titles

Alternative title
A Geometric Foundation for Continual Learning and Self-Evolving Intelligence

Software

Repository URL
https://github.com/papasop/null-cone-sgd
Programming language
Python