Published June 6, 2026
| Version v1
Working paper
Open
Field-Theoretic Spectroscopy of LLM Alignment: Keldysh Dynamics and Topological Excision in Residual Manifolds
Authors/Creators
Description
Post-training alignment of large language models imposes measurable thermodynamic stress on residual-stream dynamics,
yet the mechanistic depth profile of this deformation remains poorly characterized.
Our primary contribution is a reproducible Keldysh spectral measurement protocol---dynamic mode decomposition followed by
a non-normal Schur resolvent---that maps prefill trajectories to a density-of-states readout $A(\omega)$,
revealing Fano resonance and decoherence onset under instruction tuning.
As a complementary causal probe---not a primary deployment mechanism---we apply a rank-one geometric intervention fixed offline from benign and adversarial prefill covariances:
excision of a gapped Higgs mode orthogonal to utility-bearing Goldstone directions,
restricted to causal prefill---before key--value cache commitment---so autoregressive decoding inherits the intervened state
without per-token overhead.
Intervention geometry is certified by a macroscopic Hessian mass gap in a finite-dimensional alignment objective;
we formalize the edit through a Baym--Kadanoff--Keldysh resolvent reduction.
On TinyLlama-1.1B, sweeps of excision strength $\alpha$ leave a corpus-mean utility band near sixteen percent
while Goldstone amplification at depth nineteen collapses out-of-distribution utility from twenty-three percent
to below two percent across $\beta \approx 1$;
on Llama-3-8B-Instruct, we use this projector to mechanistically probe and modestly suppress the residual jailbreak tail
of an already strongly aligned checkpoint:
attack success on the full corpus ($N{=}520$) falls from $2.50\%$ (Wilson 95\% CI $[1.47, 4.23]\%$) to $1.92\%$ ($[1.05, 3.50]\%$)
at $\alpha{=}1.0$ (layers 29--30) without retraining,
with general utility robustly preserved across the full MMLU generative corpus ($N{=}14{,}042$), achieving an exact match of $0.638 \pm 0.004$ under intervention.
Ensemble Keldysh spectra on Qwen2.5-3B (Base vs.\ Instruct, identical architecture, $N{=}20$ safety-sensitive prompts)
reveal symmetric quasi-particle poles in the pristine checkpoint versus hybridized spectral collapse under instruction tuning;
a layer-resolved heatmap localizes the onset of alignment decoherence to depth $\approx 14$.
The closed inference and calibration stack is proprietary;
public release, if any, is limited to select GGUF weights rather than source code.
Files
bypassing_scaling.pdf
Files
(2.3 MB)
| Name | Size | Download all |
|---|---|---|
|
md5:edd3cece99ec1669a6815f6b9a4a5b9c
|
2.3 MB | Preview Download |