Published March 20, 2026
Version v1
Preprint
Open
Operator Dynamics in Transformer Residual Streams: A Unified Framework for Interpretability, Adversarial Detection, Causal Control, and Topological Model Fingerprinting
Description
We present a unified framework for transformer interpretability and safety
grounded in the geometry of residual stream operators: the inter-layer
differences $\Delta_l = h_{l+1} - h_l$, which directly capture what each layer
contributes to the forward pass. We make five empirical contributions, validated
across four models spanning three architectural families and an 80× parameter
range (GPT-2 117M through Qwen3.5-9B).
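The operator definition $\Delta_l = h_{l+1} - h_l$ can be illustrated with a minimal sketch. It assumes the hidden states for one prompt have already been stacked into an array of shape (num_layers + 1, seq_len, d_model), with the embedding output at index 0. The function name `residual_deltas` is hypothetical, random data stands in for real activations, and this is not the authors' implementation.

```python
import numpy as np


def residual_deltas(hidden_states: np.ndarray) -> np.ndarray:
    """Return per-layer residual updates Delta_l = h_{l+1} - h_l.

    Assumed layout (hypothetical for this sketch): hidden_states has shape
    (num_layers + 1, seq_len, d_model); index 0 is the embedding output and
    index l is the residual stream after layer l.
    """
    if hidden_states.ndim != 3:
        raise ValueError("expected shape (num_layers + 1, seq_len, d_model)")
    # np.diff along the layer axis computes h[l+1] - h[l] for every l.
    return np.diff(hidden_states, axis=0)


# Toy residual stream: 4 layers, 5 tokens, width 8 (random stand-in data).
rng = np.random.default_rng(0)
h = rng.standard_normal((5, 5, 8))
deltas = residual_deltas(h)
print(deltas.shape)  # (4, 5, 8)

# The deltas telescope: h_0 plus every layer's contribution recovers h_L.
print(np.allclose(h[0] + deltas.sum(axis=0), h[-1]))  # True
```

The telescoping check shows why the differences are a natural per-layer unit: summed over layers, they reconstruct the final residual stream exactly, so each $\Delta_l$ isolates one layer's additive contribution.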
Files
| Name | Size | Checksum |
|---|---|---|
| Topological_interpretability (2).pdf | 335.5 kB | md5:7b438c3022cc46ed3c1ba59f08c3ad80 |