Published March 20, 2026 | Version v1
Preprint (open access)

Operator Dynamics in Transformer Residual Streams: A Unified Framework for Interpretability, Adversarial Detection, Causal Control, and Topological Model Fingerprinting

Affiliation: Independent researcher

Description

We present a unified framework for transformer interpretability and safety
grounded in the geometry of residual stream operators: the inter-layer
differences $\Delta_l = h_{l+1} - h_l$, which directly capture what each layer
contributes to the forward pass. We make five empirical contributions,
validated across four models spanning three architectural families and an
approximately 80× parameter range (GPT-2 117M through Qwen3.5-9B).
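The operator definition $\Delta_l = h_{l+1} - h_l$ is simple to compute. The NumPy sketch below illustrates the idea and is not the author's code. It assumes the residual stream has already been collected into one array of shape (num_layers + 1, seq_len, d_model), with the embedding output at index 0. The function name `layer_deltas` and the random toy data are hypothetical. The sketch computes every $\Delta_l$ at once and checks that the deltas telescope back to the final residual stream.

```python
import numpy as np


def layer_deltas(hidden_states: np.ndarray) -> np.ndarray:
    """Return the inter-layer differences Delta_l = h_{l+1} - h_l.

    hidden_states: array of shape (num_layers + 1, seq_len, d_model), where
    index 0 is the embedding output and index l + 1 is the residual stream
    after block l. (This layout is an assumption of this sketch.)
    Returns an array of shape (num_layers, seq_len, d_model).
    """
    if hidden_states.ndim != 3:
        raise ValueError("expected shape (num_layers + 1, seq_len, d_model)")
    # np.diff along the layer axis computes h[1:] - h[:-1] in one step.
    return np.diff(hidden_states, axis=0)


# Hypothetical toy residual stream: 4 blocks, 5 tokens, width 8.
rng = np.random.default_rng(0)
h = rng.normal(size=(5, 5, 8))
deltas = layer_deltas(h)

# The deltas telescope: embedding output plus all layer contributions
# recovers the final residual stream (up to floating-point error).
reconstructed = h[0] + deltas.sum(axis=0)

# One scalar per layer: the mean L2 norm of that layer's contribution over tokens.
per_layer_norm = np.linalg.norm(deltas, axis=-1).mean(axis=-1)

print(deltas.shape)                       # (4, 5, 8)
print(np.allclose(reconstructed, h[-1]))  # True
print(per_layer_norm.shape)               # (4,)
```

Because $h_L = h_0 + \sum_{l=0}^{L-1} \Delta_l$, the operators account exactly for the change from the embedding output to the final residual stream. Each layer's contribution can therefore be examined on its own.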

Files

Topological_interpretability (2).pdf (335.5 kB)
md5:7b438c3022cc46ed3c1ba59f08c3ad80