Operator Dynamics in Transformer Residual Streams: A Unified Framework for Interpretability, Adversarial Detection, Causal Control, and Topological Model Fingerprinting

Sanskar Pandey

doi:10.5281/zenodo.19135349

Published March 20, 2026 | Version v1

Preprint Open

Sanskar Pandey (Researcher), Independent researcher

We present a unified framework for transformer interpretability and safety
grounded in the geometry of residual stream operators: the inter-layer
differences Δ_l = h_{l+1} − h_l, which directly capture what each layer
contributes to the forward pass. We make five empirical contributions
validated across four models spanning three architectural families and an
80× parameter range (GPT-2 117M through Qwen3.5-9B).
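
As an illustration of the definition above, here is a minimal sketch of computing the inter-layer differences Δ_l = h_{l+1} − h_l from a list of per-layer hidden states at a single token position. It is not the author's implementation: the function name `residual_operators`, the list-of-lists representation, and the toy values are assumptions made only for this example.

```python
from typing import List

Vector = List[float]


def residual_operators(hidden_states: List[Vector]) -> List[Vector]:
    """Return delta_l = h_{l+1} - h_l for each consecutive pair of layers.

    hidden_states[l] is assumed to be the residual stream vector after
    layer l for one token position (hypothetical layout for this sketch).
    """
    if len(hidden_states) < 2:
        raise ValueError("need at least two layers of hidden states")
    deltas = []
    for h_l, h_next in zip(hidden_states, hidden_states[1:]):
        if len(h_l) != len(h_next):
            raise ValueError("hidden states must share one dimension")
        # Element-wise difference: what this layer added to the stream.
        deltas.append([b - a for a, b in zip(h_l, h_next)])
    return deltas


# Toy residual stream: 3 layers, width 2 (made-up values, not model outputs).
toy_stream = [[1.0, 0.0], [1.5, -0.5], [1.5, 2.0]]
toy_deltas = residual_operators(toy_stream)

# The differences telescope: h_0 plus the sum of all deltas gives the last state.
reconstructed = [
    h0 + sum(d[i] for d in toy_deltas) for i, h0 in enumerate(toy_stream[0])
]
```

Because the differences telescope, adding every Δ_l back onto h_0 recovers the final hidden state, which is a quick sanity check that the per-layer contributions were extracted in the right order.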

Files

Name	Size
Topological_interpretability (2).pdf (md5:7b438c3022cc46ed3c1ba59f08c3ad80)	335.5 kB

	All versions	This version
Views	78	78
Downloads	58	58
Data volume	32.2 MB	32.2 MB

Resource type: Preprint
Publisher: Zenodo
Languages: English

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: March 20, 2026
Modified: March 20, 2026
