Published March 13, 2026 | Version v2
Open Access

# We Can Predict Which Layer Will Matter Most for Changing a Model's Next-Token Answer Before Running Any Intervention Sweep

Authors/Creators

Description

Continuous Representations, Discrete Commitment: A Causal Threshold in Decoder-Only LLMs

Correlational and interventional analyses of LLM internals appear to disagree: probes show gradual representational change across depth, while activation patching reveals sharp behavioral transitions. We resolve this by showing the two methods measure different properties.

We perform layerwise residual-stream swaps with paired controls across three decoder-only architectures (GPT-2 Small, Gemma-2-2B, Qwen2.5-1.5B) and find a replicated causal commitment transition at 62–71% of network depth. Below this threshold, swaps produce negligible behavioral change; at or above it, outputs flip immediately, with large margin transfer. The transition is specific to the main intervention (it is not matched by random-norm, self, or position-shuffle controls) and is stable across patch scales and random seeds in the two mid-size models.
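For concreteness, a swap of this kind can be run with standard activation-patching tooling. The sketch below uses TransformerLens on GPT-2 Small; the prompts, the last-token-only patch, and the flip/margin bookkeeping are illustrative assumptions rather than the paper's exact protocol.

```python
# Minimal sketch of a layerwise residual-stream swap (assumed setup,
# not the paper's exact code): splice the source run's residual
# stream into the target run, one layer at a time, and record
# whether the next-token answer flips and by what logit margin.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")   # GPT-2 Small
src_prompt = "The Eiffel Tower is in the city of"   # source (donor) run
tgt_prompt = "The Colosseum is in the city of"      # target (recipient) run

# Cache the source run once; its residual stream is reused at every layer.
src_logits, src_cache = model.run_with_cache(src_prompt)
src_top = src_logits[0, -1].argmax().item()          # source answer token
clean_top = model(tgt_prompt)[0, -1].argmax().item() # target's unpatched answer

def swap_resid(resid, hook):
    # Overwrite the target's residual stream at this layer with the
    # source's, at the final token position only (an assumption).
    resid[:, -1, :] = src_cache[hook.name][:, -1, :]
    return resid

with torch.no_grad():
    for layer in range(model.cfg.n_layers):
        name = f"blocks.{layer}.hook_resid_post"
        patched = model.run_with_hooks(
            tgt_prompt, fwd_hooks=[(name, swap_resid)]
        )[0, -1]
        flipped = patched.argmax().item() == src_top
        # "Margin transfer": how far the source answer's logit now
        # leads the target's original answer after the swap.
        margin = (patched[src_top] - patched[clean_top]).item()
        print(f"layer {layer:2d} ({layer / model.cfg.n_layers:4.0%} depth): "
              f"flipped={flipped}, margin={margin:+.2f}")
```

Under the reported result, flips would be expected to onset abruptly around 62–71% of depth rather than accumulate gradually across layers.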

Representations evolve continuously. Causal commitment does not. The two findings are compatible once the distinction between representational change and output determination is made explicit.
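The specificity claim rests on the paired controls. As one example, a random-norm control of the kind named above replaces the donor activation with noise of equal magnitude, separating the effect of the source's specific direction from the effect of injecting any vector of that norm. A hedged sketch follows; the helper name is hypothetical.

```python
import torch

def random_norm_control(src_resid: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: return Gaussian noise rescaled to the L2
    # norm of the source residual slice (shape: batch x d_model),
    # for use in place of the real source slice inside the swap hook.
    noise = torch.randn_like(src_resid)
    noise = noise / noise.norm(dim=-1, keepdim=True)
    return noise * src_resid.norm(dim=-1, keepdim=True)
```

By the reported finding, this control produces negligible behavioral change even at threshold layers, whereas the true swap flips the output there.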

Code and evaluation notebooks are available in the companion repository.

Keywords: mechanistic interpretability, activation patching, causal intervention, commitment threshold, decoder-only transformers

Files (1.7 MB)

before_the_lock.pdf (1.7 MB, md5:2558b2fa7f6269e0e63c1c2b07d2b121)