Published May 7, 2026 | Version v1
Preprint Open

QKV Decomposition for Transformer XAI: Diagnosis, Routing Correction, and Path Flexibility

Authors/Creators

Description

We present a method for diagnosing transformer prediction errors and surgically correcting them through Q/K/V weight analysis. By decomposing attention head weights into query functions (what to search for), key responses (what to advertise), and value channels (what to transmit), we can identify failure causes without running any input through the model. Applied to GPT-2, we trace how factual knowledge (e.g., France→Paris) emerges at layer 10, head 8 (+25.8 logit contribution) and is subsequently reversed at layer 12, head 0 (+149.9 for “the”). Targeted retraining of only the diagnosed layer recovers knowledge accuracy from 2/8 to 8/8 capitals with zero side effects on general capabilities (11/15 maintained, PPL 42.7→42.6). Crucially, the model already possesses this knowledge internally; the failure is one of routing, not absence. We further show that routing correction can be achieved through attention V, FFN, or even a V-only slice of Wv (590k parameters), suggesting that knowledge routing is not confined to FFN layers, contrary to the common interpretation of Geva et al. [3]. Our approach is validated against Captum (top-1 neuron agreement) and TransformerLens (logit lens consistency). Method specifics are proprietary; analysis scripts and model weights are released for reproduction.
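
For readers unfamiliar with weight-only Q/K/V analysis, the following is a minimal illustrative sketch (not the paper's proprietary method) of the kind of decomposition the description refers to: slicing one head's W_Q, W_K, W_V out of GPT-2's fused c_attn weights and reading that head's value channel through the unembedding, logit-lens style, with no forward pass over data. The layer/head indices and the " France" probe token are placeholders chosen to mirror the France→Paris example above.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tok = GPT2Tokenizer.from_pretrained("gpt2")
n_head, d_model = model.config.n_head, model.config.n_embd   # 12, 768
d_head = d_model // n_head                                    # 64

def head_qkv(layer, head):
    # Slice one head's W_Q, W_K, W_V out of the fused c_attn weight (Conv1D layout, shape (768, 2304)).
    W_q, W_k, W_v = model.transformer.h[layer].attn.c_attn.weight.split(d_model, dim=1)
    cols = slice(head * d_head, (head + 1) * d_head)
    return W_q[:, cols], W_k[:, cols], W_v[:, cols]           # each (768, 64)

@torch.no_grad()
def value_channel_tokens(layer, head, probe=" France", topk=5):
    # Weight-only read-out of the head's value channel: push a probe embedding
    # through the OV circuit (W_V @ W_O) and project onto the (tied) unembedding.
    _, _, W_v = head_qkv(layer, head)
    rows = slice(head * d_head, (head + 1) * d_head)
    W_o = model.transformer.h[layer].attn.c_proj.weight[rows, :]          # (64, 768)
    direction = model.transformer.wte.weight[tok.encode(probe)[0]]        # (768,)
    logits = direction @ (W_v @ W_o) @ model.transformer.wte.weight.T     # (50257,)
    return [tok.decode([i]) for i in logits.topk(topk).indices.tolist()]

print(value_channel_tokens(layer=10, head=8))   # tokens this head's value channel promotes for the probe

This simplified read-out ignores layer norms and the attention pattern itself; it only characterizes what the head would write to the residual stream if it attended to the probe token.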
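Likewise, a hedged sketch of the "V-only" correction idea: freeze the entire model except the diagnosed layer's W_V block of c_attn, plausibly the full 768 × 768 matrix (≈ 590k parameters, consistent with the figure above), by masking gradients outside that slice. The layer index, learning rate, and training prompt below are placeholders; the actual retraining recipe belongs to the proprietary method.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tok = GPT2Tokenizer.from_pretrained("gpt2")
d_model = model.config.n_embd
LAYER = 11                                   # placeholder: 0-based index of the diagnosed layer
v_cols = slice(2 * d_model, 3 * d_model)     # V block of the fused c_attn weight (Q | K | V order)

for p in model.parameters():                 # freeze everything ...
    p.requires_grad_(False)
c_attn = model.transformer.h[LAYER].attn.c_attn
c_attn.weight.requires_grad_(True)           # ... except the diagnosed layer's c_attn weight

optimizer = torch.optim.Adam([c_attn.weight], lr=1e-4)
grad_mask = torch.zeros_like(c_attn.weight)
grad_mask[:, v_cols] = 1.0                   # restrict updates to the W_V columns only

def train_step(prompt):
    ids = tok(prompt, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss
    loss.backward()
    c_attn.weight.grad.mul_(grad_mask)       # zero gradients outside the W_V slice
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

print(train_step("The capital of France is Paris"))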

Files

river_9 (1).pdf (287.7 kB)
md5:1e997e0941b60f84f5e3bffd9e693fb8

Additional details