Alignment Robustness Depends More on Training than Architecture: A Cross-Vendor Analysis of Attention Specialization in Large Language Models
Description
We present a systematic empirical study examining how preference optimization methods (RLHF, DPO) affect attention head specialization across eight vendor families and more than 25 large language model variants. Using a standardized evaluation protocol (bfloat16 precision, three-seed cross-validation, and SHA-256–verified prompts), we quantify attention head diversity via the Specialization Index (SI) and compare base and instruction-tuned model pairs.
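The SHA-256 prompt verification step can be sketched as follows. This is a minimal illustration, not the release's actual tooling: the helper names and the pairing of prompts with recorded digests are assumptions about how such a check is typically done.

```python
import hashlib


def sha256_of_prompt(prompt: str) -> str:
    """Return the hex SHA-256 digest of a UTF-8 encoded prompt string."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def verify_prompts(prompts, expected_digests):
    """Compare each prompt against its recorded digest.

    Returns the indices of prompts whose digest does not match,
    so an empty list means the prompt set is verified.
    """
    return [
        i
        for i, (prompt, digest) in enumerate(zip(prompts, expected_digests))
        if sha256_of_prompt(prompt) != digest
    ]
```

Pinning prompts to their digests this way lets a re-run fail fast if any evaluation input silently drifts between experiments.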
Main finding: Robustness to alignment-induced specialization loss is strongly associated with training methodology, following a consistent hierarchy: Training Methodology > Sliding Window Attention > Architecture > Scale.
Key results:
- SI reduction pattern: RLHF and DPO reduce SI in most model families lacking architectural protection (LLaMA-3.1: −56.3%; LLaMA-2: −7.95%), whereas models equipped with Sliding Window Attention maintain or increase specialization (Mistral: +4.2%).
- Architecture-dependent sensitivity: At matched scale, Grouped Query Attention exhibits approximately 5,800× higher sensitivity to random attention noise than Multi-Head Attention (ratio-of-means across three seeds; permutation test, p < 0.05).
- Training-based robustness: Synthetic training (Phi family) yields scale-invariant specialization (SI ≈ 0.33 across a 10.8× parameter range), and Qwen2 shows no observed recursive degradation within the tested 50-generation window.
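The ratio-of-means comparison with a permutation test, as used for the sensitivity result above, can be sketched as follows. This is illustrative only: the function names, the one-sided test direction, and the permutation count are assumptions, not the release's implementation.

```python
import random


def ratio_of_means(a, b):
    """Ratio of group means, e.g. GQA vs. MHA noise-sensitivity scores."""
    return (sum(a) / len(a)) / (sum(b) / len(b))


def permutation_test(a, b, n_perm=10000, seed=0):
    """One-sided permutation test on the ratio-of-means statistic.

    Repeatedly shuffles the pooled observations into two groups of the
    original sizes and counts how often the shuffled statistic is at
    least as large as the observed one. Returns an approximate p-value
    with the standard +1 correction to avoid reporting exactly zero.
    """
    rng = random.Random(seed)
    observed = ratio_of_means(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ratio_of_means(pooled[: len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

With only three seeds per group the number of distinct label splits is small, so the attainable p-values are coarse; this is why such results are typically reported only at the p < 0.05 level.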
This release includes 19 documented Jupyter notebooks that support the full experimental pipeline, 27 result JSON files, and command-line tools that enable end-to-end reproducibility.
The paper text is released under CC-BY-4.0; accompanying code and tooling are released under the MIT License.
Files
- github_release_v4.0.zip (1.1 MB, md5:59c4307fb28bae9bb83567fc0c512356)
Additional details
Related works
- Is supplement to:
  - Dataset: 10.5281/zenodo.18110161 (DOI)
  - Preprint: 10.5281/zenodo.18142454 (DOI)
  - Preprint: 10.5281/zenodo.18165365 (DOI)
Dates
- Submitted: 2026-01-20
Software
- Repository URL: https://github.com/buk81/uniformity-asymmetry
- Programming language: Python
- Development Status: Active