
Published January 20, 2026 | Version 4.0
Preprint | Open Access

Alignment Robustness Depends More on Training than Architecture: A Cross-Vendor Analysis of Attention Specialization in Large Language Models

Authors/Creators

  • IU International University of Applied Sciences

Description

We present a systematic empirical study examining how preference optimization methods (RLHF, DPO) affect attention head specialization across eight vendor families and more than 25 large language model variants. Using a standardized evaluation protocol (bfloat16 precision, three-seed cross-validation, and SHA-256–verified prompts), we quantify attention head diversity via the Specialization Index (SI) and compare base and instruction-tuned model pairs.
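
The exact Specialization Index formula is defined in the paper and the accompanying notebooks; the sketch below is only an illustrative stand-in that treats specialization as the dispersion of per-head attention entropies. This is an assumption for illustration, not the authors' definition.

```python
# Illustrative stand-in for a Specialization Index (SI): the paper's actual
# formula lives in the accompanying notebooks. Here, "specialization" is
# approximated as how differently the heads' attention entropies behave.
import numpy as np

def attention_entropy(attn):
    """Mean entropy of one head's attention rows; attn has shape (queries, keys)."""
    eps = 1e-12
    return float(-(attn * np.log(attn + eps)).sum(axis=-1).mean())

def specialization_index(head_attentions):
    """Toy SI: coefficient of variation of per-head entropies (higher = more diverse heads)."""
    entropies = np.array([attention_entropy(a) for a in head_attentions])
    return float(entropies.std() / (entropies.mean() + 1e-12))

# Example: 8 heads, random attention over 16 keys for 32 queries
rng = np.random.default_rng(0)
heads = [rng.dirichlet(np.ones(16), size=32) for _ in range(8)]
print(f"toy SI = {specialization_index(heads):.3f}")
```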

Main finding: Robustness to alignment-induced specialization loss is strongly associated with training methodology, following a consistent hierarchy: Training Methodology > Sliding Window Attention > Architecture > Scale.

Key results:

  • SI reduction pattern: RLHF and DPO reduce SI in most model families lacking architectural protection (LLaMA-3.1: −56.3%; LLaMA-2: −7.95%), whereas models equipped with Sliding Window Attention maintain or increase specialization (Mistral: +4.2%).

  • Architecture-dependent sensitivity: At matched scale, Grouped Query Attention exhibits approximately 5,800× higher sensitivity to random attention noise than Multi-Head Attention (ratio-of-means across three seeds; permutation test, p < 0.05; a minimal sketch of such a test appears after this list).

  • Training-based robustness: Synthetic training (Phi family) yields scale-invariant specialization (SI ≈ 0.33 across a 10.8× parameter range), and Qwen2 shows no observed recursive degradation within the tested 50-generation window.
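
The following is a hedged sketch of a ratio-of-means permutation test on per-seed sensitivity scores (three seeds per group, as in the protocol above). The variable names and numbers are placeholders for illustration, not the paper's data or its exact test implementation.

```python
# Sketch of a one-sided ratio-of-means permutation test over per-seed scores.
# Placeholder inputs only; see the released notebooks for the actual analysis.
import numpy as np

def ratio_of_means_permutation_test(a, b, n_perm=10_000, seed=0):
    """Test mean(a)/mean(b) against random relabelings of the pooled scores."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    observed = a.mean() / b.mean()
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = pooled[: len(a)].mean() / pooled[len(a):].mean()
        if permuted >= observed:
            hits += 1
    # With only three seeds per group the permutation distribution is coarse,
    # so attainable p-values are limited to a small set of values.
    return observed, (hits + 1) / (n_perm + 1)

# Placeholder per-seed noise-sensitivity scores (illustrative only)
gqa_scores = [5.9e3, 5.7e3, 5.8e3]
mha_scores = [1.0, 1.1, 0.9]
ratio, p_value = ratio_of_means_permutation_test(gqa_scores, mha_scores)
print(f"ratio of means ~ {ratio:.0f}x, permutation p ~ {p_value:.3f}")
```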

This release includes 19 documented Jupyter notebooks that support the full experimental pipeline, 27 result JSON files, and command-line tools that enable end-to-end reproducibility.
The paper text is released under CC-BY-4.0; accompanying code and tooling are released under the MIT License.

Files

github_release_v4.0.zip (1.1 MB)
md5:59c4307fb28bae9bb83567fc0c512356
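
The archive can be checked against the MD5 digest listed above after download; the sketch below assumes the file sits in the current working directory under the name shown.

```python
# Verify the downloaded release archive against the MD5 digest listed above.
# Assumes github_release_v4.0.zip is in the current working directory.
import hashlib

def md5sum(path, chunk_size=1 << 20):
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

assert md5sum("github_release_v4.0.zip") == "59c4307fb28bae9bb83567fc0c512356"
```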

Additional details

Related works

Is supplement to
Dataset: 10.5281/zenodo.18110161 (DOI)
Preprint: 10.5281/zenodo.18142454 (DOI)
Preprint: 10.5281/zenodo.18165365 (DOI)

Dates

Submitted
2026-01-20

Software

Repository URL
https://github.com/buk81/uniformity-asymmetry
Programming language
Python
Development Status
Active