Capability Is Not in the Weights: LM-Head Locking, and why projecting one transformer's MLP weights into another cannot recover its capabilities

Al Hajeri, Mohamed

doi:10.5281/zenodo.20446689

Published May 29, 2026 | Version v1

Preprint | Open

Al Hajeri, Mohamed (Researcher)

We attempted to construct a small specialized language model (458M parameters) by mathematically projecting the MLP weights of seven larger, independently trained transformer donors into a fully calibrated host. The procedure spanned 17 iterations exploring every combination of donor selection, basis-alignment methodology, magnitude handling, and blend ratio. Every variant produced output strictly worse than the unmodified host. We report this not as a failure of any single donor or method, but as evidence for a structural barrier: the host language-model (LM) head's joint training with its native MLPs creates a basis-specific reading expectation that any foreign MLP signal violates, regardless of how well the foreign signal is selected, projected, or magnitude-matched. The mathematics of weight-level projection is correct; the underlying premise, that semantic capability can be extracted from a donor's MLP weights and reinserted into a different host, does not survive empirical testing at small target scale. We additionally find that public capability claims made for the donors are not reflected in measurements of those donors' own internal structure, suggesting that the marketing-claim-driven selection methodology common in the field is fundamentally uninformed by the underlying weight geometry.
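For readers unfamiliar with weight-level projection, the following is a minimal sketch of one common form of the pipeline the abstract names (basis alignment, magnitude handling, blend ratio). It is not taken from the paper. It assumes, for simplicity, that donor and host share the same residual width, that the basis alignment is an orthogonal Procrustes fit on paired activations, and that magnitude handling is a single Frobenius-norm rescale. The function names `procrustes_rotation` and `project_and_blend` and the parameter `alpha` are hypothetical.

```python
import numpy as np


def procrustes_rotation(donor_acts, host_acts):
    """Orthogonal R minimizing ||donor_acts @ R - host_acts||_F.

    Both inputs have shape (n_samples, d_model): residual-stream
    activations collected on the same inputs (an assumption of this sketch).
    """
    u, _, vt = np.linalg.svd(donor_acts.T @ host_acts)
    return u @ vt


def project_and_blend(w_host, w_donor, rot, alpha):
    """Express donor MLP output weights in the host basis, rescale, blend.

    w_host and w_donor have shape (d_ff, d_model); each row writes into
    the residual stream. alpha is the blend ratio (0 keeps the host as is).
    """
    # Rotate the donor's writes from the donor basis into the host basis.
    w_proj = w_donor @ rot
    # Magnitude handling: match the host's overall Frobenius norm.
    w_proj = w_proj * (np.linalg.norm(w_host) / np.linalg.norm(w_proj))
    return (1.0 - alpha) * w_host + alpha * w_proj
```

In the idealized case where the donor is an exact rotation of the host, this procedure recovers the host weights for any `alpha`. The abstract's claim concerns the realistic case, where the donors were trained independently and the host LM head was trained only against the host's own MLP outputs.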

Files (1.2 MB)

Name	Size
Capability Is Not in the Weights.pdf (md5:0ee42a8e413403ced642c509e4e4b0fa)	1.2 MB

