Published May 29, 2026 | Version v1

Capability Is Not in the Weights: LM-Head Locking, and why projecting one transformer's MLP weights into another cannot recover its capabilities

Description

We attempted to construct a small specialized language model (458M parameters) by mathematically projecting the MLP weights of seven larger, independently trained transformer donors into a fully calibrated host. The procedure spanned 17 iterations exploring every combination of donor selection, basis-alignment methodology, magnitude handling, and blend ratio. Every variant produced output strictly worse than the unmodified host. We report this not as a failure of any single donor or method, but as evidence for a structural barrier: the host language-model (LM) head's joint training with its native MLPs creates a basis-specific reading expectation that any foreign MLP signal violates, regardless of how well the foreign signal is selected, projected, or magnitude-matched. The mathematics of weight-level projection is correct; the underlying premise, that semantic capability can be extracted from a donor's MLP weights and reinserted into a different host, does not survive empirical testing at small target scale. We additionally find that public capability claims made for the donors are not reflected in measurements of those donors' own internal structure, suggesting that the marketing-claim-driven selection methodology common in the field is fundamentally uninformed by the underlying weight geometry.
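For illustration only, the three knobs named above (basis alignment, magnitude handling, blend ratio) can be sketched for a single weight matrix. This is a minimal sketch under stated assumptions, not the procedure used in the paper: the function name `align_and_blend` is hypothetical, the host and donor matrices are assumed to have the same shape, alignment is assumed to be an orthogonal Procrustes rotation, and magnitude handling is assumed to be Frobenius-norm matching.

```python
import numpy as np


def align_and_blend(host_w, donor_w, alpha=0.1):
    """Rotate donor_w toward host_w's basis, match its norm, then blend.

    Hypothetical helper: assumes host_w and donor_w share the same shape
    (d_model, d_hidden). Real donors of different widths would first need
    a dimensionality-reducing projection, which is not shown here.
    """
    # Basis alignment (assumed): orthogonal Procrustes.
    # Find the orthogonal R minimizing ||R @ donor_w - host_w||_F.
    u, _, vt = np.linalg.svd(host_w @ donor_w.T)
    rotation = u @ vt
    aligned = rotation @ donor_w

    # Magnitude handling (assumed): rescale to the host's Frobenius norm.
    aligned = aligned * (np.linalg.norm(host_w) / np.linalg.norm(aligned))

    # Blend ratio: alpha = 0 keeps the host unchanged, alpha = 1 replaces it.
    return (1.0 - alpha) * host_w + alpha * aligned


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    host = rng.standard_normal((8, 16))
    donor = rng.standard_normal((8, 16))
    blended = align_and_blend(host, donor, alpha=0.25)
    print(blended.shape)
```

Even when such a blend is well conditioned numerically, the result reported above is that the host LM head does not read the foreign signal correctly, so a sketch of this kind shows only the mechanics of projection, not a way to transfer capability.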

Files

Capability Is Not in the Weights.pdf (1.2 MB, md5:0ee42a8e413403ced642c509e4e4b0fa)