Published June 2, 2026 | Version 1.0.0
Preprint Open

A Rate-Distortion Function for Model Merging

  • 1. Jaypee University of Engineering and Technology

Description

Practitioners merge many LoRA fine-tunes into one deployable model, but no principled limit says how well this can be done at a fixed storage budget. This paper casts merging as multi-source lossy source coding under worst-task distortion and establishes what is, to the authors' knowledge, the first rate-distortion theorem for it. For T tasks with rank-r updates, a per-task error radius B, and a storage budget of R bits, the worst-task distortion under a Stiefel-random worst-case distribution has a closed-form floor B²(1 − d_eff/(Tr)), set by an effective dimension d_eff (the expected rank of the summed projectors onto the task subspaces), plus a compression term scaling as Θ(2^(−2R/d_eff)). An explicit Gaussian-QR rotation followed by uniform scalar quantization matches the lower bound up to a constant factor for fixed task count T.
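The effective dimension and the resulting floor can be illustrated numerically. The following is a minimal sketch, not the paper's code: it assumes task subspaces are drawn by taking the QR factorization of a Gaussian matrix, estimates d_eff as the average rank of the summed projectors, and evaluates the floor B²(1 − d_eff/(Tr)). The function names, the ambient dimension `d`, and the trial count are hypothetical choices made for illustration.

```python
import numpy as np

def random_subspace_basis(d, r, rng):
    """Orthonormal d x r basis of a random r-dim subspace (QR of a Gaussian matrix)."""
    q, _ = np.linalg.qr(rng.standard_normal((d, r)))
    return q

def effective_dimension(d, r, T, trials=20, seed=0):
    """Monte Carlo estimate of d_eff: mean rank of the sum of T rank-r projectors.

    Assumption: subspaces are sampled independently; `trials` and `seed`
    are illustrative parameters, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    ranks = []
    for _ in range(trials):
        proj_sum = np.zeros((d, d))
        for _ in range(T):
            u = random_subspace_basis(d, r, rng)
            proj_sum += u @ u.T  # orthogonal projector onto this task's subspace
        ranks.append(np.linalg.matrix_rank(proj_sum))
    return float(np.mean(ranks))

def distortion_floor(B, d_eff, T, r):
    """Closed-form floor from the abstract: B^2 * (1 - d_eff / (T * r))."""
    return B**2 * (1.0 - d_eff / (T * r))

if __name__ == "__main__":
    T, r, B = 4, 4, 1.0
    for d in (32, 8):  # roomy vs. crowded ambient space
        d_eff = effective_dimension(d, r, T)
        print(d, d_eff, distortion_floor(B, d_eff, T, r))
```

When the ambient dimension is at least Tr, generic subspaces are linearly independent, so the estimate equals Tr and the floor vanishes. When the space is too small to hold all T subspaces independently, d_eff saturates at d and the floor becomes strictly positive.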

Across 16 real LoRA adapters (4 base models × 4 tasks), the task subspaces are linearly independent (d_eff = Tr in every layer), so the bound's floor is zero, yet current merging methods leave 0.10–0.22 nats/token of worst-task error. The operative question is therefore algorithmic, not informational. Among standard methods, only TIES beats naive averaging, and only via the bottleneck task; subspace alignment (KnOTS) is statistically indistinguishable from naive averaging. A universal "less-is-more" regime also appears, in which 2-bit task-vector quantization beats higher-bit quantization on all four architectures. All empirical claims carry 95% bootstrap confidence intervals.
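For readers unfamiliar with the baselines named here, naive averaging and low-bit task-vector quantization can be sketched as follows. This is an illustrative sketch only: the abstract does not specify the quantizer used in the experiments, so a symmetric per-vector uniform scalar quantizer is assumed, and the function names are hypothetical.

```python
import numpy as np

def naive_average(task_vectors):
    """Merge by averaging task vectors (rows of a T x n array) elementwise."""
    return np.mean(task_vectors, axis=0)

def uniform_quantize(x, bits):
    """Uniform scalar quantizer with 2**bits levels spanning [-max|x|, max|x|].

    Assumption: symmetric per-vector scaling; the paper's exact scheme
    is not stated in the abstract.
    """
    scale = np.max(np.abs(x))
    if scale == 0:
        return x.copy()
    step = 2 * scale / (2**bits - 1)  # spacing between adjacent levels
    return np.round((x + scale) / step) * step - scale

def merge_quantized(task_vectors, bits):
    """Quantize each task vector independently, then average the results."""
    quantized = np.stack([uniform_quantize(v, bits) for v in task_vectors])
    return naive_average(quantized)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tv = rng.standard_normal((4, 1000))  # 4 synthetic task vectors
    for bits in (2, 4, 8):
        err = np.mean((merge_quantized(tv, bits) - naive_average(tv)) ** 2)
        print(bits, err)
```

On synthetic Gaussian vectors like these, the quantization error simply shrinks as the bit width grows. The "less-is-more" effect reported above concerns downstream task loss on real adapters, which this toy example does not reproduce.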

Files

document_pdf.pdf (591.0 kB)
md5:cf1aa1adcaa5a805a48cbbbbc69bf451