Published February 25, 2026 | Version v1 | Preprint | Open access
Per-Fact Graduated Consolidation Resolves the Capacity Ceiling in Weight-Edited Language Models
Description
Language models that learn from conversation via direct weight editing (MEMIT) face a hard capacity ceiling: the 8B Llama model sustains reliable recall for only ~13 unconstrained edits before cascading interference collapses performance. Prior attempts to offload knowledge into LoRA adapters failed: the alignment tax (37% recall degradation on 8B) blocks the transfer pathway, and per-edit gating produced 0% advancement. We resolve both failures with per-fact graduated consolidation: each fact independently tracks its consolidation stage, a graduated dissolution schedule (1.0 -> 0.5 -> 0.1 -> 0.0) progressively reduces MEMIT influence, and cumulative fusing -- training each cycle on an already-fused model -- overcomes the alignment tax through incremental prior erosion. In a capacity sweep on Llama 3.1 8B (4-bit, 2xH100) with {5, 10, 15, 20} facts across 3 sleep cycles, every condition achieves 100% advancement rate and 1.00 chat recall. MEMIT edits dissolve as designed, making the buffer renewable: effective lifetime capacity becomes unbounded. This is Paper 6 in the Sleeping LLM series, superseding the MEMIT-only architecture of Paper 5.
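The mechanism described above suggests a simple control loop. Below is a minimal Python sketch of one sleep cycle under per-fact graduated consolidation, assuming only the schedule and cumulative-fusing behavior stated in the abstract; every name in it (`Fact`, `DISSOLUTION_SCHEDULE`, `sleep_cycle`, and the `train_lora`/`fuse_lora`/`recall_ok` callables) is a hypothetical stand-in, not the paper's actual implementation.

```python
# Hypothetical sketch of per-fact graduated consolidation (one sleep cycle).
# All names are illustrative assumptions drawn from the abstract, not the
# paper's actual API.

from dataclasses import dataclass

# Graduated dissolution schedule from the abstract: the MEMIT edit's
# residual influence at each consolidation stage (1.0 -> 0.5 -> 0.1 -> 0.0).
DISSOLUTION_SCHEDULE = [1.0, 0.5, 0.1, 0.0]


@dataclass
class Fact:
    prompt: str
    answer: str
    stage: int = 0  # tracked per fact, advanced independently of other facts

    @property
    def memit_weight(self) -> float:
        return DISSOLUTION_SCHEDULE[min(self.stage, len(DISSOLUTION_SCHEDULE) - 1)]

    @property
    def dissolved(self) -> bool:
        return self.memit_weight == 0.0


def sleep_cycle(model, facts, train_lora, fuse_lora, recall_ok):
    """Run one sleep cycle: consolidate buffered facts into a LoRA adapter,
    fuse it cumulatively, then advance each fact's stage only if recall
    survives the reduced MEMIT influence."""
    # Cumulative fusing: train on the ALREADY-fused model from the previous
    # cycle, so the aligned prior is eroded incrementally instead of being
    # overcome in a single step (the abstract's answer to the alignment tax).
    adapter = train_lora(model, facts)
    model = fuse_lora(model, adapter)

    # Per-fact graduated advancement: each fact tentatively steps one stage
    # down the dissolution schedule and keeps the new stage only if chat
    # recall still succeeds at the lower MEMIT weight.
    for fact in facts:
        trial_stage = fact.stage + 1
        trial_weight = DISSOLUTION_SCHEDULE[min(trial_stage, len(DISSOLUTION_SCHEDULE) - 1)]
        if recall_ok(model, fact, memit_weight=trial_weight):
            fact.stage = trial_stage

    # Fully dissolved facts (MEMIT weight 0.0) vacate their buffer slots,
    # which is what makes the edit buffer renewable.
    facts[:] = [f for f in facts if not f.dissolved]
    return model, facts
```

Passing the training, fusing, and recall checks in as callables keeps the sketch free of any assumed library API; only the per-fact staging and the dissolution schedule come from the abstract itself.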
Files (86.5 kB)

| Name | Size |
|---|---|
| 6-Per-Fact-Graduated-Consolidation.pdf (md5:fc42f8214f0f1ae7f48568ee765bcc92) | 86.5 kB |
Additional details
Related works
- Continues:
  - Preprint: 10.5281/zenodo.18778768 (DOI)
  - Preprint: 10.5281/zenodo.18778760 (DOI)
  - Preprint: 10.5281/zenodo.18778762 (DOI)
  - Preprint: 10.5281/zenodo.18778764 (DOI)
  - Preprint: 10.5281/zenodo.18778766 (DOI)