Published February 25, 2026 | Version v1
Preprint | Open Access

Per-Fact Graduated Consolidation Resolves the Capacity Ceiling in Weight-Edited Language Models

Authors/Creators

  • 1. Independent

Description

Language models that learn from conversation via direct weight editing (MEMIT) face a hard capacity ceiling: the 8B Llama model sustains reliable recall for only ~13 unconstrained edits before cascading interference collapses performance. Prior attempts to offload knowledge into LoRA adapters failed: the alignment tax (37% recall degradation on the 8B model) blocked the transfer pathway, and per-edit gating produced a 0% advancement rate. We resolve both failures with per-fact graduated consolidation: each fact independently tracks its consolidation stage, a graduated dissolution schedule (1.0 -> 0.5 -> 0.1 -> 0.0) progressively reduces MEMIT influence, and cumulative fusing -- training each cycle on an already-fused model -- overcomes the alignment tax through incremental prior erosion. In a capacity sweep on Llama 3.1 8B (4-bit, 2xH100) with {5, 10, 15, 20} facts across 3 sleep cycles, every condition achieves a 100% advancement rate and 1.00 chat recall. MEMIT edits dissolve as designed, making the buffer renewable: effective lifetime capacity becomes unbounded. This is Paper 6 in the Sleeping LLM series, superseding the MEMIT-only architecture of Paper 5.
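
The abstract describes the mechanism at the level of per-fact state: each fact carries its own consolidation stage, and the dissolution schedule (1.0 -> 0.5 -> 0.1 -> 0.0) determines how much of that fact's MEMIT edit remains applied. The Python sketch below is a minimal illustration of that per-fact bookkeeping only, written against assumptions; the names (Fact, DISSOLUTION_SCHEDULE, sleep_cycle) are hypothetical and are not taken from the paper's code.

    # Minimal sketch of per-fact graduated consolidation bookkeeping.
    # All names and structures here are hypothetical illustrations of the
    # schedule described in the abstract, not the paper's implementation.
    from dataclasses import dataclass, field

    # Graduated dissolution schedule: fraction of MEMIT influence retained
    # at each consolidation stage.
    DISSOLUTION_SCHEDULE = [1.0, 0.5, 0.1, 0.0]

    @dataclass
    class Fact:
        text: str
        stage: int = 0  # index into DISSOLUTION_SCHEDULE, advanced per fact
        memit_delta: dict = field(default_factory=dict)  # layer -> weight delta

        @property
        def memit_scale(self) -> float:
            """Fraction of this fact's original MEMIT edit still applied."""
            return DISSOLUTION_SCHEDULE[min(self.stage, len(DISSOLUTION_SCHEDULE) - 1)]

    def sleep_cycle(facts: list, recall_ok: dict) -> None:
        """One consolidation step: facts that pass recall advance independently;
        facts that fail stay at their current stage (no global, per-edit gate)."""
        for fact in facts:
            if recall_ok.get(fact.text, False):
                fact.stage = min(fact.stage + 1, len(DISSOLUTION_SCHEDULE) - 1)

    if __name__ == "__main__":
        facts = [Fact("capital of X is Y"), Fact("user prefers Z")]
        sleep_cycle(facts, {"capital of X is Y": True, "user prefers Z": False})
        for f in facts:
            print(f.text, "stage:", f.stage, "MEMIT scale:", f.memit_scale)

Cumulative fusing, as the abstract characterizes it, would sit outside this loop: each cycle's LoRA training starts from a model into which the previous cycle's adapter has already been fused, which is how the alignment tax is described as being eroded incrementally rather than paid all at once.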

Notes

Paper 6 in the Sleeping LLM research series on sleep-wake memory consolidation for lifelong learning in language models. Supersedes Paper 5 (MEMIT-only) by reintroducing LoRA with per-fact graduated consolidation.

Files

6-Per-Fact-Graduated-Consolidation.pdf (86.5 kB)
md5:fc42f8214f0f1ae7f48568ee765bcc92

Additional details

Related works

Continues
Preprint: 10.5281/zenodo.18778768 (DOI)
Preprint: 10.5281/zenodo.18778760 (DOI)
Preprint: 10.5281/zenodo.18778762 (DOI)
Preprint: 10.5281/zenodo.18778764 (DOI)
Preprint: 10.5281/zenodo.18778766 (DOI)