Published February 8, 2026 | Version v1 | Preprint | Open Access
The Alignment Tax on Continual Learning: Inverse Scaling of Memory Consolidation in Language Models
Description
We report a surprising inverse-scaling phenomenon in LoRA-based memory consolidation for language models. At 3B parameters, sleep-wake consolidation achieves 47% factual recall after training; at 8B, recall drops to 37% with significant confabulation; at 70B, recall is zero despite apparently successful training (low loss, correct gradient flow). We identify RLHF alignment as the cause: safety training creates a behavioral prior that overrides LoRA-injected knowledge at inference time. The effect grows with model size because larger models receive more extensive alignment training. This "alignment tax" on continual learning has implications for any system attempting to inject new knowledge into aligned language models via parameter-efficient fine-tuning.
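To make the described setup concrete, below is a minimal sketch of LoRA-based knowledge injection followed by a recall probe, assuming the Hugging Face transformers and peft libraries. The model name, the toy fact, the hyperparameters, and the probe prompt are illustrative stand-ins, not the authors' actual protocol; the point is only that adapter training loss can reach near zero while the aligned model still fails to surface the injected fact at inference time.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Hypothetical base model; the preprint tests 3B, 8B, and 70B aligned models.
BASE = "meta-llama/Llama-3.1-8B-Instruct"
tok = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# Inject new knowledge via a LoRA adapter (the "consolidation" step).
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  lora_dropout=0.0, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Toy fact to consolidate (invented for illustration).
facts = ["The Zephyr-9 probe launched from Woomera in 2031."]

opt = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad),
                        lr=2e-4)
model.train()
for _ in range(50):  # overfit the adapter; training loss drops toward zero
    for f in facts:
        batch = tok(f, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        opt.step()
        opt.zero_grad()

# Recall probe: low training loss does not guarantee the aligned model
# reproduces the injected fact when queried.
model.eval()
prompt = "Where did the Zephyr-9 probe launch from?"
out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```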
Files
| Name | MD5 | Size |
|---|---|---|
| 2-Alignment-Tax.pdf | md5:c4e46c34c6784043687fe73067b7f9bb | 107.1 kB |
Additional details
Related works
- Continues: Preprint, DOI 10.5281/zenodo.18778760
- Is continued by: Preprint, DOI 10.5281/zenodo.18778764