Published February 8, 2026 | Version v1
Preprint · Open Access

The Alignment Tax on Continual Learning: Inverse Scaling of Memory Consolidation in Language Models

Authors/Creators

  • Independent

Description

We report a surprising inverse scaling phenomenon in LoRA-based memory consolidation for language models. At 3B parameters, sleep-wake consolidation achieves 47% factual recall after training. At 8B, recall drops to 37% with significant confabulation. At 70B, recall is zero despite successful training (low loss, correct gradient flow). We identify RLHF alignment as the cause: safety training creates a behavioral prior that overrides LoRA-injected knowledge at inference time. The effect scales with model size because larger models receive more extensive alignment training. This 'alignment tax' on continual learning has implications for any system attempting to inject new knowledge into aligned language models via parameter-efficient fine-tuning.

Notes

Part of the Sleeping LLM research series on sleep-wake memory consolidation for lifelong learning in language models.

Files

2-Alignment-Tax.pdf (107.1 kB)
md5:c4e46c34c6784043687fe73067b7f9bb

Additional details

Related works

Continues: Preprint 10.5281/zenodo.18778760 (DOI)
Is continued by: Preprint 10.5281/zenodo.18778764 (DOI)