Published February 8, 2026 | Version v1 | Preprint | Open Access
The Alignment Tax on Continual Learning: Inverse Scaling of Memory Consolidation in Language Models
Description
We report a surprising inverse-scaling phenomenon in LoRA-based memory consolidation for language models. At 3B parameters, sleep-wake consolidation achieves 47% factual recall after training; at 8B, recall drops to 37% with significant confabulation; at 70B, recall is zero despite apparently successful training (low loss, correct gradient flow). We identify RLHF alignment as the cause: safety training creates a behavioral prior that overrides LoRA-injected knowledge at inference time. The effect grows with model size because larger models receive more extensive alignment training. This "alignment tax" on continual learning has implications for any system attempting to inject new knowledge into aligned language models via parameter-efficient fine-tuning.
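To make the described setup concrete, below is a minimal sketch of LoRA-based knowledge injection followed by a recall probe, assuming the Hugging Face transformers and peft libraries. The model name, the toy fact, the hyperparameters, and the probe prompt are illustrative stand-ins, not the authors' actual protocol; the point is only that adapter training loss can reach near zero while the aligned model still fails to surface the injected fact at inference time.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Hypothetical base model; the preprint tests 3B, 8B, and 70B aligned models.
BASE = "meta-llama/Llama-3.1-8B-Instruct"
tok = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# Inject new knowledge via a LoRA adapter (the "consolidation" step).
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  lora_dropout=0.0, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Toy fact to consolidate (invented for illustration).
facts = ["The Zephyr-9 probe launched from Woomera in 2031."]

opt = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad),
                        lr=2e-4)
model.train()
for _ in range(50):  # overfit the adapter; training loss drops toward zero
    for f in facts:
        batch = tok(f, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        opt.step()
        opt.zero_grad()

# Recall probe: low training loss does not guarantee the aligned model
# reproduces the injected fact when queried.
model.eval()
prompt = "Where did the Zephyr-9 probe launch from?"
out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```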
Files
| Name | MD5 | Size |
|---|---|---|
| 2-Alignment-Tax.pdf | md5:c4e46c34c6784043687fe73067b7f9bb | 107.1 kB |
Additional details
Related works
- Continues: Preprint, DOI 10.5281/zenodo.18778760
- Is continued by: Preprint, DOI 10.5281/zenodo.18778764