Published March 5, 2026
Version v2
Journal article
Open
Behavioral Emergence Is a Data Quality Threshold, Not a Scale Threshold: Contrastive Injection Breaks the Small-Model Bottleneck
Description
We demonstrate that behavioral emergence in small language models (7M–12M parameters) is bottlenecked by data quality, not model scale. At these scales, vanilla training produces near-zero behavioral signal. Injecting as little as 5% behavioral contrast pairs into the pretraining data stream induces behavioral discrimination: bias ρ rises from 0.009 to 0.433, and sycophancy ρ rises from 0.000 to 0.513.
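The description does not say how the contrast pairs are mixed into the pretraining stream. The following is a minimal sketch under one assumption: each slot of the training stream is independently a contrast pair with probability `rate` (5% here), and otherwise the next pretraining document. The function name `inject_contrast_pairs`, the `(preferred, dispreferred)` pair representation, and the `Preferred:`/`Dispreferred:` text template are all hypothetical illustrations, not the paper's method.

```python
import random
from typing import Iterable, Iterator, Sequence, Tuple

# Hypothetical representation: a contrast pair is (preferred_text, dispreferred_text).
ContrastPair = Tuple[str, str]


def format_pair(pair: ContrastPair) -> str:
    """Render a contrast pair as one training example (hypothetical template)."""
    preferred, dispreferred = pair
    return f"Preferred: {preferred}\nDispreferred: {dispreferred}"


def inject_contrast_pairs(
    pretrain_docs: Iterable[str],
    contrast_pairs: Sequence[ContrastPair],
    rate: float = 0.05,
    seed: int = 0,
) -> Iterator[str]:
    """Yield a mixed stream in which each slot is a contrast pair with
    probability `rate`, otherwise the next pretraining document.

    Every pretraining document is kept, in its original order; contrast
    pairs are sampled with replacement. This is an illustrative sketch.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    rng = random.Random(seed)
    docs = iter(pretrain_docs)
    while True:
        if contrast_pairs and rng.random() < rate:
            # Assumed injection step: emit a randomly chosen contrast pair.
            yield format_pair(rng.choice(contrast_pairs))
        else:
            try:
                yield next(docs)
            except StopIteration:
                return  # Stop once the pretraining documents are exhausted.
```

Sampling per output slot, rather than overwriting documents, keeps the full pretraining corpus in the stream while holding the expected fraction of contrast pairs at `rate`.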
Files

| Name | Size | MD5 checksum |
|---|---|---|
| contrastive_pretraining.pdf | 194.2 kB | c8e17ff10b9c66b6d7b39fca6743c0d7 |
Additional details
Related works
- Is supplemented by:
  - 10.5281/zenodo.18854944 (DOI)
  - 10.5281/zenodo.18865862 (DOI)
  - 10.5281/zenodo.18865199 (DOI)