There is a newer version of the record available.

Published March 5, 2026 | Version v2
Journal article Open

Behavioral Emergence Is a Data Quality Threshold, Not a Scale Threshold: Contrastive Injection Breaks the Small-Model Bottleneck

Authors/Creators

Description

We demonstrate that behavioral emergence in small language models (7M to 12M parameters) is bottlenecked by data quality, not model scale. Injecting as little as 5% behavioral contrast pairs into the pretraining data stream induces behavioral discrimination at scales where vanilla training produces near-zero behavioral signal: the bias rho rises from 0.009 to 0.433, and the sycophancy rho rises from 0.000 to 0.513.
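As a rough illustration of what injecting a fixed fraction of contrast pairs into a data stream could look like, the following Python sketch interleaves contrast documents into a base stream so that about 5% of the emitted documents are injected. It is not the authors' pipeline: the function name `inject_contrast_pairs`, the pair format (a preferred and a dispreferred text joined by a newline), and the choice to measure the 5% per document rather than per token are all assumptions made for this example.

```python
import random
from typing import Iterable, Iterator, Sequence, Tuple


def inject_contrast_pairs(
    base_stream: Iterable[str],
    contrast_pairs: Sequence[Tuple[str, str]],
    injection_rate: float = 0.05,
    seed: int = 0,
) -> Iterator[str]:
    """Yield base documents with contrast pairs mixed in (hypothetical helper).

    Each emitted document is a contrast document with probability
    `injection_rate`, so the expected injected fraction of the output
    equals `injection_rate` (counted per document, an assumption).
    """
    if not 0.0 <= injection_rate < 1.0:
        raise ValueError("injection_rate must be in [0, 1)")
    if injection_rate > 0.0 and not contrast_pairs:
        raise ValueError("contrast_pairs must be non-empty when injecting")

    rng = random.Random(seed)  # seeded for reproducible mixing
    for doc in base_stream:
        # Emit zero or more contrast documents before the next base document.
        while rng.random() < injection_rate:
            preferred, dispreferred = rng.choice(contrast_pairs)
            # Assumed serialization: both sides of the pair in one document.
            yield f"{preferred}\n{dispreferred}"
        yield doc
```

Base documents keep their original order and none are dropped; only the injected documents are sampled. A per-token budget, or a different serialization of each pair, would need a different mixing rule.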

Files

contrastive_pretraining.pdf (194.2 kB)
md5:c8e17ff10b9c66b6d7b39fca6743c0d7
