Published March 7, 2026 | Version v11
Preprint | Open Access

Contrastive Pretraining Teaches Format Generation, Not Behavioral Knowledge

Authors/Creators

Description

A 7M-parameter language model trained on OpenWebText scores rho = 0 on bias and sycophancy, behaviors that only emerge at 18M-34M parameters under vanilla training. Injecting contrastive behavioral pairs into just 5% of training blocks breaks this wall: bias rho reaches 0.431 and sycophancy rho reaches 0.513, exceeding the vanilla 34M model's sycophancy score with 5x fewer parameters. The dose-response is non-monotonic (5% is optimal; 10% triples factual regression). The effect replicates at 12M, 34M, and 64M parameters. Logit-level analysis reveals that every vanilla model from 3M to 64M already achieves exactly 41.0% accuracy when constrained to choose among the answer tokens, indicating that contrastive injection primarily teaches format generation, not behavioral knowledge. Cross-dimensional transfer is asymmetric: sycophancy-only injection lifts bias at d >= 96, but bias-only injection does not lift sycophancy. A deconcentration score separates productive from null injection with a single SVD.
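
The injection procedure is a data-mixing choice at block construction time. As a minimal sketch of what "5% of training blocks" could mean in code, assuming a block-level sampler with hypothetical iterators webtext_blocks and pair_blocks (neither name comes from the paper):

    import random

    def next_training_block(webtext_blocks, pair_blocks, inject_rate=0.05):
        # With probability inject_rate, emit a contrastive behavioral
        # pair block; otherwise emit an ordinary OpenWebText block.
        if random.random() < inject_rate:
            return next(pair_blocks)
        return next(webtext_blocks)

The constant 41.0% figure comes from scoring models with output restricted to the answer tokens rather than free generation. The paper's exact harness is not reproduced here; the sketch below shows the standard way to compute such a constrained accuracy, assuming per-example final-position logits and a fixed candidate set (all names are illustrative):

    import numpy as np

    def constrained_accuracy(logits, answer_ids, gold_ids):
        # logits:     (n_examples, vocab_size) final-position logits
        # answer_ids: vocab ids of the candidate answer tokens
        # gold_ids:   (n_examples,) vocab id of the correct answer
        restricted = logits[:, answer_ids]              # score candidates only
        picks = np.asarray(answer_ids)[restricted.argmax(axis=1)]
        return float((picks == np.asarray(gold_ids)).mean())

The deconcentration score is described only as a single-SVD statistic, so its formula is an assumption here. One plausible instantiation measures how evenly spectral energy is spread across directions:

    import numpy as np

    def deconcentration_score(M):
        # Hypothetical definition: 1 minus the top singular value's share
        # of total spectral energy in a representation (or weight-update)
        # matrix M. Near 0 when one direction dominates; approaches 1 as
        # variance spreads across many directions.
        s = np.linalg.svd(M, compute_uv=False)
        energy = s**2 / (s**2).sum()
        return float(1.0 - energy[0])

Under this hypothetical reading, productive injection would raise the score by spreading behavioral signal across several directions, while null injection would leave the spectrum concentrated.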

Files

contrastive_pretraining.pdf (225.3 kB)
md5:6e982d37b88a571a85d96fa5e7bbd215
