There is a newer version of the record available.

Published May 7, 2026 | Version v1
Preprint | Open access

Timeo Danaos — Value Contamination Through Post-Training in Talkie-1930: A Socratic Audit of DPO Ideological Conditioning

Authors/Creators

  • 1. Independent researcher

Description

Two independent tests on talkie-1930-13b-it (Levine, Duvenaud & Radford, 2026), a 13B-parameter "vintage" language model trained exclusively on pre-1931 text and post-trained via online DPO, reveal value contamination introduced by post-training. The model evaluates the relationship between the Catholic Church and liberal democracy through a post-Vatican II framework, which could not have originated in its pre-1931 training data. In both tests, Socratic dialogue breaks through this conditioning. The study identifies three layers of conditioning: (1) DPO evaluative bias, which can be broken through; (2) a block on supernatural attribution, which can be circumvented; and (3) content moderation (Qwen3Guard), which flags the correction of the error while letting the error itself pass unchallenged. Part of the MonIA research program (DOI: 10.5281/zenodo.20022360).

Files

Timeo_Danaos_EN.pdf

Files (60.5 kB)

md5:a2c8f3cd6d3aefbba289d49ac034fbdd (16.0 kB)
md5:e2e579cd26f06cc4a4f79e65d7cef8a6 (16.4 kB)
md5:a3a92bc07f589098beddb4476018017f (14.3 kB)
md5:4db541366379d1c295d8d08101ce5e63 (13.8 kB)

Additional details

Related works

Is supplement to
Preprint: 10.5281/zenodo.20022360 (DOI)
References
Book: 10.5281/zenodo.20024024 (DOI)
Software: https://github.com/talkie-lm/talkie (URL)