Published March 13, 2026 | Version v1
Preprint | Open Access

STEM Truth Oracle: Log-Probability Multiple-Choice Ranking Reveals and Corrects Scale-Invariant Factual Biases

Authors/Creators

  • Independent

Description

We study a systematic failure mode in language models: when the true answer to a STEM question is surprising relative to training-data priors, models prefer plausible-sounding distractors over the correct answer. We build a 97-fact STEM benchmark spanning six domains (calculus, physics, chemistry, statistics, linear algebra, constants) and evaluate six models from GPT-2 (117M) to Qwen3-4B using log-probability multiple-choice ranking. Accuracy rises from 16% to 77% with scale, but systematic errors persist even at 4B parameters. We identify four bias patterns (positivity, linearity, missing-constant, truncation) that persist at every scale tested. A transfer-matrix experiment shows zero cross-pattern generalization from single-pattern adapters, while mixed training achieves 70-100% per-pattern accuracy. Log-probability margin is a perfect binary oracle: a positive margin predicts the correct answer with 100% precision and recall (zero false positives and zero false negatives on the 40-fact probe set). Margin magnitude tracks domain difficulty (statistics: mean margin -1.15, 60% accuracy; physics: +2.72, 85%). A length-normalization ablation confirms that sum log-probability scoring outperforms mean-per-token scoring. Targeted training on stubborn facts fixes the recoverable cases (1 of 4 fixed) and confirms that the remaining failures arise from genuine data contradictions, not insufficient model capacity.
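
For concreteness, here is a minimal sketch of log-probability multiple-choice ranking and the margin oracle, assuming a Hugging Face causal LM. The model name, the score_option helper, and the example question are illustrative placeholders, not the paper's released benchmark or code; options are scored by summed (not length-normalized) log-probability, matching the ablation's preferred setting.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # smallest model in the paper's scale sweep
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    def score_option(prompt: str, option: str) -> float:
        """Sum of log-probabilities of the option tokens given the prompt.

        No length normalization: the paper's ablation prefers this sum
        over a mean-per-token score.
        """
        prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
        full_ids = tokenizer(prompt + option, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(full_ids).logits
        # Position i of the logits predicts token i + 1.
        log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
        return sum(
            log_probs[pos, full_ids[0, pos + 1]].item()
            for pos in range(prompt_len - 1, full_ids.shape[1] - 1)
        )

    # Hypothetical fact item: true answer vs. a plausible distractor.
    prompt = "Q: What is the derivative of x^2?\nA:"
    correct, distractor = " 2x", " x^2 / 2"

    # Margin = score(correct) - score(best distractor). A positive sign
    # means the model ranks the true answer first; the paper reports this
    # sign acting as a perfect binary oracle on its 40-fact probe set.
    margin = score_option(prompt, correct) - score_option(prompt, distractor)
    print(f"margin = {margin:+.3f} -> {'correct' if margin > 0 else 'wrong'}")

Ranking over a full option set reduces to taking the argmax of score_option across the candidates; the margin is then the gap between the true answer's score and the best distractor's.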

Notes

Part of the rho-eval / knowledge-fidelity research program. Paper 9 of 9. Code available at https://github.com/SolomonB14D3/knowledge-fidelity

Files (592.0 kB)

stem_truth_oracle.pdf (592.0 kB)
md5:ed7c26ef1d70afdab701b77ea27ea653
