STEM Truth Oracle: Log-Probability Multiple-Choice Ranking Reveals and Corrects Scale-Invariant Factual Biases
Description
We study a systematic failure mode in language models: when the true answer to a STEM question is surprising relative to training-data priors, models prefer plausible-sounding distractors over the correct answer. We build a 97-fact STEM benchmark spanning six domains (calculus, physics, chemistry, statistics, linear algebra, constants) and evaluate six models from GPT-2 (117M) to Qwen3-4B using log-probability multiple-choice ranking. Accuracy rises from 16% to 77% with scale, but systematic errors persist even at 4B parameters. We identify four scale-invariant bias patterns (positivity, linearity, missing-constant, truncation) that appear at all scales. A transfer-matrix experiment shows zero cross-pattern generalization from single-pattern adapters, while mixed training achieves 70-100% per-pattern accuracy. Log-probability margin is a perfect binary oracle: a positive margin predicts the correct answer with 100% precision and recall on the 40-fact probe set. Margin magnitude tracks domain difficulty.
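The log-probability ranking described above can be sketched as follows. This is a minimal illustration, not the paper's code: `token_logprob` is a hypothetical stand-in for a real causal-LM call (e.g. a forward pass that returns log P(token | context)), and the toy scorer below simply favors tokens drawn from a fixed "knowledge" string so the example runs without a model.

```python
import math

def rank_options(question, options, token_logprob):
    """Rank answer options by total log-probability under a causal LM.

    token_logprob(context, token) -> log P(token | context) is a stand-in
    for a real model call; here options are scored by summing per-token
    log-probabilities conditioned on the question plus tokens so far.
    """
    def score(option):
        total, context = 0.0, question
        for tok in option.split():
            total += token_logprob(context, tok)
            context += " " + tok
        return total

    ranked = sorted(options, key=score, reverse=True)
    # Margin between the top two options; a positive margin for the true
    # answer is what the abstract uses as a binary correctness oracle.
    margin = score(ranked[0]) - score(ranked[1])
    return ranked[0], margin

# Toy scorer (illustrative only): tokens appearing in a fixed fact string
# get high probability, everything else low probability.
def toy_logprob(context, token):
    fact = "the integral of 1/x is ln|x| + C"
    return math.log(0.9) if token in fact else math.log(0.1)

best, margin = rank_options(
    "What is the integral of 1/x?",
    ["ln|x| + C", "x^2/2 + C"],
    toy_logprob,
)
# best is the highest-scoring option; margin > 0 signals a confident pick.
```

With a real model, `token_logprob` would be replaced by per-token log-softmax scores from the model's logits; the ranking and margin logic stay the same.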
Notes
v1.1 changes: Expanded limitations section, replaced informal self-references with DOI citations, strengthened abstract opening, added GitHub link.
Files
- stem_truth_oracle.pdf (502.4 kB, md5:3d97c93e936d25538c69d0c202d2caeb)
Additional details
Related works
- Is part of: Software, 10.5281/zenodo.18743959 (DOI)
- Is supplement to: https://github.com/SolomonB14D3/knowledge-fidelity (URL)