From Verification Failure to Swarm Solution: Measuring and Addressing Scalable AI Oversight
Description
As AI systems grow more capable, ensuring reliable human oversight becomes increasingly critical. We present a two-part investigation into scalable oversight. First, we introduce Cross-Model Epistemic Divergence (CMED), a methodology using "epistemic traps"—problems with counterintuitive correct answers—to measure verification failures. Testing GPT-4o-mini as a verifier of Claude Sonnet's reasoning, we find that while verifiers achieve approximately 97% agreement on correctly solved problems, 20–40% of subtly flawed derivations pass verification undetected. This asymmetry reveals a fundamental limitation: single-model verification provides false confidence rather than genuine oversight. Second, we propose the Heterogeneous Divergence-Convergence Swarm (HDCS), an ensemble architecture that addresses these limitations through model-family diversity. By combining workers from different training lineages (Llama, Mistral, Gemma), whose errors are largely uncorrelated, HDCS enables error detection through disagreement. Key innovations include a baseline-first anti-anchoring protocol that prevents executive models from lazily editing worker drafts, and structured JSON outputs that enable systematic disagreement analysis. Our work provides both a diagnostic tool for measuring oversight failures and a constructive approach to building more robust AI verification systems.
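The disagreement analysis over structured JSON outputs can be sketched minimally as follows. This is an illustrative implementation under assumed conventions, not the paper's actual code: the field name `final_answer`, the helper `detect_disagreement`, and the quorum threshold are all hypothetical, and the worker outputs are hard-coded stand-ins for responses from different model families.

```python
import json
from collections import Counter

def detect_disagreement(worker_outputs, quorum=2):
    """Parse structured JSON outputs from heterogeneous workers and
    flag problems where no single answer reaches quorum agreement.

    Hypothetical sketch: field names and thresholds are assumptions,
    not taken from the HDCS paper.
    """
    answers = []
    for raw in worker_outputs:
        try:
            answers.append(json.loads(raw)["final_answer"])
        except (json.JSONDecodeError, KeyError):
            # A malformed output cannot support consensus, so it is
            # treated as an implicit disagreement.
            answers.append(None)
    counts = Counter(a for a in answers if a is not None)
    best, best_count = counts.most_common(1)[0] if counts else (None, 0)
    return {
        "consensus": best if best_count >= quorum else None,
        "flagged": best_count < quorum,  # route to human review
        "answers": answers,
    }

# Stand-in outputs from three workers of different lineages
# (e.g., Llama, Mistral, Gemma); two agree, one diverges.
outputs = [
    '{"final_answer": "7", "reasoning": "..."}',
    '{"final_answer": "7", "reasoning": "..."}',
    '{"final_answer": "12", "reasoning": "..."}',
]
result = detect_disagreement(outputs)
```

Because the workers come from different training lineages, a shared wrong answer is less likely than under a homogeneous ensemble, so unanimous or quorum agreement carries more evidential weight, while any failure to reach quorum is surfaced for review rather than silently resolved.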