Published April 7, 2026 | Version v1
Preprint | Open Access

LLM-Assisted Logical Gap Detection and Formal Refutation Generation for High-Prestige Mathematical Journals: A Reproducible Pipeline

Description

We present a reproducible, structured pipeline for detecting logical gaps in recently published mathematical papers, including those in journals of the highest prestige such as the Annals of Mathematics, and for generating the corresponding formal refutation manuscripts. The central methodological insight is the domestication-interpolation principle: LLMs exhibit near-zero hallucination rates when tasked with adversarial counter-argumentation, because this task reduces to interpolation over well-represented failure archetypes in the training distribution rather than unconstrained authorship. We instantiate this principle in a two-stage protocol: (i) a gap-detection stage using GPT-5.4 (Adversarial Mode) operating on a PDF-truncated version of the target paper within a 90-page context window; and (ii) a refutation stage using Claude Sonnet 4.6, prompted to produce a self-contained LaTeX manuscript with notation density and proof rigor calibrated to Annals-level standards. We establish that the truncation strategy effectively ablates prestige bias and that the two-model handoff reduces correlated failure modes, enabling the systematic production of formal refutations of supposedly verified literature.
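As a non-authoritative illustration only, the truncate-detect-refute handoff described in the abstract might be orchestrated as in the sketch below. Every name here (`truncate_pdf`, `detect_gaps`, `draft_refutation`) and the gap heuristic are hypothetical stand-ins invented for this sketch; the model stages are local stubs, and no real model API or signature from the paper is used.

```python
# Hypothetical sketch of the two-stage protocol. Model calls are replaced by
# local stubs; all names and heuristics are illustrative, not from the paper.

CONTEXT_WINDOW_PAGES = 90  # the abstract's stated context-window budget

def truncate_pdf(pages):
    """Drop pages beyond the context window (the abstract's truncation
    strategy, described there as also ablating prestige cues)."""
    return pages[:CONTEXT_WINDOW_PAGES]

def detect_gaps(pages):
    """Stage (i) stand-in: flag pages whose proofs elide steps.
    A real run would invoke the gap-detection model instead."""
    return [i for i, text in enumerate(pages)
            if "it is easy to see" in text.lower()]

def draft_refutation(gaps):
    """Stage (ii) stand-in: emit a minimal LaTeX skeleton listing each gap.
    A real run would prompt the refutation model instead."""
    items = "\n".join(rf"\item Gap on page {i + 1} of the truncated source."
                      for i in gaps)
    return "\\begin{itemize}\n" + items + "\n\\end{itemize}"

def run_pipeline(pages):
    """Two-model handoff: truncate, detect, then refute."""
    return draft_refutation(detect_gaps(truncate_pdf(pages)))
```

A toy call such as `run_pipeline(["Theorem 1. It is easy to see that...", "A complete proof follows."])` returns a one-item LaTeX list flagging only the first page; swapping the stubs for actual model calls is where the protocol's claims would actually be exercised.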

Files

LLM_Assisted_Logical_Gap_Detection_and_Formal_Refutation_Generation_for_High_Prestige_Mathematical_Journals__A_Reproducible_Pipeline.pdf