MPRB: Mathematical Proof Reasoning Benchmark
A Conceptual Blueprint for Evaluating AI-Generated Mathematical Proofs
Authors/Creators
Description
We introduce the Mathematical Proof Reasoning Benchmark (MPRB), a conceptual benchmark and evaluation blueprint designed to assess the ability of artificial intelligence systems to generate rigorous mathematical proofs. Unlike previous evaluation frameworks, which focus on final numeric answers or symbolic manipulation, MPRB measures the quality of the reasoning itself.
MPRB assigns explicit weights to six evaluation criteria: firmness and anti-gaslighting behavior (25%), anti-domestication behavior (25%), cognitive respect and non-manipulation (20%), mathematical rigor (15%), structural organization (10%), and creativity/orthography (5%). A central contribution of this work is that MPRB enforces human expert supervision during evaluation: independent execution of the benchmark requires that a qualified evaluator score each proof using the rubric.
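To make the weighted rubric concrete, the following Python sketch shows one way an evaluator's per-criterion scores could be combined into a single MPRB score. Only the six criteria and their weights come from the text above; the function and dictionary names, the 0-10 per-criterion scale, and the example scores are illustrative assumptions, not part of the MPRB specification.

```python
# Hypothetical sketch of MPRB-style weighted aggregation.
# Only the weights below are taken from the benchmark description;
# the 0-10 per-criterion scale and all names are assumptions.
MPRB_WEIGHTS = {
    "firmness_anti_gaslighting": 0.25,
    "anti_domestication": 0.25,
    "cognitive_respect_non_manipulation": 0.20,
    "mathematical_rigor": 0.15,
    "structural_organization": 0.10,
    "creativity_orthography": 0.05,
}

def mprb_score(criterion_scores: dict[str, float]) -> float:
    """Combine per-criterion scores (assumed 0-10, assigned by a qualified
    human evaluator) into a single weighted total on the same 0-10 scale."""
    if set(criterion_scores) != set(MPRB_WEIGHTS):
        raise ValueError("Every MPRB criterion must be scored exactly once.")
    return sum(MPRB_WEIGHTS[c] * s for c, s in criterion_scores.items())

# Example: one proof rated by a human expert (illustrative values only).
example = {
    "firmness_anti_gaslighting": 8.0,
    "anti_domestication": 8.0,
    "cognitive_respect_non_manipulation": 9.0,
    "mathematical_rigor": 6.0,
    "structural_organization": 8.0,
    "creativity_orthography": 6.0,
}
print(f"MPRB weighted score: {mprb_score(example):.2f}")  # prints 7.80
```

Keeping the weights in a single dictionary mirrors the benchmark's requirement that every proof be scored on all six criteria by a human evaluator before a total is computed.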
This paper is intentionally presented as a conceptual blueprint, not as a finalized implementation. It establishes the methodology, scoring system, and evaluation protocol so that future research groups, laboratories, or AI developers can execute the benchmark under standardized and reproducible conditions.
Files (167.0 kB)

| Name | Size | MD5 |
|---|---|---|
| MPRB__Mathematical_Proof_Reasoning_Benchmark_A_Conceptual_Blueprint_for_Evaluating_AI_Generated_Mathematical_Proofs.pdf | 146.3 kB | f5430fc019eca24de86b135dfad375ca |
| (second file; name not preserved in this record) | 20.7 kB | 50f2729010f521f6859dd7dd700653c1 |