Published April 8, 2026 | Version v1
Dataset Open

Mentor::i: code, data, and supplementary materials for "Systematic Ablation Reveals Hidden Failures in Multi-Agent AI for Science"

  • 1. Wageningen University & Research

Description

This deposit contains the code, data, and supplementary materials for the paper "Systematic Ablation Reveals Hidden Failures in Multi-Agent AI for Science" by Bianchi & Schokker (2026).

The paper introduces a systematic ablation methodology for retrieval-augmented multi-agent AI systems, validated through a triple-triangulation evaluation framework that combines deterministic ground-truth metrics, calibrated LLM-as-judge scoring, and natural-language-inference fact-checking. The methodology is applied to more than 36,000 individual evaluations spanning 200 scientific papers and 250 expert-curated questions across ten experiments.

Contents of this deposit:

- corpus_papers.csv — 200-paper manifest (50 core bioinformatics / veterinary epidemiology papers + 150 arXiv distractors) with DOIs, PMC IDs, arXiv IDs, source URLs, and licensing.
- download_corpus.py — Python script that recreates the corpus on demand from the manifest.
- corpus_README.md — reproduction guide for the 200-paper corpus.
- corpus_metadata.json — per-paper metadata for the 50 core papers.
- ground_truth.json — 250 expert-curated evaluation questions with expected answers and concepts.
- validation_main.json, validation_cross_document.json, validation_synthesis.json, validation_ood.json — question validation results across the four categories.
- mentori_results.tar.gz — raw JSON outputs from all ten experiments (V4-0 through V4-9), 47 MB compressed, ~200 MB extracted.
- paper_figures.Rmd — single source of truth for all main and Extended Data figures, as an R Markdown document.
- paper_figures_tiff.tar.gz — pre-rendered TIFF versions of every figure at 300 dpi (the submission versions for the Extended Data figures).
- paper_figures_pdf.tar.gz — pre-rendered PDF (vector) versions of every figure (the submission versions for the main figures).

To reproduce the paper figures from scratch:

  git clone https://github.com/vbianchi/Mentori.git
  cd Mentori
  ./publication/data/download_results.sh
  Rscript -e "rmarkdown::render('publication/reports/paper_figures.Rmd')"

The Mentori multi-agent workspace itself is released as open-source software at https://github.com/vbianchi/Mentori and is licensed separately: MIT for the code, and CC-BY 4.0 for figures and derived data.

The 200-paper evaluation corpus is NOT redistributed in primary form in this deposit due to publisher copyright. Use download_corpus.py (included) together with corpus_papers.csv to reconstruct the exact corpus on demand.
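Before running the reconstruction, it can help to sanity-check the manifest. The following is a minimal sketch, not part of the deposit: the function name `summarize_manifest` is hypothetical, and it deliberately makes no assumption about the column names in corpus_papers.csv. It only counts rows and, for each column, how many rows carry a non-empty value.

```python
import csv
import io
from collections import Counter


def summarize_manifest(csv_text):
    """Return (row_count, Counter of non-empty cells per column) for a CSV manifest.

    Hypothetical helper: it works on any header row and assumes nothing
    about the actual columns of corpus_papers.csv.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    filled = Counter()
    rows = 0
    for row in reader:
        rows += 1
        for column, value in row.items():
            # DictReader stores surplus cells under the key None and pads
            # short rows with None values; skip both cases.
            if column is not None and value is not None and value.strip():
                filled[column] += 1
    return rows, filled
```

Reading the file with `Path("corpus_papers.csv").read_text(encoding="utf-8")` and passing the text to this function should, per the description above, give a row count of 200. The per-column counts show how many papers carry each kind of identifier.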

Files

Files (55.0 MB)

MD5 checksum                        Size
472cf782507a6f8c72a1690ac68fb1af    9.6 kB
5e78148532c00ca2bab2c4243ad8fafe    32.8 kB
53ef6c32814df73c56ce7c5078f092a5    6.0 kB
d778bd3ed02ba272a0a49245e1ef75ba    4.5 kB
326a85abb73b8df9a9ca398caa65c5db    234.6 kB
8c093936e2bca4d8331c213ec332e568    49.5 MB
02281956e9588e4e206a6a99c6ed360f    89.5 kB
9f13a7d8276d1b4d304820ae6e1d2ab7    177.2 kB
e6cab007890e11a4b7c2689271355add    4.4 MB
8565fc206c9f9aeb9b23437c72cfdde0    88.8 kB
8f4c1ff9b5d808b597c83faff82d1f68    250.3 kB
ed60c4016ee0ab46d3c3043a099dd281    82.2 kB
13dd825e2296131c37ae55e97aa9ae76    81.9 kB
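
The MD5 values listed under Files can be used to confirm that a download arrived intact. The sketch below is one way to do that with the Python standard library. The helper names `md5_of_file` and `verify_download` are hypothetical and not part of the deposit, and the code assumes Python 3.9 or later (for `str.removeprefix`).

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Compare a file's MD5 digest with the checksum shown for it in this record.

    Accepts the checksum with or without the leading "md5:" label.
    """
    expected = expected_md5.strip().lower().removeprefix("md5:")
    return md5_of_file(path) == expected
```

To use it, call `verify_download` with the path of a downloaded file and the checksum shown next to that file on the record page. A return value of `False` suggests a truncated or corrupted download. MD5 is adequate for detecting accidental corruption, but it is not a defence against deliberate tampering.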

Additional details

Additional titles

Alternative title
Systematic Ablation Reveals Hidden Failures in Multi-Agent AI for Science

Dates

Submitted: 2026-04-09 (manuscript submitted to Nature Machine Intelligence)
Collected: 2026-01/2026-04 (experimental data collection period)