DALPHIN: A multicentric open benchmark for digital pathology AI copilots
Authors/Creators
- Lems, Carlijn M. (1)
- Moonemans, Sander (1)
- Klubíčková, Natálie (2, 3)
- Brattoli, Biagio (4)
- Lee, Taebum (4)
- Kim, Seokhwi (5)
- Vilaplana, Veronica (6)
- Fernandez, Pedro Luis (7, 8)
- Pons, Laura (8)
- Hochman, Sapir (8)
- Suárez-Franck, Mauricio Eduardo (8)
- Laurinavicius, Arvydas (9)
- Drachneris, Julius (9, 10)
- Augulis, Renaldas (11)
- Petroska, Donatas (9)
- Montezuma, Diana (12)
- Oliveira, Domingos (12)
- Vos, Shoko (1)
- Balkenhol, Maschenka (1, 13)
- van Ipenburg, Jolique (1)
- Vos, Anne-Marie (1)
- van Midden, Dominique (1)
- Bouwmeester, Anouk (1)
- van der Laak, Jeroen (1, 14)
- Khalili, Nadieh (1)
- Meeuwsen, Frederique (1)
- Ciompi, Francesco (1)
Affiliations
1. Radboud University Medical Center
2. Biopticka Laborator (Czechia)
3. Charles University
4. Lunit
5. Ajou University School of Medicine
6. Technical University of Catalonia
7. Universitat Autònoma de Barcelona
8. Hospital Universitari Germans Trias i Pujol
9. Vilnius University Hospital Santariskiu Klinikos
10. Vilnius University
11. National Centre of Pathology, Affiliate of Vilnius University Hospital Santariskiu Klinikos
12. IMP Diagnostics (Portugal)
13. Canisius-Wilhelmina Ziekenhuis
14. Linköping University
Description
The digital pathology AI copilot benchmark (DALPHIN) dataset is a multicentric, open benchmark for evaluating AI copilots in digital pathology. DALPHIN consists of 300 cases collected across six healthcare institutions in six countries, covering 130 diagnoses from 14 pathology subspecialties, including non-neoplastic entities and rare cancers.
The benchmark includes 1,236 histopathology images (low-resolution whole-slide images and higher-resolution regions of interest) and 1,757 questions across six tasks: tissue/organ recognition, neoplastic status, neoplastic behavior (benign, malignant, in situ, or uncertain), diagnosis, case-specific multiple-choice questions, and case-specific free-response questions.
The images and questions are publicly available via this Zenodo record. Example code to run models on the benchmark and generate responses is provided in the associated GitHub repository. The reference answers are not publicly released but are sequestered and indirectly accessible on the Grand Challenge platform, where submissions are evaluated and ranked on public leaderboards to ensure fair and reproducible evaluation of pathology AI copilots.
Technical info
Repository content
This repository contains one ZIP file and one CSV file:
- 'images.zip' - images for the benchmark, including low-resolution overviews of whole-slide images (WSIs) and higher-resolution regions of interest (ROIs), all provided in PNG format.
- 'dalphin_metadata.csv' - benchmark questions and associated metadata. The table below describes the contents of each column.
| Column name | Contents |
|---|---|
| 'case-id' | Unique anonymous case ID, see 'File ID nomenclature' |
| 'question-id' | Unique question ID, see 'File ID nomenclature' |
| 'preamble' | Text prepended to the actual question (e.g., contextual instructions and disclaimers provided to the model) |
| 'question' | The actual question |
| 'overviews' | Comma-separated filename(s) of the low-resolution WSI overview image(s) used to answer the question, see 'File ID nomenclature' |
| 'rois' | Comma-separated filename(s) of the higher-resolution ROI(s) used to answer the question, see 'File ID nomenclature' |
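As a minimal sketch of how the metadata can be consumed, the snippet below parses rows in the schema described above and collects all image files referenced by each question. The sample rows (preamble text, question wording) are hypothetical illustrations, not actual benchmark content; only the column names come from the table above.

```python
import csv
import io

# Hypothetical sample in the dalphin_metadata.csv schema (columns per the table above).
sample = """case-id,question-id,preamble,question,overviews,rois
case001,case001_tissue,Example preamble.,Which tissue or organ is shown?,case001_wsi1_overview.png,"case001_wsi1_roi1.png,case001_wsi1_roi2.png"
case001,case001_diagnosis,Example preamble.,What is the most likely diagnosis?,case001_wsi1_overview.png,case001_wsi1_roi1.png
"""

# In practice you would open dalphin_metadata.csv instead of the in-memory sample.
rows = list(csv.DictReader(io.StringIO(sample)))

def image_files(row):
    """Return every overview and ROI filename referenced by one question row."""
    files = []
    for col in ("overviews", "rois"):
        files += [name.strip() for name in row[col].split(",") if name.strip()]
    return files

for row in rows:
    print(row["question-id"], image_files(row))
```

Note that the 'overviews' and 'rois' columns are comma-separated, so multi-image fields are CSV-quoted and must be split after parsing, as above.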
File ID nomenclature
Each case is assigned a unique anonymous case ID, incrementing from 1. Because a single case may include multiple WSIs and ROIs, image files follow the naming convention: <anonymous_case_id>_<wsi_id>_<roi_id>.png, for example 'case001_wsi1_roi1.png'. Low-resolution WSI overviews follow the convention: <anonymous_case_id>_<wsi_id>_overview.png, for example 'case001_wsi1_overview.png'. Question IDs are named according to case ID and task ID ('tissue', 'neoplasm', 'diagnosis', 'behavior', 'mc', 'open'), following the convention: <anonymous_case_id>_<task_id>, for example 'case001_tissue'.
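The naming convention above can be parsed mechanically. The following sketch (an assumption about usage, not code from the official repository) extracts the case, WSI, and ROI identifiers from a benchmark image filename:

```python
import re

# Patterns derived from the naming convention described above.
ROI_RE = re.compile(r"^(case\d+)_(wsi\d+)_(roi\d+)\.png$")
OVERVIEW_RE = re.compile(r"^(case\d+)_(wsi\d+)_overview\.png$")
QUESTION_RE = re.compile(r"^(case\d+)_(tissue|neoplasm|diagnosis|behavior|mc|open)$")

def parse_image(name):
    """Return (case_id, wsi_id, roi_id or 'overview') for a benchmark image filename."""
    m = ROI_RE.match(name)
    if m:
        return m.group(1), m.group(2), m.group(3)
    m = OVERVIEW_RE.match(name)
    if m:
        return m.group(1), m.group(2), "overview"
    raise ValueError(f"unrecognized image filename: {name}")

print(parse_image("case001_wsi1_roi1.png"))      # ('case001', 'wsi1', 'roi1')
print(parse_image("case001_wsi1_overview.png"))  # ('case001', 'wsi1', 'overview')
```

The same approach applies to question IDs via `QUESTION_RE`, which accepts exactly the six task IDs listed above.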
Other
Terms of use
This dataset is provided for evaluation purposes only. Users should not use any part of the dataset (including images, questions, or associated metadata) for training, pretraining, fine-tuning, or otherwise developing machine learning models.
Files
(1.2 GB)
| Name | Size | MD5 |
|---|---|---|
| dalphin_metadata.csv | 1.2 MB | md5:19491476beb98926cfef63c22fed1284 |
| images.zip | 1.2 GB | md5:582f3863d4db9e67b7cc54c7cbf8fc6a |
Additional details
Related works
- Is described by: Preprint arXiv:2605.03544 (arXiv)
Software
- Repository URL: https://github.com/computationalpathologygroup/DALPHIN
- Programming language: Python
- Development status: Active