DALPHIN: A multicentric open benchmark for digital pathology AI copilots
Authors/Creators
- Lems, Carlijn M. (1)
- Moonemans, Sander (1)
- Klubíčková, Natálie (2, 3)
- Brattoli, Biagio (4)
- Lee, Taebum (4)
- Kim, Seokhwi (5)
- Vilaplana, Veronica (6)
- Fernandez, Pedro Luis (7, 8)
- Pons, Laura (8)
- Hochman, Sapir (8)
- Suárez-Franck, Mauricio Eduardo (8)
- Laurinavicius, Arvydas (9)
- Drachneris, Julius (9, 10)
- Augulis, Renaldas (11)
- Petroska, Donatas (9)
- Montezuma, Diana (12)
- Oliveira, Domingos (12)
- Vos, Shoko (1)
- Balkenhol, Maschenka (1, 13)
- van Ipenburg, Jolique (1)
- Vos, Anne-Marie (1)
- van Midden, Dominique (1)
- Bouwmeester, Anouk (1)
- van der Laak, Jeroen (1, 14)
- Khalili, Nadieh (1)
- Meeuwsen, Frederique (1)
- Ciompi, Francesco (1)
Affiliations
1. Radboud University Medical Center
2. Biopticka Laborator (Czechia)
3. Charles University
4. Lunit
5. Ajou University School of Medicine
6. Technical University of Catalonia
7. Universitat Autònoma de Barcelona
8. Hospital Universitari Germans Trias i Pujol
9. Vilnius University Hospital Santariskiu Klinikos
10. Vilnius University
11. National Centre of Pathology, Affiliate of Vilnius University Hospital Santariskiu Klinikos
12. IMP Diagnostics (Portugal)
13. Canisius-Wilhelmina Ziekenhuis
14. Linköping University
Description
The digital pathology AI copilot benchmark (DALPHIN) dataset is a multicentric, open benchmark for evaluating AI copilots in digital pathology. DALPHIN consists of 300 cases collected across six healthcare institutions in six countries, covering 130 diagnoses from 14 pathology subspecialties, including non-neoplastic entities and rare cancers.
The benchmark includes 1,236 histopathology images (low-resolution whole-slide images and higher-resolution regions of interest) and 1,757 questions across six tasks: tissue/organ recognition, neoplastic status, neoplastic behavior (benign, malignant, in situ, or uncertain), diagnosis, case-specific multiple-choice questions, and case-specific free-response questions.
The images and questions are publicly available via this Zenodo record. Example code to run models on the benchmark and generate responses is provided in the associated GitHub repository. The reference answers are not publicly released but are sequestered and indirectly accessible on the Grand Challenge platform, where submissions are evaluated and ranked on public leaderboards to ensure fair and reproducible evaluation of pathology AI copilots.
Technical info
Repository content
This repository contains one ZIP file and one CSV file:
- 'images.zip' - images for the benchmark, including low-resolution overviews of whole-slide images (WSIs) and higher-resolution regions of interest (ROIs), all provided in PNG format.
- 'dalphin_metadata.csv' - benchmark questions and associated metadata. The table below describes the contents of each column.
| Column name | Contents |
|---|---|
| 'case-id' | Unique anonymous case ID, see 'File ID nomenclature' |
| 'question-id' | Unique question ID, see 'File ID nomenclature' |
| 'preamble' | Text prepended to the actual question (e.g., contextual instructions and disclaimers provided to the model) |
| 'question' | The actual question |
| 'overviews' | Comma-separated filename(s) of the low-resolution WSI overview image(s) used to answer the question, see 'File ID nomenclature' |
| 'rois' | Comma-separated filename(s) of the higher-resolution ROI(s) used to answer the question, see 'File ID nomenclature' |
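As a minimal sketch of how the metadata can be consumed, the snippet below parses rows in the schema described above and collects all image files referenced by each question. The sample rows (preamble text, question wording) are hypothetical illustrations, not actual benchmark content; only the column names come from the table above.

```python
import csv
import io

# Hypothetical sample in the dalphin_metadata.csv schema (columns per the table above).
sample = """case-id,question-id,preamble,question,overviews,rois
case001,case001_tissue,Example preamble.,Which tissue or organ is shown?,case001_wsi1_overview.png,"case001_wsi1_roi1.png,case001_wsi1_roi2.png"
case001,case001_diagnosis,Example preamble.,What is the most likely diagnosis?,case001_wsi1_overview.png,case001_wsi1_roi1.png
"""

# In practice you would open dalphin_metadata.csv instead of the in-memory sample.
rows = list(csv.DictReader(io.StringIO(sample)))

def image_files(row):
    """Return every overview and ROI filename referenced by one question row."""
    files = []
    for col in ("overviews", "rois"):
        files += [name.strip() for name in row[col].split(",") if name.strip()]
    return files

for row in rows:
    print(row["question-id"], image_files(row))
```

Note that the 'overviews' and 'rois' columns are comma-separated, so multi-image fields are CSV-quoted and must be split after parsing, as above.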
File ID nomenclature
Each case is assigned a unique anonymous case ID, incrementing from 1. Because a single case may include multiple WSIs and ROIs, image files follow the naming convention: <anonymous_case_id>_<wsi_id>_<roi_id>.png, for example 'case001_wsi1_roi1.png'. Low-resolution WSI overviews follow the convention: <anonymous_case_id>_<wsi_id>_overview.png, for example 'case001_wsi1_overview.png'. Question IDs are named according to case ID and task ID ('tissue', 'neoplasm', 'diagnosis', 'behavior', 'mc', 'open'), following the convention: <anonymous_case_id>_<task_id>, for example 'case001_tissue'.
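The naming convention above can be parsed mechanically. The following sketch (an assumption about usage, not code from the official repository) extracts the case, WSI, and ROI identifiers from a benchmark image filename:

```python
import re

# Patterns derived from the naming convention described above.
ROI_RE = re.compile(r"^(case\d+)_(wsi\d+)_(roi\d+)\.png$")
OVERVIEW_RE = re.compile(r"^(case\d+)_(wsi\d+)_overview\.png$")
QUESTION_RE = re.compile(r"^(case\d+)_(tissue|neoplasm|diagnosis|behavior|mc|open)$")

def parse_image(name):
    """Return (case_id, wsi_id, roi_id or 'overview') for a benchmark image filename."""
    m = ROI_RE.match(name)
    if m:
        return m.group(1), m.group(2), m.group(3)
    m = OVERVIEW_RE.match(name)
    if m:
        return m.group(1), m.group(2), "overview"
    raise ValueError(f"unrecognized image filename: {name}")

print(parse_image("case001_wsi1_roi1.png"))      # ('case001', 'wsi1', 'roi1')
print(parse_image("case001_wsi1_overview.png"))  # ('case001', 'wsi1', 'overview')
```

The same approach applies to question IDs via `QUESTION_RE`, which accepts exactly the six task IDs listed above.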
Other
Terms of use
This dataset is provided for evaluation purposes only. Users should not use any part of the dataset (including images, questions, or associated metadata) for training, pretraining, fine-tuning, or otherwise developing machine learning models.
Files
(1.2 GB)
| Name | Size | MD5 |
|---|---|---|
| dalphin_metadata.csv | 1.2 MB | md5:19491476beb98926cfef63c22fed1284 |
| images.zip | 1.2 GB | md5:582f3863d4db9e67b7cc54c7cbf8fc6a |
Additional details
Related works
- Is described by: Preprint arXiv:2605.03544 (arXiv)
Software
- Repository URL: https://github.com/computationalpathologygroup/DALPHIN
- Programming language: Python
- Development status: Active