CURVAS-PDACVI dataset

Riera-Marín, Meritxell; O K, Sikha; Duh, Maria Montserrat; Aubanell, Anton; de Figueiredo Cardoso, Ruben; Egger-Hackenschmidt, Saskia; Rodríguez-Comas, Júlia; González Ballester, Miguel Ángel; Garcia López, Javier

doi:10.5281/zenodo.15593628

Published June 2025 | Version v2

Dataset Open


1. Sycai Technologies SL
2. Pompeu Fabra University
3. Hospital de Mataró
4. Hospital de Sant Pau
5. Universitätsklinikum Erlangen
6. Sycai Medical
7. Sycai technologies

This challenge will soon be hosted on Grand Challenge; its page is currently under construction.

Clinical Problem

Interrater variability poses significant challenges in the field of deep learning (DL) for medical image segmentation. In medical imaging, DL models are often tasked with delineating structures or abnormalities, such as tumors, blood vessels, or organs, within complex anatomy. The inherent complexity and variability of these structures introduce uncertainty, making their boundaries difficult to define precisely. This uncertainty is further compounded by interrater variability: different medical experts may disagree on where the true boundaries lie. DL models must cope with these discrepancies, which lead to inconsistent segmentations across annotators and can affect diagnosis and treatment decisions. Addressing interrater variability therefore involves developing robust algorithms that capture and quantify uncertainty, as well as standardizing annotation practices and promoting collaboration among medical experts to reduce variability and improve the reliability of DL-based medical image analysis.

This challenge is designed to raise awareness of the impact of uncertainty on clinical applications of medical image analysis. In last year's edition, we proposed a competition based on modeling the uncertainty of segmenting three abdominal organs (kidney, liver, and pancreas), focusing on organ volume as the clinical quantity of interest. This year, we go one step further and propose segmenting a pancreatic pathology, Pancreatic Ductal Adenocarcinoma (PDAC), with the clinical goal of understanding vascular involvement, a key measure of tumor resectability. In this context, uncertainty quantification is considerably more challenging, given the widely varying contours of different PDAC instances.

This year, we provide a richer dataset. We start from an existing dataset of clinically verified contrast-enhanced abdominal CT scans with a single set of manual annotations (provided by the PANORAMA organization) and add four extra manual annotations per PDAC case. The result is a unique dataset that makes it possible to analyze the impact of multi-rater annotations along several dimensions, such as different annotation protocols or different levels of annotator experience.

CURVAS Challenge Goal

This challenge aims to advance deep learning methods for medical image segmentation by focusing on the critical issue of interrater variability, particularly in the context of pancreatic cancer. Building on last year's focus on organ segmentation uncertainty, this edition shifts to the more complex task of segmenting Pancreatic Ductal Adenocarcinoma (PDAC) to assess vascular involvement—a key indicator of tumor resectability. By providing a unique, richly annotated dataset with multiple expert annotations per case, the challenge encourages participants to develop robust models that can quantify and manage uncertainty arising from differing expert opinions, ultimately improving the clinical reliability of AI-based image analysis.

For more information about the challenge, and to join CURVAS-PDACVI (Calibration and Uncertainty for multiRater Volume Assessment in multistructure Segmentation - Pancreatic Ductal AdenoCarcinoma Vascular Invasion), visit our website. This challenge will be held at MICCAI 2025.

Dataset Cohort

The challenge cohort comprises 125 axial, portal-venous contrast-enhanced CT (CECT) scans of the upper abdomen, selected from a subset of the PANORAMA challenge dataset. The selection prioritized CT scans with manually generated labels, excluding those with automatically derived annotations. Additionally, only cases with a conclusive diagnostic test (e.g., pathology, cytology, histopathology) are included; patients with radiology-based diagnoses have been excluded.

To ensure that the subset is representative of common real-world scenarios, lesion sizes have been analyzed and a diverse range of cases has been selected. Patient demographics, including sex and age, have also been considered to enhance the cohort's representativeness.

Finally, a preliminary visual analysis was conducted before the images were sent to radiologists for segmentation. This step verified the tumor's location, size, and relevance, helping maintain the dataset's representativeness for the challenge.

The cohort of 125 CT scans described above is split as follows:

Training Phase cohort:

40 CT scans with their respective annotations are provided. Participants are encouraged to leverage publicly available external data annotated by multiple raters. Providing a small training set while allowing the use of public datasets makes the challenge more inclusive, since methods can be developed with data that is available to anyone. Furthermore, training on these data and evaluating on different data rewards methods that are robust to dataset shift and other sources of variability between datasets.

Validation Phase cohort:

5 CT scans will be used for this phase.

Test Phase cohort:

85 CT scans will be used for evaluation.

Neither the validation nor the test cohort will be published until the end of the challenge, and the assignment of each CT scan to a cohort will not be revealed until after the challenge.

Each study folder is named with a unique ID (CURVASPDAC_XXXX), so that it cannot be directly linked to the PANORAMA ID, and has the following structure:

annotation_X.nii.gz: contains the Pancreatic Ductal Adenocarcinoma (PDAC) segmentations (X=1 is the PANORAMA segmentation; X=2 to 5 are the segmentations by the other experts)
vascular_annotation.nii.gz: contains the vascular structures that will be analyzed in the challenge, labeled as follows: Porta (=1), Superior Mesenteric Vein (SMV) (=2), Aorta (=3), Celiac Trunk (=4), and Superior Mesenteric Artery (SMA) (=5).
image.nii.gz: CT volume
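As an illustration of the folder layout above, the following Python sketch lists the files expected in one case folder and records the vascular label values. The helper names (`expected_case_files`, `missing_files`, `VASCULAR_LABELS`) are hypothetical, and the sketch only checks that files are present; reading voxel data would require a NIfTI reader such as nibabel, which is not shown.

```python
from pathlib import Path

# Label values of vascular_annotation.nii.gz, as listed in the folder description
VASCULAR_LABELS = {
    1: "Porta",
    2: "Superior Mesenteric Vein (SMV)",
    3: "Aorta",
    4: "Celiac Trunk",
    5: "Superior Mesenteric Artery (SMA)",
}

# annotation_1 is the PANORAMA segmentation; annotation_2..annotation_5 come from the other experts
NUM_PDAC_ANNOTATIONS = 5


def expected_case_files(case_dir: Path) -> dict:
    """Hypothetical helper: map a role name to the path expected in one CURVASPDAC_XXXX folder."""
    files = {
        "image": case_dir / "image.nii.gz",
        "vascular": case_dir / "vascular_annotation.nii.gz",
    }
    for x in range(1, NUM_PDAC_ANNOTATIONS + 1):
        files[f"annotation_{x}"] = case_dir / f"annotation_{x}.nii.gz"
    return files


def missing_files(case_dir: Path) -> list:
    """Hypothetical helper: names of expected files that do not exist in case_dir."""
    return [p.name for p in expected_case_files(case_dir).values() if not p.exists()]
```

Because the vascular annotations are to be made public later in the challenge, `missing_files` may report `vascular_annotation.nii.gz` for an early release of a case folder.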

The four additional annotations were produced by radiologists at Universitätsklinikum Erlangen, Hospital de Sant Pau, and Hospital de Mataró. Hence, four new annotations plus the PANORAMA annotation are provided. Another clinician modified the vascular annotations from the PANORAMA dataset, separating veins and arteries into single-structure segmentations. These structures are the ones considered highly relevant for the study of vascular invasion (VI): Porta, Superior Mesenteric Vein (SMV), Superior Mesenteric Artery (SMA), Aorta, and Celiac Trunk. The vascular annotations will be made public later in the challenge so that participants can try out the evaluation code.
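With five PDAC annotations per case, one common way to summarize rater disagreement is a voxelwise soft label (the fraction of raters marking each voxel) together with the mean pairwise Dice coefficient between raters. The sketch below assumes the five masks have already been loaded and flattened into equal-length 0/1 sequences; the function names are hypothetical, and this is not the challenge's official evaluation metric.

```python
from itertools import combinations
from typing import Sequence


def soft_label(masks: Sequence[Sequence[int]]) -> list:
    """Voxelwise fraction of raters that marked each voxel as PDAC."""
    if not masks:
        raise ValueError("at least one mask is required")
    n_voxels = len(masks[0])
    if any(len(m) != n_voxels for m in masks):
        raise ValueError("all masks must have the same number of voxels")
    return [sum(m[i] for m in masks) / len(masks) for i in range(n_voxels)]


def dice(a: Sequence[int], b: Sequence[int]) -> float:
    """Dice overlap of two binary masks; defined here as 1.0 when both are empty."""
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_pairwise_dice(masks: Sequence[Sequence[int]]) -> float:
    """Average Dice over all rater pairs, a simple interrater agreement summary."""
    pairs = list(combinations(masks, 2))
    if not pairs:
        raise ValueError("at least two masks are required")
    return sum(dice(a, b) for a, b in pairs) / len(pairs)
```

Averaging gives each rater equal weight; weighting annotators by experience would be an alternative design choice that the dataset's multi-rater structure also allows.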

The subsets have also been balanced to ensure representativeness. Factors such as devices, sex, and patient age have been considered, and efforts have been made to distribute potential biases as evenly as possible across these variables. For age, the target percentages are: below 50 years (5%), 50–59 years (15%), 60–69 years (20%), 70–79 years (30%), and 80–89 years (30%) [1,2,3,4]. These percentages are approximate and rounded for simplicity; the balance aims to be as close to them as feasible. For sex, the targets are 40–50% female and 50–60% male [5]. For PDAC location, they are 60–70% head, 15–25% body, and 10–15% tail [6]. Lesion sizes have been analyzed and a subset will be selected; these values will be published in the future together with the entire dataset.
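To check how close a selected subset is to the stated age targets, the observed fraction per age bin can be compared with the target fraction. The sketch below is a hypothetical helper, not the organizers' selection procedure; it assumes integer ages in years and treats ages of 90 or above, which the stated bins do not cover, as falling outside all bins. The sex and PDAC location ranges could be checked in the same way.

```python
from collections import Counter
from typing import Optional

# Approximate (rounded) target age distribution stated for the cohort
AGE_TARGETS = {"<50": 0.05, "50-59": 0.15, "60-69": 0.20, "70-79": 0.30, "80-89": 0.30}


def age_bin(age: int) -> Optional[str]:
    """Map an age in years to a target bin, or None if no stated bin covers it."""
    if age < 50:
        return "<50"
    for low in (50, 60, 70, 80):
        if age < low + 10:
            return f"{low}-{low + 9}"
    return None  # 90 or older: assumption, not covered by the stated targets


def age_distribution(ages: list) -> dict:
    """Observed fraction of patients in each target bin."""
    if not ages:
        raise ValueError("ages must not be empty")
    counts = Counter(age_bin(a) for a in ages)
    return {b: counts[b] / len(ages) for b in AGE_TARGETS}


def max_deviation(ages: list) -> float:
    """Largest absolute gap between observed and target fractions over all bins."""
    observed = age_distribution(ages)
    return max(abs(observed[b] - AGE_TARGETS[b]) for b in AGE_TARGETS)
```

A small `max_deviation` value indicates that a candidate subset is close to the target proportions; since the targets are themselves rounded, an exact match is not expected in practice.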

Data from PANORAMA Batch 1 (https://zenodo.org/records/13715870), Batch 2 (https://zenodo.org/records/13742336), and Batch 3 (https://zenodo.org/records/11034011) may not be used to train models. Batch 4 (https://zenodo.org/records/10999754) may be used.

For more technical information about the dataset, see the PANORAMA platform: https://panorama.grand-challenge.org/datasets-imaging-labels/

Ethical Approval and Data Usage Agreement

Because the CT images and their corresponding information are already publicly available, no patient information beyond what is already public will be released.

References

[1] Lee, K.S.; Sekhar, A.; Rofsky, N.M.; Pedrosa, I. Prevalence of Incidental Pancreatic Cysts in the Adult Population on MR Imaging. Am J Gastroenterol 2010, 105, 2079–2084, doi:10.1038/ajg.2010.122.

[2] Canakis, A.; Lee, L.S. State-of-the-Art Update of Pancreatic Cysts. Dig Dis Sci 2021.

[3] De Oliveira, P.B.; Puchnick, A.; Szejnfeld, J.; Goldman, S.M. Prevalence of Incidental Pancreatic Cysts on 3 Tesla Magnetic Resonance. PLoS One 2015, 10, doi:10.1371/journal.pone.0121317.

[4] Kimura, W.; Nagai, H.; Kuroda, A.; Muto, T.; Esaki, Y. Analysis of Small Cystic Lesions of the Pancreas. Int J Pancreatol 1995, 18, 197–206, doi:10.1007/BF02784942.

[5] Moshayedi, N.; et al. Race, Sex, Age, and Geographic Disparities in Pancreatic Cancer Incidence. J Clin Oncol 2022, 40, 520, doi:10.1200/JCO.2022.40.4_suppl.520.

[6] Artinyan, A.; Soriano, P.A.; Prendergast, C.; Low, T.; Ellenhorn, J.D.I.; Kim, J. The Anatomic Location of Pancreatic Cancer Is a Prognostic Factor for Survival. HPB 2008, 10, 371–376, doi:10.1080/13651820802291233.

The challenge has been co-funded by Proyectos de Colaboración Público-Privada (CPP2021-008364), funded by MCIN/AEI, and the European Union through the NextGenerationEU/PRTR and by the Program Doctorats Industrials de Catalunya, specifically the industrial doctorate AGAUR 2021-063, in collaboration with Sycai Technologies SL.

Files

Files (4.6 GB)

Name	Size	MD5 checksum
training_set.zip	4.6 GB	be67dabc6fbbbe14b499d9b83300a51f
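Before extracting the archive, the download can be checked against the MD5 checksum listed above. The following Python sketch uses the standard-library `hashlib` module and reads the 4.6 GB file in chunks so that it never has to fit in memory; the function names and the 1 MiB default chunk size are illustrative choices, not part of the record.

```python
import hashlib
from pathlib import Path

# MD5 checksum listed for training_set.zip on this record
EXPECTED_MD5 = "be67dabc6fbbbe14b499d9b83300a51f"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download(Path("training_set.zip"))` should return `True` for an intact download; a `False` result suggests the file should be downloaded again.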
