CURVAS dataset

Riera-Marín, Meritxell; Kleiß, Joy-Marie; Aubanell, Anton; Antolín, Andreu

doi:10.5281/zenodo.12687192

Published July 2024 | Version v1.0.1

Dataset Open

1. Sycai Medical
2. Pompeu Fabra University
3. Universitätsklinikum Erlangen
4. Hospital de Sant Pau
5. Vall d'Hebron Hospital Universitari

Contributors

Researchers:

Supervisors:

1. Sycai Medical
2. Universitätsklinikum Erlangen
3. Pompeu Fabra University
4. Computer Vision Center
5. Institució Catalana de Recerca i Estudis Avançats

Clinical Problem

In medical imaging, deep learning (DL) models are often tasked with delineating structures or abnormalities within complex anatomy, such as tumors, blood vessels, or organs. Interrater variability poses significant challenges for this task. Uncertainty arises from the inherent complexity and variability of these structures, which makes their boundaries hard to define precisely. This uncertainty is compounded by interrater variability, as different medical experts may disagree on where the true boundaries lie. DL models must cope with these discrepancies, which lead to inconsistent segmentation results across annotators and can affect diagnosis and treatment decisions. Addressing interrater variability in DL for medical segmentation involves developing robust algorithms that capture and quantify uncertainty, standardizing annotation practices, and promoting collaboration among medical experts to reduce variability and improve the reliability of DL-based medical image analysis.

Furthermore, achieving model calibration, a fundamental aspect of reliable predictions, becomes notably harder when dealing with multiple classes and raters. Calibration ensures that predicted probabilities align with the true likelihood of events, which enhances the model's reliability. Even if it is not obvious, having multiple classes introduces uncertainties that arise from the interactions between them. Moreover, incorporating annotations from multiple raters adds another layer of complexity, as differing expert opinions contribute to a broader spectrum of variability and to higher computational complexity.
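To make the notion of calibration concrete, the following is a minimal sketch of the expected calibration error (ECE), one common way to measure how far predicted confidences are from observed accuracy. It is not stated here to be the challenge's calibration metric. The function name, the number of bins and the idea of feeding it per-voxel maximum class probabilities are assumptions made for illustration.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean of |accuracy - confidence| over equal-width confidence bins.

    confidences: predicted probability of the chosen class, one value per sample
                 (for segmentation, e.g. one value per voxel; an assumption here).
    correct:     1 if the chosen class matches the reference label, else 0.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, right-inclusive, so a confidence of 1.0 falls in the last bin.
    bin_ids = np.clip(np.digitize(confidences, edges[1:-1], right=True), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return float(ece)
```

A model that predicts 0.9 confidence but is right only half of the time gets an ECE of 0.4 under this definition, while a model whose confidence matches its accuracy gets 0.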

Consequently, the development of robust algorithms capable of effectively capturing and quantifying variability and uncertainty, while also accommodating the nuances of multi-class and multi-rater scenarios, becomes imperative. Striking a balance between model calibration, accurate segmentation and handling variability in medical annotations is crucial for the success and reliability of DL-based medical image analysis.

CURVAS Challenge Goal

For these reasons, we have created a challenge that addresses all of the above. The challenge uses abdominal CT scans. Each scan has three annotations obtained from different experts, and each annotation has three classes: pancreas, kidney and liver.

The main idea is to evaluate the results while taking the multi-rater information into account. There will be three separate evaluations: first, a classical Dice score evaluation together with an uncertainty study; second, a volumetric assessment that provides clinically relevant information; finally, a study of whether the model is calibrated. All of these evaluations will consider all three annotations.
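As an illustration of a Dice evaluation that uses all three annotations, the sketch below computes the Dice score of a prediction against each rater separately and averages the result per class. The averaging strategy, the function names and the convention for empty masks are assumptions; the challenge's actual evaluation may combine the raters differently.

```python
import numpy as np

# Foreground classes and their label values, as listed in the dataset description.
CLASSES = {1: "pancreas", 2: "kidney", 3: "liver"}

def dice(pred_mask, ref_mask):
    """Dice coefficient between two binary masks."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    ref_mask = np.asarray(ref_mask, dtype=bool)
    denom = pred_mask.sum() + ref_mask.sum()
    if denom == 0:
        return 1.0  # both masks empty: counted as agreement (assumed convention)
    return 2.0 * np.logical_and(pred_mask, ref_mask).sum() / denom

def multirater_dice(pred, rater_labels):
    """Mean Dice per class over a list of rater label maps (hypothetical helper)."""
    pred = np.asarray(pred)
    scores = {}
    for label, name in CLASSES.items():
        per_rater = [dice(pred == label, np.asarray(ref) == label) for ref in rater_labels]
        scores[name] = float(np.mean(per_rater))
    return scores
```

Reporting the per-rater scores alongside the mean would also show how much of the error is attributable to disagreement between the experts rather than to the model.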

For more information about the challenge, visit our website to join CURVAS (Calibration and Uncertainty for multiRater Volume Assessment in multiorgan Segmentation). This challenge will be held at MICCAI 2024.

Dataset Cohort

The challenge cohort consists of 90 CT images prospectively gathered at the University Hospital Erlangen between August 2023 and October 2023. Each CT will have multiple classes: background (0), pancreas (1), kidney (2) and liver (3). In addition, each CT will have three annotations from three different experts, each containing the four classes specified above.
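The label encoding above can be used directly to derive organ volumes from an annotation. The sketch below assumes the annotation has already been loaded as an integer array and that the voxel spacing in millimetres is known; the file format, the loading step and the function name are not specified here and are assumptions.

```python
import numpy as np

# Label values as given in the cohort description.
LABELS = {0: "background", 1: "pancreas", 2: "kidney", 3: "liver"}

def organ_volumes_ml(label_map, spacing_mm):
    """Volume of each foreground class in millilitres.

    label_map:  integer array holding the class value of every voxel.
    spacing_mm: voxel size along each axis in millimetres (assumed to be known).
    """
    label_map = np.asarray(label_map)
    voxel_ml = float(np.prod(spacing_mm)) / 1000.0  # 1 mL = 1000 mm^3
    return {
        name: float((label_map == value).sum() * voxel_ml)
        for value, name in LABELS.items()
        if value != 0  # skip background
    }
```

Applying the same function to the three annotations of one CT gives three volume estimates per organ, whose spread is a simple indication of interrater variability.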

Training Phase cohort:

20 CT scans belonging to group A are provided with their annotations. Participants are encouraged to leverage publicly available external data annotated by multiple raters. Providing a small training set while allowing public datasets is intended to make the challenge more inclusive, since a method can be developed with data that is available to anyone. Furthermore, training on this data and evaluating on other data is intended to make methods more robust to shifts and other sources of variability between datasets.

Validation Phase cohort:

5 CT scans belonging to group A will be used for this phase.

Test Phase cohort:

65 CT scans will be used for evaluation: 20 CTs belonging to group A, 22 CTs belonging to group B and 23 CTs belonging to group C.

Neither the validation nor the test CT scans will be published until the end of the challenge. Furthermore, the group to which each CT scan belongs will not be revealed until after the challenge.

Clinical Specifications

Inclusion required a maximum of 10 cysts with a diameter of less than 2.0 cm. Furthermore, CT scans with major artifacts (e.g. breathing artifacts) or incomplete registrations were excluded.

Participants were required to be over 18 years old and provide both verbal and written consent for the use of their CT images in the Challenge. Both study-specific and broad consent were obtained. Among the 90 patients, there were 51 males and 39 females, aged between 37 and 94 years, with an average age of 65.7 years. All patients received treatment at the University Hospital Erlangen in Bavaria, Germany. No additional selection criteria were set, to ensure a representative sample of a typical patient cohort.

Our overall data consists of 90 CTs split into three groups:

Group A: cases with 2 cysts or fewer and no contour-altering pathologies - 45 CTs
Group B: cases with 3-5 cysts and no contour-altering pathologies - 22 CTs
Group C: cases with 6-10 cysts and some pathologies included (liver metastases, hydronephrosis, adrenal gland metastases, missing kidney) - 23 CTs

However, the participants will not know which case belongs to which group. This information will be released after the challenge, together with the whole dataset.
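The cohort figures quoted in the training, validation and test descriptions can be cross-checked against the group sizes with a few lines of arithmetic. The sketch below only restates the counts given above; the dictionary layout and function names are hypothetical.

```python
# CT counts per phase and group, as stated in the cohort description.
SPLITS = {
    "training":   {"A": 20, "B": 0,  "C": 0},
    "validation": {"A": 5,  "B": 0,  "C": 0},
    "test":       {"A": 20, "B": 22, "C": 23},
}

def totals_per_split(splits):
    """Number of CTs in each phase."""
    return {phase: sum(groups.values()) for phase, groups in splits.items()}

def totals_per_group(splits):
    """Number of CTs in each group, summed over all phases."""
    totals = {}
    for groups in splits.values():
        for group, count in groups.items():
            totals[group] = totals.get(group, 0) + count
    return totals

print(totals_per_split(SPLITS))
print(totals_per_group(SPLITS))
```

The per-phase totals come to 20, 5 and 65 CTs, and the per-group totals to 45, 22 and 23 CTs, both of which add up to the 90 CTs of the full cohort.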

Annotation Protocol

The first step in obtaining the labels was to use the TotalSegmentator [1] [2] to get rough annotations. The labels were then sent to three radiologists (R1, R2, R3), who corrected the automatic annotations and added any missing organs. One of the three labeling radiologists, the MD PhD candidate, had previously defined both the dataset cohort and the criteria for what belongs to the parenchyma and what does not. These criteria were given to the other two labeling radiologists so that all three followed the same rules and remained coherent with each other [3]. Separately, two other clinicians (C1, C2) supervised the cohort criteria defined by the MD PhD candidate but had no involvement in the labeling itself; hence, there is no bias between the annotations of the different radiologists.

Each labeled class for this challenge has specific instructions. They are listed below per organ.

Liver:
Generally speaking, we define the liver 'as the entire liver tissue including all internal structures like vessel systems, tumors etc.' [4] Thus, the portal vein itself is excluded from contouring. The two main branches of the portal vein are excluded from the segmentation. Any branch of the following generations is included. 'In case of partial enclosure (occurring where large vessels as Vena Cava and portal vein enter or leave the liver), the parts enclosed by liver tissue are included in the segmentation, thus forming the convex hull of the liver shape.' [4] Any fatty tissue that pulls into the liver is excluded. The gallbladder should not be marked. Wide and especially pathologically widened bile ducts are included in the segmentation of the liver.
Kidney:
The right and left kidney will be segmented. The segmentation will include the kidney parenchyma, including the renal medulla. The renal pelvis [5] and the ureter are excluded, as urinary stasis could alter the original volume.
Pancreas:
When segmenting the pancreas, we will not differentiate between head, body and tail. Moreover, neither the splenic vein nor the mesenteric vein will be included in the segmentation [6]. However, it is important that the whole pancreas is tracked and marked along its course.

Technical Specifications

The CTs used needed to be contrast-enhanced CT scans in a portal venous phase, acquired with thin slices ranging from 0.6 to 1 mm. Thoracic-abdominal CT images were taken during the patients' hospital stay, motivated by various medical needs. Given the focus on abdominal organs, the Br40 soft kernel was employed. CT examinations were conducted using SIEMENS CT scanners at the University Hospital Erlangen, with rotation times of 0.25 or 0.5 s. Detector collimation varied from 128x0.6 mm single source to 98x0.6x2 and 144x0.4x2 dual source configurations. Spiral pitch factors ranged from 0.3 to 1.3. The mean reference tube current was set at 200 mAs, adjustable to 120 mAs. Automated tube voltage adaptation and tube current modulation were implemented in all instances. Contrast agent administration was standard practice, with an injection rate of 3-4 mL/s and a body weight-adjusted dosage of 400 mg(iodine)/kg (equivalent to 1.14 mL/kg of Iomeprol 350 mg/mL). All images underwent reconstruction using soft convolution kernels and iterative techniques.
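The stated equivalence between the iodine dosage and the contrast agent volume follows from dividing the dose by the concentration: 400 mg/kg divided by 350 mg/mL is roughly 1.14 mL/kg. The sketch below reproduces this arithmetic. The function names and the 70 kg body weight in the usage lines are purely illustrative assumptions, not values from the acquisition protocol.

```python
def contrast_volume_ml(body_weight_kg, iodine_dose_mg_per_kg=400.0,
                       concentration_mg_per_ml=350.0):
    """Contrast agent volume for a body weight-adjusted iodine dose."""
    return body_weight_kg * iodine_dose_mg_per_kg / concentration_mg_per_ml

def injection_time_s(volume_ml, rate_ml_per_s):
    """Time needed to inject a volume at a constant rate."""
    return volume_ml / rate_ml_per_s

# Volume per kilogram of body weight, matching the 1.14 mL/kg quoted above.
print(round(contrast_volume_ml(1.0), 2))

# Illustrative 70 kg patient (hypothetical value) at the two quoted injection rates.
volume = contrast_volume_ml(70.0)
print(volume, injection_time_s(volume, 3.0), injection_time_s(volume, 4.0))
```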

Ethical Approval and Data Usage Agreement

The collection of data for the datasets involved in this challenge has been approved by an ethics committee (number 23-243-B) at the Universitätsklinikum Erlangen. The data used during and after the challenge is pseudonymized and coded by the Hospital to ensure that re-identification of a data sample is not possible. Moreover, the patient information is known only to the IP of the Hospital, so the challenge collaborators also have no means to identify patients' data at any point.

The data usage agreement for this challenge is CC BY-NC (Attribution-NonCommercial).

References

[1] Wasserthal, J., Breit, H.-C., Meyer, M. T., Pradella, M., Hinck, D., Sauter, A. W., Heye, T., Boll, D. T., Cyriac, J., Yang, S., Bach, M., & Segeroth, M. (2023). TotalSegmentator: Robust Segmentation of 104 Anatomic Structures in CT Images. Radiology: Artificial Intelligence, 5(5), e230024. https://doi.org/10.1148/ryai.230024

[2] Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211.

[3] Rädsch, T., Reinke, A., Weru, V. et al. Labelling instructions matter in biomedical image analysis. Nat Mach Intell 5, 273-283 (2023). https://doi.org/10.1038/s42256-023-00625-5

[4] Heimann T, van Ginneken B, Styner MA, Arzhaeva Y, Aurich V, Bauer C, Beck A, Becker C, Beichel R, Bekes G, Bello F, Binnig G, Bischof H, Bornik A, Cashman PM, Chi Y, Cordova A, Dawant BM, Fidrich M, Furst JD, Furukawa D, Grenacher L, Hornegger J, Kainmüller D, Kitney RI, Kobatake H, Lamecker H, Lange T, Lee J, Lennon B, Li R, Li S, Meinzer HP, Nemeth G, Raicu DS, Rau AM, van Rikxoort EM, Rousson M, Rusko L, Saddi KA, Schmidt G, Seghers D, Shimizu A, Slagmolen P, Sorantin E, Soza G, Susomboon R, Waite JM, Wimmer A, Wolf I. Comparison and evaluation of methods for liver segmentation from CT datasets. IEEE Trans Med Imaging. 2009 Aug;28(8):1251-65. doi: 10.1109/TMI.2009.2013851. Epub 2009 Feb 10. PMID: 19211338.

[5] Brachmann, Franz Xaver. Evaluation einer semi-automatischen Segmentierungsmethode für die Volumetrie des renalen Kortex, der Medulla und des gesamten Nierenparenchyms in nativen T1-gewichteten MR-Bildern. PhD thesis, Medizinische Fakultät der Friedrich-Alexander-Universität Erlangen-Nürnberg, 2021. https://open.fau.de/items/cb6cd403-3178-4884-98c4-5098178658ba

[6] Westenberger, Jasmin Barbara. Automatische Gewebesegmentierung der Nieren und des Pankreas im Ganzkörper-MRT mittels Deep Learning. PhD thesis, Eberhard Karls Universität Tübingen, 2021. http://dx.doi.org/10.15496/publikation-63135

The challenge has been co-funded by Proyectos de Colaboración Público-Privada (CPP2021-008364), funded by MCIN/AEI, and the European Union through the NextGenerationEU/PRTR.

Files

Files (6.3 GB)

Name	Size	MD5
training_set.zip	6.3 GB	758348f777f15bc65be16732b4397932
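Given the size of the archive, it can be worth checking the download against the MD5 checksum listed for training_set.zip before unpacking it. The following is a minimal sketch using Python's standard hashlib module; the function names and the local file path in the usage comment are assumptions.

```python
import hashlib

# Checksum published in the file listing for training_set.zip.
EXPECTED_MD5 = "758348f777f15bc65be16732b4397932"

def md5_of_file(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path, expected=EXPECTED_MD5):
    """True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected

# Usage, assuming the archive was saved in the current directory:
# print(verify_download("training_set.zip"))
```

A result of False indicates a corrupted or incomplete download, in which case the archive should be fetched again.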

Additional details

Repository URL: https://github.com/SYCAI-Technologies/curvas-challenge
Programming language: Python
Development Status: Wip

