SynthRAD2023 Grand Challenge dataset: synthetizing computed tomography for radiotherapy
- 1. UMC Groningen
- 2. Radboud UMC
- 3. UMC Utrecht
Contributors
Contact person:
Data collector:
Data curator:
Description
DATASET STRUCTURE
The dataset can be downloaded from https://doi.org/10.5281/zenodo.7260705, and a detailed description is provided in "synthRAD2023_dataset_description.pdf".
The training data for Task 1 are in Task1.zip and those for Task 2 are in Task2.zip. After unzipping, each task is organized according to the following folder structure:
Task1.zip/
└── Task1
    ├── brain
    │   ├── 1Bxxxx
    │   │   ├── mr.nii.gz
    │   │   ├── ct.nii.gz
    │   │   └── mask.nii.gz
    │   ├── ...
    │   └── overview
    │       ├── 1_brain_train.xlsx
    │       ├── 1Bxxxx_train.png
    │       └── ...
    └── pelvis
        ├── 1Pxxxx
        │   ├── mr.nii.gz
        │   ├── ct.nii.gz
        │   └── mask.nii.gz
        ├── ...
        └── overview
            ├── 1_pelvis_train.xlsx
            ├── 1Pxxxx_train.png
            └── ...
Task2.zip/
└── Task2
    ├── brain
    │   ├── 2Bxxxx
    │   │   ├── cbct.nii.gz
    │   │   ├── ct.nii.gz
    │   │   └── mask.nii.gz
    │   ├── ...
    │   └── overview
    │       ├── 2_brain_train.xlsx
    │       ├── 2Bxxxx_train.png
    │       └── ...
    └── pelvis
        ├── 2Pxxxx
        │   ├── cbct.nii.gz
        │   ├── ct.nii.gz
        │   └── mask.nii.gz
        ├── ...
        └── overview
            ├── 2_pelvis_train.xlsx
            ├── 2Pxxxx_train.png
            └── ...
Each patient folder has a unique name that encodes the task, anatomy, center and patient ID, following the convention below:
[Task] [Anatomy] [Center] [PatientID]
1      B         A        001
For example, folder 1BA001 contains patient 001 of Task 1, brain, center A.
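The naming convention can be decoded programmatically, e.g. to group patients by center. The sketch below assumes, based on the example above and the 1Bxxxx/1Pxxxx folder names, that a name is one task digit (1 or 2), one anatomy letter (B or P), one center letter (A, B or C) and a three-digit patient ID; the names `parse_patient_folder` and `PatientId` are hypothetical and not part of any official SynthRAD2023 tooling.

```python
import re
from typing import NamedTuple

# Assumed pattern: task digit, anatomy letter (B/P), center letter (A/B/C),
# three-digit patient ID, e.g. "1BA001".
FOLDER_PATTERN = re.compile(r"^([12])([BP])([ABC])(\d{3})$")
ANATOMY = {"B": "brain", "P": "pelvis"}


class PatientId(NamedTuple):
    task: int
    anatomy: str
    center: str
    patient: str


def parse_patient_folder(name: str) -> PatientId:
    """Split a patient folder name such as '1BA001' into its four fields."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a SynthRAD2023-style folder name: {name!r}")
    task, anatomy, center, patient = match.groups()
    return PatientId(int(task), ANATOMY[anatomy], center, patient)


print(parse_patient_folder("1BA001"))
# PatientId(task=1, anatomy='brain', center='A', patient='001')
```

Names that do not match the pattern, such as the `overview` folder, raise a `ValueError`, which makes it easy to skip them when iterating over a directory.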
Each patient folder contains three files:
- ct.nii.gz: CT image
- mr.nii.gz or cbct.nii.gz (depending on the task): MR or CBCT image
- mask.nii.gz: binary mask of the dilated patient outline
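Once the archives are extracted, the layout above can be traversed with the standard library alone. This is a minimal sketch assuming the folder structure shown earlier (anatomy folders directly under the task folder, an `overview` folder to skip, and either `mr.nii.gz` or `cbct.nii.gz` per patient); the function `collect_pairs` and its dictionary keys `source`, `ct` and `mask` are hypothetical names chosen for illustration.

```python
from pathlib import Path


def collect_pairs(task_dir: Path) -> dict[str, dict[str, Path]]:
    """Map each patient folder under <task_dir>/<anatomy>/ to its image files.

    Assumes every patient folder holds ct.nii.gz, mask.nii.gz and either
    mr.nii.gz (Task 1) or cbct.nii.gz (Task 2); 'overview' folders are skipped.
    """
    pairs: dict[str, dict[str, Path]] = {}
    for anatomy_dir in sorted(p for p in task_dir.iterdir() if p.is_dir()):
        for patient_dir in sorted(p for p in anatomy_dir.iterdir() if p.is_dir()):
            if patient_dir.name == "overview":
                continue
            # Task 1 folders contain an MR, Task 2 folders a CBCT.
            source = patient_dir / "mr.nii.gz"
            if not source.exists():
                source = patient_dir / "cbct.nii.gz"
            files = {
                "source": source,
                "ct": patient_dir / "ct.nii.gz",
                "mask": patient_dir / "mask.nii.gz",
            }
            missing = [str(f) for f in files.values() if not f.exists()]
            if missing:
                raise FileNotFoundError(f"incomplete patient folder: {missing}")
            pairs[patient_dir.name] = files
    return pairs
```

For example, `collect_pairs(Path("Task1"))` would return one entry per patient folder found under `Task1/brain` and `Task1/pelvis`, assuming `Task1` is the extracted folder in the current directory.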
For each task and anatomy, an overview folder is provided that contains the following files:
- [task]_[anatomy]_train.xlsx: information about the image acquisition protocol for each patient.
- [task][anatomy][center][PatientID]_train.png: for each patient, a PNG showing axial, coronal and sagittal slices of the CBCT/MR, the CT, the mask and the difference between CBCT/MR and CT. These images are meant to provide a quick visual overview of the data.
DATASET DESCRIPTION
This challenge dataset contains imaging data of patients who underwent radiotherapy in the brain or pelvis region. The population is predominantly adult, and no gender restrictions were applied during data collection. For Task 1, the inclusion criterion was the acquisition of a CT and an MRI for treatment planning; for Task 2, a CT and a CBCT acquired for patient positioning were required. The Task 1 and Task 2 datasets do not necessarily contain the same patients, because the two tasks require different image acquisitions.
Data were collected at three Dutch university medical centers:
- Radboud University Medical Center
- University Medical Center Utrecht
- University Medical Center Groningen
For anonymization purposes, the institutions are referred to below as A, B and C, without specifying which letter corresponds to which institution.
The number of patients available in each subset is listed below.
Training
| | Brain, center A | Brain, center B | Brain, center C | Brain, total | Pelvis, center A | Pelvis, center B | Pelvis, center C | Pelvis, total |
|---|---|---|---|---|---|---|---|---|
| Task 1 | 60 | 60 | 60 | 180 | 120 | 0 | 60 | 180 |
| Task 2 | 60 | 60 | 60 | 180 | 60 | 60 | 60 | 180 |
Each subset generally contains equal numbers of patients from each center, except for Task 1 pelvis, where center B had no MR scans available. To compensate for this, center A provided twice as many patients as in the other subsets.
Validation
| | Brain, center A | Brain, center B | Brain, center C | Brain, total | Pelvis, center A | Pelvis, center B | Pelvis, center C | Pelvis, total |
|---|---|---|---|---|---|---|---|---|
| Task 1 | 10 | 10 | 10 | 30 | 20 | 0 | 10 | 30 |
| Task 2 | 10 | 10 | 10 | 30 | 10 | 10 | 10 | 30 |
Testing
| | Brain, center A | Brain, center B | Brain, center C | Brain, total | Pelvis, center A | Pelvis, center B | Pelvis, center C | Pelvis, total |
|---|---|---|---|---|---|---|---|---|
| Task 1 | 20 | 20 | 20 | 60 | 40 | 0 | 20 | 60 |
| Task 2 | 20 | 20 | 20 | 60 | 20 | 20 | 20 | 60 |
In total, across all tasks and anatomies, the challenge comprises 1080 image pairs (720 training, 120 validation, 240 testing). This repository contains only the training data.
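The totals can be checked against the three tables by summing the per-center counts. The dictionary below simply transcribes the table values; the variable names `COUNTS` and `split_totals` are illustrative only.

```python
# Patients per center (A, B, C), transcribed from the training, validation
# and testing tables; keys are (task, anatomy).
COUNTS = {
    "training": {
        (1, "brain"): (60, 60, 60), (1, "pelvis"): (120, 0, 60),
        (2, "brain"): (60, 60, 60), (2, "pelvis"): (60, 60, 60),
    },
    "validation": {
        (1, "brain"): (10, 10, 10), (1, "pelvis"): (20, 0, 10),
        (2, "brain"): (10, 10, 10), (2, "pelvis"): (10, 10, 10),
    },
    "testing": {
        (1, "brain"): (20, 20, 20), (1, "pelvis"): (40, 0, 20),
        (2, "brain"): (20, 20, 20), (2, "pelvis"): (20, 20, 20),
    },
}

# Sum over centers, then over the four task/anatomy subsets of each split.
split_totals = {
    split: sum(sum(per_center) for per_center in subsets.values())
    for split, subsets in COUNTS.items()
}
print(split_totals)                # {'training': 720, 'validation': 120, 'testing': 240}
print(sum(split_totals.values()))  # 1080
```

Each task/anatomy subset contributes 180 training, 30 validation and 60 testing pairs, so the four subsets give 720, 120 and 240 pairs respectively.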
All images were acquired with the clinically used scanners and imaging protocols of the respective centers and reflect typical images found in clinical routine. As a result, imaging protocols and scanners can vary between patients. A detailed description of the imaging protocol for each image can be found in the spreadsheets that are part of the dataset release (see DATASET STRUCTURE).
Data were acquired with the following scanners:
- Center A:
  - MRI: Philips Ingenia 1.5T/3.0T
  - CT: Philips Brilliance Big Bore or Siemens Biograph20 PET-CT
  - CBCT: Elekta XVI
- Center B:
  - MRI: Siemens MAGNETOM Aera 1.5T or MAGNETOM Avanto_fit 1.5T
  - CT: Siemens SOMATOM Definition AS
  - CBCT: IBA Proteus+ or Elekta XVI
- Center C:
  - MRI: Siemens Avanto fit 1.5T or Siemens MAGNETOM Vida fit 3.0T
  - CT: Philips Brilliance Big Bore
  - CBCT: Elekta XVI
For Task 1, MRIs were acquired with a T1-weighted gradient-echo or an inversion-prepared turbo field echo (TFE) sequence and collected together with the corresponding planning CT for all subjects. The exact acquisition parameters vary between patients and centers. For centers B and C, the selected MRIs were acquired with gadolinium contrast, while those of center A were acquired without contrast.
For Task 2, the CBCTs acquired during image-guided radiotherapy to ensure accurate patient positioning were selected for all subjects, together with the corresponding planning CT.
The following preprocessing steps were performed on the data:
- Conversion from DICOM to compressed NIfTI (.nii.gz)
- Rigid registration between CT and MR/CBCT
- Anonymization (face removal, for brain patients only)
- Patient outline segmentation (provided as a binary mask)
- Cropping of MR/CBCT, CT and mask to remove background and reduce file size
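The cropping step can be illustrated with NumPy. This is a minimal sketch under the assumption that the crop is the bounding box of the nonzero mask voxels, optionally padded by a margin; the function `crop_to_mask` and its `margin` parameter are hypothetical, and the preprocessing actually applied to the dataset is the code published in the SynthRAD2023 GitHub organization.

```python
import numpy as np


def crop_to_mask(mask: np.ndarray, *images: np.ndarray, margin: int = 0):
    """Crop `mask` and each image to the bounding box of the nonzero mask voxels.

    `margin` (hypothetical) pads the box by that many voxels on every side,
    clipped to the array bounds. All images must share the mask's shape.
    """
    for img in images:
        if img.shape != mask.shape:
            raise ValueError("image and mask shapes differ")
    coords = np.nonzero(mask)  # one index array per axis
    if coords[0].size == 0:
        raise ValueError("mask is empty")
    # Half-open slice per axis covering [min, max] of the mask, plus margin.
    box = tuple(
        slice(max(int(c.min()) - margin, 0), min(int(c.max()) + 1 + margin, dim))
        for c, dim in zip(coords, mask.shape)
    )
    return mask[box], tuple(img[box] for img in images)


# Toy example: a 6x6x6 volume with a rectangular "patient" region.
mask = np.zeros((6, 6, 6), dtype=np.uint8)
mask[2:4, 1:5, 3:6] = 1
ct = np.zeros(mask.shape, dtype=np.float32)
cropped_mask, (cropped_ct,) = crop_to_mask(mask, ct, margin=1)
print(cropped_mask.shape, cropped_ct.shape)  # (4, 6, 4) (4, 6, 4)
```

Applying the same bounding box to the mask, the CT and the MR/CBCT keeps the registered volumes voxel-aligned after cropping.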
The code used to preprocess the images is available at https://github.com/SynthRAD2023/. Detailed information about the dataset is provided in SynthRAD2023_dataset_description.pdf, published here along with the data, and will also be submitted to Medical Physics.
ETHICAL APPROVAL
Each institution received ethical approval from its institutional review board / medical ethics committee:
- UMC Utrecht approved the study as not-WMO on 4/03/2022 with number 22/474, entitled: “Synthetizing computed tomography for radiotherapy Grand Challenge (SynthRAD)”.
- UMC Groningen approved the study as not-WMO on 20/07/2022 with number 202200310, entitled: “Synthesizing computed tomography for radiotherapy - Grand Challenge”.
- Radboud UMC declared the study not-WMO on 17/10/2022 with number 2022-15950, entitled: “Synthetizing computed tomography for radiotherapy Grand Challenge”.
CHALLENGE DESIGN
The overall challenge design can be found at https://doi.org/10.5281/zenodo.7746020.
Files
synthRAD2023_dataset_article.pdf
Additional details
Related works
- Cites
  - Software: https://github.com/SynthRAD2023/ (URL)
  - Other: https://ewuu.nl/en/collaboration/seed-fund/ (URL)
  - Other: 10.5281/zenodo.7746020 (DOI)
- Is cited by
  - Other: https://synthrad2023.grand-challenge.org/ (URL)
  - Software: https://github.com/SynthRAD2023/ (URL)