Published April 1, 2023 | Version 0.1
Dataset Open

SynthRAD2023 Grand Challenge dataset: synthetizing computed tomography for radiotherapy

  • 1. UMC Groningen
  • 2. Radboud UMC
  • 3. UMC Utrecht

Description

DATASET STRUCTURE

The dataset can be downloaded from https://doi.org/10.5281/zenodo.7260705, and a detailed description is provided in "synthRAD2023_dataset_description.pdf".

The training data for Task 1 is in Task1.zip and the training data for Task 2 is in Task2.zip. After unzipping, each task is organized according to the following folder structure:

Task1.zip/
└── Task1
    ├── brain
    │   ├── 1Bxxxx
    │   │   ├── mr.nii.gz
    │   │   ├── ct.nii.gz
    │   │   └── mask.nii.gz
    │   ├── ...
    │   └── overview
    │       ├── 1_brain_train.xlsx
    │       ├── 1Bxxxx_train.png
    │       └── ...
    └── pelvis
        ├── 1Pxxxx
        │   ├── mr.nii.gz
        │   ├── ct.nii.gz
        │   └── mask.nii.gz
        ├── ...
        └── overview
            ├── 1_pelvis_train.xlsx
            ├── 1Pxxxx_train.png
            └── ...

Task2.zip/
└── Task2
    ├── brain
    │   ├── 2Bxxxx
    │   │   ├── cbct.nii.gz
    │   │   ├── ct.nii.gz
    │   │   └── mask.nii.gz
    │   ├── ...
    │   └── overview
    │       ├── 2_brain_train.xlsx
    │       ├── 2Bxxxx_train.png
    │       └── ...
    └── pelvis
        ├── 2Pxxxx
        │   ├── cbct.nii.gz
        │   ├── ct.nii.gz
        │   └── mask.nii.gz
        ├── ...
        └── overview
            ├── 2_pelvis_train.xlsx
            ├── 2Pxxxx_train.png
            └── ...

Each patient folder has a unique name that contains information about the task, anatomy, center and a patient ID. The naming follows the convention below:

[Task]    [Anatomy]    [Center]    [PatientID]

1    B        A        001
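The naming convention can be decoded programmatically. The following is a minimal Python sketch, assuming that the four fields are concatenated without separators (as in 1Bxxxx), that P denotes pelvis (inferred from the folder tree), and that the patient ID consists of digits only. The names PatientFolder and parse_patient_folder are hypothetical and not part of the dataset release.

```python
import re
from typing import NamedTuple


class PatientFolder(NamedTuple):
    task: int        # 1 (MR to CT) or 2 (CBCT to CT)
    anatomy: str     # "brain" or "pelvis"
    center: str      # anonymized center letter: A, B or C
    patient_id: str  # kept as a string to preserve leading zeros


# Assumed layout: [Task][Anatomy][Center][PatientID], e.g. "1BA001".
_PATTERN = re.compile(r"^(?P<task>[12])(?P<anatomy>[BP])(?P<center>[ABC])(?P<pid>\d+)$")
_ANATOMY = {"B": "brain", "P": "pelvis"}


def parse_patient_folder(name: str) -> PatientFolder:
    """Split a patient folder name into its four components."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized patient folder name: {name!r}")
    return PatientFolder(
        task=int(match["task"]),
        anatomy=_ANATOMY[match["anatomy"]],
        center=match["center"],
        patient_id=match["pid"],
    )


print(parse_patient_folder("1BA001"))
```

Keeping the patient ID as a string avoids losing the leading zeros when the name is reassembled later.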

In each patient folder, three files can be found: 

  • ct.nii.gz: CT image 

  • mr.nii.gz or cbct.nii.gz (depending on the task): MR or CBCT image

  • mask.nii.gz: image containing a binary mask of the dilated patient outline
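After unzipping, it can be useful to confirm that every patient folder holds the three expected files. The sketch below uses only the Python standard library and relies on the folder layout described above; the function names are hypothetical, and the check looks at file names only, not at image contents.

```python
from pathlib import Path
from typing import Dict, Set


def expected_files(task: int) -> Set[str]:
    """Return the three file names expected in a patient folder of the given task."""
    source = {1: "mr.nii.gz", 2: "cbct.nii.gz"}[task]
    return {source, "ct.nii.gz", "mask.nii.gz"}


def missing_files(patient_dir: Path, task: int) -> Set[str]:
    """Return the expected file names that are absent from one patient folder."""
    present = {p.name for p in patient_dir.iterdir() if p.is_file()}
    return expected_files(task) - present


def incomplete_patients(anatomy_dir: Path, task: int) -> Dict[str, Set[str]]:
    """Map each incomplete patient folder under e.g. Task1/brain to its missing files."""
    report: Dict[str, Set[str]] = {}
    for patient_dir in sorted(anatomy_dir.iterdir()):
        # The overview folder sits next to the patient folders and is skipped.
        if not patient_dir.is_dir() or patient_dir.name == "overview":
            continue
        missing = missing_files(patient_dir, task)
        if missing:
            report[patient_dir.name] = missing
    return report
```

For example, incomplete_patients(Path("Task1/brain"), task=1) returns an empty dictionary when every patient folder is complete; the path here is an assumption about where the archive was extracted.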

For each task and anatomy, an overview folder is provided which contains the following files:

  • [task]_[anatomy]_train.xlsx: This file contains information about the image acquisition protocol for each patient.

  • [task][anatomy][center][PatientID]_train.png: For each patient, a PNG is provided that shows axial, coronal and sagittal slices of the CBCT/MR, the CT, the mask, and the difference between CBCT/MR and CT. These images are meant to provide a quick visual overview of the data.

DATASET DESCRIPTION

This challenge dataset contains imaging data of patients who underwent radiotherapy in the brain or pelvis region. Overall, the population is predominantly adult, and no gender restrictions were applied during data collection. For task 1, the inclusion criterion was the acquisition of a CT and an MRI during treatment planning; for task 2, a CT and a CBCT used for patient positioning were required. The datasets for tasks 1 and 2 do not necessarily contain the same patients, given the different image acquisitions required for each task.

Data was collected at 3 Dutch university medical centers:

  • Radboud University Medical Center

  • University Medical Center Utrecht

  • University Medical Center Groningen

For anonymization purposes, from here on, institution names are substituted with A, B and C, without specifying which institute each letter refers to.

The following number of patients is available in the training set.

Training

          Brain                                  Pelvis
          Center A  Center B  Center C  Total    Center A  Center B  Center C  Total
Task 1    60        60        60        180      120       0         60        180
Task 2    60        60        60        180      60        60        60        180

Each subset generally contains an equal number of patients from each center, except for task 1 pelvis, where center B had no MR scans available (see the table above). To compensate for this, center A provided twice as many patients as in the other subsets.

Validation

          Brain                                  Pelvis
          Center A  Center B  Center C  Total    Center A  Center B  Center C  Total
Task 1    10        10        10        30       20        0         10        30
Task 2    10        10        10        30       10        10        10        30

Testing

          Brain                                  Pelvis
          Center A  Center B  Center C  Total    Center A  Center B  Center C  Total
Task 1    20        20        20        60       40        0         20        60
Task 2    20        20        20        60       20        20        20        60

In total, for all tasks and anatomies combined, 1080 image pairs (720 training, 120 validation, 240 testing) are available in this dataset. This repository only contains the training data.
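The totals quoted above follow directly from the per-center counts in the three tables. The short Python check below copies those counts (centers A, B, C) and sums them; the variable and function names are only illustrative.

```python
# Patients per center (A, B, C), copied from the training, validation and testing tables.
PATIENTS = {
    "training": {
        ("Task 1", "brain"): (60, 60, 60),
        ("Task 1", "pelvis"): (120, 0, 60),
        ("Task 2", "brain"): (60, 60, 60),
        ("Task 2", "pelvis"): (60, 60, 60),
    },
    "validation": {
        ("Task 1", "brain"): (10, 10, 10),
        ("Task 1", "pelvis"): (20, 0, 10),
        ("Task 2", "brain"): (10, 10, 10),
        ("Task 2", "pelvis"): (10, 10, 10),
    },
    "testing": {
        ("Task 1", "brain"): (20, 20, 20),
        ("Task 1", "pelvis"): (40, 0, 20),
        ("Task 2", "brain"): (20, 20, 20),
        ("Task 2", "pelvis"): (20, 20, 20),
    },
}


def split_totals(patients):
    """Sum the image pairs of every task/anatomy subset within each split."""
    return {
        split: sum(sum(per_center) for per_center in subsets.values())
        for split, subsets in patients.items()
    }


totals = split_totals(PATIENTS)
print(totals)                # {'training': 720, 'validation': 120, 'testing': 240}
print(sum(totals.values()))  # 1080
```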

All images were acquired with the clinically used scanners and imaging protocols of the respective centers and reflect typical images found in clinical routine. As a result, imaging protocols and scanners can vary between patients. A detailed description of the imaging protocol for each image can be found in the spreadsheets that are part of the dataset release (see dataset structure).

Data was acquired with the following scanners:

  • Center A:

    • MRI: Philips Ingenia 1.5T/3.0T

    • CT: Philips Brilliance Big Bore or Siemens Biograph20 PET-CT

    • CBCT: Elekta XVI

  • Center B:

    • MRI: Siemens MAGNETOM Aera 1.5T or MAGNETOM Avanto_fit 1.5T

    • CT: Siemens SOMATOM Definition AS

    • CBCT: IBA Proteus+ or Elekta XVI

  • Center C:

    • MRI: Siemens Avanto fit 1.5T or Siemens MAGNETOM Vida fit 3.0T

    • CT: Philips Brilliance Big Bore

    • CBCT: Elekta XVI

For task 1, MRIs were acquired with a T1-weighted gradient echo or an inversion-prepared turbo field echo (TFE) sequence and collected along with the corresponding planning CTs for all subjects. The exact acquisition parameters vary between patients and centers. For centers B and C, the selected MRIs were acquired with Gadolinium contrast, while the selected MRIs of center A were acquired without contrast.

For task 2, the CBCTs used for image-guided radiotherapy to ensure accurate patient positioning were selected for all subjects, along with the corresponding planning CTs.

The following pre-processing steps were performed on the data:

  • Conversion from DICOM to compressed NIfTI (nii.gz)

  • Rigid registration between CT and MR/CBCT

  • Anonymization (face removal, only for brain patients)

  • Patient outline segmentation (provided as a binary mask)

  • Crop MR/CBCT, CT and mask to remove background and reduce file sizes

The code used to preprocess the images can be found at: https://github.com/SynthRAD2023/. Detailed information about the dataset is provided in SynthRAD2023_dataset_description.pdf, published here along with the data, and will also be submitted to Medical Physics.

ETHICAL APPROVAL

Each institution received ethical approval from their internal review board/Medical Ethical committee:

  • UMC Utrecht approved not-WMO on 4/03/2022 with number 22/474 entitled: “Synthetizing computed tomography for radiotherapy Grand Challenge (SynthRAD)”.

  • UMC Groningen approved not-WMO on 20/07/2022 with number 202200310 entitled: “Synthesizing computed tomography for radiotherapy - Grand Challenge”.

  • Radboud UMC declared the study not-WMO on 17/10/2022 with number 2022-15950 entitled “Synthetizing computed tomography for radiotherapy Grand Challenge”.

CHALLENGE DESIGN

The overall challenge design can be found at https://doi.org/10.5281/zenodo.7746020

Notes

FUNDING BODIES: The challenge was funded by the Seed Fund of the "EWUU Alliance TU/e, WUR, UU, UMCU" https://ewuu.nl/en/collaboration/seed-fund/.

Files

synthRAD2023_dataset_article.pdf

Files (25.4 GB)

Checksum                                 Size
md5:90e596aa18fafba1c34f6376644bdb9c     2.0 MB
md5:360bc61e2320d5c5a7145454a81c566a     14.5 GB
md5:6d6d90d9851cc424356ef0c0564c92c5     10.9 GB
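The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard hashlib module; the function names are hypothetical, and it accepts the checksum with or without the md5: prefix. Files are read in chunks so that archives of several gigabytes do not have to fit in memory.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: Path, expected: str) -> bool:
    """Compare a file's MD5 digest with an expected value such as 'md5:90e5...'."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5sum(path) == expected_hex
```

A call such as verify_md5(Path("Task1.zip"), "md5:...") returns True when the local file matches; the checksum to pass is the one shown next to that file on the download page, and the local path is an assumption about where the file was saved.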

Additional details