LoDoPaB-CT Dataset

Leuschner, Johannes; Schmidt, Maximilian; Otero Baguer, Daniel

doi:10.5281/zenodo.3384092

Published October 4, 2019 | Version 1.0.0

Dataset Open


1. University of Bremen

A Benchmark Dataset for Low-Dose CT Reconstruction Methods.

The following Data Descriptor article provides full documentation:

Leuschner, J., Schmidt, M., Baguer, D.O. et al. LoDoPaB-CT, a benchmark dataset for low-dose computed tomography reconstruction. Sci Data 8, 109 (2021). https://www.nature.com/articles/s41597-021-00893-z

The Python library DIVαℓ (github.com/jleuschn/dival) can be used to download and access the dataset.

Reconstructions from the LIDC/IDRI dataset are used as a basis for this dataset.

The ZIP files included in the LoDoPaB dataset contain multiple HDF5 files. Each HDF5 file contains one HDF5 dataset named "data", which provides a number of samples (128, except for the last file in each ZIP file). For example, the n-th training sample pair (counting from n = 0) is stored in the files "observation_train_%03d.hdf5" and "ground_truth_train_%03d.hdf5", where "%03d" is floor(n / 128) written as a zero-padded three-digit number, at row (n mod 128) of "data".
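The file and row lookup described above can be sketched as follows. This is a minimal illustration, not part of the dataset tooling: the function names `locate_sample` and `load_pair` are hypothetical, the index n is assumed to be zero-based, and `load_pair` assumes that the ZIP files have been extracted into a single directory and that the h5py package is installed.

```python
import os


def locate_sample(part, n, samples_per_file=128):
    """Return (observation file, ground truth file, row) for zero-based sample n.

    `part` is "train", "validation" or "test".
    """
    # floor(n / 128) selects the file, n mod 128 selects the row of "data".
    file_index, row = divmod(n, samples_per_file)
    observation_file = "observation_%s_%03d.hdf5" % (part, file_index)
    ground_truth_file = "ground_truth_%s_%03d.hdf5" % (part, file_index)
    return observation_file, ground_truth_file, row


def load_pair(directory, part, n):
    """Read one (observation, ground truth) pair from extracted HDF5 files.

    Assumes all HDF5 files of the part are located in `directory`.
    """
    import h5py  # imported lazily so that locate_sample works without h5py

    observation_file, ground_truth_file, row = locate_sample(part, n)
    with h5py.File(os.path.join(directory, observation_file), "r") as f:
        observation = f["data"][row]
    with h5py.File(os.path.join(directory, ground_truth_file), "r") as f:
        ground_truth = f["data"][row]
    return observation, ground_truth


# Sample 300 of the training part: file 002, row 44.
print(locate_sample("train", 300))
```

The lookup itself needs only integer arithmetic, so it can be used to plan random access or sharding without opening any file.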

Note: the last ground truth file of each part (i.e. ground_truth_train_279.hdf5, ground_truth_validation_027.hdf5 and ground_truth_test_027.hdf5) still contains an HDF5 dataset of shape (128, 362, 362), although it contains fewer than 128 valid samples. Thus, the number of valid samples needs to be determined either from the total number of samples in the part (i.e. "train": 35820, "validation": 3522, "test": 3553) or from the corresponding observation file, for which the first dimension of the HDF5 dataset matches the number of valid samples in the file.
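As a worked example of the first option, the number of valid samples per file can be derived from the part totals alone. The sketch below uses only the sample counts quoted above; the names `NUM_SAMPLES`, `num_files` and `valid_samples_in_file` are hypothetical helpers introduced for illustration.

```python
# Total number of samples per part, as stated in the dataset description.
NUM_SAMPLES = {"train": 35820, "validation": 3522, "test": 3553}
SAMPLES_PER_FILE = 128


def num_files(part):
    """Number of HDF5 files in a part (ceiling division)."""
    return -(-NUM_SAMPLES[part] // SAMPLES_PER_FILE)


def valid_samples_in_file(part, file_index):
    """Number of valid rows in file `file_index` (zero-based) of a part."""
    if not 0 <= file_index < num_files(part):
        raise IndexError("file index out of range for part %r" % part)
    remaining = NUM_SAMPLES[part] - file_index * SAMPLES_PER_FILE
    # Every file is full except possibly the last one.
    return min(SAMPLES_PER_FILE, remaining)


for part in ("train", "validation", "test"):
    last = num_files(part) - 1
    print(part, "last file index:", last,
          "valid samples:", valid_samples_in_file(part, last))
```

The resulting last file indices (279, 27 and 27) agree with the file names listed in the note. When reading a last ground truth file, only the first `valid_samples_in_file(part, last)` rows should be used.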

The randomized patient IDs of the samples are provided as CSV files. The patient IDs of the train, validation and test parts are integers in the ranges 0–631, 632–691 and 692–751, respectively. Each row of a CSV file stores the ID of one sample.
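A minimal sketch of reading such a file and checking the IDs against the stated ranges is given below. It assumes that each row holds exactly one integer and that there is no header row, which is not confirmed by the description; the names `ID_RANGES` and `read_patient_ids` are hypothetical, and the example input is made up for illustration rather than taken from the dataset.

```python
import csv
import io

# Patient ID ranges per part (inclusive), as stated in the description.
ID_RANGES = {"train": (0, 631), "validation": (632, 691), "test": (692, 751)}


def read_patient_ids(lines, part):
    """Parse one patient ID per row and check it against the part's range.

    `lines` is any iterable of text lines, e.g. an open CSV file.
    Assumption: one integer per row, no header row.
    """
    low, high = ID_RANGES[part]
    ids = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        patient_id = int(row[0])
        if not low <= patient_id <= high:
            raise ValueError(
                "ID %d outside range %d-%d of part %r"
                % (patient_id, low, high, part))
        ids.append(patient_id)
    return ids


# Made-up example content; with real data, pass
# open("patient_ids_rand_validation.csv", newline="") instead.
example = io.StringIO("632\n632\n633\n")
ids = read_patient_ids(example, "validation")
print(ids)
```

Because the ID ranges of the three parts do not overlap, grouping samples by patient ID keeps all slices of one patient within a single part.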

Acknowledgements

Johannes Leuschner, Maximilian Schmidt and Daniel Otero Baguer acknowledge the support by the Deutsche Forschungsgemeinschaft (DFG) within the framework of GRK 2224/1 “π³: Parameter Identification – Analysis, Algorithms, Applications”. We thank Simon Arridge, Ozan Öktem, Carola-Bibiane Schönlieb and Christian Etmann for the fruitful discussion about the procedure, and Felix Lucka and Jonas Adler for their ideas and helpful feedback on the simulation setup. The authors acknowledge the National Cancer Institute and the Foundation for the National Institutes of Health, and their critical role in the creation of the free publicly available LIDC/IDRI Database used in this study.

Files


Files (55.0 GB)

Name	MD5	Size
ground_truth_test.zip	ecc655767fbe3d40908ca823921f4c7f	1.6 GB
ground_truth_train.zip	f06829ccf2b9bb817abd093ce490b2c7	15.9 GB
ground_truth_validation.zip	666c36f403734842f14ca8811a63b8f7	1.6 GB
observation_test.zip	9ae6b053bb1faa94d573311af8ec67b2	3.0 GB
observation_train.zip	c4fde721ac5862469812de49f98fb0a3	29.9 GB
observation_validation.zip	3cff406e09c59774912655eb7a72cfcf	2.9 GB
patient_ids_rand_test.csv	e86068312ad8a039e03f7f929352f7fd	14.2 kB
patient_ids_rand_train.csv	4fee01b076920f85fc92e0de774dc277	137.0 kB
patient_ids_rand_validation.csv	a387e619074f49573ae376c60a948db4	14.1 kB
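The MD5 checksums listed above can be used to check a download for corruption. The following sketch uses Python's standard hashlib module; the function names `md5_of_file` and `verify_download` are hypothetical, the `EXPECTED_MD5` table is only an excerpt of the checksums above, and the files are assumed to have been downloaded to local paths chosen by the user.

```python
import hashlib

# Excerpt of the checksums listed in the file table above.
EXPECTED_MD5 = {
    "ground_truth_test.zip": "ecc655767fbe3d40908ca823921f4c7f",
    "observation_test.zip": "9ae6b053bb1faa94d573311af8ec67b2",
    "patient_ids_rand_test.csv": "e86068312ad8a039e03f7f929352f7fd",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks.

    Chunked reading keeps memory usage low for multi-gigabyte ZIP files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, name):
    """Return True if the file at `path` matches the listed checksum of `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify_download("ground_truth_test.zip", "ground_truth_test.zip")` would return True for an intact copy stored in the current working directory.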

Additional details

Is documented by: Preprint: arXiv:1910.01113 (arXiv)
Is supplemented by: Software: 10.5281/zenodo.3970517 (DOI); Software: 10.5281/zenodo.3957744 (DOI); Dataset: 10.5281/zenodo.3874937 (DOI)

	All versions	This version
Views	12,390	12,330
Downloads	20,589	20,518
Data volume	991.3 TB	990.4 TB
