Dataset Open Access
Leuschner, Johannes; Schmidt, Maximilian; Otero Baguer, Daniel
A Benchmark Dataset for Low-Dose CT Reconstruction Methods.
The following paper provides full documentation:
Leuschner, Johannes; Schmidt, Maximilian; Otero Baguer, Daniel; Maaß, Peter. "The LoDoPaB-CT Dataset: A Benchmark Dataset for Low-Dose CT Methods". 2019. arXiv:1910.01113
The Python library DIVαℓ (github.com/jleuschn/dival) can be used to download and access the dataset.
Reconstructions from the LIDC/IDRI dataset are used as a basis for this dataset.
The ZIP files included in the LoDoPaB dataset contain multiple HDF5 files. Each HDF5 file contains one HDF5 dataset named "data", which provides a number of samples (128, except for the last file in each ZIP file). For example, the n-th training sample pair is stored in the observation and ground truth files with index floor(n / 128), at row (n mod 128) of "data".
Note: the last ground truth file of each part (e.g. ground_truth_test_027.hdf5) still contains an HDF5 dataset of shape (128, 362, 362), although it holds fewer than 128 valid samples. The number of valid samples therefore needs to be determined either from the total number of samples in the part ("train": 35820, "validation": 3522, "test": 3553) or from the corresponding observation file, for which the first dimension of the HDF5 dataset matches the number of valid samples in the file.
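The index arithmetic described above can be sketched in Python as follows. This is a minimal illustration, not part of the dataset's tooling. It assumes that samples and files are counted from zero, and that every ground truth file follows the pattern of the one name given above (ground_truth_test_027.hdf5). The helper names and the data_dir argument are hypothetical. Reading a file relies on the third-party h5py package, which is imported only when a sample is actually loaded.

```python
import os

SAMPLES_PER_FILE = 128
# Total number of samples per part, as stated in the description.
NUM_SAMPLES = {"train": 35820, "validation": 3522, "test": 3553}


def locate_sample(n, part="train"):
    """Return (file_index, row) for the n-th sample of a part (0-based, assumed)."""
    if not 0 <= n < NUM_SAMPLES[part]:
        raise IndexError(f"sample {n} is out of range for part {part!r}")
    return n // SAMPLES_PER_FILE, n % SAMPLES_PER_FILE


def num_valid_rows(file_index, part="train"):
    """Number of valid rows in a file; only the last file holds fewer than 128."""
    remaining = NUM_SAMPLES[part] - file_index * SAMPLES_PER_FILE
    return max(0, min(SAMPLES_PER_FILE, remaining))


def ground_truth_filename(file_index, part="train"):
    # Assumed naming pattern, generalized from "ground_truth_test_027.hdf5".
    return f"ground_truth_{part}_{file_index:03d}.hdf5"


def load_ground_truth(n, part="train", data_dir="."):
    """Load the n-th ground truth image from data_dir (a hypothetical path)."""
    import h5py  # third-party; imported lazily so the helpers above need no extras

    file_index, row = locate_sample(n, part)
    path = os.path.join(data_dir, ground_truth_filename(file_index, part))
    with h5py.File(path, "r") as f:
        # Each file holds a single HDF5 dataset named "data".
        return f["data"][row]
```

Computing the valid row count from the part totals avoids reading the padded rows at the end of the last ground truth file; the first dimension of the matching observation file could be used as a cross-check.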
The randomized patient IDs of the samples are provided as CSV files. The patient IDs of the train, validation and test parts are integers in the range of 0–631, 632–691 and 692–751, respectively. The ID of each sample is stored in a single row.
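As a rough sketch of how the patient IDs might be used, the snippet below reads IDs from a CSV file object and maps an ID to its part using the ranges given above. The function names are hypothetical, and the exact CSV layout is an assumption: the reader simply collects every non-empty cell in order and assumes there is no header row. The in-memory example data stands in for a real file.

```python
import csv
import io

# Patient ID ranges per part, as stated in the description (upper bounds inclusive).
PATIENT_ID_RANGES = {
    "train": range(0, 632),
    "validation": range(632, 692),
    "test": range(692, 752),
}


def part_of_patient(patient_id):
    """Return the dataset part that a randomized patient ID belongs to."""
    for part, ids in PATIENT_ID_RANGES.items():
        if patient_id in ids:
            return part
    raise ValueError(f"patient ID {patient_id} is outside the range 0-751")


def read_patient_ids(csv_file):
    """Read integer patient IDs from an open text file, in sample order.

    Assumes no header row; every non-empty cell is treated as one ID.
    """
    return [int(cell) for row in csv.reader(csv_file) for cell in row if cell.strip()]


# Illustrative in-memory data, not taken from the dataset.
example_ids = read_patient_ids(io.StringIO("0\n0\n1\n"))
```

Because the ID ranges of the three parts do not overlap, grouping samples by patient ID cannot mix patients across the train, validation and test parts.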