Dataset Open Access

LoDoPaB-CT Dataset

Leuschner, Johannes; Schmidt, Maximilian; Otero Baguer, Daniel

Citation Style Language JSON Export

  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.3384092", 
  "language": "eng", 
  "title": "LoDoPaB-CT Dataset", 
  "issued": {
    "date-parts": [
  "abstract": "<p>A Benchmark Dataset for Low-Dose CT Reconstruction Methods.</p>\n\n<p>The following Data Descriptor article provides full documentation:</p>\n\n<p>Leuschner, J., Schmidt, M., Baguer, D.O.&nbsp;<em>et al.</em>&nbsp;LoDoPaB-CT, a benchmark dataset for low-dose computed tomography reconstruction.&nbsp;<em>Sci Data</em>&nbsp;<strong>8,&nbsp;</strong>109 (2021). <a href=\"\"></a></p>\n\n<p>&nbsp;</p>\n\n<p>The python library DIV&alpha;\u2113 (<a href=\"\"></a>)&nbsp;can be used to download and access the dataset.</p>\n\n<p>Reconstructions from the&nbsp;<a href=\"\">LIDC/IDRI dataset</a>&nbsp;are used as a basis for this dataset.</p>\n\n<p>&nbsp;</p>\n\n<p>The ZIP files included in the LoDoPaB dataset contain multiple <a href=\"\">HDF5</a>&nbsp;files. Each HDF5 file contains one&nbsp;HDF5 dataset named <code>&quot;data&quot;</code>, that provides a number of&nbsp;samples (128 except for the last file in each ZIP file). For example, the <code>n</code>-th training sample pair&nbsp;is stored in the files <code>&quot;observation_train_%03d.hdf5&quot;</code> and <code>&quot;ground_truth_train_%03d.hdf5&quot;</code>&nbsp;where <code>&quot;%03d&quot;</code> is <code>floor(n / 128)</code>, at row <code>(n mod 128)</code>&nbsp;of <code>&quot;data&quot;</code>.</p>\n\n<p><em>Note:</em> each last ground truth file (i.e.&nbsp;<code>ground_truth_train_279.hdf5</code>,&nbsp;<code>ground_truth_validation_027.hdf5</code> and&nbsp;<code>ground_truth_test_027.hdf5</code>) still contains a HDF5 dataset of&nbsp;shape (128, 362, 362), although it contains&nbsp;less than 128 valid samples. Thus, the number of valid samples needs to&nbsp;be determined from the total samples numbers in the part (i.e. 
&quot;train&quot;: 35820, &quot;validation&quot;: 3522, &quot;test&quot;: 3553), or from the corresponding observation&nbsp;file, for&nbsp;which the first dimension of the HDF5 dataset matches&nbsp;the&nbsp;number of valid samples in the file.</p>\n\n<p>The randomized patient IDs of the&nbsp;samples are provided as&nbsp;CSV files. The patient IDs of the train, validation and test parts are integers in the range of 0&ndash;631, 632&ndash;691 and 692&ndash;751, respectively. The ID of each sample is stored in a single row.</p>\n\n<p><em>Acknowledgements</em></p>\n\n<p>Johannes Leuschner, Maximilian Schmidt and Daniel Otero Baguer acknowledge the support by the Deutsche<br>\nForschungsgemeinschaft (DFG) within the framework of GRK 2224/1 &ldquo;&pi;3: Parameter Identification &ndash; Analysis,<br>\nAlgorithms, Applications&rdquo;. We thank Simon Arridge, Ozan &Ouml;ktem, Carola-Bibiane Sch&ouml;nlieb and Christian<br>\nEtmann for the fruitful discussion about the procedure, and Felix Lucka and Jonas Adler for their ideas and<br>\nhelpful feedback on the simulation setup. The authors acknowledge the National Cancer Institute and the Foundation for the National Institutes of Health, and their critical role in the creation of the free publicly available LIDC/IDRI Database used in this study.</p>", 
  "author": [
      "family": "Leuschner, Johannes"
      "family": "Schmidt, Maximilian"
      "family": "Otero Baguer, Daniel"
  "version": "1.0.0", 
  "type": "dataset", 
  "id": "3384092"
                   All versions   This version
Views              2,598          2,598
Downloads          30,308         30,308
Data volume        592.5 TB       592.5 TB
Unique views       2,253          2,253
Unique downloads   4,312          4,312
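
The file layout described in the abstract (128 samples per HDF5 file, sample n in file floor(n / 128) at row n mod 128) can be sketched as a small index calculation. This is only an illustrative sketch: the helper names locate_sample and valid_samples_in_file are hypothetical, n is assumed to count from 0, and the observation file names for the validation and test parts are assumed to follow the same pattern as the training names given in the abstract.

```python
# Illustrative sketch only: these helpers are hypothetical, not part of any library.
# Sample counts and the 128-per-file layout are taken from the dataset abstract.

SAMPLES_PER_FILE = 128
NUM_SAMPLES = {"train": 35820, "validation": 3522, "test": 3553}


def locate_sample(part, n):
    """Return (observation_file, ground_truth_file, row) for sample n (0-based)."""
    if not 0 <= n < NUM_SAMPLES[part]:
        raise IndexError("sample %d out of range for part %r" % (n, part))
    file_idx, row = divmod(n, SAMPLES_PER_FILE)
    # Assumption: validation/test observation files use the same naming pattern
    # as the training files named in the abstract.
    obs = "observation_%s_%03d.hdf5" % (part, file_idx)
    gt = "ground_truth_%s_%03d.hdf5" % (part, file_idx)
    return obs, gt, row


def valid_samples_in_file(part, file_idx):
    """Number of valid rows in a file; only the last file of a part is partial."""
    remaining = NUM_SAMPLES[part] - file_idx * SAMPLES_PER_FILE
    if file_idx < 0 or remaining <= 0:
        raise IndexError("file index %d out of range for part %r" % (file_idx, part))
    return min(SAMPLES_PER_FILE, remaining)


if __name__ == "__main__":
    print(locate_sample("train", 300))
    print(valid_samples_in_file("train", 279))
```

Because the last ground truth file of each part is padded to 128 rows, a count such as valid_samples_in_file would be needed to avoid reading padding rows. Assuming the h5py package is installed, a row could then be read with something like `h5py.File(path, "r")["data"][row]`.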

