There is a newer version of the record available.

Published April 28, 2021 | Version v-1
Dataset Restricted

NODE21

Description

We provide a NODE21 public CXR training dataset. This dataset consists of 1134 frontal chest radiographs with annotated bounding boxes around nodules (1476 nodules). The images in this set are from public datasets that allow us to remix and redistribute. The images from this dataset come from the following sources:

  • JSRT [1]
  • PadChest [2]
  • Chestx-ray14 [3]
  • Open-I [4]

The annotations were either taken from the data sources and checked by our chest radiologists, or provided directly by our chest radiologists. We also provide a list with links to the original identifiers from the source datasets in case you want to work with the original data. This dataset is named training_data.zip.

The training_data.zip file contains a folder called images and a CSV file named metadata.csv. The images folder contains 1134 chest X-ray images resized to 1024 x 1024, and metadata.csv contains the bounding box locations associated with each image file.
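As a rough illustration of how the bounding boxes in metadata.csv might be grouped per image, here is a minimal Python sketch. The column names (img_name, x, y, width, height) and the sample rows are assumptions made for this example and are not confirmed by this record; check the header of the actual CSV file before relying on them.

```python
import csv
import io
from collections import defaultdict

# Assumed column names; verify them against the real metadata.csv header.
IMAGE_COL, X_COL, Y_COL, W_COL, H_COL = "img_name", "x", "y", "width", "height"


def load_boxes(csv_file):
    """Group bounding boxes by image file name.

    csv_file is any open text file (or file-like object) with a header row.
    Returns {image_name: [(x, y, width, height), ...]}.
    """
    boxes = defaultdict(list)
    for row in csv.DictReader(csv_file):
        box = tuple(float(row[col]) for col in (X_COL, Y_COL, W_COL, H_COL))
        boxes[row[IMAGE_COL]].append(box)
    return dict(boxes)


# Made-up rows standing in for metadata.csv (file names and values are illustrative only).
sample = io.StringIO(
    "img_name,x,y,width,height\n"
    "image_a,100,200,40,50\n"
    "image_a,300,310,20,20\n"
    "image_b,10,20,30,40\n"
)
boxes = load_boxes(sample)
for name, image_boxes in boxes.items():
    print(name, len(image_boxes), "box(es)")
```

With the real file, the same function could be called as load_boxes(open("metadata.csv", newline="")), assuming the archive has been extracted into the working directory.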

Additionally, for the generation track, we provide a public set of NODE21 CT patches (see luna16.zip). These are patches of nodules from CT scans. The patches are 50 x 50 x 50 mm resampled to voxels of 1 x 1 x 1 mm. The patches originate from the LUNA16 dataset [5][6]. These patches can be used to create artificial nodules in given chest radiographs.

 

[1] Shiraishi, J., Katsuragawa, S., Ikezoe, J., Matsumoto, T., Kobayashi, T., Komatsu, K.i., Matsui, M., Fujita, H., Kodera, Y., Doi, K., 2000. Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists’ detection of pulmonary nodules. American Journal of Roentgenology 174, 71–74. doi:10.2214/ajr.174.1.1740071.

[2] Bustos, A., Pertusa, A., Salinas, J.M., de la Iglesia-Vaya, M., 2020. PadChest: A large chest x-ray image dataset with multi-label annotated reports. Medical Image Analysis 66, 101797. doi:10.1016/j.media.2020.101797.

[3] Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.M., 2017b. Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases, in: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2097–2106. doi:10.1109/cvpr.2017.369.

[4] Demner-Fushman, D., Antani, S., Simpson, M., Thoma, G.R., 2012. Design and Development of a Multimodal Biomedical Information Retrieval System. Journal of Computing Science and Engineering 6, 168–177. doi:10.5626/JCSE.2012.6.2.168.

[5] Andrey Fedorov, Matthew Hancock, David Clunie, Mathias Brochhausen, Jonathan Bona, Justin Kirby, John Freymann, Steve Pieper, Hugo Aerts, Ron Kikinis, Fred Prior, 2019. Standardized representation of the LIDC annotations using DICOM. The Cancer Imaging Archive. doi:10.7937/TCIA.2018.H7UMFURQ.

[6] Setio et al., Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: The LUNA16 challenge, Medical Image Analysis 42, doi:10.1016/j.media.2017.06.015.

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

References

  • Shiraishi, J., Katsuragawa, S., Ikezoe, J., Matsumoto, T., Kobayashi, T., Komatsu, K.i., Matsui, M., Fujita, H., Kodera, Y., Doi, K., 2000. Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists' detection of pulmonary nodules. American Journal of Roentgenology 174, 71–74. doi:10.2214/ajr.174.1.1740071
  • Bustos, A., Pertusa, A., Salinas, J.M., de la Iglesia-Vaya, M., 2020. PadChest: A large chest x-ray image dataset with multi-label annotated reports. Medical Image Analysis 66, 101797. doi:10.1016/j.media.2020.101797.
  • Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.M., 2017b. Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases, in: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2097–2106. doi:10.1109/cvpr.2017.369.
  • Demner-Fushman, D., Antani, S., Simpson, M., Thoma, G.R., 2012. Design and Development of a Multimodal Biomedical Information Retrieval System. Journal of Computing Science and Engineering 6, 168–177. doi:10.5626/JCSE.2012.6.2.168.
  • Andrey Fedorov, Matthew Hancock, David Clunie, Mathias Brochhausen, Jonathan Bona, Justin Kirby, John Freymann, Steve Pieper, Hugo Aerts, Ron Kikinis, Fred Prior, 2019. Standardized representation of the LIDC annotations using DICOM. The Cancer Imaging Archive. doi:10.7937/TCIA.2018.H7UMFURQ.
  • Setio et al., Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: The LUNA16 challenge, Medical Image Analysis 42, doi:10.1016/j.media.2017.06.015.