Published August 26, 2024 | Version v1
Dataset Open

DICOM converted Slide Microscopy images for the Cancer Moonshot Biobank initiative collections

  • 1. Institute for Systems Biology
  • 2. General Dynamics (United States)
  • 3. Frederick National Laboratory for Cancer Research
  • 4. National Cancer Institute
  • 5. Brigham and Women's Hospital Department of Radiology

Description

This dataset corresponds to a collection of images and/or image-derived data available from the National Cancer Institute Imaging Data Commons (IDC) [1]. The dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using the IDC Portal. To download the content of the collection, use the manifests included in this Zenodo record and follow the Download instructions below.

Collection description

The Cancer Moonshot Biobank (CMB) is a National Cancer Institute initiative to support current and future investigations into drug resistance and sensitivity and other NCI-sponsored cancer research initiatives, with the aim of improving researchers' understanding of cancer and how to intervene in cancer initiation and progression. During the course of this study, biospecimens (blood and tissue removed during medical procedures) and associated data will be collected longitudinally from at least 1000 patients across at least 10 cancer types. These patients represent the demographic diversity of the U.S. and are receiving standard of care cancer treatment at multiple NCI Community Oncology Research Program (NCORP) sites.

The CMB program is organized into multiple cancer-specific collections. Digital pathology images for each of the following collections were converted into DICOM representation by the IDC team and are shared via IDC:

1. CMB-AML (acute myeloid leukemia)
2. CMB-CRC (colorectal cancer)
3. CMB-GEC (gastroesophageal cancer)
4. CMB-LCA (lung cancer)
5. CMB-MEL (melanoma)
6. CMB-MML (multiple myeloma)
7. CMB-PCA (prostate cancer)

Digital pathology images, augmented with the metadata describing their content, were converted into DICOM Whole Slide Microscopy (SM) representation [2,3] using custom open source scripts and tools as described in [4]. 

Files included

A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

For each of the collections, the following manifest files are provided:

  1. <collection_id>-idc_v19-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
  2. <collection_id>-idc_v19-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
  3. <collection_id>-idc_v19-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those that end in -gcs.s5cmd reference files stored in Google Cloud Storage (GCS) buckets. The actual files are identical and are mirrored between the two.
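The naming convention above can be unpacked programmatically. The following is a minimal sketch, assuming the file names follow the <collection_id>-idc_v<release>-<aws|gcs|dcf> pattern exactly as listed; the regular expression and the ManifestInfo type are illustrative and are not part of any IDC tool.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the naming convention described in this record:
#   <collection_id>-idc_v<release>-<aws|gcs|dcf>.<s5cmd|dcf>
# This is an assumption for illustration, not an official specification.
MANIFEST_NAME = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<target>aws|gcs|dcf)"
    r"\.(?P<ext>s5cmd|dcf)$"
)


class ManifestInfo(NamedTuple):
    collection_id: str  # collection the manifest belongs to
    idc_release: int    # IDC data release in which this version was introduced
    target: str         # "aws", "gcs", or "dcf"


def parse_manifest_name(filename: str) -> Optional[ManifestInfo]:
    """Return the parts of a manifest file name, or None if it does not match."""
    match = MANIFEST_NAME.match(filename)
    if match is None:
        return None
    return ManifestInfo(
        match["collection"], int(match["release"]), match["target"]
    )


if __name__ == "__main__":
    # Example file name taken from the text above.
    print(parse_manifest_name("collection_id-idc_v8-aws.s5cmd"))
```

A helper like this may be useful for grouping the manifests in this record by collection or for selecting only the AWS or GCS variants.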

Download instructions

Each of the manifests includes instructions in its header on how to download the included files.
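To read those header instructions without opening the file in an editor, a manifest can be split into its header and its entries. The sketch below assumes that header lines start with "#" and that every other non-empty line is a manifest entry; this layout is an assumption and should be checked against the actual files. The function name split_manifest and the sample text are hypothetical.

```python
from typing import List, Tuple


def split_manifest(text: str) -> Tuple[List[str], List[str]]:
    """Split manifest text into (header lines, entry lines).

    Assumption: header lines begin with '#'; all other non-empty
    lines are treated as entries and returned unchanged.
    """
    header: List[str] = []
    entries: List[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if stripped.startswith("#"):
            # Drop the leading '#' characters and surrounding whitespace.
            header.append(stripped.lstrip("#").strip())
        else:
            entries.append(stripped)
    return header, entries


if __name__ == "__main__":
    # Hypothetical sample content, for illustration only.
    sample = "# instructions line\n\nentry-one\nentry-two\n"
    header, entries = split_manifest(sample)
    print("\n".join(header))
    print(f"{len(entries)} entries")
```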

To download the files using .s5cmd manifests:

  1. install idc-index package: pip install --upgrade idc-index
  2. download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd

To download the files using .dcf manifest, see manifest header.
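For scripted downloads, the command from step 2 can be wrapped in Python. This is a minimal sketch that only shells out to the idc command-line tool installed by the idc-index package, exactly as shown above; the helper names build_download_command and download are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_download_command(manifest: str) -> List[str]:
    """Build the `idc download <manifest>` command described in this record."""
    path = Path(manifest)
    if path.suffix != ".s5cmd":
        # .dcf manifests follow a different procedure (see the manifest header).
        raise ValueError(f"expected a .s5cmd manifest, got: {manifest}")
    return ["idc", "download", str(path)]


def download(manifest: str) -> None:
    """Run the download; requires `pip install --upgrade idc-index` beforehand."""
    if shutil.which("idc") is None:
        raise RuntimeError("idc command not found; install the idc-index package")
    subprocess.run(build_download_command(manifest), check=True)


if __name__ == "__main__":
    # Prints the command without running it.
    print(" ".join(build_download_command("manifest.s5cmd")))
```

Separating command construction from execution makes it possible to inspect or log the command before any data is transferred.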

Acknowledgments

The Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

References

[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W. L., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National cancer institute imaging data commons: Toward transparency, reproducibility, and scalability in imaging artificial intelligence. Radiographics 43, (2023).

[2] National Electrical Manufacturers Association (NEMA). DICOM PS3.3 - Information Object Definitions: A.32.8 VL Whole Slide Microscopy Image IOD. at <https://dicom.nema.org/medical/dicom/current/output/html/part03.html#sect_A.32.8>

[3] Herrmann, M. D., Clunie, D. A., Fedorov, A., Doyle, S. W., Pieper, S., Klepeis, V., Le, L. P., Mutter, G. L., Milstone, D. S., Schultz, T. J., Kikinis, R., Kotecha, G. K., Hwang, D. H., Andriole, K. P., Iafrate, A. J., Brink, J. A., Boland, G. W., Dreyer, K. J., Michalski, M., Golden, J. A., Louis, D. N. & Lennerz, J. K. Implementing the DICOM standard for digital pathology. J. Pathol. Inform. 9, 37 (2018).

[4] Clunie, D., Fedorov, A. & Herrmann, M. D. ImagingDataCommons/idc-wsi-conversion: Initial release. (Zenodo, 2023). doi:10.5281/ZENODO.8240154

Files (218.9 kB)

Checksum and size of each file:

  • md5:69a322a1f25b2191f58d2aade5f75470 (3.1 kB)
  • md5:e04357dbd5ca686b5f5c36952bde68f1 (9.7 kB)
  • md5:4ae4428f6e1249833c5ea875b67bc068 (3.4 kB)
  • md5:b5266a1f52eae37fcda8fe2e7e1dea59 (8.7 kB)
  • md5:7e164802a12006bd64bbaa6cb7d64bda (28.3 kB)
  • md5:1d2075986bd53d623dea29e46f5b23b2 (9.5 kB)
  • md5:6b36ec8dbf1032036ed0cc92b544ff88 (1.0 kB)
  • md5:dc95a56fd77d0bfc906579d6f4bb5f58 (2.6 kB)
  • md5:89b9dd173df889ecc91d0e20655f618d (1.1 kB)
  • md5:0e240a849146e429d145a5279d9366d5 (5.9 kB)
  • md5:db8fae68b859ebce5d35a6f3d91c8059 (19.5 kB)
  • md5:e19bdfed8a4c71dced2e18e925b22198 (6.5 kB)
  • md5:1764ba2a86ce8a5e8bfef189bdd97249 (5.8 kB)
  • md5:df1b6c553ab3eb647651585a01d9563d (20.3 kB)
  • md5:62c9c1a726821a740de9f5fad6b9f9b5 (6.3 kB)
  • md5:ce51bf97bd22c8e3d77f529d56a8b841 (14.1 kB)
  • md5:0d8b92c61f63c41599f2439deea59c46 (51.6 kB)
  • md5:182f621bc20b1f4ca2cd0dd55dc84c9c (15.4 kB)
  • md5:242edd8f1ee5271828fff4327f29f264 (1.2 kB)
  • md5:8b7c56c56e44a4b70cac101f0a688710 (3.7 kB)
  • md5:f268a97d96406121d501f0d605817b00 (1.3 kB)
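The MD5 checksums listed for the files in this record can be used to confirm that a downloaded manifest is intact. The following is a minimal sketch using only the Python standard library; the function name md5_matches is hypothetical, and the mapping between a given checksum and a file name has to be taken from the Zenodo record itself.

```python
import hashlib


def md5_matches(path: str, expected: str) -> bool:
    """Return True if the file's MD5 digest equals `expected`.

    `expected` may be given with or without the 'md5:' prefix
    used in the file listing.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]

    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

For example, md5_matches("manifest.s5cmd", "md5:...") would be called with the path of a downloaded manifest and the checksum shown for that file.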

Additional details