Published November 25, 2024 | Version v2
Dataset Open

CCDI-MCI: DICOM converted whole slide hematoxylin and eosin stained images from the Molecular Characterization Initiative of the National Cancer Institute's Childhood Cancer Data Initiative

  • 1. PixelMed Publishing
  • 2. Seattle Children's Hospital
  • 3. National Institutes of Health
  • 4. Indiana University
  • 5. Nationwide Children's Hospital
  • 6. Baylor College of Medicine
  • 7. Children's Hospital of Philadelphia
  • 8. UT Southwestern
  • 9. Sick Kids
  • 10. Institute for Systems Biology
  • 11. General Dynamics (United States)
  • 12. Frederick National Laboratory for Cancer Research
  • 13. National Cancer Institute
  • 14. Brigham and Women's Hospital Department of Radiology

Description

This dataset corresponds to a collection of images and/or image-derived data available from the National Cancer Institute Imaging Data Commons (IDC) [1]. It was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images in the IDC Portal under the CCDI-MCI collection. To download the content of the collection, use the manifests included in this Zenodo record and follow the Download instructions below.

Collection description

The Molecular Characterization Initiative (MCI) [2] is a component of the National Cancer Institute’s (NCI) Childhood Cancer Data Initiative (CCDI). It offers state-of-the-art molecular testing at no cost to newly diagnosed children, adolescents, and young adults (AYAs) with central nervous system (CNS) tumors, soft tissue sarcomas (STS), certain rare childhood cancers (RAR), and certain neuroblastomas (NBL) treated at a Children’s Oncology Group (COG)–affiliated hospital. The goal of MCI is to enhance the understanding of genetic factors in pediatric cancers and to provide timely, clinically relevant findings to doctors and families to aid in treatment decisions and determine eligibility for certain planned COG clinical trials.

The original images, in vendor-specific format, were collected in IRB-approved clinical trials or tissue banking studies from Children’s Oncology Group (COG) patients enrolled in the EveryChild APEC14B1 protocol.

Those images, augmented with metadata describing their content, were provided to the IDC team for archival and were converted into DICOM Whole Slide Microscopy (SM) representation [3,4] using custom open source scripts and tools, as described in [5]. The converted images were first released in IDC in the CCDI-MCI collection with IDC data release v19, and their content was updated in v20.

To learn how to access the clinical and genomic data accompanying this collection, please see the CCDI-MCI page and CCDI Hub.

Files included

A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

  1. ccdi_mci-idc_v20-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
  2. ccdi_mci-idc_v20-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
  3. ccdi_mci-idc_v20-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

Note that manifest files ending in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those ending in -gcs.s5cmd reference files stored in Google Cloud Storage (GCS) buckets. The actual files are identical and are mirrored between AWS and GCS.
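The naming convention described above can be sketched in code. The following is a minimal illustration, not an official IDC utility: the function name parse_manifest_name and the ManifestName type are hypothetical, and the pattern assumes that every manifest follows the collection_id-idc_vN-target.extension form shown in this record.

```python
import re
from typing import NamedTuple


class ManifestName(NamedTuple):
    """Parts of a manifest file name (hypothetical helper type)."""
    collection_id: str
    idc_version: int
    target: str      # "aws", "gcs", or "dcf"
    extension: str   # e.g. "s5cmd"


# Assumed pattern: <collection_id>-idc_v<release>-<target>.<extension>
_MANIFEST_PATTERN = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<version>\d+)-(?P<target>aws|gcs|dcf)\.(?P<ext>\w+)$"
)


def parse_manifest_name(filename: str) -> ManifestName:
    """Split a manifest file name into collection, release, and target."""
    match = _MANIFEST_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a recognized manifest name: {filename!r}")
    return ManifestName(
        collection_id=match["collection"],
        idc_version=int(match["version"]),
        target=match["target"],
        extension=match["ext"],
    )


if __name__ == "__main__":
    for name in ("ccdi_mci-idc_v20-aws.s5cmd", "ccdi_mci-idc_v20-gcs.s5cmd"):
        print(parse_manifest_name(name))
```

A helper like this can be used to pick the manifest that matches the cloud provider you prefer before starting a download.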

Download instructions

Each of the manifests includes instructions in its header on how to download the included files.
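To read those header instructions without opening the whole manifest, a short script can collect the leading comment lines. This is a minimal sketch that assumes header lines start with "#" and precede all other content; that prefix is an assumption rather than something stated in this record, so it is exposed as a parameter. The function name read_header is hypothetical.

```python
from typing import Iterable, List


def read_header(lines: Iterable[str], comment_prefix: str = "#") -> List[str]:
    """Return the leading comment lines of a manifest, prefix removed.

    Assumption: the header is the unbroken run of lines at the top of the
    file that start with `comment_prefix`.
    """
    header = []
    for line in lines:
        if not line.startswith(comment_prefix):
            break  # first non-comment line ends the header
        header.append(line[len(comment_prefix):].strip())
    return header


if __name__ == "__main__":
    import sys

    # Usage: python read_header.py ccdi_mci-idc_v20-aws.s5cmd
    with open(sys.argv[1], encoding="utf-8") as handle:
        for text in read_header(handle):
            print(text)
```

Because the function accepts any iterable of strings, an open file handle can be passed directly and only the header portion of the file is read.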

To download the files using .s5cmd manifests:

  1. Install the idc-index package: pip install --upgrade idc-index
  2. Download the files referenced by a manifest included in this dataset by passing the .s5cmd manifest file to the idc command: idc download manifest.s5cmd
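The two steps above can also be driven from a script. The sketch below only wraps the idc download command quoted in step 2 and does not use any other option of the tool; the function names build_download_command and run_manifest_download are hypothetical, and the script assumes that idc-index is already installed so that the idc executable is on the PATH.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Union


def build_download_command(manifest: Union[str, Path]) -> List[str]:
    """Build the command from step 2: idc download <manifest>."""
    return ["idc", "download", str(manifest)]


def run_manifest_download(manifest: Union[str, Path]) -> None:
    """Run the download for one .s5cmd manifest, failing early on setup errors."""
    manifest = Path(manifest)
    if not manifest.is_file():
        raise FileNotFoundError(f"manifest not found: {manifest}")
    if shutil.which("idc") is None:
        raise RuntimeError(
            "the 'idc' command was not found; "
            "run 'pip install --upgrade idc-index' first"
        )
    # check=True raises CalledProcessError if the download command fails.
    subprocess.run(build_download_command(manifest), check=True)


if __name__ == "__main__":
    run_manifest_download("ccdi_mci-idc_v20-aws.s5cmd")
```

Checking for the manifest file and the idc executable before calling the tool gives a clearer error message than a failed subprocess would.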

To download the files using the .dcf manifest, see the manifest header.

Acknowledgments

The Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003I.

References

[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W. L., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National cancer institute imaging data commons: Toward transparency, reproducibility, and scalability in imaging artificial intelligence. Radiographics 43, e230180 (2023).

[2] https://www.cancer.gov/research/areas/childhood/childhood-cancer-data-initiative/programs/molecular-characterization

[3] National Electrical Manufacturers Association (NEMA). DICOM PS3.3 - Information Object Definitions: A.32.8 VL Whole Slide Microscopy Image IOD. at <https://dicom.nema.org/medical/dicom/current/output/html/part03.html#sect_A.32.8>

[4] Herrmann, M. D., Clunie, D. A., Fedorov, A., Doyle, S. W., Pieper, S., Klepeis, V., Le, L. P., Mutter, G. L., Milstone, D. S., Schultz, T. J., Kikinis, R., Kotecha, G. K., Hwang, D. H., Andriole, K. P., Iafrate, A. J., Brink, J. A., Boland, G. W., Dreyer, K. J., Michalski, M., Golden, J. A., Louis, D. N. & Lennerz, J. K. Implementing the DICOM standard for digital pathology. J. Pathol. Inform. 9, 37 (2018).

[5] Clunie, D., Fedorov, A. & Herrmann, M. D. ImagingDataCommons/idc-wsi-conversion: Initial release. (Zenodo, 2023). doi:10.5281/ZENODO.8240154

Files

ccdi_mci-idc_v20-dcf.csv

Files (638.1 kB)

Checksum                              Size
md5:aa43012e9ff1525d411509f470f7f9b1  115.8 kB
md5:c7a5c4d60a6fe32a539b29b804b2812c  406.4 kB
md5:04717d38cdbcd87692690673ca0a8868  115.8 kB
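The md5 checksums listed above can be used to confirm that a manifest file was downloaded intact. The following is a minimal sketch using only the Python standard library; the function names md5_of_file and matches_checksum are hypothetical, and because this listing does not say which checksum belongs to which file, the expected value has to be supplied by the caller.

```python
import hashlib
from pathlib import Path
from typing import Union


def md5_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Union[str, Path], expected: str) -> bool:
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


if __name__ == "__main__":
    import sys

    # Usage: python verify.py <downloaded file> md5:<checksum from this page>
    file_path, checksum = sys.argv[1], sys.argv[2]
    print("OK" if matches_checksum(file_path, checksum) else "MISMATCH")
```

Reading the file in chunks keeps memory use constant, although the manifests in this record are small enough that it makes little practical difference.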

Additional details

Related works

Cites
10.1148/rg.230180 (DOI)
Is published in
10.25504/FAIRsharing.0b5a1d (DOI)