Published November 25, 2024 | Version v2
Dataset Open

Pan-Cancer-Nuclei-Seg-DICOM: DICOM converted Dataset of Segmented Nuclei in Hematoxylin and Eosin Stained Histopathology Images

  • 1. Massachusetts General Hospital
  • 2. PixelMed Publishing
  • 3. Brigham and Women's Hospital Department of Radiology

Description

This dataset corresponds to a collection of images and/or image-derived data available from the National Cancer Institute Imaging Data Commons (IDC) [1]. The dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images in the IDC Portal here: Pan-Cancer-Nuclei-Seg-DICOM. To download the content of the collection, use the manifests included in this Zenodo record and follow the Download instructions below.

Collection description

This collection contains automatic nucleus segmentation data for 5,060 whole slide tissue images of 10 cancer types, previously published in [2] (https://doi.org/10.7937/TCIA.2019.4A4DKP9U), stored in DICOM Bulk Annotation and DICOM Segmentation formats.
 
In the DICOM Bulk Annotation version, nuclei annotations are stored as closed polygons along with the area of each nucleus. The DICOM Segmentation version contains binary segmentations obtained by rasterizing the polygon contours.
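
The relationship between the two representations can be illustrated with a minimal sketch. This is not the conversion code used to build the dataset: the function names, the pixel-center sampling convention, and the even-odd fill rule are all assumptions made for illustration. It computes the area of a closed polygon with the shoelace formula and rasterizes the same polygon into a binary mask.

```python
def polygon_area(vertices):
    """Area of a closed polygon via the shoelace formula.

    `vertices` is a list of (x, y) pairs; the last vertex is implicitly
    connected back to the first.
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def rasterize(vertices, width, height):
    """Binary mask as a list of rows.

    Assumption: a pixel is set to 1 when its center (col + 0.5, row + 0.5)
    lies inside the polygon according to the even-odd rule.
    """
    mask = [[0] * width for _ in range(height)]
    n = len(vertices)
    for row in range(height):
        y = row + 0.5
        for col in range(width):
            x = col + 0.5
            inside = False
            for i in range(n):
                x0, y0 = vertices[i]
                x1, y1 = vertices[(i + 1) % n]
                # Only edges that straddle the horizontal line through y count.
                if (y0 > y) != (y1 > y):
                    x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
                    if x < x_cross:
                        inside = not inside
            mask[row][col] = int(inside)
    return mask


# Hypothetical example: a 3 x 3 square in pixel coordinates.
square = [(1, 1), (4, 1), (4, 4), (1, 4)]
area = polygon_area(square)
mask = rasterize(square, width=5, height=5)
```

For an axis-aligned polygon such as this one, the number of set pixels equals the polygon area; for general contours the two differ slightly because rasterization discretizes the boundary.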
 
The annotations correspond to digital pathology images from the TCGA-BLCA, TCGA-BRCA, TCGA-CESC, TCGA-COAD, TCGA-GBM, TCGA-LUAD, TCGA-LUSC, TCGA-PAAD, TCGA-PRAD, TCGA-READ, TCGA-SKCM, TCGA-STAD, TCGA-UCEC, and TCGA-UVM collections available in NCI Imaging Data Commons.
 
To learn how these files are organized and how to access the content programmatically, see this documentation page: https://highdicom.readthedocs.io/en/latest/ann.html.
 
Conversion of the nuclei segmentations from the original format into DICOM ANN and SEG representations was done using the code available in 10.5281/zenodo.10632181.

Files included

A manifest file's name indicates the IDC data release in which a version of the collection data was first introduced. For example, pan_cancer_nuclei_seg_dicom-collection_id-idc_v19-aws.s5cmd corresponds to the annotations for the images in the collection_id collection introduced in IDC data release v19. DICOM binary segmentations were introduced in IDC v20. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

For each of the collections, the following manifest files are provided:

  1. pan_cancer_nuclei_seg_dicom-<collection_id>-idc_v20-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
  2. pan_cancer_nuclei_seg_dicom-<collection_id>-idc_v20-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
  3. pan_cancer_nuclei_seg_dicom-<collection_id>-idc_v20-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
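
The naming pattern above can be expressed as a small helper. This is a sketch under stated assumptions: the function name is hypothetical, and the rule that a collection such as TCGA-BLCA maps to the identifier tcga_blca (lowercase, hyphen replaced by underscore) is inferred from the file name shown in the file preview on this page, not from a documented convention.

```python
def manifest_names(collection, release="idc_v20"):
    """Build the three manifest file names listed above for one collection.

    Assumption: the <collection_id> part is the collection name in lowercase
    with hyphens replaced by underscores (e.g. TCGA-BLCA -> tcga_blca).
    """
    collection_id = collection.strip().lower().replace("-", "_")
    stem = f"pan_cancer_nuclei_seg_dicom-{collection_id}-{release}"
    return {
        "aws": f"{stem}-aws.s5cmd",  # Amazon Web Services buckets
        "gcs": f"{stem}-gcs.s5cmd",  # Google Cloud Storage buckets
        "dcf": f"{stem}-dcf.dcf",    # Gen3 manifest, extension as listed above
    }


names = manifest_names("TCGA-BLCA")
print(names["aws"])
```

Note that the file preview on this page shows a Gen3 manifest name ending in -dcf.csv, so the extension of the third entry may need adjusting to match the files actually attached to the record.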

Download instructions

Each of the manifests includes instructions in its header on how to download the included files.

To download the files using .s5cmd manifests:

  1. Install the idc-index package: pip install --upgrade idc-index
  2. Download the files referenced by a manifest included in this dataset by passing the .s5cmd manifest file to the idc command: idc download manifest.s5cmd
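
When several manifests need to be fetched, the second step can be scripted. The sketch below only wraps the `idc download <manifest>` command quoted above; the function names, the `dry_run` flag, and the idea of selecting manifests by file-name suffix are illustrative assumptions, not part of idc-index.

```python
import shutil
import subprocess
from pathlib import Path


def build_download_command(manifest):
    """Return the argument list for `idc download <manifest>`."""
    return ["idc", "download", str(manifest)]


def download_all(manifest_dir, suffix="-aws.s5cmd", dry_run=True):
    """Collect one download command per matching manifest in `manifest_dir`.

    With dry_run=True (the default) the commands are only returned, so they
    can be inspected first. With dry_run=False each command is executed.
    """
    manifests = sorted(Path(manifest_dir).glob(f"*{suffix}"))
    commands = [build_download_command(path) for path in manifests]
    if not dry_run:
        if shutil.which("idc") is None:
            raise RuntimeError(
                "idc command not found; install it with: "
                "pip install --upgrade idc-index"
            )
        for command in commands:
            # check=True stops at the first failed download.
            subprocess.run(command, check=True)
    return commands


# Hypothetical usage: list what would be run for manifests in the current directory.
for command in download_all(".", dry_run=True):
    print(" ".join(command))
```

Because the AWS and GCS manifests reference identical files, selecting a single suffix avoids downloading the same content twice.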

To download the files using the .dcf manifest, see the instructions in the manifest header.

Acknowledgments

The Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

References

[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W. L., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E., & Kikinis, R. (2023). National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. Radiographics 43. https://doi.org/10.1148/rg.230180
 
[2] Hou, L., Gupta, R., Van Arnam, J. S., Zhang, Y., Sivalenka, K., Samaras, D., Kurc, T., & Saltz, J. H. (2019). Dataset of Segmented Nuclei in Hematoxylin and Eosin Stained Histopathology Images of 10 Cancer Types [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/TCIA.2019.4A4DKP9U

Files

pan_cancer_nuclei_seg_dicom-tcga_blca-idc_v20-dcf.csv

Files (3.1 MB)

Checksum  Size
md5:245b6059e99e95e6e501efd2336d1640  48.9 kB
md5:ed369d9a05a68332c659ebdadcdba633  102.2 kB
md5:a54c39943f0cd1e98e052fc15ee9063b  48.9 kB
md5:1ddf2df463118ecfca6d9e610afbf54d  139.5 kB
md5:27e7f1acf478cdab847bde4ab2746c92  288.3 kB
md5:5856acddee9bd0043790014e5aa5d749  139.5 kB
md5:7cfa1a119d12fe4de6dd9e81bc06455e  32.3 kB
md5:266f558bdc6ff2937e21ccd4c8193da1  65.2 kB
md5:87fa650692b0d712e4eb24ef0351094b  32.3 kB
md5:0dee2887b9ba9d9ef5a5d018dd063b80  57.6 kB
md5:09f520fa8e03c84831ad73551924df76  117.6 kB
md5:c0f095a7f6c1380691f7bb8879f559f3  57.6 kB
md5:2b9e473ab59147f157366dd2a90a9e14  99.1 kB
md5:45518133831e37b8e158593ad9ae0c45  180.3 kB
md5:ce1bf3b5bcca684cae6123d4f5aefc49  99.1 kB
md5:deab7101c07708d337cb76a782385339  70.9 kB
md5:26e2a8cfc07f07f2ffcfe5c59b93c3ff  143.3 kB
md5:24cb867fc8d76ed7a987fa4017fb57b6  70.9 kB
md5:4df6d63013a052314f240f03acb866ca  55.5 kB
md5:4a1ec7b6199483d1417ab94d1376f5aa  111.7 kB
md5:cb88f975947acd8607a6525a32669392  55.5 kB
md5:2d6b31e6ed948789b39a01768d225162  24.7 kB
md5:85f5c86bf0d560a44c5dd3c5a86e4709  50.3 kB
md5:c11d136d77ebcf6c40161af6ae75064a  24.7 kB
md5:653a953934616bfb52b8b5357ef8089d  50.0 kB
md5:9190b69516d42106c8d9b79f3f0d670f  102.1 kB
md5:f35229fefa7140f5879d63e351f27dea  50.0 kB
md5:28a4fdaaaf48cb2f8881b9e10253a63b  21.7 kB
md5:bca2705a987521af7df4fa4682b99c43  43.6 kB
md5:fc2ee65e9987504b83909f1a03bc3277  21.7 kB
md5:c493aecb68c21bf23ce2d443d06308fb  57.6 kB
md5:4285bdd6da0f578714d2fa0bb34ef4fe  117.0 kB
md5:ad1d5adc8ae5838e5352e1f206c5454e  57.6 kB
md5:49ee7451a6ba263e5f58bed77381cdce  47.5 kB
md5:301ce0c5228444556f69dae19b403627  97.2 kB
md5:c88b172fc55a541a5d01ad0c9b102407  47.5 kB
md5:51ed6ced529c2a9049c8802a1978e1a9  69.7 kB
md5:51ea80b5e54e5900d1fc842fc0510e04  143.4 kB
md5:7054d595f5d9bff1cdfccbe51662ad78  69.7 kB
md5:b6a77170bceb517e27372293bc7dbd2d  8.4 kB
md5:e6675d0a23c3c12ed27f5137c00ccc11  16.7 kB
md5:dd1aa1e95a84ea8db7311ff597f6bd56  8.4 kB
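
The md5 values listed above can be used to check that a manifest was downloaded intact. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the file name in the usage comment is a placeholder, since the listing above does not pair checksums with file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:245b60...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Hypothetical usage (placeholder file name):
# matches_listing("some_manifest.s5cmd", "md5:245b6059e99e95e6e501efd2336d1640")
```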

Additional details

Related works

Cites
10.1148/rg.230180 (DOI)
Is derived from
10.7937/TCIA.2019.4A4DKP9U (DOI)
Is supplemented by
Software: 10.5281/zenodo.10632181 (DOI)