There is a newer version of the record available.

Published August 20, 2024 | Version v1
Dataset Open

DICOM converted Slide Microscopy images for the TCGA-CHOL collection

  • 1. PixelMed Publishing
  • 2. Institute for Systems Biology
  • 3. General Dynamics IT
  • 4. Frederick National Laboratory
  • 5. National Cancer Institute
  • 6. Brigham and Women's Hospital

Description

This dataset corresponds to a collection of images and/or image-derived data available from the National Cancer Institute Imaging Data Commons (IDC) [1]. The dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images in the IDC Portal: TCGA-CHOL. To download the content of the collection, use the manifests included in this Zenodo record and follow the Download instructions below.

Collection description

Cholangiocarcinoma is a cancer that develops in the bile duct. The bile duct is a network of tubes that carry bile from the liver and gallbladder to the small intestine. Tumors that start in bile duct branches that lie inside the liver are called intrahepatic bile duct cancer, while those that form outside the liver are called extrahepatic bile duct cancer. About 10% of all cholangiocarcinomas are intrahepatic and 90% are extrahepatic. TCGA studied both subtypes of cholangiocarcinoma.

Although cholangiocarcinoma is a rare cancer, its incidence and mortality rates have been increasing worldwide over the last three decades. Between 2,000 and 3,000 Americans are diagnosed with cholangiocarcinoma each year, the majority of them with tumors at advanced stages. This cancer is more prevalent in Asia and the Middle East, where parasitic infection of the bile duct increases the risk of cholangiocarcinoma. Other diseases of the bile duct or liver, such as bile duct stones and liver disease, as well as obesity, diabetes, and smoking, are also risk factors. When intrahepatic and extrahepatic cholangiocarcinoma spread to other parts of the body, only 2% of patients survive five years after diagnosis.

Please see the TCGA-CHOL information page to learn more about the images and to obtain any supporting metadata for this collection.

Citation guidelines can be found on the Citing TCGA in Publications and Presentations information page.

Files included

A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

  1. tcga_chol-idc_v8-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
  2. tcga_chol-idc_v8-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
  3. tcga_chol-idc_v8-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while those that end in -gcs.s5cmd reference files stored in Google Cloud Storage (GCS) buckets. The actual files are identical and are mirrored between AWS and GCS.
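The naming convention described above can be checked programmatically. The following is a minimal Python sketch, not part of the IDC tooling: the pattern is inferred only from the three file names listed in this record, and the names MANIFEST_NAME, ManifestName and parse_manifest_name are hypothetical helpers introduced for illustration.

```python
import re
from typing import NamedTuple

# Pattern inferred from the file names listed in this record (an assumption):
# <collection_id>-idc_v<release>-<target>.<extension>
MANIFEST_NAME = re.compile(
    r"^(?P<collection_id>.+)-idc_v(?P<release>\d+)"
    r"-(?P<target>aws|gcs|dcf)\.(?P<extension>s5cmd|dcf)$"
)


class ManifestName(NamedTuple):
    collection_id: str  # e.g. "tcga_chol"
    release: int        # IDC data release in which this version was introduced
    target: str         # "aws", "gcs" or "dcf"
    extension: str      # "s5cmd" or "dcf"


def parse_manifest_name(filename: str) -> ManifestName:
    """Split a manifest file name into its components."""
    match = MANIFEST_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized manifest name: {filename!r}")
    return ManifestName(
        collection_id=match["collection_id"],
        release=int(match["release"]),
        target=match["target"],
        extension=match["extension"],
    )


info = parse_manifest_name("tcga_chol-idc_v8-aws.s5cmd")
print(info.collection_id, info.release, info.target)
```

A name that does not follow the convention raises ValueError, which makes it easy to spot files that were renamed after download.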

Download instructions

Each of the manifests includes instructions in its header on how to download the included files.

To download the files using .s5cmd manifests:

  1. Install the idc-index package: pip install --upgrade idc-index
  2. Download the files referenced by a manifest included in this dataset by passing the .s5cmd manifest file to the idc command: idc download manifest.s5cmd

To download the files using the .dcf manifest, see the manifest header.
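Before downloading, it can help to read the instructions in a manifest header and see how many entries the manifest contains. The sketch below is illustrative only and rests on an assumption not stated in this record: that header lines in a .s5cmd manifest start with "#" and every other non-empty line is one s5cmd command. The function split_manifest and the sample content (including the bucket name) are hypothetical.

```python
from typing import List, Tuple


def split_manifest(text: str) -> Tuple[List[str], List[str]]:
    """Separate header comment lines from command lines.

    Assumes (not confirmed by this record) that header lines start with '#'.
    """
    header: List[str] = []
    commands: List[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if stripped.startswith("#"):
            header.append(stripped.lstrip("#").strip())
        else:
            commands.append(stripped)
    return header, commands


# Hypothetical manifest content, for illustration only.
sample = """\
# hypothetical header line with download instructions
cp s3://example-bucket/series-0001/* .
cp s3://example-bucket/series-0002/* .
"""

header, commands = split_manifest(sample)
print(f"{len(header)} header line(s), {len(commands)} command(s)")
```

For a real manifest, read the file with open(path, encoding="utf-8").read() and pass the text to split_manifest.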

Acknowledgments

The Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003I.

References

[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180

Files (39.8 kB)

  • md5:888078b28b84d2a8fc68200a60746066 (7.4 kB)
  • md5:0132ece305405f6f459db4f8a69b1a11 (24.3 kB)
  • md5:a927228aaf05b8cc0cf636084d745d6b (8.1 kB)
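The MD5 checksums listed above can be used to confirm that a downloaded manifest is intact. The following is a minimal Python sketch using only the standard library. The helper names md5_of_file and matches_checksum are hypothetical, and because this listing does not show which checksum belongs to which file name, the pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, matches_checksum("tcga_chol-idc_v8-aws.s5cmd", "md5:...") returns True only if the local file matches the digest copied from the record for that file.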

Additional details

Related works

Cites
10.1148/rg.230180 (DOI)
Is published in
10.25504/FAIRsharing.0b5a1d (DOI)