CGCI-HTMCP-DLBCL: DICOM converted whole slide images from the Cancer Genome Characterization Initiative (CGCI) HIV+ Tumor Molecular Characterization Project (HTMCP) - Diffuse Large B-Cell Lymphoma
Description
This dataset corresponds to a collection of images and/or image-derived data available from the National Cancer Institute Imaging Data Commons (IDC). This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using the IDC Portal. You can use the manifests included in this Zenodo record to download the collection following the Download instructions below.
The Office of Cancer Genomics at the National Cancer Institute sponsored a series of studies as part of the Cancer Genome Characterization Initiative (CGCI) to assess novel emerging sequencing technologies in cancer. The CGCI program included comprehensive characterization of the genetic aberrations found in different pediatric and/or adult tumors.
As part of CGCI, the HIV+ Tumor Molecular Characterization Project (HTMCP) was a joint effort of the Office of Cancer Genomics (OCG) and the Office of HIV and AIDS Malignancy (OHAM). Its goals were to characterize HIV-associated cancers obtained from HIV-infected patients and compare them to the same types of cancers from patients without HIV infection. Approximately 34.2 million people are living with HIV worldwide. People infected with HIV have an elevated risk of cancer and mortality, and cancer is a leading cause of death among people with HIV/AIDS. The Genome Sciences Center at the British Columbia Cancer Agency performed whole genome sequencing of 100 cases of paired tumor and germline DNA, along with transcriptome sequencing of HIV+ tumors.
Incidence of diffuse large B-cell lymphoma (DLBCL) is significantly increased among HIV-positive patients, a trend that continues to rise despite highly active antiretroviral therapy (HAART). A significant proportion of these malignancies are not known to be caused by an oncogenic DNA virus, which leaves many open questions about both their pathogenesis and their high incidence in association with HIV infection.
This collection contains DICOM converted whole slide images from 43 of the 70 cases in the GDC CGCI-HTMCP-DLBCL project (dbGaP accession phs000529). The proprietary format whole slide images were obtained from GDC and converted to DICOM Slide Microscopy (SM) format using idc-wsi-conversion. The 496 slides include specimens stained with H&E (hematoxylin and eosin, 42 slides) and various immunohistochemistry and in situ hybridization markers: BCL2 (43), BCL6 (43), CD10 (43), CD20 (43), CD3 (43), CD79a (43), EBER (43), Ki-67 (43), MUM1 (43), TP53 (43), INT (21), CD138 (1), plus additional H&E sections (S1-HE, S2-HE; 1 each).
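As a quick consistency check, the per-stain slide counts quoted above can be tallied against the stated total of 496 slides. The snippet below is only a sketch: the dictionary is transcribed by hand from the paragraph above, and the variable names are illustrative.

```python
# Slide counts per stain, transcribed from the collection description above.
stain_counts = {
    "H&E": 42,
    "BCL2": 43,
    "BCL6": 43,
    "CD10": 43,
    "CD20": 43,
    "CD3": 43,
    "CD79a": 43,
    "EBER": 43,
    "Ki-67": 43,
    "MUM1": 43,
    "TP53": 43,
    "INT": 21,
    "CD138": 1,
    "S1-HE": 1,
    "S2-HE": 1,
}

# Sum over all stains; this should match the total stated in the description.
total_slides = sum(stain_counts.values())
print(total_slides)  # 496
```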
Data organization: DICOM PatientIDs correspond to GDC case IDs and can be used to link to genomic, transcriptomic, and clinical data in the GDC portal. Of 70 GDC cases, 43 have Tissue Slide images; the remaining 27 have no slides and are not represented in this collection. Most patients have 11+ slides representing different stains (H&E, BCL2, BCL6, CD10, CD20, CD3, CD79a, EBER, Ki-67, MUM1, TP53).
HTMCP-DLBCL data is accessible at the NCI's Genomic Data Commons (GDC) via the GDC Data Portal. Please see the CGCI Use and Publication Guidelines for updated details on the sharing of any CGCI substudy data, including how to cite CGCI.
Files included
A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, cgci_htmcp_dlbcl-idc_v22-aws.s5cmd corresponds to the contents of the cgci_htmcp_dlbcl collection introduced in IDC data release v22.
- cgci_htmcp_dlbcl-idc_v24-aws.s5cmd: AWS download manifest
- cgci_htmcp_dlbcl-idc_v24-gcs.s5cmd: GCS download manifest
- cgci_htmcp_dlbcl-idc_v24-dcf.dcf: DCF download manifest
Manifest files ending in -aws.s5cmd reference files in Amazon Web Services (AWS) buckets; those ending in -gcs.s5cmd reference files in Google Cloud Storage (GCS) buckets. The actual files are identical and mirrored between AWS and GCP.
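The naming convention described above can be made explicit with a small parser. This is a minimal sketch that assumes every manifest name follows the pattern seen in this record (collection, then -idc_v and the release number, then -aws, -gcs, or -dcf and the extension); the function and class names are hypothetical and not part of any IDC tool.

```python
import re
from typing import NamedTuple


class ManifestName(NamedTuple):
    collection: str   # e.g. "cgci_htmcp_dlbcl"
    idc_release: int  # IDC data release in which this version was introduced
    target: str       # "aws", "gcs", or "dcf"


# Assumed pattern, inferred only from the file names listed in this record.
_MANIFEST_PATTERN = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<target>aws|gcs|dcf)\.(?:s5cmd|dcf)$"
)


def parse_manifest_name(filename: str) -> ManifestName:
    """Split a manifest file name into collection, release, and target."""
    match = _MANIFEST_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized manifest file name: {filename!r}")
    return ManifestName(
        collection=match["collection"],
        idc_release=int(match["release"]),
        target=match["target"],
    )


print(parse_manifest_name("cgci_htmcp_dlbcl-idc_v24-aws.s5cmd"))
```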
Download instructions
Each manifest file includes instructions in its header on how to download the included files.
To download the files using .s5cmd manifests:
- Install idc-index:
  pip install --upgrade idc-index
- Download the files referenced by a manifest included in this dataset:
  idc download manifest.s5cmd
To download files using a .dcf manifest, see the manifest header.
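For scripted workflows, the download step can be wrapped in Python. The sketch below simply shells out to the same idc download command shown in the steps above and assumes idc-index is already installed so that the idc executable is on the PATH; the function names are hypothetical helpers, not part of idc-index.

```python
import shutil
import subprocess
from pathlib import Path


def build_download_command(manifest):
    """Return the argv list for the `idc download <manifest>` command."""
    return ["idc", "download", str(manifest)]


def download_manifest(manifest):
    """Run `idc download` on a manifest, failing early on obvious problems."""
    manifest = Path(manifest)
    if not manifest.is_file():
        raise FileNotFoundError(f"Manifest not found: {manifest}")
    if shutil.which("idc") is None:
        # The idc executable is provided by the idc-index package.
        raise RuntimeError(
            "The 'idc' command is not available; "
            "install it with: pip install --upgrade idc-index"
        )
    # check=True raises CalledProcessError if the download command fails.
    subprocess.run(build_download_command(manifest), check=True)
```

Calling download_manifest("cgci_htmcp_dlbcl-idc_v24-aws.s5cmd") from the directory containing the manifest would then start the download, given network access.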
For questions or help, contact support@canceridc.dev or post on the IDC Forum.
Files
Total size: 174.0 kB
| md5 | Size |
|---|---|
| 650afea73488587902187ccd8bd9e48f | 32.1 kB |
| 79d3c5884bfd29e913af50f2a8024bd9 | 109.7 kB |
| 60dc3016196c8ffe9d5cd24dad234cd4 | 32.1 kB |
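The md5 checksums listed under Files can be used to confirm that a downloaded manifest arrived intact. The following is a minimal sketch using only the Python standard library. Because this listing does not show which checksum belongs to which file name, the check only tests membership in the set of published values; the helper names are illustrative.

```python
import hashlib

# md5 checksums as listed for the three manifest files in this record.
PUBLISHED_MD5 = {
    "650afea73488587902187ccd8bd9e48f",
    "79d3c5884bfd29e913af50f2a8024bd9",
    "60dc3016196c8ffe9d5cd24dad234cd4",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """Return True if the file's md5 is one of the published checksums."""
    return md5_of_file(path) in PUBLISHED_MD5
```

For example, matches_published_checksum("cgci_htmcp_dlbcl-idc_v24-aws.s5cmd") should return True for an unmodified copy of that manifest.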
Additional details
Related works
- Cites
  - Publication: 10.1148/rg.230180 (DOI)
- Is published in
  - Other: 10.25504/FAIRsharing.0b5a1d (DOI)