A Pan-Cancer PDX Histology Image Repository with Genomic and Pathologic Annotation
Authors/Creators
- White, Brian S.
- Woo, Xing Yi
- Koc, Soner
- Sheridan, Todd
- Neuhauser, Steven B.
- Wang, Shidan
- Evrard, Yvonne A.
- Chen, Li
- Foroughi pour, Ali
- Landua, John D.
- Mashl, R. Jay
- Davies, Sherri R.
- Fang, Bingliang
- Raso, Maria Gabriela
- Evans, Kurt W.
- Bailey, Matthew H.
- Chen, Yeqing
- Xiao, Min
- Rubinstein, Jill C.
- Sanderson, Brian J.
- Lloyd, Michael W.
- Domanskyi, Sergii
- Dobrolecki, Lacey E.
- Fujita, Maihi
- Fujimoto, Junya
- Xiao, Guanghua
- Fields, Ryan C.
- Mudd, Jacqueline L.
- Xu, Xiaowei
- Hollingshead, Melinda G.
- Jiwani, Shahanawaz
- Acevedo, Saul
- Davis-Dusenbery, Brandi N.
- Robinson, Peter N.
- Moscow, Jeffrey A.
- Doroshow, James H.
- Mitsiades, Nicholas
- Kaochar, Salma
- Pan, Chong-xian
- Carvajal-Carmona, Luis G.
- Welm, Alana L.
- Welm, Bryan E.
- Govindan, Ramaswamy
- Li, Shunqiang
- Davies, Michael A.
- Roth, Jack A.
- Meric-Bernstam, Funda
- Xie, Yang
- Herlyn, Meenhard
- Ding, Li
- Lewis, Michael T.
- Bult, Carol J.
- Dean II, Dennis A.
- Chuang, Jeffrey H.
- Clifford, William
- Clunie, David
- Fedorov, Andrey
Description
This dataset corresponds to a collection of images and/or image-derived data available from the National Cancer Institute Imaging Data Commons (IDC). This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using the IDC Portal. You can use the manifests included in this Zenodo record to download the collection following the Download instructions below.
A patient-derived xenograft (PDX) involves implanting a human tumor biopsy into an immunodeficient mouse. PDXs model human intra- and inter-tumoral heterogeneity within the intact tissue of the mouse. This dataset contains histologic hematoxylin and eosin (H&E) whole slide images of PDX samples and, in some cases, the human progenitor samples from which they are derived. These images were curated as part of the National Cancer Institute's PDX Development and Trial Centers Research Network (PDXNet) program, a collaborative initiative focused on pre-clinical model development and testing of targeted therapeutic agents. They were contributed by Baylor College of Medicine (BCM), Huntsman Cancer Institute, MD Anderson Cancer Center (MDACC), The Wistar Institute (WISTAR), Washington University in St Louis (WUSTL), and The Jackson Laboratory (JAX).
Images are provided in Digital Imaging and Communications in Medicine (DICOM) format and are available from the National Cancer Institute Imaging Data Commons (IDC). The original images in TIFF and SVS format were provided to the IDC team for archival purposes and were converted into DICOM Whole Slide Microscopy (SM) representation using custom open source scripts and tools at idc-wsi-conversion. Clinical data accompanying the images are available at pdxnet-image-analysis-aacr2022.
The repository contains >1,000 PDX H&E images and >100 matched human progenitor tumor images. The dataset encompasses a wide variety of cancer types and anatomic sites, including breast, lung, colorectal, pancreatic, melanoma, sarcoma, prostate, ovarian, and many others. Cancer types span adenocarcinomas, squamous cell carcinomas, sarcomas, neuroendocrine tumors, and other histologies. Most images include pathologic assessment of tumor stage and slide-level proportions of cancer, stromal, and necrotic regions. A subset of images has associated HoVer-Net cell segmentations and detailed pathologic annotations of neoplastic, stromal, and necrotic regions.
Genomic and transcriptomic data (RNA-seq, WES) and clinical metadata are linked to the images via the PDXNet Portal.
Files included
A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, pdxnet-idc_v22-aws.s5cmd corresponds to the contents of the pdxnet collection introduced in IDC data release v22.
- pdxnet-idc_v24-aws.s5cmd: AWS download manifest
- pdxnet-idc_v24-gcs.s5cmd: GCS download manifest
- pdxnet-idc_v24-dcf.dcf: DCF download manifest
Manifest files ending in -aws.s5cmd reference files in Amazon Web Services (AWS) buckets; those ending in -gcs.s5cmd reference files in Google Cloud Storage (GCS) buckets. The referenced files are identical and mirrored between AWS and GCP.
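The naming convention described above can be read mechanically. The following is a minimal sketch, assuming every manifest name follows the `<collection>-idc_v<release>-<target>.<extension>` pattern seen in the three file names listed in this record; the function name `parse_manifest_name` and the `ManifestName` type are hypothetical helpers, not part of any IDC tool.

```python
import re
from typing import NamedTuple


class ManifestName(NamedTuple):
    collection: str  # e.g. "pdxnet"
    release: int     # IDC data release number, e.g. 24
    target: str      # "aws", "gcs", or "dcf"
    extension: str   # "s5cmd" or "dcf"


# Pattern inferred from the file names in this record (an assumption,
# not an official specification of IDC manifest names).
_MANIFEST_RE = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-"
    r"(?P<target>aws|gcs|dcf)\.(?P<extension>s5cmd|dcf)$"
)


def parse_manifest_name(filename: str) -> ManifestName:
    """Split a manifest file name into collection, release, and target."""
    match = _MANIFEST_RE.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized manifest name: {filename!r}")
    return ManifestName(
        collection=match["collection"],
        release=int(match["release"]),
        target=match["target"],
        extension=match["extension"],
    )


info = parse_manifest_name("pdxnet-idc_v24-aws.s5cmd")
print(info.collection, info.release, info.target)
```

A helper like this is only useful for choosing between the AWS and GCS manifests in a script; the manifests themselves remain the authoritative description of what is downloaded.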
Download instructions
Each manifest file includes instructions in its header on how to download the included files.
To download the files using .s5cmd manifests:
- Install idc-index:
  pip install --upgrade idc-index
- Download the files referenced by a manifest included in this dataset:
  idc download manifest.s5cmd
To download files using a .dcf manifest, see the manifest header.
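Before starting a large download, it can help to check how many entries a .s5cmd manifest references. The sketch below assumes a layout that this record does not spell out: header lines begin with `#`, and each remaining line is an s5cmd `cp` command whose last two fields are the source URL and the destination. The function names are hypothetical, and the bucket name in the sample text is a placeholder.

```python
from pathlib import Path
from typing import List


def manifest_sources(manifest_text: str) -> List[str]:
    """Return the source URLs of the `cp` commands in s5cmd manifest text.

    Assumed layout (not confirmed by this record): comment lines start
    with '#', and command lines look like `cp [flags] <source> <destination>`.
    """
    sources = []
    for raw_line in manifest_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the instructional header
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "cp":
            sources.append(fields[-2])  # second-to-last field is the source
    return sources


def read_manifest_sources(path: str) -> List[str]:
    """Convenience wrapper that reads a manifest file from disk."""
    return manifest_sources(Path(path).read_text(encoding="utf-8"))


# Placeholder manifest text for illustration only.
sample = """# header comment with download instructions
cp s3://example-bucket/series-a/* .
cp s3://example-bucket/series-b/* .
"""
print(len(manifest_sources(sample)), "entries")
```

Calling `read_manifest_sources("pdxnet-idc_v24-aws.s5cmd")` on a downloaded manifest would then list its sources, provided the layout assumption holds; the manifest header remains the reference for the actual format.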
For questions or help, contact support@canceridc.dev or post on the IDC Forum.
Files
Total size: 362.7 kB
| MD5 checksum | Size |
|---|---|
| 7a5e40b558c248cffce986903b90567e | 59.2 kB |
| 667dd9489b410d4beae34b52b1ad64e7 | 244.3 kB |
| fa8737f3562fad8464c4e69e500f91ca | 59.2 kB |
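The MD5 checksums listed for the files can be used to confirm that a manifest was downloaded intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and pairing each checksum with its file name is left to the reader, since that pairing comes from the record page.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file's MD5 with an expected value such as 'md5:7a5e...'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_checksum("pdxnet-idc_v24-aws.s5cmd", "md5:<checksum shown for that file>")` returns `True` only if the local copy is byte-identical to the published one.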
Additional details
Related works
- Cites
- Publication: 10.1148/rg.230180 (DOI)
- Is described by
- Publication: 10.1158/0008-5472.CAN-23-1349 (DOI)
- Is published in
- Other: 10.25504/FAIRsharing.0b5a1d (DOI)