CATCH: CAnine CuTaneous Cancer Histology Dataset
Authors/Creators
Description
This dataset corresponds to a collection of images and/or image-derived data available from the National Cancer Institute Imaging Data Commons (IDC). This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using the IDC Portal. You can use the manifests included in this Zenodo record to download the collection following the Download instructions below.
This collection contains 350 hematoxylin and eosin (H&E) stained whole slide images from the Pan-tumor CAnine CuTaneous Cancer Histology (CATCH) dataset, representing 282 canine patients across seven cutaneous tumor subtypes: melanoma, mast cell tumor (MCT), squamous cell carcinoma (SCC), peripheral nerve sheath tumor (PNST), trichoblastoma, histiocytoma, and plasmacytoma (50 slides per subtype). Specimens were retrospectively selected from the histopathology archive of the Institute of Veterinary Pathology, Freie Universität Berlin. Slides were digitized at 40X magnification on two Leica scanners: Aperio ScanScope CS2 (303 slides, 0.2533 µm/pixel) and Aperio AT2 (47 slides, 0.2524 µm/pixel), producing pyramidal SVS files with three resolution levels. The original SVS images were obtained from The Cancer Imaging Archive (TCIA) and converted to DICOM Slide Microscopy (SM) format for IDC using idc-wsi-conversion.
Note: the upstream CATCH dataset also includes 12,424 expert polygon annotations across 13 histologic classes (seven tumor subtypes plus six non-neoplastic tissue classes). Those annotations were not converted to DICOM and are not included in this collection; the original annotations are available from the TCIA CATCH collection page.
Because canine and human cutaneous tumors share morphological features, the dataset is relevant to both veterinary and human pathology research.
Files included
A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, catch-idc_v22-aws.s5cmd corresponds to the contents of the catch collection introduced in IDC data release v22.
- catch-idc_v24-aws.s5cmd: AWS download manifest
- catch-idc_v24-gcs.s5cmd: GCS download manifest
- catch-idc_v24-dcf.dcf: DCF download manifest
Manifest files ending in -aws.s5cmd reference files in Amazon Web Services (AWS) buckets; those ending in -gcs.s5cmd reference files in Google Cloud Storage (GCS). The actual files are identical and mirrored between AWS and GCP.
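The naming convention described above can be sketched as a small parser. This is only an illustration: the pattern `<collection>-idc_v<release>-<target>.<extension>` is inferred from the file names in this record and is an assumption, and `parse_manifest_name` and `ManifestInfo` are hypothetical names, not part of any IDC tool.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, inferred from the manifest names listed in this record:
#   <collection>-idc_v<release>-<target>.<extension>
MANIFEST_NAME = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)"
    r"-(?P<target>aws|gcs|dcf)\.(?P<ext>s5cmd|dcf)$"
)


class ManifestInfo(NamedTuple):
    collection: str  # collection identifier, e.g. "catch"
    release: int     # IDC data release number, e.g. 24
    target: str      # "aws", "gcs", or "dcf"


def parse_manifest_name(filename: str) -> Optional[ManifestInfo]:
    """Split a manifest file name into its parts, or return None if it does not match."""
    match = MANIFEST_NAME.match(filename)
    if match is None:
        return None
    return ManifestInfo(match["collection"], int(match["release"]), match["target"])


if __name__ == "__main__":
    for name in ("catch-idc_v24-aws.s5cmd",
                 "catch-idc_v24-gcs.s5cmd",
                 "catch-idc_v24-dcf.dcf"):
        print(name, "->", parse_manifest_name(name))
```

A helper like this could be used to pick the manifest for a preferred cloud provider when several manifests sit in the same directory.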
Download instructions
Each manifest file includes instructions in its header on how to download the included files.
To download the files using .s5cmd manifests:
- Install idc-index:
  pip install --upgrade idc-index
- Download the files referenced by a manifest included in this dataset:
  idc download manifest.s5cmd
To download files using a .dcf manifest, see the manifest header.
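Before starting a download, it can be useful to see what a .s5cmd manifest references. The sketch below assumes the manifest is a text file in which header lines start with `#` and each remaining line is an s5cmd copy command of the form `cp <source> <destination>`; this layout is an assumption and should be checked against the header of the actual manifest. The function name `manifest_sources` and the bucket name in the sample are hypothetical.

```python
import shlex
from typing import List


def manifest_sources(text: str) -> List[str]:
    """Return the source URL of every `cp` command in an s5cmd manifest.

    Assumes '#' starts a header/comment line and that every other
    non-empty line looks like: cp <source> <destination>
    """
    sources = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the instruction header
        parts = shlex.split(line)
        if len(parts) >= 3 and parts[0] == "cp":
            sources.append(parts[-2])  # second-to-last token is the source
    return sources


if __name__ == "__main__":
    # Hypothetical sample content; real manifests come from this record.
    sample = (
        "# download instructions would appear here\n"
        "cp s3://example-bucket/series-a/* .\n"
        "cp s3://example-bucket/series-b/* .\n"
    )
    for url in manifest_sources(sample):
        print(url)
```

Counting the returned entries gives a quick sanity check that the manifest was retrieved in full before passing it to idc download.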
For questions or help, contact support@canceridc.dev or post on the IDC Forum.
Files
Total size: 123.7 kB
| Checksum | Size |
|---|---|
| md5:ffca21bf971f639052128a16bf088066 | 22.8 kB |
| md5:e4a4200a2d56bf85f4242bdd43ac2696 | 78.2 kB |
| md5:e9abc63d89f9484e252b8ac79359bfce | 22.8 kB |
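The MD5 checksums listed above can be used to confirm that a manifest file was downloaded intact. The following is a minimal sketch using only the Python standard library. Because this listing does not show which checksum belongs to which file name, the check only tests whether a file's digest is one of the three listed values. The function names `md5_of` and `is_listed_file` are hypothetical.

```python
import hashlib

# MD5 values copied from the file listing in this record.
LISTED_MD5 = {
    "ffca21bf971f639052128a16bf088066",
    "e4a4200a2d56bf85f4242bdd43ac2696",
    "e9abc63d89f9484e252b8ac79359bfce",
}


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed_file(path: str) -> bool:
    """Return True if the file's MD5 matches one of the listed checksums."""
    return md5_of(path) in LISTED_MD5


if __name__ == "__main__":
    import sys

    # Usage: python check_md5.py <downloaded manifest> [...]
    for filename in sys.argv[1:]:
        status = "OK" if is_listed_file(filename) else "MISMATCH"
        print(f"{status}  {filename}")
```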
Additional details
Related works
- Cites
  - Publication: 10.1148/rg.230180 (DOI)
- Is derived from
  - Other: 10.7937/TCIA.2M93-FX66 (DOI)
- Is described by
  - Publication: 10.1038/s41597-022-01692-w (DOI)
  - Publication: 10.1038/s41597-020-00756-z (DOI)
- Is published in
  - Other: 10.25504/FAIRsharing.0b5a1d (DOI)