Histopathology images for end-to-end AI, based on TCGA-BRCA
Description
These histopathological images are derived from the TCGA-BRCA breast cancer histology dataset at https://portal.gdc.cancer.gov/ (please check that website for the original data license). They can be used for end-to-end artificial intelligence (AI) workflows such as DeepMed (https://github.com/KatherLab/deepmed), which aim to predict high-level features directly from digital images with weakly supervised transfer learning. Here, we use two subsets of these digitized images:
1) TCGA-BRCA-A2: all images from Walter Reed National Military Medical Center (tissue source site code A2, N=100 images) in the TCGA-BRCA database (tcga-brca-a2-deepmed-tiles.zip)
2) TCGA-BRCA-E2: all images from Roswell Park Comprehensive Cancer Center (tissue source site code E2, N=90 images) in the TCGA-BRCA database (tcga-brca-e2-deepmed-tiles.zip)
For the full list of tissue source site codes, see https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/tissue-source-site-codes
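As an illustration of how the two cohorts relate to the tissue source site code, the following minimal sketch groups TCGA barcodes by that code, which is the second hyphen-separated field of a barcode. The function names and the placeholder barcodes are assumptions made for this example and are not part of the dataset.

```python
from collections import defaultdict


def tissue_source_site(barcode):
    """Return the tissue source site code (second field) of a TCGA barcode."""
    parts = barcode.split("-")
    if len(parts) < 3 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    return parts[1]


def group_by_site(barcodes):
    """Map each tissue source site code to the barcodes that carry it."""
    groups = defaultdict(list)
    for barcode in barcodes:
        groups[tissue_source_site(barcode)].append(barcode)
    return dict(groups)


# Placeholder identifiers in TCGA barcode format (hypothetical, not real patients).
example_barcodes = ["TCGA-A2-XXXX", "TCGA-E2-YYYY", "TCGA-A2-ZZZZ"]
cohorts = group_by_site(example_barcodes)
```

Here `cohorts["A2"]` would hold the Walter Reed cases and `cohorts["E2"]` the Roswell Park cases.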
The images were preprocessed according to the Aachen Protocol for Deep Learning Histopathology, which is available at https://zenodo.org/record/3694994. Specifically, digital whole slide images (SVS format) of hematoxylin & eosin (H&E) stained slides were tessellated, without manual annotations, into tiles of 256x256 px at 1 µm/px. The tiles were then color-normalized using the Macenko method as described previously (https://www.nature.com/articles/s43018-020-0087-6) and saved as JPEG files. For the A2 cohort, an additional ZIP archive is provided that contains only 100 random image tiles per patient (tcga-brca-a2-deepmed-tiles_100.zip).
In addition, we provide a CLINI and a SLIDE table as defined in the "Aachen Protocol". The CLINI table contains clinico-pathological data for all included patients and is derived from clinical information on www.cbioportal.org as well as from Thorsson et al. (https://pubmed.ncbi.nlm.nih.gov/29628290/).
We recommend using the A2 dataset for training and the E2 dataset for testing. Please cite the relevant papers if you re-use this dataset. More information is available on www.kather.ai
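The tile geometry described above can be worked through with a short sketch. It reads the tile specification as 256 px per edge at 1 µm/px, so each tile covers 256 µm of tissue. The native scan resolution of 0.25 µm/px, the slide dimensions, and the rule of discarding partial edge tiles are assumptions made for illustration; the Aachen Protocol itself may handle these details differently, and this is not its implementation.

```python
def native_tile_edge(tile_px=256, target_mpp=1.0, native_mpp=0.25):
    """Edge length, in native-resolution pixels, of the region that
    becomes one tile of tile_px pixels at target_mpp microns per pixel."""
    return round(tile_px * target_mpp / native_mpp)


def tile_origins(width, height, edge):
    """Top-left corners of non-overlapping tiles that fit entirely
    inside a slide of the given size (partial edge tiles are dropped)."""
    return [
        (x, y)
        for y in range(0, height - edge + 1, edge)
        for x in range(0, width - edge + 1, edge)
    ]


# Assumed native resolution of 0.25 um/px: one tile spans 1024 native pixels.
edge = native_tile_edge()
# Hypothetical slide of 3000 x 2100 native pixels.
origins = tile_origins(3000, 2100, edge)
```

Under these assumptions each 256 px tile corresponds to a 1024 px square in the original scan, and the hypothetical slide yields four full tiles.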
Files
Total size: 24.3 GB

MD5 checksum | Size |
---|---|
md5:c696f1b7defb581db1557ae769833d25 | 153.2 kB |
md5:85cfc54af021e2956cb1f472bc55c451 | 12.8 GB |
md5:8e439785cb219c72d2b867e0411da577 | 464.0 MB |
md5:3692048b393552f44646ce4ffae97351 | 13.5 kB |
md5:4651c9f2f987e0fdbe9c13d41f3daf28 | 138.6 kB |
md5:9b37112c14682e6f42a51e0ca336a1b5 | 11.1 GB |
md5:62b6dd71e811b89f906fdc8d5907341b | 13.0 kB |
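Because the archives are large, it can be worth checking a download against the MD5 checksums listed above. The following is a minimal sketch using Python's standard `hashlib` module; the function names are chosen for this example, and the file is read in chunks so that multi-gigabyte archives do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches the expected checksum.

    The expected value may carry the 'md5:' prefix used in the file listing.
    """
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

To use it, pass the local path of a downloaded file together with the checksum shown for that file in the listing; a result of `False` suggests an incomplete or corrupted download.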