Dataset Open Access
This repository contains 218,578 unique image patches derived from histological images of colorectal cancer patients in the TCGA cohort (original whole slide SVS images are freely available at https://portal.gdc.cancer.gov/). All images in this repository are derived from snap-frozen tissue slides ("TS" or "BS" at the GDC data portal).
Preprocessing
All SVS slides were preprocessed as follows:
1. automatic detection of tumor
2. resizing to 224 px × 224 px at a resolution of 0.5 µm/px
4. color normalization with the Macenko method (Macenko et al., 2009, http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf)
5. assignment of patients to either "MSS" (microsatellite stable) or "MSIMUT" (microsatellite unstable or hypermutated)
6. randomization of patients to training and testing sets (~70% and ~30%). Randomization was done on a patient level rather than on a slide or tile level
7. equilibration of training sets by undersampling (randomly removing excess tiles in the MSS class)
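The last two steps (patient-level randomization and undersampling) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `(patient_id, label, tile_name)` tuple layout, the function names, and the seeds are assumptions made for the example. The undersampling helper trims every class to the size of the smallest one, which reduces to removing excess MSS tiles when MSS is the larger class.

```python
import random
from collections import defaultdict


def split_by_patient(tiles, test_fraction=0.3, seed=0):
    """Split tiles into (train, test) so that no patient appears in both.

    tiles: list of (patient_id, label, tile_name) tuples (hypothetical layout).
    """
    patients = sorted({patient for patient, _, _ in tiles})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = round(len(patients) * test_fraction)
    test_patients = set(patients[:n_test])
    train = [t for t in tiles if t[0] not in test_patients]
    test = [t for t in tiles if t[0] in test_patients]
    return train, test


def undersample(tiles, seed=0):
    """Randomly drop tiles until every class has as many tiles as the smallest class."""
    by_label = defaultdict(list)
    for tile in tiles:
        by_label[tile[1]].append(tile)
    n_min = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n_min))
    return balanced
```

Splitting on patient identifiers rather than on tiles keeps tiles from the same patient out of both sets, which is the point of randomizing on a patient level.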
File description
1. CRC_KR_TRAIN_MSS - training images (~70% of all patients) for colorectal cancer TCGA patients with MSS (microsatellite stable) tumors, 50,285 unique image patches; snap-frozen samples
2. CRC_KR_TRAIN_MSIMUT - training images (~70% of all patients) for colorectal cancer TCGA patients with MSI (microsatellite unstable) or highly mutated tumors, 50,285 unique image patches; snap-frozen samples
3. CRC_KR_TEST_MSS - test images (~30% of all patients) for colorectal cancer TCGA patients with MSS (microsatellite stable) tumors, 90,104 unique image patches; snap-frozen samples
4. CRC_KR_TEST_MSIMUT - test images (~30% of all patients) for colorectal cancer TCGA patients with MSI (microsatellite unstable) or highly mutated tumors, 27,904 unique image patches; snap-frozen samples
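As a quick consistency check, the four patch counts listed above can be tallied. The sketch below uses only the numbers stated in this description; the dictionary keys are illustrative labels, not file or folder names.

```python
# Patch counts as stated in the file description, keyed by (split, class).
patch_counts = {
    ("train", "MSS"): 50285,
    ("train", "MSIMUT"): 50285,
    ("test", "MSS"): 90104,
    ("test", "MSIMUT"): 27904,
}

total = sum(patch_counts.values())
train_balanced = patch_counts[("train", "MSS")] == patch_counts[("train", "MSIMUT")]
test_total = patch_counts[("test", "MSS")] + patch_counts[("test", "MSIMUT")]
test_msimut_fraction = patch_counts[("test", "MSIMUT")] / test_total

print(total)                           # 218578
print(train_balanced)                  # True
print(round(test_msimut_fraction, 3))  # 0.236
```

The total matches the 218,578 patches stated at the top. The training classes are equal in size as a result of undersampling, while the test sets keep their original class imbalance, which matters when choosing evaluation metrics.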
Name | MD5 | Size
---|---|---
CRC_KR_TEST_MSIMUT.zip | 99e6e9434d23ffd58ee1e09f93624c21 | 1.9 GB
CRC_KR_TEST_MSS.zip | dd747dae3c842ef01534d25a1deaf99d | 6.6 GB
CRC_KR_TRAIN_MSIMUT.zip | 8ab26b5ba2cfde43ad0a6c3d7e14a40e | 3.4 GB
CRC_KR_TRAIN_MSS.zip | db3f5a846e8ebd08b097a82a44f6adda | 3.3 GB
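Because the archives are several gigabytes each, it may be worth checking a download against the MD5 checksums listed with the files. The sketch below uses Python's standard `hashlib`; the helper names and the assumption that the archives sit in the current directory are illustrative.

```python
import hashlib

# Checksums as listed for the four archives.
EXPECTED_MD5 = {
    "CRC_KR_TEST_MSIMUT.zip": "99e6e9434d23ffd58ee1e09f93624c21",
    "CRC_KR_TEST_MSS.zip": "dd747dae3c842ef01534d25a1deaf99d",
    "CRC_KR_TRAIN_MSIMUT.zip": "8ab26b5ba2cfde43ad0a6c3d7e14a40e",
    "CRC_KR_TRAIN_MSS.zip": "db3f5a846e8ebd08b097a82a44f6adda",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, `verify_download("CRC_KR_TRAIN_MSS.zip", EXPECTED_MD5["CRC_KR_TRAIN_MSS.zip"])` should return `True` for an intact download, assuming the file was saved under its original name.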