Published February 7, 2019 | Version v1
Dataset Open

Histological images for MSI vs. MSS classification in gastrointestinal cancer, snap-frozen samples

  • 1. RWTH University Aachen

Description

This repository contains 218,578 unique image patches derived from histological images of colorectal cancer patients in the TCGA cohort (original whole slide SVS images are freely available at https://portal.gdc.cancer.gov/). All images in this repository are derived from snap-frozen tissue slides ("TS" or "BS" at the GDC data portal).

Preprocessing

All SVS slides were preprocessed as follows:

1. automatic detection of tumor

2. resizing to 224 px × 224 px at a resolution of 0.5 µm/px

4. color normalization with the Macenko method (Macenko et al., 2009, http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf)

5. assignment of patients to either "MSS" (microsatellite stable) or "MSIMUT" (microsatellite unstable or hypermutated)

6. randomization of patients to training and testing sets (~70% and ~30%). Randomization was done on a patient level rather than on a slide or tile level

7. equilibration of training sets by undersampling (randomly removing excess tiles from the MSS class)
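Steps 6 and 7 can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' actual pipeline: the function names `split_patients` and `undersample`, the fixed seed, and the toy patient IDs and tile names are all hypothetical. The point it shows is that whole patients, not individual tiles, are assigned to a set, and that balancing is done by randomly discarding tiles from the larger class.

```python
import random


def split_patients(patient_ids, test_fraction=0.3, seed=0):
    """Randomly assign whole patients to a training or a testing set.

    All tiles of one patient end up in the same set, which avoids
    leaking tiles of a test patient into training.
    """
    ids = sorted(set(patient_ids))      # de-duplicate, fixed starting order
    rng = random.Random(seed)           # hypothetical seed, for repeatability
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return set(ids[n_test:]), set(ids[:n_test])


def undersample(tiles_by_class, seed=0):
    """Randomly drop tiles so every class keeps as many as the smallest class."""
    rng = random.Random(seed)
    n_keep = min(len(tiles) for tiles in tiles_by_class.values())
    return {label: rng.sample(tiles, n_keep)
            for label, tiles in tiles_by_class.items()}


# Toy data: 10 hypothetical patients with 3 tiles each.
tiles = [(f"patient-{i:02d}", f"tile_{i}_{j}.png")
         for i in range(10) for j in range(3)]
train_ids, test_ids = split_patients(p for p, _ in tiles)
train_tiles = [t for t in tiles if t[0] in train_ids]
test_tiles = [t for t in tiles if t[0] in test_ids]

# Toy class imbalance: 8 MSS tiles versus 3 MSIMUT tiles.
balanced = undersample({"MSS": list(range(8)), "MSIMUT": list(range(3))})
```

Splitting on patient IDs first and filtering tiles afterwards keeps the two sets disjoint at the patient level, which a tile-level shuffle would not guarantee.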

File description

1. STAD_TRAIN_MSS - training images (~70% of all patients) for gastric (stomach) cancer TCGA patients with MSS (microsatellite stable) tumors, 50285 unique image patches; FFPE samples

2. STAD_TRAIN_MSIMUT - training images (~70% of all patients) for gastric (stomach) cancer TCGA patients with MSI (microsatellite unstable) or highly mutated tumors, 50285 unique image patches; FFPE samples

3. STAD_TEST_MSS - test images (~30% of all patients) for gastric (stomach) cancer TCGA patients with MSS (microsatellite stable) tumors, 90104 unique image patches; FFPE samples

4. STAD_TEST_MSIMUT - test images (~30% of all patients) for gastric (stomach) cancer TCGA patients with MSI (microsatellite unstable) or highly mutated tumors, 27904 unique image patches; FFPE samples
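As a quick sanity check, the patch counts listed above can be tallied and compared with the total stated in the description. The snippet below is only an illustrative sketch; the dictionary layout and the helper name `class_fraction` are assumptions made for this example.

```python
# Patch counts exactly as listed in the file description.
patch_counts = {
    ("TRAIN", "MSS"): 50285,
    ("TRAIN", "MSIMUT"): 50285,
    ("TEST", "MSS"): 90104,
    ("TEST", "MSIMUT"): 27904,
}

total = sum(patch_counts.values())


def class_fraction(split, label):
    """Fraction of the patches in `split` that carry `label`."""
    split_total = sum(n for (s, _), n in patch_counts.items() if s == split)
    return patch_counts[(split, label)] / split_total


print(total)                                        # 218578
print(round(class_fraction("TRAIN", "MSIMUT"), 3))  # 0.5
print(round(class_fraction("TEST", "MSIMUT"), 3))   # 0.236
```

The four counts add up to the 218,578 patches stated in the description. The training classes are balanced, while the test set is left at its natural ratio, so plain accuracy on the test set should be read against the MSS majority share of roughly 0.764.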

Files

CRC_KR_TEST_MSIMUT.zip

Files (15.2 GB)

Checksum Size
md5:99e6e9434d23ffd58ee1e09f93624c21 1.9 GB
md5:dd747dae3c842ef01534d25a1deaf99d 6.6 GB
md5:8ab26b5ba2cfde43ad0a6c3d7e14a40e 3.4 GB
md5:db3f5a846e8ebd08b097a82a44f6adda 3.3 GB
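After downloading, an archive can be checked against its listed MD5 checksum. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_checksum` and the archive path in the usage comment are hypothetical, and the listing above does not show which checksum belongs to which archive, so that pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks
    so that multi-gigabyte archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (hypothetical local path; substitute the checksum listed for that file):
# ok = matches_checksum("downloads/archive.zip", "md5:<hex digest from the listing>")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the archive should be fetched again before unpacking.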