Dataset Open Access

Histological images for MSI vs. MSS classification in gastrointestinal cancer, snap-frozen samples

Kather, Jakob Nikolas

This repository contains 218,578 unique image patches derived from histological images of colorectal cancer patients in the TCGA cohort (original whole slide SVS images are freely available at https://portal.gdc.cancer.gov/). All images in this repository are derived from snap-frozen tissue slides ("TS" or "BS" at the GDC data portal).

Preprocessing

All SVS slides were preprocessed as follows:

1. automatic detection of tumor

2. resizing to 224 px x 224 px at a resolution of 0.5 µm/px

4. color normalization with the Macenko method (Macenko et al., 2009, http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf)

5. assignment of patients to either "MSS" (microsatellite stable) or "MSIMUT" (microsatellite unstable or hypermutated)

6. randomization of patients to training and testing sets (~70% and ~30%). Randomization was done on a patient level rather than on a slide or tile level

7. equilibration of training sets by undersampling (removing excess tiles in MSS class in a random way)
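
The last two steps (patient-level randomization and undersampling of the training set) can be sketched as follows. This is a minimal illustration and not the code used to build this repository: the record layout `(patient_id, label, tile_name)`, the function names, and the seeds are assumptions made for the example, and the undersampling here reduces every class to the size of the smallest one rather than targeting the MSS class specifically.

```python
import random
from collections import defaultdict


def split_by_patient(tiles, test_fraction=0.3, seed=0):
    """Split (patient_id, label, tile_name) records so that no patient
    contributes tiles to both the training and the test set."""
    patients = sorted({patient for patient, _, _ in tiles})
    rng = random.Random(seed)
    rng.shuffle(patients)
    # The split is made over patients, so tile-level proportions are only approximate.
    n_test = round(len(patients) * test_fraction)
    test_patients = set(patients[:n_test])
    train = [t for t in tiles if t[0] not in test_patients]
    test = [t for t in tiles if t[0] in test_patients]
    return train, test


def undersample(tiles, seed=0):
    """Randomly drop tiles from the larger classes until every class
    has as many tiles as the smallest class."""
    by_label = defaultdict(list)
    for tile in tiles:
        by_label[tile[1]].append(tile)
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    return balanced
```

Splitting on patients rather than tiles matters because tiles from the same slide are highly correlated; a tile-level split would leak patient-specific appearance into the test set.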

File description

1. CRC_KR_TRAIN_MSS - training images (~70% of all patients) for colorectal cancer TCGA patients with MSS (microsatellite stable) tumors, 50,285 unique image patches; snap-frozen samples

2. CRC_KR_TRAIN_MSIMUT - training images (~70% of all patients) for colorectal cancer TCGA patients with MSI (microsatellite unstable) or highly mutated tumors, 50,285 unique image patches; snap-frozen samples

3. CRC_KR_TEST_MSS - test images (~30% of all patients) for colorectal cancer TCGA patients with MSS (microsatellite stable) tumors, 90,104 unique image patches; snap-frozen samples

4. CRC_KR_TEST_MSIMUT - test images (~30% of all patients) for colorectal cancer TCGA patients with MSI (microsatellite unstable) or highly mutated tumors, 27,904 unique image patches; snap-frozen samples
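
The four patch counts listed above add up to the 218,578 patches stated in the description, which gives a simple sanity check after extraction. The sketch below counts image files per extracted folder and reports any deviation. It rests on assumptions not stated in this record: that each archive extracts to a folder whose name ends in `TRAIN_MSS`, `TRAIN_MSIMUT`, `TEST_MSS`, or `TEST_MSIMUT`, and that tiles are stored as `.png` files. The function names are made up for the example.

```python
from pathlib import Path

# Patch counts as given in the file description above (sum: 218,578).
EXPECTED = {
    "TRAIN_MSS": 50285,
    "TRAIN_MSIMUT": 50285,
    "TEST_MSS": 90104,
    "TEST_MSIMUT": 27904,
}


def count_tiles(root, pattern="*.png"):
    """Count files matching `pattern` in each subfolder of `root` whose
    name equals, or ends with "_" plus, one of the EXPECTED keys."""
    counts = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        for key in EXPECTED:
            if folder.name == key or folder.name.endswith("_" + key):
                counts[key] = sum(1 for _ in folder.rglob(pattern))
    return counts


def mismatches(found, expected=EXPECTED):
    """Return {key: (found, expected)} for every set whose count differs."""
    return {
        key: (found.get(key, 0), want)
        for key, want in expected.items()
        if found.get(key, 0) != want
    }
```

Calling `mismatches(count_tiles("path/to/extracted"))` returns an empty dictionary when all four sets are complete; the path is a placeholder.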

Files (15.2 GB)

Name                      Size    MD5
CRC_KR_TEST_MSIMUT.zip    1.9 GB  99e6e9434d23ffd58ee1e09f93624c21
CRC_KR_TEST_MSS.zip       6.6 GB  dd747dae3c842ef01534d25a1deaf99d
CRC_KR_TRAIN_MSIMUT.zip   3.4 GB  8ab26b5ba2cfde43ad0a6c3d7e14a40e
CRC_KR_TRAIN_MSS.zip      3.3 GB  db3f5a846e8ebd08b097a82a44f6adda
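
Because the archives are several gigabytes each, it may be worth checking the downloads against the MD5 checksums listed above before extracting them. The following is a minimal sketch using only the Python standard library; the function names and the download folder are assumptions made for the example.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in the file table above.
CHECKSUMS = {
    "CRC_KR_TEST_MSIMUT.zip": "99e6e9434d23ffd58ee1e09f93624c21",
    "CRC_KR_TEST_MSS.zip": "dd747dae3c842ef01534d25a1deaf99d",
    "CRC_KR_TRAIN_MSIMUT.zip": "8ab26b5ba2cfde43ad0a6c3d7e14a40e",
    "CRC_KR_TRAIN_MSS.zip": "db3f5a846e8ebd08b097a82a44f6adda",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks so that
    multi-gigabyte archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(folder, checksums=CHECKSUMS):
    """Return {filename: True, False, or None}; None marks a missing file."""
    results = {}
    for name, expected in checksums.items():
        path = Path(folder) / name
        results[name] = (md5_of(path) == expected) if path.is_file() else None
    return results
```

A `False` entry in the result of `verify_downloads("downloads")` indicates a corrupted or incomplete file that should be downloaded again.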