Dataset Open Access

Histological images for MSI vs. MSS classification in gastrointestinal cancer, FFPE samples

Kather, Jakob Nikolas

This repository contains 411,890 unique image patches derived from histological images of colorectal cancer and gastric cancer patients in the TCGA cohort (the original whole slide SVS images are freely available at https://portal.gdc.cancer.gov/). All images in this repository are derived from formalin-fixed paraffin-embedded (FFPE) diagnostic slides (labeled "DX" at the GDC data portal). How to download FFPE diagnostic slides from TCGA is explained in this blog post: http://www.andrewjanowczyk.com/download-tcga-digital-pathology-images-ffpe/

Preprocessing

All SVS slides were preprocessed as follows:

1. automatic detection of tumor

2. resizing to 224 px x 224 px at a resolution of 0.5 µm/px

4. color normalization with the Macenko method (Macenko et al., 2009, http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf)

5. assignment of patients to either "MSS" (microsatellite stable) or "MSIMUT" (microsatellite instable or highly mutated)

6. randomization of patients to training and testing sets (~70% and ~30%). Randomization was done at the patient level rather than at the slide or tile level, so all tiles from one patient fall into the same set.

7. equilibration of training sets by undersampling (randomly removing excess tiles from the MSS class)
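
The last two steps (patient-level randomization and undersampling of the training set) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to build the dataset: the function names `split_patients` and `undersample`, the fixed seed, and the toy patient IDs and tile names are all hypothetical.

```python
import random
from collections import defaultdict


def split_patients(patient_ids, test_fraction=0.3, seed=0):
    """Randomly assign whole patients (not tiles) to train or test."""
    unique = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = round(len(unique) * test_fraction)
    return set(unique[n_test:]), set(unique[:n_test])  # (train, test)


def undersample(tiles_by_class, seed=0):
    """Randomly drop tiles so every class matches the smallest class."""
    rng = random.Random(seed)
    n_min = min(len(tiles) for tiles in tiles_by_class.values())
    return {
        label: rng.sample(tiles, n_min)
        for label, tiles in tiles_by_class.items()
    }


# Toy data (hypothetical): 20 patients with 5 tiles each,
# every fourth patient labeled MSIMUT, the rest MSS.
tiles = [
    (f"P{p:02d}", "MSS" if p % 4 else "MSIMUT", f"P{p:02d}_tile{t}")
    for p in range(20)
    for t in range(5)
]

train_ids, test_ids = split_patients([pid for pid, _, _ in tiles])

# Collect training tiles per class, then balance the classes.
train_by_class = defaultdict(list)
for pid, label, name in tiles:
    if pid in train_ids:
        train_by_class[label].append(name)

balanced = undersample(train_by_class)
```

Splitting by patient before tiling keeps tiles from the same tumor out of both sets at once, which would otherwise inflate test performance. Undersampling is applied to the training tiles only; the test sets keep their original class proportions.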

File description

1. STAD_TRAIN_MSS - training images (~70% of all patients) for gastric (stomach) cancer TCGA patients with MSS (microsatellite stable) tumors, 50285 unique image patches; FFPE samples

2. STAD_TRAIN_MSIMUT - training images (~70% of all patients) for gastric (stomach) cancer TCGA patients with MSI (microsatellite instable) or highly mutated tumors, 50285 unique image patches; FFPE samples

3. STAD_TEST_MSS - test images (~30% of all patients) for gastric (stomach) cancer TCGA patients with MSS (microsatellite stable) tumors, 90104 unique image patches; FFPE samples

4. STAD_TEST_MSIMUT - test images (~30% of all patients) for gastric (stomach) cancer TCGA patients with MSI (microsatellite instable) or highly mutated tumors, 27904 unique image patches; FFPE samples

5. CRC_DX_TEST_MSIMUT - test images (~30% of all patients) for colorectal cancer TCGA patients with MSI (microsatellite instable) or highly mutated tumors, 29335 unique image patches; FFPE samples

6. CRC_DX_TEST_MSS - test images (~30% of all patients) for colorectal cancer TCGA patients with MSS (microsatellite stable) tumors, 70569 unique image patches; FFPE samples

7. CRC_DX_TRAIN_MSIMUT - training images (~70% of all patients) for colorectal cancer TCGA patients with MSI (microsatellite instable) or highly mutated tumors, 46704 unique image patches; FFPE samples

8. CRC_DX_TRAIN_MSS - training images (~70% of all patients) for colorectal cancer TCGA patients with MSS (microsatellite stable) tumors, 46704 unique image patches; FFPE samples
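
The patch counts listed above can be checked against the stated total and against an extracted copy of the data. The sketch below is only an illustration: the counts are copied from the list, while `count_files` and the assumption that each archive extracts into a folder of the same name are hypothetical.

```python
from pathlib import Path

# Patch counts as given in the file description.
EXPECTED_PATCHES = {
    "STAD_TRAIN_MSS": 50285,
    "STAD_TRAIN_MSIMUT": 50285,
    "STAD_TEST_MSS": 90104,
    "STAD_TEST_MSIMUT": 27904,
    "CRC_DX_TEST_MSIMUT": 29335,
    "CRC_DX_TEST_MSS": 70569,
    "CRC_DX_TRAIN_MSIMUT": 46704,
    "CRC_DX_TRAIN_MSS": 46704,
}

total = sum(EXPECTED_PATCHES.values())


def count_files(folder):
    """Count regular files below a folder (assumed layout: one folder per set)."""
    return sum(1 for p in Path(folder).rglob("*") if p.is_file())


def check_extracted(root):
    """Return the sets whose file count differs from the expected count."""
    return {
        name: (count_files(Path(root) / name), expected)
        for name, expected in EXPECTED_PATCHES.items()
        if count_files(Path(root) / name) != expected
    }
```

The eight counts sum to 411,890, matching the total given in the description. The training sets are balanced within each cancer type (50285 vs. 50285 and 46704 vs. 46704), as expected after undersampling, while the test sets are not. Because the split is by patient and the training sets were undersampled, the tile counts themselves do not follow the ~70%/~30% ratio.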

Files (47.1 GB)

Name                       Size      MD5
CRC_DX_TEST_MSIMUT.zip     3.3 GB    006962fd57ffa71db43f3d768a63d9ec
CRC_DX_TEST_MSS.zip        7.9 GB    3407a53dddec68e67f49b863a3520e88
CRC_DX_TRAIN_MSIMUT.zip    5.3 GB    d98f4ee32f5923a09779d9aabd773e37
CRC_DX_TRAIN_MSS.zip       5.3 GB    0dbb048590925920c383e423fa282f77
STAD_TEST_MSIMUT.zip       3.2 GB    77941b8914a2fac469ce4843c75ee19a
STAD_TEST_MSS.zip          10.5 GB   8f324496c6feffe5353f493ae5e8c9ff
STAD_TRAIN_MSIMUT.zip      5.8 GB    c785506ccdf49d4d08b65c5e7cdd28ae
STAD_TRAIN_MSS.zip         5.8 GB    273295ed37ebccc3a574ab9106145e32
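
Given the size of the archives, it is worth verifying each download against its MD5 checksum before extracting. A minimal sketch using Python's standard `hashlib` follows; the checksums are copied from the file list, while the function names `md5_of` and `verify` and the download directory are assumptions.

```python
import hashlib
from pathlib import Path

# MD5 checksums as published in the file list.
MD5 = {
    "CRC_DX_TEST_MSIMUT.zip": "006962fd57ffa71db43f3d768a63d9ec",
    "CRC_DX_TEST_MSS.zip": "3407a53dddec68e67f49b863a3520e88",
    "CRC_DX_TRAIN_MSIMUT.zip": "d98f4ee32f5923a09779d9aabd773e37",
    "CRC_DX_TRAIN_MSS.zip": "0dbb048590925920c383e423fa282f77",
    "STAD_TEST_MSIMUT.zip": "77941b8914a2fac469ce4843c75ee19a",
    "STAD_TEST_MSS.zip": "8f324496c6feffe5353f493ae5e8c9ff",
    "STAD_TRAIN_MSIMUT.zip": "c785506ccdf49d4d08b65c5e7cdd28ae",
    "STAD_TRAIN_MSS.zip": "273295ed37ebccc3a574ab9106145e32",
}


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB archives fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(download_dir):
    """Map each archive name to True if it exists and its checksum matches."""
    results = {}
    for name, expected in MD5.items():
        path = Path(download_dir) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results
```

Calling `verify("downloads")` (a hypothetical directory) returns a dictionary such as `{"CRC_DX_TEST_MSS.zip": True, ...}`; any `False` entry indicates a missing or corrupted archive that should be downloaded again.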