Histological image tiles for TCGA-CRC-DX, color-normalized, sorted by MSI status, train/test split
Description
These are histological images of colorectal cancer, derived from the TCGA database at https://portal.gdc.cancer.gov. Tumor tissue was outlined manually, and the tumor region was cut into square tiles of 256 µm edge length, each saved as a 512 × 512 px image (effective resolution 0.5 µm/px). All image tiles were color-normalized with the Macenko method. Patients were split into a training set and a test set in a 2:1 ratio. MSI status was obtained for all patients (patients with MSI-H were labeled MSIH; patients with MSI-L or MSS were labeled NonMSIH), and every tile inherited the label of its parent patient. Tiles in the training set were then randomly undersampled to equalize the classes. The test set was not undersampled. Further info: www.kather.ai
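The patient-level split, label inheritance and training-set undersampling described above can be sketched roughly as follows. This is an illustrative sketch, not the procedure actually used to build this dataset. The function name split_and_balance, its parameters (patient_labels, tiles_by_patient, train_fraction, seed) and the tile file names are all hypothetical. It assumes a simple unstratified random split of patients and assumes that undersampling reduces every class to the size of the smallest class. Splitting by patient rather than by tile keeps all tiles of one patient on the same side of the split.

```python
import random
from collections import defaultdict


def split_and_balance(patient_labels, tiles_by_patient, train_fraction=2 / 3, seed=0):
    """Hypothetical sketch: patient-level 2:1 split, then undersample training tiles.

    patient_labels: dict mapping patient_id -> "MSIH" or "NonMSIH"
    tiles_by_patient: dict mapping patient_id -> list of tile file names
    Returns (train, test) as lists of (tile, label) tuples.
    """
    rng = random.Random(seed)

    # Assumption: unstratified random split of patients (2:1 -> 2/3 for training).
    patients = sorted(patient_labels)
    rng.shuffle(patients)
    n_train = round(len(patients) * train_fraction)
    train_patients, test_patients = patients[:n_train], patients[n_train:]

    def tiles_with_labels(pids):
        # Every tile inherits the label of its parent patient.
        return [(t, patient_labels[p]) for p in pids for t in tiles_by_patient[p]]

    train = tiles_with_labels(train_patients)
    test = tiles_with_labels(test_patients)  # the test set is left as is

    # Random undersampling: keep as many tiles per class as the smallest class has.
    by_class = defaultdict(list)
    for tile, label in train:
        by_class[label].append((tile, label))
    n_min = min(len(items) for items in by_class.values())
    balanced = []
    for label in sorted(by_class):
        balanced.extend(rng.sample(by_class[label], n_min))
    rng.shuffle(balanced)
    return balanced, test


# Usage with toy data: 9 patients with 4 tiles each.
labels = {f"P{i}": ("MSIH" if i < 3 else "NonMSIH") for i in range(9)}
tiles = {p: [f"{p}_tile{j}.png" for j in range(4)] for p in labels}
train, test = split_and_balance(labels, tiles, seed=1)
```

Because undersampling is applied only after the split, a training patient may lose some or all of its tiles. The test set keeps its natural class imbalance.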