Technical note Open Access
Muti, Hannah Sophie; Loeffler, Chiara; Echle, Amelie; Heij, Lara R; Buelow, Roman D; Krause, Jeremias; Broderius, Laura; Niehues, Jan; Liapi, Georgia; Boor, Peter; Grabsch, Heike; Kochanny, Sara; Pearson, Alexander T; Kather, Jakob Nikolas
Background: Deep learning can predict clinically relevant features, such as genetic alterations, directly from H&E-stained histology images. In practice, many clinically relevant questions are limited by the availability of clinical data and by the lack of standardized preprocessing pipelines. In our research projects, we strive to keep a consistent data format across projects to facilitate downstream analysis.
Workflow: We analyze cohorts of cancer patients and aim to predict clinically relevant labels directly from whole slide images (WSI). To achieve this, we detect tumor tissue in each WSI manually or automatically, tessellate the tumor region into smaller image tiles and store these tiles in a cohort directory (Figure 1). We prepare a Slide Master Table, which specifies which WSI belongs to which patient, and a Patient Master Table, which specifies the labels (target categories) for each patient. Our publicly available scripts automate the remaining workflow: tiles are loaded and matched to their WSIs, WSIs are matched to patients, and patients are matched to labels. Deep neural networks are then trained to predict the labels and are evaluated on external cohorts.
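The matching chain described above (tile to WSI, WSI to patient, patient to label) can be sketched as follows. This is a minimal illustration, not the published scripts: it assumes that the tiles of each WSI sit in a subfolder named after the slide, and the slide names, patient IDs, labels and the function `match_tiles_to_labels` are hypothetical. Both master tables are modeled as plain dictionaries rather than spreadsheet files.

```python
from pathlib import PurePath

# Hypothetical Slide Master Table: slide (WSI) name -> patient ID
slide_table = {
    "SLIDE_A": "PATIENT_1",
    "SLIDE_B": "PATIENT_1",  # one patient may have several slides
    "SLIDE_C": "PATIENT_2",
}

# Hypothetical Patient Master Table: patient ID -> label (target category)
patient_table = {
    "PATIENT_1": "MUT",
    "PATIENT_2": "WT",
}

# Hypothetical cohort directory listing: one subfolder per WSI holding its tiles
tile_paths = [
    "cohort/SLIDE_A/tile_0_0.jpg",
    "cohort/SLIDE_A/tile_0_512.jpg",
    "cohort/SLIDE_B/tile_0_0.jpg",
    "cohort/SLIDE_C/tile_512_0.jpg",
    "cohort/SLIDE_X/tile_0_0.jpg",  # slide missing from the Slide Master Table
]


def match_tiles_to_labels(tile_paths, slide_table, patient_table):
    """Chain tile -> WSI -> patient -> label; return matched rows and unmatched tiles."""
    matched, skipped = [], []
    for path in tile_paths:
        # Assumption: the parent folder name is the slide name
        slide = PurePath(path).parent.name
        patient = slide_table.get(slide)
        label = patient_table.get(patient) if patient is not None else None
        if label is None:
            # Slide or patient absent from a master table: keep for inspection
            skipped.append(path)
            continue
        matched.append((path, slide, patient, label))
    return matched, skipped


matched, skipped = match_tiles_to_labels(tile_paths, slide_table, patient_table)
```

Tiles whose slide or patient is missing from either table are returned separately instead of being silently dropped, which makes gaps in the ground truth tables easy to spot before training.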
Target audience: This is a best-practice manual focused on practical aspects such as file names, ground truth data tables and ROI annotation. It is intended for onboarding new team members and for our academic collaborators. Beyond our teams, we hope that this consensus document may be useful to other groups in the deep learning histopathology community. Our data standards are inspired by The Cancer Genome Atlas (TCGA) standards (http://portal.gdc.cancer.gov). Please send your feedback via http://kather.ai.
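Because the data standards are inspired by TCGA, one practical consequence is that a slide file named with a TCGA-style barcode can be mapped back to its patient. In TCGA barcodes, the first three hyphen-separated fields (project, tissue source site, participant) identify the patient, for example `TCGA-A6-2671`. The sketch below is an illustration under that assumption only; the regular expression, the function `patient_id_from_slide` and the example file names are hypothetical and not taken from this protocol.

```python
import re
from typing import Optional

# Assumed TCGA patient prefix: project "TCGA", 2-character tissue source site,
# 4-character participant code, e.g. "TCGA-A6-2671"
TCGA_PATIENT = re.compile(r"^(TCGA-[A-Z0-9]{2}-[A-Z0-9]{4})")


def patient_id_from_slide(filename: str) -> Optional[str]:
    """Return the patient barcode prefix of a TCGA-style slide name, else None."""
    match = TCGA_PATIENT.match(filename)
    return match.group(1) if match else None


print(patient_id_from_slide("TCGA-XX-1234-01Z-00-DX1.svs"))
```

Slides that do not follow this naming scheme return `None`, so for non-TCGA cohorts the Slide Master Table remains the authoritative link between WSI and patient.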
Aachen Protocol for Deep Learning Histopathology v0.2.pdf