Prediction of cancer treatment response from histopathology images through imputed transcriptomics
Creators
- Hoang, Danh-Tai (1)
- Dinstag, Gal (2)
- Hermida, C. Leandro (3)
- Ben-Zvi, S. Doreen (2)
- Elis, Efrat (2)
- Caley, Katherine (1)
- Sammut, Stephen-John (4)
- Sinha, Sanju (5)
- Sinha, Neelam (5)
- Dampier, H. Christopher (6)
- Stossel, Chani (7)
- Patil, Tejas (8)
- Rajan, Arun (9)
- Lassoued, Wiem (10)
- Strauss, Julius (10)
- Bailey, Shania (10)
- Allen, Clint (11)
- Redman, Jason (12)
- Beker, Tuvik (2)
- Jiang, Peng (5)
- Golan, Talia (7)
- Wilkinson, Scott (13)
- Sowalsky, G. Adam (13)
- Pine, R. Sharon (8)
- Caldas, Carlos (14)
- Gulley, L. James (15)
- Aldape, Kenneth (6)
- Aharonov, Ranit (2)
- Stone, A. Eric (1)
- Ruppin, Eytan (5)
- 1. Biological Data Science Institute, College of Science, Australian National University, Canberra, ACT, Australia
- 2. Pangea Biomed Ltd., Tel Aviv, Israel
- 3. Department of Immunology, University of Pittsburgh, Pittsburgh, PA, USA; Tumor Microenvironment Center, UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
- 4. Breast Cancer Now Toby Robins Research Centre, The Institute of Cancer Research, London, United Kingdom; The Royal Marsden Hospital NHS Foundation Trust, London, United Kingdom; Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, UK
- 5. Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- 6. Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- 7. Oncology Institute, Sheba Medical Center at Tel-Hashomer, Tel Aviv University, Tel Aviv, Israel
- 8. Division of Medical Oncology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- 9. Thoracic and GI Malignancies Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- 10. Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- 11. Surgical Oncology Program, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- 12. Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- 13. Laboratory of Genitourinary Cancer Pathogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- 14. Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, UK
- 15. Genitourinary Malignancy Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
Description
The DeepPT code for the manuscript "Prediction of cancer treatment response from histopathology images through imputed transcriptomics" is uploaded here. Please see the README for details.
-----
Introduction:
DeepPT (Deep Pathology for Transcriptomics) is a deep learning framework that predicts gene expression from histopathology images. DeepPT consists of 4 main components:
1. Image pre-processing: Split each whole slide image into tiles/patches and keep only the tiles that contain tissue, discarding background tiles. Color normalization is applied to minimize staining variation (heterogeneity and batch effects).
2. Feature extraction: Use the pre-trained ResNet50 CNN model to extract image features from the tiles. Through this process, each image tile is represented by a vector of 2,048 derived features (pre-trained ResNet features).
3. Feature compression: Compress the 2,048 pre-trained ResNet features to 512 features using an autoencoder (AE) network. This helps to exclude noise, avoid overfitting, and reduce computational demands.
4. Prediction: This component takes the 512 AE features as input and predicts gene expression as output.
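The tile selection in component 1 can be illustrated with a minimal NumPy sketch. It is not taken from the DeepPT code: the function name `select_tissue_tiles`, the tile size, the near-white background threshold, and the minimum tissue fraction are all assumptions chosen for illustration, and the actual scripts may use a different tissue-detection method.

```python
import numpy as np


def select_tissue_tiles(slide, tile_size=512, white_threshold=220,
                        min_tissue_fraction=0.5):
    """Split an RGB image array into non-overlapping tiles and keep tissue tiles.

    All parameter defaults are illustrative assumptions, not DeepPT settings.
    Returns a list of (top, left, tile) tuples for the tiles that were kept.
    """
    height, width, _ = slide.shape
    kept = []
    # Walk the image on a regular grid; partial tiles at the edges are skipped.
    for top in range(0, height - tile_size + 1, tile_size):
        for left in range(0, width - tile_size + 1, tile_size):
            tile = slide[top:top + tile_size, left:left + tile_size]
            # Treat a pixel as tissue when it is darker than near-white background.
            gray = tile.mean(axis=2)
            tissue_fraction = (gray < white_threshold).mean()
            if tissue_fraction >= min_tissue_fraction:
                kept.append((top, left, tile))
    return kept
```

In this sketch, a tile is kept only when at least `min_tissue_fraction` of its pixels are darker than the background threshold, so mostly white tiles are dropped before feature extraction.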
DeepPT computational pipeline:
- Step 1: Run “11slide_processing/1main_processing.py” to perform image pre-processing and feature extraction. This script processes the slides in parallel.
- Step 2: Run “11slide_processing/collect_mask.py” to collect mask files into a single file “mask.pdf” that will be used to evaluate slide quality.
- Step 3: Run “11slide_processing/collect_features.py” to create a file that contains features of image tiles.
- Step 4: Run “12AE/1main_AE.py” to compress the 2,048 pre-trained features to 512 AE features.
- Step 5: Run “13DeepPT_train/1main_train.py” to train the prediction model and predict gene expression from the AE features.
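The five steps above can be chained with a small driver script. This is only a sketch: the names `PIPELINE_SCRIPTS`, `build_commands`, and `run_pipeline` are hypothetical, and it assumes each script can be launched from the repository root without command-line arguments, which the README should be consulted to confirm (Step 1 in particular may expect per-slide arguments or a job scheduler).

```python
import subprocess
import sys

# Script paths exactly as listed in the pipeline steps, in execution order.
PIPELINE_SCRIPTS = [
    "11slide_processing/1main_processing.py",   # Step 1: pre-processing + features
    "11slide_processing/collect_mask.py",       # Step 2: collect masks into mask.pdf
    "11slide_processing/collect_features.py",   # Step 3: gather tile features
    "12AE/1main_AE.py",                         # Step 4: compress 2,048 -> 512 features
    "13DeepPT_train/1main_train.py",            # Step 5: train and predict expression
]


def build_commands(python_executable=sys.executable):
    """Return one argv list per pipeline step (no arguments: an assumption)."""
    return [[python_executable, script] for script in PIPELINE_SCRIPTS]


def run_pipeline(repo_root="."):
    """Run the steps sequentially, stopping at the first failing script."""
    for command in build_commands():
        # check=True raises CalledProcessError if a step exits with an error.
        subprocess.run(command, cwd=repo_root, check=True)
```

Calling `run_pipeline("/path/to/DeepPT")` would run the steps in order; because each step consumes the output of the previous one, stopping at the first failure avoids training on incomplete features.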