AI-Driven Spatial Transcriptomics Unlocks Large-Scale Cancer Biomarker Discovery from Histopathology
Path2Space
Overview
This repository contains the code for Path2Space, a model that predicts spatial transcriptomics (ST) from hematoxylin and eosin (H&E)-stained slides.
1. Subdirectory: `1.ST_prediction`
This directory contains the primary scripts for spatial transcriptomics prediction tasks.
1.1.Feature_Extraction
- Main Script:
  - `1main_feature_extraction.py`: Script to perform image pre-processing and feature extraction.
- func:
  - `utils_preprocessing.py`: Functions for preprocessing the H&E image.
  - `utils_color_norm.py`: Functions for color normalization of images.
  - `ctrans_model`: Folder containing helper functions for extracting CTransPath model features.
1.2.Regression
- `1main_regression.py`: Script to train a model on the CTransPath features and predict ST expression values.
- `model_MLP.py`: Multi-layer perceptron model implementation.
- `utils.py`: Helper functions for training the MLP regressor.
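
As a rough illustration of what this regression step does, the sketch below trains a one-hidden-layer MLP that maps per-tile feature vectors to expression values for several genes at once. It is not the code in `model_MLP.py`: it uses NumPy only, and the layer sizes, learning rate, synthetic data, and the names `init_mlp`, `forward`, and `train` are all assumptions made for the example.

```python
import numpy as np

def init_mlp(n_in, n_hidden, n_out, rng):
    """Randomly initialise weights for a one-hidden-layer MLP."""
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, np.sqrt(1.0 / n_hidden), size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, X):
    """Return (predictions, hidden activations) for a batch X."""
    H = np.maximum(0.0, X @ params["W1"] + params["b1"])  # ReLU hidden layer
    return H @ params["W2"] + params["b2"], H

def train(params, X, Y, lr=0.01, epochs=200):
    """Full-batch gradient descent on mean squared error; returns the loss history."""
    losses = []
    for _ in range(epochs):
        pred, H = forward(params, X)
        err = pred - Y
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the MSE gradient through both layers.
        g_pred = 2.0 * err / err.size
        g_W2 = H.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_H = g_pred @ params["W2"].T
        g_H[H <= 0.0] = 0.0  # ReLU passes no gradient where it was inactive
        g_W1 = X.T @ g_H
        g_b1 = g_H.sum(axis=0)
        for name, grad in (("W1", g_W1), ("b1", g_b1), ("W2", g_W2), ("b2", g_b2)):
            params[name] -= lr * grad
    return losses

# Synthetic stand-ins: 256 tiles, 32 features per tile, 5 genes (all hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 32))
Y = 0.1 * (X @ rng.normal(size=(32, 5)))
params = init_mlp(32, 64, 5, rng)
losses = train(params, X, Y)
```

In practice the inputs would be the features written by step 1.1 and the targets the measured ST expression values; the toy arrays here only show the shapes involved (tiles by features in, tiles by genes out).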
1.3.Prediction
- `Prediction.py`: Script to predict ST expression values from CTransPath features using a trained model.
- `model_MLP.py`: Multi-layer perceptron model implementation.
- `utils.py`: Helper functions for the prediction task.
2. Subdirectory: `2.Cell_type_fraction_model`
- `2.1.main_cell_type_model.py`: Script to train and predict cell type fraction from predicted ST values.
- `2.2.prediction.py`: Script to predict cell type fraction from predicted ST values using a trained model.
3. Subdirectory: `SPAND`
- `SPAND.py`: Function for calculating SPAND for a given slide and a given gene.
Usage
1.1. Feature Extraction:
- Navigate to `1.ST_prediction/1.1.Feature_extraction`.
- Run `1main_feature_extraction.py` to extract features from an H&E slide image.
1.2. Regression Tasks:
- Navigate to `1.ST_prediction/1.2.Regression`.
- Use `1main_regression.py` to train a regression model for ST prediction from H&E image features (output of 1.1.).
1.3. Prediction Tasks:
- Navigate to `1.ST_prediction/1.3.Prediction`.
- Use `Prediction.py` to predict ST from features (output of 1.1.) from a new dataset using a previously trained model (output of 1.2.).
2.1. Cell Type Fraction Model Training:
- Navigate to `2.Cell_type_fraction_model/`.
- Use `2.1.main_cell_type_model.py` to train a cell type fraction prediction model from inferred ST values (output of 1.2. or 1.3.).
2.2. Cell Type Fraction Prediction:
- Navigate to `2.Cell_type_fraction_model/`.
- Use `2.2.prediction.py` to predict cell type fraction from inferred ST values (output of 1.2. or 1.3.) using a trained model (output of 2.1.).
3. SPAND Analysis:
- Open `SPAND/SPAND.py` to calculate SPAND for a given slide and a given gene from inferred ST values (output of 1.2. or 1.3.).
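
The order of the steps above can be written down as a small plan, which may help when checking a fresh checkout. The sketch below only records the directories and script names given in this section and builds bare `python <script>` commands. The README does not document any command-line arguments, so input and output paths may need to be set inside each script, and `SPAND.py` is described as a function rather than a standalone script. The names `PIPELINE`, `plan`, and `missing_scripts` are illustrative.

```python
from pathlib import Path

# (step label, directory, script) as listed in the usage notes above.
PIPELINE = [
    ("1.1 feature extraction", "1.ST_prediction/1.1.Feature_extraction", "1main_feature_extraction.py"),
    ("1.2 regression", "1.ST_prediction/1.2.Regression", "1main_regression.py"),
    ("1.3 prediction", "1.ST_prediction/1.3.Prediction", "Prediction.py"),
    ("2.1 cell type model training", "2.Cell_type_fraction_model", "2.1.main_cell_type_model.py"),
    ("2.2 cell type prediction", "2.Cell_type_fraction_model", "2.2.prediction.py"),
    ("3 SPAND", "SPAND", "SPAND.py"),
]

def plan(repo_root, python="python"):
    """Return one (label, working directory, command) tuple per step, in order."""
    root = Path(repo_root)
    return [(label, root / directory, [python, script])
            for label, directory, script in PIPELINE]

def missing_scripts(repo_root):
    """List the scripts named in PIPELINE that are not present under repo_root."""
    return [str(cwd / cmd[1])
            for _, cwd, cmd in plan(repo_root)
            if not (cwd / cmd[1]).is_file()]

# Print the steps in order; nothing is executed here.
for label, cwd, cmd in plan("."):
    print(f"{label}: cd {cwd} && {' '.join(cmd)}")
```

Calling `missing_scripts(".")` from the repository root gives a quick check that the directory layout matches what this section describes before any step is run.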
Dependencies
Ensure you have the following installed:
- Python 3.10.8
- NumPy (1.24.4)
- Pandas (2.2.2)
- scikit-learn (1.5.1)
- Matplotlib (3.7.2)
- Seaborn (0.13.2)
- OpenSlide (1.3.1; for TCGA slides)
- OpenCV (4.6.0)
- Pillow (9.2.0; for ST slides)
- PyTorch (2.4.0+cu121)
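
One way to compare an environment against this list is to query the installed package metadata. The sketch below is a convenience check, not part of the repository. The distribution names are the usual PyPI names (for example `openslide-python` and `opencv-python`), which is an assumption, since the list above gives library names only. Versions are matched by prefix so that build suffixes such as `+cu121` are accepted.

```python
from importlib import metadata

# PyPI distribution name -> version listed above (names are assumed, see lead-in).
PINNED = {
    "numpy": "1.24.4",
    "pandas": "2.2.2",
    "scikit-learn": "1.5.1",
    "matplotlib": "3.7.2",
    "seaborn": "0.13.2",
    "openslide-python": "1.3.1",
    "opencv-python": "4.6.0",
    "Pillow": "9.2.0",
    "torch": "2.4.0",
}

def check_versions(pinned=PINNED):
    """Map each distribution to 'ok', 'missing', or 'found <version>'."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Prefix match so local or build suffixes (e.g. '+cu121') still pass.
        report[name] = "ok" if found.startswith(wanted) else f"found {found}"
    return report

for name, status in check_versions().items():
    print(f"{name:18s} {status}")
```

The Python interpreter version itself (3.10.8) is not covered by this check; `sys.version_info` can be inspected separately.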
4. License and Terms of use
This model and its associated code have been filed for a provisional US patent (application no. 63/703,060, United States, 2024) and are permitted solely for non-commercial, academic research purposes. Commercial use, sale, or any form of monetization of the DEPLOY model is strictly prohibited without prior approval. Commercial entities interested in utilizing the model should contact the corresponding authors for authorization.
Software
- Programming language: Python