Published January 16, 2025 | Version v1
Software Restricted

AI-Driven Spatial Transcriptomics Unlocks Large-Scale Cancer Biomarker Discovery from Histopathology

  • 1. National Institutes of Health
  • 2. National Cancer Institute

Description

Path2Space

Overview

Code associated with Path2Space, a model for predicting spatial transcriptomics (ST) from hematoxylin and eosin (H&E)-stained slides.

1. Subdirectory: `1.ST_prediction`

This directory contains the primary scripts for spatial transcriptomics prediction tasks.

1.1.Feature_Extraction

- Main Script:
  - `1main_feature_extraction.py`: Script to perform image pre-processing and feature extraction.

- func:
  - `utils_preprocessing.py`: Functions for preprocessing the H&E image.
  - `utils_color_norm.py`: Functions for color normalization of images.
  - `ctrans_model`: Folder containing helper functions for extracting CTransPath model features.
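The repository's preprocessing code is restricted, so the sketch below is not the authors' implementation. It illustrates one common way to prepare an H&E image for patch-level feature extraction: cut the image into fixed-size tiles and discard tiles that are mostly white background. The function names `tile_image` and `tissue_fraction`, the 224-pixel tile size, the whiteness threshold of 220, and the 0.5 tissue cutoff are all illustrative assumptions. The input is an RGB NumPy array, such as a region read from a slide with OpenSlide or Pillow and converted to an array.

```python
import numpy as np


def tile_image(image: np.ndarray, patch_size: int = 224):
    """Split an H x W x C image array into non-overlapping square tiles.

    Partial tiles at the right and bottom edges are dropped. Returns the stacked
    tiles and the (row, col) pixel offset of each tile's top-left corner.
    """
    h, w = image.shape[:2]
    patches, coords = [], []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patches.append(image[y:y + patch_size, x:x + patch_size])
            coords.append((y, x))
    if not patches:  # image smaller than one tile
        return np.empty((0, patch_size, patch_size) + image.shape[2:], dtype=image.dtype), coords
    return np.stack(patches), coords


def tissue_fraction(patch: np.ndarray, white_threshold: int = 220) -> float:
    """Fraction of pixels that are not near-white background."""
    # A pixel is treated as background when all of its RGB channels exceed the threshold.
    background = np.all(patch > white_threshold, axis=-1)
    return float(1.0 - background.mean())


# Toy "slide": white background with one textured 224 x 224 region standing in for tissue.
rng = np.random.default_rng(0)
slide = np.full((448, 672, 3), 255, dtype=np.uint8)
slide[:224, :224] = rng.integers(100, 200, size=(224, 224, 3))

patches, coords = tile_image(slide, patch_size=224)
keep = [c for p, c in zip(patches, coords) if tissue_fraction(p) > 0.5]
print(patches.shape, keep)  # (6, 224, 224, 3) [(0, 0)]
```

Only the retained tiles would then be color-normalized and passed to the feature extractor; the threshold-based background test is a simple heuristic, and real pipelines may use more robust tissue masks.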

1.2.Regression

- `1main_regression.py`: Script to train a regression model and predict ST expression values from the CTransPath features.
- `model_MLP.py`: Multi-layer perceptron model implementation.
- `utils.py`: Helper functions for training the MLP regressor.
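A minimal sketch of the regression step, assuming per-spot image features as inputs and per-gene expression values as multi-output targets. It uses scikit-learn's `MLPRegressor` as a stand-in for the repository's own `model_MLP.py`, trains on synthetic data, and scores each gene by the Pearson correlation between observed and predicted values on held-out spots. The data sizes, hidden-layer width, and the helper `per_gene_pearson` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def per_gene_pearson(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Pearson r between observed and predicted expression, one value per gene (column)."""
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    return (yt * yp).sum(axis=0) / np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))


# Synthetic stand-in: 600 spots x 64 image features -> 10 genes.
# Real inputs would be per-spot CTransPath features and measured ST expression.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 64))
Y = X @ rng.normal(size=(64, 10)) / 8.0 + 0.1 * rng.normal(size=(600, 10))

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
model = make_pipeline(
    StandardScaler(),  # standardize each feature using training-set statistics
    MLPRegressor(hidden_layer_sizes=(128,), max_iter=500, early_stopping=True, random_state=0),
)
model.fit(X_train, Y_train)  # one network predicts all genes jointly
r = per_gene_pearson(Y_test, model.predict(X_test))
print(r.shape, round(float(np.median(r)), 2))
```

Per-gene scores are reported because a single metric pooled across all genes can hide large differences in how well individual genes are predicted.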

1.3.Prediction

- `Prediction.py`: Script to predict ST expression values from CTransPath features using a trained model.
- `model_MLP.py`: Multi-layer perceptron model implementation.
- `utils.py`: Helper functions for the prediction task.
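Prediction on a new dataset reuses a model fitted earlier, so the fitted object has to be saved after training and reloaded later. The sketch below shows that round trip with Python's `pickle` and a scikit-learn `Ridge` stand-in; how `Prediction.py` actually stores and loads its trained MLP is not described here and may differ. The hypothetical helper `predict_new` also checks that the new features have the dimensionality the model was trained on, since features produced by a different extractor or preprocessing would not be compatible.

```python
import pickle

import numpy as np
from sklearn.linear_model import Ridge


def predict_new(model, X_new: np.ndarray) -> np.ndarray:
    """Predict expression for new spots after checking the feature dimension."""
    if X_new.shape[1] != model.n_features_in_:
        raise ValueError(
            f"model expects {model.n_features_in_} features, got {X_new.shape[1]}"
        )
    return model.predict(X_new)


# Stand-in for a model trained in step 1.2 (Ridge instead of the repository's MLP).
rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 32))
Y_train = X_train @ rng.normal(size=(32, 5))
trained = Ridge(alpha=1.0).fit(X_train, Y_train)

# Serialize after training; in practice these bytes would be written to a file.
blob = pickle.dumps(trained)

# Later, reload the model and apply it to features from a new dataset (step 1.3).
model = pickle.loads(blob)
X_new = rng.normal(size=(10, 32))
Y_new = predict_new(model, X_new)
print(Y_new.shape)  # (10, 5)
```

Only unpickle model files from trusted sources, since loading a pickle can execute arbitrary code.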

2. Subdirectory: `2.Cell_type_fraction_model`

- `2.1.main_cell_type_model.py`: Script to train a cell type fraction model on predicted ST values and generate predictions.
- `2.2.prediction.py`: Script to predict cell type fractions from predicted ST values using a trained model.
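The cell type fraction model maps (predicted) ST expression to the proportion of each cell type in a spot. The sketch below is a hypothetical illustration, not the repository's method: it simulates spots as mixtures of cell-type expression signatures, fits a multi-output `Ridge` regression from expression to fractions, and then post-processes the raw outputs with the hypothetical helper `to_fractions` so that each spot's fractions are non-negative and sum to 1. All sizes and distributions are assumptions chosen for the toy data.

```python
import numpy as np
from sklearn.linear_model import Ridge


def to_fractions(raw: np.ndarray) -> np.ndarray:
    """Clip negative predictions to zero and rescale each row to sum to 1."""
    clipped = np.clip(raw, 0.0, None)
    totals = clipped.sum(axis=1, keepdims=True)
    n_types = raw.shape[1]
    # Rows that are all zero after clipping fall back to a uniform distribution.
    return np.where(totals > 0, clipped / np.where(totals > 0, totals, 1.0), 1.0 / n_types)


rng = np.random.default_rng(2)
n_spots, n_genes, n_types = 300, 50, 4
fractions = rng.dirichlet(np.ones(n_types), size=n_spots)    # true cell-type mix per spot
signatures = rng.gamma(2.0, 1.0, size=(n_types, n_genes))    # mean expression per cell type
expr = fractions @ signatures + 0.05 * rng.normal(size=(n_spots, n_genes))

# Train on the first 240 spots, predict the remaining 60.
model = Ridge(alpha=1.0).fit(expr[:240], fractions[:240])
pred = to_fractions(model.predict(expr[240:]))
print(pred.shape, np.allclose(pred.sum(axis=1), 1.0))  # (60, 4) True
```

Clipping and renormalizing is a simple way to turn unconstrained regression outputs into valid proportions; a model with a softmax output layer would enforce the same constraint during training instead.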

3. Subdirectory: `SPAND`

- `SPAND.py`: Function for calculating SPAND for a given slide and a given gene.

Usage

1.1. Feature Extraction:
   - Navigate to `1.ST_prediction/1.1.Feature_extraction`.
   - Run `1main_feature_extraction.py` to extract features from an H&E slide image.

1.2. Regression Tasks:
   - Navigate to `1.ST_prediction/1.2.Regression`.
   - Use `1main_regression.py` to train a regression model for ST prediction from H&E image features (output of 1.1.).

1.3. Prediction Tasks:
   - Navigate to `1.ST_prediction/1.3.Prediction`.
   - Use `Prediction.py` to predict ST from features (output of 1.1.) from a new dataset using a previously trained model (output of 1.2.). 

2.1. Cell Type Fraction Model Training:
   - Navigate to `2.Cell_type_fraction_model/`.
   - Use `2.1.main_cell_type_model.py` to train a cell type fraction prediction model from inferred ST values (output of 1.2. or 1.3.). 

2.2. Cell Type Fraction Prediction:
   - Navigate to `2.Cell_type_fraction_model/`.
   - Use `2.2.prediction.py` to predict cell type fraction from inferred ST values (output of 1.2. or 1.3.) using a trained model (output of 2.1.).

3. SPAND Analysis:
   - Open `SPAND/SPAND.py` to calculate SPAND for a given slide and a given gene from inferred ST values (output of 1.2. or 1.3.).

Dependencies

Ensure you have the following installed:
- Python 3.10.8 
- NumPy (1.24.4)
- Pandas (2.2.2)
- scikit-learn (1.5.1)
- Matplotlib (3.7.2)
- Seaborn (0.13.2)
- OpenSlide (1.3.1; for TCGA slides)
- OpenCV (4.6.0)
- Pillow (9.2.0; for ST slides)
- PyTorch (2.4.0+cu121)
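To compare a local environment against the versions listed above, the script below queries installed package versions with the standard-library `importlib.metadata`. The PyPI distribution names are assumptions (for example, OpenCV may be installed as `opencv-python-headless`, and the OpenSlide Python binding is published as `openslide-python`), and the prefix match treats builds such as `2.4.0+cu121` or `4.6.0.66` as matching the listed version. The helper `check_versions` is hypothetical and not part of the repository.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the Dependencies section, keyed by assumed PyPI distribution names.
EXPECTED = {
    "numpy": "1.24.4",
    "pandas": "2.2.2",
    "scikit-learn": "1.5.1",
    "matplotlib": "3.7.2",
    "seaborn": "0.13.2",
    "openslide-python": "1.3.1",
    "opencv-python": "4.6.0",
    "Pillow": "9.2.0",
    "torch": "2.4.0",
}


def check_versions(expected=EXPECTED):
    """Return {package: (listed_version, installed_version_or_None, matches)}."""
    report = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None
        # Accept exact matches plus local/build suffixes such as "+cu121" or ".66".
        ok = have is not None and (
            have == want or have.startswith(want + ".") or have.startswith(want + "+")
        )
        report[name] = (want, have, ok)
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "(listed: 3.10.8)")
    for name, (want, have, ok) in check_versions().items():
        print(f"{name:18} listed {want:8} installed {have or 'missing':12} {'OK' if ok else 'CHECK'}")
```

A "CHECK" result does not necessarily mean the code will fail; it only flags a difference from the tested versions.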

4. License and Terms of use

This model and its associated code are the subject of a provisional US patent application (application no. 63/703,060, United States, 2024) and may be used solely for non-commercial, academic research purposes. Commercial use, sale, or any form of monetization of the Path2Space model is strictly prohibited without prior approval. Commercial entities interested in using the model should contact the corresponding authors for authorization.

Files

The record is publicly accessible, but files are restricted.

Additional details

Software

Programming language
Python