SenCat machine learning determination of senescence markers
Description
Machine-learning-guided identification of senescence markers
Update from previous version: updated marker coefficient files.
This repository provides a reproducible workflow for identifying robust senescence markers from SenCat transcriptomic and proteomic data and for validating marker-based scoring in external IMR90 fibroblast datasets.
Workflow
The workflow standardizes transcriptomic and proteomic measurements, applies consistent preprocessing, and uses a cross-cell-type machine learning strategy to identify markers that remain informative across biological contexts. A refined marker set is then used to derive stable marker weights and generate sample-level senescence scores in both reference and external validation datasets, with all primary outputs written to the analysis and plotting directories.
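As a rough illustration of what "remaining informative across biological contexts" can mean, the sketch below keeps only features that receive a nonzero, same-sign weight in every cell type and averages those weights into one stable weight per marker. The function name common_markers, the input layout, and the sign-consistency and averaging rules are hypothetical. The workflow's actual selection and weighting procedure may differ.

```python
from statistics import mean


def common_markers(weights_by_context):
    """Hypothetical cross-context marker selection (illustration only).

    weights_by_context maps a context name (e.g. a cell type) to a dict of
    {feature_id: weight}. A feature is kept only if it has a nonzero weight
    in every context and the sign agrees across contexts; the kept weight is
    the mean over contexts.
    """
    contexts = list(weights_by_context.values())
    if not contexts:
        return {}
    # Features selected (nonzero weight) in every context.
    shared = set.intersection(
        *({f for f, w in ctx.items() if w != 0} for ctx in contexts)
    )
    result = {}
    for feature in sorted(shared):
        ws = [ctx[feature] for ctx in contexts]
        # Require a consistent direction of effect across contexts.
        if all(w > 0 for w in ws) or all(w < 0 for w in ws):
            result[feature] = mean(ws)
    return result
```

With made-up weights, a feature that is zero in one cell type or flips sign between cell types is dropped, and the remaining features get their averaged weight.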
Inputs
- SenCat transcriptomic data: primary RNA-level input used for marker discovery.
- SenCat proteomic data: primary protein-level input used for marker discovery.
- External validation data: IMR90 fibroblast datasets used to evaluate score transferability.
- Workflow configuration: centralized settings for inputs, analysis profiles, and output locations.
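One simple way to check whether scores transfer to an external dataset such as IMR90 is to ask how well they separate samples annotated as senescent from controls. The sketch computes a ROC AUC as the probability that a randomly chosen senescent sample outscores a randomly chosen control. The function roc_auc and the 1/0 labels are hypothetical; this is not necessarily the evaluation the workflow performs.

```python
def roc_auc(scores, labels):
    """ROC AUC from per-sample scores and hypothetical binary labels.

    labels: 1 for samples annotated as senescent, 0 for controls.
    Computed by pairwise comparison; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one senescent and one control sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC near 1.0 means the score ranks senescent samples above controls; 0.5 means no separation. The pairwise loop is quadratic but adequate for typical bulk sample counts.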
ML markers
- Transcriptomics markers: analysis/transcriptomics.transcriptomics_loose5000f_tuned_common_features.results.csv
- Proteomics markers: analysis/proteomics.proteomics_loose5000f_tuned_common_features.results.csv
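To inspect the marker coefficients outside the provided script, the results CSV can be read into a dictionary. This is a minimal sketch assuming the first column holds the marker IDs (the CSV index) and that a coefficient column exists; its name is not documented here, so the weight_column argument and the example name "coef" are assumptions to be replaced after checking the file header.

```python
import csv


def parse_marker_weights(lines, weight_column):
    """Parse marker CSV lines into {marker_id: weight}.

    Assumes the first column is the marker ID index and weight_column is
    the (hypothetical) name of the coefficient column.
    """
    reader = csv.reader(lines)
    header = next(reader)
    col = header.index(weight_column)  # raises ValueError if absent
    weights = {}
    for row in reader:
        if row:  # skip blank lines
            weights[row[0]] = float(row[col])
    return weights
```

Usage with one of the files listed above:

```python
with open("analysis/transcriptomics.transcriptomics_loose5000f_tuned_common_features.results.csv", newline="") as fh:
    weights = parse_marker_weights(fh, "coef")  # "coef" is a placeholder column name
```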
Using ML markers for senescence scoring
You can use our senescence markers to score your data for senescence.
Prepare data
Scoring is performed on an h5ad file containing normalized transcriptomic or proteomic counts. Expected h5ad structure:
- adata.X: sample-by-feature expression matrix
- adata.var_names: feature identifiers matching the marker IDs in the marker CSV index
If your data are not normalized, you can use the normalize_counts.py script:
python workflow/scripts/data/normalize_counts.py \
--input-h5ad INPUT_H5AD \
--design DESIGN_FACTORS \
--output-h5ad NORMALIZED_H5AD \
--log logs/my_data.normalize.log \
--log-level INFO
- INPUT_H5AD specifies the path to your input h5ad file.
- DESIGN_FACTORS specifies the design factors for DESeq2, in the format x + z or ~x+z.
- NORMALIZED_H5AD specifies the path where your normalized data will be saved.
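DESeq2 normalization is based on median-of-ratios size factors: each gene's geometric mean across samples serves as a pseudo-reference, and each sample's size factor is the median ratio of its counts to that reference. The pure-Python sketch below shows the idea on a small samples-by-genes count matrix, assuming the script's output corresponds to size-factor-normalized counts. normalize_counts.py may differ in detail; the design formula is used by DESeq2 for model fitting, not for these size factors.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """DESeq2-style size factors for a samples x genes list of raw counts.

    Genes with a zero count in any sample are excluded from the
    pseudo-reference, as in the median-of-ratios method.
    """
    n_genes = len(counts[0])
    usable = [g for g in range(n_genes) if all(row[g] > 0 for row in counts)]
    if not usable:
        raise ValueError("every gene has a zero count in at least one sample")
    # Log geometric mean of each usable gene across samples.
    log_geo_mean = {
        g: sum(math.log(row[g]) for row in counts) / len(counts) for g in usable
    }
    factors = []
    for row in counts:
        log_ratios = [math.log(row[g]) - log_geo_mean[g] for g in usable]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide each sample's counts by its size factor."""
    factors = median_of_ratios_size_factors(counts)
    return [[c / sf for c in row] for row, sf in zip(counts, factors)]
```

For two samples where the second has exactly twice the depth of the first, the size factors differ by a factor of 2 and the normalized rows become identical.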
Get senescence scores
python workflow/scripts/cls/marker_classifier.py \
--markers PATH_TO_ML_MARKERS \
--input-h5ad NORMALIZED_H5AD \
--output-results-csv OUTPUT_CSV
- PATH_TO_ML_MARKERS specifies the path to the ML markers. Use analysis/transcriptomics.transcriptomics_loose5000f_tuned_common_features.results.csv for transcriptomics and analysis/proteomics.proteomics_loose5000f_tuned_common_features.results.csv for proteomics.
- NORMALIZED_H5AD specifies the path to your normalized h5ad data.
- OUTPUT_CSV specifies the path to the output CSV file with per-sample score values (higher values indicate stronger similarity to the senescence-associated signature).
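To build intuition for what a marker-based score can look like, the sketch below computes a weighted sum of log1p-transformed marker values for one sample, skipping markers the sample lacks. The linear form, the optional intercept, and the function name senescence_score are assumptions for illustration; marker_classifier.py may combine the coefficients differently (for example, through a classifier's decision function).

```python
import math


def senescence_score(sample, weights, intercept=0.0):
    """Hypothetical linear score: intercept + sum(w * log1p(value)).

    sample: {feature_id: normalized value} for one sample.
    weights: {marker_id: coefficient}. Markers absent from the sample
    are skipped.
    """
    score = intercept
    for marker, w in weights.items():
        if marker in sample:
            score += w * math.log1p(sample[marker])
    return score
```

With this form, markers with positive coefficients raise the score as their expression increases, and markers with negative coefficients lower it.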
Notes:
- marker_classifier.py applies log1p internally.
- Marker matching is based on adata.var_names; markers not found in adata.var_names are skipped automatically.
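Because unmatched markers are skipped automatically, it can be worth checking marker coverage before scoring: a low match rate usually points to an identifier mismatch between the data and the marker CSV (for example, gene symbols versus Ensembl IDs). The helper marker_coverage below is a hypothetical sketch, not part of the repository.

```python
def marker_coverage(var_names, marker_ids):
    """Split marker IDs into matched and missing, and report the match rate.

    var_names: feature identifiers of the data (e.g. list(adata.var_names)).
    marker_ids: marker IDs from the marker CSV index.
    """
    present = set(var_names)
    matched = [m for m in marker_ids if m in present]
    missing = [m for m in marker_ids if m not in present]
    fraction = len(matched) / len(marker_ids) if marker_ids else 0.0
    return matched, missing, fraction
```

With an AnnData object, call it as marker_coverage(list(adata.var_names), marker_ids) and inspect the missing list if the fraction is unexpectedly low.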
Files
Total size: 136.8 kB
| Checksum | Size |
|---|---|
| md5:b7fd9beb0ba162ec345a10fe67826363 | 136.8 kB |
Additional details
Software
- Repository URL: https://github.com/maragkakislab/wf-ml-markers-senescence
- Programming language: Python
- Development Status: Active