Published May 3, 2026 | Version v2
Software Open

SenCat machine learning determination of senescence markers

Description

Machine learning-guided identification of senescence markers

Update from the previous version: updated marker coefficient files.

This repository provides a reproducible workflow for identifying robust senescence markers from SenCat transcriptomic and proteomic data and for validating marker-based scoring in external IMR90 fibroblast datasets.

Workflow

The workflow standardizes transcriptomic and proteomic measurements, applies consistent preprocessing, and uses a cross-cell-type machine learning strategy to identify markers that remain informative across biological contexts. A refined marker set is then used to derive stable marker weights and generate sample-level senescence scores in both reference and external validation datasets, with all primary outputs written to the analysis and plotting directories.

Inputs

  • SenCat transcriptomic data: primary RNA-level input used for marker discovery.
  • SenCat proteomic data: primary protein-level input used for marker discovery.
  • External validation data: IMR90 fibroblast datasets used to evaluate score transferability.
  • Workflow configuration: centralized settings for inputs, analysis profiles, and output locations.

ML markers

  • Transcriptomics markers: analysis/transcriptomics.transcriptomics_loose5000f_tuned_common_features.results.csv
  • Proteomics markers: analysis/proteomics.proteomics_loose5000f_tuned_common_features.results.csv

Using ML markers for senescence scoring

You can use our senescence markers to score your data for senescence.

Prepare data

Scoring is performed on an h5ad file containing normalized transcriptomic or proteomic counts. Expected h5ad structure:

  • adata.X: sample-by-feature expression matrix
  • adata.var_names: feature identifiers matching the marker IDs in the marker CSV index
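Before scoring, it can help to check how many marker IDs actually appear among your feature identifiers. The following is a minimal sketch using only the Python standard library. It assumes the marker IDs sit in the first column of the marker CSV (its index); the column names and gene IDs in the toy data are hypothetical. With the anndata package, the feature identifiers would typically come from list(adata.var_names).

```python
import csv
import io


def read_marker_ids(csv_file):
    """Return the values of the first CSV column (assumed to be the marker index)."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    return [row[0] for row in reader if row]


def marker_overlap(var_names, marker_ids):
    """Split marker IDs into those present in var_names and those missing."""
    available = set(var_names)
    found = [m for m in marker_ids if m in available]
    missing = [m for m in marker_ids if m not in available]
    return found, missing


# Toy stand-ins (hypothetical): real IDs and column names come from the marker CSV.
toy_csv = io.StringIO("feature,weight\nGENE_A,0.5\nGENE_B,-0.2\nGENE_C,0.1\n")
toy_var_names = ["GENE_A", "GENE_C", "GENE_X"]

found, missing = marker_overlap(toy_var_names, read_marker_ids(toy_csv))
print(f"{len(found)} markers found, {len(missing)} missing: {missing}")
```

A low overlap usually points to an identifier mismatch (for example, gene symbols versus database accessions) rather than to a problem with the data itself.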

If your data are not normalized, you can use the normalize_counts.py script:

python workflow/scripts/data/normalize_counts.py \
    --input-h5ad INPUT_H5AD \
    --design DESIGN_FACTORS \
    --output-h5ad NORMALIZED_H5AD \
    --log logs/my_data.normalize.log \
    --log-level INFO
 
  • INPUT_H5AD specifies the path to your input h5ad file.
  • DESIGN_FACTORS specifies the design factors for DESeq2, in the format x + z or ~x+z.
  • NORMALIZED_H5AD specifies the path where your normalized data will be saved.
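The two accepted design formats can be checked before launching the normalization. The sketch below is illustrative only: parse_design and missing_factors are hypothetical helpers, not part of the repository, and the sketch assumes that design factors name sample metadata columns (with anndata, list(adata.obs.columns)). It handles additive terms only, not interaction terms.

```python
def parse_design(design):
    """Split a design string such as 'x + z' or '~x+z' into factor names."""
    terms = design.strip().lstrip("~").split("+")
    return [t.strip() for t in terms if t.strip()]


def missing_factors(design, obs_columns):
    """Return design factors that are not among the given metadata columns."""
    available = set(obs_columns)
    return [f for f in parse_design(design) if f not in available]


# Hypothetical metadata columns for illustration.
obs_columns = ["condition", "batch"]

print(parse_design("~condition+batch"))
print(missing_factors("condition + donor", obs_columns))
```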

Get senescence scores

python workflow/scripts/cls/marker_classifier.py \
    --markers PATH_TO_ML_MARKERS \
    --input-h5ad NORMALIZED_H5AD \
    --output-results-csv OUTPUT_CSV 
 
  • PATH_TO_ML_MARKERS specifies the path to the ML markers. Use analysis/transcriptomics.transcriptomics_loose5000f_tuned_common_features.results.csv for transcriptomics and analysis/proteomics.proteomics_loose5000f_tuned_common_features.results.csv for proteomics.
  • NORMALIZED_H5AD specifies the path to your normalized h5ad data.
  • OUTPUT_CSV specifies the path to the output CSV file with per-sample scores (higher values indicate stronger similarity to the senescence-associated signature).
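To script normalization and scoring together, the two commands can be assembled from Python. This is a minimal sketch that uses only the flags documented above; the input and output file names and the design string are hypothetical, and the helper functions are not part of the repository. It assumes the script is run from the repository root. By default it only prints the commands (dry run).

```python
import subprocess
import sys

NORMALIZE_SCRIPT = "workflow/scripts/data/normalize_counts.py"
CLASSIFIER_SCRIPT = "workflow/scripts/cls/marker_classifier.py"


def normalize_cmd(input_h5ad, design, output_h5ad, log_path, log_level="INFO"):
    """Build the normalize_counts.py command with the documented flags."""
    return [
        sys.executable, NORMALIZE_SCRIPT,
        "--input-h5ad", input_h5ad,
        "--design", design,
        "--output-h5ad", output_h5ad,
        "--log", log_path,
        "--log-level", log_level,
    ]


def score_cmd(markers, input_h5ad, output_csv):
    """Build the marker_classifier.py command with the documented flags."""
    return [
        sys.executable, CLASSIFIER_SCRIPT,
        "--markers", markers,
        "--input-h5ad", input_h5ad,
        "--output-results-csv", output_csv,
    ]


def run(cmd, dry_run=True):
    """Print the command, and execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


# Hypothetical file names and design; replace with your own.
markers = "analysis/transcriptomics.transcriptomics_loose5000f_tuned_common_features.results.csv"
run(normalize_cmd("my_data.h5ad", "~condition+batch", "my_data.norm.h5ad",
                  "logs/my_data.normalize.log"))
run(score_cmd(markers, "my_data.norm.h5ad", "my_data.scores.csv"))
```

Passing dry_run=False executes each step and stops at the first failing command.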

Notes:

  • marker_classifier.py applies log1p internally, so the input is expected to contain normalized counts rather than log-transformed values.
  • Marker matching is based on adata.var_names; markers absent from your data are skipped automatically.
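The two notes above can be illustrated with a toy calculation. The exact scoring formula used by marker_classifier.py is not described here, so the sketch below is an assumption: it treats the score as a plain weighted sum of log1p-transformed values over the markers present in the sample. The function name, marker IDs, and weights are hypothetical, and the actual script may differ (for example, in scaling or an intercept).

```python
import math


def sketch_score(sample, weights):
    """Weighted sum of log1p-transformed values over overlapping markers.

    Assumed formula for illustration only, not the repository's implementation.
    """
    used = [m for m in weights if m in sample]  # absent markers are skipped
    score = sum(weights[m] * math.log1p(sample[m]) for m in used)
    return score, used


# Hypothetical marker weights and one sample of normalized counts.
weights = {"GENE_A": 0.5, "GENE_B": -0.2, "GENE_C": 0.1}
sample = {"GENE_A": 10.0, "GENE_C": 0.0, "GENE_X": 3.0}

score, used = sketch_score(sample, weights)
print(f"score={score:.4f} using markers {used}")
```

Here GENE_B contributes nothing because it is absent from the sample, and GENE_X is ignored because it is not a marker.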

Files

Files (136.8 kB)

md5:b7fd9beb0ba162ec345a10fe67826363

Additional details

Software

Repository URL
https://github.com/maragkakislab/wf-ml-markers-senescence
Programming language
Python
Development Status
Active