SenCat machine learning determination of senescence markers

Gresova, Katarina

doi:10.5281/zenodo.20008425

Published May 3, 2026 | Version v2

Software Open

Machine learning guided identification of senescence markers

Update from the previous version: updated marker coefficient files.

This repository provides a reproducible workflow for identifying robust senescence markers from SenCat transcriptomic and proteomic data and for validating marker-based scoring in external IMR90 fibroblast datasets.

Workflow

The workflow standardizes transcriptomic and proteomic measurements, applies consistent preprocessing, and uses a cross-cell-type machine learning strategy to identify markers that remain informative across biological contexts. A refined marker set is then used to derive stable marker weights and generate sample-level senescence scores in both reference and external validation datasets, with all primary outputs written to the analysis and plotting directories.

Inputs

SenCat transcriptomic data: primary RNA-level input used for marker discovery.
SenCat proteomic data: primary protein-level input used for marker discovery.
External validation data: IMR90 fibroblast datasets used to evaluate score transferability.
Workflow configuration: centralized settings for inputs, analysis profiles, and output locations.

ML markers

Transcriptomics markers: analysis/transcriptomics.transcriptomics_loose5000f_tuned_common_features.results.csv.
Proteomics markers: analysis/proteomics.proteomics_loose5000f_tuned_common_features.results.csv.

Using ML markers for senescence scoring

You can use our senescence markers to score your data for senescence.

Prepare data

Scoring is performed on an h5ad file containing normalized transcriptomics or proteomics counts. Expected h5ad structure:

adata.X: sample-by-feature expression matrix
adata.var_names: feature identifiers matching the marker IDs in the marker CSV index
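
Before scoring, it can help to check how many marker IDs actually appear among your feature identifiers. The following is a minimal sketch, not part of the workflow: the helper names, the example identifiers, and the column names are hypothetical, and it assumes the marker IDs sit in the first column of the marker CSV. With a real AnnData object you would pass list(adata.var_names) as var_names.

```python
import csv
import io


def read_marker_ids(csv_text):
    """Return marker IDs from the first column of a marker CSV (header row skipped)."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    return [row[0] for row in reader if row and row[0]]


def marker_overlap(var_names, marker_ids):
    """Split marker IDs into those present in var_names and those missing."""
    available = set(var_names)
    found = [m for m in marker_ids if m in available]
    missing = [m for m in marker_ids if m not in available]
    return found, missing


# Hypothetical marker table; the real file may use different column names.
example_csv = "feature,coefficient\nGENE_A,0.8\nGENE_B,-0.3\nGENE_C,0.5\n"
# Hypothetical feature identifiers standing in for adata.var_names.
example_var_names = ["GENE_A", "GENE_C", "GENE_D"]

found, missing = marker_overlap(example_var_names, read_marker_ids(example_csv))
print(f"{len(found)} markers found, {len(missing)} missing: {missing}")
```

A large number of missing markers usually points to an identifier mismatch (for example, gene symbols versus accession IDs) rather than a real absence of the markers.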

If your data are not normalized, you can use the normalize_counts.py script:

python workflow/scripts/data/normalize_counts.py \
    --input-h5ad INPUT_H5AD \
    --design DESIGN_FACTORS \
    --output-h5ad NORMALIZED_H5AD \
    --log logs/my_data.normalize.log \
    --log-level INFO

INPUT_H5AD specifies the path to your input h5ad file.
DESIGN_FACTORS specifies the design factors for DESeq2, in the format x + z or ~x+z.
NORMALIZED_H5AD specifies the path where your normalized data will be saved.

Get senescence scores

python workflow/scripts/cls/marker_classifier.py \
    --markers PATH_TO_ML_MARKERS \
    --input-h5ad NORMALIZED_H5AD \
    --output-results-csv OUTPUT_CSV

PATH_TO_ML_MARKERS specifies the path to the ML markers file. Use analysis/transcriptomics.transcriptomics_loose5000f_tuned_common_features.results.csv for transcriptomics and analysis/proteomics.proteomics_loose5000f_tuned_common_features.results.csv for proteomics.
NORMALIZED_H5AD specifies the path to your normalized h5ad data.
OUTPUT_CSV specifies the path to the output CSV file with per-sample score values (higher values indicate stronger similarity to the senescence-associated signature).

Notes:

marker_classifier.py applies log1p internally.
Marker matching is based on adata.var_names; non-overlapping markers are skipped automatically.
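
The two notes above can be illustrated with a small sketch. This is not the code in marker_classifier.py: the exact scoring formula is not described here, so the example assumes a plain weighted sum of log1p-transformed values, and the function name, marker IDs, and weights are hypothetical. It only shows the log1p step and the skipping of markers that are absent from the data.

```python
import math


def weighted_marker_score(sample, weights):
    """Sum weight * log1p(value) over markers present in the sample.

    Assumed scoring rule for illustration only. Returns the score and the
    list of markers that were skipped because the sample lacks them.
    """
    score = 0.0
    skipped = []
    for marker, weight in weights.items():
        if marker not in sample:
            skipped.append(marker)  # non-overlapping marker: skip it
            continue
        score += weight * math.log1p(sample[marker])
    return score, skipped


# Hypothetical marker weights and one sample of normalized values.
weights = {"GENE_A": 0.8, "GENE_B": -0.3, "GENE_C": 0.5}
sample = {"GENE_A": 120.0, "GENE_C": 15.0, "GENE_D": 40.0}

score, skipped = weighted_marker_score(sample, weights)
print(f"score={score:.3f}, skipped={skipped}")
```

Because the transform is applied inside the scorer in this sketch, the input values are normalized but not log-transformed; features without a weight (GENE_D here) do not contribute.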

Files

wf-ml-markers-senescence-1.0.1.tar.gz (136.8 kB), md5:b7fd9beb0ba162ec345a10fe67826363

Additional details

Repository URL: https://github.com/maragkakislab/wf-ml-markers-senescence
Programming language: Python
Development Status: Active
