Data and code for: "A conserved aging protein toolkit is deployed under lineage-specific regulatory environments across animal phyla"
Description
Summary: We compared protein sequence conservation with splice site evolutionary constraint across 100 genes in five functional categories, 140 species, and three animal phyla (Nematoda, Insecta, Mammalia). Aging pathway genes show high protein identity concordance across phyla yet low splice site constraint concordance — a protein–regulation inversion (Wilcoxon p = 0.019). Each phylum prioritizes different aging subcategories: nematodes invest in DNA damage response, mammals in mTOR nutrient sensing (Friedman p = 0.007). These results indicate that aging proteins are conserved as a biochemical toolkit, but each lineage deploys them within different regulatory environments.
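Concordance across phyla, as used in the summary, can be illustrated with Kendall's W, one of the statistics listed for the analysis scripts. The following is a minimal, dependency-free sketch and not the deposited script: the function names and the toy scores are hypothetical, and ties are assumed to be absent. W is 1 when all phyla rank the genes identically and approaches 0 when the rankings disagree.

```python
def rank(values):
    """Rank values from 1 (smallest) to n. Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def kendalls_w(scores_by_rater):
    """Kendall's W for m raters (here: phyla) scoring the same n items (genes)."""
    m = len(scores_by_rater)
    n = len(scores_by_rater[0])
    rank_rows = [rank(row) for row in scores_by_rater]
    # Sum of ranks each item received across all raters.
    rank_sums = [sum(row[j] for row in rank_rows) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((total - mean_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical toy scores: 3 phyla (rows) x 4 genes (columns). Not real data.
toy_protein_identity = [
    [0.91, 0.85, 0.78, 0.66],
    [0.88, 0.80, 0.75, 0.60],
    [0.93, 0.83, 0.70, 0.65],
]
toy_splice_constraint = [
    [0.1, 0.4, 0.3, 0.2],
    [0.4, 0.1, 0.2, 0.3],
    [0.2, 0.3, 0.4, 0.1],
]

print(round(kendalls_w(toy_protein_identity), 3))   # 1.0 (identical rankings)
print(round(kendalls_w(toy_splice_constraint), 3))  # 0.111 (little agreement)
```

In this toy setup, high W for one measure alongside low W for the other is the kind of pattern the summary calls an inversion. A tie-corrected version would be needed for real data with repeated values.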
Contents:
- data/: Gene categories (100 genes, 5 categories) and phylogenetic distance matrices (3 phyla)
- scripts/: 22 Python analysis scripts (ANOVA, Mantel tests, Kendall W, dN/dS, figure generation)
- results/: All analysis outputs including splice site sequences (52,280 + 8,757 sequences), per-gene Mantel r values, protein identity (perc_id), dN/dS (PAML yn00), and statistical test results
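For readers unfamiliar with the Mantel r values mentioned above, the test correlates two pairwise distance matrices over the same set of species and assesses significance by permuting species labels. The sketch below is a generic, dependency-free illustration under stated assumptions, not the deposited implementation: the function names, the one-sided test, the permutation count, and the toy matrices are all hypothetical.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """One-sided Mantel test: returns (r, p) for two symmetric distance matrices."""
    rng = random.Random(seed)
    n = len(dist_a)
    flat_a = upper_triangle(dist_a)
    observed = pearson(flat_a, upper_triangle(dist_b))
    labels = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Permute rows and columns of dist_b together (relabel the species).
        rng.shuffle(labels)
        permuted = [[dist_b[labels[i]][labels[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(flat_a, upper_triangle(permuted)) >= observed - 1e-12:
            hits += 1
    # Count the observed arrangement itself as one permutation.
    p_value = (hits + 1) / (permutations + 1)
    return observed, p_value


# Hypothetical toy example: five "species" placed on a line. Not real data.
positions = [0, 1, 3, 6, 10]
toy_phylo = [[abs(a - b) for b in positions] for a in positions]
toy_other = [[2 * d for d in row] for row in toy_phylo]  # perfectly correlated

r, p = mantel(toy_phylo, toy_other)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

Permuting whole rows and columns together, rather than shuffling individual matrix entries, preserves the dependence among distances that share a species.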
Companion study: The splice site dataset (87 genes, 140 species) was originally compiled in a companion study (Tanigawa & Iwaki, submitted to Molecular Ecology). The present deposit includes the full shared dataset to ensure self-contained reproducibility.
Requirements: Python 3.8+, numpy, pandas, scipy, matplotlib, h5py, requests. Evo2 Docker image required for embedding computation (script #14). PAML + MAFFT required for dN/dS analysis (script #17). See README.md for details.
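Before running the scripts, the environment can be checked against the requirements listed above. This is a small sketch, not part of the deposit: the helper name is hypothetical, and the executable names `mafft`, `yn00`, and `docker` are assumed to be how MAFFT, PAML yn00, and Docker appear on the PATH. It does not check whether the Evo2 image itself has been pulled.

```python
import importlib.util
import shutil
import sys

PYTHON_PACKAGES = ["numpy", "pandas", "scipy", "matplotlib", "h5py", "requests"]
# Assumed executable names for MAFFT, PAML yn00, and Docker.
EXTERNAL_TOOLS = ["mafft", "yn00", "docker"]


def check_requirements(packages=PYTHON_PACKAGES, tools=EXTERNAL_TOOLS):
    """Return the Python packages and command-line tools that cannot be found."""
    missing_packages = [name for name in packages
                        if importlib.util.find_spec(name) is None]
    missing_tools = [name for name in tools if shutil.which(name) is None]
    return {"packages": missing_packages, "tools": missing_tools}


if __name__ == "__main__":
    if sys.version_info < (3, 8):
        print("Python 3.8 or newer is required")
    report = check_requirements()
    print("Missing Python packages:", report["packages"] or "none")
    print("Missing external tools:", report["tools"] or "none")
```

The external tools are only needed for specific steps (embedding computation and dN/dS analysis), so a missing tool does not prevent the remaining scripts from running.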
Large files not included: The Evo2 HDF5 embeddings (4.4 GB) are available upon request from the corresponding author.
Additional details
Dates
- Submitted: 2026-04-01. Date when the dataset was created and first deposited to Zenodo.