# Source Data Directory

This directory contains the input data files used in the analysis.

## Files Not Included (Too Large for Zenodo)

Some source files exceed Zenodo's file-size limits and are not deposited here. Download them directly from the following public databases:

- **GSE106780_series_matrix.txt.gz**: Available from GEO at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE106780
- **GPL21572_annotation.txt**: Available from GEO platform page at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL21572
- **miRTarBase_Functional_MTI.csv**: Available from https://mirtarbase.cuhk.edu.cn/ (2025 release)
- **CellAge_genes.csv**: Available from https://genomics.senescence.info/cell_age/ (Avelar et al. 2020, Genome Biol)
- **GenAge_genes.csv**: Available from https://genomics.senescence.info/genes/ (de Magalhaes et al. 2024, Nucleic Acids Res)
- **MSigDB_Hallmark_all50.gmt** and **MSigDB_C2_KEGG_MEDICUS.gmt**: Available from https://www.gsea-msigdb.org/ (v2026.1)
- **STRING_interaction_export.tsv**: Generated via STRING v12.0 web interface at https://string-db.org/ (combined score >= 0.7)
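
After downloading, it can help to confirm that every file named above is present before running the pipeline. The following is a minimal sketch, not part of the repository's scripts. It assumes the downloaded files are saved into this directory under exactly the names listed above; the function name `missing_source_files` is hypothetical.

```python
from pathlib import Path

# File names exactly as listed in this README.
EXPECTED_FILES = [
    "GSE106780_series_matrix.txt.gz",
    "GPL21572_annotation.txt",
    "miRTarBase_Functional_MTI.csv",
    "CellAge_genes.csv",
    "GenAge_genes.csv",
    "MSigDB_Hallmark_all50.gmt",
    "MSigDB_C2_KEGG_MEDICUS.gmt",
    "STRING_interaction_export.tsv",
]


def missing_source_files(directory="."):
    """Return the expected file names that are not present in `directory`."""
    root = Path(directory)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumption: the script is run from inside this source data directory.
    missing = missing_source_files()
    if missing:
        print("Missing source files:")
        for name in missing:
            print(f"  {name}")
    else:
        print("All listed source files are present.")
```

If the files are stored elsewhere or under different names, pass the directory explicitly or edit `EXPECTED_FILES` to match.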

## Download Dates

All external databases were accessed between 2026-04-25 and 2026-04-27. Exact access dates are recorded in `manifests/input_manifest.csv`.

## Reproducibility

The `manifests/input_manifest.csv` file provides exact version identifiers and access dates for each source file. Together with the R and Python scripts in `../scripts/` and the environment documentation in `../ENVIRONMENT.md`, it allows users to reproduce the full analysis pipeline after downloading the source files listed above.
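
To inspect the manifest programmatically, a short sketch along the following lines may be useful. It assumes only that the manifest is a UTF-8 CSV file with a header row. The column names are not documented here, so the code prints each row as-is rather than relying on specific fields. The helper name `load_manifest` and the default path are illustrative assumptions.

```python
import csv
import sys
from pathlib import Path


def load_manifest(path):
    """Read a CSV manifest into a list of dicts keyed by its header row."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Assumption: the manifest path is given as the first argument,
    # otherwise resolved relative to the current working directory.
    manifest_path = Path(sys.argv[1] if len(sys.argv) > 1
                         else "manifests/input_manifest.csv")
    if manifest_path.is_file():
        for row in load_manifest(manifest_path):
            print(row)
    else:
        print(f"Manifest not found at {manifest_path}")
```

Each row is returned as a dictionary, so the recorded version identifiers and access dates can be compared against the files actually downloaded once the real column names are known.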
