Published April 28, 2026 | Version v1.0
Software Open

AlzScPred: Prediction of Alzheimer's Disease from Single‑Cell Transcriptomics Using Deep Learning

  • 1. Indraprastha Institute of Information Technology Delhi

Description

Title:
AlzScPred Dataset – Single‑cell/nucleus transcriptomics profiles for Alzheimer’s disease classification using deep learning

Project: AlzScPred – Prediction of Alzheimer’s Disease from Single‑Cell Transcriptomics Using Deep Learning

Publication: Srivastava, A., Dhall, A., Patiyal, S., Arora, A., Jarwal, A., & Raghava, G.P.S. (2023). Prediction of Alzheimer's Disease from Single Cell Transcriptomics Using Deep Learning. bioRxiv. https://doi.org/10.1101/2023.07.07.548171

Overview:
This repository accompanies the AlzScPred publication and provides curated single‑nucleus RNA‑sequencing (snRNA‑seq) data, feature‑selected gene sets, and trained deep learning models for classifying Alzheimer’s disease (AD) versus normal control (NC) samples. Unlike bulk‑tissue analyses, this work leverages single‑cell resolution to identify cell‑type‑specific transcriptional biomarkers.

Data Curation & Preprocessing:
Raw 10x snRNA‑seq data were obtained from GEO (GSE157827). Preprocessing was performed with scanpy and comprised the following steps:

  • Conversion of sparse Cell Ranger output to feature‑barcode matrices

  • Removal of NaN/zero cells and irrelevant genes

  • Normalization (scanpy.pp.normalize_total)

  • Final selection of 5,401 expressed genes present in both AD and NC samples

  • 80/20 training/validation split (stratified by patient)
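The last two steps above can be sketched in plain Python. This is a minimal illustration and not the authors' pipeline: the record layout (`patient`, `label`, `counts`), the function names, and the toy data are all hypothetical, and "stratified by patient" is read here as an assumption that every cell from one patient lands in the same subset, with roughly 20% of the patients in each class held out.

```python
import random
from collections import defaultdict


def shared_expressed_genes(cells):
    """Return genes with a nonzero count in at least one AD cell and one NC cell.

    `cells` is a list of dicts with hypothetical keys "label" ("AD" or "NC")
    and "counts" (gene name -> count).
    """
    expressed = defaultdict(set)
    for cell in cells:
        for gene, count in cell["counts"].items():
            if count > 0:
                expressed[cell["label"]].add(gene)
    return sorted(expressed["AD"] & expressed["NC"])


def split_by_patient(cells, val_fraction=0.2, seed=0):
    """Split cells so that every patient ends up entirely in one subset.

    Patients are shuffled within each label, and about `val_fraction`
    of each label's patients (at least one) go to the validation set.
    """
    patients_by_label = defaultdict(set)
    for cell in cells:
        patients_by_label[cell["label"]].add(cell["patient"])

    rng = random.Random(seed)
    val_patients = set()
    for label in sorted(patients_by_label):
        patients = sorted(patients_by_label[label])
        rng.shuffle(patients)
        n_val = max(1, round(val_fraction * len(patients)))
        val_patients.update(patients[:n_val])

    train = [c for c in cells if c["patient"] not in val_patients]
    val = [c for c in cells if c["patient"] in val_patients]
    return train, val


# Toy data for illustration only: 5 patients per class, 3 cells per patient.
cells = [
    {"patient": f"{label}{p}", "label": label,
     "counts": {"G1": 1, "G2": int(label == "AD")}}
    for label in ("AD", "NC") for p in range(5) for _ in range(3)
]
train, val = split_by_patient(cells)
genes = shared_expressed_genes(cells)
```

Splitting at the patient level rather than the cell level keeps cells from one donor out of both subsets, which would otherwise inflate validation scores. Note that the 80/20 ratio in this sketch applies to patients, so the ratio of cells can differ when patients contribute unequal numbers of cells.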

Model Architecture:
An artificial neural network (ANN) with one input layer (size = number of selected genes), three hidden layers (dropout 0.3 each), and one output layer (binary classification), implemented in TensorFlow/Keras. Performance on the independent validation set:

  • Top 100 genes: Accuracy 82%, AUROC 0.84

  • Top 35 genes: Accuracy 74%, AUROC 0.75
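For readers who want to recompute these two metrics on their own predictions, the following is a minimal sketch. It assumes labels coded as 1 = AD and 0 = NC and a model that outputs one score per sample; the 0.5 threshold, the function names, and the toy values are assumptions for illustration and are not taken from the study.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    hits = sum((s >= threshold) == bool(t) for t, s in zip(y_true, y_score))
    return hits / len(y_true)


def auroc(y_true, y_score):
    """AUROC as the probability that a random AD sample outscores a random NC sample.

    Ties count as one half (the Mann-Whitney U formulation).
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one sample from each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores for illustration only (not results from the study).
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
acc = accuracy(y_true, y_score)
auc = auroc(y_true, y_score)
```

The pairwise comparison is quadratic in the number of samples, which is fine for small checks; for validation sets with many cells, a rank-based implementation such as `sklearn.metrics.roc_auc_score` is likely the better choice.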

Gene Ontology (GO) Analysis:
The 35‑gene panel is enriched in binding activity (GO:0005488), catalytic activity (GO:0003824), ATP‑dependent activity (GO:0140657), and transporter activity (GO:0005215). Notably, 21 of the 35 genes have prior literature evidence linking them to AD or neurodegeneration (e.g., CHD7, FGF17, FOXN2, CDK18, UBE2Z), while 14 are novel candidates requiring further validation.

Usage:
These datasets and models are designed for:

  • Benchmarking machine/deep learning classifiers on single‑cell AD transcriptomic data

  • Validating the 35‑gene biomarker panel in independent snRNA‑seq cohorts

  • Transfer learning or feature selection in other neurodegenerative disease studies

  • Developing web‑based diagnostic tools for single‑cell RNA‑seq analysis

Related Resources:
The complete code and end‑to‑end Python package will be made available at the project’s GitHub repository (link to be added upon publication). For inquiries about model re‑training or custom feature selection, please contact the corresponding author.

License: CC BY 4.0 (as stated in the bioRxiv preprint)

Contact:
Prof. Gajendra P. S. Raghava

Files

raghavagps/AlzScPred-v1.0.zip

Files (997.6 kB)

Name: raghavagps/AlzScPred-v1.0.zip
Size: 997.6 kB
md5:81427f3075e3916364977192f1536c07
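After downloading, the archive can be checked against the MD5 checksum listed above. The sketch below uses only Python's standard `hashlib` module; the helper names and the local file name passed to them are assumptions.

```python
import hashlib

# Checksum listed for the archive on this record.
EXPECTED_MD5 = "81427f3075e3916364977192f1536c07"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download("AlzScPred-v1.0.zip")` should return `True` for an intact download, assuming the file was saved under that name in the working directory. An MD5 match guards against corrupted or truncated transfers; it is not a security guarantee.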
