AlzScPred: Prediction of Alzheimer's Disease from Single‑Cell Transcriptomics Using Deep Learning
Authors/Creators
Description
Title:
AlzScPred Dataset – Single‑cell/nucleus transcriptomics profiles for Alzheimer’s disease classification using deep learning
Description:
Project: AlzScPred – Prediction of Alzheimer’s Disease from Single‑Cell Transcriptomics Using Deep Learning
Publication: Srivastava, A., Dhall, A., Patiyal, S., Arora, A., Jarwal, A., & Raghava, G.P.S. (2023). Prediction of Alzheimer's Disease from Single Cell Transcriptomics Using Deep Learning. bioRxiv. https://doi.org/10.1101/2023.07.07.548171
Overview:
This repository accompanies the AlzScPred publication and provides curated single‑nucleus RNA‑sequencing (snRNA‑seq) data, feature‑selected gene sets, and trained deep learning models for classifying Alzheimer’s disease (AD) versus normal control (NC) samples. Unlike bulk‑tissue analyses, this work leverages single‑cell resolution to identify cell‑type‑specific transcriptional biomarkers.
Data Curation & Preprocessing:
Raw 10x snRNA‑seq data were obtained from GEO (GSE157827). Preprocessing was performed using scanpy:
- Conversion of sparse Cell Ranger output to feature‑barcode matrices
- Removal of NaN/zero cells and irrelevant genes
- Normalization (scanpy.pp.normalize_total)
- Final selection of 5,401 expressed genes present in both AD and NC samples
- 80/20 training/validation split (stratified by patient)
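The normalization and splitting steps above can be illustrated with a minimal, library‑free sketch. It is not the authors' pipeline, which used scanpy. The target sum of 10,000, the function names, and the toy patient IDs are assumptions made for illustration, and "stratified by patient" is interpreted here as keeping all cells from one patient in the same split while holding out roughly 20% of patients per class.

```python
import random
from collections import defaultdict


def normalize_total(counts, target_sum=10_000.0):
    """Scale one cell's counts so they sum to target_sum.

    Cells with a zero total are returned unchanged (assumed behaviour for this sketch).
    """
    total = sum(counts)
    if total == 0:
        return list(counts)
    return [c * target_sum / total for c in counts]


def split_by_patient(cell_patients, cell_labels, val_fraction=0.2, seed=0):
    """Assign whole patients to training or validation, separately per class.

    cell_patients: patient ID for each cell.
    cell_labels:   class label ("AD" or "NC") for each cell.
    Returns (train_idx, val_idx) as lists of cell indices.
    """
    patients_by_label = defaultdict(set)
    for patient, label in zip(cell_patients, cell_labels):
        patients_by_label[label].add(patient)

    rng = random.Random(seed)
    val_patients = set()
    for label in sorted(patients_by_label):
        patients = sorted(patients_by_label[label])
        rng.shuffle(patients)
        # Hold out at least one patient per class.
        n_val = max(1, round(val_fraction * len(patients)))
        val_patients.update(patients[:n_val])

    train_idx = [i for i, p in enumerate(cell_patients) if p not in val_patients]
    val_idx = [i for i, p in enumerate(cell_patients) if p in val_patients]
    return train_idx, val_idx


# Toy data: 10 hypothetical patients (5 AD, 5 NC) with 3 cells each.
cell_patients = [f"P{p}" for p in range(10) for _ in range(3)]
cell_labels = ["AD" if int(p[1:]) < 5 else "NC" for p in cell_patients]
train_idx, val_idx = split_by_patient(cell_patients, cell_labels)
```

Splitting at the patient level rather than the cell level avoids cells from the same individual appearing in both training and validation data, which would otherwise inflate validation performance.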
Model Architecture:
The classifier is an artificial neural network (ANN) with one input layer (size equal to the number of selected genes), three hidden layers (each with dropout of 0.3), and one output layer for binary classification, implemented in TensorFlow/Keras. Performance on the independent validation set:
- Top 100 genes: Accuracy 82%, AUROC 0.84
- Top 35 genes: Accuracy 74%, AUROC 0.75
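For readers benchmarking their own models against these numbers, the two reported metrics can be computed from predicted probabilities roughly as follows. This is a plain‑Python sketch: the 0.5 decision threshold and the toy labels and scores are assumptions for illustration, not values from the study.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the true label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def auroc(y_true, y_prob):
    """Probability that a random positive scores above a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = AD, 0 = NC; scores are made-up model outputs.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
acc = accuracy(y_true, y_prob)
auc = auroc(y_true, y_prob)
```

The pairwise AUROC shown here is quadratic in the number of samples; for large cell counts a rank‑based implementation from an established library is the more practical choice.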
Gene Ontology (GO) Analysis:
The 35‑gene panel is enriched in binding activity (GO:0005488), catalytic activity (GO:0003824), ATP‑dependent activity (GO:0140657), and transporter activity (GO:0005215). Notably, 21 of the 35 genes have prior literature evidence linking them to AD or neurodegeneration (e.g., CHD7, FGF17, FOXN2, CDK18, UBE2Z), while 14 are novel candidates requiring further validation.
Usage:
These datasets and models are designed for:
- Benchmarking machine/deep learning classifiers on single‑cell AD transcriptomic data
- Validating the 35‑gene biomarker panel in independent snRNA‑seq cohorts
- Transfer learning or feature selection in other neurodegenerative disease studies
- Developing web‑based diagnostic tools for single‑cell RNA‑seq analysis
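When validating a gene panel in an independent cohort, a common first step is restricting the new expression matrix to the panel genes and checking which ones are absent. The sketch below shows one way to do that in plain Python; the helper name `subset_to_panel`, the toy matrix, and the extra `APOE` column are hypothetical, and only the five example genes named in this record are used rather than the full 35‑gene panel.

```python
def subset_to_panel(expression, gene_names, panel):
    """Restrict a cells-by-genes matrix to the panel genes.

    Returns (subset, present, missing):
      subset  - rows of `expression` with columns in panel order, present genes only
      present - panel genes found in `gene_names`
      missing - panel genes absent from `gene_names`
    """
    index = {gene: j for j, gene in enumerate(gene_names)}
    present = [g for g in panel if g in index]
    missing = [g for g in panel if g not in index]
    cols = [index[g] for g in present]
    subset = [[row[j] for j in cols] for row in expression]
    return subset, present, missing


# Example genes named in this record (not the full 35-gene panel).
panel = ["CHD7", "FGF17", "FOXN2", "CDK18", "UBE2Z"]

# Hypothetical independent cohort: 2 cells x 4 genes.
gene_names = ["APOE", "CHD7", "FGF17", "CDK18"]
expression = [
    [1.0, 2.0, 0.0, 3.0],
    [0.5, 0.0, 4.0, 1.0],
]

subset, present, missing = subset_to_panel(expression, gene_names, panel)
```

Reporting the missing genes explicitly matters because a model trained on a fixed panel expects every input feature; how to handle absent genes (for example, zero‑filling or excluding the cohort) is a decision left to the analyst.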
Related Resources:
The complete code and end‑to‑end Python package will be made available at the project’s GitHub repository (link to be added upon publication). For inquiries about model re‑training or custom feature selection, please contact the corresponding author.
License: CC BY 4.0 (as stated in the bioRxiv preprint)
Contact:
Prof. Gajendra P. S. Raghava
Files
| Name | Size | Checksum |
|---|---|---|
| raghavagps/AlzScPred-v1.0.zip | 997.6 kB | md5:81427f3075e3916364977192f1536c07 |
Additional details
Related works
- Is supplement to: Software: https://github.com/raghavagps/AlzScPred/tree/v1.0 (URL)
Software
- Repository URL: https://github.com/raghavagps/AlzScPred