Systematically Fused Multimodal Dataset and Explainable ML Results for Alzheimer's Risk Prediction

YOGESH DHANGAR

doi:10.5281/zenodo.19666728

Published April 20, 2026 | Version 1.0

Resource type: Dataset | Access: Open

This repository contains the datasets, cross-validation results, and feature-importance analyses associated with our research on predicting Alzheimer's risk.

*DISCLAIMER: Please note that the primary dataset provided here does not contain real, continuous patient records. It is a synthetically fused dataset, created by systematically and mathematically combining metabolic indicators (from a diabetes dataset) with cognitive biometric data (from the DARWIN handwriting dataset), and it is intended strictly for research, experimental validation, and academic purposes. It is not intended for direct clinical application.*

The repository includes:
1. Advanced Fused Dataset (Primary, advanced_fused_dataset.csv): The systematically generated dataset combining diabetes health indicators with cognitive handwriting features; used to train the predictive models.
2. DARWIN Raw Dataset (DARWIN.csv): The original public handwriting dataset used as the foundation for cognitive feature extraction.
3. Model Cross-Validation Results (model_cv_results.csv): A CSV file detailing the 5-fold cross-validation performance metrics (Accuracy, F1-Score, RMSE, etc.) for our XGBoost and Linear Regression models.
4. Feature Importance Analysis (feature_importance.csv): SHAP values and feature importance scores explaining how specific metabolic and cognitive features contribute to the models' predictions.
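
Item 3 refers to 5-fold cross-validation. The following is a minimal, dependency-free sketch of that protocol and of three of the metrics named (accuracy, F1-score, RMSE); it is not the authors' pipeline. The shuffled, unstratified fold assignment, the 0.5 decision threshold, and computing RMSE on raw model scores against 0/1 labels are all assumptions. The `fit` argument is a hypothetical interface standing in for any training routine (for example, a wrapper around XGBoost or linear regression) that returns a function mapping one feature row to a score.

```python
import math
import random
from statistics import mean, stdev


def k_fold_indices(n, k=5, seed=0):
    """Split indices 0..n-1 into k shuffled, roughly equal folds (unstratified)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    # Convention: F1 is 0.0 when a fold has no positive labels and no positive predictions.
    return 2 * tp / denom if denom else 0.0


def rmse(y_true, y_score):
    return math.sqrt(sum((t - s) ** 2 for t, s in zip(y_true, y_score)) / len(y_true))


def cross_validate(X, y, fit, k=5, threshold=0.5, seed=0):
    """Train on k-1 folds, score the held-out fold, repeat k times.

    Returns {metric: (mean, standard deviation)} across the k folds.
    `fit(X_train, y_train)` (hypothetical interface) must return predict(row) -> score.
    """
    folds = k_fold_indices(len(y), k, seed)
    scores = {"accuracy": [], "f1": [], "rmse": []}
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        predict = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        y_test = [y[j] for j in test_idx]
        y_score = [predict(X[j]) for j in test_idx]
        y_pred = [1 if s >= threshold else 0 for s in y_score]
        scores["accuracy"].append(accuracy(y_test, y_pred))
        scores["f1"].append(f1_score(y_test, y_pred))
        scores["rmse"].append(rmse(y_test, y_score))
    return {m: (mean(v), stdev(v)) for m, v in scores.items()}
```

Summarizing each metric as a mean and standard deviation across folds is a common reporting choice; the record does not state whether model_cv_results.csv uses that layout or lists per-fold values.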

This data is provided to ensure full reproducibility of the results presented in our IEEE manuscript and to aid future research in explainable AI for clinical decision support systems.
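
Item 4 in the list above mentions SHAP values and feature importance scores. A common way to reduce per-sample SHAP values to one global score per feature is the mean absolute SHAP value, sketched below in plain Python. The record does not say how feature_importance.csv was produced (XGBoost's built-in importance types, such as gain, are another possibility), and the feature names in the usage example are hypothetical.

```python
def mean_abs_shap(shap_values, feature_names):
    """Global importance per feature: mean |SHAP| over all samples, sorted descending.

    shap_values: list of rows, one per sample, each with one SHAP value per feature.
    """
    n = len(shap_values)
    if n == 0:
        raise ValueError("need at least one sample")
    totals = [0.0] * len(feature_names)
    for row in shap_values:
        if len(row) != len(feature_names):
            raise ValueError("row length does not match number of features")
        for j, v in enumerate(row):
            totals[j] += abs(v)
    importance = {name: total / n for name, total in zip(feature_names, totals)}
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical features and SHAP values, for illustration only.
ranking = mean_abs_shap(
    [[0.5, -0.1, 0.2], [-0.3, 0.2, -0.4]],
    ["glucose", "bmi", "air_time"],
)
```

Taking the absolute value before averaging keeps features whose contributions push predictions in opposite directions for different samples from cancelling out to near zero.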

Files (1.1 MB)

Name	Size	MD5 checksum
advanced_fused_dataset.csv	308.6 kB	928c7b6a01af3326905a5b4d1d7342de
DARWIN.csv	740.5 kB	202937dece4d933216e9161ffe56097d
feature_importance.csv	589 B	af06ea62e683aba9cc0caf94ac737307
model_cv_results.csv	1.3 kB	58b2190763ea69554cd6a7b2f577cf01
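
The file table lists an MD5 checksum for each file. A short script like the following can confirm that downloaded copies match; it is a sketch that assumes the four files were saved under their original names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in the file table of this record.
EXPECTED_MD5 = {
    "advanced_fused_dataset.csv": "928c7b6a01af3326905a5b4d1d7342de",
    "DARWIN.csv": "202937dece4d933216e9161ffe56097d",
    "feature_importance.csv": "af06ea62e683aba9cc0caf94ac737307",
    "model_cv_results.csv": "58b2190763ea69554cd6a7b2f577cf01",
}


def md5_of_file(path, chunk_size=1 << 16):
    """Hash the file in chunks so large files need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_downloads(directory="."):
    """Return {filename: True/False/None}; None means the file is missing."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of_file(path) == expected if path.exists() else None
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        status = "missing" if ok is None else ("OK" if ok else "MISMATCH")
        print(f"{name}: {status}")
```

MD5 is adequate for catching accidental corruption during download, but it is not a defense against deliberate tampering.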

Additional details

Is derived from: Dataset: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database; Dataset: https://www.kaggle.com/datasets/ninadaithal/imagesoasis
