Systematically Fused Multimodal Dataset and Explainable ML Results for Alzheimer's Risk Prediction
Description
This repository contains the datasets and cross-validation results associated with our research on predicting Alzheimer's risk.
*DISCLAIMER: The primary dataset provided here does not contain real, continuous patient records. It is a synthetically fused dataset, created by mathematically combining metabolic indicators (from a diabetes dataset) with cognitive biometric data (from the DARWIN handwriting dataset), and is intended strictly for research, experimental validation, and academic purposes. It is not intended for direct clinical application.*
The repository includes:
1. Advanced Fused Dataset (Primary): The systematically generated dataset combining diabetes health indicators with cognitive handwriting features, used for training the predictive models.
2. DARWIN Raw Dataset: The original public handwriting dataset used as a foundation for cognitive feature extraction.
3. Model Cross-Validation Results: Excel files detailing the 5-fold cross-validation performance metrics (Accuracy, F1-Score, RMSE, etc.) for our XGBoost and Linear Regression models.
4. Feature Importance Analysis: Files containing SHAP values and feature importance scores that explain the contribution of specific metabolic and cognitive features to the model's predictions.
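For readers unfamiliar with the SHAP values mentioned in item 4: SHAP attributions are based on Shapley values, which average each feature's marginal contribution to a prediction over all orderings in which features can be added. The sketch below computes them exactly for a hypothetical three-feature toy model. The model, its coefficients, and the feature names (glucose, bmi, pen_pressure) are illustrative assumptions; they are not taken from the repository files or from the authors' pipeline.

```python
from itertools import permutations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    A feature that is "absent" from a coalition takes its baseline value.
    The cost grows factorially with the number of features, so this is
    only practical for toy cases.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous_output = predict(current)
        for feature in order:
            current[feature] = x[feature]  # add this feature to the coalition
            output = predict(current)
            phi[feature] += output - previous_output  # marginal contribution
            previous_output = output
    return [value / factorial(n) for value in phi]


# Hypothetical toy model with one interaction term (not the authors' model).
def toy_model(features):
    glucose, bmi, pen_pressure = features
    return 0.5 * glucose + 0.2 * bmi + 0.1 * glucose * pen_pressure
```

Calling `exact_shapley(toy_model, x, baseline)` returns one attribution per feature, and the attributions sum to `toy_model(x) - toy_model(baseline)`. Library implementations avoid the factorial cost with model-specific or sampling-based algorithms.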
This data is provided to ensure full reproducibility of the results presented in our IEEE manuscript and to aid future research in explainable AI for clinical decision support systems.
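As an aid to anyone re-running the evaluation, the 5-fold cross-validation protocol and three of the metrics named in item 3 can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the authors' code: the fold assignment, the seed, and the majority-class placeholder model are all hypothetical, and the placeholder stands in for the XGBoost and Linear Regression models, which are not reproduced here.

```python
import math
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle indices once, then yield (train, test) index lists per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def rmse(y_true, y_pred):
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Return one metrics dict per fold for any fit/predict pair."""
    results = []
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        y_pred = predict(model, [X[i] for i in test])
        y_true = [y[i] for i in test]
        results.append({
            "accuracy": accuracy(y_true, y_pred),
            "f1": f1_score(y_true, y_pred),
            "rmse": rmse(y_true, y_pred),
        })
    return results


# Placeholder model (assumption): always predict the most common training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, X_test):
    return [model] * len(X_test)
```

Any model can be plugged in by passing its own `fit` and `predict` callables to `cross_validate`. Note that RMSE computed on 0/1 class labels, as here, reduces to the square root of the error rate; it is more informative when the predictions are continuous scores.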
Files
advanced_fused_dataset.csv
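A minimal way to inspect the file with the Python standard library is sketched below. It assumes the CSV has a single header row and purely numeric cells; the column names are not documented in this record, so the helper (a hypothetical name, `load_numeric_csv`) reads them from the file instead of hard-coding them.

```python
import csv
from pathlib import Path


def load_numeric_csv(path):
    """Read a CSV with a header row; return (column_names, rows_as_floats).

    Rows with a missing, extra, or non-numeric cell are skipped, not imputed.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = []
        for record in reader:
            if len(record) != len(header):
                continue  # skip malformed rows
            try:
                rows.append([float(cell) for cell in record])
            except ValueError:
                continue  # skip rows that are not fully numeric
    return header, rows
```

For example, `columns, data = load_numeric_csv("advanced_fused_dataset.csv")` gives the header names in `columns` and one list of floats per retained row in `data`. If the file turns out to contain categorical columns, the numeric-only assumption has to be relaxed.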
Additional details
Related works
- Is derived from
  - Dataset: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database (URL)
  - Dataset: https://www.kaggle.com/datasets/ninadaithal/imagesoasis (URL)