Comprehensive methodology for sample enrichment in EEG biomarker studies for Alzheimer's risk classification
Description
This repository contains the derived EEG feature matrices used as direct input to the machine learning models reported in the article:
“Comprehensive methodology for sample enrichment in EEG biomarker studies for Alzheimer’s risk classification”.
The files are provided in Apache Feather format and correspond to the final, cleaned, and harmonized feature sets obtained after EEG preprocessing, feature extraction, and correlation-based feature reduction. These feature matrices were used as input to train and evaluate Random Forest classification models under different class imbalance ratios.
Each Feather file contains:
- One row per subject
- Columns corresponding to EEG-derived features, demographic covariates (e.g., age, sex), and the target group label
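Because each file is a subject-level table, it can be read directly with pandas and split into EEG features, covariates and labels. The sketch below shows one way to do this. The column names `group`, `age` and `sex` and the toy feature names are hypothetical placeholders, so check the actual headers in each file. `pandas.read_feather` also requires the `pyarrow` package.

```python
import pandas as pd

# Hypothetical column names: replace them with the headers found in the real files.
LABEL_COL = "group"
COVARIATE_COLS = ("age", "sex")


def load_matrix(path: str) -> pd.DataFrame:
    """Read one Feather feature matrix (one row per subject). Requires pyarrow."""
    return pd.read_feather(path)


def split_columns(df: pd.DataFrame,
                  label_col: str = LABEL_COL,
                  covariate_cols=COVARIATE_COLS):
    """Split a subject-level table into EEG features, covariates and labels."""
    # Only keep covariates that actually exist in this file.
    covariates = [c for c in covariate_cols if c in df.columns]
    eeg_features = df.drop(columns=[label_col, *covariates])
    return eeg_features, df[covariates], df[label_col]


if __name__ == "__main__":
    # Toy stand-in for a real file: two subjects, two hypothetical EEG features.
    toy = pd.DataFrame({
        "power_alpha_O1": [0.31, 0.27],
        "power_theta_Fz": [0.12, 0.18],
        "age": [54, 61],
        "sex": ["F", "M"],
        "group": ["G1", "G2"],
    })
    X, cov, y = split_columns(toy)
    print(list(X.columns))  # ['power_alpha_O1', 'power_theta_Fz']
    print(y.tolist())       # ['G1', 'G2']
```

For a real file, call `split_columns(load_matrix("<file>.feather"))` with the path of a downloaded matrix.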
The machine learning pipeline includes:
- Removal of highly correlated features
- Stratified train–test splitting
- Hyperparameter optimization using randomized grid search
- Cross-validation–based performance evaluation
- Feature importance ranking and incremental feature selection
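The pipeline steps above can be sketched with scikit-learn as follows. This is a generic illustration, not the authors' implementation, which lives in the project repository. The correlation threshold (0.9), the 80/20 split, the search space, the balanced-accuracy metric and the use of impurity-based `feature_importances_` for ranking are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import (RandomizedSearchCV, StratifiedKFold,
                                     cross_val_score, train_test_split)

SCORING = "balanced_accuracy"  # assumption: an imbalance-aware metric


def drop_correlated(X: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop the later feature of every pair with |Pearson r| > threshold."""
    corr = X.corr().abs()
    # Upper triangle only (k=1 excludes the diagonal), so each pair is seen once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return X.drop(columns=to_drop)


def fit_random_forest(X: pd.DataFrame, y: pd.Series, seed: int = 0) -> dict:
    """Stratified split, randomized hyperparameter search, CV and test scores."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=seed),
        param_distributions={  # hypothetical search space
            "n_estimators": [50, 100],
            "max_depth": [None, 5, 10],
            "min_samples_leaf": [1, 3],
        },
        n_iter=4, cv=cv, scoring=SCORING, random_state=seed)
    search.fit(X_tr, y_tr)
    model = search.best_estimator_
    return {
        "model": model,
        "X_train": X_tr,
        "y_train": y_tr,
        # Cross-validated score of the tuned configuration on the training split.
        "cv_score": cross_val_score(clone(model), X_tr, y_tr,
                                    cv=cv, scoring=SCORING).mean(),
        # Held-out score on the stratified test split.
        "test_score": balanced_accuracy_score(y_te, model.predict(X_te)),
    }


def incremental_selection(model, X: pd.DataFrame, y: pd.Series, cv=5):
    """Rank features by importance, then CV-score the top-k feature subsets."""
    ranking = pd.Series(model.feature_importances_, index=X.columns)
    ranking = ranking.sort_values(ascending=False)
    scores = {}
    for k in range(1, len(ranking) + 1):
        top_k = list(ranking.index[:k])
        scores[k] = cross_val_score(clone(model), X[top_k], y,
                                    cv=cv, scoring=SCORING).mean()
    return ranking, scores


if __name__ == "__main__":
    from sklearn.datasets import make_classification

    # Synthetic, imbalanced stand-in for a real feature matrix.
    Xa, ya = make_classification(n_samples=150, n_features=8, n_informative=4,
                                 weights=[0.7, 0.3], random_state=0)
    X = drop_correlated(pd.DataFrame(Xa, columns=[f"f{i}" for i in range(8)]))
    res = fit_random_forest(X, pd.Series(ya))
    ranking, scores = incremental_selection(res["model"], res["X_train"],
                                            res["y_train"])
    print(f"CV {res['cv_score']:.2f}, test {res['test_score']:.2f}")
    print(ranking.head(3))
```

Ranking and incremental selection are run on the training split only, so the held-out test set does not influence which features are kept.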
The full analysis and training code used to generate the results is publicly available in the GitHub repository listed under Software below.
Raw EEG data are publicly available in their original repositories and are referenced in the associated manuscript.
The datasets UdeA1 and UdeA2 are fully publicly available at: https://openneuro.org/datasets/ds007427
This Zenodo deposit provides the minimal anonymized dataset required to reproduce the reported analyses, in compliance with PLOS ONE data availability requirements.
Files (5.1 MB)

| MD5 checksum | Size |
|---|---|
| 2f938668e082651187c21a4b5f46a84a | 1.5 MB |
| 4f740e586351168b3cc8cda5fc43c4f7 | 1.9 MB |
| 872a4f2015e8a1a2c283e21c95f54a80 | 1.6 MB |
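After downloading, the files can be checked against the MD5 checksums listed in the table. The sketch below uses only the Python standard library. It assumes the downloaded files keep a `.feather` extension, and since the table does not preserve file names, it only reports whether each file's digest matches one of the listed checksums.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in the file table of this record.
EXPECTED_MD5 = {
    "2f938668e082651187c21a4b5f46a84a",
    "4f740e586351168b3cc8cda5fc43c4f7",
    "872a4f2015e8a1a2c283e21c95f54a80",
}


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(folder: str) -> dict:
    """Map each .feather file in folder to True if its MD5 is a listed checksum."""
    return {p.name: md5sum(p) in EXPECTED_MD5
            for p in sorted(Path(folder).glob("*.feather"))}
```

A result such as `verify_downloads("downloads")` with every value `True` indicates that all three files arrived intact.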
Additional details
Related works
- Is supplemented by: Journal article, DOI 10.1371/journal.pone.0343722
Dates
- Created: 2024
Software
- Repository URL: https://github.com/GRUNECO/Data_analysis_ML_Harmonization_Proyect
- Programming language: Python
- Development Status: Active
References
- Henao Isaza, V., Aguillon, D., Tobón Quintero, C. A., Lopera, F., & Ochoa Gómez, J. F. (2026). Comprehensive methodology for sample enrichment in EEG biomarker studies for Alzheimer's risk classification. PLOS ONE. https://doi.org/10.1371/journal.pone.0343722