Published February 18, 2026 | Version v1

Comprehensive methodology for sample enrichment in EEG biomarker studies for Alzheimer's risk classification

  • 1. Universidad de Antioquia

Contributors

  • 1. Universidad de Antioquia

Description

This repository contains the derived EEG feature matrices used as direct input to the machine learning models reported in the article:

“Comprehensive methodology for sample enrichment in EEG biomarker studies for Alzheimer’s risk classification”.

The files are provided in Apache Feather format and correspond to the final, cleaned, and harmonized feature sets obtained after EEG preprocessing, feature extraction, and correlation-based feature reduction. These feature matrices were used as input to train and evaluate Random Forest classification models under different class imbalance ratios.

Each Feather file contains:

  • One row per subject

  • Columns corresponding to EEG-derived features, demographic covariates (e.g., age, sex), and the target group label
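As a minimal sketch of how one of these files might be read, the snippet below loads a Feather file with pandas and separates the label, the demographic covariates, and the EEG features. The column names `group`, `age`, and `sex` and the file name in the usage comment are assumptions for illustration; check the actual column names in the deposited files before relying on them.

```python
from pathlib import Path

import pandas as pd


def load_feature_matrix(path):
    """Read one Feather file into a DataFrame (pandas needs pyarrow for this)."""
    return pd.read_feather(Path(path))


def split_features_and_label(df, label_col="group", covariates=("age", "sex")):
    """Separate EEG features, demographic covariates, and the target label.

    The default column names are assumptions, not names confirmed by the deposit.
    """
    y = df[label_col]
    present = [c for c in covariates if c in df.columns]
    demo = df[present]
    eeg = df.drop(columns=[label_col, *present])
    return eeg, demo, y


# Usage (hypothetical file name):
# df = load_feature_matrix("features.feather")
# eeg, demo, y = split_features_and_label(df)
# print(df.shape)          # one row per subject
# print(y.value_counts())  # class balance of the target label
```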

The machine learning pipeline includes:

  • Removal of highly correlated features

  • Stratified train–test splitting

  • Hyperparameter optimization using randomized search over a hyperparameter grid

  • Cross-validation–based performance evaluation

  • Feature importance ranking and incremental feature selection
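The steps listed above can be sketched with scikit-learn as follows. This is an illustrative outline, not the authors' code: the correlation threshold (0.9), test fraction (0.2), hyperparameter grid, number of folds, and scoring metric are all assumptions, and the function names are hypothetical. The repository linked under Software holds the actual implementation.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, cross_val_score, train_test_split

# Assumed search space; the values used in the article may differ.
DEFAULT_GRID = {
    "n_estimators": [100, 200, 400],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}


def drop_correlated(X, threshold=0.9):
    """Drop every numeric column whose |r| with an earlier column exceeds threshold."""
    corr = X.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop)


def fit_and_rank(X, y, grid=None, n_iter=5, cv=5, seed=0):
    """Correlation filter, stratified split, randomized search, CV, importance ranking."""
    X = drop_correlated(X)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=seed),
        param_distributions=grid or DEFAULT_GRID,
        n_iter=n_iter,
        cv=cv,
        scoring="balanced_accuracy",
        random_state=seed,
    )
    search.fit(X_train, y_train)
    best = search.best_estimator_
    cv_scores = cross_val_score(best, X_train, y_train, cv=cv, scoring="balanced_accuracy")
    ranking = pd.Series(best.feature_importances_, index=X.columns).sort_values(ascending=False)
    test_accuracy = best.score(X_test, y_test)  # plain accuracy on the held-out split
    return best, cv_scores, ranking, test_accuracy


def incremental_selection(model, X, y, ranking, cv=5):
    """Mean CV score when using only the top-k ranked features, for k = 1..n."""
    scores = {}
    for k in range(1, len(ranking) + 1):
        cols = list(ranking.index[:k])
        scores[k] = cross_val_score(model, X[cols], y, cv=cv, scoring="balanced_accuracy").mean()
    return pd.Series(scores)
```

Balanced accuracy is used here only because the models are evaluated under class imbalance; any other scikit-learn scorer can be substituted through the `scoring` argument.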

The full analysis and training code used to generate the results is publicly available at: https://github.com/GRUNECO/Data_analysis_ML_Harmonization_Proyect

Raw EEG data are publicly available in their original repositories and are referenced in the associated manuscript.

The datasets UdeA1 and UdeA2 are fully publicly available at: https://openneuro.org/datasets/ds007427

This Zenodo deposit provides the minimal anonymized dataset required to reproduce the reported analyses, in compliance with PLOS ONE data availability requirements.

Files

Files (5.1 MB)

Checksum Size
md5:2f938668e082651187c21a4b5f46a84a 1.5 MB
md5:4f740e586351168b3cc8cda5fc43c4f7 1.9 MB
md5:872a4f2015e8a1a2c283e21c95f54a80 1.6 MB
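
The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The following is a small sketch using only the Python standard library; the helper names and the file name in the usage comment are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.removeprefix("md5:").lower()  # requires Python 3.9+
    return md5_of_file(path) == expected


# Usage (hypothetical local file name):
# ok = matches_listed_checksum("features.feather", "md5:2f938668e082651187c21a4b5f46a84a")
```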

Additional details

Related works

Is supplemented by
Journal article: 10.1371/journal.pone.0343722 (DOI)

Dates

Created
2024

Software

Repository URL
https://github.com/GRUNECO/Data_analysis_ML_Harmonization_Proyect
Programming language
Python
Development Status
Active

References

  • Henao Isaza, V., Aguillon, D., Tobón Quintero, C. A., Lopera, F., & Ochoa Gómez, J. F. (2026). Comprehensive methodology for sample enrichment in EEG biomarker studies for Alzheimer's risk classification. PLOS ONE. https://doi.org/10.1371/journal.pone.0343722