Published August 30, 2025 | Version v2
Journal article Open

Integrated Gene Expression Resource for Multi-Subtype Acute Myeloid Leukemia: A Global Reference Dataset

  • 1. Independent Researcher

Description

Acute Myeloid Leukemia (AML) is a highly heterogeneous hematologic malignancy with multiple clinically relevant subtypes. Although many gene expression datasets are publicly available, no comprehensive, balanced, multi-cohort resource integrates all major AML subtypes. Here, we present an integrated gene expression resource for seven key AML subtypes, curated from eight publicly available GEO datasets (GSE13159, GSE6891, GSE14468, GSE15434, GSE10358, GSE61804, GSE71014, GSE12417). The dataset comprises 338 samples with standardized metadata, consistent labeling, and batch-corrected expression matrices. Preprocessing and label harmonization make samples comparable across the source cohorts, and balanced sampling is used to give the subtypes equal representation. The resource is intended for machine learning, biomarker discovery, and cross-study validation, and is offered as a global reference dataset for AML research.
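
As an illustration of the balanced sampling mentioned above, the following is a minimal sketch that downsamples every subtype to the size of the smallest one. It is not the authors' procedure: the function name `balance_by_subtype`, the `(sample_id, subtype)` input layout, the fixed seed, and the toy labels are all assumptions made for this example.

```python
import random
from collections import defaultdict


def balance_by_subtype(samples, seed=0):
    """Downsample each subtype to the size of the smallest subtype.

    `samples` is a list of (sample_id, subtype) pairs. This layout is a
    hypothetical stand-in, not the published metadata format.
    """
    by_subtype = defaultdict(list)
    for sample_id, subtype in samples:
        by_subtype[subtype].append(sample_id)

    # Every subtype keeps as many samples as the rarest subtype has.
    target = min(len(ids) for ids in by_subtype.values())
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    balanced = []
    for subtype in sorted(by_subtype):
        # Sample without replacement within each subtype.
        for sample_id in rng.sample(by_subtype[subtype], target):
            balanced.append((sample_id, subtype))
    return balanced


# Toy metadata with made-up IDs and placeholder subtype labels.
toy = [(f"S{i}", "subtype_A") for i in range(5)]
toy += [(f"S{i}", "subtype_B") for i in range(5, 8)]
balanced = balance_by_subtype(toy)
```

Downsampling discards samples from the larger subtypes; oversampling or class weighting are alternatives when the smallest subtype is very small.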

Files

Integrated Gene Expression Resource for Multi-Subtype Acute Myeloid Leukemia A Global Reference Dataset.pdf
