Published May 27, 2025 | Version v1.0
Publication Open

CausalBiome: Invariance‐Driven Discovery of Microbial Biomarkers

Description

Identifying causal microbial biomarkers in high-dimensional observational data is challenging because of confounding and spurious correlations. We propose CausalBiome, a scalable framework that combines three invariance-based metrics into a unified importance score: gradient-stability (consistency of per-feature loss gradients across random partitions), permutation-magnitude (mean drop in area under the ROC curve (AUC) upon feature shuffling), and permutation-stability (variance of that drop). CausalBiome requires only a single ensemble model and simple variance calculations, avoiding costly graph estimation or bi-level optimization. On four merged Type 2 Diabetes microbiome cohorts (n = 746, p = 1,991), after filtering to 52 prevalent taxa, CausalBiome achieved the highest Spearman correlation (ρ = 0.91) with held-out AUC and the lowest mean squared error (MSE = 0.22) compared with permutation importance, LIME, Gini importance, and univariate rankings. Top candidates such as Collinsella aerofaciens, Faecalibacterium prausnitzii, and Blautia wexlerae align with known mechanistic roles in glucose metabolism and inflammation. CausalBiome thus offers a practical, interpretable tool for robust causal feature discovery in microbiome and other biomedical studies.
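As an illustration only, the two permutation-based metrics described above can be sketched in a few lines of NumPy. This is not the authors' implementation: the function names `auc_score` and `permutation_metrics`, the number of repeats, the synthetic data, and the stand-in `predict` function are all assumptions made for the example. The gradient-stability metric and the rule for combining the three metrics into one score are omitted because the description does not specify them.

```python
import numpy as np

def auc_score(y_true, scores):
    """Rank-based AUC (Mann-Whitney U); tied scores get average ranks."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores), dtype=float)
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        # Sorted positions i..j hold 1-based ranks i+1..j+1; assign their mean.
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    n_pos = int((y_true == 1).sum())
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def permutation_metrics(predict, X, y, n_repeats=20, seed=0):
    """Return (mean AUC drop, variance of AUC drop) per feature.

    `predict` is any callable mapping a feature matrix to scores; here it
    stands in for the fitted ensemble model (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    base = auc_score(y, predict(X))
    drops = np.empty((n_repeats, X.shape[1]))
    for r in range(n_repeats):
        for j in range(X.shape[1]):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # shuffle one feature
            drops[r, j] = base - auc_score(y, predict(X_perm))
    return drops.mean(axis=0), drops.var(axis=0)

# Synthetic demo: only feature 0 carries signal (hypothetical data).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
predict = lambda features: features[:, 0]  # toy scorer, not a real ensemble

magnitude, stability = permutation_metrics(predict, X, y)
print("permutation-magnitude:", magnitude)
print("permutation-stability:", stability)
```

In this toy setting the informative feature shows a large mean AUC drop, while the two ignored features show a drop of exactly zero with zero variance, since the scorer never reads them.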

Files

SCS.pdf (345.4 kB)
md5:b00630f6b15eba1bf742f304df79c486