CausalBiome: Invariance‐Driven Discovery of Microbial Biomarkers
Description
Identifying causal microbial biomarkers in high-dimensional observational data is challenging because of confounding and spurious correlations. We propose CausalBiome, a scalable framework that combines three invariance-based metrics into a unified importance score: gradient-stability (consistency of per-feature loss gradients across random partitions), permutation-magnitude (mean drop in area under the ROC curve (AUC) upon feature shuffling), and permutation-stability (variance of that drop). CausalBiome requires only a single ensemble model and simple variance calculations, avoiding costly graph estimation or bi-level optimization. On four merged Type-2 Diabetes microbiome cohorts (n = 746, p = 1,991), after filtering to 52 prevalent taxa, CausalBiome achieved the highest Spearman correlation (ρ = 0.91) with held-out AUC and the lowest MSE (0.22) compared with permutation importance, LIME, Gini, and univariate rankings. Top candidates such as Collinsella aerofaciens, Faecalibacterium prausnitzii, and Blautia wexlerae align with known mechanistic roles in glucose metabolism and inflammation. CausalBiome thus offers a practical, interpretable tool for robust causal feature discovery in microbiome and other biomedical studies.
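The three metrics named in the description can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the fixed logistic scorer (a stand-in for the ensemble model), the synthetic two-feature data, the helper names, and especially the `combined` formula are all assumptions made for illustration, since the description does not state how the metrics are merged.

```python
import math
import random
from statistics import mean, pvariance

def auc(scores, labels):
    """Rank-based AUC: chance a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def predict(w, row):
    """Logistic score for one sample (hypothetical stand-in for the ensemble)."""
    z = sum(wj * xj for wj, xj in zip(w, row))
    return 1.0 / (1.0 + math.exp(-z))

def permutation_stats(w, X, y, j, n_repeats, rng):
    """Mean (magnitude) and variance (stability) of the AUC drop when feature j is shuffled."""
    base = auc([predict(w, r) for r in X], y)
    drops = []
    for _ in range(n_repeats):
        col = [r[j] for r in X]
        rng.shuffle(col)
        Xp = [r[:j] + [c] + r[j + 1:] for r, c in zip(X, col)]
        drops.append(base - auc([predict(w, r) for r in Xp], y))
    return mean(drops), pvariance(drops)

def gradient_variance(w, X, y, j, n_parts, rng):
    """Variance, across random partitions, of the mean log-loss gradient w.r.t. w[j]."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    grads = []
    for k in range(n_parts):
        part = idx[k::n_parts]  # every n_parts-th shuffled index forms one partition
        grads.append(mean([(predict(w, X[i]) - y[i]) * X[i][j] for i in part]))
    return pvariance(grads)

# Synthetic data (assumption): feature 0 drives the label, feature 1 is noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [1 if row[0] + 0.5 * rng.gauss(0, 1) > 0 else 0 for row in X]
w = [2.0, 0.1]  # hypothetical fitted weights

results = {}
for j in range(2):
    magnitude, perm_var = permutation_stats(w, X, y, j, n_repeats=20, rng=rng)
    grad_var = gradient_variance(w, X, y, j, n_parts=5, rng=rng)
    # Assumed combination: reward a large AUC drop, penalise unstable metrics.
    combined = magnitude / (1.0 + perm_var + grad_var)
    results[j] = {"magnitude": magnitude, "perm_var": perm_var,
                  "grad_var": grad_var, "combined": combined}

for j, r in results.items():
    print(j, {k: round(v, 4) for k, v in r.items()})
```

In this toy setting the informative feature should receive a much larger permutation-magnitude than the noise feature; lower variances indicate more stable, and therefore more trustworthy, importance estimates.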
Files
| Name | Size |
|---|---|
| SCS.pdf (md5:b00630f6b15eba1bf742f304df79c486) | 345.4 kB |