Published May 17, 2026 | Version v3.0.0
Software Open

Forest-Guided Clustering — Shedding Light into the Random Forest Black Box

Description

Random Forests (RFs), despite their widespread use and strong performance on tabular data, remain difficult to interpret due to their ensemble nature. We present Forest-Guided Clustering (FGC), a model-specific explainability method that reveals both local and global structure in RFs by grouping instances according to shared decision paths. FGC produces human-interpretable clusters aligned with the model's internal logic and computes cluster-specific and global feature importance scores, from which the decision rules underlying RF predictions can be derived. FGC accurately recovered latent subclass structure on a benchmark dataset and outperformed classical clustering and post-hoc explanation methods. Applied to an AML transcriptomic dataset, FGC uncovered biologically coherent subpopulations, disentangled disease-relevant signals from confounders, and recovered known and novel gene expression patterns. FGC bridges the gap between performance and interpretability by providing structure-aware insights that go beyond feature-level attribution.
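
To make "grouping instances according to shared decision paths" more concrete, the sketch below computes a Random Forest proximity matrix: the fraction of trees in which two samples end up in the same leaf, which means they followed the same decision path in that tree. This is an illustration under stated assumptions, not the package's own implementation. The function names `proximity_matrix` and `distance_matrix` and the toy leaf assignments are hypothetical. With scikit-learn, a leaf-index array of this shape can be obtained from a fitted forest via `RandomForestClassifier.apply(X)`.

```python
def proximity_matrix(leaves):
    """Pairwise RF proximity from leaf assignments.

    leaves[i][t] is the index of the leaf that sample i reaches in tree t.
    The proximity of two samples is the fraction of trees in which both
    samples reach the same leaf.
    """
    n_samples = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(i, n_samples):
            shared = sum(
                1 for t in range(n_trees) if leaves[i][t] == leaves[j][t]
            )
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


def distance_matrix(prox):
    """Turn proximities into distances (1 - proximity) for clustering."""
    return [[1.0 - p for p in row] for row in prox]


# Hypothetical toy input: 4 samples routed through 3 trees.
toy_leaves = [
    [2, 5, 1],
    [2, 5, 3],
    [4, 6, 3],
    [4, 6, 7],
]

prox = proximity_matrix(toy_leaves)
dist = distance_matrix(prox)

for row in dist:
    print([round(d, 2) for d in row])
```

A distance matrix of this kind can be passed to any clustering algorithm that accepts precomputed distances. Samples 0 and 1 share a leaf in two of three trees, as do samples 2 and 3, so a distance-based clustering would tend to group them pairwise.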

Notes

If you use this software in your research or applications, please cite the associated preprint below.

Files (64.7 MB)

HelmholtzAI-Consultants-Munich/fg-clustering-v3.0.0.zip

Additional details