Published November 17, 2025 | Version v1
Preprint Open

Rethinking Statistics and Causality: Why Mechanisms Cannot Be Inferred from Data Distributions

  • 1. National Taiwan University

Description

Statistical and causal inference have become universal currencies of explanation across the sciences, particularly in domains where underlying mechanisms remain opaque. Their apparent rigor, invoked across psychology, economics and biomedicine, rests on the assumption that patterns within data can reveal the processes that generate them. Yet persistent mismatches between empirical predictions and real-world behavior expose a deeper limitation: mechanisms cannot be inferred from data distributions alone. To address this limitation, we revisit the foundations of both paradigms. We show how statistical inference reduces explanation to geometric alignment, while causal inference, which evolved from Bayes’ theorem and graphical models, extends this misstep by conflating probabilistic structure with causal truth. Both expose the same epistemic gap: data encode a lower-dimensional projection of structure, not the mechanism that generates it. We argue that understanding the world follows two routes. One is data-driven, expanding models toward richer function classes to achieve high-precision prediction, as exemplified by modern deep learning. The other is mechanism-driven, proposing and testing structural hypotheses as in the physical sciences. A robust framework requires both: data-driven models for prediction, and mechanistic models for reconstructing how the world produces the data we observe.
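
The central claim, that distinct mechanisms can produce the same data distribution, can be illustrated with a standard textbook case. The sketch below is not taken from the preprint; the function names and parameter values are hypothetical. It assumes two zero-mean linear Gaussian mechanisms, one in which X generates Y and one in which Y generates X, and computes the covariance entries each implies. Because a zero-mean bivariate Gaussian is fully determined by its covariance matrix, matching entries mean the two mechanisms are indistinguishable from observational data alone.

```python
import math


def cov_x_causes_y(a, noise_var):
    """Mechanism A (assumed): X ~ N(0, 1), then Y = a*X + e, e ~ N(0, noise_var).

    Returns (Var(X), Cov(X, Y), Var(Y)).
    """
    var_x = 1.0
    cov_xy = a * var_x
    var_y = a * a * var_x + noise_var
    return var_x, cov_xy, var_y


def cov_y_causes_x(a, noise_var):
    """Mechanism B (assumed): Y ~ N(0, var_y), then X = b*Y + u, u ~ N(0, var_u).

    b and var_u are chosen so that B reproduces the distribution implied by A.
    Returns (Var(X), Cov(X, Y), Var(Y)).
    """
    var_y = a * a + noise_var
    b = a / var_y                 # regression coefficient of X on Y
    var_u = noise_var / var_y     # residual variance of X given Y
    var_x = b * b * var_y + var_u
    cov_xy = b * var_y
    return var_x, cov_xy, var_y


def same_distribution(m1, m2, tol=1e-12):
    """Compare covariance entries; equal entries imply equal zero-mean Gaussians."""
    return all(math.isclose(p, q, abs_tol=tol) for p, q in zip(m1, m2))


if __name__ == "__main__":
    a, noise_var = 0.8, 0.36  # hypothetical illustrative values
    forward = cov_x_causes_y(a, noise_var)
    reverse = cov_y_causes_x(a, noise_var)
    print("X -> Y:", forward)
    print("Y -> X:", reverse)
    print("observationally identical:", same_distribution(forward, reverse))
```

The causal direction differs between the two functions, yet no statistic computed from samples of (X, Y) could separate them. Telling them apart would require an intervention or a structural hypothesis from outside the data, which is the mechanism-driven route the description refers to.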

Files

Rethinking_Statistics.pdf

Files (2.1 MB)

Rethinking_Statistics.pdf (2.1 MB)
md5:9750c00795cc071f8a9a4fe533495678