Approximating the Shapley Value with Many Players: A Hybrid Exact/Monte Carlo Estimator
Description
Computing the Shapley value exactly requires evaluating v(·) over all 2^n coalitions, which is intractable for n ≳ 25. This paper addresses how to draw coalitions efficiently when exact computation is infeasible. We establish a level-stratified framework that clarifies the variance ordering of all existing drawing strategies, and we propose two new estimators. The Hybrid Exact/Sampling estimator enumerates exactly the coalition-size levels that contain few coalitions and samples only the large middle levels, achieving strictly lower variance than any purely stochastic method. The Neyman optimal-allocation variant minimises variance when within-level dispersion is heterogeneous. A unified comparative analysis covers the full spectrum of current methods: the permutation sampler (Castro et al., 2009), the level-stratified sampler (Maleki et al., 2013), the KernelSHAP family (Lundberg & Lee, 2017; Covert & Lee, 2021; Olsen & Jullum, 2025), Owen/antithetic samplers (KhademSohi et al., 2025), and the non-asymptotic framework of Chen et al. (NeurIPS 2025). The bounded marginal impact index C(v) characterises the conditions under which each method dominates. For the broad class of pseudo-continuous functions (C(v) ≪ n), the hybrid reduces RMSE by 36–92% relative to the permutation sampler and by one to two orders of magnitude relative to KernelSHAP at equal computational cost. A scaling experiment at n = 60, using a characteristic function with a closed-form Shapley value, confirms that the advantage is preserved at larger n and establishes a practical threshold rule: setting τ to the binomial coefficient C(n−1, 2) = (n−1)(n−2)/2 maintains six exact levels regardless of n.
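The hybrid idea described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `exact_levels` and `hybrid_shapley`, the parameter `samples_per_level`, and the equal allocation of draws across sampled levels are all assumptions made for illustration. The sketch uses the standard level decomposition of the Shapley value, in which player i's value is the average over coalition sizes s = 0, ..., n−1 of the mean marginal contribution at that size. Levels with at most τ coalitions are enumerated; the rest are sampled uniformly.

```python
import itertools
import math
import random


def exact_levels(n, tau):
    """Coalition sizes s (among the other n-1 players) with at most tau coalitions."""
    return [s for s in range(n) if math.comb(n - 1, s) <= tau]


def hybrid_shapley(v, n, i, tau, samples_per_level, rng=None):
    """Estimate player i's Shapley value for characteristic function v.

    v takes a frozenset of player indices in range(n).
    Levels with at most tau coalitions are enumerated exactly; the others are
    estimated from samples_per_level uniform draws (an illustrative choice,
    not the Neyman allocation mentioned in the abstract).
    """
    rng = rng or random.Random(0)
    others = [j for j in range(n) if j != i]
    enumerated = set(exact_levels(n, tau))
    total = 0.0
    for s in range(n):
        if s in enumerated:
            # Exact mean marginal contribution over all coalitions of size s.
            coalitions = itertools.combinations(others, s)
            denom = math.comb(n - 1, s)
        else:
            # Monte Carlo mean over uniformly drawn coalitions of size s.
            coalitions = (rng.sample(others, s) for _ in range(samples_per_level))
            denom = samples_per_level
        level_sum = 0.0
        for members in coalitions:
            base = frozenset(members)
            level_sum += v(base | {i}) - v(base)
        total += level_sum / denom
    return total / n


if __name__ == "__main__":
    n = 12
    tau = math.comb(n - 1, 2)  # threshold rule quoted in the abstract
    # Symmetric toy game v(S) = |S|^2, whose Shapley value is v(N)/n = n.
    print(exact_levels(n, tau))
    print(hybrid_shapley(lambda S: len(S) ** 2, n, 0, tau, samples_per_level=50))
```

With τ = C(n−1, 2), `exact_levels` returns the three smallest and three largest sizes, which matches the six exact levels stated in the abstract. The toy game is chosen only because its marginal contributions are constant within each level, so the estimate can be checked against the closed form.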
Files
Shapley_Hybrid_Araar_April_2026.pdf (412.1 kB, md5:41f96d2a8ee8dceb407a2862707a1507)
Additional details
Dates
- Created: 2026-04-16