Gaussian Mixture Models in R
Description
Gaussian mixture models (GMMs) are widely used for modelling stochastic problems, and a wide variety of R packages have been developed to fit them. However, no recent review describes the main features of these packages or compares their performance. In this article, we first introduce GMMs and the EM algorithm used to estimate the model parameters, and analyse the main features implemented in seven of the most widely used R packages. We then empirically compare their statistical and computational performance in relation to the choice of initialisation algorithm and the complexity of the mixture.
We demonstrate that the best estimation with well-separated components, or with a small number of components with distinguishable modes, is obtained with REBMIX initialisation, implemented in the \CRANpkg{rebmix} package, while the best estimation with highly overlapping components is obtained with *k*-means or random initialisation. Importantly, we show that implementation details of the EM algorithm yield differences in the parameter estimates. In particular, the packages \CRANpkg{mixtools} [@R-mixtools] and \CRANpkg{Rmixmod} [@R-Rmixmod] estimate the parameters of the mixture with smaller bias, while the RMSE and variability of the estimates are smaller with the packages \CRANpkg{bgmm} [@R-bgmm], \CRANpkg{EMCluster} [@R-EMCluster], \CRANpkg{GMKMcharlie} [@R-GMKMcharlie], \CRANpkg{flexmix} [@R-flexmix] and \CRANpkg{mclust} [@R-mclust].
The comparison of these packages provides R users with useful recommendations for improving the computational and statistical performance of their clustering and for identifying common deficiencies. Additionally, we propose several improvements for the development of a future, unified mixture model package.
Files
| Name | Size |
|---|---|
| 2021-201-bastien-chassagnol-gaussian-mixtures-in-R.zip (md5:d0c98d7d2307534566947c6c9847728e) | 69.3 MB |
Additional details
Dates
- Accepted: 2023-11-11