Published November 14, 2023 | Version v1

Gaussian Mixture Models in R

Authors/Creators

  • 1. Les Laboratoires Servier SAS Recherche & Développement

Description

  Gaussian mixture models (GMMs) are widely used for modelling stochastic problems, and a wide diversity of packages implementing them has been developed in R. However, no recent review has described the main features offered by these packages or compared their performance. In this article, we first introduce GMMs and the EM algorithm used to estimate the parameters of the model, and we analyse the main features implemented in seven of the most widely used R packages. We then empirically compare their statistical and computational performance with respect to the choice of initialisation algorithm and the complexity of the mixture.
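  The abstract refers to the EM algorithm and to random initialisation without detailing them. The following is a rough illustration only. It is written in Python rather than R and is not taken from any of the seven packages. The sketch fits a univariate Gaussian mixture by EM. The E-step computes each observation's posterior membership probabilities. The M-step re-estimates the mixing proportions, means and variances from those probabilities. The helper names `em_gmm_1d` and `normal_pdf`, the initialisation scheme and the stopping rule are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of the normal distribution N(mu, var) at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, k, n_iter=500, tol=1e-10, seed=0):
    """Fit a univariate k-component Gaussian mixture by EM (hypothetical helper).

    Random initialisation (an assumption of this sketch): the k means are
    distinct data points drawn at random, every variance starts at the overall
    sample variance and the mixing proportions start uniform.
    """
    rng = random.Random(seed)
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    mus = rng.sample(data, k)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    prev_loglik = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability that x_i belongs to component j,
        # plus the log-likelihood of the current parameters.
        resp = []
        loglik = 0.0
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)]
            total = sum(dens)
            loglik += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted maximum-likelihood updates.
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            weights[j] = n_j / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / n_j
            variances[j] = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / n_j
        # Stop once the log-likelihood has essentially stopped increasing.
        if loglik - prev_loglik < tol:
            break
        prev_loglik = loglik
    return weights, mus, variances


# Usage: two well-separated simulated components, N(0, 1) and N(6, 1).
rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(6.0, 1.0) for _ in range(200)]
weights, mus, variances = em_gmm_1d(data, k=2, seed=1)
print(sorted(zip(mus, weights, variances)))
```

  The result of EM depends on its starting point. Rerunning `em_gmm_1d` with different `seed` values is therefore one simple way to observe the sensitivity to initialisation that the article studies. The k-means and REBMIX initialisations it compares are not reproduced here.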

  We demonstrate that the best estimates with well-separated components, or with a small number of components with distinguishable modes, are obtained with REBMIX initialisation, implemented in the rebmix package. By contrast, the best estimates with highly overlapping components are obtained with *k*-means or random initialisation. Importantly, we show that differences in how the EM algorithm is implemented lead to differences in the estimated parameters. In particular, the packages mixtools [@R-mixtools] and Rmixmod [@R-Rmixmod] estimate the parameters of the mixture with smaller bias. The RMSE and variability of the estimates, however, are smaller with the packages bgmm [@R-bgmm], EMCluster [@R-EMCluster], GMKMcharlie [@R-GMKMcharlie], flexmix [@R-flexmix] and mclust [@R-mclust].

  Comparing these packages provides R users with useful recommendations for improving the computational and statistical performance of their clustering analyses and for identifying common deficiencies of the packages. Additionally, we propose several improvements for the development of a future unified mixture-model package.

Files (69.3 MB)

2021-201-bastien-chassagnol-gaussian-mixtures-in-R.zip

Additional details

Dates

Accepted
2023-11-11