Extending GroupStruct2: a Bayesian and machine-learning framework for testing taxonomic hypotheses using morphometric data
Authors/Creators
- 1. Department of Integrative Biology, MSU Museum, Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, United States of America
- 2. Department of Biology, La Sierra University, Riverside, United States of America; Department of Herpetology, San Diego Natural History Museum, San Diego, United States of America; El Serpentario y C.E.M.A de Baja California Sur, La Paz, Mexico
Description
Despite considerable advances in statistical methods, taxonomic delimitation using morphometric data (morphometric delimitation) has not progressed significantly beyond the use of simple summary statistics or univariate tests to quantify differences among predefined operational taxonomic units (OTUs). These methods typically rely on visual inspection of graphs or p-value thresholds to determine whether character means are statistically different. Tiburtini et al. (2025) introduced a conceptually different approach to morphometric delimitation that uses Bayesian model-testing and Gaussian Mixture Models (GMM). This approach can infer morphological clusters with or without a priori OTU groupings and jointly evaluates the fit of alternate taxonomic hypotheses to the data, providing a probabilistic, model-based framework that moves beyond traditional significance testing. Additionally, a machine-learning method based on a Random Forest classification algorithm was proposed to identify diagnostic characters. We adapted Tiburtini et al.'s approach, initially developed for plant morphometrics, for any morphometric dataset and integrated it into GroupStruct2, a Shiny R-based application with a full graphical user interface that also includes conventional statistical methods (e.g. univariate/multivariate tests, PCA, DAPC, MFA). We demonstrate that GroupStruct2's integrative workflow, which combines classical statistical analyses with Bayesian GMM and machine-learning methods, provides a more robust, nuanced, and comprehensive perspective on morphological variation and character diagnoses. The integration of frequentist and Bayesian methods within a user-friendly graphical interface democratizes access to robust statistical analyses and enables researchers to adopt quantitative rigor in taxonomic studies.
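The idea of comparing alternate taxonomic hypotheses by model fit, rather than by a p-value threshold, can be illustrated with a minimal sketch. This is not the GroupStruct2 or Tiburtini et al. (2025) implementation: it assumes a single character, fixed a priori OTU assignments, and one Gaussian per OTU, and it approximates the Bayes factor from the BIC difference (Schwarz 1978; Kass and Raftery 1995). The function names `gaussian_loglik` and `hypothesis_bic`, the OTU labels, and the measurements are all hypothetical.

```python
import math


def gaussian_loglik(values):
    """Maximised log-likelihood of one univariate Gaussian fitted to `values`.

    Assumes the values are not all identical (non-zero variance).
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # maximum-likelihood variance
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def hypothesis_bic(values, labels):
    """BIC of a grouping hypothesis, using the 'larger is better' sign
    convention (2 * logLik - k * log(n)).

    Each group gets its own mean and variance. Group membership is fixed
    a priori, so no mixing proportions are estimated (an assumption of
    this sketch, not of a full Gaussian Mixture Model).
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    loglik = sum(gaussian_loglik(g) for g in groups.values())
    n_params = 2 * len(groups)  # one mean and one variance per group
    return 2 * loglik - n_params * math.log(len(values))


# Hypothetical measurements of one character for two candidate OTUs.
otu_a = [10.1, 9.8, 10.4, 10.0, 9.7, 10.3]
otu_b = [12.2, 11.9, 12.5, 12.1, 11.8, 12.4]
values = otu_a + otu_b

# Hypothesis 1: a single taxon (all specimens lumped together).
bic_lumped = hypothesis_bic(values, ["A"] * len(values))
# Hypothesis 2: two taxa (specimens split by OTU).
bic_split = hypothesis_bic(values, ["A"] * len(otu_a) + ["B"] * len(otu_b))

# Under the Schwarz approximation, 2 * ln(Bayes factor) is roughly the
# BIC difference between the two hypotheses.
two_ln_bf = bic_split - bic_lumped
print(f"BIC lumped: {bic_lumped:.2f}")
print(f"BIC split:  {bic_split:.2f}")
print(f"Approximate 2 ln BF (split vs lumped): {two_ln_bf:.2f}")
```

On the Kass and Raftery (1995) scale, values of 2 ln BF above 10 are read as very strong support for the better-fitting hypothesis. A full analysis would fit multivariate mixture models, for example with mclust (Scrucca et al. 2016), rather than the per-group univariate Gaussians assumed here.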
Files
ZK_article_182331.pdf
References
- Blackith RE, Reyment RA (1971) Multivariate Morphometrics. Academic Press, London–New York, 412 pp.
- Chan KO, Grismer LL (2021) A standardized and statistically defensible framework for quantitative morphological analyses in taxonomic studies. Zootaxa 5023: 293–300. https://doi.org/10.11646/zootaxa.5023.2.9
- Chan KO, Grismer LL (2022) GroupStruct: an R package for allometric size correction. Zootaxa 5124: 471–482. https://doi.org/10.11646/zootaxa.5124.4.4
- Chan KO, Grismer LL (2025) GroupStruct2: a user-friendly graphical user interface for statistical and visual support in species diagnosis. Systematic Biology 2025: syaf090. https://doi.org/10.1093/sysbio/syaf090
- Grismer LL (2025) Introducing multiple factor analysis (MFA) as a diagnostic taxonomic tool complementing principal component analysis (PCA). ZooKeys 1248: 93–109. https://doi.org/10.3897/zookeys.1248.159516
- Grismer LL, del Pinto L, Quah ESHH, Anuar S, Cota MM, McGuire JA, Iskandar DT, Wood Jr PL, Grismer JL (2022) Phylogenetic and multivariate analyses of Gekko smithii Gray, 1842 recover a new species from Peninsular Malaysia and support the resurrection of G. albomaculatus (Giebel, 1861) from Sumatra. Vertebrate Zoology 72: 47–80. https://doi.org/10.3897/vz.72.e77702
- Grummer JA, Bryson RW, Reeder TW (2014) Species delimitation using Bayes Factors: simulations and application to the Sceloporus scalaris species group (Squamata: Phrynosomatidae). Systematic Biology 63: 119–133. https://doi.org/10.1093/sysbio/syt069
- Halsey LG (2019) The reign of the p-value is over: what alternative analyses could we employ to fill the power vacuum? Biology Letters 15: 20190174. https://doi.org/10.1098/rsbl.2019.0174
- Jeffreys H (1935) Some tests of significance, treated by the theory of probability. Mathematical Proceedings of the Cambridge Philosophical Society 31: 203–222. https://doi.org/10.1017/S030500410001330X
- Kass RE, Raftery AE (1995) Bayes Factors. Journal of the American Statistical Association 90: 773–795. https://doi.org/10.1080/01621459.1995.10476572
- Kornai D, Jiao X, Ji J, Flouri T, Yang Z (2024) Hierarchical heuristic species delimitation under the multispecies coalescent model with migration. Systematic Biology 73: 1015–1037. https://doi.org/10.1093/sysbio/syae050
- Kursa MB, Rudnicki WR (2010) Feature selection with the Boruta package. Journal of Statistical Software 36: 1–13. https://doi.org/10.18637/jss.v036.i11
- Leaché AD, Fujita MK, Minin VN, Bouckaert RR (2014) Species delimitation using genome-wide SNP data. Systematic Biology 63: 534–542. https://doi.org/10.1093/sysbio/syu018
- Michener CD, Sokal RR (1957) A quantitative approach to a problem in classification. Evolution 11: 130. https://doi.org/10.2307/2406046
- Nakagawa S, Cuthill IC (2007) Effect size, confidence interval and statistical significance: a practical guide for biologists. Biological Reviews 82: 591–605. https://doi.org/10.1111/j.1469-185X.2007.00027.x
- Padial JM, Miralles A, De la Riva I, Vences M (2010) The integrative future of taxonomy. Frontiers in Zoology 7: 16. https://doi.org/10.1186/1742-9994-7-16
- Posit Team (2025) RStudio: integrated development environment for R.
- Pyron RA, O'Connell KA, Lamb JY, Beamer DA (2023) A new, narrowly endemic species of swamp-dwelling dusky salamander (Plethodontidae: Desmognathus) from the Gulf Coastal Plain of Mississippi and Alabama. Zootaxa 5133: 53–82. https://doi.org/10.11646/zootaxa.5133.1.3
- R Core Team (2025) R: a Language and Environment for Statistical Computing. https://doi.org/10.32614/r.manuals
- Rohlf FJ, Marcus LF (1993) A revolution in morphometrics. Trends in Ecology and Evolution 8: 129–132. https://doi.org/10.1016/0169-5347(93)90024-J
- Schwarz G (1978) Estimating the dimension of a model. The Annals of Statistics 6: 461–464. https://doi.org/10.1214/aos/1176344136
- Scrucca L, Fop M, Murphy TB, Raftery AE (2016) mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. The R Journal 8: 289–317. https://doi.org/10.32614/rj-2016-021
- Sites Jr JW, Marshall JC (2003) Delimiting species: a renaissance issue in systematic biology. Trends in Ecology and Evolution 18: 462–470. https://doi.org/10.1016/S0169-5347(03)00184-8
- Smith ML, Carstens BC (2022) Species delimitation using molecular data. In: Species Problems and Beyond: Contemporary Issues in Philosophy and Practice. CRC Press, Boca Raton, 145–160. https://doi.org/10.1201/9780367855604-9
- Sokal RR, Michener CD (1958) A statistical method for evaluating systematic relationships. The University of Kansas Science Bulletin 38(2): 1409–1438. https://doi.org/10.5281/zenodo.16435756
- Thorpe RS (1975) Quantitative handling of characters useful in snake systematics with particular reference to intraspecific variation in the Ringed Snake Natrix natrix. Biological Journal of the Linnean Society 7: 27–43. https://doi.org/10.1111/j.1095-8312.1975.tb00732.x
- Tiburtini M, Scrucca L, Peruzzi L (2025) Using Gaussian Mixture Models in plant morphometrics. Perspectives in Plant Ecology, Evolution, and Systematics 69: 125902. https://doi.org/10.1016/j.ppees.2025.125902
- Wickham H (2016) ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York, 260 pp. https://doi.org/10.1007/978-0-387-98141-3