Simple approaches for evaluation of OTU quality based on dissimilarity arrays

Cros, Marie-Josée; Frigerio, Jean-Marc; Peyrard, Nathalie; Franc, Alain

doi:10.3897/mbmg.8.108649

Published March 14, 2024 | Version v1

Journal article, open access

1. Université de Toulouse, Auzeville-Tolosane, France
2. Université de Bordeaux, Talence, France; Université de Bordeaux, Cestas, France
3. Université de Bordeaux, Cestas, France; Université de Bordeaux, Talence, France

An accurate and complete taxonomic description of the diversity present in an environmental sample is currently out of reach. Metabarcoding is used instead, on the expectation that OTUs form a category relevant for biodiversity inventories on a molecular basis. However, artefacts can arise at different stages of OTU production and may affect ecological conclusions. We propose to evaluate the quality of the OTUs in a sample by characterising how far each OTU's dissimilarity array deviates from that of an ideal OTU, in which all sequences lie at distances smaller than the barcoding gap. We consider two deviations: composed OTUs, which result from the artificial merging of several OTUs, and noisy OTUs, which contain some sequences that are only loosely associated with the core sequence of the OTU and do not form a compact subgroup. We propose a simple and automatic 2-step method that first categorises each OTU of a sample as composed or single and then identifies the OTUs with noise amongst the single ones. The associated code is available at https://forgemia.inra.fr/alain.franc/otu_shape. We applied the method to 32 samples of diatoms from Arcachon Bay (France) that represent contrasting environmental conditions and obtained good agreement with expert categorisation of the OTUs. We suggest that single OTUs without noise can be used as such for further ecological studies, whereas composed OTUs should be post-treated with classical clustering or community detection tools. The quality of single OTUs with noise remains to be tested further through supplementary studies on a diversity of organisms.
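
The notion of an ideal OTU described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' otu_shape code and does not reproduce their 2-step categorisation. It only assumes that an OTU is given as a symmetric matrix of pairwise dissimilarities and that a barcoding gap value is known. The function names, the toy matrix and the gap value are hypothetical. Complete-linkage clustering is used here merely as one example of a classical clustering tool that could split an OTU into subgroups whose members all lie below the gap.

```python
from itertools import combinations


def fraction_above_gap(dis, gap):
    """Share of sequence pairs whose dissimilarity is not smaller than the gap."""
    pairs = list(combinations(range(len(dis)), 2))
    if not pairs:
        return 0.0
    return sum(dis[i][j] >= gap for i, j in pairs) / len(pairs)


def is_ideal(dis, gap):
    """An ideal OTU has every pairwise dissimilarity below the barcoding gap."""
    return fraction_above_gap(dis, gap) == 0.0


def complete_linkage_groups(dis, gap):
    """Naive agglomerative complete linkage, stopped at the gap.

    Two groups are merged only while the largest dissimilarity between
    their members stays below the gap, so every returned group is itself
    an ideal OTU in the sense above. Cubic cost: for small examples only.
    """
    clusters = [[i] for i in range(len(dis))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = max(dis[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d >= gap:
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe because a < b
    return sorted(clusters)


# Hypothetical toy OTU: sequences 0-2 and 3-4 form two tight subgroups
# (dissimilarity 1 inside a subgroup, 10 between subgroups).
example = [
    [0, 1, 1, 10, 10],
    [1, 0, 1, 10, 10],
    [1, 1, 0, 10, 10],
    [10, 10, 10, 0, 1],
    [10, 10, 10, 1, 0],
]
GAP = 3  # assumed barcoding gap, in the same units as the matrix

if __name__ == "__main__":
    print(is_ideal(example, GAP))
    print(fraction_above_gap(example, GAP))
    print(complete_linkage_groups(example, GAP))
```

On this toy matrix the OTU is not ideal, because the pairs that link the two subgroups exceed the assumed gap, and the clustering step separates the two subgroups. How real OTUs are categorised as composed, single, or single with noise is defined in the article and in the otu_shape repository.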

Files

Name	Size
MBMG_article_108649.pdf (md5:6070e4372026f1624ff84da347302a50)	1.2 MB

System files

Name	Size
application/vnd.taxpub.v1+xml (md5:a252f1d3b6486f6641c184d64c9acb6c)	135.9 kB

Additional details

Cites: Publication: 10.57745/7T2UCB (DOI); Publication: 10.1016/j.tree.2011.11.010 (DOI); Publication: 10.1098/rstb.2005.1725 (DOI); Publication: 10.1007/BF00994018 (DOI); Publication: 10.1201/9780367801700 (DOI); Publication: 10.6084/m9.figshare.20764690.v3 (DOI); Publication: 10.1007/s11222-007-9046-7 (DOI); Publication: 10.1016/j.physrep.2009.11.002 (DOI); Publication: 10.1038/s41467-017-01312-x (DOI); Publication: 10.1073/pnas.122653799 (DOI); Publication: 10.2307/2346439 (DOI); Publication: 10.1017/CBO9780511574931 (DOI); Publication: 10.1371/journal.pone.0017497 (DOI); Publication: 10.1016/0378-8733(83)90021-7 (DOI); Publication: 10.1111/1755-0998.12105 (DOI); Publication: 10.1002/bimj.4710200506 (DOI); Publication: 10.1007/s41109-019-0232-2 (DOI); Publication: 10.7717/peerj.593 (DOI); Publication: 10.7717/peerj.1420 (DOI); Publication: 10.18637/jss.v053.i09 (DOI); Publication: 10.1002/ece3.4757 (DOI); Publication: 10.3389/fevo.2022.859099 (DOI); Publication: 10.1093/database/baw016 (DOI); Publication: 10.1111/j.1365-294X.2012.05542.x (DOI); Publication: 10.1111/2041-210X.13552 (DOI)
Has part: Other: 10.3897/mbmg.8.108649.suppl1 (DOI)

Auby I, Méteigner C, Rumebe M, Chancerel E, Salin F, Aluome C, Barraquand F, Carassou L, Del Amo Y, Meleder V, Petit A, Picoche C, Frigerio JM, Franc A (2022) Malabar datasets used in study "OTU quality from dissimilarity arrays". Recherche Data Gouv, V1. https://doi.org/10.57745/7T2UCB
Bik HM, Porazinska DL, Creer S, Caporaso JG, Knight R, Thomas WK (2012) Sequencing our way towards understanding global eukaryotic biodiversity. Trends in Ecology & Evolution 27(4): 233–243. https://doi.org/10.1016/j.tree.2011.11.010
Blaxter M, Mann J, Chapman T, Thomas F, Whitton C, Floyd R, Abebe E (2005) Defining operational taxonomic units using DNA barcode data. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 360(1462): 1935–1943. https://doi.org/10.1098/rstb.2005.1725
Cortes C, Vapnik V (1995) Support-vector networks. Machine Learning 20(3): 273–297. https://doi.org/10.1007/BF00994018
Cox T, Cox MAA (2001) Multidimensional Scaling. In: Chapman Hall/CRC (Eds) Monographs on Statistics and Applied Probability, 2nd edn., Vol. 88, 328 pp. https://doi.org/10.1201/9780367801700
Cros MJ, Frigerio JM, Peyrard N, Franc A (2022) Code, dataset and results for the study "OTU quality from dissimilarity arrays". Figshare. https://doi.org/10.6084/m9.figshare.20764690.v3
Daudin JJ, Picard F, Robin S (2008) A mixture model for random graphs. Statistics and Computing 18(2): 173–183. https://doi.org/10.1007/s11222-007-9046-7
Fortunato S (2010) Community detection in graphs. Physics Reports 486(3-5): 75–174. https://doi.org/10.1016/j.physrep.2009.11.002
Frigerio JM, Rimet F, Bouchez A, Chancerel E, Chaumeil P, Salin F, Thérond S, Kahlert M, Franc A (2016) Diagno-syst: a tool for accurate inventories in metabarcoding. arXiv. https://arxiv.org/abs/1611.09410
Froslev T, Kjoller R, Bruun H, Ejrnaes R, Brunbjerg A, Pietroni C, Hansen A (2017) Algorithm for post-clustering curation of DNA amplicon data yields reliable biodiversity estimates. Nature Communications 8(1): 1188. https://doi.org/10.1038/s41467-017-01312-x
Girvan M, Newman M (2002) Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America 99(12): 7821–7826. https://doi.org/10.1073/pnas.122653799
Gower JC, Ross GJS (1969) Minimum spanning trees and single linkage cluster analysis. Applied Statistics 18(1): 54–64. https://doi.org/10.2307/2346439
Gusfield D (1997) Algorithms on Strings, Trees and Sequences. Cambridge University Press, 534 pp. https://doi.org/10.1017/CBO9780511574931
Hajibabaei M, Shokralla S, Zhou X, Singer GAC, Baird DJ (2011) Environmental barcoding: A next generation sequencing approach for biomonitoring applications using river benthos. PLOS ONE 6(4): e17497. https://doi.org/10.1371/journal.pone.0017497
Holland P, Laskey K, Leinhardt S (1983) Stochastic blockmodels: First steps. Social Networks 5(2): 109–137. https://doi.org/10.1016/0378-8733(83)90021-7
Kermarrec L, Franc A, Rimet F, Chaumeil P, Humbert JF, Bouchez A (2013) Next-generation sequencing to inventory taxonomic diversity in eukaryotic communities: A test for freshwater diatoms. Molecular Ecology Resources 13(4): 607–619. https://doi.org/10.1111/1755-0998.12105
Kopp B (1978) Hierarchical Classification I. Biometrical Journal. Biometrische Zeitschrift 20(5): 495–501. https://doi.org/10.1002/bimj.4710200506
Lee C, Wilkinson D (2019) A review of stochastic block models and extensions for graph clustering. Applied Network Science 4: 122. https://doi.org/10.1007/s41109-019-0232-2
Mahé F, Rognes T, Quince C, de Vargas C, Dunthorn M (2014) Swarm: Robust and fast clustering method for amplicon-based studies. PeerJ 2: e593. https://doi.org/10.7717/peerj.593
Mahé F, Rognes T, Quince C, de Vargas C, Dunthorn M (2015) Swarm v2: Highly-scalable and high-resolution amplicon clustering. PeerJ 3: e1420. https://doi.org/10.7717/peerj.1420
Müllner D (2013) fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software 53(9): 1–18. https://doi.org/10.18637/jss.v053.i09
Phillips JD, Gillis DJ, Hanner RH (2018) Incomplete estimates of genetic diversity within species: Implications for DNA barcoding. Ecology and Evolution 9(5): 2996–3010. https://doi.org/10.1002/ece3.4757
Phillips JD, Gillis DJ, Hanner RH (2022) Lack of statistical rigor in DNA barcoding likely invalidates the presence of a true species' barcode gap. Frontiers in Ecology and Evolution 10: 859099. https://doi.org/10.3389/fevo.2022.859099
Rimet F, Chaumeil P, Keck F, Kermarrec L, Vasselon V, Kahlert M, Franc A, Bouchez A (2016) R-Syst:diatom: an open-access and curated barcode database for diatoms and freshwater monitoring. Database (Oxford) 2016: baw016. https://doi.org/10.1093/database/baw016
Taberlet P, Coissac E, Hajibabaei M, Rieseberg L (2012) Environmental DNA. Molecular Ecology 21(8): 1789–1793. https://doi.org/10.1111/j.1365-294X.2012.05542.x
Zinger L, Lionnet C, Benoiston AS, Donald J, Mercier C, Boyer F (2021) metabaR: An R package for the evaluation and improvement of DNA metabarcoding data quality. Methods in Ecology and Evolution 12(4): 586–592. https://doi.org/10.1111/2041-210X.13552


Biodiversity Literature Repository
