Published April 7, 2023 | Version v1
Dataset Open

General principles for assignments of communities from eDNA: Open versus closed taxonomic databases

  • 1. Swiss Federal Institute of Aquatic Science and Technology
  • 2. Swiss Federal Institute of Technology in Zurich
  • 3. University of Bern

Description

Metabarcoding of environmental DNA (eDNA) is a powerful tool for describing biodiversity, for example by finding keystone species or detecting invasive species in environmental samples. Continuous improvements in the method and advances in sequencing platforms over the last decade mean that this approach is now widely used in biodiversity sciences and biomonitoring. For general use, the method hinges on the correct identification of taxa. However, past studies have shown that this depends crucially on decisions made during sampling, sample processing, and subsequent handling of sequencing data. With no clear consensus on best practice, the latter in particular has led to varied bioinformatic approaches and recommendations for data preparation and taxonomic identification. In this study, using a large freshwater fish eDNA sequence dataset, we compared two ways of taxonomically assigning our raw reads: i) the frequently used zero-radius Operational Taxonomic Unit (zOTU) approach in combination with publicly available reference sequences (open databases), and ii) an Operational Sequence Unit (OSU) database approach, using a curated database of reference sequences generated from specimen barcoding (closed database). We show that both approaches gave comparable results for common species. However, agreement between the approaches declined as read abundance decreased, so results were less reliable and not comparable for rare species. The success of the zOTU approach depended on the suitability, rather than the size, of the reference database. In contrast, the OSU approach used reliable DNA sequences and thus often enabled species-level identifications, yet this resolution was lower for species of recent phylogenetic age. We show the need to include target group coverage, outgroups, and full taxonomic annotation in reference databases to avoid the misleading annotations that can occur with short amplicons, as commonly used in eDNA metabarcoding studies. Finally, we make general suggestions to improve the construction and use of reference databases for metabarcoding studies in the future.

Notes

The folder contains the raw eDNA metabarcoding amplicon paired-end sequences from the MiSeq sequencer. Once unzipped, all raw sequencing files are available as FASTA files and can be opened with a text editor or used in downstream bioinformatic workflows.
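As a first sanity check before any downstream workflow, the archive can be inspected without unpacking it to disk. The following is a minimal sketch, not part of the dataset: it assumes the archive members are uncompressed plain-text FASTA files in which every record header starts with `>`, and the function name `count_fasta_records` is hypothetical.

```python
import io
import zipfile


def count_fasta_records(archive_path):
    """Return {member_name: record_count} for every file in a zip archive.

    Assumption: members are plain-text FASTA, so each record is introduced
    by a header line that starts with '>'.
    """
    counts = {}
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip folder entries, only files hold sequences
            with archive.open(info) as raw:
                # Stream the member line by line instead of loading it whole.
                text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
                counts[info.filename] = sum(
                    1 for line in text if line.startswith(">")
                )
    return counts
```

Calling `count_fasta_records("p677_Mifish_amplicon.zip")` on the downloaded archive would then give one record count per file. If the members turn out to be compressed or in another format, the counts will not be meaningful and the reader would need to adapt the parsing.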

Funding provided by: Bundesamt für Umwelt
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100003338
Award Number: contract 00.5058.PZ / 6B1725F08

Funding provided by: Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100001711
Award Number: grant nr. 31003A_173074

Funding provided by: Universität Zürich
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100006447
Award Number: Research Priority Programme in Global Change and Biodiversity

Files

p677_Mifish_amplicon.zip

Files (409.1 MB)

409.1 MB, md5:65957025e0bddc1f217b8d473c298544
1.5 kB, md5:ba38c1e9e6e2e262fedde3536cb752eb
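The MD5 checksums listed above can be used to confirm that a download is complete and uncorrupted. The sketch below uses only Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listed_checksum` are hypothetical. The listing does not state which checksum belongs to which file name, so pairing the 409.1 MB entry with `p677_Mifish_amplicon.zip` is an assumption.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum("p677_Mifish_amplicon.zip", "md5:65957025e0bddc1f217b8d473c298544")` should return `True` for an intact download, provided the assumed pairing of file and checksum holds.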