Published April 24, 2023 | Version v1
Journal article Open

Enhancing DNA barcode reference libraries by harvesting terrestrial arthropods at the Smithsonian's National Museum of Natural History

  • 1. National Museum of Natural History, Smithsonian Institution, Washington, United States of America; Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum National d'Histoire naturelle, CNRS, SU, EPHE, UA, Paris, France
  • 2. Centre for Biodiversity Genomics, University of Guelph, Guelph, Canada
  • 3. National Museum of Natural History, Smithsonian Institution, Washington, United States of America
  • 4. Systematic Entomology Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, U.S. Department of Agriculture, Washington, United States of America
  • 5. Centre for Biodiversity Genomics, University of Guelph, Guelph, Canada; Department of Integrative Biology, University of Guelph, Guelph, Canada
  • 6. Centre for Biodiversity Genomics, University of Guelph, Guelph, Canada; National Museum of Natural History, Smithsonian Institution, Washington, United States of America; School of Environmental Sciences, University of Guelph, Guelph, Canada

Description

DNA barcoding has revolutionised biodiversity science, but its application depends on comprehensive and reliable reference libraries. For many poorly known taxa, such reference sequences are missing even at higher taxonomic levels. We harvested the collections of the Smithsonian's National Museum of Natural History (USNM) to generate DNA barcode sequences for genera of terrestrial arthropods not previously recorded in one or more major public sequence databases. Our workflow combined Sanger and Next-Generation Sequencing (NGS) approaches to maximise sequence recovery while keeping costs affordable. In total, COI sequences were obtained for 5,686 specimens belonging to 3,737 determined species in 3,886 genera and 205 families distributed in 137 countries. Success rates varied widely according to collection data and focal taxon. NGS helped recover sequences from specimens that had failed a previous round of Sanger sequencing. Success rates and the optimal balance between Sanger and NGS are the most important drivers for maximising output and minimising cost in future projects. The corresponding sequence and taxonomic data can be accessed through the Barcode of Life Data System, GenBank, the Global Biodiversity Information Facility, the Global Genome Biodiversity Network Data Portal and the NMNH data portal.

Files

BDJ_article_100904.pdf

Files (456.5 kB)

md5:b9b3965d96964baba46fe8dd5fa7053a
456.5 kB

System files (154.1 kB)

md5:d360628d8e50957db34a8241b3648301
154.1 kB