Published June 19, 2019 | Version v1
Journal article Open

The Catalogue of Life: Assembling data into a global taxonomic checklist

  • 1. Illinois Natural History Survey, University of Illinois at Urbana-Champaign, Champaign, IL, United States of America

Description

A global taxonomic checklist of all species is essential for indexing biodiversity data and for providing the basic knowledge needed to study, manage, and conserve biological diversity. The Catalogue of Life (CoL) aims to provide such a checklist, and its 2019 annual edition includes 1.9 million species names. Assembling data into CoL is complex: it requires reformatting data, testing for quality assurance, and collaborating with data providers to resolve detected taxonomic conflicts. Hundreds of taxonomic experts and institutions submit Global Species Databases (GSDs) to CoL in a wide variety of data formats. Submitted data are reformatted to a standard data submission format: CoL Standard Dataset (ACEF), DarwinCore, or CoLDP. A series of standardized data integrity checks is then run to detect and resolve frequently occurring data quality problems, including character encoding corruption, non-Latin characters in scientific names, missing parents, duplicated and homonymic names within a GSD and among GSDs, and split taxonomic groups that have been assigned to multiple parent taxa. The process and challenges of assembling data into the Catalogue of Life, and the project's future direction in migrating to the CoL+ infrastructure, will be discussed.
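As a rough illustration of the kind of integrity checks described above, the following Python sketch flags a few of the listed problems in a small set of records. It is not CoL's actual tooling: the record layout (`id`, `parent`, `name`), the function `check_records`, the issue labels, the allowed-character pattern, and the list of corruption markers are all assumptions made for this example, and the sample names are illustrative only.

```python
import re
from collections import Counter

# Assumption: names may contain only basic Latin letters, spaces, periods, hyphens.
ALLOWED_NAME = re.compile(r"^[A-Za-z .\-]+$")
# Assumption: these characters are typical traces of mis-decoded UTF-8 text.
MOJIBAKE_MARKERS = ("Ã", "Â", "\ufffd")


def check_records(records):
    """Return a list of (record id, issue label) pairs for hypothetical records."""
    known_ids = {r["id"] for r in records}
    name_counts = Counter(r["name"] for r in records)
    issues = []
    for r in records:
        name = r["name"]
        # Encoding corruption is tested first so it is not reported twice.
        if any(marker in name for marker in MOJIBAKE_MARKERS):
            issues.append((r["id"], "encoding"))
        elif not ALLOWED_NAME.match(name):
            issues.append((r["id"], "non_latin"))
        # A parent reference that points to no record in the dataset.
        if r["parent"] is not None and r["parent"] not in known_ids:
            issues.append((r["id"], "missing_parent"))
        # The same name string used by more than one record.
        if name_counts[name] > 1:
            issues.append((r["id"], "duplicate"))
    return issues


sample = [
    {"id": "1", "parent": None, "name": "Cicadellidae"},
    {"id": "2", "parent": "1", "name": "Empoasca fabae"},
    {"id": "3", "parent": "9", "name": "Empoasca fabae"},
    {"id": "4", "parent": "1", "name": "Empoasca fabÃ¦"},
]
issues = check_records(sample)
for record_id, issue in issues:
    print(record_id, issue)
```

A real pipeline would also need to compare names across GSDs and to distinguish true homonyms from accidental duplicates, which this sketch does not attempt.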

Files

BISS_article_37221.pdf

Files (66.4 kB)

md5:9fffe2ad9d640d973ec6093af87949e0 (58.7 kB)
md5:729161598e140c338fa6da8b0922ef78 (7.7 kB)
