Published May 4, 2026 | Version 2026-05-03

COInr: a comprehensive, non-redundant COI database from NCBI-nt and BOLD

Authors/Creators

  • 1. IMBE, Aix-Marseille University

Contributors

Contact person:

  • 1. IMBE, Aix-Marseille University

Description

COInr is a comprehensive, non-redundant database of COI sequences extracted from NCBI-nt and BOLD. It is not restricted to any taxon, gene region, or taxonomic resolution. Sequences are dereplicated both between the two source databases and within each taxon.
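To make within-taxon dereplication concrete, here is a minimal Python sketch. It assumes that a sequence is redundant when it is identical to, or contained in, a longer sequence assigned to the same taxID. The exact criteria used to build COInr may differ, and the record layout `(seq_id, tax_id, sequence)` and the function name are hypothetical.

```python
from collections import defaultdict


def dereplicate_within_taxa(records):
    """Collapse redundant sequences within each taxID (illustrative sketch).

    records: iterable of (seq_id, tax_id, sequence) tuples (hypothetical layout).
    A sequence is dropped if it is identical to, or a substring of, a sequence
    already kept for the same taxID.
    """
    by_taxon = defaultdict(list)
    for seq_id, tax_id, seq in records:
        by_taxon[tax_id].append((seq_id, seq.upper()))

    kept = []
    for tax_id, seqs in by_taxon.items():
        # Longest first: anything contained in a dropped sequence is also
        # contained in the kept sequence that caused the drop.
        seqs.sort(key=lambda item: len(item[1]), reverse=True)
        retained = []
        for seq_id, seq in seqs:
            if not any(seq in longer for _, longer in retained):
                retained.append((seq_id, seq))
        kept.extend((seq_id, tax_id, seq) for seq_id, seq in retained)
    return kept
```

The pairwise substring test is quadratic per taxon. That keeps the sketch readable, but it would be slow for very large taxa.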

Each taxon has a unique taxonomic identifier (taxID), which is essential for avoiding ambiguous associations of homonyms and synonyms in the source databases. The taxIDs form a coherent hierarchical system, fully compatible with NCBI taxIDs, from which full or ranked lineages can be reconstructed.
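The minimal Python sketch below shows how parent links between taxIDs can be walked to rebuild a full or ranked lineage. The in-memory `taxonomy` dictionary, the function names, and the toy taxIDs are hypothetical and do not reflect the layout of the COInr files. The only convention borrowed from NCBI is that the root is its own parent.

```python
def lineage(tax_id, taxonomy):
    """Return [(tax_id, rank, name), ...] from the root down to tax_id.

    taxonomy maps tax_id -> (parent_tax_id, rank, name); this layout is a
    hypothetical stand-in, not the COInr file format.
    """
    path = []
    seen = set()
    current = tax_id
    while current not in seen:  # the seen-set also stops on malformed cycles
        seen.add(current)
        parent, rank, name = taxonomy[current]
        path.append((current, rank, name))
        if parent == current:  # NCBI convention: the root is its own parent
            break
        current = parent
    return path[::-1]


def ranked_lineage(tax_id, taxonomy, ranks=("phylum", "class", "order", "family", "genus", "species")):
    """Return one name per requested rank (None where the lineage lacks that rank)."""
    by_rank = {rank: name for _, rank, name in lineage(tax_id, taxonomy)}
    return [by_rank.get(rank) for rank in ranks]


# Toy taxonomy with made-up taxIDs, for illustration only.
taxonomy = {
    1: (1, "no rank", "root"),
    10: (1, "kingdom", "Metazoa"),
    20: (10, "phylum", "Arthropoda"),
    30: (20, "class", "Insecta"),
}
print(ranked_lineage(30, taxonomy, ranks=("kingdom", "phylum", "class", "order")))
# prints ['Metazoa', 'Arthropoda', 'Insecta', None]
```

Because every taxID points to exactly one parent, the lineage is unambiguous even when two taxa share a name.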
 
The mkCOInr scripts (https://github.com/meglecz/mkCOInr) make COInr a good starting point for creating custom databases tailored to users' needs.
With these scripts it is possible to select or eliminate sequences for a list of taxa, select a specific gene region, select sequences with a minimum taxonomic resolution, add new custom sequences, and format the database for BLAST, QIIME, and RDP classifiers.
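To illustrate selecting or eliminating sequences for a list of taxa, the sketch below keeps (or drops) every sequence whose lineage contains one of the requested taxIDs. It is not the mkCOInr implementation or interface. The `parents` mapping, the record tuples, and the function names are hypothetical.

```python
def ancestors(tax_id, parents):
    """Return the set of taxIDs from tax_id up to the root (inclusive).

    parents maps tax_id -> parent_tax_id (hypothetical in-memory layout);
    the root is assumed to be its own parent, as in NCBI taxonomy.
    """
    found = {tax_id}
    while parents[tax_id] != tax_id:
        tax_id = parents[tax_id]
        if tax_id in found:  # guard against malformed cyclic input
            break
        found.add(tax_id)
    return found


def filter_by_taxa(records, parents, taxa, keep=True):
    """Keep (keep=True) or drop (keep=False) records whose lineage hits any taxID in taxa.

    records: iterable of (seq_id, tax_id, sequence) tuples (hypothetical layout).
    """
    targets = set(taxa)
    return [
        rec for rec in records
        if bool(ancestors(rec[1], parents) & targets) == keep
    ]
```

Matching on whole lineages rather than names means that selecting a higher taxon also selects all of its descendants.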

 

Notes

From COInr_2025_05_11 onward, the selection of BOLD sequences has been modified. Because BOLD sequences lacking a BIN_URI are relatively often incorrectly labelled, the following compromise has been adopted:

  • For taxa that also have sequences with a BIN_URI, sequences without a BIN_URI are excluded from COInr.

  • For taxa that only have sequences without a BIN_URI, those sequences are included in COInr.
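The BIN_URI rule can be expressed as a short filter. This Python sketch assumes that "taxon" means the taxID each BOLD record is assigned to, and that records are dicts with hypothetical keys `id`, `tax_id`, and `bin_uri`. It is not the code used to build COInr.

```python
from collections import defaultdict


def select_bold_records(records):
    """Apply the BIN_URI compromise described in the notes (illustrative sketch).

    records: iterable of dicts with hypothetical keys "id", "tax_id" and
    "bin_uri" (missing, None or empty when the BOLD record has no BIN).
    """
    by_taxon = defaultdict(list)
    for rec in records:
        by_taxon[rec["tax_id"]].append(rec)

    selected = []
    for recs in by_taxon.values():
        with_bin = [r for r in recs if r.get("bin_uri")]
        # At least one record in the taxon has a BIN_URI: drop those without one.
        # Otherwise the taxon has only BIN-less records, and all are kept.
        selected.extend(with_bin if with_bin else recs)
    return selected
```

Grouping by taxon before filtering is what keeps taxa that would otherwise disappear entirely because none of their sequences has a BIN.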

Files

Files (474.8 MB)

Name    Size
md5:115ffd0e04b65a409dd7baffbf054188    474.8 MB
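A downloaded copy can be checked against the MD5 checksum listed in this section. The sketch uses Python's standard `hashlib` module. The file name is not shown on this page, so the path in the commented call is a placeholder.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED_MD5 = "115ffd0e04b65a409dd7baffbf054188"  # checksum listed for the COInr file

# "path/to/downloaded_file" is a placeholder for wherever the archive was saved.
# if md5_of_file("path/to/downloaded_file") != EXPECTED_MD5:
#     raise SystemExit("checksum mismatch: re-download the file")
```

Reading in chunks avoids loading the whole 474.8 MB file into memory at once.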

Additional details

Related works

Is cited by
Software: https://github.com/meglecz/mkCOInr (URL)
Is published in
Preprint: 10.1101/2022.05.18.492423 (DOI)
Journal article: 10.1111/1755-0998.13756 (DOI)