Published May 23, 2025 | Version 2025-05-23
Dataset Open

COInr a comprehensive, non-redundant COI database from NCBI-nt and BOLD

  • 1. IMBE, Aix-Marseille University

Contributors

Contact person:

  • 1. IMBE, Aix-Marseille University

Description

COInr is a non-redundant, comprehensive database of COI sequences extracted from NCBI-nt and BOLD. It is not limited to a taxon, a gene region, or a taxonomic resolution. Sequences are dereplicated between databases and within taxa.
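As an illustration of what dereplication within taxa can mean, here is a minimal Python sketch. It assumes that "redundant" means an identical sequence already seen for the same taxID; the description does not state the exact criterion, so the mkCOInr pipeline may use a different one. The record layout `(seq_id, taxid, sequence)` and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record layout: (sequence ID, taxID, nucleotide sequence).
Record = Tuple[str, int, str]


def dereplicate_within_taxa(records: Iterable[Record]) -> List[Record]:
    """Keep the first record of each distinct sequence per taxID.

    Assumption: two sequences are redundant only if they are identical
    (case-insensitive) and assigned to the same taxID.
    """
    seen: Dict[int, Set[str]] = defaultdict(set)
    kept: List[Record] = []
    for seq_id, taxid, sequence in records:
        key = sequence.upper()
        if key in seen[taxid]:
            continue  # same sequence already kept for this taxon
        seen[taxid].add(key)
        kept.append((seq_id, taxid, sequence))
    return kept
```

Because the input can mix records from both sources, the same pass also removes duplicates between databases when they share a taxID. An identical sequence assigned to a different taxID is kept.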

Each taxon has a unique taxonomic identifier (taxID), which is essential to avoid ambiguous associations caused by homonyms and synonyms in the source databases. TaxIDs form a coherent hierarchical system that is fully compatible with NCBI taxIDs, so full or ranked lineages can be built for any taxon.
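To show how a hierarchical taxID system yields full and ranked lineages, here is a minimal Python sketch. The toy taxonomy, its taxIDs, and the function names are hypothetical and are not taken from COInr or NCBI files; the only assumption is that each taxID points to a parent taxID and that the root is its own parent.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy taxonomy: taxID -> (parent taxID, rank, name).
# The taxIDs are invented for illustration; the root is its own parent.
TOY_TAXONOMY: Dict[int, Tuple[int, str, str]] = {
    1: (1, "no rank", "root"),
    10: (1, "kingdom", "Metazoa"),
    20: (10, "phylum", "Arthropoda"),
    25: (20, "subphylum", "Hexapoda"),
    30: (25, "class", "Insecta"),
    40: (30, "order", "Diptera"),
    50: (40, "family", "Drosophilidae"),
    60: (50, "genus", "Drosophila"),
    70: (60, "species", "Drosophila melanogaster"),
}

# Ranks reported in a ranked lineage (an assumption made for this sketch).
MAJOR_RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def full_lineage(taxid: int, taxonomy: Dict[int, Tuple[int, str, str]]) -> List[Tuple[int, str, str]]:
    """Return (taxID, rank, name) for every ancestor, from the root down to taxid."""
    lineage = []
    seen = set()
    current = taxid
    while True:
        if current in seen:
            raise ValueError(f"cycle detected at taxID {current}")
        seen.add(current)
        parent, rank, name = taxonomy[current]
        lineage.append((current, rank, name))
        if parent == current:  # reached the root
            break
        current = parent
    lineage.reverse()
    return lineage


def ranked_lineage(taxid: int, taxonomy: Dict[int, Tuple[int, str, str]],
                   ranks: List[str] = MAJOR_RANKS) -> List[Optional[str]]:
    """Return one name per requested rank; None where the lineage has no such rank."""
    by_rank = {rank: name for _, rank, name in full_lineage(taxid, taxonomy)}
    return [by_rank.get(rank) for rank in ranks]
```

For example, `ranked_lineage(60, TOY_TAXONOMY)` skips the intermediate subphylum and leaves the species slot empty, because taxID 60 is a genus in this toy data.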
 
COInr is a good starting point for creating custom databases tailored to users' needs with the mkCOInr scripts, available at https://github.com/meglecz/mkCOInr.
With these scripts, it is possible to select or eliminate sequences for a list of taxa, select a specific gene region, select for a minimum taxonomic resolution, add new custom sequences, and format the database for BLAST, QIIME, or RDP classifiers.

 

Notes

COInr_2025_05_11 introduces changes in the selection of BOLD sequences compared to previous versions of COInr. Because BOLD sequences lacking a BIN_URI are relatively often incorrectly labelled, the following compromise has been adopted:

  • For taxa that include other sequences with a BIN_URI, sequences without a BIN_URI are excluded from COInr.

  • For taxa that only have sequences without a BIN_URI, those sequences are included in COInr.
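The two rules above can be sketched in Python as follows. This is an illustration of the selection logic only, not the mkCOInr implementation; the `BoldRecord` type, its field names, and the example values are hypothetical.

```python
from typing import List, NamedTuple, Optional


class BoldRecord(NamedTuple):
    """Hypothetical minimal view of a BOLD sequence record."""
    seq_id: str
    taxid: int
    bin_uri: Optional[str]  # None or "" when the record has no BIN_URI


def select_bold_records(records: List[BoldRecord]) -> List[BoldRecord]:
    """Apply the BIN_URI compromise taxon by taxon.

    - Taxon with at least one BIN_URI record: keep only records with a BIN_URI.
    - Taxon with no BIN_URI record at all: keep all of its records.
    """
    taxa_with_bin = {r.taxid for r in records if r.bin_uri}
    return [r for r in records if r.bin_uri or r.taxid not in taxa_with_bin]
```

In this sketch a taxon is identified by its taxID, so a record without a BIN_URI survives only when no record with the same taxID carries one.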

Notes

Due to changes in the taxonomic ranks of the NCBI taxonomy (domain vs. superkingdom), COInr_2025_05_11 had slightly incoherent taxIDs and lineages.

The latest version of mkCOInr (v.0.5.0) has been adapted to the new NCBI taxonomy file, and the new version of COInr (COInr_2025_05_23) replaces the previous one (COInr_2025_05_11).

Files

Files (418.1 MB)

md5:040b81ea8510d3d883b5aa865053f2de (418.1 MB)
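After downloading, the archive can be checked against the md5 checksum listed above. Here is a minimal Python sketch using only the standard library; the function names are hypothetical, and the local file path is left to the caller because the file name is not given here.

```python
import hashlib

# Checksum published on this page for the 418.1 MB file.
EXPECTED_MD5 = "040b81ea8510d3d883b5aa865053f2de"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True when the file's md5 equals the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Reading in chunks keeps memory use low for a file of this size. A mismatch usually points to an incomplete or corrupted download.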

Additional details

Related works

Is cited by
Software: https://github.com/meglecz/mkCOInr (URL)
Is published in
Preprint: 10.1101/2022.05.18.492423 (DOI)
Journal article: 10.1111/1755-0998.13756 (DOI)