Published October 7, 2016 | Version v2.0
Software Open

bakerccm/entrez_qiime: entrez_qiime v2.0

Creators

  • 1. Princeton University

Description

This is an updated release of a Python script (entrez_qiime.py) and accompanying guidelines (entrez_qiime.pdf). The workflow takes an input FASTA file generated from the NCBI database (e.g. through an Entrez/gquery search) and generates the id-to-taxonomy mapping file needed to BLAST metabarcode data against those sequences using the QIIME script assign_taxonomy.py.

The original version of this workflow script and documentation has been available from the author Chris Baker since at least October 2012. It was uploaded to this GitHub repository and formally released as v1.0 essentially unchanged in September 2016.

This updated release in October 2016 includes a major change to the operation of the script. Instead of taking FASTA files with GI numbers as the sequence identifiers, it now takes FASTA files with NCBI accession.version numbers as the sequence identifiers. This change is intended to allow this workflow to continue being used when the NCBI phases out GI numbers in the GenBank, GenPept, and FASTA formats supported by NCBI for sequence records (https://www.ncbi.nlm.nih.gov/news/03-02-2016-phase-out-of-GI-numbers/).
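As an illustration of what using accession.version numbers as sequence identifiers means in practice, the sketch below pulls the identifier from each FASTA header line. It is not taken from entrez_qiime.py: the function name read_accessions and the example records are hypothetical, and it assumes the identifier is the first whitespace-delimited token after ">", as in NCBI FASTA downloads without GI numbers.

```python
def read_accessions(fasta_lines):
    """Yield the first whitespace-delimited token of each FASTA header line.

    Assumption: headers look like '>ACCESSION.VERSION description ...'.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:  # skip a bare '>' with no identifier
                yield tokens[0]


# Hypothetical records, made up for illustration only.
example = [
    ">KX123456.1 Hypothetical organism 18S ribosomal RNA gene",
    "ACGTACGT",
    ">AB000001.2 Another hypothetical record",
    "TTGGCCAA",
]
accessions = list(read_accessions(example))
```

Older headers of the form ">gi|...|gb|ACCESSION.VERSION|" would need separate handling, which this sketch does not attempt.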

In addition to this change: (i) the script now takes either a FASTA file, as before, or a list of accession numbers as input; (ii) the script now outputs only two files: the id-to-taxonomy mapping file required by QIIME, plus a logfile; (iii) sequences in the FASTA file (or list file) that do not appear in the taxonomy database are now included in the output, with "NA;NA;NA;..." as their taxonomy string.
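The behaviour in (iii) can be pictured with a minimal sketch, which is not the script's actual code. It assumes a tab-separated mapping line (identifier, then a semicolon-delimited taxonomy string), a hypothetical seven-rank list, and a hypothetical in-memory lookup standing in for the taxonomy database. Filling individual missing ranks with "NA" is also an assumption.

```python
# Hypothetical rank list; the real script's ranks may differ.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def taxonomy_line(accession, lineage_lookup, ranks=RANKS):
    """Return one 'accession<TAB>rank1;rank2;...' mapping line.

    lineage_lookup is a hypothetical dict: accession -> {rank: name}.
    Accessions absent from the lookup get 'NA' at every rank.
    """
    lineage = lineage_lookup.get(accession)
    if lineage is None:
        fields = ["NA"] * len(ranks)
    else:
        # Assumption: a rank missing from a known lineage is also 'NA'.
        fields = [lineage.get(rank, "NA") for rank in ranks]
    return accession + "\t" + ";".join(fields)


# Made-up lookup with a single, partially resolved record.
lookup = {"KX123456.1": {"kingdom": "Fungi", "phylum": "Ascomycota"}}
known = taxonomy_line("KX123456.1", lookup)
unknown = taxonomy_line("ZZ999999.1", lookup)
```

Keeping unmatched sequences in the output with a placeholder string means every identifier in the input appears in the mapping file.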

The PDF documentation has also been updated to reflect these changes.

Files

bakerccm/entrez_qiime-v2.0.zip (149.4 kB)
md5:667cf3624d80c6477a6f693d6cc49405
