Published December 24, 2020 | Version v1
Dataset Open

DADA2 formatted taxonomy from GTDBr95

Authors/Creators

  • 1. Anand Agricultural University, Anand

Description

DADA2 requires taxonomy reference files in a specific format. This dataset provides the files required to assign taxonomy using the Genome Taxonomy Database (GTDB).

GTDB release 95.0, released on July 17th, 2020, was the latest version of the database at the time of this upload. GTDB-r95 contains 30,238 bacterial and 1,672 archaeal species clusters, which span 194,600 genomes. FASTA files of 16S rRNA gene sequences identified within the representative genomes of Bacteria (21,965) and Archaea (1,126) were downloaded from this link on 24-12-2020. The link provides resources for GTDB species representatives, hence limiting the files to one sequence per organism. The sequence headers were modified according to DADA2 requirements using regular-expression-based replacement in Notepad++ (I was too lazy to do the same through awk/sed).
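The header rewrite described above can also be scripted. The following is a minimal Python sketch, not the regular expressions actually used in Notepad++ (those are not given here). It assumes the GTDB headers look like `>ACCESSION d__Domain;p__Phylum;...;g__Genus;s__Genus species [key=value] ...`, and it targets the two header styles DADA2 documents: a semicolon-terminated lineage down to genus for assignTaxonomy, and `>ID Genus species` for addSpecies. The function names are hypothetical, and using the accession as the ID in the species header is an assumption.

```python
import re

# Assumed GTDB header layout (an assumption, not stated in this record):
#   >ACCESSION d__Domain;p__Phylum;c__Class;o__Order;f__Family;g__Genus;s__Genus species [key=value] ...
HEADER_RE = re.compile(r"^>(\S+)\s+([^\[]+?)\s*(?:\[.*)?$")
RANK_PREFIX_RE = re.compile(r"^[dpcofgs]__")


def parse_gtdb_header(header):
    """Split a GTDB-style header into (accession, list of 7 rank names)."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"Unrecognised header: {header!r}")
    accession, lineage = match.groups()
    # Drop the d__/p__/.../s__ prefixes and ignore empty fields.
    ranks = [
        RANK_PREFIX_RE.sub("", field.strip())
        for field in lineage.split(";")
        if field.strip()
    ]
    if len(ranks) != 7:
        raise ValueError(f"Expected 7 ranks, found {len(ranks)}: {header!r}")
    return accession, ranks


def to_genus_header(header):
    """Header in the assignTaxonomy style: levels down to genus, each followed by ';'."""
    _, ranks = parse_gtdb_header(header)
    return ">" + ";".join(ranks[:6]) + ";"


def to_species_header(header):
    """Header in the addSpecies style: '>ID Genus species'."""
    accession, ranks = parse_gtdb_header(header)
    return f">{accession} {ranks[6]}"


if __name__ == "__main__":
    example = (
        ">RS_GCF_000005845.2 d__Bacteria;p__Proteobacteria;"
        "c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;"
        "g__Escherichia;s__Escherichia coli [ssu_len=1542]"
    )
    print(to_genus_header(example))
    print(to_species_header(example))
```

Applied line by line to a FASTA file, only lines starting with `>` would be passed through these functions; sequence lines would be copied unchanged.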

The files GTDBr95-Genus.fna and GTDBr95-Species.fna are to be used with the assignTaxonomy and addSpecies functions of DADA2, respectively.

The prepared files were checked and found compatible with DADA2 v1.14.0 (R v3.6.3).

Files

Files (65.0 MB)

Name Size
md5:66b315ae614502741fcfc40fa2afdbd6 33.0 MB
md5:9a0c92f6cf73173751af9e33d4662680 32.1 MB
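The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal Python sketch using only the standard library. The function names are hypothetical, and because the listing does not show which checksum belongs to which file, the path and expected value passed in are left to the user.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:66b3...'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_checksum("GTDBr95-Genus.fna", "md5:...")` would return True only if the local file hashes to the value copied from the listing.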
