DADA2 formatted taxonomy from GTDBr95
Description
DADA2 requires taxonomy files in a specific format. This dataset provides the files required to assign taxonomy using the Genome Taxonomy Database (GTDB).
GTDB release 95.0, released on July 17th, 2020, was the latest version of the database when these files were prepared. GTDB-r95 contains 30,238 bacterial and 1,672 archaeal species clusters, which span 194,600 genomes. FASTA files of 16S rRNA gene sequences identified within the representative genomes of Bacteria (21,965) and Archaea (1,126) were downloaded from this link on 24 December 2020. The link provides resources for GTDB species representatives only, which limits the set to one sequence per organism. The sequence headers were modified according to DADA2 requirements using regular-expression-based replace in Notepad++ (I was too lazy to do the same through awk/sed).
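The header rewrite described above can also be scripted. The following is a minimal sketch in Python, not the procedure actually used for these files (that was a manual regex replace in Notepad++). It assumes the GTDB header has the form `>ID d__...;p__...;...;s__Genus species [annotations]`; the example header and the function names are illustrative only. The target formats are the ones DADA2 documents for its training files: a semicolon-terminated lineage for assignTaxonomy, and `>ID Genus species` for addSpecies.

```python
import re


def parse_gtdb_header(header):
    """Split an assumed GTDB-style header into (sequence_id, [rank names])."""
    text = header.lstrip(">").strip()
    seq_id, _, rest = text.partition(" ")
    # Drop trailing bracketed annotations such as [location=...] (assumed layout).
    lineage = re.sub(r"\s*\[.*$", "", rest)
    # Remove rank prefixes (d__, p__, c__, o__, f__, g__, s__) from each level.
    names = [re.sub(r"^[a-z]__", "", part.strip()) for part in lineage.split(";")]
    return seq_id, names


def to_genus_header(header):
    """Build an assignTaxonomy-style header: domain to genus, ';'-terminated."""
    _, names = parse_gtdb_header(header)
    return ">" + ";".join(names[:6]) + ";"


def to_species_header(header):
    """Build an addSpecies-style header: '>ID Genus species'."""
    seq_id, names = parse_gtdb_header(header)
    # GTDB species names already include the genus, so they are used as-is.
    return ">" + seq_id + " " + names[6]


# Illustrative header only; not copied from the actual GTDB file.
example = (
    ">RS_GCF_000010525.1~NC_009937.1 d__Bacteria;p__Proteobacteria;"
    "c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;"
    "g__Azorhizobium;s__Azorhizobium caulinodans [location=1..1400] [ssu_len=1400]"
)
print(to_genus_header(example))
print(to_species_header(example))
```

Headers that deviate from the assumed layout (for example, a missing species level) would need extra handling before this sketch could be applied to a full file.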
The files GTDBr95-Genus.fna and GTDBr95-Species.fna are to be used with the DADA2 functions assignTaxonomy and addSpecies, respectively.
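Because the two files follow different header conventions, passing the wrong one to a function is an easy mistake. The sketch below is a rough sanity check in Python and assumes the header formats DADA2 documents for its training files: a semicolon-terminated lineage for assignTaxonomy and `>ID Genus species` for addSpecies. The function names are hypothetical and the check is deliberately loose.

```python
def is_assign_taxonomy_header(line):
    """True for headers like '>Level1;Level2;...;' (lineage ending in ';')."""
    if not line.startswith(">"):
        return False
    body = line[1:].strip()
    # Require a trailing ';' and no empty levels before it.
    return body.endswith(";") and all(body[:-1].split(";"))


def is_add_species_header(line):
    """True for headers like '>ID Genus species' (at least three tokens)."""
    if not line.startswith(">"):
        return False
    body = line[1:].strip()
    return not body.endswith(";") and len(body.split()) >= 3


# Illustrative headers only.
genus_line = ">Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Xanthobacteraceae;Azorhizobium;"
species_line = ">RS_GCF_000010525.1~NC_009937.1 Azorhizobium caulinodans"
print(is_assign_taxonomy_header(genus_line), is_add_species_header(species_line))
```

Running both checks on the first header line of each file before starting a long taxonomy assignment is a cheap way to confirm the files were not swapped.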
The prepared files were tested and found compatible with DADA2 v1.14.0 (R v3.6.3).
Files
Total size: 65.0 MB

| md5 checksum | Size |
|---|---|
| md5:66b315ae614502741fcfc40fa2afdbd6 | 33.0 MB |
| md5:9a0c92f6cf73173751af9e33d4662680 | 32.1 MB |
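The md5 checksums listed above can be used to confirm that a download is intact. Here is a minimal sketch using Python's standard `hashlib` module. The listing does not say which checksum belongs to which file, so the local path in the usage note is a placeholder and the comparison is made against the set of both listed values.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large FASTA files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksums as listed for this dataset.
LISTED_MD5 = {
    "66b315ae614502741fcfc40fa2afdbd6",
    "9a0c92f6cf73173751af9e33d4662680",
}


def matches_listed_checksum(path):
    """True if the file's md5 equals one of the listed checksums."""
    return md5_of_file(path) in LISTED_MD5
```

For example, `matches_listed_checksum("GTDBr95-Genus.fna")` (path assumed to point at the downloaded file) should return `True` for an uncorrupted download.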
Additional details
References
- Parks, D.H., et al. (2020). "A complete domain-to-species taxonomy for Bacteria and Archaea." Nature Biotechnology, https://doi.org/10.1038/s41587-020-0501-8.
- Callahan, B.J. et al. (2016). "DADA2: High-resolution sample inference from Illumina amplicon data." Nature Methods, https://doi.org/10.1038/nmeth.3869.