Dataset Open Access

Silva 138.1 prokaryotic SSU taxonomic training data formatted for DADA2

McLaren, Michael R.; Callahan, Benjamin J.

These training FASTA files are derived from release 138.1 of the Silva project and formatted for use with DADA2. They are intended for classifying prokaryotic 16S rRNA sequencing data and are not appropriate for classifying eukaryotic ASVs (amplicon sequence variants).
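
DADA2's training files for `assignTaxonomy` carry the taxonomy of each reference sequence in its FASTA header, with ranks separated by semicolons. The Python sketch below reads such a gzipped FASTA and splits each header into ranks. The rank names (Kingdom through Genus, plus Species in the wSpecies file) and the trailing semicolon are assumptions about the layout, so inspect a few headers of the actual files before relying on it. The species-assignment file uses a different header layout and is not handled here.

```python
import gzip
import io
from typing import Dict, Iterator, List, Tuple

# Assumed rank order for the DADA2-formatted Silva training files; the
# wSpecies variant is assumed to append a Species level after Genus.
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def read_fasta(handle) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a text-mode FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def parse_taxonomy(header: str) -> Dict[str, str]:
    """Split a semicolon-delimited taxonomy header into {rank: name}."""
    names = [n for n in header.split(";") if n]  # drop the empty trailing field
    return dict(zip(RANKS, names))


def read_training_set(path: str) -> List[Tuple[Dict[str, str], str]]:
    """Parse a gzipped DADA2-style training FASTA into (taxonomy, sequence) pairs."""
    with gzip.open(path, "rt") as handle:
        return [(parse_taxonomy(h), s) for h, s in read_fasta(handle)]


if __name__ == "__main__":
    # Illustrative two-record FASTA; real headers come from the Silva files.
    demo = io.StringIO(">Bacteria;Firmicutes;Bacilli;\nACGTACGT\n>Archaea;\nGGCC\n")
    for header, seq in read_fasta(demo):
        print(parse_taxonomy(header), len(seq))
```

For a real file, `read_training_set("silva_nr99_v138.1_train_set.fa.gz")` loads every record into memory; for large files, iterating over `read_fasta` directly keeps memory use low.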

See https://benjjneb.github.io/dada2/training.html for information about DADA2 reference databases, and https://www.arb-silva.de/documentation/release-138.1/ for database and citation information for Silva 138.1. The Silva 138.1 database is licensed under Creative Commons Attribution 4.0 (CC-BY 4.0); see the file "SILVA_LICENSE.txt". These FASTA database files were generated and checked for consistency using the R Markdown documents in the silva-138.1 folder of https://zenodo.org/record/4587946.

If you use these files, please cite one or both of the Silva references below (or those listed at the Silva link above) and the DADA2 paper (reference below). I also recommend citing or linking to the Zenodo record for this specific version in your Methods section or published source code, so that the exact taxonomic database files used in your analysis are documented.

NOTE: These database files have a known problem affecting 3 of 895 families and 59 of 3,936 genera. See https://github.com/mikemc/dada2-reference-databases/blob/main/silva-138.1/v1/bad-taxa.csv for a list of affected taxa and https://github.com/benjjneb/dada2/issues/1293 for more information.
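
One way to act on this note is to flag any ASV whose assigned family or genus appears in the list of affected taxa. The Python sketch below assumes the affected names have already been loaded from bad-taxa.csv into two sets (the CSV's column layout is not reproduced here), and that the per-ASV assignments have been exported from the R `assignTaxonomy` output into dicts keyed by rank name. The function name `flag_affected` and the placeholder taxon names in the usage note are hypothetical.

```python
from typing import Dict, List, Mapping, Set


def flag_affected(
    assignments: Mapping[str, Mapping[str, str]],
    bad_families: Set[str],
    bad_genera: Set[str],
) -> Dict[str, List[str]]:
    """Return {asv_id: [reasons]} for ASVs assigned to an affected family or genus.

    ASVs without a Family or Genus assignment (missing keys) are simply not
    matched at that rank.
    """
    flagged: Dict[str, List[str]] = {}
    for asv_id, taxonomy in assignments.items():
        reasons = []
        family = taxonomy.get("Family")
        genus = taxonomy.get("Genus")
        if family in bad_families:
            reasons.append(f"Family={family}")
        if genus in bad_genera:
            reasons.append(f"Genus={genus}")
        if reasons:
            flagged[asv_id] = reasons
    return flagged
```

For example, with placeholder names, `flag_affected({"ASV1": {"Family": "FamA", "Genus": "GenX"}}, {"FamA"}, {"GenX"})` reports both ranks for ASV1. Flagged ASVs can then be reviewed by hand or reported at a higher rank.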

Files (354.3 MB)

  • SILVA_LICENSE.txt (451 bytes; md5: 79a80df3eb578830f29d004f0d7ea107)

  • silva_nr99_v138.1_train_set.fa.gz (137.3 MB; md5: 6b41db7139834c71171f8ce5b5918fc6)

  • silva_nr99_v138.1_wSpecies_train_set.fa.gz (138.3 MB; md5: ba13ab369161e0cb85df7e0ee3a4182e)

  • silva_species_assignment_v138.1.fa.gz (78.7 MB; md5: f21c2d97c79ff07c17949a9622371a4c)
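
Each file is listed with its MD5 checksum. A minimal Python sketch for checking downloads against those values, using only the standard library (`hashlib`, `pathlib`), follows. The function names are illustrative, and it assumes the four files sit together in one directory under their original names.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional

# Checksums as listed on this record.
EXPECTED_MD5 = {
    "SILVA_LICENSE.txt": "79a80df3eb578830f29d004f0d7ea107",
    "silva_nr99_v138.1_train_set.fa.gz": "6b41db7139834c71171f8ce5b5918fc6",
    "silva_nr99_v138.1_wSpecies_train_set.fa.gz": "ba13ab369161e0cb85df7e0ee3a4182e",
    "silva_species_assignment_v138.1.fa.gz": "f21c2d97c79ff07c17949a9622371a4c",
}


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory: Path) -> Dict[str, Optional[bool]]:
    """Map each expected file name to True/False (checksum match) or None (file missing)."""
    results: Dict[str, Optional[bool]] = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5sum(path) == expected if path.exists() else None
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads(Path(".")).items():
        print(f"{name}: {'missing' if ok is None else 'OK' if ok else 'MISMATCH'}")
```

Reading in chunks keeps memory use constant even for the ~138 MB training files.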

References

  • Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO. 2013. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res 41(D1):D590–D596.

  • Yilmaz P, Parfrey LW, Yarza P, Gerken J, Pruesse E, Quast C, Schweer T, Peplies J, Ludwig W, Glöckner FO. 2014. The SILVA and "All-species Living Tree Project (LTP)" taxonomic frameworks. Nucleic Acids Res 42:D643–D648.

  • Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP. 2016. DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods 13:581–583. doi:10.1038/nmeth.3869
