Dataset Open Access
Morien, Evan; Parfrey, Laura W.
These are species-level taxonomy classification training sets for the assignTaxonomy function from the dada2 R package.
The v132 training set includes every eukaryotic organism from SILVA's v132 database, clustered at 99% similarity.
The v128 training set includes every eukaryotic organism from SILVA's v128 database, clustered at 99% similarity. It additionally includes corrected species labels for the Blastocystis clade and 37 Entamoeba sequences from GenBank that are not present in the original v128 database. These modifications are specific to the v128 training set and are intended to give better species-level assignments for those two clades in mammalian gut microbiome studies.
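As a rough illustration of how a training set like these is typically passed to assignTaxonomy, here is a minimal sketch. The file names and the `seqtab` object are hypothetical placeholders and are not part of this record; substitute the path of the training set file you downloaded and your own sequence table. The column labels of the result come from the `taxLevels` argument, and the number of ranks encoded in these particular files is not stated here, so the labels may need adjusting.

```r
# Minimal sketch, not a definitive workflow.
# Assumptions: dada2 is installed; both file names below are placeholders.
library(dada2)

# Hypothetical path to one of the downloaded training set FASTA files.
ref_fasta <- "path/to/silva_euk_training_set.fa.gz"

# Hypothetical input: a chimera-filtered sequence table saved from an
# earlier dada2 run (a matrix with sequences as column names).
seqtab <- readRDS("path/to/seqtab_nochim.rds")

# Classify each sequence against the training set.
# minBoot = 50 is the function's default bootstrap confidence threshold.
taxa <- assignTaxonomy(seqtab, ref_fasta,
                       minBoot = 50,
                       multithread = FALSE,
                       verbose = TRUE)

# Inspect the first assignments without the long sequence row names.
head(unname(taxa))
```

Raising `minBoot` makes the assignments more conservative, at the cost of more unassigned ranks.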