Published November 15, 2024 | Version v1
Dataset Open

RDP taxonomic training data formatted for DADA2 (RDP release 19 - update 2023-08-23)

  • 1. North Carolina State University

Description

These DADA2-formatted training FASTA files were derived from the Ribosomal Database Project's Training Set 19 and the 2023-08-23 release of the RDP database. Source data: https://sourceforge.net/projects/rdp-classifier/files/RDP_Classifier_TrainingData/
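As a rough illustration of how a DADA2-formatted training file is typically consumed, the sketch below wraps dada2's assignTaxonomy(). The wrapper name classify_with_rdp, its min_boot argument, and the download path in the usage comment are hypothetical and only serve the example; it assumes the dada2 package is installed and that a training file from this record has been saved locally.

```r
# Minimal sketch: classify sequences against one of the training files.
# ASSUMPTIONS: dada2 is installed; `ref_fasta` points to a local copy of a
# training file from this record; the wrapper itself is hypothetical.
classify_with_rdp <- function(seqs, ref_fasta, min_boot = 50) {
  ref_fasta <- path.expand(ref_fasta)
  if (!file.exists(ref_fasta)) {
    stop("Reference FASTA not found: ", ref_fasta)
  }
  if (!requireNamespace("dada2", quietly = TRUE)) {
    stop("The dada2 package is required for taxonomic assignment.")
  }
  # minBoot is the minimum bootstrap confidence needed to report a rank.
  dada2::assignTaxonomy(seqs, ref_fasta, minBoot = min_boot)
}

# Usage (hypothetical download location; `seqs` is a character vector of
# amplicon sequences):
# taxa <- classify_with_rdp(seqs, "~/Downloads/rdp_19_toGenus_trainset.fa.gz")
```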

These FASTA files were generated with the following commands, using version 1.35.4 of the dada2 R package:

## RDP data: https://sourceforge.net/projects/rdp-classifier/files/RDP_Classifier_TrainingData/

# Directory holding the RDP training set 19 input files
path <- "~/tax/rdp/v19"

# Genus-level training set (species rank omitted)
fn.out.rdp <- "~/Desktop/rdp_19_toGenus_trainset.fa.gz"
dada2:::makeTaxonomyFasta_RDP(file.path(path, "trainset19_072023_speciesrank.fa"),
                             file.path(path, "trainset19_db_taxid.txt"),
                             fn.out.rdp, include.species=FALSE,
                             compress=TRUE)

# Species-level training set (species rank included)
fn.out.spc.rdp <- "~/Desktop/rdp_19_toSpecies_trainset.fa.gz"
dada2:::makeTaxonomyFasta_RDP(file.path(path, "trainset19_072023_speciesrank.fa"),
                             file.path(path, "trainset19_db_taxid.txt"),
                             fn.out.spc.rdp, include.species=TRUE,
                             compress=TRUE)
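The two commands differ only in include.species, so one quick sanity check is to compare how many taxonomic ranks the headers of each output file carry. The sketch below uses base R only. The helper names read_fasta_headers and count_ranks are hypothetical, and it assumes the headers are semicolon-delimited taxonomy strings of the form "Level1;Level2;...;", which is the layout assignTaxonomy() expects.

```r
# Read the FASTA header lines (those starting with ">") from a plain or
# gzip-compressed file; gzfile() can read both.
read_fasta_headers <- function(path) {
  con <- gzfile(path.expand(path), open = "rt")
  on.exit(close(con))
  lines <- readLines(con)
  sub("^>", "", lines[startsWith(lines, ">")])
}

# Count the semicolon-delimited ranks in each header.
# ASSUMPTION: headers look like "Level1;Level2;...;" (trailing semicolon).
count_ranks <- function(headers) {
  lengths(strsplit(headers, ";", fixed = TRUE))
}

# Usage with the output paths defined in the commands above:
# table(count_ranks(read_fasta_headers(fn.out.rdp)))
# table(count_ranks(read_fasta_headers(fn.out.spc.rdp)))
```

Under that assumption, headers in the species-level file would be expected to reach one rank deeper than those in the genus-level file.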

Files

Files (12.8 MB)

Checksum Size
md5:390b8a359c45648adf538e72a1ee7e28 6.3 MB
md5:951c6d90f1bcc893411f0624b34663f5 6.5 MB
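A downloaded file can be checked against the MD5 checksums listed above. The sketch below uses tools::md5sum() from base R. The helper name verify_md5 and the path in the usage comment are hypothetical. Because the listing here does not say which checksum belongs to which file, the check only tests membership in the set of listed values.

```r
# MD5 checksums as listed on this record.
expected_md5 <- c("390b8a359c45648adf538e72a1ee7e28",
                  "951c6d90f1bcc893411f0624b34663f5")

# Returns TRUE if the file's MD5 digest matches one of the expected values.
verify_md5 <- function(path, expected = expected_md5) {
  path <- path.expand(path)
  stopifnot(file.exists(path))
  digest <- unname(tools::md5sum(path))
  digest %in% expected
}

# Usage (hypothetical download location):
# verify_md5("~/Downloads/rdp_19_toGenus_trainset.fa.gz")
```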

Additional details

Funding

National Institutes of Health
Quantitative Metagenomics and the Vaginal Microbiome of Preterm Birth R35GM133745

Software

Repository URL: https://github.com/benjjneb/dada2
Programming language: R
Development status: Active

References

  • Wang Q, Garrity GM, Tiedje JM, Cole JR. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology. 2007 Aug 15;73(16):5261-7.
  • Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJ, Holmes SP. DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods. 2016 Jul;13(7):581-3.