Published November 15, 2024 | Version v1
Dataset Open

Greengenes2 training data formatted for DADA2 (Greengenes2 release version 2024.09)

  • 1. North Carolina State University

Description

These DADA2-formatted training FASTA files were derived from the Greengenes2 2024.09 release, available at https://ftp.microbio.me/greengenes_release/2024.09/

The FASTA files were generated with the following commands (dada2 R package version 1.35.4):

# Input files from the Greengenes2 2024.09 release: sequences and taxonomy
path <- "~/tax/GG2/2024_09"
fn <- file.path(path, "5b42d9b6-2f24-4f01-b989-9b4dafca7d5e/data/dna-sequences.fasta")
txfn <- file.path(path, "b7c3e691-ea51-4547-94dd-f79f49e41a36/data/taxonomy.tsv")

# Training set with taxonomy down to genus.
# Note: ":::" calls a function that the dada2 package does not export.
fn.out.gg <- "~/Desktop/gg2_2024_09_toGenus_trainset.fa.gz"
dada2:::makeTaxonomyFasta_GG2(fn, txfn, fn.out.gg, include.species=FALSE, compress=TRUE)

# Training set with taxonomy down to species
fn.out.spc.gg <- "~/Desktop/gg2_2024_09_toSpecies_trainset.fa.gz"
dada2:::makeTaxonomyFasta_GG2(fn, txfn, fn.out.spc.gg, include.species=TRUE, compress=TRUE)
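Training files in this format are typically passed to dada2's assignTaxonomy function. The following is a minimal sketch of that use, not part of the original record: the wrapper name assign_gg2, the file path in the example call, and the seqtab.nochim object are assumptions made for illustration.

```r
# Sketch: classify sequences against a Greengenes2 training file.
# `assign_gg2` is a hypothetical helper, not a dada2 function.
assign_gg2 <- function(seqs, ref_fasta, min_boot = 50, multithread = FALSE) {
  ref_fasta <- path.expand(ref_fasta)
  # Fail early with a clear message if the training file is missing.
  if (!file.exists(ref_fasta)) {
    stop("Training file not found: ", ref_fasta)
  }
  if (!requireNamespace("dada2", quietly = TRUE)) {
    stop("The dada2 package is required but not installed.")
  }
  # assignTaxonomy is the exported dada2 classifier; minBoot is the
  # minimum bootstrap confidence for reporting a taxonomic level.
  dada2::assignTaxonomy(seqs, ref_fasta, minBoot = min_boot,
                        multithread = multithread)
}

# Example call (assumes the genus-level file was saved to this hypothetical
# path and that `seqtab.nochim` is a sequence table from an earlier dada2 run):
# taxa <- assign_gg2(seqtab.nochim, "~/tax/gg2_2024_09_toGenus_trainset.fa.gz")
```

Checking for the file before calling the classifier gives a clearer error than the one raised deeper inside the package when the path is mistyped.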

Files (136.8 MB)

md5:82a2571c9ff5009cbd2f3fded79069ed 67.8 MB
md5:fa78bff5f6c34c826e5c6a87cef49a58 69.1 MB
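After downloading, a file can be checked against the MD5 checksums in the listing. The sketch below uses tools::md5sum from base R; the helper name verify_md5 and the example path are assumptions for illustration. Because the listing does not show which checksum belongs to which file name, the check accepts either value.

```r
# Checksums copied from the file listing of this record.
expected_md5 <- c("82a2571c9ff5009cbd2f3fded79069ed",
                  "fa78bff5f6c34c826e5c6a87cef49a58")

# Hypothetical helper: TRUE if the file's MD5 matches any expected checksum.
verify_md5 <- function(path, expected = expected_md5) {
  path <- path.expand(path)
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  observed <- unname(tools::md5sum(path))
  tolower(observed) %in% tolower(expected)
}

# Example call (hypothetical download location):
# verify_md5("~/Downloads/gg2_2024_09_toGenus_trainset.fa.gz")
```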

Additional details

Funding

Quantitative Metagenomics and the Vaginal Microbiome of Preterm Birth R35GM133745
National Institutes of Health

Software

Repository URL: https://github.com/benjjneb/dada2
Programming language: R
Development Status: Active

References

  • McDonald D, Jiang Y, Balaban M, Cantrell K, Zhu Q, Gonzalez A, Morton JT, Nicolaou G, Parks DH, Karst SM, Albertsen M, et al. Greengenes2 unifies microbial data in a single reference tree. Nature Biotechnology. 2024 May;42(5):715-8.
  • Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJ, Holmes SP. DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods. 2016 Jul;13(7):581-3.