Greengenes2 training data formatted for DADA2 (Greengenes2 release version 2024.09)
Description
These DADA2-formatted training FASTA files were derived from the Greengenes2 2024.09 release, available at https://ftp.microbio.me/greengenes_release/2024.09/
The FASTA files were generated with the following commands. They use makeTaxonomyFasta_GG2 from version 1.35.4 of the dada2 R package. This function is not exported, hence the dada2::: prefix.
path <- "~/tax/GG2/2024_09"  # local copy of the 2024.09 release
# Sequences and taxonomy table, unzipped from the release's QIIME 2 artifacts (.qza)
fn <- file.path(path, "5b42d9b6-2f24-4f01-b989-9b4dafca7d5e/data/dna-sequences.fasta")
txfn <- file.path(path, "b7c3e691-ea51-4547-94dd-f79f49e41a36/data/taxonomy.tsv")
# Genus-level training set (no species rank)
fn.out.gg <- "~/Desktop/gg2_2024_09_toGenus_trainset.fa.gz"
dada2:::makeTaxonomyFasta_GG2(fn, txfn, fn.out.gg, include.species=FALSE, compress=TRUE)
# Species-level training set (species rank included)
fn.out.spc.gg <- "~/Desktop/gg2_2024_09_toSpecies_trainset.fa.gz"
dada2:::makeTaxonomyFasta_GG2(fn, txfn, fn.out.spc.gg, include.species=TRUE, compress=TRUE)
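The two commands differ only in include.species, so each header in the species-level file is expected to carry one more rank than the corresponding header in the genus-level file. You can check this after downloading by reading the first few header lines. The minimal base-R sketch below assumes only that each header holds a semicolon-separated taxonomy string, which is the refFasta format that dada2's assignTaxonomy expects. The name peek_taxonomy is hypothetical and is not part of dada2.

```r
# Hypothetical helper (not part of dada2): show the first n taxonomy strings in a
# gzipped training FASTA and count the ranks in each one.
# Assumption: headers are ">" followed by semicolon-separated rank names.
peek_taxonomy <- function(fasta_gz, n = 5) {
  con <- gzfile(fasta_gz, open = "rt")
  on.exit(close(con))
  headers <- character(0)
  # Read one line at a time until n header lines (starting with ">") are found
  while (length(headers) < n) {
    line <- readLines(con, n = 1)
    if (length(line) == 0) break  # reached end of file
    if (startsWith(line, ">")) headers <- c(headers, line)
  }
  taxa <- sub("^>", "", headers)
  # Count only non-empty semicolon-separated fields, so empty ranks are skipped
  n_ranks <- vapply(strsplit(taxa, ";", fixed = TRUE),
                    function(x) sum(nzchar(x)), integer(1))
  data.frame(taxonomy = taxa, n_ranks = n_ranks, stringsAsFactors = FALSE)
}
```

For example, peek_taxonomy("~/Desktop/gg2_2024_09_toSpecies_trainset.fa.gz") returns the first five taxonomy strings and their rank counts. Either file can then be passed directly as the refFasta argument of dada2::assignTaxonomy, which accepts compressed FASTA.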
Files (136.8 MB)
MD5 checksum | Size |
---|---|
82a2571c9ff5009cbd2f3fded79069ed | 67.8 MB |
fa78bff5f6c34c826e5c6a87cef49a58 | 69.1 MB |
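You can use the MD5 values above to confirm that a download is complete and uncorrupted. The minimal sketch below relies on tools::md5sum(), which ships with base R. The helper name verify_md5 is hypothetical, and the file paths are whatever names you saved the downloads under.

```r
# Hypothetical helper: compare files against expected MD5 checksums.
# tools::md5sum() returns NA for unreadable files, which is reported as ok = FALSE.
verify_md5 <- function(files, expected) {
  stopifnot(length(files) == length(expected))
  observed <- unname(tools::md5sum(path.expand(files)))
  ok <- !is.na(observed) & tolower(observed) == tolower(expected)
  data.frame(file = files, expected = expected, observed = observed,
             ok = ok, stringsAsFactors = FALSE)
}
```

For example, verify_md5("~/Downloads/downloaded_trainset.fa.gz", "82a2571c9ff5009cbd2f3fded79069ed") uses a placeholder path. It returns a one-row data frame whose ok column is TRUE only if the file's checksum matches.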
Additional details
Funding
- Quantitative Metagenomics and the Vaginal Microbiome of Preterm Birth (R35GM133745), National Institutes of Health
Software
- Repository URL: https://github.com/benjjneb/dada2
- Programming language: R
- Development status: Active
References
- McDonald D, Jiang Y, Balaban M, Cantrell K, Zhu Q, Gonzalez A, Morton JT, Nicolaou G, Parks DH, Karst SM, Albertsen M, et al. Greengenes2 unifies microbial data in a single reference tree. Nature Biotechnology. 2024 May;42(5):715-8.
- Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJ, Holmes SP. DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods. 2016 Jul;13(7):581-3.