This dataset bundles all the databases and reference files queried by YAMP (https://github.com/alesssia/YAMP). It was created to help users get started with YAMP and to spare them the hassle of collecting and downloading the data from different sources.
In more detail, this dataset contains:
a FASTA file listing the adapter sequences to be removed in the trimming step. This file is usually provided with the BBMap installation (https://sourceforge.net/projects/bbmap, version 37.68).
two FASTA files describing synthetic contaminants (sequencing_artifacts.fa.gz and phix174_ill.ref.fa.gz). These files are usually provided with the BBMap installation (https://sourceforge.net/projects/bbmap, version 37.68).
a FASTA file provided by Brian Bushnell for removing human contamination (described here: http://seqanswers.com/forums/showthread.php?t=42552). Please note that this file must be indexed before YAMP is run. This can be done with BBMap using the following command: `bbmap.sh -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz`.
the Bowtie2 database file for MetaPhlAn2. This file is usually provided with the MetaPhlAn2 installation (version 2.6.0).
the ChocoPhlAn and UniRef (UniRef50, UniRef90) databases, downloaded directly by HUMAnN2 (version 0.9.9), as explained here: https://bitbucket.org/biobakery/humann2/wiki/Home#markdown-header-5-download-the-databases
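Before indexing the human reference, it can help to confirm that the download unpacked completely. The following is a minimal sketch, not part of YAMP itself: the function name `missing_files` and the variable `EXPECTED_FILES` are hypothetical, and only the three file names stated explicitly in the list above are checked, since the names of the remaining files are not given here.

```shell
# Files named explicitly in the dataset description. The adapter file and
# the MetaPhlAn2/HUMAnN2 databases are not checked because their file
# names are not stated in the description (assumption: extend as needed).
EXPECTED_FILES="sequencing_artifacts.fa.gz phix174_ill.ref.fa.gz hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz"

# Print, one per line, every expected file that is absent from directory $1.
# Prints nothing when all expected files are present.
missing_files() {
    _dir="$1"
    for _f in $EXPECTED_FILES; do
        [ -f "$_dir/$_f" ] || echo "$_f"
    done
}
```

For example, `missing_files /path/to/dataset` (the path is a placeholder) prints nothing when all three files are in place, at which point the `bbmap.sh` indexing command given in the list can be run from that directory.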