Date: 15 Jan 2019
Author: Alessio Milanese (milanese@embl.de)

This directory contains the simulated metagenomic samples that we used for
benchmarking the mOTUs v2 tool.
Find additional information at http://motu-tool.org


CONTENT OF THE DIRECTORY -------------------------------------------------------

- sampleX.tar.gz, metagenomic samples (10 in total, paired-end reads in two
  files for every tar.gz);
- README.txt, this file
- gold_standard.tar.gz

* gold_standard.tar.gz:
  Contains 2 directories and 1 file:
  - relative_abundances (DIRECTORY)
    contains 10 files that provide the relative abundance of the MAGs used for
    the simulation (i.e. the gold standard)
  - profiles (DIRECTORY)
    contains the profiles of the 10 samples with 4 profilers
      - mOTUs1
      - mOTUs2 version 2.0.0
      - MetaPhlAn version 2.6.0 (19 August 2016)
        [biobakery-metaphlan2-c3fb65390c21]
      - Kraken (+ Bracken) v1.0.0 and
        minikraken_20171101_8GB_dustmasked database
  - match_MAG_to_phylotypes (FILE)
    contains the map between the MAGs and the phylotypes measured by the
    metagenomic profilers

MAG = metagenome-assembled genomes


SIMULATION DESCRIPTION ---------------------------------------------------------

To be able to assess taxonomic quantification accuracy, ten human gut
metagenomic samples were simulated using 19,302 Human gut MAGs
(see http://motu-tool.org).

Metagenomic read data were simulated using BEAR (Johnson, S., et al., A better
sequence-read simulator program for metagenomics. BMC Bioinformatics, 2014):
first, we simulated 100M inserts (2 x 100M paired-end reads of 150 nt length)
with 350 nt insert distance (standard deviation: 30) using generate_reads.py.
Second, trim_reads.pl with default parameters was used to add the quality
scores, introduce errors and shorten the reads.

Every sample was simulated based on actual mOTU relative abundances.
For each simulated sample, we randomly selected 50 MAGs with and 50 MAGs
without a representative reference genome sequence.
The 50 MAGs that map to a reference genome were chosen to be represented in the
Kraken, MetaPhlAn2, or ref-mOTU database, while the other 50 MAGs used for
simulation were sampled from the MAGs lacking any such assignment (which does
not preclude these MAGs from mapping to meta-mOTUs).

We devised an impartial way to map the MAGs to the phylotypes that can be
profiled by the four studied metagenomic profilers.
The idea is to simulate a metagenomic sample using reads from one MAG, profile
this sample, and see how the tool classifies those reads (and hence the MAG).
To do this, we also simulate reads from another microbe and evaluate which
other phylotype has a similar relative abundance.
We selected Buchnera aphidicola for this because it is present in the databases
of all profilers and should not appear in human gut samples.

We implemented it in the following way: for every MAG x, we simulated 5M reads
from B. aphidicola and 5M reads from x. No errors were simulated, and the reads
were profiled with the four methods. If there is a phylotype that has at least
half of the relative abundance of B. aphidicola, then we say that the MAG maps
to that phylotype.

See Supplementary Figure 8 of the paper in Nature Communications for more
information:
Milanese et al. Microbial abundance, activity and population genomic profiling
with mOTUs2. Nature Communications (2019)
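
EXAMPLE: MAG-TO-PHYLOTYPE RULE (SKETCH) ----------------------------------------

The "at least half of the relative abundance of B. aphidicola" rule described
above can be illustrated with a short Python sketch. This is not the script
used to produce match_MAG_to_phylotypes. The function name, the dictionary
layout of a profile, and the phylotype labels are hypothetical. The text does
not say what happens when several phylotypes pass the threshold, so returning
the most abundant one is an assumption made only for this example.

```python
from typing import Dict, Optional


def assign_phylotype(
    profile: Dict[str, float],
    spike_in: str = "Buchnera aphidicola",
    min_fraction: float = 0.5,
) -> Optional[str]:
    """Return the phylotype a MAG maps to, or None if no phylotype qualifies.

    `profile` maps phylotype names to relative abundances for one spiked
    sample (reads from one MAG plus reads from the spike-in organism).
    """
    spike_abundance = profile.get(spike_in, 0.0)
    if spike_abundance <= 0.0:
        # Without a detected spike-in there is no reference abundance.
        return None

    # Keep every non-spike-in phylotype reaching the threshold.
    threshold = min_fraction * spike_abundance
    candidates = {
        name: abundance
        for name, abundance in profile.items()
        if name != spike_in and abundance >= threshold
    }
    if not candidates:
        return None

    # Assumption: if several phylotypes qualify, take the most abundant one.
    return max(candidates, key=candidates.get)


# Hypothetical profile of one spiked sample.
example_profile = {
    "Buchnera aphidicola": 0.50,
    "phylotype_A": 0.45,
    "phylotype_B": 0.05,
}
print(assign_phylotype(example_profile))
```

In this made-up profile only phylotype_A reaches half of the spike-in
abundance, so the MAG would be assigned to it. A MAG whose reads are not
classified into any phylotype at that level yields None, i.e. no mapping.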