Published June 20, 2024 | Version 1.0
Dataset Open

In silico mock communities for evaluation of taxonomic profilers across prokaryotes and viruses

  • 1. Clinical Microbiomics (Denmark)
  • 2. Technical University of Denmark

Description

In silico mock communities generated with CAMISIM for benchmarking the performance of taxonomic profilers on prokaryotic (50 communities), eukaryotic (30 communities), and viral (10 communities) communities of the human microbiome. Metagenomes were generated using CAMISIM (Fritz et al., 2019), simulating 2.1 Gb of Illumina 2 × 150 bp paired-end reads with the default HiSeq 2500 error profile and a mean insert size of 200 bp. To assess profiling performance across a range of sequencing depths, the 50 prokaryotic in silico metagenomes were also rarefied with seqtk (-s100) to sequencing depths of 20, 5, 2, 1, 0.5, 0.25 and 0.1 million read pairs. Counts are provided for the rarefied metagenomes.
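
The rarefaction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes that `-s100` is the random seed passed to `seqtk sample`, and the helper name `seqtk_commands` and any file names passed to it are hypothetical. Using the same seed for the forward and reverse files keeps the read pairs in sync.

```python
# Depths listed in the description, in millions of read pairs.
DEPTHS_MILLIONS = [20, 5, 2, 1, 0.5, 0.25, 0.1]


def seqtk_commands(forward, reverse, depths_millions=DEPTHS_MILLIONS, seed=100):
    """Build `seqtk sample` argument lists for each depth (hypothetical helper).

    Returns a list of (depth, forward_cmd, reverse_cmd) tuples. seqtk writes
    the subsampled reads to stdout, so the caller must redirect the output.
    """
    commands = []
    for depth in depths_millions:
        n_pairs = int(round(depth * 1_000_000))
        # The same seed on both files selects the same read pairs.
        fwd = ["seqtk", "sample", f"-s{seed}", forward, str(n_pairs)]
        rev = ["seqtk", "sample", f"-s{seed}", reverse, str(n_pairs)]
        commands.append((depth, fwd, rev))
    return commands
```

Each argument list can be passed to `subprocess.run` with `stdout` pointed at an output file; the input and output file names are left to the caller because the description does not specify them.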

Prokaryotic communities
For prokaryotic benchmarking, 10 body site-representative prokaryotic metagenomes were simulated for each of the following five body sites: adult gut, infant gut, oral, skin, and vagina. Genome accession IDs for the prokaryotic species found at each human body site were identified from published literature (Bäckhed et al., 2015; Proctor et al., 2019; Saheb Kashaf et al., 2021).

Adult Gut: pro_gut_adult.zip
Infant Gut: pro_gut_infant.zip
Oral: pro_oral.zip
Skin: pro_skin_1.zip, pro_skin_2.zip, pro_skin_3.zip
Vaginal: pro_vaginal.zip

Downsized counts: 

Eukaryotic communities
30 eukaryotic in silico metagenomes were simulated, each comprising up to 200 randomly sampled genomes from a set of 113 eukaryotic species (see Supplementary Table 2 of the paper) corresponding to the eukaryotic species present in both the CHAMP and MetaPhlAn 4 (Blanco-Míguez et al., 2023) databases.

Eukaryotic data are deposited at doi: 10.5281/zenodo.12090449

Viral communities
10 viral communities were simulated with 95% of the reads originating from bacteria and 5% from phages. Each community consisted of 200 randomly selected bacterial genomes from GTDB with species-level annotation and 200 viral genomes from the Gut Phage Database (GPD, Camarillo-Guerrero et al., 2021).
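
The composition described above can be illustrated with a short sketch. It is not the procedure used to build the dataset: the function name `sketch_viral_community` and its inputs are hypothetical, and the even split of reads within each group is an assumption, since the description gives only the group-level 95%/5% split and not the within-group abundance distribution.

```python
import random


def sketch_viral_community(bacterial_ids, phage_ids, n_per_group=200,
                           bacterial_fraction=0.95, seed=0):
    """Draw genomes and assign read fractions (illustrative only).

    Assumption: reads are split evenly within each group. Only the
    group-level 95%/5% split comes from the dataset description.
    """
    rng = random.Random(seed)
    bacteria = rng.sample(bacterial_ids, n_per_group)
    phages = rng.sample(phage_ids, n_per_group)
    fractions = {g: bacterial_fraction / n_per_group for g in bacteria}
    fractions.update(
        {g: (1 - bacterial_fraction) / n_per_group for g in phages}
    )
    return fractions
```

The returned dictionary maps each selected genome identifier to its share of the simulated reads, so the bacterial shares sum to 0.95 and the phage shares to 0.05.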

Counts: phage_communities_counts.zip
FastQ, forward reads: camisimu_[1-10].fq.1.gz
FastQ, reverse reads: camisimu_[1-10].fq.2.gz
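
For scripted access, the file-name pattern above can be expanded into explicit names. The sketch below assumes that `[1-10]` denotes the integers 1 through 10, one per viral community; the function name `viral_fastq_names` is hypothetical.

```python
def viral_fastq_names(first=1, last=10):
    """Expand camisimu_[1-10].fq.{1,2}.gz into (forward, reverse) name pairs.

    Assumption: the bracketed range is an inclusive integer range.
    """
    return [
        (f"camisimu_{i}.fq.1.gz", f"camisimu_{i}.fq.2.gz")
        for i in range(first, last + 1)
    ]
```

Calling `viral_fastq_names()` yields ten pairs, from `camisimu_1` to `camisimu_10`.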

References

Bäckhed, F., Roswall, J., Peng, Y., Feng, Q., Jia, H., Kovatcheva-Datchary, P., et al. (2015). Dynamics and Stabilization of the Human Gut Microbiome during the First Year of Life. Cell Host Microbe 17, 690–703. doi: 10.1016/J.CHOM.2015.04.004

Blanco-Míguez, A., Beghini, F., Cumbo, F., McIver, L. J., Thompson, K. N., Zolfo, M., et al. (2023). Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nature Biotechnology 41, 1633–1644. doi: 10.1038/s41587-023-01688-w

Camarillo-Guerrero, L. F., Almeida, A., Rangel-Pineros, G., Finn, R. D., and Lawley, T. D. (2021). Massive expansion of human gut bacteriophage diversity. Cell 184, 1098. doi: 10.1016/J.CELL.2021.01.029

Fritz, A., Hofmann, P., Majda, S., Dahms, E., Dröge, J., Fiedler, J., et al. (2019). CAMISIM: Simulating metagenomes and microbial communities. Microbiome 7, 1–12. doi: 10.1186/S40168-019-0633-6

Proctor, L. (2019). Priorities for the next 10 years of human microbiome research. Nature 569, 623–625. doi: 10.1038/d41586-019-01654-0

Saheb Kashaf, S., Proctor, D. M., Deming, C., Saary, P., Hölzer, M., Mullikin, J., et al. (2021). Integrating cultivation and metagenomics for a multi-kingdom view of skin microbiome diversity and functions. Nature Microbiology 7, 169–179. doi: 10.1038/s41564-021-01011-w

Files

Files (186.5 GB)

Checksum  Size
md5:a943603532eeb999dc3c91e81d2d9eaa  5.1 GB
md5:d05eb9e41a8071507e3deb44dc36c43a  5.3 GB
md5:71bd723fa77d137d3598b73d93c953f4  5.1 GB
md5:10c2fcce415e43762c11d008c5a2c4b3  5.3 GB
md5:0abd9096f4d990851ff6081880b42c3e  5.1 GB
md5:292333256cad021985301049bc21806c  5.3 GB
md5:362e6989ae1b3d7f0b6d9a1a3f71d52f  5.1 GB
md5:8d83f3d9a613747bc13f41f425e354f3  5.3 GB
md5:81fa93303af550a08f4e4a1d90d6ddc9  5.1 GB
md5:cec612d2fa37ec2a13b928193642fa32  5.3 GB
md5:b7b4443eebe2d072f7bd187c20031bad  5.1 GB
md5:b5487ef914216f029b2e4b7f5fcbd3f3  5.3 GB
md5:23b5236a59809ff088cb49278069b203  5.1 GB
md5:11aff75d70c92c11621b0159c6ebb835  5.3 GB
md5:a7a83992b06ec396c7975a32021d5fab  5.0 GB
md5:4b849d2cc3903392c4c2ff989eefbe37  5.3 GB
md5:e038153891d9f2e9d2ac50fb1cce9b45  5.1 GB
md5:9af6d99037f22f3186814cf4c1df8c51  5.3 GB
md5:770a88a90e0f7486e4a772908e81d1ec  5.1 GB
md5:324b6ff6e16c65b3ee565c5871d988ac  5.3 GB
md5:09b588d015c518051ef12e3ecd86826a  3.6 MB
md5:d1517396914401460afda7f182961aef  16.4 GB
md5:7a2489393a878d4cb404c22805cab12e  16.6 GB
md5:4b9cfadd22f054216cc6128c0407bde1  16.5 GB
md5:d2e6ec72050caaba34adf2dac3d56018  8.3 GB
md5:97bba38d9d7f9d717fdf3ab534300cb8  6.7 GB
md5:e7f00f272e0a3e2a2e80d6e119bfba7e  1.7 GB
md5:2f8994cc6e528480b3d03cf99bcd5849  16.4 GB