10.5281/zenodo.3631711
https://zenodo.org/records/3631711
oai:zenodo.org:3631711
Lesker, Till Robin
Till Robin
Lesker
0000-0002-3085-6438
Helmholtz Centre for Infection Research
Strowig, Till
Till
Strowig
0000-0003-0185-1459
Helmholtz Centre for Infection Research
iMGMC - integrated Mouse Gut Metagenomic Catalog
Zenodo
2020
mouse gut
metagenome
gene catalog
Metagenome-assembled genomes (MAGs)
2020-01-31
eng
10.1101/528737
10.1016/j.celrep.2020.02.036
10.5281/zenodo.3631710
1
Creative Commons Attribution 4.0 International
Creation of a new mouse gut gene catalog with the following special features:
more diverse samples from different studies (12 vendors, including wild mice, and various gut locations)
clustering-free approach: all-in-one assembly, keeping track of each ORF to contigs to bins
higher taxonomic resolution and more accuracy by using contigs for annotation
16S rRNA gene integration via linkage to bins
expansion by 20,927 MAGs from sample-wise assembly of 871 mouse gut metagenomic samples, representing 1,296 species
Code used: https://github.com/tillrobin/iMGMC
The vast complexity of host-associated microbial ecosystems requires host-specific reference catalogs to survey the functions and diversity of these communities. We generated a comprehensive resource, the integrated mouse gut metagenome catalog (iMGMC), comprising 4.6 million unique genes and 660 metagenome-assembled genomes (MAGs), many of which (485 MAGs, 73%) are linked to reconstructed full-length 16S rRNA gene sequences. iMGMC enables unprecedented coverage and taxonomic resolution of the mouse gut microbiota: more than 92% of MAGs lack species-level representatives in public repositories (<95% ANI match). The integration of MAGs and 16S rRNA gene data allows a more accurate prediction of functional profiles of communities than predictions based on 16S rRNA amplicons alone. Accompanying iMGMC, we provide a set of MAGs representing 1,296 gut bacteria obtained through complementary assembly strategies. We envision that integrated resources such as iMGMC, together with MAG collections, will enhance the resolution of numerous existing and future sequencing-based studies.
Gene catalog:
Description | Size | Filename
Catalog ORF sequences | 1 GB | iMGMC-GeneID.fasta.gz
Full assembly contigs | 1.3 GB | iMGMC-ConitgID.fasta.gz
Mapping File (GeneID->ContigID->BinID) | 30 MB | iMGMC-map-Gene-Contig-Bin.tab.gz
Taxonomic annotations | 40 MB | iMGMC_map_taxonomy.tar.gz
Functional annotations | 36 MB | iMGMC_map_functionality.tar.gz
16S rRNA sequences | 2 MB | iMGMC-16SrRNAgenes.fasta
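The mapping file is what ties each ORF to its contig and bin. A minimal sketch of how it might be read is given here. It assumes the file is tab-separated with three columns in the order GeneID, ContigID, BinID, has no header row, and leaves the BinID column empty for unbinned contigs. None of this layout is confirmed by the record, so check the actual file first. The function names are hypothetical.

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_mapping(lines: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Parse mapping rows into gene->contig and contig->bin lookups.

    Assumed layout (not confirmed by the record): GeneID<TAB>ContigID<TAB>BinID,
    no header row, empty or missing BinID for unbinned contigs.
    """
    gene_to_contig: Dict[str, str] = {}
    contig_to_bin: Dict[str, str] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed rows
        gene, contig = fields[0], fields[1]
        gene_to_contig[gene] = contig
        if len(fields) > 2 and fields[2]:
            contig_to_bin[contig] = fields[2]
    return gene_to_contig, contig_to_bin


def genes_per_bin(gene_to_contig: Dict[str, str],
                  contig_to_bin: Dict[str, str]) -> Dict[str, List[str]]:
    """Group gene IDs by bin, following gene -> contig -> bin."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for gene, contig in gene_to_contig.items():
        bin_id = contig_to_bin.get(contig)
        if bin_id is not None:
            bins[bin_id].append(gene)
    return dict(bins)


def load_mapping(path: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Read the gzip-compressed mapping file from disk in text mode."""
    with gzip.open(path, "rt") as handle:
        return parse_mapping(handle)
```

For example, `load_mapping("iMGMC-map-Gene-Contig-Bin.tab.gz")` followed by `genes_per_bin(...)` would list the catalog genes assigned to each bin. For a file of this size, holding both dictionaries in memory should be feasible, though a streaming approach may be preferable on constrained machines.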
Metagenome-assembled genomes (MAGs):
Description | Size | Filename
integrated MAGs | 0.5 GB | iMGMC_MAGs.tar.gz
representative mMAGs (n=1296) | 1 GB | iMGMC-mMAGs-dereplicated_genomes.tar.gz
representative hqMAGs (n=830) | 0.7 GB | iMGMC-hqMAGs-dereplicated_genomes.tar.gz
all mMAGs (n=20,927) | 15 GB | iMGMC-mMAGs.tar.gz
Annotations by CheckM, dRep-Clustering, GTDB-Tk | 2 MB | MAG-annotation_CheckM_dRep_GTDB-Tk.tar.gz
Functional annotations (hqMAGs by eggNOG mapper v2) | 187 MB | hqMAGs.emapper.annotations.gz
Acknowledgements: TS was funded by the Helmholtz Association (VH-NG-933), by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, STR-1343/1 and STR-1343/2), and by the European Union (StG337251). JFB was funded by the DFG under Germany's Excellence Strategy – EXC 22167-390884018 and by the DFG Collaborative Research Center (CRC) 1182 "Origin and Function of Metaorganisms". TC received funding from the DFG (project CL481/2-1 and grants within Collaborative Research Center 1382).