There is a newer version of the record available.

Published August 29, 2025 | Version v9
Dataset Open

Updated Metagenomic Species Pan-genomes (MSPs) of the human gastrointestinal microbiota

Description

Gene catalog construction


The methodology for creating the IGC2 catalog is described in the original papers: Li et al., 2014 and Wen et al., 2017.

MSP creation


Reads from publicly available human gut metagenomes were aligned against the IGC2 catalog with Meteor to produce a raw gene abundance table (10.4M genes quantified in >2000 samples). Co-abundant genes were then binned into 1,989 Metagenomic Species Pan-genomes (MSPs, i.e. clusters of co-abundant genes that likely belong to the same microbial species) using MSPminer.
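To illustrate what "co-abundant" means here, the sketch below flags genes whose abundance profile across samples tracks that of a seed gene. It is not MSPminer's algorithm: plain Pearson correlation stands in for whatever measure the tool actually uses, and the function names, the 0.9 threshold and the toy profiles are assumptions made for this example only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-abundance signal
    return cov / sqrt(var_x * var_y)


def co_abundant(seed, profiles, min_r=0.9):
    """Return the ids of genes whose profile correlates with the seed gene.

    profiles maps a gene id to its abundance in each sample (same sample order).
    min_r is a hypothetical threshold chosen for illustration.
    """
    reference = profiles[seed]
    return [
        gene
        for gene, profile in profiles.items()
        if gene != seed and pearson(reference, profile) >= min_r
    ]


# Toy abundance table: three genes quantified in four samples.
toy_profiles = {
    "gene_a": [1, 2, 3, 4],
    "gene_b": [2, 4, 6, 8],   # proportional to gene_a
    "gene_c": [4, 1, 3, 2],   # unrelated to gene_a
}
partners = co_abundant("gene_a", toy_profiles)
```

Genes that belong to the same genome are expected to rise and fall together across samples, which is the signal a binning tool exploits at the scale of millions of genes.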

MSP taxonomic annotation


Taxonomic annotation of the MSPs was performed by aligning MSP core and accessory genes against the representative genomes of the Genome Taxonomy Database (GTDB r207) using blastn (task = megablast, word_size = 16). The 20 best hits for each gene were kept (-max_target_seqs 20). Using an in-house pipeline, a species-level assignment was given if more than 50% of the genes matched the representative genome of a given species, with a mean identity ≥ 95% and a mean gene length coverage ≥ 90%. The remaining MSPs were assigned to a higher taxonomic level (genus to superkingdom) if more than 50% of their genes had the same annotation.
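The decision rule above can be sketched as follows. The in-house pipeline is not published with this record, so everything here is an assumption: the function names, the input layout, keeping only the best hit per gene and species, computing the means over matching genes only, and the tie-breaking between candidate species. Only the thresholds (more than 50% of genes, mean identity ≥ 95%, mean coverage ≥ 90%) come from the description.

```python
from collections import Counter, defaultdict

# Rank order assumed for the fallback, from broadest to most specific.
RANKS = ("superkingdom", "phylum", "class", "order", "family", "genus")


def assign_species(n_genes, hits, min_fraction=0.5,
                   min_identity=95.0, min_coverage=90.0):
    """Return a species name, or None if no species passes the thresholds.

    hits: iterable of (gene_id, species, percent_identity, percent_coverage).
    n_genes: total number of genes in the MSP, including genes without hits.
    """
    best = {}  # (gene, species) -> (identity, coverage) of the best hit
    for gene, species, identity, coverage in hits:
        key = (gene, species)
        if key not in best or (identity, coverage) > best[key]:
            best[key] = (identity, coverage)

    per_species = defaultdict(list)
    for (_gene, species), scores in best.items():
        per_species[species].append(scores)

    candidates = []
    for species, scores in per_species.items():
        fraction = len(scores) / n_genes
        mean_identity = sum(s[0] for s in scores) / len(scores)
        mean_coverage = sum(s[1] for s in scores) / len(scores)
        if (fraction > min_fraction and mean_identity >= min_identity
                and mean_coverage >= min_coverage):
            candidates.append((fraction, mean_identity, species))
    # Assumed tie-break: prefer the species covering the most genes.
    return max(candidates)[2] if candidates else None


def assign_higher_rank(n_genes, gene_lineages, min_fraction=0.5):
    """Return (rank, taxon) for the most specific rank shared by a majority.

    gene_lineages: gene_id -> tuple of taxon names ordered as RANKS.
    """
    for depth in range(len(RANKS) - 1, -1, -1):
        counts = Counter(
            lineage[depth]
            for lineage in gene_lineages.values()
            if len(lineage) > depth
        )
        if counts:
            taxon, count = counts.most_common(1)[0]
            if count / n_genes > min_fraction:
                return RANKS[depth], taxon
    return None
```

A caller would try `assign_species` first and fall back to `assign_higher_rank` only when it returns `None`, which mirrors the two-stage wording of the description.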

Construction of the phylogenetic tree


Thirty-nine universal phylogenetic marker genes were extracted from the MSPs with fetchMGs. The markers were then aligned separately with MUSCLE. The alignments were merged and trimmed with trimAl (parameters: -automated1). Finally, the phylogenetic tree was computed with FastTreeMP (parameters: -gamma -pseudo -spr -mlacc 3 -slownni).
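The record does not say how the per-marker alignments were merged. A common approach is to concatenate them into a single supermatrix, padding with gaps when an MSP lacks a marker; the sketch below assumes that approach, and the function name and input layout are hypothetical.

```python
def merge_alignments(alignments):
    """Concatenate per-marker alignments into one sequence per taxon.

    alignments: list of dicts (one per marker) mapping taxon -> aligned
    sequence. A taxon missing from a marker is padded with gap characters
    so that every merged sequence has the same length.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    merged = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must share a length")
        width = lengths.pop()
        for taxon in taxa:
            merged[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in merged.items()}


# Toy example with two markers; the second one is missing for msp_0002.
marker_1 = {"msp_0001": "MK-L", "msp_0002": "MKAL"}
marker_2 = {"msp_0001": "GG"}
supermatrix = merge_alignments([marker_1, marker_2])
```

Padding with gaps keeps every MSP in the tree even when a marker is absent, at the cost of leaving less information for that MSP after trimming.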

Mapping rate distribution across public cohorts

We generated mapping rate distribution plots using Meteor2 (default parameters), comparing two groups of cohorts: PRJEB1786, PRJEB5224, PRJEB6337 and PRJNA422434 (cohorts used in catalogue assembly) versus PRJEB11532, PRJEB33500, PRJEB37249 and PRJNA834801 (independent cohorts not used in assembly).
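As a rough illustration of the comparison, the sketch below computes a per-sample mapping rate and summarises it for each cohort group. It assumes the mapping rate is the fraction of reads that map to the catalogue; the exact definition used by Meteor2, the function names, the input layout and the group labels are all assumptions, and no real read counts are reproduced here.

```python
from collections import defaultdict
from statistics import median


def mapping_rate(mapped_reads, total_reads):
    """Fraction of reads mapped to the catalogue (assumed definition)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


def summarize_by_group(samples, project_group):
    """Summarise mapping rates per cohort group.

    samples: iterable of (project_accession, mapped_reads, total_reads).
    project_group: project accession -> group label, e.g. "assembly" or
    "independent" (hypothetical labels).
    """
    rates = defaultdict(list)
    for project, mapped, total in samples:
        rates[project_group[project]].append(mapping_rate(mapped, total))
    return {
        group: {
            "n": len(values),
            "min": min(values),
            "median": median(values),
            "max": max(values),
        }
        for group, values in rates.items()
    }
```

If the independent cohorts show a distribution close to that of the assembly cohorts, it suggests the catalogue generalises beyond the samples it was built from.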

Files

catalogue_mapping_rate_hs_10_4_gut.pdf

Files (13.6 GB)

Checksum and size of each file:
md5:e2e2497f82d402a29dc945d0c0153e38 (11.4 kB)
md5:9a6655b80c2fe9df91241037ca8c8788 (12.7 GB)
md5:5513514d0123b3580b5b7a35b61ca65e (866.4 MB)
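After downloading, a file can be checked against its listed MD5 checksum. The sketch below uses Python's standard hashlib module and reads the file in chunks, so the 12.7 GB file does not need to fit in memory. The function names are hypothetical, and the local path must be supplied by the user because the file names are not listed above.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum(local_path, "md5:9a6655b80c2fe9df91241037ca8c8788")` should return `True` for an intact copy of the 12.7 GB file, where `local_path` is wherever that file was saved.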