There is a newer version of the record available.

Published January 5, 2024 | Version 1.0.1
Dataset Open

Human Microbiome Compendium dataset

  • 1. University of Chicago
  • 2. University of Minnesota
  • 3. University of Colorado School of Medicine

Description

The Human Microbiome Compendium is an ongoing project to build a large collection of human microbiome sequencing data processed with a uniform pipeline. Currently, the compendium contains 16S rRNA amplicon sequencing data for human gut microbiome samples retrieved from the Sequence Read Archive. Our website at microbiomap.org has more information about the project and links to related resources. This data is freely available under a CC-BY license; if you use it in your work, please cite our preprint, "Integration of 168,000 samples reveals global patterns of the human gut microbiome" (doi: 10.1101/2023.10.11.560955).

If you are using this dataset in conjunction with your own results, note that, starting in version 1.0.1, the nomenclature used in the taxonomic table diverges from the output generated by DADA2 and the SILVA database. See the v1.0.1 release notes directly below for details.

Version history

1.0.1: The "asv_assignments" table was corrected to fix entries in which the taxonomic levels were incorrectly inferred from the reference database by DADA2 (e.g. genus "Brassicibacter" was listed as a family, genus "Gelria" was listed as an order). The problem is documented in issues attached to repositories for DADA2, DADA2 reference databases, and our MicroBioMap library. In short, v138 of the SILVA database did not record taxonomic names properly when levels were missing (e.g. a taxon has been assigned a proposed genus, but not a family). This was addressed in v138.1, which we originally used for generating this dataset. However, several dozen entries remain incorrectly annotated in v138.1. Our 1.0.1 release corrects these by filling in the nomenclature gaps with "(unclassified)" and moving the existing data to the correct level. Of about 4.3 million ASV assignments, 2,881 were affected. The new file "taxa_corrections.tsv" is a copy of the "bad-taxa.csv" list generated by Michael McLaren, with notes added to reflect what we changed.

1.0.0: Added a README.md file to the repository, and added a link to the preprint and title/author metadata to the Zenodo entry.

0.2.1: Added "sample_metadata.tsv", which was missing. (Note: This was accidentally tagged "0.2.0" in the version history.)

0.2.0: Replaced the "country" column in sample_metadata.tsv with an "iso" column that uses the country code rather than the country name.

0.1.0: Prepared for public release.
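
The rank correction described in the 1.0.1 notes (move a misplaced name to its correct level and fill the gap with "(unclassified)") can be sketched as follows. This is only an illustration under stated assumptions, not the code used to produce the release: the six-rank lineage layout, the function name `correct_lineage`, and the placeholder names such as "Phylum_X" are all hypothetical, and the actual layout of the "asv_assignments" table may differ.

```python
# Hypothetical rank order; the real table's columns may differ.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]
UNCLASSIFIED = "(unclassified)"


def correct_lineage(lineage, taxon, true_rank):
    """Return a copy of `lineage` with `taxon` moved to `true_rank`.

    `lineage` is a list of names ordered as in RANKS (None = unassigned).
    Ranks skipped by the move are filled with the "(unclassified)" label.
    """
    names = list(lineage)
    if taxon not in names:
        return names  # this lineage is not affected

    current = names.index(taxon)
    target = RANKS.index(true_rank)
    if target <= current:
        return names  # already at (or below) the intended rank

    # Make sure the list is long enough to hold the target rank.
    names += [None] * (len(RANKS) - len(names))

    names[target] = taxon
    for i in range(current, target):
        names[i] = UNCLASSIFIED
    return names


# Placeholder lineages: only the misplaced genus names come from the notes.
listed_as_family = ["Bacteria", "Phylum_X", "Class_X", "Order_X", "Brassicibacter", None]
listed_as_order = ["Bacteria", "Phylum_Y", "Class_Y", "Gelria", None, None]

print(correct_lineage(listed_as_family, "Brassicibacter", "genus"))
print(correct_lineage(listed_as_order, "Gelria", "genus"))
```

Because the missing levels are filled with a label that DADA2 and SILVA do not emit, names in the corrected table will not match uncorrected output for the affected entries.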

Files

Files (54.8 MB)

Checksum | Size
md5:bf779bcc8ec301b378913aae28d9d4d6 | 18.8 kB
md5:636bb19868c91f8f009af0f9ca553091 | 23.4 MB
md5:5abf6a97f497f5ea70de33ec898a8a09 | 12.3 kB
md5:a0f6e11f9e7011aff3432485c3f5eaa7 | 31.4 MB
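
The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. A minimal sketch using Python's standard `hashlib` module is shown below; the helper names `md5_of_file` and `matches_listing` are hypothetical, and because the file names are not shown in this listing, the path and the checksum to compare against must be supplied by the user.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing(path, "md5:636bb19868c91f8f009af0f9ca553091")` would return True only if the file at `path` (a location chosen by the user) has that digest. Reading in chunks keeps memory use low for the larger files.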

Additional details

Related works

Is documented by
Preprint: 10.1101/2023.10.11.560955 (DOI)

Funding

Human Microbiome Compendium: large-scale curation and processing of human microbiome datasets 1R01LM013863-01A1
National Institutes of Health