Human Microbiome Compendium dataset

Richard J. Abdill; Samantha P. Graham; Vincent Rubinetti; Frank W. Albert; Casey S. Greene; Sean Davis; Ran Blekhman

doi:10.5281/zenodo.10452633

Published January 5, 2024 | Version 1.0.1

Dataset Open

1. University of Chicago
2. University of Minnesota
3. University of Colorado School of Medicine

The Human Microbiome Compendium is an ongoing project to build a large collection of human microbiome sequencing data processed with a uniform pipeline. Currently, the compendium contains 16S rRNA amplicon sequencing data for human gut microbiome samples retrieved from the Sequence Read Archive. Our website at microbiomap.org has more information about the project and links to related resources. This data is freely available under a CC-BY license; if you use it in your work, please cite our preprint, "Integration of 168,000 samples reveals global patterns of the human gut microbiome" (doi: 10.1101/2023.10.11.560955).

If you are using this dataset in conjunction with your own results, note that, starting in version 1.0.1, the nomenclature used in the taxonomic table diverges from the output generated by DADA2 with the SILVA database. See the v1.0.1 release notes directly below for details.

Version history

1.0.1: The "asv_assignments" table was corrected to fix entries in which DADA2 incorrectly inferred the taxonomic levels from the reference database (e.g. genus "Brassicibacter" was listed as a family, genus "Gelria" was listed as an order). The problem is documented in issues attached to repositories for DADA2, DADA2 reference databases, and our MicroBioMap library. In short, v138 of the SILVA database did not record taxonomic names properly if they were missing levels (e.g. a taxon has been assigned a proposed genus, but not a family). This was addressed in v138.1, which we originally used for generating this dataset. However, several dozen entries remain incorrectly annotated in v138.1. Our 1.0.1 release corrects these by filling in the nomenclature gaps with "(unclassified)" and moving the existing data to the correct level. Of about 4.3 million ASV assignments, 2,881 were affected. The new file "taxon_corrections.tsv" is a copy of the "bad-taxa.csv" list generated by Michael McLaren, with notes added to reflect what we changed.
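The correction described in the 1.0.1 notes can be illustrated with a minimal sketch. This is not the script used to produce the release: the rank names, the dictionary layout, and the `move_taxon` helper are all assumptions made for illustration, and the phylum and class values in the example are placeholders rather than real SILVA entries. Only the "Gelria" case (a genus listed as an order) comes from the notes above.

```python
# Hypothetical rank order; the real table's column names may differ.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def move_taxon(lineage, name, correct_rank, placeholder="(unclassified)"):
    """Return a copy of `lineage` with `name` moved down to `correct_rank`.

    Every rank from the taxon's old position up to (not including) the
    correct rank is filled with `placeholder`.
    """
    fixed = dict(lineage)  # leave the caller's record untouched

    # Find the rank where the name currently (and wrongly) sits.
    wrong_rank = None
    for rank in RANKS:
        if fixed.get(rank) == name:
            wrong_rank = rank
            break
    if wrong_rank is None:
        raise ValueError(f"{name!r} not found in lineage")

    start = RANKS.index(wrong_rank)
    end = RANKS.index(correct_rank)
    if end <= start:
        raise ValueError("correct_rank must be below the current rank")

    # Fill the gap with the placeholder, then set the name at its true rank.
    for rank in RANKS[start:end]:
        fixed[rank] = placeholder
    fixed[correct_rank] = name
    return fixed


# Placeholder higher ranks; only the misplaced "Gelria" mirrors the notes.
record = {
    "kingdom": "Bacteria",
    "phylum": "ExamplePhylum",
    "class": "ExampleClass",
    "order": "Gelria",
    "family": None,
    "genus": None,
}
fixed = move_taxon(record, "Gelria", "genus")
print(fixed["order"], fixed["family"], fixed["genus"])
# prints: (unclassified) (unclassified) Gelria
```

Returning a copy rather than editing in place keeps the original assignment available for comparison, which is useful when reconciling results generated with an uncorrected reference database.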

1.0.0: Added a README.md file to the repository, and added a link to the preprint and title/author metadata to the Zenodo entry.

0.2.1: Fixed the missing "sample_metadata.tsv" file. (Note: This was accidentally tagged "0.2.0" in the version history.)

0.2.0: Replaced the "country" column in sample_metadata.tsv with an "iso" column that uses the country code rather than the name.

0.1.0: Prepared for public release.

Files (54.8 MB)

Name	MD5	Size
asv_inference.tsv	bf779bcc8ec301b378913aae28d9d4d6	18.8 kB
sample_metadata.tsv	636bb19868c91f8f009af0f9ca553091	23.4 MB
taxon_corrections.tsv	5abf6a97f497f5ea70de33ec898a8a09	12.3 kB
taxonomic_table.csv.gz	a0f6e11f9e7011aff3432485c3f5eaa7	31.4 MB
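The MD5 checksums in the file listing above can be used to confirm that a download completed intact. The following is a minimal sketch using only the Python standard library. The helper names `md5_of` and `verify_downloads` are illustrative, and the sketch assumes the files were saved under their listed names in a single directory.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "asv_inference.tsv": "bf779bcc8ec301b378913aae28d9d4d6",
    "sample_metadata.tsv": "636bb19868c91f8f009af0f9ca553091",
    "taxon_corrections.tsv": "5abf6a97f497f5ea70de33ec898a8a09",
    "taxonomic_table.csv.gz": "a0f6e11f9e7011aff3432485c3f5eaa7",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected filename to True (match), False (mismatch),
    or None (file not found in `directory`)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.exists() else None
    return results


if __name__ == "__main__":
    for name, status in verify_downloads(".").items():
        label = {True: "OK", False: "MISMATCH", None: "not found"}[status]
        print(f"{name}: {label}")
```

Reading in chunks keeps memory use flat for the larger files. Note that these checksums apply to version 1.0.1 only; files from other versions of the record would be expected to differ.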

Additional details

Is documented by: Preprint: 10.1101/2023.10.11.560955 (DOI)

Funding: National Institutes of Health, "Human Microbiome Compendium: large-scale curation and processing of human microbiome datasets" (1R01LM013863-01A1)

Metric	All versions	This version
Views	6,322	754
Downloads	10,732	1,107
Data volume	539.6 GB	30.3 GB
