microbiomeutilities.Rmd
This package is in experimental stage and should be used only for testing/trial purposes. I will keep improving this with time and feedback.
This is mainly a wrapper tool R package. Apart for some simple scripts for data visualization, this package has a single function called microbiome_pipeline
for carrying out preliminary QC, Alpha Diversity, Ordination and Composition analysis of OTU tables. The output is a HTML report for convenient investigating of the data.
install.packages("devtools")
devtools::install_github("microsud/microbiomeutilities")
library(microbiomeutilities)
library(microbiome)
library(knitr)
library(tibble)
Data from pre-print Zackular et al., 2014: The Gut Microbiome Modulates Colon Tumorigenesis.
Useful resources:
For more information on phyloseq data structure and uses you can have a look at Phyloseq
Tools for microbiome analysis in R. Microbiome package URL: microbiome package.
data("zackular2014")
ps0 <- zackular2014
print(ps0)
#> phyloseq-class experiment-level object
#> otu_table() OTU Table: [ 2078 taxa and 88 samples ]
#> sample_data() Sample Data: [ 88 samples by 7 sample variables ]
#> tax_table() Taxonomy Table: [ 2078 taxa by 7 taxonomic ranks ]
Function microbiome_pipeline
generates an HTML
report with preliminary QC, Alpha Diversity, Ordination and Composition analysis of OTU tables. This function saves all intermediate files incuding figures and phyloseq objects in user specified directory.
microbiome_pipeline(
otufile = "my.biom",
mapping = "mymap.csv",
taxonomy = NULL,
treefilename = "myTree.tre",
type = "biom",
work_dir = "F:/path/my/input/filefolder",
out_dir = "F:/path/to/save/my/files/folder",
VariableA = "MC_type1",
VariableB = "Region",
UnConstOrd = TRUE,
heatmap = TRUE,
filterCount = 4,
filterPrev = 0.01,
col.palette = "Paired",
filterpseq = TRUE,
samsize = NA,
projectname = "Mock",
author = "Sudarshan"
)
Most commonly it is observed that the taxonomy file has classification until a given taxonomic level. We will fill the empty cells with the maximum classification available along with the OTU number.
Check the taxonomy in phyloseq object.
kable(head(tax_table(ps0)))
Domain | Phylum | Class | Order | Family | Genus | Species | |
---|---|---|---|---|---|---|---|
d__denovo1773 | k__Bacteria | p__Bacteroidetes | c__Bacteroidia | o__Bacteroidales | f__Bacteroidaceae | g__Bacteroides | s__ |
d__denovo1771 | k__Bacteria | p__Bacteroidetes | c__Bacteroidia | o__Bacteroidales | f__Bacteroidaceae | g__Bacteroides | s__ |
d__denovo1776 | k__Bacteria | p__Firmicutes | c__Clostridia | o__Clostridiales | f__Ruminococcaceae | g__ | s__ |
d__denovo1777 | k__Bacteria | p__Bacteroidetes | c__Bacteroidia | o__Bacteroidales | f__Bacteroidaceae | g__Bacteroides | s__ |
d__denovo1775 | k__Bacteria | p__Firmicutes | c__Clostridia | o__Clostridiales | f__Lachnospiraceae | g__Blautia | s__ |
d__denovo2639 | k__Bacteria | p__Bacteroidetes | c__Bacteroidia | o__Bacteroidales | f__Bacteroidaceae | g__Bacteroides | s__ |
Some have only g__ of s__ information.
# reduce size for example
ps0 <- core(ps0, detection = 10, prevalence = 20/100)
ps0.f <- format_phyloseq(ps0)
kable(head(tax_table(ps0.f))[3:6])
Now the available taxonomy is added.
There is a second version which will change the names in both otu table and taxonomy table. This can be useful if the analysis has to be done at OTU level. Only ID are less useful.
# reduce size for example
ps0 <- core(ps0, detection = 10, prevalence = 20/100)
ps0.f2 <- format_to_besthit(ps0)
kable(head(tax_table(ps0.f2))[3:6])
Domain | Phylum | Class | Order | Family | Genus | Species | best_hit | |
---|---|---|---|---|---|---|---|---|
OTU-d__denovo165:f__Ruminococcaceae | Bacteria | Firmicutes | Clostridia | Clostridiales | Ruminococcaceae | f__Ruminococcaceae | f__Ruminococcaceae | OTU-d__denovo165:f__Ruminococcaceae |
OTU-d__denovo167:Coprococcus | Bacteria | Firmicutes | Clostridia | Clostridiales | Lachnospiraceae | Coprococcus | g__Coprococcus | OTU-d__denovo167:Coprococcus |
OTU-d__denovo166:o__Clostridiales | Bacteria | Firmicutes | Clostridia | Clostridiales | o__Clostridiales | o__Clostridiales | o__Clostridiales | OTU-d__denovo166:o__Clostridiales |
OTU-d__denovo161:Roseburia | Bacteria | Firmicutes | Clostridia | Clostridiales | Lachnospiraceae | Roseburia | g__Roseburia | OTU-d__denovo161:Roseburia |
As can be seen, the rownames have the OTUIDs and available toxonomic name(s).
This can be useful to get an overview of taxonomic classifications. Only patterns such as [g__] or NA
is expected. [g__NA
.
percent_classified(ps0)
#> Only patterns such as [g__] or similar is expected. [g__<empty>] or [g__unclassified] not considered
#>
#> please convert for eg. g__unclassified to uniform [g__] or NAs
#> Taxonomic_Levels Percent_Classification
#> 1 Domain 99.5 %
#> 2 Phylum 99.5 %
#> 3 Class 99 %
#> 4 Order 99 %
#> 5 Family 97.5 %
#> 6 Genus 80.5 %
#> 7 Species -0.5 %
#> 8 OTUs/ASVs 200
tax_sum <- taxa_summary(ps0, "Phylum")
#> Data provided is not compositional
#> will first transform
kable(tax_sum)
Taxa | Max.Rel.Ab | Mean.Rel.Ab | Median.Rel.Ab | Std.dev |
---|---|---|---|---|
p__Firmicutes | 0.938702718865466 | 0.632215259187888 | 0.634069631401654 | 0.186558659452389 |
p__Bacteroidetes | 0.77520831039166 | 0.243009108261216 | 0.170031595087053 | 0.184741228455718 |
p__Actinobacteria | 0.239570602807597 | 0.0320558241277642 | 0.0156884949226686 | 0.0415554036141861 |
p__Proteobacteria | 0.394890253922297 | 0.0353682325052657 | 0.0113792598497232 | 0.0684708878042132 |
p__Verrucomicrobia | 0.305658324265506 | 0.0282593697373209 | 0.000100148060027087 | 0.061780449994701 |
p__Euryarchaeota | 0.30114063247046 | 0.0274355245134364 | 4.81433454220324e-05 | 0.061342757370339 |
p__Deinococcus-Thermus | 0.0356312744176846 | 0.00165668166710886 | 0 | 0.00535480262819235 |
Useful for QC purposes. Check for siatribution of sequencing depth.
p <- plot_read_distribution(ps0, groups="DiseaseState", plot.type= "density")
print(p)
Useful if the user wants to plot specific features.
# reduce size for example
ps0 <- core(ps0, detection = 10, prevalence = 20/100)
pseq_df <- phy_to_ldf(ps0, transform.counts = NULL)
#> An additonal column Sam_rep with sample names is created for reference purpose
kable(head(pseq_df))
OTUID | Domain | Phylum | Class | Order | Family | Genus | Species | Sam_rep | Abundance | investigation_type | project_name | DiseaseState | age | body_product | FOBT.result | material |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
d__denovo66 | k__Bacteria | p__Firmicutes | c__Clostridia | o__Clostridiales | f__Lachnospiraceae | g__Ruminococcus2 | s__ | Adenoma10-2757 | 27 | metagenomic | The Gut Microbiome Improves Predictive Models for Diagnosis of Colorectal Cancer | nonCRC | 37 | feces | negative | feces |
d__denovo175 | k__Bacteria | p__Bacteroidetes | c__Bacteroidia | o__Bacteroidales | f__Rikenellaceae | g__Alistipes | s__ | Adenoma10-2757 | 0 | metagenomic | The Gut Microbiome Improves Predictive Models for Diagnosis of Colorectal Cancer | nonCRC | 37 | feces | negative | feces |
d__denovo165 | k__Bacteria | p__Firmicutes | c__Clostridia | o__Clostridiales | f__Ruminococcaceae | g__ | s__ | Adenoma10-2757 | 49 | metagenomic | The Gut Microbiome Improves Predictive Models for Diagnosis of Colorectal Cancer | nonCRC | 37 | feces | negative | feces |
d__denovo167 | k__Bacteria | p__Firmicutes | c__Clostridia | o__Clostridiales | f__Lachnospiraceae | g__Coprococcus | s__ | Adenoma10-2757 | 155 | metagenomic | The Gut Microbiome Improves Predictive Models for Diagnosis of Colorectal Cancer | nonCRC | 37 | feces | negative | feces |
d__denovo166 | k__Bacteria | p__Firmicutes | c__Clostridia | o__Clostridiales | f__ | g__ | s__ | Adenoma10-2757 | 0 | metagenomic | The Gut Microbiome Improves Predictive Models for Diagnosis of Colorectal Cancer | nonCRC | 37 | feces | negative | feces |
d__denovo161 | k__Bacteria | p__Firmicutes | c__Clostridia | o__Clostridiales | f__Lachnospiraceae | g__Roseburia | s__ | Adenoma10-2757 | 11 | metagenomic | The Gut Microbiome Improves Predictive Models for Diagnosis of Colorectal Cancer | nonCRC | 37 | feces | negative | feces |
Utility plot function for diversity measures calcualted by microbiome
package.
library(microbiome)
data("zackular2014")
ps0 <- zackular2014
ps0 <- core(ps0, detection = 2, prevalence = 20/100)
p <- plot_alpha_diversities(ps0,
type = "diversities",
index.val = "all",
plot.type = "stripchart",
variableA = "DiseaseState",
palette = "jco")
print(p)
library(microbiomeutilities)
library(RColorBrewer)
data("zackular2014")
p0 <- zackular2014
ps1 <- format_to_besthit(p0)
ps1 <- subset_samples(ps1, DiseaseState == "H")
ps1 <- prune_taxa(taxa_sums(ps1) > 0, ps1)
prev.thres <- seq(.05, 1, .05)
det.thres <- 10^seq(log10(1e-4), log10(.2), length = 10)
pseq.rel <- microbiome::transform(ps1, "compositional")
# reduce size for example
pseq.rel <- core(pseq.rel, detection = 0.001, prevalence = 20/100)
ord.bray <- ordinate(pseq.rel, "NMDS", "bray")
#> Run 0 stress 0.1925851
#> Run 1 stress 0.1925852
#> ... Procrustes: rmse 0.0001333266 max resid 0.0006043361
#> ... Similar to previous best
#> Run 2 stress 0.2431167
#> Run 3 stress 0.1991279
#> Run 4 stress 0.2042816
#> Run 5 stress 0.203173
#> Run 6 stress 0.2248904
#> Run 7 stress 0.2135613
#> Run 8 stress 0.1925851
#> ... Procrustes: rmse 3.775377e-05 max resid 0.0001468821
#> ... Similar to previous best
#> Run 9 stress 0.2238146
#> Run 10 stress 0.2244357
#> Run 11 stress 0.1925851
#> ... Procrustes: rmse 7.774405e-05 max resid 0.00032036
#> ... Similar to previous best
#> Run 12 stress 0.2207064
#> Run 13 stress 0.2235846
#> Run 14 stress 0.1955387
#> Run 15 stress 0.2262274
#> Run 16 stress 0.2387625
#> Run 17 stress 0.2313222
#> Run 18 stress 0.2500976
#> Run 19 stress 0.2442357
#> Run 20 stress 0.2161705
#> *** Solution reached
p <- plot_ordiplot_core(pseq.rel, ord.bray,
prev.thres, det.thres, min.prevalence = 0.9,
color.opt = "DiseaseState", shape = NULL, Sample = TRUE)
#> Convergence reached in the ordination object provided
print(p)
ps.f2 <- format_to_besthit(ps0)
psf2.rel <- microbiome::transform(ps.f2, "compositional")
otu <- abundances(psf2.rel)
meta <- meta(psf2.rel)
library(vegan)
#> Loading required package: permute
#> Loading required package: lattice
#> This is vegan 2.5-1
permanova <- adonis(t(otu) ~ DiseaseState,
data = meta, permutations=99, method = "bray")
# P-value
print(as.data.frame(permanova$aov.tab)["group", "Pr(>F)"])
#> [1] NA
coef <- coefficients(permanova)["DiseaseState1",]
top.coef <- coef[rev(order(abs(coef)))[1:3]]
top.coef.df <- as.data.frame(top.coef)
my_taxa <- c(rownames(top.coef.df))
p <- plot_select_taxa(psf2.rel, my_taxa, "DiseaseState", "Paired", plot.type = "boxplot")
#> An additonal column Sam_rep with sample names is created for reference purpose
print(p)
Plot relative abundance of top taxa specified by user.
pn <- plot_taxa_boxplot(ps0,
taxonomic.level = "Phylum",
top.otu = 5, VariableA = "DiseaseState",
title = "Relative abudance plot", color = "Set2")
#> The phy_tree slot is empty, easy to make the plot
print(pn)
library(RColorBrewer)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# reduce data for egample purposes
ps.cor <- core(ps0, detection = 10, prevalence = 0.5)
p <- plot_taxa_composition(ps.cor, taxonomic.level = "Phylum", transform = "compositional", average_by = NULL)
print(p)
Useful for visualisng differences in top otus between sample groups.
heat.sample <- plot_taxa_heatmap(ps0, subset.top = 20,
VariableA = "DiseaseState",
heatcolors =brewer.pal(6, "Blues"),
transformation = "compositional")
#> Top 20 OTUs selected
#> First converted to compositional
#> then top taxa were selected
We provide access to a subset of studies included in the MicrobiomeHD
database from Duvallet et al 2017: Meta-analysis of gut microbiome studies identifies disease-specific and shared responses. Nature communications.
study <- list_microbiome_data(printtab = FALSE)
knitr::kable(study)
Study | Disease |
---|---|
Son2015_ASD | ASD |
Kang2013_ASD | ASD |
Schubert2014_CDI | CDI |
Youngster2014_CDI | CDI |
Baxter2016_CRC | CRC |
Zackular2014_CRC | CRC |
Zeller2014_CRC | CRC |
Singh2015_EDD | EDD |
NogueraJulian2016_HIV | HIV |
Dinh2015_HIV | HIV |
Lozupone2013_HIV | HIV |
Gevers2014_IBD | IBD |
Zhang2013_LIV | LIV |
Wong2013_NASH | NASH |
Ross2015_OB | OB |
Zupancic2012_OB | OB |
Scher2013_PAR | PAR |
Alkanani2015_T1D | T1D |
Scheperjans2015_PAR | PAR |
Alkanani2015_T1D | T1D |
Below is the per study reference.
NOTE: When using these studies, please cite Duvallet et al. 2017 and the respective studies.
Study | Disease | Reference |
---|---|---|
Son2015_ASD | ASD | Son, J. et al. Comparison of fecal microbiota in children with autism spectrum disorders and neurotypical siblings in the simons simplex collection. PLoS ONE 10, e0137725 (2015). |
Kang2013_ASD | ASD | Kang, D. W. et al. Reduced incidence of Prevotella and other fermenters in intestinal microflora of autistic children. PLoS ONE8, e68322 (2013). |
Schubert2014_CDI | CDI | Schubert, A. M. et al. Microbiome data distinguish patients with clostridium difficile infection and non-c. difficile-associated diarrhea from healthy controls. mBio 5, e01021–14–e01021–14 (2014). |
Youngster2014_CDI | CDI | Youngster, I. et al. Fecal microbiota transplant for relapsing clostridium difficile infection using a frozen inoculum from unrelated donors: a randomized, open-label, controlled pilot study. Clin. Infect. Dis. 58, 1515–1522 (2014). |
Baxter2016_CRC | CRC | Baxter, N. T., Ruffin, M. T., Rogers, M. A. & Schloss, P. D. Microbiota-based model improves the sensitivity of fecal immunochemical test for detecting colonic lesions. Genome Med. 8, 37 (2016). |
Zackular2014_CRC | CRC | Zackular, Joseph P., et al. “The gut microbiome modulates colon tumorigenesis.” MBio 4.6 (2013): e00692-13. |
Zeller2014_CRC | CRC | Zeller, G. et al. Potential of fecal microbiota for early-stage detection of colorectal cancer. Mol. Syst. Biol. 10, 766–766 (2014). |
Singh2015_EDD | EDD | Singh, P. et al. Intestinal microbial communities associated with acute enteric infections and disease recovery. Microbiome 3, 45 (2015). |
NogueraJulian2016_HIV | HIV | Noguera-Julian, M. et al. Gut microbiota linked to sexual preference and hiv infection. EBioMedicine 5, 135–146 (2016). |
Dinh2015_HIV | HIV | Dinh, D. M. et al. Intestinal microbiota, microbial translocation, and systemic inflammation in chronic HIV infection. J. Infect. Dis. 211, 19–27 (2014). |
Lozupone2013_HIV | HIV | Lozupone, C. A. et al. Alterations in the gut microbiota associated with hiv-1 infection. Cell Host Microbe 14, 329–339 (2013). |
Gevers2014_IBD | IBD | Gevers, D. et al. The treatment-naive microbiome in new-onset crohn’s disease. Cell Host Microbe 15, 382–392 (2014). |
Zhang2013_LIV | LIV | Zhang, Z. et al Large-scale survey of gut microbiota associated with MHE via 16s rRNA-based pyrosequencing. Am. J. Gastroenterol. 108, 1601–1611 (2013). |
Wong2013_NASH | NASH | Wong, V. W. et al. Molecular characterization of the fecal microbiota in patients with nonalcoholic steatohepatitis – a longitudinal study. PLoS ONE 8, e62885 (2013). |
Ross2015_OB | OB | Ross, M. C. et al. 16s gut community of the cameron county hispanic cohort. Microbiome 3, 7 (2015). |
Zupancic2012_OB | OB | Zupancic, M. L. et al. Analysis of the gut microbiota in the old order Amish and its relation to the metabolic syndrome. PLoS ONE 7, e43052 (2012). |
Scher2013_PAR | PAR | Scher, J. U. et al. Expansion of intestinal prevotella copri correlates with enhanced susceptibility to arthritis. eLife 2, e01202 (2013). |
Alkanani2015_T1D | T1D | Alkanani, A. K. et al. Alterations in intestinal microbiota correlate with susceptibility to type 1 diabetes. Diabetes 64, 3510–3520 (2015). |
Scheperjans2015_PAR | PAR | Scheperjans, F. et al Gut microbiota are related to parkinson’s disease and clinical phenotype. Mov. Disord. 30, 350–358 (2014). |
sessionInfo()
#> R version 3.4.4 (2018-03-15)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 7 x64 (build 7601) Service Pack 1
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Dutch_Netherlands.1252 LC_CTYPE=Dutch_Netherlands.1252
#> [3] LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C
#> [5] LC_TIME=Dutch_Netherlands.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] dplyr_0.7.4 vegan_2.5-1
#> [3] lattice_0.20-35 permute_0.9-4
#> [5] RColorBrewer_1.1-2 bindrcpp_0.2.2
#> [7] tibble_1.4.2 knitr_1.20
#> [9] microbiomeutilities_0.99.0 microbiome_1.0.2
#> [11] ggplot2_2.2.1 phyloseq_1.23.1
#>
#> loaded via a namespace (and not attached):
#> [1] viridis_0.5.1 Biobase_2.38.0 tidyr_0.8.0
#> [4] viridisLite_0.3.0 jsonlite_1.5 splines_3.4.4
#> [7] foreach_1.4.4 assertthat_0.2.0 highr_0.6
#> [10] stats4_3.4.4 yaml_2.1.18 ggrepel_0.7.0
#> [13] pillar_1.2.2 backports_1.1.2 glue_1.2.0
#> [16] digest_0.6.15 XVector_0.18.0 colorspace_1.3-2
#> [19] cowplot_0.9.2 htmltools_0.3.6 Matrix_1.2-14
#> [22] plyr_1.8.4 pkgconfig_2.0.1 pheatmap_1.0.8
#> [25] zlibbioc_1.24.0 purrr_0.2.4 scales_0.5.0
#> [28] mgcv_1.8-23 IRanges_2.12.0 ggpubr_0.1.6
#> [31] BiocGenerics_0.24.0 lazyeval_0.2.1 survival_2.42-3
#> [34] magrittr_1.5 crayon_1.3.4 memoise_1.1.0
#> [37] evaluate_0.10.1 fs_1.2.2 nlme_3.1-137
#> [40] MASS_7.3-49 xml2_1.2.0 tools_3.4.4
#> [43] data.table_1.10.4-3 stringr_1.3.0 S4Vectors_0.16.0
#> [46] munsell_0.4.3 ggsci_2.8 cluster_2.0.7-1
#> [49] Biostrings_2.46.0 ade4_1.7-11 compiler_3.4.4
#> [52] pkgdown_1.0.0 rlang_0.2.0 rhdf5_2.22.0
#> [55] grid_3.4.4 iterators_1.0.9 biomformat_1.7.0
#> [58] rstudioapi_0.7 igraph_1.2.1 labeling_0.3
#> [61] rmarkdown_1.9 gtable_0.2.0 codetools_0.2-15
#> [64] multtest_2.34.0 roxygen2_6.0.1 reshape2_1.4.3
#> [67] R6_2.2.2 gridExtra_2.3 bindr_0.1.1
#> [70] commonmark_1.4 rprojroot_1.3-2 desc_1.1.1
#> [73] ape_5.1 stringi_1.1.7 parallel_3.4.4
#> [76] Rcpp_0.12.16 tidyselect_0.2.4