This code processes the dataset ‘processed_data_phyloseq.RDS’, which is a phyloseq object that can be downloaded from https://zenodo.org/deposit/5337076. This dataset contains 16S microbiome data from 1109 meerkat faecal samples. The dataset is not normalised, but lab and sand contaminants have been filtered out, and only samples used in the final analysis are included. Original data contained an internal standard of Allobacillus and Imtechella - these have already been removed. Samples with low sequencing depth and any replicates have been removed. The code contains general data exploration and model validation that are additional to those presented in the paper. At the end it also analyzes the technical replicate data (‘technical_replicate_data_phyloseq.RDS’) and the pilot study data on captive meerkats (‘pilot_study_data_phyloseq.RDS’). Note that this code takes several hours to run (~12 hours). To do a test run on a reduced dataset, there is the option to run a chunk after data import that generates a small but representative dataset, which takes about an hour to process, although some of the sensitivity analyses will throw up an error for this dataset due to small sample sizes. This Rmarkdown script allows errors to occur without halting the report. To turn this option off, remove “error = TRUE” from the global options. Each code chunk ends with elapsed time, and the total time taken to run the script is as the end of the document. For details on the study population and study design, please refer to the manuscript.
Diurnal oscillations in gut bacterial load and composition eclipse seasonal and lifetime dynamics in wild meerkats, Suricata suricatta
Risely, A.1*, Wilhelm, K.1, Clutton-Brock, T.2, 3, 4, Manser, M.B. 3, 4, 5, and Sommer, S.1
Corresponding author: Alice Risely, riselya@gmail.com
Circadian rhythms in gut microbiota composition are crucial for metabolic function, yet the extent to which they govern microbial dynamics compared to seasonal and lifetime processes remains unknown. Here we investigate gut bacterial dynamics in wild meerkats (Suricata suricatta) over a 20-year period to compare diurnal, seasonal, and lifetime processes in concert, applying ratios of absolute abundance. We found that diurnal oscillations in bacterial load and composition eclipsed seasonal and lifetime dynamics. Diurnal oscillations were characterised by a peak in Clostridium abundance at dawn, were associated with temperature-constrained foraging schedules, and did not decay with age. Some genera exhibited seasonal fluctuations, whilst others developed with age, although we found little support for microbial senescence in very old meerkats. Strong microbial circadian rhythms in this species may reflect the extreme daily temperature fluctuations typical of arid-zone climates. Our findings demonstrate that accounting for circadian rhythms is essential for future gut microbiome research.
# data analysis packages
library(phyloseq)
library(mgcv)
library(gratia)
library(vegan)
library(tidyverse)
library(caret)
library(metagMisc)
library(expss)
library(scales)
library(broom)
library(lme4)
library(performance)
# viz packages
library(ggplot2)
library(magick)
library(RColorBrewer)
library(ggsci)
library(viridis)
library(ggpubr)
library(sjPlot)
library(gridExtra)
memory.limit(1000000) # set large memory limit
## [1] 1e+06
This chunk took 0.18 minutes
sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
## [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
## [5] LC_TIME=English_Australia.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] gridExtra_2.3 sjPlot_2.8.4 ggpubr_0.4.0 viridis_0.5.1
## [5] viridisLite_0.3.0 ggsci_2.9 RColorBrewer_1.1-2 magick_2.4.0
## [9] performance_0.4.7 lme4_1.1-23 Matrix_1.2-18 broom_0.5.6
## [13] scales_1.1.1 expss_0.10.2 metagMisc_0.0.4 caret_6.0-86
## [17] forcats_0.5.0 stringr_1.4.0 dplyr_1.0.0 purrr_0.3.4
## [21] readr_1.3.1 tidyr_1.1.0 tibble_3.0.1 ggplot2_3.3.2
## [25] tidyverse_1.3.0 vegan_2.5-6 lattice_0.20-38 permute_0.9-5
## [29] gratia_0.4.1 mgcv_1.8-31 nlme_3.1-143 phyloseq_1.30.0
##
## loaded via a namespace (and not attached):
## [1] readxl_1.3.1 backports_1.1.7 plyr_1.8.6
## [4] igraph_1.2.5 splines_3.6.2 TH.data_1.0-10
## [7] digest_0.6.27 foreach_1.5.0 htmltools_0.5.1.1
## [10] fansi_0.4.1 magrittr_2.0.1 checkmate_2.0.0
## [13] cluster_2.1.0 openxlsx_4.1.5 recipes_0.1.13
## [16] Biostrings_2.54.0 modelr_0.1.8 gower_0.2.2
## [19] matrixStats_0.56.0 sandwich_2.5-1 colorspace_1.4-1
## [22] blob_1.2.1 rvest_0.3.5 haven_2.3.1
## [25] xfun_0.22 crayon_1.3.4 jsonlite_1.7.0
## [28] zoo_1.8-8 survival_3.2-7 iterators_1.0.12
## [31] ape_5.4 glue_1.4.1 gtable_0.3.0
## [34] emmeans_1.4.8 ipred_0.9-9 zlibbioc_1.32.0
## [37] XVector_0.26.0 sjstats_0.18.0 sjmisc_2.8.5
## [40] car_3.0-8 Rhdf5lib_1.8.0 BiocGenerics_0.32.0
## [43] abind_1.4-5 mvtnorm_1.1-1 DBI_1.1.0
## [46] ggeffects_1.1.0 rstatix_0.6.0 Rcpp_1.0.4.6
## [49] xtable_1.8-4 htmlTable_2.0.0 foreign_0.8-74
## [52] stats4_3.6.2 lava_1.6.7 prodlim_2019.11.13
## [55] htmlwidgets_1.5.1 httr_1.4.1 ellipsis_0.3.1
## [58] pkgconfig_2.0.3 nnet_7.3-12 dbplyr_1.4.4
## [61] tidyselect_1.1.0 rlang_0.4.11 reshape2_1.4.4
## [64] effectsize_0.3.1 munsell_0.5.0 cellranger_1.1.0
## [67] tools_3.6.2 cli_2.0.2 generics_0.0.2
## [70] sjlabelled_1.1.6 ade4_1.7-15 evaluate_0.14
## [73] biomformat_1.14.0 yaml_2.2.1 ModelMetrics_1.2.2.2
## [76] knitr_1.33 fs_1.4.1 zip_2.0.4
## [79] mvnfast_0.2.5.1 xml2_1.3.2 compiler_3.6.2
## [82] rstudioapi_0.11 curl_4.3 ggsignif_0.6.0
## [85] reprex_0.3.0 statmod_1.4.34 stringi_1.4.6
## [88] parameters_0.8.0 nloptr_1.2.2.1 multtest_2.42.0
## [91] vctrs_0.3.1 pillar_1.4.4 lifecycle_0.2.0
## [94] estimability_1.3 data.table_1.12.8 cowplot_1.1.0
## [97] insight_0.13.1 patchwork_1.1.0.9000 R6_2.4.1
## [100] rio_0.5.16 IRanges_2.20.2 codetools_0.2-16
## [103] boot_1.3-24 MASS_7.3-54 assertthat_0.2.1
## [106] rhdf5_2.30.1 withr_2.2.0 multcomp_1.4-13
## [109] S4Vectors_0.24.4 bayestestR_0.8.2 parallel_3.6.2
## [112] hms_0.5.3 grid_3.6.2 rpart_4.1-15
## [115] timeDate_3043.102 coda_0.19-3 class_7.3-15
## [118] minqa_1.2.4 rmarkdown_2.8 carData_3.0-4
## [121] pROC_1.16.2 Biobase_2.46.0 lubridate_1.7.9
start_time<-Sys.time()
This chunk took 0 minutes
meerkat_final<-readRDS("processed_data_phyloseq.RDS") # import data as phyloseq object
meerkat_final #data summary
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 26122 taxa and 1109 samples ]
## sample_data() Sample Data: [ 1109 samples by 26 sample variables ]
## tax_table() Taxonomy Table: [ 26122 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 26122 tips and 26121 internal nodes ]
## scale to internal standard
asv_table<-data.frame(meerkat_final@otu_table@.Data)
names(asv_table)<-sample_data(meerkat_final)$feature.id
meerkat_normalized<-sweep(asv_table, MARGIN=2, sample_data(meerkat_final)$scaling_factor, `*`)
meerkat_normalized<-round(meerkat_normalized, digits = 0)
meerkat_normalized<-otu_table(meerkat_normalized, taxa_are_rows = TRUE)
meerkat_scaled<-meerkat_final
otu_table(meerkat_scaled)<-meerkat_normalized
### add total abundance of reads to dataframe
sample_data(meerkat_scaled)$TotalAbundance<-sample_sums(meerkat_scaled)
paste("Number of individuals sampled:",length(unique(sample_data(meerkat_scaled)$IndividID)))
## [1] "Number of individuals sampled: 235"
# 1109 samples from 235 meerkats
## how many samples per individual?
summary(data.frame(table(sample_data(meerkat_scaled)$IndividID))$Freq)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 4.000 4.719 6.000 14.000
This chunk took 0.2 minutes
To speed up analysis to see if the script works, run this code which generates a smaller dataset (~300) samples) to work with for testing (change “eval = FALSE” to “eval = TRUE” option at the start of this chunk to use this dataset). This test dataset needs to be quite specific for models to work - the samples need to be distributed across limited numbers of individuals and social groups so random effects do not break models. Some validation code might not work with this reduced dataset, therefore I have allowed errors so that these do not halt the script.
#not run
samplecount<-data.frame(sample_data(meerkat_scaled)) %>%
group_by(IndividID) %>%
summarise(NoSamples = n())
samplecount<-samplecount%>%arrange(IndividID, -NoSamples)
sample_data(meerkat_scaled)$SampleCount<-vlookup(sample_data(meerkat_scaled)$IndividID, samplecount, lookup_column = "IndividID", result_column = "NoSamples")
test_data<-subset_samples(meerkat_scaled, SampleCount >5)
samplenames<-sample_names(test_data)
subsamples<-sample(samplenames, 300) # choose 150
# also need some samples from sequencing run 4 to that some of the code works
run4<-subset_samples(meerkat_scaled, Seq_run == "RUN4")
samplenames_run4<-sample_names(run4)
subsamples_final<-c(subsamples, samplenames_run4)
test_data<-prune_samples(subsamples_final, meerkat_scaled)
test_data<-microbiome::core(test_data, detection = 10, prevalence =0.01)
# remove rare social groups
majorgroups<-data.frame(table(sample_data(test_data)$GroupAtSampling))
majorgroups<-majorgroups[order(-majorgroups$Freq),]
majorgroups<-subset(majorgroups, Freq >10)
majorgroups<-as.character(majorgroups$Var1)
sample_data(test_data)$Test2<-sample_data(test_data)$GroupAtSampling %in% majorgroups
test_data<-subset_samples(test_data, Test2==TRUE)
meerkat_scaled<-test_data
This chunk took 0 minutes
str(data.frame(sample_data(meerkat_final)))
## 'data.frame': 1109 obs. of 26 variables:
## $ IndividID : Factor w/ 235 levels "ID 1","ID 10",..: 202 202 170 228 1 165 202 170 168 214 ...
## $ feature.id : Factor w/ 1109 levels "Feature1","Feature10",..: 1 379 587 793 897 2 192 201 212 222 ...
## $ SampleNo : Factor w/ 1109 levels "Sample1","Sample10",..: 1 251 359 467 572 680 789 893 1002 2 ...
## $ Storage : Factor w/ 2 levels "FREEZEDRIED",..: 2 1 1 1 2 1 1 2 2 2 ...
## $ Seq_run : Factor w/ 4 levels "RUN1","RUN2",..: 1 1 1 4 1 1 1 1 1 1 ...
## $ Seq_depth_nospikein : num 26904 40273 30469 31224 31869 ...
## $ HoursTilFrozen : num 3.961 1.976 2.611 0.504 3.751 ...
## $ scaling_factor : num 4.769 2.809 5.245 0.063 1.846 ...
## $ SampleDate : Date, format: "1997-04-18" "2002-01-06" ...
## $ SampleTime : POSIXct, format: "1899-12-30 08:29:00" "1899-12-30 19:05:00" ...
## $ SampleTimeCat : Factor w/ 2 levels "AFTERNOON","MORNING": 2 1 2 2 2 2 1 2 2 1 ...
## $ HoursAfterSunrise : num 1.55 13.23 2.81 5.21 2.24 ...
## $ month : num 4 1 4 5 8 8 9 9 9 10 ...
## $ Season : Factor w/ 2 levels "DRY WINTER","WET SUMMER": 2 2 2 1 1 1 1 1 1 2 ...
## $ Year : num 1997 2002 2002 2002 2002 ...
## $ AgeAtSamplingYears : num 0.126 4.849 3.521 2.173 6.633 ...
## $ AgeCat : Factor w/ 3 levels "MIDDLEAGED","OLD",..: 3 1 1 1 2 1 2 1 1 1 ...
## $ Sex : Factor w/ 2 levels "F","M": 1 1 2 1 1 2 1 2 2 2 ...
## $ GroupAtSampling : Factor w/ 42 levels "Group 1","Group 10",..: 30 19 30 31 2 35 19 30 40 33 ...
## $ Condition_weight : num -101.94 -52.34 6.77 -4.15 -61.91 ...
## $ SocialStatus : Factor w/ 2 levels "Dominant","Subordinate": 2 2 1 1 2 2 2 1 2 2 ...
## $ HoursSinceForagingStart: num 0 0.629 1.298 3.504 0.278 ...
## $ Temp_max : num 32.7 37.7 34.3 29 23.1 26.6 29.8 26.2 34 21.1 ...
## $ Temp_min : num 15.2 25.3 16.2 5.3 -0.4 7.5 11.6 1.1 10.4 7.3 ...
## $ HourlyTemp : num 16.7 34.4 24 25.7 12.5 23.6 29 20.2 27.7 19.7 ...
## $ sum_rainfall_month : num 25.4 13.2 19.2 18.4 0.2 0.2 0.2 0.2 23.8 23.8 ...
This chunk took 0 minutes
VARIABLE DEFINITIONS
[1] IndividID = Meerkat ID
[2] feature.id = DNA sample ID
[3] SampleNo = faecal sample ID
[4] Storage = Sample storage (freezedried or frozen)
[5] Seq_run = Sequencing run
[6] Seq_depth_nospikein = Sequencing depth after internal standard removed
[7] HoursTilFrozen = Number of hours sample was in field before frozen
[8] scaling_factor = Scaling factor to scale sample to internal standard
[9] SampleDate = Date sample was collected
[10] SampleTime = Time sample was collected
[11] SampleTimeCat = Categorical time variable (morning/afternoon)
[12] HoursAfterSunrise = Number of hours after sunrise at sample collection
[13] month = the month the sample was collected
[14] Season = Wet or dry season
[15] Year = Year sample was collected
[16] AgeAtSamplingYears = Meerkat age in years at sample collection
[17] AgeCat = Categorical age variable (young/adult/old)
[18] Sex = Meerkat sex
[19] GroupAtSampling = Social group at sampling
[20] Condition_weight = Residual body mass from GAM (mass ~ age)
[21] SocialStatus = Social status at time of sample collection (dom/sub)
[22] HoursSinceForagingStart = Time meerkat was foraging before sample collection
[23] Temp_max = Maximum temperature on day of sample collection
[24] Temp_min = Minimum temperature on day of sample collection
[25] HourlyTemp = Temperature (to the nearest hour) at sample collection
[26] sum_rainfall_month = Cumulative rainfall in mm over the previous month
metadata <- data.frame(sample_data(meerkat_scaled)) #extract metadata as DF
# order meerkat ID my their median sampling date
metadata1 <- metadata %>%
group_by(IndividID) %>%
summarize(first = mean(SampleDate))
metadata1 <- metadata1[order(metadata1$first), ]
ID_order <- as.character(metadata1$IndividID)
# order ID by median sampling date
sample_data(meerkat_scaled)$IndividID <-
factor(sample_data(meerkat_scaled)$IndividID, levels = ID_order)
This chunk took 0.01 minutes
fig1a<-ggplot(data=sample_data(meerkat_scaled),aes(x=SampleDate,y=IndividID,group=IndividID))+
geom_line(size=0.5,alpha=0.5)+
geom_point(aes(fill=HoursAfterSunrise),colour="black",pch=21,size=1.5)+
theme_bw(base_size=14)+
theme(panel.grid.minor=element_blank(),
panel.background=element_blank(),axis.line=element_line(colour="black"))+
theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())+
scale_fill_viridis_c(option="plasma",direction=-1,name="Time of day")+
scale_x_date(date_breaks = "1 year")+
theme(axis.text.x = element_text(angle = 45, vjust = 0.9, hjust=1))+
ylab("Meerkat ID")+
xlab("")+
scale_y_discrete(expand=expansion(add=c(0.2,1.5)))+
theme(legend.position="none")
This chunk took 0 minutes
hist1<-ggplot(sample_data(meerkat_scaled), aes(x = HoursAfterSunrise))+
geom_histogram(fill = "grey", col = "black", binwidth = 1, boundary = 0)+
theme_bw(base_size = 22)+
xlab("Hours after sunrise")+
ylab("Frequency")
hist2<-ggplot(sample_data(meerkat_scaled), aes(x = as.factor(month)))+
geom_histogram(fill = "grey", col = "black", binwidth = 1, stat = "count")+
theme_bw(base_size = 22)+
xlab("Month")+
ylab("Frequency")+
scale_x_discrete(breaks = c("2","4", "6","8","10","12"))+
theme(axis.title.y = element_blank())
hist3<- ggplot(sample_data(meerkat_scaled), aes(x = AgeAtSamplingYears))+
geom_histogram(fill = "grey", col = "black", binwidth = 1, boundary = 0)+
theme_bw(base_size = 22)+
xlab("Age (years)")+
ylab("Frequency")+
scale_x_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9,10))+
theme(axis.title.y = element_blank())
Fig1e<- ggarrange(hist1, hist2, hist3, ncol = 3, widths = c(1.2,1,1))
#ggarrange(hist1, hist2, hist3, ncol = 3, widths = c(1.2,1,1))
# save as pdf
#ggsave("FIGURES/fig1e.pdf")
This chunk took 0.01 minutes
## import Fig1bcd
fig1bcd <- magick::image_read_pdf("C:/Users/risel/Dropbox/Sommer postdoc/Meerkat project/PROJECTS/MeerkatSpikeInData/Analysis updated/TEMPORAL DYNAMICS ANALYSIS/FIGURES/Fig1abc.pdf")
## give it a border
fig1bcd_border <- fig1bcd %>%
magick::image_border("black", "15 x 15")
## import fig1e that i saved earlier
fig1e <- magick::image_read_pdf("C:/Users/risel/Dropbox/Sommer postdoc/Meerkat project/PROJECTS/MeerkatSpikeInData/Analysis updated/TEMPORAL DYNAMICS ANALYSIS/FIGURES/fig1e.pdf")
# give border
fig1e_border <- fig1e %>%
magick::image_border("black", "15 x 15")
# this figure won't wont fully work with reduced datset.
fig1a +annotation_raster(fig1bcd_border, xmin = as.Date("1997-01-01"), xmax = as.Date("2010-01-01"), ymin = "ID 134", ymax = "ID 231") +
annotation_raster(fig1e_border, xmin = as.Date("2010-01-01"), xmax = as.Date("2020-01-01"), ymin = "ID 39", ymax = "ID 61")
# ggsave("FIGURES/Figure_1.pdf", width = 12, height = 6, units = c("in"))
This chunk took 0.11 minutes
We model log10-transformed bacterial load (i.e. total 16S copy number per each scaled sample) against time of day, month, and meerkat age using Generalized Additive Mixed Models whilst accounting for methodological variables (storage method, sampling depth, sequencing run, time in field prior to freezing) as fixed linear effects, individual ID and social group as random effects, and including an auto-regression term to account for temporal autocorrelation (Wood 2017). The auto-regression term doesn’t really change models but I included it anyway as a precaution.
metadata<-data.frame(sample_data(meerkat_scaled))
### bacterial load distribution
ggplot(metadata, aes(x = TotalAbundance))+geom_histogram(fill = "grey", col = "black", boundary = 0) + ggtitle("Untransformed")+
ggplot(metadata, aes(x = log10(TotalAbundance)))+geom_histogram(fill = "grey", col = "black", boundary = 0) + ggtitle("Log10-transformed")
This chunk took 0.01 minutes
#scale variables
metadata$Seq_depth_nospikein<-as.numeric(scale(metadata$Seq_depth_nospikein))
metadata$HoursTilFrozen<-as.numeric(scale(metadata$HoursTilFrozen))
## GAMM MODEL
abundance_model <- mgcv::gamm(log10(TotalAbundance)~
s(HoursAfterSunrise, bs = "cr")+
s(month, bs = "cc")+
s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re"),
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata,
family = gaussian)
# GAMM summary
sjPlot::tab_model(abundance_model$gam,
show.stat = TRUE,
show.se = TRUE,
title = "Table S2: GAMM predicting bacterial load. Note that the estimate for smooth terms is actually EDF (estimated degrees of freedom), it is not an 'estimate'. EDF reflects the degree of non-linearity of a curve (1 = linear).",
dv.labels = "Bacterial load (log-transformed)",
pred.labels = c("Intercept", "Storage (frozen)", "Seq. run (2)", "Seq run (3)", "Seq. run (4)", "Time in field", "Hours after sunrise (non-linear)", "Month (non-linear)", "Age in years (non-linear)", "Seq. depth (non-linear)", "Meerkat ID (random)", "Social group (random)"))
Bacterial load (log-transformed) | |||||
---|---|---|---|---|---|
Predictors | Estimates | std. Error | CI | Statistic | p |
Intercept | 4.94 | 0.04 | 4.86 – 5.01 | 125.97 | <0.001 |
Storage (frozen) | -0.12 | 0.04 | -0.20 – -0.04 | -3.10 | 0.002 |
Seq. run (2) | -0.00 | 0.04 | -0.08 – 0.08 | -0.06 | 0.955 |
Seq run (3) | -0.02 | 0.04 | -0.11 – 0.06 | -0.46 | 0.642 |
Seq. run (4) | -0.65 | 0.10 | -0.86 – -0.45 | -6.26 | <0.001 |
Time in field | 0.05 | 0.03 | -0.01 – 0.10 | 1.74 | 0.082 |
Hours after sunrise (non-linear) | 4.64 | 54.36 | <0.001 | ||
Month (non-linear) | 2.06 | 1.05 | 0.007 | ||
Age in years (non-linear) | 1.65 | 9.10 | 0.003 | ||
Seq. depth (non-linear) | 5.11 | 53.71 | <0.001 | ||
Meerkat ID (random) | 9.64 | 0.04 | 0.278 | ||
Social group (random) | 10.20 | 0.61 | 0.001 | ||
Observations | 1109 | ||||
R2 | 0.466 |
# Temporal effects (without residuals)
gratia::draw(abundance_model$gam, select = c(1:3), ncol = 3, scales = "fixed")
This chunk took 1.62 minutes
model_check_load<-gratia::appraise(abundance_model$gam)
annotate_figure(model_check_load, top = text_grob("Model check of bacterial load")) # check model
This chunk took 0.03 minutes
This chunk took 0 minutes
confint_df_time<-confint(abundance_model$gam, parm = "s(HoursAfterSunrise)", type = "confidence")
confint_df_season<-confint(abundance_model$gam, parm = "s(month)", type = "confidence")
confint_df_age<-confint(abundance_model$gam, parm = "s(AgeAtSamplingYears)", type = "confidence")
confint_df_time$Variable<-"HoursAfterSunrise"
names(confint_df_time)[3]<-"Time"
confint_df_season$Variable<-"month"
names(confint_df_season)[3]<-"Time"
confint_df_age$Variable<-"AgeAtSamplingYears"
names(confint_df_age)[3]<-"Time"
confint_df_load<-rbind(confint_df_time, confint_df_season, confint_df_age)
###########add geom_rug data ##################
rugdata<- data.frame(cbind(metadata$HoursAfterSunrise, metadata$month, metadata$AgeAtSamplingYears))
names(rugdata)<-c("HoursAfterSunrise", "month", "AgeAtSamplingYears")
rugdata<-gather(rugdata, Variable, Time, HoursAfterSunrise:AgeAtSamplingYears, factor_key=TRUE)
rugdata$est = 0
rugdata$Variable<-factor(rugdata$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))
################# add residuals #######
fv <- predict(abundance_model$gam,type="terms") ## get term estimates
prsd_time <- data.frame(as.numeric(residuals(abundance_model$gam,type="working")) + fv[,3])
names(prsd_time)<-"Residuals"
prsd_time$Time<-metadata$HoursAfterSunrise
prsd_time$Variable<-"HoursAfterSunrise"
prsd_time$predicted<-predict(abundance_model$gam,type="response")
prsd_month <-data.frame(as.numeric(residuals(abundance_model$gam,type="working")) + fv[,4])
names(prsd_month)<-"Residuals"
prsd_month$Time<-metadata$month
prsd_month$Variable<-"month"
prsd_month$predicted<-predict(abundance_model$gam,type="response")
prsd_age <- data.frame(as.numeric(residuals(abundance_model$gam,type="working")) + fv[,5])
names(prsd_age)<-"Residuals"
prsd_age$Time<-metadata$AgeAtSamplingYears
prsd_age$Variable<-"AgeAtSamplingYears"
prsd_age$predicted<-predict(abundance_model$gam,type="response")
resids_df<-rbind(prsd_time, prsd_month, prsd_age)
resids_df$est<-resids_df$predicted+resids_df$Residuals
## rearrange level order
resids_df$Variable<-factor(resids_df$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))
confint_df_load$Variable<-factor(confint_df_load$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))
#### make y axis the real abunance values, not estimates
#10^summary(abundance_model$gam)$p.coeff[1] # mean
confint_df_load$est<-confint_df_load$est+summary(abundance_model$gam)$p.coeff[1]
confint_df_load$lower<-confint_df_load$lower+summary(abundance_model$gam)$p.coeff[1]
confint_df_load$upper<-confint_df_load$upper+summary(abundance_model$gam)$p.coeff[1]
confint_df_load$est<-10^confint_df_load$est
confint_df_load$lower<-10^confint_df_load$lower
confint_df_load$upper<-10^confint_df_load$upper
resids_df$est<-10^resids_df$est
resids_load<-resids_df
####
P1_resids<-ggplot(subset(confint_df_load, Variable == "HoursAfterSunrise"), aes(x = Time, y = est))+
theme_light(base_size = 14)+
geom_point(data = subset(resids_load, Variable =="HoursAfterSunrise"),alpha = 0.5, aes(fill=Time), col = "white", pch = 21, size = 2)+
# geom_line( size = 3,aes(col = Time))+
geom_line( size = 1,col = "black", alpha = 0.7)+
scale_color_viridis_c(option = "plasma", direction= -1)+
scale_fill_viridis_c(option = "plasma", direction= -1)+
geom_ribbon(aes(ymin=lower, ymax=upper), linetype=2, alpha=0.3)+
theme( strip.background = element_blank(),
strip.text.x = element_blank())+
ylab("Bacterial abundance")+
xlab("Hours after sunrise")+
scale_x_continuous(breaks = int_breaks)+
theme(legend.position = "none")+
coord_cartesian(ylim = c(1000,1000000))+
theme( panel.grid = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))+
scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))+
geom_vline(xintercept = c(as.numeric(abundance_model$gam$smooth[[1]]$xp)), col = "darkgrey", linetype = "dashed")
P2_resids<-ggplot(subset(confint_df_load, Variable == "month"), aes(x = Time, y = est))+
theme_light(base_size = 14)+
geom_jitter(data = subset(resids_load, Variable =="month"),alpha = 0.5, aes(fill=Time), col = "white", pch = 21, size = 2, width = 0.15)+
geom_line( size = 1,col = "black", alpha = 0.7)+
scale_color_gradientn(colours = c( "seagreen","seagreen","seagreen","seagreen","khaki","khaki","khaki","khaki","khaki","seagreen", "seagreen"))+
scale_fill_gradientn(colours = c( "seagreen","seagreen","seagreen","seagreen","khaki","khaki","khaki","khaki","khaki","seagreen", "seagreen"))+
geom_ribbon(aes(ymin=lower, ymax=upper), linetype=2, alpha=0.3)+
theme( strip.background = element_blank(),
strip.text.x = element_blank())+
ylab("Model estimate")+
xlab("Month")+
coord_cartesian(ylim = c(1000,1000000))+
theme(axis.title.y = element_blank(),
axis.text.y = element_blank())+
scale_x_continuous(breaks = int_breaks)+
theme(legend.position = "none")+
# ylim(-2,1)+
theme( panel.grid = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))+
scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))+
geom_vline(xintercept = c(as.numeric(abundance_model$gam$smooth[[2]]$xp)), col = "darkgrey", linetype = "dashed")
P3_resids<-ggplot(subset(confint_df_load, Variable == "AgeAtSamplingYears"), aes(x = Time, y = est))+
theme_light(base_size = 14)+
geom_point(data = subset(resids_load, Variable =="AgeAtSamplingYears"),alpha = 0.5, aes(fill=Time), col = "white", pch = 21, size = 2)+
geom_line( size = 1,col = "black", alpha = 0.7)+
scale_color_gradientn(colours = c( "blue","blue","darkgray","darkgray","darkgray","red", "red", "red", "red", "red", "red"))+
scale_fill_gradientn(colours = c( "blue","blue","darkgray","darkgray","darkgray","red", "red", "red", "red", "red", "red"))+
geom_ribbon(aes(ymin=lower, ymax=upper), linetype=2, alpha=0.3)+
theme( strip.background = element_blank(),
strip.text.x = element_blank())+
ylab("Model estimate")+
xlab("Age (years)")+
coord_cartesian(ylim = c(1000,1000000))+
theme(axis.title.y = element_blank(),
axis.text.y = element_blank())+
scale_x_continuous(breaks = int_breaks)+
theme(legend.position = "none")+
# ylim(-2,1)+
theme( panel.grid = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))+
scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))+
geom_vline(xintercept = c(as.numeric(abundance_model$gam$smooth[[3]]$xp)), col = "darkgrey", linetype = "dashed")
Fig2abc<- ggarrange(P1_resids, P2_resids, P3_resids, ncol = 3, align = "h", widths = c(1.2,1,1))
This chunk took 0.02 minutes
# just methods
abundance_model_methods <- mgcv::gamm(log10(TotalAbundance)~
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen,
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata,
family = gaussian)
paste("The partial R2 for methods is", summary(abundance_model_methods$gam)$r.sq)
## [1] "The partial R2 for methods is 0.313956729949413"
## no methods
abundance_model_nomethods <- mgcv::gamm(log10(TotalAbundance)~
s(HoursAfterSunrise, bs = "cr")+
s(month, bs = "cc")+
s(AgeAtSamplingYears, bs = "cr")+
# s(Seq_depth_nospikein, bs = "cr")+
#Storage+
# Seq_run+
# HoursTilFrozen,
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re"),
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata,
family = gaussian)
### minus time of day
load_r2_time<-mgcv::gamm(log10(TotalAbundance)~
# s(HoursAfterSunrise, bs = "cr")+
s(month, bs = "cc")+
s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re"),
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata,
family = gaussian)
paste("The partial R2 for time of day is", summary(abundance_model$gam)$r.sq - summary(load_r2_time$gam)$r.sq)
## [1] "The partial R2 for time of day is 0.123427692985502"
# minus month
load_r2_month<-mgcv::gamm(log10(TotalAbundance)~
s(HoursAfterSunrise, bs = "cr")+
# s(month, bs = "cc")+
s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re"),
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata,
family = gaussian)
paste("The partial R2 for month is", summary(abundance_model$gam)$r.sq - summary(load_r2_month$gam)$r.sq)
## [1] "The partial R2 for month is 0.0106767169116505"
# minus age
load_r2_age<-mgcv::gamm(log10(TotalAbundance)~
s(HoursAfterSunrise, bs = "cr")+
s(month, bs = "cc")+
# s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re"),
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata,
family = gaussian)
paste("The partial R2 for age is", summary(abundance_model$gam)$r.sq - summary(load_r2_age$gam)$r.sq)
## [1] "The partial R2 for age is 0.00964358646132946"
## id and social group
abundance_model_ID <- mgcv::gamm(log10(TotalAbundance)~
s(HoursAfterSunrise, bs = "cr")+
s(month, bs = "cc")+
s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen,
# s(IndividID, bs = "re")+
# s(GroupAtSampling, bs = "re"),
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata,
family = gaussian)
paste("The partial R2 for meerkat ID is", summary(abundance_model$gam)$r.sq - summary(abundance_model_ID$gam)$r.sq)
## [1] "The partial R2 for meerkat ID is 0.0211040082314377"
This chunk took 6.15 minutes
metadata$TimeInterval <- cut(metadata$HoursAfterSunrise, breaks = c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))
table(metadata$TimeInterval)
##
## (0,1] (1,2] (2,3] (3,4] (4,5] (5,6] (6,7] (7,8] (8,9] (9,10]
## 35 145 204 203 109 42 4 6 20 57
## (10,11] (11,12] (12,13] (13,14] (14,15]
## 71 77 77 55 4
# remove intervals with less than 20 samples
newdata<-subset(metadata, TimeInterval != "(6,7]" & TimeInterval != "(7,8]" & TimeInterval != "(14,15]" )
table(newdata$TimeInterval)
##
## (0,1] (1,2] (2,3] (3,4] (4,5] (5,6] (6,7] (7,8] (8,9] (9,10]
## 35 145 204 203 109 42 0 0 20 57
## (10,11] (11,12] (12,13] (13,14] (14,15]
## 71 77 77 55 0
newdata$TimeInterval<-factor(newdata$TimeInterval)
# sample 20 rows per time inetrval
set.seed(999)
newdata1<-newdata%>% group_by(TimeInterval) %>% sample_n(20)
## repeat model
reduced_model_load<-mgcv::gamm(log10(TotalAbundance)~
s(HoursAfterSunrise, bs = "cr")+
s(month, bs = "cc")+
s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen,
correlation = corARMA(form = ~ 1|Year, p = 3),
data=newdata1,
family = gaussian)
figS10a<-draw(reduced_model_load$gam, select = c(1:3), ncol = 3, scales = "fixed")
figS10a<-annotate_figure(figS10a, top = text_grob("a) 20 samples per hour", size = 16, face = "bold"))
figS10b<-draw(abundance_model_nomethods$gam, scales = "fixed", ncol = 3, select = c(1:3))
figS10b<-annotate_figure(figS10b, top = text_grob("b) Methods excluded", size = 16, face = "bold"))
ggarrange(figS10a, figS10b, ncol = 1)
This chunk took 0.06 minutes
# Test predictive power of model
# Create the training and test datasets
rsq_list<-list()
for(i in 1:100){
# Step 1: Get row numbers for the training data
trainRowNumbers <- createDataPartition(metadata$Storage, p=0.7, list=FALSE)
# Step 2: Create the training dataset
trainData <- metadata[trainRowNumbers,]
dim(trainData)
# Step 3: Create the test dataset
testData <- metadata[-trainRowNumbers,]
dim(testData)
# new model of train data
abundance_model_train <- mgcv::gamm(log10(TotalAbundance)~
s(HoursAfterSunrise, bs = "cr")+
s(month, bs = "cc")+
s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re"),
correlation = corARMA(form = ~ 1|Year, p = 3),
data=trainData,
family = gaussian)
# Predict on testData
testData$predicted <- mgcv::predict.gam(abundance_model_train$gam, testData)
ggplot(testData, aes(x = predicted, y = log10(TotalAbundance)))+geom_point()+geom_smooth(method = "lm")
rsq_list[[i]]<-summary(lm(log10(TotalAbundance) ~ predicted, data =testData))$r.sq
}
abundance_rsqs<-do.call(c, rsq_list)
summary(abundance_rsqs)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3229 0.4021 0.4366 0.4340 0.4603 0.5246
# R2 distribution of 100 test models
# black line is mean for test datasets
# red line is true R2 from full dataset
ggplot()+aes(abundance_rsqs)+
geom_histogram(fill = "grey", col = "black")+
theme_light(base_size = 10)+
xlab("R^2 distribution for test dataset - bacterial load")+
ylab("Frequency")+
geom_vline(xintercept = mean(abundance_rsqs), size = 2, linetype = "dashed")+
geom_vline(xintercept = summary(abundance_model$gam)$r.sq , size = 2, linetype = "dashed", col = "red")
# one sided t-test
t.test(abundance_rsqs, mu = summary(abundance_model$gam)$r.sq, alternative = "two.sided")
##
## One Sample t-test
##
## data: abundance_rsqs
## t = -8.5314, df = 99, p-value = 1.722e-13
## alternative hypothesis: true mean is not equal to 0.4662944
## 95 percent confidence interval:
## 0.4265305 0.4415368
## sample estimates:
## mean of x
## 0.4340336
This chunk took 93.35 minutes
We model observed ASV richness against time of day, month, and meerkat age using Generalized Additive Mixed Models whilst accounting for methodological variables (storage method, sampling depth, and sequencing run) as fixed linear effects, individual ID and social group as random effects, and including a autoregression term to account for temporal autocorrelation (Wood 2017).
# estimate alpha diversity
sample_data(meerkat_scaled)$Observed<-phyloseq::estimate_richness(meerkat_scaled, measures="Observed")
sample_data(meerkat_scaled)$Observed<-sample_data(meerkat_scaled)$Observed$Observed
sample_data(meerkat_scaled)$Shannon<-phyloseq::estimate_richness(meerkat_scaled, measures="Shannon")
sample_data(meerkat_scaled)$Shannon<-sample_data(meerkat_scaled)$Shannon$Shannon
## extract metadata as dataframe
metadata<-data.frame(sample_data(meerkat_scaled))
alpha_histograms<-ggarrange(ggplot(metadata, aes(x = Observed))+geom_histogram(fill = "grey", col = "black", boundary = 0) + ggtitle("ASV richness"),
ggplot(metadata, aes(x = Shannon))+geom_histogram(fill = "grey", col = "black", boundary = 0) + ggtitle("Shannon"))
annotate_figure(alpha_histograms, top = text_grob("Distribution of alpha diversity measures", size = 16, face = "bold"))
# scale variables around the mean
metadata$Seq_depth_nospikein<-as.numeric(scale(metadata$Seq_depth_nospikein))
metadata$HoursTilFrozen<-as.numeric(scale(metadata$HoursTilFrozen))
This chunk took 0.29 minutes
m_alpha <- mgcv::gamm(Observed~
s(HoursAfterSunrise, bs = "cr")+
s(month, bs = "cc")+
s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re"),
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata)
sjPlot::tab_model(m_alpha$gam, show.stat = TRUE, show.se = TRUE, title = "Table S3: GAMM predicting observed ASV richness. Note that the estimate for smooth terms is actually EDF (estimated degrees of freedom), it is not an 'estimate'. EDF reflects the degree of non-linearity of a curve (1 = linear)",
dv.labels = "Observed ASV richness",
pred.labels = c("Intercept", "Storage (frozen)", "Seq. run (2)", "Seq run (3)", "Seq. run (4)", "Time in field", "Hours after sunrise (non-linear)", "Month (non-linear)", "Age in years (non-linear)", "Seq. depth (non-linear)", "Meerkat ID (random)", "Social group (random)"))
Observed ASV richness | |||||
---|---|---|---|---|---|
Predictors | Estimates | std. Error | CI | Statistic | p |
Intercept | 153.23 | 7.27 | 138.97 – 167.49 | 21.08 | <0.001 |
Storage (frozen) | -8.29 | 6.59 | -21.22 – 4.65 | -1.26 | 0.209 |
Seq. run (2) | 32.70 | 8.42 | 16.16 – 49.23 | 3.88 | <0.001 |
Seq run (3) | 25.18 | 8.64 | 8.22 – 42.14 | 2.91 | 0.004 |
Seq. run (4) | 119.10 | 19.12 | 81.59 – 156.62 | 6.23 | <0.001 |
Time in field | 1.79 | 4.82 | -7.67 – 11.25 | 0.37 | 0.711 |
Hours after sunrise (non-linear) | 3.60 | 20.80 | <0.001 | ||
Month (non-linear) | 2.92 | 3.39 | <0.001 | ||
Age in years (non-linear) | 2.68 | 3.76 | 0.072 | ||
Seq. depth (non-linear) | 3.73 | 11.03 | <0.001 | ||
Meerkat ID (random) | 54.32 | 0.33 | 0.001 | ||
Social group (random) | 0.00 | 0.00 | 0.616 | ||
Observations | 1109 | ||||
R2 | 0.298 |
draw(m_alpha$gam, select = c(1:3), ncol = 3, scales = "fixed")
This chunk took 1.93 minutes
model_check_alpha<-gratia::appraise(m_alpha$gam)
annotate_figure(model_check_alpha, top = text_grob("Model check of observed ASV richness")) # check model
This chunk took 0.03 minutes
# generate table of model estimates
confint_df_time<-confint(m_alpha$gam, parm = "s(HoursAfterSunrise)", type = "confidence")
confint_df_season<-confint(m_alpha$gam, parm = "s(month)", type = "confidence")
confint_df_age<-confint(m_alpha$gam, parm = "s(AgeAtSamplingYears)", type = "confidence")
confint_df_time$Variable<-"HoursAfterSunrise"
names(confint_df_time)[3]<-"Time"
confint_df_season$Variable<-"month"
names(confint_df_season)[3]<-"Time"
confint_df_age$Variable<-"AgeAtSamplingYears"
names(confint_df_age)[3]<-"Time"
confint_df_alpha<-rbind(confint_df_time, confint_df_season, confint_df_age)
confint_df_alpha$Variable<-factor(confint_df_alpha$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))
### make y axis the real values, not estimates, by adding the model intercept (mean)
confint_df_alpha$est<-confint_df_alpha$est+summary(m_alpha$gam)$p.coeff[1]
confint_df_alpha$lower<-confint_df_alpha$lower+summary(m_alpha$gam)$p.coeff[1]
confint_df_alpha$upper<-confint_df_alpha$upper+summary(m_alpha$gam)$p.coeff[1]
## parital residuals
fv <- predict(m_alpha$gam,type="terms") ## get term estimates
prsd_time <- data.frame(as.numeric(residuals(m_alpha$gam,type="working")) + fv[,3])
names(prsd_time)<-"Residuals"
prsd_time$Time<-metadata$HoursAfterSunrise
prsd_time$Variable<-"HoursAfterSunrise"
prsd_time$predicted<-predict(m_alpha$gam,type="response")
prsd_month <-data.frame(as.numeric(residuals(m_alpha$gam,type="working")) + fv[,4])
names(prsd_month)<-"Residuals"
prsd_month$Time<-metadata$month
prsd_month$Variable<-"month"
prsd_month$predicted<-predict(m_alpha$gam,type="response")
prsd_age <- data.frame(as.numeric(residuals(m_alpha$gam,type="working")) + fv[,5])
names(prsd_age)<-"Residuals"
prsd_age$Time<-metadata$AgeAtSamplingYears
prsd_age$Variable<-"AgeAtSamplingYears"
prsd_age$predicted<-predict(m_alpha$gam,type="response")
resids_df<-rbind(prsd_time, prsd_month, prsd_age)
resids_df$est<-resids_df$predicted+resids_df$Residuals
## rearrange level order
resids_df$Variable<-factor(resids_df$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))
resids_alpha<-resids_df
This chunk took 0.02 minutes
alpha1_resids<-ggplot(subset(confint_df_alpha, Variable == "HoursAfterSunrise"), aes(x = Time, y = est))+
theme_light(base_size = 14)+
geom_point(data = subset(resids_alpha, Variable =="HoursAfterSunrise"),alpha = 0.5, aes(fill=Time), col = "white", pch = 21, size = 2)+
geom_line( size = 1, col = "black", alpha = 0.7)+
geom_ribbon(aes(ymin=lower, ymax=upper), linetype=2, alpha=0.3)+
theme( strip.background = element_blank(),
strip.text.x = element_blank())+
ylab("Observed ASV richness")+
xlab("Hours after sunrise")+
ylim(0,400)+
scale_x_continuous(breaks = int_breaks)+
theme(legend.position = "none")+
scale_color_viridis_c(option = "plasma", direction= -1)+
scale_fill_viridis_c(option = "plasma", direction= -1)+
theme( panel.grid = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))+
geom_vline(xintercept = c(as.numeric(m_alpha$gam$smooth[[1]]$xp)), col = "darkgrey", linetype = "dashed")
alpha2_resids<-ggplot(subset(confint_df_alpha, Variable == "month"), aes(x = Time, y = est))+
theme_light(base_size = 14)+
geom_jitter(data = subset(resids_alpha, Variable =="month"),alpha = 0.5, aes(fill=Time), col = "white", pch = 21, size = 2, width = 0.15)+
geom_line( size = 1, col = "black", alpha = 0.7)+
geom_ribbon(aes(ymin=lower, ymax=upper), linetype=2, alpha=0.3)+
ylim(0,400)+
xlab("Month")+
theme( strip.background = element_blank(),
strip.text.x = element_blank())+
theme( axis.title.y= element_blank(), axis.text.y = element_blank())+
scale_x_continuous(breaks = int_breaks)+
theme(legend.position = "none")+
scale_colour_gradientn(colours = c( "seagreen","seagreen","seagreen","seagreen","khaki","khaki","khaki","khaki","khaki","seagreen", "seagreen"))+
scale_fill_gradientn(colours = c( "seagreen","seagreen","seagreen","seagreen","khaki","khaki","khaki","khaki","khaki","seagreen", "seagreen"))+
theme( panel.grid = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))+
geom_vline(xintercept = c(as.numeric(m_alpha$gam$smooth[[2]]$xp)), col = "darkgrey", linetype = "dashed")
alpha3_resids<-ggplot(subset(confint_df_alpha, Variable == "AgeAtSamplingYears"), aes(x = Time, y = est))+
theme_light(base_size = 14)+
geom_point(data = subset(resids_alpha, Variable =="AgeAtSamplingYears"),alpha = 0.5, aes(fill=Time), col = "white", pch = 21, size = 2)+
geom_line( size = 1, col = "black", alpha = 0.7)+
geom_ribbon(aes(ymin=lower, ymax=upper), linetype=2, alpha=0.3)+
ylim(0,400)+
xlab("Age (years)")+
theme( strip.background = element_blank(),
strip.text.x = element_blank())+
theme(axis.title.y= element_blank(), axis.text.y = element_blank())+
scale_x_continuous(breaks = int_breaks)+
theme(legend.position = "none")+
scale_colour_gradientn(colours = c( "blue","blue","darkgray","darkgray","darkgray","darkgray", "red", "red", "red", "red", "red", "red"))+
scale_fill_gradientn(colours = c( "blue","blue","darkgray","darkgray","darkgray","darkgray", "red", "red", "red", "red", "red", "red"))+
theme( panel.grid = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))+
geom_vline(xintercept = c(as.numeric(m_alpha$gam$smooth[[3]]$xp)), col = "darkgrey", linetype = "dashed")
Fig2def<-ggarrange(alpha1_resids, alpha2_resids, alpha3_resids, ncol = 3, widths = c(1.2,1,1))
This chunk took 0 minutes
## methods
partial_r2_methods <- mgcv::gamm(Observed~
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen,
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata)
paste("The partial R2 for methods is", summary(partial_r2_methods$gam)$r.sq)
## [1] "The partial R2 for methods is 0.166001049590537"
## no methods
partial_r2_nomethods <- mgcv::gamm(Observed~
s(HoursAfterSunrise, bs = "cr")+
s(month, bs = "cc")+
s(AgeAtSamplingYears, bs = "cr")+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re"),
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata,
family = gaussian)
## Partial R2 time of day
partial_r2_time <- mgcv::gamm(Observed~
s(month, bs = "cc")+
s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re"),
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata)
paste("The partial R2 for time of day is", summary(m_alpha$gam)$r.sq - summary(partial_r2_time$gam)$r.sq)
## [1] "The partial R2 for time of day is 0.0581281837543095"
## Partial R2 month
partial_r2_month <- mgcv::gamm(Observed~
s(HoursAfterSunrise, bs = "cr")+
s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re"),
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata)
paste("The partial R2 for month is", summary(m_alpha$gam)$r.sq - summary(partial_r2_month$gam)$r.sq)
## [1] "The partial R2 for month is 0.0310567707985488"
## Partial R2 age
partial_r2_age <- mgcv::gamm(Observed~
s(HoursAfterSunrise, bs = "cr")+
s(month, bs = "cc")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re"),
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata)
paste("The partial R2 for age is", summary(m_alpha$gam)$r.sq - summary(partial_r2_age$gam)$r.sq)
## [1] "The partial R2 for age is 0.00704094084021467"
## id and social group
partial_r2_ID <- mgcv::gamm(Observed~
s(HoursAfterSunrise, bs = "cr")+
s(month, bs = "cc")+
s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen,
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata)
paste("The partial R2 for ID and social group is", summary(m_alpha$gam)$r.sq - summary(partial_r2_ID$gam)$r.sq)
## [1] "The partial R2 for ID and social group is 0.044227390218747"
This chunk took 5.87 minutes
metadata$TimeInterval <- cut(metadata$HoursAfterSunrise, breaks = c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))
table(metadata$TimeInterval)
##
## (0,1] (1,2] (2,3] (3,4] (4,5] (5,6] (6,7] (7,8] (8,9] (9,10]
## 35 145 204 203 109 42 4 6 20 57
## (10,11] (11,12] (12,13] (13,14] (14,15]
## 71 77 77 55 4
unique(metadata$TimeInterval)
## [1] (1,2] (13,14] (2,3] (5,6] (3,4] (10,11] (4,5] (11,12] (12,13]
## [10] (0,1] (9,10] (8,9] (14,15] (7,8] (6,7]
## 15 Levels: (0,1] (1,2] (2,3] (3,4] (4,5] (5,6] (6,7] (7,8] (8,9] ... (14,15]
newdata<-subset(metadata, TimeInterval != "(6,7]" & TimeInterval != "(7,8]" & TimeInterval != "(14,15]" )
table(newdata$TimeInterval)
##
## (0,1] (1,2] (2,3] (3,4] (4,5] (5,6] (6,7] (7,8] (8,9] (9,10]
## 35 145 204 203 109 42 0 0 20 57
## (10,11] (11,12] (12,13] (13,14] (14,15]
## 71 77 77 55 0
newdata$TimeInterval<-factor(newdata$TimeInterval)
set.seed(999)
newdata1<-newdata%>% group_by(TimeInterval) %>% sample_n(20)
## repeat model
dim(newdata1)
## [1] 240 30
reduced_model_alpha<-mgcv::gamm(Observed~
s(HoursAfterSunrise, bs = "cr")+
s(month, bs = "cc")+
s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen,
correlation = corARMA(form = ~ 1|Year, p = 3),
data=newdata1,
family = gaussian)
figS11a<-draw(reduced_model_alpha$gam, select = c(1:3), ncol = 3, scales = "fixed") # Fig S11a
figS11a<-annotate_figure(figS11a, top = text_grob("a) 20 samples per hour", size = 14, face = "bold"))
figS11b<-draw(partial_r2_nomethods$gam, scales = "fixed", ncol = 3, select = c(1:3)) # Fig S11b
figS11b<-annotate_figure(figS11b, top = text_grob("b) Methods excluded", size = 14, face = "bold"))
ggarrange(figS11a, figS11b, ncol = 1)
This chunk took 0.05 minutes
# Test predictive power of model
# Create the training and test datasets
rsq_list<-list()
for(i in 1:100){
# Step 1: Get row numbers for the training data
trainRowNumbers <- createDataPartition(metadata$Storage, p=0.7, list=FALSE)
# Step 2: Create the training dataset
trainData <- metadata[trainRowNumbers,]
# Step 3: Create the test dataset
testData <- metadata[-trainRowNumbers,]
dim(testData)
m_alpha_train <- mgcv::gamm(Observed~
s(HoursAfterSunrise, bs = "cr")+
s(month, bs = "cc")+
s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re"),
correlation = corAR1(), #residual correlation structure
data=trainData)
# Predict on testData
testData$predicted <- mgcv::predict.gam(m_alpha_train$gam, testData)
ggplot(testData, aes(x = predicted, y = Observed))+geom_point()+geom_smooth(method = "lm")
rsq_list[[i]]<-summary(lm(Observed ~ predicted, data =testData))$r.sq
}
alpha_rsqs<-do.call(c, rsq_list)
summary(alpha_rsqs)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1343 0.1983 0.2383 0.2318 0.2658 0.3019
# black line is mean R2 for test data
# red line is true R2 from all data
ggplot()+aes(alpha_rsqs)+
geom_histogram(fill = "grey", col = "black")+
theme_light(base_size = 10)+
xlab("R square distribution")+
ylab("Frequency")+
geom_vline(xintercept = mean(alpha_rsqs), size = 2, linetype = "dashed")+
geom_vline(xintercept = summary(m_alpha$gam)$r.sq , size = 2, linetype = "dashed", col = "red")
# one sided t-test
t.test(alpha_rsqs, mu = summary(m_alpha$gam)$r.sq, alternative = "two.sided")
##
## One Sample t-test
##
## data: alpha_rsqs
## t = -15.881, df = 99, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0.2977478
## 95 percent confidence interval:
## 0.2235439 0.2400265
## sample estimates:
## mean of x
## 0.2317852
This chunk took 124.73 minutes
m_shannon <- mgcv::gamm(Shannon~
s(HoursAfterSunrise, bs = "cr")+
s(month, bs = "cc")+
s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re"),
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata)
## model summary
sjPlot::tab_model(m_shannon$gam, show.stat = TRUE, show.se = TRUE, title = "Extra table: GAMM predicting Shannon diversity. Note that the estimate for smooth terms is actually EDF (estimated degrees of freedom), it is not an 'estimate'. EDF reflects the degree of non-linearity of a curve (1 = linear)",
dv.labels = "Shannon diversity",
pred.labels = c("Intercept", "Storage (frozen)", "Seq. run (2)", "Seq run (3)", "Seq. run (4)", "Time in field", "Hours after sunrise (non-linear)", "Month (non-linear)", "Age in years (non-linear)", "Seq. depth (non-linear)", "Meerkat ID (random)", "Social group (random)"))
Shannon diversity | |||||
---|---|---|---|---|---|
Predictors | Estimates | std. Error | CI | Statistic | p |
Intercept | 3.23 | 0.06 | 3.11 – 3.35 | 52.39 | <0.001 |
Storage (frozen) | -0.31 | 0.06 | -0.43 – -0.20 | -5.40 | <0.001 |
Seq. run (2) | 0.32 | 0.07 | 0.17 – 0.46 | 4.32 | <0.001 |
Seq run (3) | 0.31 | 0.08 | 0.16 – 0.46 | 4.09 | <0.001 |
Seq. run (4) | 0.83 | 0.15 | 0.53 – 1.13 | 5.45 | <0.001 |
Time in field | 0.03 | 0.04 | -0.05 – 0.11 | 0.76 | 0.446 |
Hours after sunrise (non-linear) | 2.82 | 4.59 | 0.009 | ||
Month (non-linear) | 1.37 | 0.40 | 0.077 | ||
Age in years (non-linear) | 3.46 | 6.50 | <0.001 | ||
Seq. depth (non-linear) | 1.00 | 21.32 | <0.001 | ||
Meerkat ID (random) | 23.02 | 0.12 | 0.067 | ||
Social group (random) | 0.00 | 0.00 | 0.911 | ||
Observations | 1109 | ||||
R2 | 0.152 |
# model estimates (plot)
shannon_effects<-draw(m_shannon$gam, select = c(1:3), ncol = 3, scales = "fixed")
annotate_figure(shannon_effects, top = text_grob("GAMM estimates for Shannon diversity", size = 14, face = "bold")) # check model
This chunk took 2.1 minutes
We next explore shifts in beta diversity across the three temporal scales by ordinating taxa composition using a Multi-Dimensional Scaling (MDS) using Weighted Unifrac distance. We also present results for unweighted unifrac, and results of PERMANOVAS.
scaled_core<-microbiome::core(meerkat_scaled, detection = 50, prevalence = 0)
paste("This filtering step removed", 1-round(sum(sample_sums(scaled_core))/sum(sample_sums(meerkat_scaled)), 3), "% of reads")
## [1] "This filtering step removed 0.002 % of reads"
This chunk took 0.02 minutes
ord_wunifrac_mds<-ordinate(scaled_core, "MDS", "wunifrac")
ord_unifrac_mds<-ordinate(scaled_core, "MDS", "unifrac")
dist_wunifrac<-distance(scaled_core, method="wunifrac")
dist_unifrac<-distance(scaled_core, method="unifrac")
This chunk took 29.19 minutes
qplot(y = ord_wunifrac_mds$values$Relative_eig[1:10], x = factor(1:10), geom="col", ylab = "Relative eingenvalue", xlab = "Axis", main = "Scree plot")+ ggtitle("Weighted Unifrac axis contributions")+ylim(0,0.65)+
qplot(y = ord_unifrac_mds$values$Relative_eig[1:10], x = factor(1:10), geom="col", ylab = "Relative eingenvalue", xlab = "Axis", main = "Scree plot")+ ggtitle("Unweighted Unifrac axis contributions")+ylim(0,0.65)
This chunk took 0.01 minutes
### ADD COLUMNS TO METADATA
sample_data(meerkat_scaled)$MDS1<-data.frame(ord_wunifrac_mds$vectors)$Axis.1
sample_data(meerkat_scaled)$MDS2<-data.frame(ord_wunifrac_mds$vectors)$Axis.2
sample_data(meerkat_scaled)$MDS3<-data.frame(ord_wunifrac_mds$vectors)$Axis.3
sample_data(meerkat_scaled)$MDS4<-data.frame(ord_wunifrac_mds$vectors)$Axis.4
## unweighted unifrac
sample_data(meerkat_scaled)$MDS1_UU<-data.frame(ord_unifrac_mds$vectors)$Axis.1
sample_data(meerkat_scaled)$MDS2_UU<-data.frame(ord_unifrac_mds$vectors)$Axis.2
### convert to dataframe
metadata<-data.frame(sample_data(meerkat_scaled))
########
cent <- aggregate(cbind(MDS1, MDS2) ~ SampleTimeCat, data = metadata, FUN = mean)
beta1<-ggplot(metadata, aes(x = MDS1, y = MDS2)) +
theme_bw(base_size = 14)+
stat_ellipse(geom = "polygon",aes(col = SampleTimeCat), alpha = 0, size = 1)+
geom_point(aes(fill = HoursAfterSunrise), pch = 21, size = 2, alpha = 0.8, col = "white") + # spiders
#geom_point(aes(fill = HoursAfterSunrise), pch = 21, size = 1.5) +
scale_fill_viridis_c(option = "plasma", direction= -1)+
scale_color_manual(values = c("darkmagenta", "darkgoldenrod1"))+
geom_point(data = cent, size = 6, aes(col = SampleTimeCat), stroke = 1, alpha = 0.8) +
geom_point(data = cent, size = 6, col = "black", stroke = 1, pch = 21, alpha = 0.8) +
theme(legend.position = "none")+
theme(plot.margin = unit(c(0.2,0.3,0.3,0.4), "cm"))+
ylab("Axis 2 (17.1%)")+
xlab("Axis 1 (25.8%)")+
theme( panel.grid = element_blank())
#### season
cent <- aggregate(cbind(MDS1, MDS2) ~ Season, data = metadata, FUN = mean)
beta2<-ggplot(metadata, aes(x = MDS1, y = MDS2)) +
theme_bw(base_size = 14)+
stat_ellipse(geom = "polygon",aes(col = Season), alpha = 0, size = 1)+
geom_point(aes(fill = month), pch = 21, size = 2, alpha = 0.8, col = "white") +
scale_fill_gradientn(colours = c( "seagreen","seagreen","seagreen","seagreen","khaki","khaki","khaki","khaki","khaki","seagreen", "seagreen"))+
# geom_point(aes(fill = Season), pch = 21, size = 1.5) +
# scale_fill_manual(values = c( "seagreen","khaki" ))+
scale_color_manual(values = c( "seagreen","khaki"))+
geom_point(data = cent, size = 6, aes(col = Season), stroke = 1, alpha = 0.8) +
geom_point(data = cent, size = 6, col = "black", stroke = 1, pch = 21, alpha = 0.7) +
theme(axis.text.y = element_blank(), axis.title.y = element_blank())+
theme(legend.position = "none")+
xlab("Axis 1 (25.8%)")+
theme( panel.grid = element_blank())
#### age
cent <- aggregate(cbind(MDS1, MDS2) ~ AgeCat, data = metadata, FUN = mean)
beta3<-ggplot(metadata, aes(x = MDS1, y = MDS2)) +
theme_bw(base_size = 14)+
stat_ellipse(geom = "polygon",aes(col = AgeCat), alpha = 0, size = 1)+
geom_point(aes(fill = AgeAtSamplingYears), pch = 21, size = 2, alpha = 0.8, col = "white") + # spiders
#geom_point(aes(fill = AgeCat), pch = 21, size = 1.5) +
scale_fill_gradientn(colours = c( "blue","blue","darkgray","darkgray","darkgray","darkgray", "red", "red", "red", "red", "red", "red"))+
scale_color_manual(values = c( "red", "blue", "darkgrey"))+
geom_point(data = cent, size = 6, aes(col = AgeCat), stroke = 1, alpha = 0.8) +
geom_point(data = cent, size = 6, col = "black", stroke = 1, pch = 21, alpha = 0.7) +
theme(axis.text.y = element_blank(), axis.title.y = element_blank())+
theme(legend.position = "none")+
xlab("Axis 1 (25.8%)")+
theme( panel.grid = element_blank())
Fig2ghi<-ggarrange(beta1, beta2, beta3, ncol = 3, widths = c(1.2,1,1))
This chunk took 0.12 minutes
cent <- aggregate(cbind(MDS1_UU, MDS2_UU) ~ SampleTimeCat, data = metadata, FUN = mean)
beta1_UU<-ggplot(metadata, aes(x = MDS1_UU, y = MDS2_UU)) +
theme_bw(base_size = 12)+
stat_ellipse(geom = "polygon",aes(col = SampleTimeCat), alpha = 0, size = 1)+
geom_point(aes(fill = HoursAfterSunrise), pch = 21, size = 2, alpha = 0.8, col = "white") + # spiders
#geom_point(aes(fill = HoursAfterSunrise), pch = 21, size = 1.5) +
scale_fill_viridis_c(option = "plasma", direction= -1)+
scale_color_manual(values = c("darkmagenta", "darkgoldenrod1"))+
geom_point(data = cent, size = 6, aes(col = SampleTimeCat), stroke = 1, alpha = 0.8) +
geom_point(data = cent, size = 6, col = "black", stroke = 1, pch = 21, alpha = 0.8) +
#theme(legend.position = "none")+
theme(plot.margin = unit(c(0.2,0.3,0.3,0.4), "cm"))+
ylab("Axis 2 (9%)")+
xlab("Axis 1 (16%)")+
theme( panel.grid = element_blank())
#### season
cent <- aggregate(cbind(MDS1_UU, MDS2_UU) ~ Season, data = metadata, FUN = mean)
beta2_UU<-ggplot(metadata, aes(x = MDS1_UU, y = MDS2_UU)) +
theme_bw(base_size = 12)+
stat_ellipse(geom = "polygon",aes(col = Season), alpha = 0, size = 1)+
geom_point(aes(fill = month), pch = 21, size = 2, alpha = 0.8, col = "white") +
scale_fill_gradientn(colours = c( "seagreen","seagreen","seagreen","seagreen","khaki","khaki","khaki","khaki","khaki","seagreen", "seagreen"))+
scale_color_manual(values = c( "seagreen","khaki"))+
geom_point(data = cent, size = 6, aes(col = Season), stroke = 1, alpha = 0.8) +
geom_point(data = cent, size = 6, col = "black", stroke = 1, pch = 21, alpha = 0.7) +
theme(axis.text.y = element_blank(), axis.title.y = element_blank())+
# theme(legend.position = "none")+
xlab("Axis 1 (16%)")+
theme( panel.grid = element_blank())
#### age
cent <- aggregate(cbind(MDS1_UU, MDS2_UU) ~ AgeCat, data = metadata, FUN = mean)
beta3_UU<-ggplot(metadata, aes(x = MDS1_UU, y = MDS2_UU)) +
theme_bw(base_size = 12)+
stat_ellipse(geom = "polygon",aes(col = AgeCat), alpha = 0, size = 1)+
geom_point(aes(fill = AgeAtSamplingYears), pch = 21, size = 2, alpha = 0.8, col = "white") + # spiders
scale_fill_gradientn(colours = c( "blue","blue","darkgray","darkgray","darkgray","darkgray", "red", "red", "red", "red", "red", "red"))+
scale_color_manual(values = c( "red", "blue", "darkgrey"))+
geom_point(data = cent, size = 6, aes(col = AgeCat), stroke = 1, alpha = 0.8) +
geom_point(data = cent, size = 6, col = "black", stroke = 1, pch = 21, alpha = 0.7) +
theme(axis.text.y = element_blank(), axis.title.y = element_blank())+
#theme(legend.position = "none")+
xlab("Axis 1 (16%)")+
theme( panel.grid = element_blank())
ggarrange(beta1_UU, beta2_UU, beta3_UU, ncol = 3, widths = c(1.2,1,1))
This chunk took 0.03 minutes
metadata<-data.frame(sample_data(scaled_core))
dim(metadata)
## [1] 1109 29
perm_wunifrac_marginal<-adonis2(dist_wunifrac~
Storage+
Seq_run+
HoursTilFrozen+
Seq_depth_nospikein +
month+
AgeAtSamplingYears+
HoursAfterSunrise+
GroupAtSampling+
IndividID,
data = metadata,
by = "margin")
perm_unifrac_marginal<-adonis2(dist_unifrac~
Storage+
Seq_run+
HoursTilFrozen+
Seq_depth_nospikein +
month+
AgeAtSamplingYears+
HoursAfterSunrise+
GroupAtSampling+
IndividID,
data = metadata,
by = "margin")
This chunk took 233.6 minutes
# weighted unifrac
perm_wunifrac_marginal
# unweighted unifrac
perm_unifrac_marginal
This chunk took 0.01 minutes
ggarrange(Fig2abc,Fig2def,Fig2ghi ,ncol = 1)
#ggsave("FIGURES/figure2.pdf",width = 9, height = 9, units = c("in"))
This chunk took 0.18 minutes
# replace unknown genera with eg 'unknown Lachnospiraceae genus'
taxtable<-data.frame(scaled_core@tax_table@.Data)
taxtable$Family<-as.character(taxtable$Family)
taxtable$Genus<-as.character(taxtable$Genus)
taxtable$Genus<-ifelse(is.na(taxtable$Genus) , paste("Unknown", taxtable$Family, sep = " "), taxtable$Genus)
taxtable$Genus<-ifelse(taxtable$Genus== "uncultured" , paste("Unknown", taxtable$Family, sep = " "), taxtable$Genus)
taxtable$Genus<-ifelse(taxtable$Genus== "uncultured bacterium" , paste("Unknown", taxtable$Family, sep = " "), taxtable$Genus)
row.names(taxtable)<-taxa_names(scaled_core)
taxtable<-as.matrix(taxtable)
tax_table(scaled_core)<-taxtable
#scaled_core<-tip_glom(scaled_core, h = 0.2)
scaled_core_genus<-speedyseq::tax_glom(scaled_core, taxrank = "Genus")
#### remove some unknown or strange genera
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "Unknown uncultured bacterium")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "uncultured")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "uncultured organism")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "uncultured bacterium")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "Unknown uncultured")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "gut metagenome")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "uncultured Chloroflexi bacterium")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "metagenome")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "uncultured Firmicutes bacterium")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "uncultured rumen bacterium")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "uncultured soil bacterium")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "Unknown Unknown Family")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "BCf9-17 termite group")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "Blattella germanica (German cockroach)")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "CPla-4 termite group" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "DSSF69" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "E1B-B3-114" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "Family XIII AD3011 group" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "FFCH7168" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "GCA-900066225" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "hydrothermal vent metagenome" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "Incertae Sedis" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "M2PB4-61 termite group" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "OLB13" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "OM27 clade" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "Allorhizobium-Neorhizobium-Pararhizobium-Rhizobium" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "OM27 clade" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "CHKCI001" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "CHKCI001" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "Termite planctomycete cluster" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "uncultured Lachnospiraceae bacterium")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "Unknown gir-aah93h0" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "Unknown JG30-KF-CM45" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "Unknown Rs-E47 termite group" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "UBA1819" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "ZOR0006" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "GCA-900066575" )
taxtable<-data.frame(scaled_core_genus@tax_table@.Data)$Genus
taxa_names(scaled_core_genus)<-taxtable
#taxa_names(scaled_core_genus)=="Unknown Bacillaceae"
This chunk took 0.11 minutes
dominant_phylo<-scaled_core_genus
sample_data(dominant_phylo)<-sample_data(dominant_phylo)[,"feature.id"]
# melt data
psmelt_df<-speedyseq::psmelt(dominant_phylo)
head(psmelt_df)
dominant_taxa_df<-psmelt_df %>%
group_by(feature.id) %>%
arrange(feature.id, -Abundance) %>%
dplyr::mutate(rank=rank(-Abundance))
dominant_taxa_df<-subset(dominant_taxa_df, rank ==1)
dominant_taxa_df$Genus<-factor(dominant_taxa_df$Genus)
# add to metadata
sample_data(scaled_core_genus)$DomGenus <-vlookup(sample_data(scaled_core_genus)$feature.id, dominant_taxa_df, lookup_column = "Sample", result_column = "Genus")
## colour by focus genera (we justify this list later)
# but for now want the colours to match later figures of the focus genera
top_genera<-c( "Clostridium sensu stricto 1" ,
"Blautia" ,
"Unknown Lachnospiraceae",
"Enterococcus" ,
"[Ruminococcus] torques group",
"Bacteroides",
"Paeniclostridium",
"Romboutsia" ,
"Lachnoclostridium",
"Lactococcus" ,
"Peptococcus",
"Helicobacter" ,
"Unknown Bacillaceae",
"Raoultibacter" ,
"Cellulomonas",
"Eubacterium" ,
"Other" )
sample_data(scaled_core_genus)$DomGenus_plot<-ifelse(sample_data(scaled_core_genus)$DomGenus %in% top_genera, as.character(sample_data(scaled_core_genus)$DomGenus), "Other")
sample_data(scaled_core_genus)$DomGenus_plot<-factor(sample_data(scaled_core_genus)$DomGenus_plot, levels =top_genera)
This chunk took 0.02 minutes
## colours scale
mypaly<-brewer.pal(12,"Paired")
mypaly[16]<-"darkgreen"
mypaly[5]<-"red"
mypaly[8]<-"darkorange"
mypaly[7]<-"darkblue"
mypaly[9]<-"magenta"
mypaly[10]<-"cyan"
mypaly[11]<-"#ffe20b"
mypaly[6]<-"#ae0001"
mypaly[12]<-"#795548"
mypaly[13]<-"peachpuff"
mypaly[14]<-"blue"
mypaly[15]<-"mediumorchid"
mypaly[15]<-"#8B008B"
mypaly[4]<-"forestgreen"
mypaly[17]<-"grey"
mypaly<-mypaly[c(1,4,3,2,5:8, 10,9,11:17)]
This chunk took 0 minutes
#Merge by genus
meerkat_genus <- scaled_core_genus %>%
speedyseq::tax_glom(taxrank = "Genus") %>% # agglomerate at phylum level
transform_sample_counts(function(x) {x/sum(x)} ) %>% # Transform to rel. abundance
psmelt() %>% # Melt to long format
arrange(Genus) # Sort data frame alphabetically by family
head(meerkat_genus)
dim(meerkat_genus)
## [1] 648765 40
# generate column for time interval
meerkat_genus$TimeInterval<-cut(meerkat_genus$HoursAfterSunrise, breaks = c(0,0.5,1,1.5,2,2.5,3,3.5,4,4.5,5,5.5,6,6.5,7,7.5,8,8.5,9,9.5,10,10.5,11,11.5,12,12.5,13,13.5,14,14.5))
#head(meerkat_genus)
meerkat_genus$Genus_plot<-ifelse(meerkat_genus$Genus %in% top_genera == TRUE, as.character(meerkat_genus$Genus), "Other")
meerkat_genus$Genus_plot<-factor(meerkat_genus$Genus_plot, levels = top_genera)
# merge
merged_samples<-meerkat_genus %>% group_by(TimeInterval, Genus_plot) %>% dplyr::summarize(mean_rel_abundance = sum(Abundance))
# barplot by half an hour intervals
barplot_by_time<- ggplot( merged_samples, aes(x = TimeInterval, y = mean_rel_abundance, fill = Genus_plot)) +
geom_col( width = 1, position = "fill", col = "black") +
scale_fill_manual(values = mypaly)+
ylab("Relative abundance")+
xlab("30 minute time intervals after sunrise")+
theme_light(base_size = 14)+
theme(axis.text.x = element_text(angle = 45, hjust = 1))+
theme(legend.position = "none")+
theme(plot.margin=unit(c(0,0.2,0.2,0.2),"cm"))
sample_data(meerkat_scaled)$TimeInterval<- cut(sample_data(meerkat_scaled)$HoursAfterSunrise, breaks = c(0,0.5,1,1.5,2,2.5,3,3.5,4,4.5,5,5.5,6,6.5,7,7.5,8,8.5,9,9.5,10,10.5,11,11.5,12,12.5,13,13.5,14,14.5))
hist_plot<-data.frame(table(sample_data(meerkat_scaled)$TimeInterval))
histogram<-ggplot(hist_plot, aes(x = Var1, y = Freq))+
geom_bar(stat = "identity")+
geom_text(aes(label=Freq), position=position_dodge(width=0.9), vjust=-0.25)+
theme_bw(base_size = 14)+
theme(axis.text.x = element_blank())+
theme(axis.title.x = element_blank())+
ylab("No. samples")+
ylim(0,150)+
theme(plot.margin=unit(c(0.5,0.2,0,0.2),"cm"))
Fig3_2<-ggarrange(histogram, barplot_by_time, ncol = 1, heights = c(0.3,1),
legend = FALSE, labels = c("a)"), font.label = list(size = 20), hjust = 1)
This chunk took 2.1 minutes
metadata<-data.frame(sample_data(scaled_core_genus))
vectors_df<-ord_wunifrac_mds$vectors
## model using continuous variables for time of day and age (to get partial R2 for each)
# axes 1 and 2 using continuous variables
(fit_continuous <- envfit(vectors_df, metadata[,c("HoursAfterSunrise", "AgeAtSamplingYears", "Season", "Storage", "Seq_run", "Seq_depth_nospikein")], perm = 999, na.rm = TRUE, choices=c(1,2)))
##
## ***VECTORS
##
## Axis.1 Axis.2 r2 Pr(>r)
## HoursAfterSunrise 0.83127 0.55586 0.2695 0.001 ***
## AgeAtSamplingYears -0.95336 0.30184 0.0012 0.482
## Seq_depth_nospikein -0.81101 -0.58503 0.0495 0.001 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
##
## ***FACTORS:
##
## Centroids:
## Axis.1 Axis.2
## SeasonDRY WINTER -0.0031 -0.0139
## SeasonWET SUMMER 0.0019 0.0084
## StorageFREEZEDRIED -0.0059 -0.0233
## StorageFROZEN 0.0083 0.0328
## Seq_runRUN1 -0.0258 0.0199
## Seq_runRUN2 -0.0014 -0.0123
## Seq_runRUN3 0.0247 -0.0067
## Seq_runRUN4 0.0135 -0.0019
##
## Goodness of fit:
## r2 Pr(>r)
## Season 0.0040 0.015 *
## Storage 0.0261 0.001 ***
## Seq_run 0.0188 0.001 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
This chunk took 0.11 minutes
# axes 3 and 4 nusing continuous variables
(fit_continuous <- envfit(vectors_df, metadata[,c("HoursAfterSunrise", "AgeAtSamplingYears", "Season", "Storage", "Seq_run", "Seq_depth_nospikein")], perm = 999, na.rm = TRUE, choices=c(3,4)))
##
## ***VECTORS
##
## Axis.3 Axis.4 r2 Pr(>r)
## HoursAfterSunrise -0.99910 0.04235 0.0029 0.222
## AgeAtSamplingYears -0.91956 -0.39295 0.0428 0.001 ***
## Seq_depth_nospikein 0.95102 0.30914 0.0061 0.022 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
##
## ***FACTORS:
##
## Centroids:
## Axis.3 Axis.4
## SeasonDRY WINTER 0.0118 0.0051
## SeasonWET SUMMER -0.0071 -0.0031
## StorageFREEZEDRIED 0.0112 -0.0076
## StorageFROZEN -0.0157 0.0107
## Seq_runRUN1 -0.0280 -0.0061
## Seq_runRUN2 0.0236 -0.0085
## Seq_runRUN3 -0.0039 0.0108
## Seq_runRUN4 0.0299 0.0193
##
## Goodness of fit:
## r2 Pr(>r)
## Season 0.0074 0.001 ***
## Storage 0.0190 0.001 ***
## Seq_run 0.0431 0.001 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
This chunk took 0.06 minutes
metadata<-metadata[,c("SampleTimeCat", "AgeCat", "Season", "Storage", "Seq_run", "Seq_depth_nospikein")]
This chunk took 0 minutes
fit <- envfit(vectors_df, metadata, perm = 999, na.rm = TRUE, choices=c(1,2))
df<-data.frame(vectors_df)
df$Dom_taxa<-sample_data(scaled_core_genus)$DomGenus_plot
df$TotalAbundance<-sample_data(scaled_core_genus)$TotalAbundance
#Get the vectors for bioenv.fit
df_biofit<-vegan::scores(fit,display=c("factors"), choices=1:2)
## scaling factor for arrows to fill 30% of plot
mul <- ordiArrowMul(df_biofit, fill = 0.3)
df_biofit <- df_biofit * mul # Scale the biplot scores
df_biofit<-as.data.frame(df_biofit)
# remove methodological variables from biofit dataframe,
#as I am just interested in biological variables after controling for methods
#head(df_biofit)
df_biofit<- df_biofit[c("SampleTimeCatAFTERNOON","SampleTimeCatMORNING"),]
row.names(df_biofit)<-c("Afternoon", "Morning")
### spider plot
cent <- aggregate(cbind(Axis.1, Axis.2) ~ Dom_taxa, data = df, FUN = mean)
segs <- merge(df, setNames(cent, c('Dom_taxa','oNMDS1','oNMDS2')),
by = 'Dom_taxa', sort = FALSE)
df1<-df
df_biofit1<-df_biofit
cent1<-cent
segs1<-segs
# order taxa so that they at least slightly match later figures!
df1$Dom_taxa<-factor(df1$Dom_taxa,levels=top_genera)
segs1$Dom_taxa<-factor(segs1$Dom_taxa,levels=top_genera)
P1<-ggplot(df1, aes(x = Axis.1, y = Axis.2)) +
theme_light(base_size = 14)+
geom_point(aes(fill = Dom_taxa), size = 3, pch = 21, alpha = 0.8, col = "white") + # spiders
scale_fill_manual(values = mypaly[c(-16)])+
scale_color_manual(values = mypaly[c(-16)])+
geom_jitter(data = cent1, size = 7, aes(fill = Dom_taxa), col = "black", pch = 21, stroke = 1.5, width = 0.02, height = 0.02) +
geom_segment(data=df_biofit1, aes(x = 0, y = 0, xend = Axis.1*0.7, yend = Axis.2*0.7),
arrow = arrow(length = unit(0.5, "cm"), type = "closed"), lwd = 1, col = "black")+
geom_label(data=df_biofit1,
alpha = 0.9, col = "black", size = 4, fill = "yellow",
hjust = c(0,1),
vjust = c(1,1),
aes(Axis.1*0.7,Axis.2*0.7, label = rownames(df_biofit1)))+
xlab("Axis 1 (25.8%)")+
ylab("Axis 2 (17.1%)")+
guides(fill = guide_legend(override.aes = list(size = 6), ncol = 5))+
theme(legend.title = element_blank())+
annotate("text", x = 0, y = 0, label = "X", size = 7)+
theme(plot.margin = margin(0.2,0.2,1,0.2, "cm"))+
annotate(geom="text", x=-0.3, y=-0.27, label="Hours after sunrise: R^2 = 0.27, p < 0.001",
color="black", size = 5, hjust = 0)+
xlim(-0.3,0.3)
This chunk took 0.04 minutes
metadata<-data.frame(sample_data(scaled_core_genus))
metadata<-metadata[,c("SampleTimeCat", "AgeCat", "Season", "Storage", "Seq_run", "Seq_depth_nospikein")]
vectors_df<-ord_wunifrac_mds$vectors
# Model variables contributing to axis 3 and 4
fit <- envfit(vectors_df, metadata, perm = 999, na.rm = TRUE, choices=c(3,4))# axes 3 and 4
df<-data.frame(vectors_df)
df$Dom_taxa<-sample_data(scaled_core_genus)$DomGenus_plot
df$TotalAbundance<-sample_data(scaled_core_genus)$TotalAbundance
#Get the vectors for bioenv.fit
df_biofit<-vegan::scores(fit,display=c("factors"))
## scaling factor for arrows to fill 80% of plot
mul <- ordiArrowMul(df_biofit, fill = 0.3)
df_biofit <- df_biofit * mul # Scale the biplot scores
df_biofit<-as.data.frame(df_biofit)
# remove methodological variables from biofit dataframe,
#as I am just interested in biological variables after controling for methods
df_biofit<- df_biofit[c("AgeCatMIDDLEAGED", "AgeCatOLD", "AgeCatYOUNG", "SeasonDRY", "SeasonWET"),]
row.names(df_biofit)<-c("Adult", "Old", "Young", "Dry season", "Wet season" )
### spider plot
cent <- aggregate(cbind(Axis.3, Axis.4) ~ Dom_taxa, data = df, FUN = mean)
segs <- merge(df, setNames(cent, c('Dom_taxa','oNMDS1','oNMDS2')),
by = 'Dom_taxa', sort = FALSE)
df2<-df
df_biofit2<-df_biofit
cent2<-cent
segs2<-segs
# order taxa so that they at least slightly match later figures!
df2$Dom_taxa<-factor(df2$Dom_taxa,levels=top_genera)
segs2$Dom_taxa<-factor(segs2$Dom_taxa,levels=top_genera)
P2<-ggplot(df2, aes(x = Axis.3, y = Axis.4)) +
theme_light(base_size = 14)+
geom_point(aes(fill = Dom_taxa), size = 3, pch = 21, alpha = 0.8, col = "white") + # spiders
scale_fill_manual(values = mypaly[c(-16)])+
scale_color_manual(values = mypaly[c(-16)])+
annotate("text", x = 0, y = 0, label = "X", size = 6)+
geom_point(data = cent2, size = 7, aes(fill = Dom_taxa), col = "black", pch = 21, stroke = 1.5) +
geom_segment(data=df_biofit2, aes(x = 0, y = 0, xend = Axis.3*0.5, yend = Axis.4*0.5),
arrow = arrow(length = unit(0.5, "cm"), type = "closed"), lwd = 0.5, col = "black")+
geom_label(data=as.data.frame(df_biofit2), #position=position_jitter(width=0.01, height = 0.01),#labels
alpha = 0.9, col = "black", size = 4, fill = "yellow",
vjust = c(-0.2, 1, 1, -0.5, 1.5),
hjust = c(1,0,0,0.5,0.7),
aes(Axis.3*0.5,Axis.4*0.5, label = rownames(df_biofit2)))+
xlab("Axis 3 (11.7%)")+
ylab("Axis 4 (7.1%)")+
guides(fill = guide_legend(override.aes = list(size = 7), ncol = 5))+
theme(legend.title = element_blank())+
theme(plot.margin = margin(0.2,0.2,1,0.2, "cm"))+
annotate(geom="text", x=-0.3, y=-0.19, label="Age: R^2 = 0.043, p < 0.001 \nSeason: R^2 = 0.007, p = 0.003",
color="black", size = 5, hjust = 0)
Fig3_1<-ggarrange(P1,P2, ncol = 2, nrow = 1, common.legend = T, legend = "bottom", labels = c("b)","c)"),
font.label = list(size = 20))
ggarrange(Fig3_2,Fig3_1, ncol = 1, common.legend = T, heights = c(1,1.3))+
theme(plot.margin = margin(0.1,0.1,0.1,1, "cm"))
#ggsave("FIGURES/Figure3.pdf", width = 11, height = 12.5, units = "in")
This chunk took 0.06 minutes
scaled_core_genus_0.2_original<- microbiome::core(scaled_core_genus, detection = 0, prevalence =0.16)
scaled_core_genus_0.2<-scaled_core_genus_0.2_original
sample_data(scaled_core_genus_0.2)<-sample_data(scaled_core_genus_0.2)[,c("feature.id", "SampleTimeCat")]
table(sample_data(scaled_core_genus)$SampleTimeCat)
##
## AFTERNOON MORNING
## 366 743
genus0.2_df<-speedyseq::psmelt(scaled_core_genus_0.2)
head(genus0.2_df)
## non-parametric
models_by_genus <- genus0.2_df %>%
group_by(OTU) %>%
do(model1 = tidy(wilcox.test(log10(Abundance+1)~ SampleTimeCat, data = ., exact=FALSE, conf.int=TRUE, conf.level=0.95), conf.int=TRUE)) %>%
gather(model_name, model, -OTU) %>%
unnest()
models_by_genus$p.adj<- p.adjust(models_by_genus$p.value, method = p.adjust.methods, n = length(models_by_genus$p.value))
models_by_genus$Sig<-models_by_genus$p.adj<0.05
models_by_genus_time<-models_by_genus
P_time<-ggplot(models_by_genus_time, aes(y = reorder(OTU, -estimate), x = estimate))+
geom_vline(xintercept = 0, col = "black")+
geom_point(aes(col = Sig), size = 3)+
geom_errorbarh(aes(xmin=conf.low, xmax=conf.high, height = 0, col = Sig), size = 1.5)+
theme_bw(base_size = 16)+
labs(color = "Significant")+
theme(axis.title = element_blank())+
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))+
theme(plot.margin=unit(c(2,0.2,0.2,0.2),"cm"))+
xlim(c(min(models_by_genus_time$conf.low),1.3))+
ggtitle("a) Morning <--> Afternoon")+
theme(plot.title = element_text(hjust = 1, size = 14, face = "bold"))
This chunk took 0.21 minutes
scaled_core_genus_0.2<-scaled_core_genus_0.2_original
sample_data(scaled_core_genus_0.2)<-sample_data(scaled_core_genus_0.2)[,c("feature.id", "Season")]
table(sample_data(scaled_core_genus)$Season)
##
## DRY WINTER WET SUMMER
## 418 691
genus0.2_df<-speedyseq::psmelt(scaled_core_genus_0.2)
head(genus0.2_df)
genus0.2_df$Season<-factor(genus0.2_df$Season, levels = c("WET SUMMER", "DRY WINTER"))
## non-parametric
models_by_genus <- genus0.2_df %>%
group_by(OTU) %>%
do(model1 = tidy(wilcox.test(log10(Abundance+1)~ Season, data = . , exact=TRUE, conf.int=TRUE, conf.level=0.95), conf.int=TRUE)) %>%
gather(model_name, model, -OTU) %>%
unnest()
models_by_genus$p.adj<- p.adjust(models_by_genus$p.value, method = p.adjust.methods, n = length(models_by_genus$p.value))
models_by_genus$Sig<-models_by_genus$p.adj<0.05
models_by_genus_season<-models_by_genus
# positive values higher in wet season
P_season<-ggplot(models_by_genus_season, aes(y = reorder(OTU, -estimate), x = estimate))+
geom_vline(xintercept = 0, col = "black")+
geom_point(aes(col = Sig), size = 3)+
geom_errorbarh(aes(xmin=conf.low, xmax=conf.high, height = 0, col = Sig), size = 1.5)+
theme_bw(base_size = 16)+
theme(axis.title = element_blank())+
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))+
theme(plot.margin=unit(c(2,0.2,0.2,0.2),"cm"))+
xlim(c(min(models_by_genus_time$conf.low), max(models_by_genus_time$conf.high)))+
ggtitle("b) Dry <--> Wet")+
theme(plot.title = element_text(hjust = 1.2, size = 14, face = "bold"))
This chunk took 0.19 minutes
scaled_core_genus_0.2<-scaled_core_genus_0.2_original
## make a comparison group 2-4 years old (to reduce unequal sample sizes)
sample_data(scaled_core_genus_0.2)$AgeCat <- as.factor(ifelse(
sample_data(scaled_core_genus_0.2)$AgeAtSamplingYears >2 & sample_data(scaled_core_genus_0.2)$AgeAtSamplingYears <4,
"ADULT", as.character(sample_data(scaled_core_genus_0.2)$AgeCat)))
table(sample_data(scaled_core_genus_0.2)$AgeCat)
##
## ADULT MIDDLEAGED OLD YOUNG
## 236 391 97 385
###
sample_data(scaled_core_genus_0.2)<-sample_data(scaled_core_genus_0.2)[,c("feature.id", "AgeCat")]
genus0.2_df<-speedyseq::psmelt(scaled_core_genus_0.2)
head(genus0.2_df)
genus0.2_df<-subset(genus0.2_df, AgeCat != "MIDDLEAGED" & AgeCat != "OLD" )
genus0.2_df$AgeCat<-factor(genus0.2_df$AgeCat)
## non-parametric
models_by_genus <- genus0.2_df %>%
group_by(OTU) %>%
do(model1 = tidy(wilcox.test(log10(Abundance+1)~ AgeCat, data = ., exact=FALSE, conf.int=TRUE, conf.level=0.95 ), conf.int=TRUE)) %>%
gather(model_name, model, -OTU) %>%
unnest()
models_by_genus$p.adj<- p.adjust(models_by_genus$p.value, method = p.adjust.methods, n = length(models_by_genus$p.value))
models_by_genus$Sig<-models_by_genus$p.adj<0.05
models_by_genus_young<-models_by_genus
## if positive, more prevalent in adults
P_young<-ggplot(models_by_genus_young, aes(y = reorder(OTU, -estimate), x = estimate))+
geom_vline(xintercept = 0, col = "black")+
geom_point(aes(col = Sig), size = 3)+
geom_errorbarh(aes(xmin=conf.low, xmax=conf.high, height = 0, col = Sig), size = 1.5)+
theme_bw(base_size = 16)+
theme(axis.title = element_blank())+
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))+
theme(plot.margin=unit(c(2,0.2,0.2,0.2),"cm"))+
xlim(c(min(models_by_genus_time$conf.low), 1.3))+
ggtitle("c) Young <--> Adult")+
theme(plot.title = element_text(hjust = 0.9, size = 14, face = "bold"))
This chunk took 0.11 minutes
scaled_core_genus_0.2<-scaled_core_genus_0.2_original
## make a comparison group 2-4 years old (to reduce unequal sample sizes
sample_data(scaled_core_genus_0.2)$AgeCat <- as.factor(ifelse(
sample_data(scaled_core_genus_0.2)$AgeAtSamplingYears >2 & sample_data(scaled_core_genus_0.2)$AgeAtSamplingYears <4,
"ADULT", as.character(sample_data(scaled_core_genus_0.2)$AgeCat)))
table(sample_data(scaled_core_genus_0.2)$AgeCat)
##
## ADULT MIDDLEAGED OLD YOUNG
## 236 391 97 385
###
sample_data(scaled_core_genus_0.2)<-sample_data(scaled_core_genus_0.2)[,c("feature.id", "AgeCat")]
genus0.2_df<-speedyseq::psmelt(scaled_core_genus_0.2)
genus0.2_df<-subset(genus0.2_df, AgeCat != "MIDDLEAGED" & AgeCat != "YOUNG" )
genus0.2_df$AgeCat<-factor(genus0.2_df$AgeCat, levels = c("OLD","ADULT"))
unique(genus0.2_df$AgeCat)
## [1] OLD ADULT
## Levels: OLD ADULT
## non-parametric
models_by_genus <- genus0.2_df %>%
group_by(OTU) %>%
do(model1 = tidy(wilcox.test(log10(Abundance+1)~ AgeCat, data = ., exact=TRUE, conf.int=TRUE, conf.level=0.95 ), conf.int=TRUE)) %>%
gather(model_name, model, -OTU) %>%
unnest()
models_by_genus$p.adj<- p.adjust(models_by_genus$p.value, method = p.adjust.methods, n = length(models_by_genus$p.value))
models_by_genus$Sig<-models_by_genus$p.adj<0.05
models_by_genus_old<-models_by_genus
P_old<-ggplot(models_by_genus_old, aes(y = reorder(OTU, -estimate), x = estimate))+
geom_vline(xintercept = 0, col = "black")+
geom_point(aes(col = Sig), size = 3)+
geom_errorbarh(aes(xmin=conf.low, xmax=conf.high, height = 0, col = Sig), size = 1.5)+
theme_bw(base_size = 16)+
theme(axis.title = element_blank())+
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))+
theme(plot.margin=unit(c(2,0.2,0.2,0.2),"cm"))+
xlim(c(min(models_by_genus_time$conf.low), max(models_by_genus_old$conf.high)))+
ggtitle("d) Adult <--> Old")+
theme(plot.title = element_text(hjust = 0.7, size = 14, face = "bold"))
ggarrange(P_time, P_season, P_young, P_old, ncol = 4, common.legend = T, legend = "right")
#ggsave("FIGURES/wilcox_test_estimates.pdf", width = 20, height = 22, units = "in")
This chunk took 0.11 minutes
This chunk took 0 minutes
# calculate prevalence per genus
genus_info_df<-data.frame(microbiome::prevalence(scaled_core_genus))
names(genus_info_df)<-"Prevalence"
genus_info_df$Genus<-row.names(genus_info_df)
genus_info_df<-genus_info_df[,c(2,1)]
### add mean abundance as column
genus_info_df$Abundance<-taxa_sums(scaled_core_genus)
genus_info_df$RelAbundance<-taxa_sums(scaled_core_genus)/sum(taxa_sums(scaled_core_genus))
####### add dominant taxa as column
dominant_phylo<-scaled_core_genus
sample_data(dominant_phylo)<-sample_data(dominant_phylo)[,"feature.id"]
# melt data
psmelt_df<-speedyseq::psmelt(dominant_phylo)
#head(psmelt_df)
dominant_taxa_df<-psmelt_df %>%
group_by(feature.id) %>%
arrange(feature.id, -Abundance) %>%
dplyr::mutate(rank=rank(-Abundance))
dominant_taxa_df<-subset(dominant_taxa_df, rank ==1)
dominant_taxa_df$Genus<-factor(dominant_taxa_df$Genus)
# add to metadata
sample_data(scaled_core)$DomGenus <-vlookup(sample_data(scaled_core)$feature.id, dominant_taxa_df, lookup_column = "Sample", result_column = "Genus")
dominant_genus_freq<-data.frame(table(dominant_taxa_df$OTU))
names(dominant_genus_freq)<-c("Genus", "Dominant")
genus_info_df$Dominant<-vlookup(genus_info_df$Genus, dominant_genus_freq, lookup_column = "Genus", result_column = "Dominant")
genus_info_df<-genus_info_df[order(-genus_info_df$RelAbundance),]
#head(genus_info_df, 10) # ordered by relative abundance
This chunk took 0.01 minutes
top_genera_candidates<-subset(genus_info_df, Prevalence > 0.6)
top_genera_candidates<-top_genera_candidates[order(-top_genera_candidates$Abundance),]
DT::datatable(top_genera_candidates)
### based on differential abundance analysis, make a list of genera to exclude
top_genera<-c( "Clostridium sensu stricto 1" ,
"Blautia" ,
"Unknown Lachnospiraceae",
"Enterococcus" ,
"[Ruminococcus] torques group",
"Bacteroides",
"Paeniclostridium",
"Romboutsia" ,
"Lachnoclostridium",
"Lactococcus" ,
"Peptococcus",
"Helicobacter" ,
"Unknown Bacillaceae",
"Raoultibacter" ,
"Cellulomonas",
"Eubacterium" ,
"Other" )
top_genera
## [1] "Clostridium sensu stricto 1" "Blautia"
## [3] "Unknown Lachnospiraceae" "Enterococcus"
## [5] "[Ruminococcus] torques group" "Bacteroides"
## [7] "Paeniclostridium" "Romboutsia"
## [9] "Lachnoclostridium" "Lactococcus"
## [11] "Peptococcus" "Helicobacter"
## [13] "Unknown Bacillaceae" "Raoultibacter"
## [15] "Cellulomonas" "Eubacterium"
## [17] "Other"
focus_taxa_df<-genus_info_df[top_genera[-17],]
focus_taxa_df$Prevalence<-round(focus_taxa_df$Prevalence, 3)
focus_taxa_df$RelAbundance<-round(focus_taxa_df$RelAbundance, 3)
row.names(focus_taxa_df)<-1:16
This chunk took 0 minutes
# info on 16 focus genera
DT::datatable(focus_taxa_df)
This chunk took 0 minutes
core_genus <- prune_taxa(top_genera, scaled_core_genus)
paste("Core taxa make up",round(sum(taxa_sums(core_genus))/sum(taxa_sums(meerkat_scaled)),2), "% of reads")
## [1] "Core taxa make up 0.75 % of reads"
This chunk took 0.03 minutes
## make vector of genus names
taxonomy<-data.frame(core_genus@tax_table@.Data)
taxa_names(core_genus)<-taxonomy$Genus
## make vector of variables to include in model
variables_all<-c("HoursAfterSunrise", "month", "AgeAtSamplingYears",
"IndividID", "GroupAtSampling",
"Storage", "Seq_depth_nospikein", "Seq_run", "HoursTilFrozen")
### loop to model each genus seperatly and extract estimates
taxanames<-as.character(taxa_names(core_genus))
This chunk took 0 minutes
base_genus_confint<-list()
for (i in 1:length(taxanames)){
tryCatch({ #catch errors
#print(i)
taxa1<-prune_taxa(taxanames[i], core_genus)
# taxa1<-prune_taxa("Clostridium sensu stricto 1" , core_genus)
sample_data(taxa1)$taxa_i_abundance<-sample_sums(taxa1)
metadata<-data.frame(sample_data(taxa1))
metadata<-metadata[,c(variables_all, "taxa_i_abundance", "Year")]
#scale variables
metadata$Seq_depth_nospikein<-as.numeric(scale(metadata$Seq_depth_nospikein))
# fit gamm
m_taxa <- mgcv::gamm(log10(taxa_i_abundance+1)~
s(HoursAfterSunrise, bs = "cr")+
s(month, bs = "cc")+
s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re"),
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata,
family = gaussian)
#draw(m_taxa$gam, select = c(1:3), ncol = 3, scales = "fixed")+geom_hline(yintercept = 0)
confint_df_time<-confint(m_taxa$gam, parm = "s(HoursAfterSunrise)", type = "confidence")
confint_df_season<-confint(m_taxa$gam, parm = "s(month)", type = "confidence")
confint_df_age<-confint(m_taxa$gam, parm = "s(AgeAtSamplingYears)", type = "confidence")
confint_df_time$Taxa<-taxanames[i]
confint_df_time$Variable<-"HoursAfterSunrise"
names(confint_df_time)[3]<-"Time"
confint_df_time$p.val<-data.frame(summary(m_taxa$gam)$s.table)$p.value[1]
confint_df_season$Taxa<-taxanames[i]
confint_df_season$Variable<-"month"
names(confint_df_season)[3]<-"Time"
confint_df_season$p.val<-data.frame(summary(m_taxa$gam)$s.table)$p.value[2]
confint_df_age$Taxa<-taxanames[i]
confint_df_age$Variable<-"AgeAtSamplingYears"
names(confint_df_age)[3]<-"Time"
confint_df_age$p.val<-data.frame(summary(m_taxa$gam)$s.table)$p.value[3]
confint_df<-rbind(confint_df_time, confint_df_season, confint_df_age)
confint_df$R2<-summary(m_taxa$gam)$r.sq
# convert into real estimate by adding the intercapt (= mean abundance)
confint_df$est_real<-confint_df$est+summary(m_taxa$gam)$p.coeff[1]
confint_df$lower_real<- confint_df$lower+summary(m_taxa$gam)$p.coeff[1]
confint_df$upper_real<- confint_df$upper+summary(m_taxa$gam)$p.coeff[1]
confint_df
base_genus_confint[[i]]<-confint_df
}, error=function(e){})
}
names(base_genus_confint)<-taxanames
base_genus_confint[1]
## $Helicobacter
## # A tibble: 600 x 15
## smooth by_variable Time est se crit lower upper Taxa Variable
## <chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 s(Hou~ <NA> 0.225 -0.353 0.267 1.96 -0.876 0.169 Heli~ HoursAf~
## 2 s(Hou~ <NA> 0.296 -0.332 0.256 1.96 -0.834 0.170 Heli~ HoursAf~
## 3 s(Hou~ <NA> 0.366 -0.310 0.246 1.96 -0.792 0.172 Heli~ HoursAf~
## 4 s(Hou~ <NA> 0.437 -0.288 0.236 1.96 -0.750 0.173 Heli~ HoursAf~
## 5 s(Hou~ <NA> 0.507 -0.267 0.226 1.96 -0.709 0.176 Heli~ HoursAf~
## 6 s(Hou~ <NA> 0.578 -0.245 0.216 1.96 -0.669 0.179 Heli~ HoursAf~
## 7 s(Hou~ <NA> 0.649 -0.224 0.207 1.96 -0.629 0.182 Heli~ HoursAf~
## 8 s(Hou~ <NA> 0.719 -0.202 0.198 1.96 -0.590 0.186 Heli~ HoursAf~
## 9 s(Hou~ <NA> 0.790 -0.180 0.190 1.96 -0.552 0.191 Heli~ HoursAf~
## 10 s(Hou~ <NA> 0.860 -0.159 0.181 1.96 -0.515 0.197 Heli~ HoursAf~
## # ... with 590 more rows, and 5 more variables: p.val <dbl>, R2 <dbl>,
## # est_real <dbl>, lower_real <dbl>, upper_real <dbl>
This chunk took 29.15 minutes
confint_summary<-do.call(rbind, base_genus_confint)
confint_summary$Variable<-factor(confint_summary$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))
names(confint_summary)[which(colnames(confint_summary)=="p.val")]<-"p.val.all"
confint_summary$Significant<-confint_summary$p.val.all <0.05
## add abundance for plot
abundance_df<-data.frame(taxa_sums(core_genus))
names(abundance_df)<-"Abundance"
abundance_df$Genus<-row.names(abundance_df)
confint_summary$Abundance<-as.numeric(vlookup(confint_summary$Taxa, abundance_df, lookup_column = "Genus", result_column = "Abundance"))
confint_summary<-data.frame(confint_summary)
confint_summary<-dplyr::arrange(confint_summary, desc(Abundance))
confint_summary$Taxa<-factor(confint_summary$Taxa, levels = rev(top_genera[-17]))
This chunk took 0 minutes
taxanames<-as.character(taxa_names(core_genus))
base_genus_confint_storage<-list()
for (i in 1:length(taxanames)){
tryCatch({ #catch errors
#print(i)
taxa1<-prune_taxa(taxanames[i], core_genus)
# taxa1<-prune_taxa("Clostridium sensu stricto 1" , core_genus)
sample_data(taxa1)$taxa_i_abundance<-sample_sums(taxa1)
metadata<-data.frame(sample_data(taxa1))
metadata<-metadata[,c(variables_all, "taxa_i_abundance", "Year")]
#scale variables
metadata$Seq_depth_nospikein<-as.numeric(scale(metadata$Seq_depth_nospikein))
# fit gamm
m_taxa <- mgcv::gamm(log10(taxa_i_abundance+1)~
s(HoursAfterSunrise, bs = "cr", by = Storage)+
s(month, bs = "cc", by = Storage)+
s(AgeAtSamplingYears, bs = "cr", by = Storage)+
s(Seq_depth_nospikein, bs = "cr", by = Storage)+
Seq_run+
HoursTilFrozen+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re"),
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata,
family = gaussian)
summary(m_taxa$gam)
#draw(m_taxa$gam, select = c(1:3), ncol = 3, scales = "fixed")+geom_hline(yintercept = 0)
confint_df_time<-confint(m_taxa$gam, parm = "s(HoursAfterSunrise)", type = "confidence")
confint_df_season<-confint(m_taxa$gam, parm = "s(month)", type = "confidence")
confint_df_age<-confint(m_taxa$gam, parm = "s(AgeAtSamplingYears)", type = "confidence")
confint_df_time$Taxa<-taxanames[i]
confint_df_time$Variable<-"HoursAfterSunrise"
names(confint_df_time)[3]<-"Time"
### add p value
confint_df_time$p.val<-NA
confint_df_time$p.val[1:200]<-data.frame(summary(m_taxa$gam)$s.table)$p.value[1]
confint_df_time$p.val[201:400]<-data.frame(summary(m_taxa$gam)$s.table)$p.value[2]
confint_df_season$Taxa<-taxanames[i]
confint_df_season$Variable<-"month"
names(confint_df_season)[3]<-"Time"
### add p value
confint_df_season$p.val<-NA
confint_df_season$p.val[1:200]<-data.frame(summary(m_taxa$gam)$s.table)$p.value[3]
confint_df_season$p.val[201:400]<-data.frame(summary(m_taxa$gam)$s.table)$p.value[4]
confint_df_age$Taxa<-taxanames[i]
confint_df_age$Variable<-"AgeAtSamplingYears"
names(confint_df_age)[3]<-"Time"
### add p value
confint_df_age$p.val<-NA
confint_df_age$p.val[1:200]<-data.frame(summary(m_taxa$gam)$s.table)$p.value[5]
confint_df_age$p.val[201:400]<-data.frame(summary(m_taxa$gam)$s.table)$p.value[6]
confint_df<-rbind(confint_df_time, confint_df_season, confint_df_age)
confint_df$R2<-summary(m_taxa$gam)$r.sq
base_genus_confint_storage[[i]]<-confint_df
}, error=function(e){})
}
names(base_genus_confint_storage)<-taxanames
#saveRDS(base_genus_confint_storage, "R_OUTPUT/base_genus_confint_storage.RDS")
This chunk took 51.95 minutes
confint_genus_storage<-do.call(rbind, base_genus_confint_storage)
confint_genus_storage$Variable<-factor(confint_genus_storage$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))
confint_genus_storage$Taxa<-factor(confint_genus_storage$Taxa, levels = top_genera[-17])
storage_model_summary<-distinct(confint_genus_storage, Storage, Taxa, Variable, .keep_all = T )
storage_model_frozen<-subset(storage_model_summary, Storage=="FROZEN")
storage_model_frozen<-storage_model_frozen[,c("Variable","Taxa","p.val")]
dim(confint_summary)
## [1] 9600 17
confint_summary<-merge(confint_summary, storage_model_frozen, by = c("Variable", "Taxa"))
names(confint_summary)[which(colnames(confint_summary)=="p.val")]<-"p.val_frozen"
storage_model_freezedried<-subset(storage_model_summary, Storage=="FREEZEDRIED")
storage_model_freezedried<-storage_model_freezedried[,c("Variable","Taxa","p.val")]
dim(confint_summary)
## [1] 9600 18
confint_summary<-merge(confint_summary, storage_model_freezedried, by = c("Variable", "Taxa"))
names(confint_summary)[which(colnames(confint_summary)=="p.val")]<-"p.val_freezedried"
# add column for whether trend is robust
confint_summary$Sig_robust<-ifelse(confint_summary$p.val_frozen < 0.05 & confint_summary$p.val_freezedried < 0.05, TRUE, FALSE)
This chunk took 0.01 minutes
### rugdata
metadata<-data.frame(sample_data(core_genus))
rugdata<- data.frame(cbind(metadata$HoursAfterSunrise, metadata$month, metadata$AgeAtSamplingYears))
names(rugdata)<-c("HoursAfterSunrise", "month", "AgeAtSamplingYears")
rugdata<-gather(rugdata, Variable, Time, HoursAfterSunrise:AgeAtSamplingYears, factor_key=TRUE)
rugdata$est = 0
rugdata$Variable<-factor(rugdata$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))
##########
confint_summary_fig4b<-confint_summary[,c("Variable", "Taxa", "Time", "est_real","Sig_robust", "Abundance")]
confint_df_load_fig4b<-confint_df_load[,c("Variable", "Time", "est")]
confint_df_load_fig4b$Taxa <- "Bacterial load"
confint_df_load_fig4b$Sig_robust <- TRUE
confint_df_load_fig4b$Abundance <- 43124192 # same abundance of clostridium
confint_df_load_fig4b<-confint_df_load_fig4b[,c(1,4,2,3,5,6)]
names(confint_df_load_fig4b)<-c("Variable", "Taxa", "Time", "est_real","Sig_robust", "Abundance")
confint_df_load_fig4b$est_real<-log10(confint_df_load_fig4b$est_real)
confint_summary_fig4b<-rbind(confint_summary_fig4b,confint_df_load_fig4b )
confint_summary_fig4b$Taxa<-factor(confint_summary_fig4b$Taxa, levels = rev(c(top_genera[-17], "Bacterial load")))
Fig4def<-ggplot(confint_summary_fig4b, aes(x = Time, y = 10^est_real, group = Taxa))+
theme_light(base_size = 14)+
geom_line( aes(alpha = Sig_robust, col = Taxa, size = Abundance))+
#geom_ribbon( aes(ymin = 10^lower_real, ymax = 10^upper_real, fill = Taxa, alpha = Sig_robust, size = 1))+
scale_size(range = c(1, 1.5))+
facet_wrap(~Variable, scales = "free_x")+
geom_rug(data = rugdata, aes(x = Time, y = NULL), inherit.aes = F, size = 0.1)+
scale_color_manual(values = rev(c(mypaly[-17], "darkgrey")), labels = c(top_genera[-17], "Bacterial load") )+
ylab("Estimated abundance")+
scale_x_continuous(breaks = int_breaks)+
theme(panel.grid = element_blank(),
panel.background = element_blank(),
axis.line = element_line(colour = "black"))+
theme(panel.spacing = unit(1.2, "lines"))+
theme(plot.margin=unit(c(0.2,0.2,0.2,0.6),"cm"))+
theme(plot.title = element_text(size=10))+
guides(size=FALSE, abundance = FALSE, alpha = FALSE)+
guides(color = guide_legend(override.aes = list(size = 5)))+
#theme(legend.position = "none")+
theme(strip.background = element_blank(),
strip.text.x = element_blank())+
theme(axis.title.x = element_blank())+
scale_alpha_manual(values = c(0,0.8))+
scale_y_log10(limits = c(0.7,1e6), breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))+
guides(color = guide_legend(override.aes = list( size = 4, col = mypaly,
labels = c(top_genera))))+
labs(color='Genus')
This chunk took 0 minutes
confint_sumamry_fig4a<-confint_summary[,c("Variable", "Taxa", "Time", "est","Sig_robust", "Abundance")]
confint_df_load_fig4a<-confint_df_load[,c("Variable", "Time", "est")]
confint_df_load_fig4a$Taxa <- "Bacterial load"
confint_df_load_fig4a$Sig_robust <- FALSE
confint_df_load_fig4a$Abundance <- 43124192 # same abundance of clostridium
str(confint_df_load_fig4a)
## tibble [600 x 6] (S3: confint.gam/evaluated_1d_smooth/evaluated_smooth/tbl_df/tbl/data.frame)
## $ Variable : Factor w/ 3 levels "HoursAfterSunrise",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Time : num [1:600] 0.225 0.296 0.366 0.437 0.507 ...
## $ est : num [1:600] 106239 107397 108566 109743 110927 ...
## $ Taxa : chr [1:600] "Bacterial load" "Bacterial load" "Bacterial load" "Bacterial load" ...
## $ Sig_robust: logi [1:600] FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Abundance : num [1:600] 43124192 43124192 43124192 43124192 43124192 ...
str(confint_sumamry_fig4a)
## 'data.frame': 9600 obs. of 6 variables:
## $ Variable : Factor w/ 3 levels "HoursAfterSunrise",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ Taxa : Factor w/ 16 levels "Eubacterium",..: 12 12 12 12 12 12 12 12 12 12 ...
## $ Time : num 0.066 0.114 0.162 0.21 0.258 ...
## $ est : num -0.466 -0.436 -0.405 -0.374 -0.344 ...
## $ Sig_robust: logi TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ Abundance : num 8788264 8788264 8788264 8788264 8788264 ...
confint_df_load_fig4a<-confint_df_load_fig4a[,c(1,4,2,3,5,6)]
names(confint_df_load_fig4a)<-c("Variable", "Taxa", "Time", "est","Sig_robust", "Abundance")
confint_df_load_fig4a$est<-0
confint_sumamry_fig4a<-rbind(confint_sumamry_fig4a,confint_df_load_fig4a )
confint_sumamry_fig4a$Taxa<-factor(confint_sumamry_fig4a$Taxa, levels = rev(c(top_genera[-17], "Bacterial load")))
confint_sumamry_fig4a$est<-ifelse(confint_sumamry_fig4a$Taxa=="Raoultibacter", confint_sumamry_fig4a$est+0.03, confint_sumamry_fig4a$est)
Fig4abc<-ggplot(confint_sumamry_fig4a, aes(x = Time, y = est, group = Taxa))+
theme_light(base_size = 14)+
geom_line( aes(alpha = Sig_robust, col = Taxa, size = Abundance))+
#geom_line( aes( col = Taxa))+
scale_size(range = c(1, 1.5))+
facet_wrap(~Variable, scales = "free_x")+
geom_rug(data = rugdata, aes(x = Time, y = NULL), inherit.aes = F, size = 0.1)+
scale_color_manual(values = rev(c(mypaly[-17], "darkgrey")), labels = c(top_genera[-17], "Bacterial load") )+
ylab("Model estimate (log10)")+
scale_x_continuous(breaks = int_breaks)+
theme(panel.grid = element_blank(),
panel.background = element_blank(),
axis.line = element_line(colour = "black"))+
theme(panel.spacing = unit(1.2, "lines"))+
theme(plot.margin=unit(c(0.2,0.2,0.2,0.4),"cm"))+
theme(plot.title = element_text(size=10))+
guides(size=FALSE, alpha = FALSE, abundance = FALSE)+
theme(strip.background = element_blank(),
strip.text.x = element_blank())+
theme(axis.text.x = element_blank(),
axis.title.x = element_blank())+
geom_hline(yintercept = 0, linetype = "dashed")+
scale_alpha_manual(values = c(0,0.8))+
guides(color = guide_legend(override.aes = list( size = 4, col = mypaly,
labels = c(top_genera))))+
labs(color='Genus')
ggarrange(Fig4abc, Fig4def, ncol = 1, heights = c(1,1), legend = "right", common.legend = T)
#ggsave("FIGURES/figure4.pdf",width = 10, height = 5, units = "in")
This chunk took 0.08 minutes
confint_genus_storage<-do.call(rbind, base_genus_confint_storage)
confint_genus_storage$Variable<-factor(confint_genus_storage$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))
confint_genus_storage$Taxa<-factor(confint_genus_storage$Taxa, levels = top_genera[-17])
confint_genus_storage$Sig <-confint_genus_storage$p.val < 0.05
### rugdata
metadata<-data.frame(sample_data(core_genus))
rugdata<- data.frame(cbind(metadata$HoursAfterSunrise, metadata$month, metadata$AgeAtSamplingYears))
names(rugdata)<-c("HoursAfterSunrise", "month", "AgeAtSamplingYears")
rugdata<-gather(rugdata, Variable, Time, HoursAfterSunrise:AgeAtSamplingYears, factor_key=TRUE)
rugdata$est = 0
rugdata$Variable<-factor(rugdata$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))
### Diurnal
diurnal_storage<-subset(confint_genus_storage, Variable == "HoursAfterSunrise")
head(diurnal_storage)
ggplot(diurnal_storage, aes(x = Time, y = est, group = Storage))+
geom_line(aes(col = Storage, linetype = Sig) ,size = 1)+
geom_ribbon(aes(ymin=lower, ymax=upper, fill = Storage), linetype=2, alpha=0.5)+
facet_wrap(~Taxa, scales = "free", ncol = 4)+
theme_bw(base_size = 14)+
scale_color_manual(values = c("deeppink", "deepskyblue3"))+
scale_fill_manual(values = c( "deeppink", "deepskyblue3"))+
ylab("Model estimate")+
xlab("Hours after sunrise")+
guides(linetype=guide_legend(title="Significant (p < 0.05)"))
#ggsave("FIGURES/FigS4_diurnal.pdf", width = 12, height = 8, units = "in")
#### seasonal
seasonal_storage<-subset(confint_genus_storage, Variable == "month")
ggplot(seasonal_storage, aes(x = Time, y = est, group = Storage))+
geom_line(aes(col = Storage, linetype = Sig) ,size = 1)+
geom_ribbon(aes(ymin=lower, ymax=upper, fill = Storage), linetype=2, alpha=0.5)+
facet_wrap(~Taxa, scales = "free", ncol = 4)+
theme_bw(base_size = 14)+
scale_color_manual(values = c("deeppink", "deepskyblue3"))+
scale_fill_manual(values = c( "deeppink", "deepskyblue3"))+
ylab("Model estimate")+
xlab("Month")+
guides(linetype=guide_legend(title="Significant (p < 0.05)"))+
scale_x_continuous(breaks = c(1,2,3,4,5,6,7,8,9,10, 11, 12 ))
#ggsave("FIGURES/FigS5_seasonal.pdf", width = 12, height = 8, units = "in")
##### age
age_storage<-subset(confint_genus_storage, Variable == "AgeAtSamplingYears")
ggplot(age_storage, aes(x = Time, y = est, group = Storage))+
geom_line(aes(col = Storage, linetype = Sig) ,size = 1)+
geom_ribbon(aes(ymin=lower, ymax=upper, fill = Storage), linetype=2, alpha=0.5)+
facet_wrap(~Taxa, scales = "free", ncol = 4)+
theme_bw(base_size = 14)+
scale_color_manual(values = c("deeppink", "deepskyblue3"))+
scale_fill_manual(values = c( "deeppink", "deepskyblue3"))+
ylab("Model estimate")+
xlab("Age at sampling (years)")+
guides(linetype=guide_legend(title="Significant (p < 0.05)"))+
scale_x_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9,10 ))
#ggsave("FIGURES/FigS6_age.pdf", width = 12, height = 8, units = "in")
This chunk took 0.14 minutes
This chunk took 0 minutes
This chunk took 0 minutes
This section fits base GAMMs (methodological and temporal variables only) and full GAMMs (including biological variables) to all microbial phenotypes, and extracts model summaries.
### add ordination axes
sample_data(core_genus)$MDS1_W<-data.frame(ord_wunifrac_mds$vectors)$Axis.1
sample_data(core_genus)$MDS1_UW<-data.frame(ord_unifrac_mds$vectors)$Axis.1
diversity_names<-c("TotalAbundance", "Observed", "Shannon", "MDS1_W", "MDS1_UW")
taxanames
## [1] "Helicobacter" "Cellulomonas"
## [3] "[Ruminococcus] torques group" "Unknown Lachnospiraceae"
## [5] "Blautia" "Lachnoclostridium"
## [7] "Bacteroides" "Paeniclostridium"
## [9] "Romboutsia" "Enterococcus"
## [11] "Unknown Bacillaceae" "Lactococcus"
## [13] "Clostridium sensu stricto 1" "Peptococcus"
## [15] "Eubacterium" "Raoultibacter"
phenotypes<-c(taxanames, diversity_names)
# make empty list to store results
base_model_summary_list<-list()
for (i in 1:length(phenotypes)){
tryCatch({ #catch errors
#print(i)
metadata<-data.frame(sample_data(core_genus))
# str(metadata)
asvtable<-data.frame(t(otu_table(core_genus)))
asvtable_log<- log10(asvtable+1)
dim(asvtable_log)
names(asvtable_log)<-taxanames
metadata_all<-merge(metadata, asvtable_log, by = 0)
row.names(metadata_all)<-metadata_all$feature.id
#scale variables
metadata_all$Seq_depth_nospikein<-as.numeric(scale(metadata_all$Seq_depth_nospikein))
metadata_all$HoursTilFrozen<-as.numeric(scale(metadata_all$HoursTilFrozen))
metadata_all$TotalAbundance<-log10(metadata_all$TotalAbundance+1) #log10 transform load
metadata_all$Observed<-scale(metadata_all$Observed) # scale observed
metadata_all$Shannon<-scale(metadata_all$Shannon) # scale shannon
# fit gamm
m_taxa <- mgcv::gamm(metadata_all[, phenotypes[i]]~
s(HoursAfterSunrise, bs = "cr")+
s(month, bs = "cc")+
s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re"),
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata_all,
family = gaussian)
summary<- summary(m_taxa$gam)
fixed_terms<-data.frame(extract_fixed(m_taxa$gam))[,c(2,5:7)] ## need to load the function here
row.names(fixed_terms)<-extract_fixed(m_taxa$gam)$term
names(fixed_terms)<-c("Effect", "P.val", "lower_CI", "upper_CI")
smoothed_terms<-data.frame(summary$s.table)
names(smoothed_terms)<-c("lower_CI", "upper_CI", "Effect", "P.val")
smoothed_terms$lower_CI <-0
smoothed_terms$upper_CI <-0
results_df<-rbind(fixed_terms, smoothed_terms)
results_df$R.sq<-summary$r.sq
results_df$Effect_abs<-abs( results_df$Effect)
results_df$Variable<-row.names(results_df)
results_df$Taxa<-phenotypes[i]
base_model_summary_list[[i]]<-results_df
}, error=function(e){})
}
names(base_model_summary_list)<-phenotypes
#saveRDS(base_model_summary_list, "R_OUTPUT/base_models.RDS")
This chunk took 36.93 minutes
base_model_summary<-do.call(rbind, base_model_summary_list)
base_model_summary$Significance<-base_model_summary$P.val < 0.01 # add column = whether significant
base_model_summary<-subset(base_model_summary, Variable !="Intercept") # remove intercept row
base_model_summary$Variable<- factor(base_model_summary$Variable, levels = c( # order variables
"s(HoursAfterSunrise)", "s(month)" , "s(AgeAtSamplingYears)",
"s(IndividID)", "s(GroupAtSampling)",
"StorageFROZEN","s(Seq_depth_nospikein)", "Seq_runRUN2", "Seq_runRUN3","Seq_runRUN4", "HoursTilFrozen"))
This chunk took 0 minutes
# make PS object for frozen and freezedried samples
core_genus_frozen<-subset_samples(core_genus, Storage == "FROZEN")
core_genus_freezedried<-subset_samples(core_genus, Storage == "FREEZEDRIED")
# put in a list
ps_list<-list(core_genus_frozen, core_genus_freezedried)
names(ps_list)<-c("FROZEN", "FREEZEDRIED")
# make empty lists to store results
base_model_storage_i<-list()
base_model_storage_k<-list()
# loops loops loops
for (k in 1:length(ps_list)){
core_genus_storage<- ps_list[[k]]
# core_genus_storage<- ps_list[[2]]
for (i in 1:length(phenotypes)){
tryCatch({ #catch errors
#print(i)
# generate metadata dataframe
metadata<-data.frame(sample_data(core_genus_storage))
asvtable<-data.frame(t(otu_table(core_genus_storage))) # add genus-level abundances to dataframe
asvtable_log<- log10(asvtable+1)
names(asvtable_log)<-taxanames
metadata_all<-merge(metadata, asvtable_log, by = 0)
row.names(metadata_all)<-metadata_all$feature.id
#scale variables
metadata_all$Seq_depth_nospikein<-as.numeric(scale(metadata_all$Seq_depth_nospikein))
metadata_all$HoursTilFrozen<-as.numeric(scale(metadata_all$HoursTilFrozen))
metadata_all$TotalAbundance<-log10(metadata_all$TotalAbundance+1) #log10 transform load
metadata_all$Observed<-scale(metadata_all$Observed) # scale observed
metadata_all$Shannon<-scale(metadata_all$Shannon) # scale shannon
# fit gamm
m_taxa <- mgcv::gamm(metadata_all[, phenotypes[i]]~
# m_taxa <- mgcv::gamm(metadata_all[, phenotypes[13]]~
s(HoursAfterSunrise, bs = "cr")+
s(month, bs = "cc")+
s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Seq_run+
HoursTilFrozen+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re"),
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata_all,
family = gaussian)
# extract summary statistics
summary<- summary(m_taxa$gam)
fixed_terms<-data.frame(extract_fixed(m_taxa$gam))[,c(2,5:7)] ## need to load the function here
row.names(fixed_terms)<-extract_fixed(m_taxa$gam)$term
names(fixed_terms)<-c("Effect", "P.val", "lower_CI", "upper_CI")
smoothed_terms<-data.frame(summary$s.table)
names(smoothed_terms)<-c("lower_CI", "upper_CI", "Effect", "P.val")
smoothed_terms$lower_CI <-0
smoothed_terms$upper_CI <-0
results_df<-rbind(fixed_terms, smoothed_terms)
results_df$R.sq<-summary$r.sq
results_df$Effect_abs<-abs( results_df$Effect)
results_df$Variable<-row.names(results_df)
results_df$Taxa<-phenotypes[i]
results_df$Type <- as.character(metadata_all$Storage[1])
base_model_storage_i[[i]]<-results_df
}, error=function(e){})
}
base_model_storage_df<-do.call(rbind, base_model_storage_i)
base_model_storage_k[[k]] <-base_model_storage_df
}
base_model_storage_k_df<-do.call(rbind, base_model_storage_k)
#saveRDS(base_model_storage_k_df, "R_OUTPUT/base_models_by_storage.RDS")
This chunk took 21.31 minutes
base_model_summary_frozen<-subset(base_model_storage_k_df, Type == "FROZEN")
base_model_summary_freezedried<-subset(base_model_storage_k_df, Type == "FREEZEDRIED")
frozen_df<-base_model_summary_frozen[,c("Taxa", "Variable", "P.val")]
names(frozen_df)[3]<-"P.val_frozen"
freezedried_df<-base_model_summary_freezedried[,c("Taxa", "Variable", "P.val")]
names(freezedried_df)[3]<-"P.val_freezedried"
base_model_summary1<- merge(base_model_summary, frozen_df, by = c("Taxa", "Variable"))
base_model_summary2<- merge(base_model_summary1, freezedried_df, by = c("Taxa", "Variable"))
base_model_summary2$Sig_robust<- base_model_summary2$P.val_frozen < 0.05 & base_model_summary2$P.val_freezedried < 0.05
This chunk took 0 minutes
# make empty list to store results
full_model_summary_list<-list()
for (i in 1:length(phenotypes)){
tryCatch({ #catch errors
#print(i)
metadata<-data.frame(sample_data(core_genus))
# str(metadata)
asvtable<-data.frame(t(otu_table(core_genus)))
asvtable_log<- log10(asvtable+1)
head(asvtable_log)
names(asvtable_log)<-taxanames
metadata_all<-merge(metadata, asvtable_log, by = 0)
row.names(metadata_all)<-metadata_all$feature.id
#scale variables
metadata_all$Seq_depth_nospikein<-as.numeric(scale(metadata_all$Seq_depth_nospikein))
metadata_all$HoursTilFrozen<-as.numeric(scale(metadata_all$HoursTilFrozen))
metadata_all$TotalAbundance<-log10(metadata_all$TotalAbundance+1) #log10 transform load
metadata_all$Observed<-scale(metadata_all$Observed) # scale observed
metadata_all$Shannon<-scale(metadata_all$Shannon) # scale shannon
metadata_all$Condition_weight<-as.numeric(scale(metadata_all$Condition_weight))
metadata_all$Temp_min<-as.numeric(scale(metadata_all$Temp_min))
metadata_all$Temp_max<-as.numeric(scale(metadata_all$Temp_max))
metadata_all$HourlyTemp<-as.numeric(scale(metadata_all$HourlyTemp))
metadata_all$HoursSinceForagingStart<-as.numeric(scale(log10(metadata_all$HoursSinceForagingStart+0.1)))
metadata_all$sum_rainfall_month<-as.numeric(scale(log10(metadata_all$sum_rainfall_month+0.1)))
metadata_all$SocialStatus<-factor(metadata_all$SocialStatus)
# fit gamm minus temporal variables
m_taxa <- mgcv::gamm(metadata_all[, phenotypes[i]]~
# s(HoursAfterSunrise, bs = "cr")+
# s(month, bs = "cc")+
# s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re")+
HoursTilFrozen+
Sex+
SocialStatus+
Condition_weight+
HoursSinceForagingStart+
HourlyTemp+
Temp_max+
Temp_min+
sum_rainfall_month,
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata_all,
family = gaussian)
summary<- summary(m_taxa$gam)
fixed_terms<-data.frame(extract_fixed(m_taxa$gam))[,c(2,5:7)] ## need to load the function here
row.names(fixed_terms)<-extract_fixed(m_taxa$gam)$term
names(fixed_terms)<-c("Effect", "P.val", "lower_CI", "upper_CI")
smoothed_terms<-data.frame(summary$s.table)
names(smoothed_terms)<-c("lower_CI", "upper_CI", "Effect", "P.val")
smoothed_terms$lower_CI <-0
smoothed_terms$upper_CI <-0
results_df<-rbind(fixed_terms, smoothed_terms)
results_df$R.sq<-summary$r.sq
results_df$Effect_abs<-abs( results_df$Effect)
results_df$Variable<-row.names(results_df)
results_df$Taxa<-phenotypes[i]
#### model with only methods
m_taxa_methods <- mgcv::gamm(metadata_all[, phenotypes[i]]~
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen,
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata_all,
family = gaussian)
results_df$R.sq_methods<-summary(m_taxa_methods$gam)$r.sq
### model with all terms
m_taxa_full <- mgcv::gamm(metadata_all[, phenotypes[i]]~
s(HoursAfterSunrise, bs = "cr")+
s(month, bs = "cc")+
s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re")+
HoursTilFrozen+
Sex+
SocialStatus+
Condition_weight+
HoursSinceForagingStart+
HourlyTemp+
Temp_max+
Temp_min+
sum_rainfall_month,
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata_all,
family = gaussian)
results_df$R.sq_full<-summary(m_taxa_full$gam)$r.sq
full_model_summary_list[[i]]<-results_df
}, error=function(e){})
}
names(full_model_summary_list)<-phenotypes
#saveRDS(full_model_summary_list, "R_OUTPUT/full_models.RDS")
This chunk took 62.82 minutes
full_model_summary<-do.call(rbind, full_model_summary_list)
full_model_summary$Significance<-full_model_summary$P.val < 0.01
full_model_summary<-subset(full_model_summary, Variable !="Intercept")
full_model_summary$Variable<- factor(full_model_summary$Variable, levels = c(
"s(HoursAfterSunrise)", "s(month)" , "s(AgeAtSamplingYears)",
"s(IndividID)", "s(GroupAtSampling)",
"sum_rainfall_month", "Temp_max", "Temp_min", "HourlyTemp",
"SexM", "Condition_weight", "SocialStatusSubordinate", "HoursSinceForagingStart",
"StorageFROZEN","s(Seq_depth_nospikein)", "Seq_runRUN2", "Seq_runRUN3","Seq_runRUN4", "HoursTilFrozen"))
This chunk took 0 minutes
full_model_storage_i<-list()
full_model_storage_k<-list()
for (k in 1:length(ps_list)){
core_genus_storage<- ps_list[[k]]
# core_genus_storage<- ps_list[[1]]
for (i in 1:length(phenotypes)){
tryCatch({ #catch errors
# #print(i)
metadata<-data.frame(sample_data(core_genus_storage))
# str(metadata)
asvtable<-data.frame(t(otu_table(core_genus_storage)))
asvtable_log<- log10(asvtable+1)
dim(asvtable_log)
names(asvtable_log)<-taxanames
metadata_all<-merge(metadata, asvtable_log, by = 0)
row.names(metadata_all)<-metadata_all$feature.id
#scale variables
metadata_all$Seq_depth_nospikein<-as.numeric(scale(metadata_all$Seq_depth_nospikein))
metadata_all$HoursTilFrozen<-as.numeric(scale(metadata_all$HoursTilFrozen))
metadata_all$TotalAbundance<-log10(metadata_all$TotalAbundance+1) #log10 transform load
metadata_all$Observed<-scale(metadata_all$Observed) # scale observed
metadata_all$Shannon<-scale(metadata_all$Shannon) # scale shannon
metadata_all$Condition_weight<-as.numeric(scale(metadata_all$Condition_weight))
metadata_all$Temp_min<-as.numeric(scale(metadata_all$Temp_min))
metadata_all$Temp_max<-as.numeric(scale(metadata_all$Temp_max))
metadata_all$HourlyTemp<-as.numeric(scale(metadata_all$HourlyTemp))
metadata_all$HoursSinceForagingStart<-as.numeric(scale(log10(metadata_all$HoursSinceForagingStart+0.1)))
metadata_all$sum_rainfall_month<-as.numeric(scale(log10(metadata_all$sum_rainfall_month+0.1)))
metadata_all$SocialStatus<-factor(metadata_all$SocialStatus)
# fit gamm
m_taxa <- mgcv::gamm(metadata_all[, phenotypes[i]]~
# s(HoursAfterSunrise, bs = "cr")+
#s(month, bs = "cc")+
#s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
#Storage+
Seq_run+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re")+
HoursTilFrozen+
Sex+
SocialStatus+
Condition_weight+
HoursSinceForagingStart+
HourlyTemp+
Temp_max+
Temp_min+
sum_rainfall_month,
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata_all,
family = gaussian)
summary<- summary(m_taxa$gam)
fixed_terms<-data.frame(extract_fixed(m_taxa$gam))[,c(2,5:7)] ## need to load the function here
row.names(fixed_terms)<-extract_fixed(m_taxa$gam)$term
names(fixed_terms)<-c("Effect", "P.val", "lower_CI", "upper_CI")
smoothed_terms<-data.frame(summary$s.table)
names(smoothed_terms)<-c("lower_CI", "upper_CI", "Effect", "P.val")
smoothed_terms$lower_CI <-0
smoothed_terms$upper_CI <-0
results_df<-rbind(fixed_terms, smoothed_terms)
results_df$R.sq<-summary$r.sq
results_df$Effect_abs<-abs( results_df$Effect)
results_df$Variable<-row.names(results_df)
results_df$Taxa<-phenotypes[i]
results_df$Type <- names(ps_list[k])
full_model_storage_i[[i]]<-results_df
}, error=function(e){})
}
full_model_storage_df<-do.call(rbind, full_model_storage_i)
full_model_storage_k[[k]] <-full_model_storage_df
}
full_model_storage_k_df<-do.call(rbind, full_model_storage_k)
#saveRDS(full_model_storage_k_df, "R_OUTPUT/full_models_by_storage.RDS")
#table(full_model_storage_k_df$Taxa)
This chunk took 12.03 minutes
full_model_summary_frozen<-subset(full_model_storage_k_df, Type == "FROZEN")
full_model_summary_freezedried<-subset(full_model_storage_k_df, Type == "FREEZEDRIED")
frozen_df<-full_model_summary_frozen[,c("Taxa", "Variable", "P.val")]
names(frozen_df)[3]<-"P.val_frozen"
freezedried_df<-full_model_summary_freezedried[,c("Taxa", "Variable", "P.val")]
names(freezedried_df)[3]<-"P.val_freezedried"
full_model_summary1<- merge(full_model_summary, frozen_df, by = c("Taxa", "Variable"))
full_model_summary2<- merge(full_model_summary1, freezedried_df, by = c("Taxa", "Variable"))
full_model_summary2$Sig_robust<- full_model_summary2$P.val_frozen < 0.05 & full_model_summary2$P.val_freezedried < 0.05
#full_model_summary2
This chunk took 0 minutes
#full_model_df$R.sq_partial<-full_model_df$R.sq - full_model_df$R.sq_methods
full_model_summary2$r2_fixed <-full_model_summary2$R.sq - full_model_summary2$R.sq_methods
full_model_summary2$r2_smooth <-full_model_summary2$R.sq_full - full_model_summary2$R.sq
### order taxa
full_model_summary2$Taxa <- factor(full_model_summary2$Taxa, levels = c(rev(top_genera[-17]), rev(diversity_names)))
unique(full_model_summary2$Taxa)
## [1] [Ruminococcus] torques group Bacteroides
## [3] Blautia Cellulomonas
## [5] Clostridium sensu stricto 1 Enterococcus
## [7] Eubacterium Helicobacter
## [9] Lachnoclostridium Lactococcus
## [11] MDS1_UW MDS1_W
## [13] Observed Paeniclostridium
## [15] Peptococcus Raoultibacter
## [17] Romboutsia Shannon
## [19] TotalAbundance Unknown Bacillaceae
## [21] Unknown Lachnospiraceae
## 21 Levels: Eubacterium Cellulomonas Raoultibacter ... TotalAbundance
#full_model_summary2$Taxa <- factor(full_model_summary2$Taxa, levels = c(rev(top_genera[-16])))
### add third significance group
full_model_summary2$Sig_robust <-ifelse(full_model_summary2$Sig_robust == FALSE & full_model_summary2$P.val < 0.05, "Overall significant", as.character(full_model_summary2$Sig_robust))
full_model_df<-full_model_summary2
## base models
base_model_summary2$Taxa <- factor(base_model_summary2$Taxa, levels = c(rev(top_genera[-17]), rev(diversity_names)))
### add third significance group
base_model_summary2$Sig_robust <-ifelse(base_model_summary2$Sig_robust == FALSE & base_model_summary2$P.val < 0.05, "Overall significant", as.character(base_model_summary2$Sig_robust))
base_model_summary_df<-base_model_summary2
This chunk took 0 minutes
fixed_df<-subset(full_model_df,
# Variable != "s(HoursAfterSunrise)" &
# Variable != "s(month)" &
# Variable != "s(AgeAtSamplingYears)"&
Variable != "s(IndividID)"&
Variable != "s(GroupAtSampling)" &
Variable != "s(Seq_depth_nospikein)"&
Variable != "Storage" &
Variable != "Seq_runRUN2" &
Variable != "Seq_runRUN3" &
Variable != "Seq_runRUN4" &
Variable != "StorageFROZEN" &
Variable != "HoursTilFrozen")
fixed_df$Variable<-factor(fixed_df$Variable)
unique(fixed_df$Variable)
## [1] Condition_weight HourlyTemp HoursSinceForagingStart
## [4] SexM SocialStatusSubordinate sum_rainfall_month
## [7] Temp_max Temp_min
## 8 Levels: sum_rainfall_month Temp_max Temp_min HourlyTemp ... HoursSinceForagingStart
levels(fixed_df$Variable) <- c("Rainfall", "Max. temp.", "Min. temp.",
"Sampling temp.",
"Sex (Male)", "Body condition", "Social status (Sub.)", "Time foraging")
################
smooth_df<-subset(base_model_summary_df, Variable == "s(HoursAfterSunrise)" | Variable == "s(month)" | Variable == "s(AgeAtSamplingYears)")
smooth_df$Variable<-factor(smooth_df$Variable, levels = c("s(HoursAfterSunrise)", "s(month)", "s(AgeAtSamplingYears)"))
levels(smooth_df$Variable) <-c("Hours after sunrise", "Month", "Age (years)")
P1<- ggplot(smooth_df, aes(x = Effect, y = Taxa, fill = Sig_robust))+
theme_bw(base_size = 12)+
#geom_errorbarh(aes(xmin=upper_CI, xmax=lower_CI, height = 0), alpha = 0.5, col = "black", size = 1)+
geom_point( pch = 21, alpha = 0.9, col = "black", size = 2)+
facet_wrap(~Variable, nrow = 1, labeller = label_wrap_gen(width=10))+
scale_size(range = c(1.5,4))+
scale_fill_manual(values = c("darkgray" , "deepskyblue", "blue"))+
scale_color_manual(values = c("darkgray", "deepskyblue", "blue"))+
theme(legend.position = "none") +
theme(axis.title = element_blank())+
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))+
theme( # remove the vertical grid lines
panel.grid.major.x = element_blank() ,
panel.grid.minor.x = element_blank() ,
# explicitly set the horizontal lines (or they will disappear too)
panel.grid.major.y = element_line( size=.05, color="lightgrey" )) +
theme( plot.margin = margin(0.2, 0.1,0.5,0, "cm"))
P2<-ggplot(fixed_df, aes(x = Effect, y = Taxa, fill = Sig_robust))+
theme_bw(base_size = 12)+
geom_vline(xintercept = 0, alpha = 0.3, size = 0.3)+
geom_errorbarh(aes(xmax=upper_CI+0.08, xmin=lower_CI-0.08, height = 0, col = Sig_robust), size = 1)+
geom_point(size = 2, pch = 21, alpha = 0.9, col = "black")+
facet_wrap(~Variable, nrow = 1, labeller = label_wrap_gen(width=10), scales = "free_x")+
scale_size(range = c(1.5,3))+
scale_fill_manual(values = c("darkgray","pink" , "red"))+
scale_color_manual(values = c("darkgray","pink" , "red"))+
theme(legend.position = "none")+
theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())+
theme(axis.title = element_blank())+
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))+
theme( # remove the vertical grid lines
panel.grid.major.x = element_blank() ,
panel.grid.minor.x = element_blank() ,
# explicitly set the horizontal lines (or they will disappear too)
panel.grid.major.y = element_line( size=.05, color="lightgrey" ))
## Plot R2 contributions of methods and biological variables
r2_partial_df<-distinct(full_model_df, Taxa, .keep_all = T)
head(r2_partial_df)
R2_df<-gather(r2_partial_df, Type, R2, c("R.sq_methods","r2_fixed", "r2_smooth"), factor_key=TRUE)
R2_df$R2<-ifelse(R2_df$R2 <0, 0, R2_df$R2)
P3<-ggplot(R2_df, aes(x = R2, y = Taxa, fill = Type))+
geom_bar(stat = "identity")+
scale_fill_manual(values = c("grey", "red", "dodgerblue"))+
theme_bw()+
theme(axis.text.y = element_blank(),
axis.title.y = element_blank())+
theme(legend.position = "none")+
theme( plot.margin = margin(1.6, 0.2,0.4,0.1, "cm"))+
# geom_vline(xintercept = 0.1, linetype = "dashed", size = 1)+
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
axis.title.x = element_blank(),
panel.grid = element_blank())
ggarrange(P1,P2,P3, ncol = 3, widths = c(3.5,5,1.5))
#ggsave("FIGURES/Figure5.pdf", width = 13, height = 6, units = "in")
This chunk took 0.05 minutes
taxanames<-as.character(taxa_names(core_genus))
confint_list_age<-list()
#model_stats_list<-list()
for (i in 1:length(taxanames)){
tryCatch({ #catch errors
#print(i)
#taxa1<-prune_taxa("Lachnoclostridium", core_genus)
taxa1<-prune_taxa(taxanames[i], core_genus)
sample_data(taxa1)$taxa_i_abundance<-sample_sums(taxa1)
metadata<-data.frame(sample_data(taxa1))
metadata$Seq_depth_nospikein<-as.numeric(scale(metadata$Seq_depth_nospikein))
m_taxa <- mgcv::gamm(log10(taxa_i_abundance+1)~
s(HoursAfterSunrise, bs = "cr", by = AgeCat)+
s(month, bs = "cc")+
# s(AgeAtSamplingYears, bs = "cr")+
s(Seq_depth_nospikein, bs = "cr")+
Storage+
Seq_run+
HoursTilFrozen+
s(IndividID, bs = "re")+
s(GroupAtSampling, bs = "re"),
correlation = corARMA(form = ~ 1|Year, p = 3),
data=metadata,
family = gaussian)
confint_df_time<-confint(m_taxa$gam, parm = "s(HoursAfterSunrise)", type = "confidence")
confint_df_time$Taxa<-taxanames[i]
names(confint_df_time)[3]<-"Time"
confint_list_age[[i]]<-confint_df_time
}, error=function(e){})
}
names(confint_list_age)<-taxanames
This chunk took 31.19 minutes
confint_summary_age<-do.call(rbind, confint_list_age)
unique(confint_summary_age$Taxa)
## [1] "Helicobacter" "Cellulomonas"
## [3] "[Ruminococcus] torques group" "Unknown Lachnospiraceae"
## [5] "Blautia" "Lachnoclostridium"
## [7] "Bacteroides" "Paeniclostridium"
## [9] "Romboutsia" "Enterococcus"
## [11] "Unknown Bacillaceae" "Lactococcus"
## [13] "Clostridium sensu stricto 1" "Peptococcus"
## [15] "Eubacterium" "Raoultibacter"
unique(confint_summary_age$AgeCat)
## [1] MIDDLEAGED OLD YOUNG
## Levels: MIDDLEAGED OLD YOUNG
confint_summary_age$AgeCat<-factor(confint_summary_age$AgeCat, levels = c("YOUNG", "MIDDLEAGED", "OLD"))
levels(confint_summary_age$AgeCat)<-c("Young", "Adult", "Old")
abundance_df<-data.frame(taxa_sums(core_genus))
names(abundance_df)<-"Abundance"
abundance_df$Genus<-row.names(abundance_df)
confint_summary_age$Abundance<-vlookup(confint_summary_age$Taxa, abundance_df, lookup_column = "Genus", result_column = "Abundance")
confint_summary_age<-arrange(confint_summary_age, desc(Abundance))
confint_summary_age$Taxa<-factor(confint_summary_age$Taxa, levels = rev(top_genera[-17]))
This chunk took 0 minutes
sample_data(scaled_core)$AgeCat<-factor(sample_data(scaled_core)$AgeCat, levels = c("YOUNG", "MIDDLEAGED", "OLD"))
levels(sample_data(scaled_core)$AgeCat)<-c("Young", "Adult", "Old")
age_histogram<-ggplot(sample_data(scaled_core), aes(x = HoursAfterSunrise))+
geom_histogram(fill = "grey", col = "black", binwidth = 0.5)+
facet_wrap(~AgeCat, scales = "free_y")+
theme_minimal(base_size = 14)+
theme( strip.background = element_blank(),
strip.text.x = element_blank())+
theme(plot.margin = unit(c(0.2,7.4, 0,0.3), "cm"))+
theme(axis.title.x = element_blank(),
axis.text.x = element_blank())+
ylab("Frequency")+
theme( # remove the vertical grid lines
panel.grid.major.x = element_blank() ,
panel.grid.minor.x = element_blank() ,
# explicitly set the horizontal lines (or they will disappear too)
panel.grid.major.y = element_line( size=.05, color="lightgrey" ))
####################
confint_summary_age_sig<-subset(confint_summary_age, Taxa != "Enterococcus" & Taxa != "Lactococcus" & Taxa != "Eubacterium")
unique(confint_summary_age$Taxa)
## [1] Clostridium sensu stricto 1 Blautia
## [3] Unknown Lachnospiraceae Bacteroides
## [5] [Ruminococcus] torques group Enterococcus
## [7] Paeniclostridium Romboutsia
## [9] Lachnoclostridium Lactococcus
## [11] Peptococcus Helicobacter
## [13] Unknown Bacillaceae Cellulomonas
## [15] Raoultibacter Eubacterium
## 16 Levels: Eubacterium Cellulomonas Raoultibacter ... Clostridium sensu stricto 1
Fig6abc<-ggplot(confint_summary_age_sig, aes(x = Time, y = est, col = Taxa))+
facet_wrap(~AgeCat)+
geom_line( aes(col = Taxa, size = Abundance))+
scale_size(range = c(1, 2))+
ylim(-1.6,1.1)+
scale_color_manual(values = rev(mypaly[c(-4, -10, -16, -17)]), labels = top_genera[c(-4, -10, -16, -17)])+
theme_bw(base_size = 14)+
ylab("Model estimate (log10)")+
xlab("Hours after sunrise")+
theme(plot.title = element_text(size=10))+
guides(size=FALSE)+
#guides(color = guide_legend(override.aes = list(size = 3))) +
theme(strip.background = element_blank(),
strip.text.x = element_blank())+
# theme(legend.position = "none")+
geom_hline(yintercept = 0, linetype = "dashed") +
guides(color = guide_legend(override.aes = list( size = 4, col = mypaly[c(-4, -10, -16, -17)],
labels = top_genera[c(-4, -10, -16, -17)])))+
labs(color='Genus')+
theme(plot.margin = unit(c(0,0.2, 0.2,0.3), "cm"))
Fig6top<-ggarrange(age_histogram, Fig6abc, ncol = 1, heights = c(0.5,2))
This chunk took 0.02 minutes
clostridium_age<-subset(confint_summary_age, Taxa == "Clostridium sensu stricto 1")
clostridium<-ggplot(clostridium_age, aes(x = Time, y = est, col = AgeCat))+
geom_line(size = 1.5)+
geom_ribbon(aes(ymin=lower, ymax=upper, fill = AgeCat), linetype=2, alpha=0.1)+
coord_cartesian(ylim = c(-2,1.5))+
scale_color_manual(values = c( "blue", "darkgrey", "red"))+
scale_fill_manual(values = c( "blue", "darkgrey", "red"))+
theme_bw(base_size = 14)+
ylab("Model estimate (log10)")+
xlab("")+
# theme(legend.position = "none")+
ggtitle("d) Clostridium")+
theme(plot.title = element_text(size=12))+
geom_hline(yintercept = 0, linetype = "dashed")+
labs(col='Age category')+
guides(fill = FALSE)
bacteroides_age<-subset(confint_summary_age, Taxa == "Bacteroides")
bacteroides<-ggplot(bacteroides_age, aes(x = Time, y = est, col = AgeCat))+
geom_line(size = 1.5)+
geom_ribbon(aes(ymin=lower, ymax=upper, fill = AgeCat), linetype=2, alpha=0.1)+
coord_cartesian(ylim = c(-2,1.5))+
scale_color_manual(values = c( "blue", "darkgrey", "red"))+
scale_fill_manual(values = c( "blue", "darkgrey", "red"))+
theme_bw(base_size = 14)+
theme(axis.title.y = element_blank(), axis.text.y = element_blank())+
xlab("")+
# theme(legend.position = "none")+
ggtitle("e) Bacteroides")+
theme(plot.title = element_text(size=12)) +
geom_hline(yintercept = 0, linetype = "dashed")+
labs(col='Age category')+
guides(fill = FALSE)
blautia_age<-subset(confint_summary_age, Taxa == "Blautia")
blautia<-ggplot(blautia_age, aes(x = Time, y = est, col = AgeCat))+
geom_line(size = 1.5)+
geom_ribbon(aes(ymin=lower, ymax=upper, fill = AgeCat), linetype=2, alpha=0.1)+
coord_cartesian(ylim = c(-2,1.5))+
scale_color_manual(values = c( "blue", "darkgrey", "red"))+
scale_fill_manual(values = c( "blue", "darkgrey", "red"))+
theme_bw(base_size = 14)+
theme(axis.title.y = element_blank(), axis.text.y = element_blank())+
xlab("Hours after sunrise")+
# theme(legend.position = "none")+
ggtitle("f) Blautia")+
theme(plot.title = element_text(size=12))+
geom_hline(yintercept = 0, linetype = "dashed")+
labs(col='Age category')+
guides(fill = FALSE)
ruminococcus_age<-subset(confint_summary_age, Taxa == "[Ruminococcus] torques group")
ruminococcus<-ggplot(ruminococcus_age, aes(x = Time, y = est, col = AgeCat))+
geom_line(size = 1.5)+
geom_ribbon(aes(ymin=lower, ymax=upper, fill = AgeCat), linetype=2, alpha=0.1)+
coord_cartesian(ylim = c(-2,1.5))+
scale_color_manual(values = c( "blue", "darkgrey", "red"))+
scale_fill_manual(values = c( "blue", "darkgrey", "red"))+
theme_bw(base_size = 14)+
theme(axis.title.y = element_blank(), axis.text.y = element_blank())+
xlab("Hours after sunrise")+
# theme(legend.position = "none")+
ggtitle("f) Ruminococcus")+
theme(plot.title = element_text(size=12))+
geom_hline(yintercept = 0, linetype = "dashed")+
labs(col='Age category')+
guides(fill = FALSE)
paeniclostridium_age<-subset(confint_summary_age, Taxa == "Paeniclostridium")
paeniclostridium<-ggplot(paeniclostridium_age, aes(x = Time, y = est, col = AgeCat))+
geom_line(size = 1.5)+
geom_ribbon(aes(ymin=lower, ymax=upper, fill = AgeCat), linetype=2, alpha=0.1)+
coord_cartesian(ylim = c(-2,1.5))+
scale_color_manual(values = c( "blue", "darkgrey", "red"))+
scale_fill_manual(values = c( "blue", "darkgrey", "red"))+
theme_bw(base_size = 14)+
theme(axis.title.y = element_blank(), axis.text.y = element_blank())+
xlab("")+
# theme(legend.position = "none")+
ggtitle("f) Paeniclostridium")+
theme(plot.title = element_text(size=12))+
geom_hline(yintercept = 0, linetype = "dashed")+
labs(col='Age category')+
guides(fill = FALSE)
cellulomonas_age<-subset(confint_summary_age, Taxa == "Cellulomonas")
cellulomonas<-ggplot(cellulomonas_age, aes(x = Time, y = est, col = AgeCat))+
geom_line(size = 1.5)+
geom_ribbon(aes(ymin=lower, ymax=upper, fill = AgeCat), linetype=2, alpha=0.1)+
coord_cartesian(ylim = c(-2,1.5))+
scale_color_manual(values = c( "blue", "darkgrey", "red"))+
scale_fill_manual(values = c( "blue", "darkgrey", "red"))+
theme_bw(base_size = 14)+
theme(axis.title.y = element_blank(), axis.text.y = element_blank())+
xlab("")+
# theme(legend.position = "none")+
ggtitle("g) Cellulomonas")+
theme(plot.title = element_text(size=12))+
geom_hline(yintercept = 0, linetype = "dashed")+
labs(col='Age category')+
guides(fill = FALSE)
raoultibacter_age<-subset(confint_summary_age, Taxa == "Raoultibacter")
raoultibacter<-ggplot(raoultibacter_age, aes(x = Time, y = est, col = AgeCat))+
geom_line(size = 1.5)+
geom_ribbon(aes(ymin=lower, ymax=upper, fill = AgeCat), linetype=2, alpha=0.1)+
coord_cartesian(ylim = c(-2,1.5))+
scale_color_manual(values = c( "blue", "darkgrey", "red"))+
scale_fill_manual(values = c( "blue", "darkgrey", "red"))+
theme_bw(base_size = 14)+
theme(axis.title.y = element_blank(), axis.text.y = element_blank())+
xlab("")+
# theme(legend.position = "none")+
ggtitle("h) Raoultibacter")+
theme(plot.title = element_text(size=12))+
geom_hline(yintercept = 0, linetype = "dashed")+
labs(col='Age category')+
guides(fill = FALSE)
Fig6def<-ggarrange(clostridium, bacteroides,paeniclostridium, cellulomonas, raoultibacter, ncol = 5, common.legend = TRUE, legend = "right", widths = c(1.2,1,1,1,1))+
theme(plot.margin = unit(c(0.2,3.5,0.2,0.2), "cm"))
ggarrange(Fig6top, Fig6def, ncol = 1, heights = c(1.5,1))
#ggsave("FIGURES/Figure6.pdf", width = 13, height = 8, units = "in")
This chunk took 0.09 minutes
meerkat_replicates<-readRDS("technical_replicate_data_phyloseq.RDS")
meerkat_replicates
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 3739 taxa and 32 samples ]:
## sample_data() Sample Data: [ 32 samples by 6 sample variables ]:
## tax_table() Taxonomy Table: [ 3739 taxa by 7 taxonomic ranks ]:
## phy_tree() Phylogenetic Tree: [ 3739 tips and 3738 internal nodes ]:
## taxa are rows
### add total abundance of reads to dataframe
sample_data(meerkat_replicates)$TotalAbundance<-sample_sums(meerkat_replicates)
# estimate alpha diversity
sample_data(meerkat_replicates)$Observed<-phyloseq::estimate_richness(meerkat_replicates, measures="Observed")
sample_data(meerkat_replicates)$Observed<-sample_data(meerkat_replicates)$Observed$Observed
sample_data(meerkat_replicates)$Shannon<-phyloseq::estimate_richness(meerkat_replicates, measures="Shannon")
sample_data(meerkat_replicates)$Shannon<-sample_data(meerkat_replicates)$Shannon$Shannon
####### beta diversity
# weighted unifrac
ord_wunifrac_mds<-ordinate(meerkat_replicates, "MDS", "wunifrac")
wunifrac_dist<-distance(meerkat_replicates, method = "wunifrac")
sample_data(meerkat_replicates)$MDS1_WU<-data.frame(ord_wunifrac_mds$vectors)$Axis.1
sample_data(meerkat_replicates)$MDS2_WU<-data.frame(ord_wunifrac_mds$vectors)$Axis.2
# unweighted
ord_uwunifrac_mds<-ordinate(meerkat_replicates, "MDS", "unifrac")
unifrac_dist<-distance(meerkat_replicates, method = "unifrac")
sample_data(meerkat_replicates)$MDS1_UU<-data.frame(ord_uwunifrac_mds$vectors)$Axis.1
sample_data(meerkat_replicates)$MDS2_UU<-data.frame(ord_uwunifrac_mds$vectors)$Axis.2
## jaccard
ord_jaccard_mds<-ordinate(meerkat_replicates, "MDS", "jaccard")
jaccard_dist<-distance(meerkat_replicates, method = "jaccard")
sample_data(meerkat_replicates)$MDS1_JC<-data.frame(ord_jaccard_mds$vectors)$Axis.1
sample_data(meerkat_replicates)$MDS2_JC<-data.frame(ord_jaccard_mds$vectors)$Axis.2
## morisita
ord_morisita_mds<-ordinate(meerkat_replicates, "MDS", "morisita")
morisita_dist<-distance(meerkat_replicates, method = "morisita")
sample_data(meerkat_replicates)$MDS1_Mor<-data.frame(ord_morisita_mds$vectors)$Axis.1
sample_data(meerkat_replicates)$MDS2_Mor<-data.frame(ord_morisita_mds$vectors)$Axis.2
replicates_df<-data.frame(sample_data(meerkat_replicates))
load_model_re<-lmer(TotalAbundance~1+(1|SampleNo), data = replicates_df)
observed_model_re<-lmer(Observed~1 + (1|SampleNo), data = replicates_df)
shannon_model_re<-lmer(Shannon~1 + (1|SampleNo), data = replicates_df)
MDS1_WU_re<-lmer(MDS1_WU~1 + (1|SampleNo), data = replicates_df)
MDS1_UU_re<-lmer(MDS1_UU~1 + (1|SampleNo), data = replicates_df)
MDS1_JC_re<-lmer(MDS2_JC~1 + (1|SampleNo), data = replicates_df)
MDS1_Mor_re<-lmer(MDS1_Mor~1 + (1|SampleNo), data = replicates_df)
## calculate contribution of sample on all diversity measures
performance::icc(load_model_re)
## # Intraclass Correlation Coefficient
##
## Adjusted ICC: 0.904
## Conditional ICC: 0.904
performance::icc(observed_model_re)
## # Intraclass Correlation Coefficient
##
## Adjusted ICC: 0.983
## Conditional ICC: 0.983
performance::icc(shannon_model_re)
## # Intraclass Correlation Coefficient
##
## Adjusted ICC: 0.971
## Conditional ICC: 0.971
performance::icc(MDS1_WU_re)
## # Intraclass Correlation Coefficient
##
## Adjusted ICC: 0.996
## Conditional ICC: 0.996
performance::icc(MDS1_UU_re)
## # Intraclass Correlation Coefficient
##
## Adjusted ICC: 0.988
## Conditional ICC: 0.988
performance::icc(MDS1_JC_re)
## # Intraclass Correlation Coefficient
##
## Adjusted ICC: 0.989
## Conditional ICC: 0.989
performance::icc(MDS1_Mor_re)
## # Intraclass Correlation Coefficient
##
## Adjusted ICC: 0.992
## Conditional ICC: 0.992
adonis_weighted<-vegan::adonis(wunifrac_dist ~ SampleNo, data = replicates_df)
adonis_unweighted<-vegan::adonis(unifrac_dist ~ SampleNo, data = replicates_df)
adonis_jaccard<-vegan::adonis(jaccard_dist ~ SampleNo, data = replicates_df)
adonis_morisita<-vegan::adonis(morisita_dist ~ SampleNo, data = replicates_df)
adonis_weighted
##
## Call:
## vegan::adonis(formula = wunifrac_dist ~ SampleNo, data = replicates_df)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## SampleNo 15 2.14509 0.143006 38.282 0.97289 0.001 ***
## Residuals 16 0.05977 0.003736 0.02711
## Total 31 2.20486 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis_unweighted
##
## Call:
## vegan::adonis(formula = unifrac_dist ~ SampleNo, data = replicates_df)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## SampleNo 15 6.5459 0.43639 4.4011 0.80492 0.001 ***
## Residuals 16 1.5865 0.09915 0.19508
## Total 31 8.1324 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis_jaccard
##
## Call:
## vegan::adonis(formula = jaccard_dist ~ SampleNo, data = replicates_df)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## SampleNo 15 11.6524 0.77683 8.9632 0.89365 0.001 ***
## Residuals 16 1.3867 0.08667 0.10635
## Total 31 13.0391 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis_morisita
##
## Call:
## vegan::adonis(formula = morisita_dist ~ SampleNo, data = replicates_df)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## SampleNo 15 9.6782 0.64521 366.58 0.9971 0.001 ***
## Residuals 16 0.0282 0.00176 0.0029
## Total 31 9.7064 1.0000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################### plot #########################
####################### plot #########################
####################### plot #########################
### colours
mypal = pal_lancet("lanonc", alpha = 1)(7)
mypal5<- pal_rickandmorty("schwifty")(8)
mypal6<-brewer.pal(8,"BrBG")[-1]
mypalx<-c(mypal, mypal5, mypal6)
rep_load<-ggplot(sample_data(meerkat_replicates), aes(x = fct_reorder(SampleNo, TotalAbundance, .fun = "median"),
y = TotalAbundance, group = SampleNo, shape = Storage))+
geom_line(aes(col = SampleNo), size = 2)+
geom_jitter( width = 0, height = 0.01, size = 4, aes(fill = SampleNo), col = "black")+
ylab("Bacterial load estimate")+
xlab("Sample")+
theme_bw(base_size = 14)+
scale_fill_manual(values = mypalx)+
scale_color_manual(values = mypalx)+
theme(axis.text.x = element_blank())+
scale_shape_manual(values = c(21,24))+
guides(fill = guide_legend(override.aes=list(shape=21)))
#### alpha diversity
rep_observed<-ggplot(sample_data(meerkat_replicates), aes(x = fct_reorder(SampleNo, Observed, .fun = "median"),
y = Observed, group = SampleNo, shape = Storage))+
geom_line(aes(col = SampleNo), size = 2)+
geom_jitter(width = 0, height = 0.01, size = 4, aes(fill = SampleNo), col = "black")+
ylab("Observed richness")+
xlab("Sample")+
theme_bw(base_size = 14)+
scale_fill_manual(values = mypalx)+
scale_color_manual(values = mypalx)+
theme(axis.text.x = element_blank())+
scale_shape_manual(values = c(21,24))+
guides(fill = guide_legend(override.aes=list(shape=21)))
rep_shannon<-ggplot(sample_data(meerkat_replicates), aes(x = fct_reorder(SampleNo, Shannon, .fun = "median"),
y = Shannon, group = SampleNo, shape = Storage))+
geom_line(aes(col = SampleNo), size = 2)+
geom_jitter( width = 0, height = 0.01, size = 4, aes(fill = SampleNo), col = "black")+
ylab("Shannon index")+
xlab("Sample")+
theme_bw(base_size = 14)+
scale_fill_manual(values = mypalx)+
scale_color_manual(values = mypalx)+
theme(axis.text.x = element_blank())+
scale_shape_manual(values = c(21,24))+
guides(fill = guide_legend(override.aes=list(shape=21)))
### axis
rep_MDS1_WU<-ggplot(sample_data(meerkat_replicates), aes(x = fct_reorder(SampleNo, MDS1_WU, .fun = "median"),
y = MDS1_WU, group = SampleNo, shape = Storage))+
geom_line(aes(col = SampleNo), size = 2)+
geom_jitter( width = 0, height = 0.01, size = 4, aes(fill = SampleNo), col = "black")+
ylab("Axis 1 Weighted Unifrac")+
xlab("Sample")+
theme_bw(base_size = 14)+
scale_fill_manual(values = mypalx)+
scale_color_manual(values = mypalx)+
theme(axis.text.x = element_blank())+
scale_shape_manual(values = c(21,24))+
guides(fill = guide_legend(override.aes=list(shape=21)))
rep_MDS1_UU<-ggplot(sample_data(meerkat_replicates), aes(x = fct_reorder(SampleNo, MDS1_UU, .fun = "median"),
y = MDS1_UU, group = SampleNo, shape = Storage))+
geom_line(aes(col = SampleNo), size = 2)+
geom_jitter( width = 0, height = 0.01, size = 4, aes(fill = SampleNo), col = "black")+
ylab("Axis 1 Unweighted Unifrac")+
xlab("Sample")+
theme_bw(base_size = 14)+
scale_fill_manual(values = mypalx)+
scale_color_manual(values = mypalx)+
theme(axis.text.x = element_blank())+
scale_shape_manual(values = c(21,24))+
guides(fill = guide_legend(override.aes=list(shape=21)))
rep_MDS1_JC<-ggplot(sample_data(meerkat_replicates), aes(x = fct_reorder(SampleNo, MDS1_JC, .fun = "median"),
y = MDS1_JC, group = SampleNo, shape = Storage))+
geom_line(aes(col = SampleNo), size = 2)+
geom_jitter( width = 0, height = 0.01, size = 4, aes(fill = SampleNo), col = "black")+
ylab("Axis 1 Jaccard")+
xlab("Sample")+
theme_bw(base_size = 14)+
scale_fill_manual(values = mypalx)+
scale_color_manual(values = mypalx)+
theme(axis.text.x = element_blank())+
scale_shape_manual(values = c(21,24))+
guides(fill = guide_legend(override.aes=list(shape=21)))
rep_MDS1_Mor<-ggplot(sample_data(meerkat_replicates), aes(x = fct_reorder(SampleNo, MDS1_Mor, .fun = "median"),
y = MDS1_Mor, group = SampleNo, shape = Storage))+
geom_line(aes(col = SampleNo), size = 2)+
geom_jitter( width = 0, height = 0.01, size = 4, aes(fill = SampleNo), col = "black")+
ylab("Axis 1 Morisita")+
xlab("Sample")+
theme_bw(base_size = 14)+
scale_fill_manual(values = mypalx)+
scale_color_manual(values = mypalx)+
theme(axis.text.x = element_blank())+
scale_shape_manual(values = c(21,24))+
guides(fill = guide_legend(override.aes=list(shape=21)))
ggarrange(rep_load, rep_observed,
rep_MDS1_JC,
rep_MDS1_Mor, rep_MDS1_UU,
rep_MDS1_WU,
# rep_weighted, rep_unweighted, rep_jaccard,
nrow = 2, ncol = 3,
common.legend = T, legend = "right",
labels = c("a)", "b)", "c)", "d)", "e)", "f)"),
font.label = list(size = 20))
###### ordination plots
rep_weighted<-ggplot(sample_data(meerkat_replicates), aes(x = MDS1_WU, y = MDS2_WU, group = SampleNo, shape = Storage))+
geom_line(aes(col = SampleNo), size = 2)+
geom_point(aes(fill = SampleNo), size = 4)+
theme_bw(base_size = 14)+
scale_fill_manual(values = mypalx)+
scale_color_manual(values = mypalx)+
scale_shape_manual(values = c(21,24))+
guides(fill = guide_legend(override.aes=list(shape=21)))+
xlab("Axis 1 WU (36%)")+
ylab("Axis 2 WU (13%)")
# unweighted
rep_unweighted<-ggplot(sample_data(meerkat_replicates), aes(x = MDS1_UU, y = MDS2_UU, group = SampleNo, shape = Storage))+
geom_line(aes(col = SampleNo), size = 2)+
geom_point(aes(fill = SampleNo), size = 4)+
theme_bw(base_size = 14)+
scale_fill_manual(values = mypalx)+
scale_color_manual(values = mypalx)+
scale_shape_manual(values = c(21,24))+
guides(fill = guide_legend(override.aes=list(shape=21)))+
xlab("Axis 1 UU (19%)")+
ylab("Axis 2 UU (11%)")
## jaccard
ord_jaccard_mds$values$Relative_eig[1:4]
## [1] 0.14039363 0.09086992 0.08670688 0.07489021
rep_jaccard<-ggplot(sample_data(meerkat_replicates), aes(x = MDS1_JC, y = MDS2_JC, group = SampleNo, shape = Storage))+
geom_line(aes(col = SampleNo), size = 2)+
geom_point(aes(fill = SampleNo), size = 4)+
theme_bw(base_size = 14)+
scale_fill_manual(values = mypalx)+
scale_color_manual(values = mypalx)+
scale_shape_manual(values = c(21,24))+
guides(fill = guide_legend(override.aes=list(shape=21)))+
xlab("Axis 1 JC (14%)")+
ylab("Axis 2 JC (9%)")
## morisita
ord_morisita_mds$values$Relative_eig[1:4]
## [1] 0.2374819 0.1885098 0.1392811 0.1335871
rep_morisita<-ggplot(sample_data(meerkat_replicates), aes(x = MDS1_Mor, y = MDS2_Mor, group = SampleNo, shape = Storage))+
geom_line(aes(col = SampleNo), size = 2)+
geom_point(aes(fill = SampleNo), size = 4)+
theme_bw(base_size = 14)+
scale_fill_manual(values = mypalx)+
scale_color_manual(values = mypalx)+
scale_shape_manual(values = c(21,24))+
guides(fill = guide_legend(override.aes=list(shape=21)))+
xlab("Axis 1 Morisita (24%)")+
ylab("Axis 2 Morista (19%)")
ggarrange(rep_jaccard, rep_morisita, rep_unweighted, rep_weighted, ncol = 2, nrow = 2, common.legend = T, legend = "right")
This chunk took 0.19 minutes
## pilot study
pilot<- readRDS("pilot_study_data_phyloseq.RDS")
pilot
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 899 taxa and 18 samples ]:
## sample_data() Sample Data: [ 18 samples by 3 sample variables ]:
## tax_table() Taxonomy Table: [ 899 taxa by 7 taxonomic ranks ]:
## phy_tree() Phylogenetic Tree: [ 899 tips and 898 internal nodes ]:
## taxa are rows
pilot_rarefied<-rarefy_even_depth(pilot, sample.size = min(sample_sums(pilot)), rngsee = 100, replace = TRUE, trimOTUs=TRUE,verbose=TRUE)
core_microbiome<-speedyseq::tax_glom(pilot_rarefied, taxrank = "Family")
core_microbiome<-microbiome::core(core_microbiome, detection = 500, prevalence = 0.6)
speedyseq::plot_bar(core_microbiome, fill = "Family")+scale_fill_manual(values = mypaly)+facet_wrap(~Storage, scales = "free")
### beta diversity
ord_wunifrac_mds<-ordinate(pilot_rarefied, "MDS", "wunifrac")
ord_unifrac_mds<-ordinate(pilot_rarefied, "MDS", "unifrac")
dist_wunifrac<-distance(pilot_rarefied, method="wunifrac")
dist_unifrac<-distance(pilot_rarefied, method="unifrac")
sample_data(pilot_rarefied)$MDS1<-data.frame(ord_wunifrac_mds$vectors)$Axis.1
sample_data(pilot_rarefied)$MDS2<-data.frame(ord_wunifrac_mds$vectors)$Axis.2
sample_data(pilot_rarefied)$MDS3<-data.frame(ord_wunifrac_mds$vectors)$Axis.3
sample_data(pilot_rarefied)$MDS1_UU<-data.frame(ord_unifrac_mds$vectors)$Axis.1
sample_data(pilot_rarefied)$MDS2_UU<-data.frame(ord_unifrac_mds$vectors)$Axis.2
ggplot(sample_data(pilot_rarefied), aes(x = MDS1, y = MDS2, group = IndividID, fill = Storage))+
geom_line(size = 2)+
geom_point( size = 4, pch = 21)+
theme_bw(base_size = 14)+
scale_fill_manual(values = mypalx)+
scale_color_manual(values = mypalx)
ggplot(sample_data(pilot_rarefied), aes(x = MDS1_UU, y = MDS2_UU, group = IndividID, fill = Storage))+
geom_line(size = 2)+
geom_point( size = 4, pch = 21)+
theme_bw(base_size = 14)+
scale_fill_manual(values = mypalx)+
scale_color_manual(values = mypalx)
metadata<-data.frame(sample_data(pilot_rarefied))
adonis2(dist_wunifrac~
Storage+
IndividID,
data = metadata,
by = "margin")
adonis2(dist_unifrac~
Storage+
IndividID,
data = metadata,
by = "margin")
This chunk took 0.04 minutes
end_time<-Sys.time()
round((end_time - start_time), 3)
## Time difference of 12.483 hours
This chunk took 0 minutes
Thanks to those who made their data and code open access for myself and others to copy and build upon. Also thanks to the many blog authors for providing explanations of code and stats that helped to build this analysis. Special thanks to Gavin Simpson for his blogs on GAMs, gratia and vegan (https://fromthebottomoftheheap.net/), which I found especially useful throughout this analysis.
If you are using this data and/or code please pay it forward and put your own data and code online in a format that is easily reproducible. People who use the line ‘data/code available on reasonable request’ are takers - don’t be that person. I am a researcher that supports equal opportunities for everyone, and stand in solidarity with those negatively affected by political systems and power structures that promote sexist, racist, colonial, and homophobic attitudes.