READ ME

This code processes the dataset ‘processed_data_phyloseq.RDS’, which is a phyloseq object that can be downloaded from https://zenodo.org/deposit/5337076. This dataset contains 16S microbiome data from 1109 meerkat faecal samples. The dataset is not normalised, but lab and sand contaminants have been filtered out, and only samples used in the final analysis are included. Original data contained an internal standard of Allobacillus and Imtechella - these have already been removed. Samples with low sequencing depth and any replicates have been removed. The code contains general data exploration and model validation that are additional to those presented in the paper. At the end it also analyzes the technical replicate data (‘technical_replicate_data_phyloseq.RDS’) and the pilot study data on captive meerkats (‘pilot_study_data_phyloseq.RDS’). Note that this code takes several hours to run (~12 hours). To do a test run on a reduced dataset, there is the option to run a chunk after data import that generates a small but representative dataset, which takes about an hour to process, although some of the sensitivity analyses will throw up an error for this dataset due to small sample sizes. This Rmarkdown script allows errors to occur without halting the report. To turn this option off, remove “error = TRUE” from the global options. Each code chunk ends with elapsed time, and the total time taken to run the script is as the end of the document. For details on the study population and study design, please refer to the manuscript.

Manuscript information

Diurnal oscillations in gut bacterial load and composition eclipse seasonal and lifetime dynamics in wild meerkats, Suricata suricatta

Risely, A.1*, Wilhelm, K.1, Clutton-Brock, T.2, 3, 4, Manser, M.B. 3, 4, 5, and Sommer, S.1

  1. Institute for Evolutionary Ecology and Conservation Genomics, Ulm, Germany
  2. Large Animal Research Group, Department of Zoology, University of Cambridge, Downing Street, CB2 3EJ Cambridge, United Kingdom
  3. University of Pretoria, Mammal Research Institute, Pretoria, ZA
  4. Kalahari Research Trust, Kuruman River Reserve, Northern Cape, ZA
  5. Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich

Corresponding author: Alice Risely, riselya@gmail.com

Abstract

Circadian rhythms in gut microbiota composition are crucial for metabolic function, yet the extent to which they govern microbial dynamics compared to seasonal and lifetime processes remains unknown. Here we investigate gut bacterial dynamics in wild meerkats (Suricata suricatta) over a 20-year period to compare diurnal, seasonal, and lifetime processes in concert, applying ratios of absolute abundance. We found that diurnal oscillations in bacterial load and composition eclipsed seasonal and lifetime dynamics. Diurnal oscillations were characterised by a peak in Clostridium abundance at dawn, were associated with temperature-constrained foraging schedules, and did not decay with age. Some genera exhibited seasonal fluctuations, whilst others developed with age, although we found little support for microbial senescence in very old meerkats. Strong microbial circadian rhythms in this species may reflect the extreme daily temperature fluctuations typical of arid-zone climates. Our findings demonstrate that accounting for circadian rhythms is essential for future gut microbiome research.

R setup

Load packages

# data analysis packages
library(phyloseq)
library(mgcv)
library(gratia)
library(vegan)
library(tidyverse)
library(caret)
library(metagMisc)
library(expss)
library(scales)
library(broom)
library(lme4)
library(performance)

# viz packages
library(ggplot2)
library(magick)
library(RColorBrewer)
library(ggsci)
library(viridis)
library(ggpubr)
library(sjPlot)
library(gridExtra)

memory.limit(1000000) # set large memory limit
## [1] 1e+06

This chunk took 0.18 minutes

Session info

sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252   
## [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
## [5] LC_TIME=English_Australia.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] gridExtra_2.3      sjPlot_2.8.4       ggpubr_0.4.0       viridis_0.5.1     
##  [5] viridisLite_0.3.0  ggsci_2.9          RColorBrewer_1.1-2 magick_2.4.0      
##  [9] performance_0.4.7  lme4_1.1-23        Matrix_1.2-18      broom_0.5.6       
## [13] scales_1.1.1       expss_0.10.2       metagMisc_0.0.4    caret_6.0-86      
## [17] forcats_0.5.0      stringr_1.4.0      dplyr_1.0.0        purrr_0.3.4       
## [21] readr_1.3.1        tidyr_1.1.0        tibble_3.0.1       ggplot2_3.3.2     
## [25] tidyverse_1.3.0    vegan_2.5-6        lattice_0.20-38    permute_0.9-5     
## [29] gratia_0.4.1       mgcv_1.8-31        nlme_3.1-143       phyloseq_1.30.0   
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1         backports_1.1.7      plyr_1.8.6          
##   [4] igraph_1.2.5         splines_3.6.2        TH.data_1.0-10      
##   [7] digest_0.6.27        foreach_1.5.0        htmltools_0.5.1.1   
##  [10] fansi_0.4.1          magrittr_2.0.1       checkmate_2.0.0     
##  [13] cluster_2.1.0        openxlsx_4.1.5       recipes_0.1.13      
##  [16] Biostrings_2.54.0    modelr_0.1.8         gower_0.2.2         
##  [19] matrixStats_0.56.0   sandwich_2.5-1       colorspace_1.4-1    
##  [22] blob_1.2.1           rvest_0.3.5          haven_2.3.1         
##  [25] xfun_0.22            crayon_1.3.4         jsonlite_1.7.0      
##  [28] zoo_1.8-8            survival_3.2-7       iterators_1.0.12    
##  [31] ape_5.4              glue_1.4.1           gtable_0.3.0        
##  [34] emmeans_1.4.8        ipred_0.9-9          zlibbioc_1.32.0     
##  [37] XVector_0.26.0       sjstats_0.18.0       sjmisc_2.8.5        
##  [40] car_3.0-8            Rhdf5lib_1.8.0       BiocGenerics_0.32.0 
##  [43] abind_1.4-5          mvtnorm_1.1-1        DBI_1.1.0           
##  [46] ggeffects_1.1.0      rstatix_0.6.0        Rcpp_1.0.4.6        
##  [49] xtable_1.8-4         htmlTable_2.0.0      foreign_0.8-74      
##  [52] stats4_3.6.2         lava_1.6.7           prodlim_2019.11.13  
##  [55] htmlwidgets_1.5.1    httr_1.4.1           ellipsis_0.3.1      
##  [58] pkgconfig_2.0.3      nnet_7.3-12          dbplyr_1.4.4        
##  [61] tidyselect_1.1.0     rlang_0.4.11         reshape2_1.4.4      
##  [64] effectsize_0.3.1     munsell_0.5.0        cellranger_1.1.0    
##  [67] tools_3.6.2          cli_2.0.2            generics_0.0.2      
##  [70] sjlabelled_1.1.6     ade4_1.7-15          evaluate_0.14       
##  [73] biomformat_1.14.0    yaml_2.2.1           ModelMetrics_1.2.2.2
##  [76] knitr_1.33           fs_1.4.1             zip_2.0.4           
##  [79] mvnfast_0.2.5.1      xml2_1.3.2           compiler_3.6.2      
##  [82] rstudioapi_0.11      curl_4.3             ggsignif_0.6.0      
##  [85] reprex_0.3.0         statmod_1.4.34       stringi_1.4.6       
##  [88] parameters_0.8.0     nloptr_1.2.2.1       multtest_2.42.0     
##  [91] vctrs_0.3.1          pillar_1.4.4         lifecycle_0.2.0     
##  [94] estimability_1.3     data.table_1.12.8    cowplot_1.1.0       
##  [97] insight_0.13.1       patchwork_1.1.0.9000 R6_2.4.1            
## [100] rio_0.5.16           IRanges_2.20.2       codetools_0.2-16    
## [103] boot_1.3-24          MASS_7.3-54          assertthat_0.2.1    
## [106] rhdf5_2.30.1         withr_2.2.0          multcomp_1.4-13     
## [109] S4Vectors_0.24.4     bayestestR_0.8.2     parallel_3.6.2      
## [112] hms_0.5.3            grid_3.6.2           rpart_4.1-15        
## [115] timeDate_3043.102    coda_0.19-3          class_7.3-15        
## [118] minqa_1.2.4          rmarkdown_2.8        carData_3.0-4       
## [121] pROC_1.16.2          Biobase_2.46.0       lubridate_1.7.9
start_time<-Sys.time()

This chunk took 0 minutes

Data import and normalisation

Import data and scale to internal standard

  • Data is in the format of a phyloseq object.
  • The scaling factor per sample is included as a metadata column called ‘scaling factor’. This is used to normalise the data.
  • Varables are defined below.
meerkat_final<-readRDS("processed_data_phyloseq.RDS") # import data as phyloseq object

meerkat_final #data summary 
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 26122 taxa and 1109 samples ]
## sample_data() Sample Data:       [ 1109 samples by 26 sample variables ]
## tax_table()   Taxonomy Table:    [ 26122 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 26122 tips and 26121 internal nodes ]
## scale to internal standard
asv_table<-data.frame(meerkat_final@otu_table@.Data)
names(asv_table)<-sample_data(meerkat_final)$feature.id
meerkat_normalized<-sweep(asv_table, MARGIN=2, sample_data(meerkat_final)$scaling_factor, `*`)
meerkat_normalized<-round(meerkat_normalized, digits = 0)
meerkat_normalized<-otu_table(meerkat_normalized, taxa_are_rows = TRUE)
meerkat_scaled<-meerkat_final
otu_table(meerkat_scaled)<-meerkat_normalized

### add total abundance of reads to dataframe
sample_data(meerkat_scaled)$TotalAbundance<-sample_sums(meerkat_scaled)

paste("Number of individuals sampled:",length(unique(sample_data(meerkat_scaled)$IndividID)))
## [1] "Number of individuals sampled: 235"
# 1109 samples from 235 meerkats

## how many samples per individual?

summary(data.frame(table(sample_data(meerkat_scaled)$IndividID))$Freq)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   4.000   4.719   6.000  14.000

This chunk took 0.2 minutes

Optional test dataset

To speed up analysis to see if the script works, run this code which generates a smaller dataset (~300) samples) to work with for testing (change “eval = FALSE” to “eval = TRUE” option at the start of this chunk to use this dataset). This test dataset needs to be quite specific for models to work - the samples need to be distributed across limited numbers of individuals and social groups so random effects do not break models. Some validation code might not work with this reduced dataset, therefore I have allowed errors so that these do not halt the script.

#not run

samplecount<-data.frame(sample_data(meerkat_scaled)) %>% 
  group_by(IndividID) %>% 
  summarise(NoSamples = n())

samplecount<-samplecount%>%arrange(IndividID, -NoSamples)

sample_data(meerkat_scaled)$SampleCount<-vlookup(sample_data(meerkat_scaled)$IndividID, samplecount, lookup_column = "IndividID", result_column = "NoSamples")


test_data<-subset_samples(meerkat_scaled, SampleCount >5)
samplenames<-sample_names(test_data)
subsamples<-sample(samplenames, 300) # choose 150

# also need some samples from sequencing run 4 to that some of the code works
run4<-subset_samples(meerkat_scaled, Seq_run == "RUN4")
samplenames_run4<-sample_names(run4)
subsamples_final<-c(subsamples, samplenames_run4)

test_data<-prune_samples(subsamples_final, meerkat_scaled)
test_data<-microbiome::core(test_data, detection = 10, prevalence =0.01)

# remove rare social groups
majorgroups<-data.frame(table(sample_data(test_data)$GroupAtSampling))
majorgroups<-majorgroups[order(-majorgroups$Freq),]
majorgroups<-subset(majorgroups, Freq >10)
majorgroups<-as.character(majorgroups$Var1)
sample_data(test_data)$Test2<-sample_data(test_data)$GroupAtSampling %in% majorgroups

test_data<-subset_samples(test_data, Test2==TRUE) 

meerkat_scaled<-test_data

This chunk took 0 minutes

Description of sample metadata

str(data.frame(sample_data(meerkat_final)))
## 'data.frame':    1109 obs. of  26 variables:
##  $ IndividID              : Factor w/ 235 levels "ID 1","ID 10",..: 202 202 170 228 1 165 202 170 168 214 ...
##  $ feature.id             : Factor w/ 1109 levels "Feature1","Feature10",..: 1 379 587 793 897 2 192 201 212 222 ...
##  $ SampleNo               : Factor w/ 1109 levels "Sample1","Sample10",..: 1 251 359 467 572 680 789 893 1002 2 ...
##  $ Storage                : Factor w/ 2 levels "FREEZEDRIED",..: 2 1 1 1 2 1 1 2 2 2 ...
##  $ Seq_run                : Factor w/ 4 levels "RUN1","RUN2",..: 1 1 1 4 1 1 1 1 1 1 ...
##  $ Seq_depth_nospikein    : num  26904 40273 30469 31224 31869 ...
##  $ HoursTilFrozen         : num  3.961 1.976 2.611 0.504 3.751 ...
##  $ scaling_factor         : num  4.769 2.809 5.245 0.063 1.846 ...
##  $ SampleDate             : Date, format: "1997-04-18" "2002-01-06" ...
##  $ SampleTime             : POSIXct, format: "1899-12-30 08:29:00" "1899-12-30 19:05:00" ...
##  $ SampleTimeCat          : Factor w/ 2 levels "AFTERNOON","MORNING": 2 1 2 2 2 2 1 2 2 1 ...
##  $ HoursAfterSunrise      : num  1.55 13.23 2.81 5.21 2.24 ...
##  $ month                  : num  4 1 4 5 8 8 9 9 9 10 ...
##  $ Season                 : Factor w/ 2 levels "DRY WINTER","WET SUMMER": 2 2 2 1 1 1 1 1 1 2 ...
##  $ Year                   : num  1997 2002 2002 2002 2002 ...
##  $ AgeAtSamplingYears     : num  0.126 4.849 3.521 2.173 6.633 ...
##  $ AgeCat                 : Factor w/ 3 levels "MIDDLEAGED","OLD",..: 3 1 1 1 2 1 2 1 1 1 ...
##  $ Sex                    : Factor w/ 2 levels "F","M": 1 1 2 1 1 2 1 2 2 2 ...
##  $ GroupAtSampling        : Factor w/ 42 levels "Group 1","Group 10",..: 30 19 30 31 2 35 19 30 40 33 ...
##  $ Condition_weight       : num  -101.94 -52.34 6.77 -4.15 -61.91 ...
##  $ SocialStatus           : Factor w/ 2 levels "Dominant","Subordinate": 2 2 1 1 2 2 2 1 2 2 ...
##  $ HoursSinceForagingStart: num  0 0.629 1.298 3.504 0.278 ...
##  $ Temp_max               : num  32.7 37.7 34.3 29 23.1 26.6 29.8 26.2 34 21.1 ...
##  $ Temp_min               : num  15.2 25.3 16.2 5.3 -0.4 7.5 11.6 1.1 10.4 7.3 ...
##  $ HourlyTemp             : num  16.7 34.4 24 25.7 12.5 23.6 29 20.2 27.7 19.7 ...
##  $ sum_rainfall_month     : num  25.4 13.2 19.2 18.4 0.2 0.2 0.2 0.2 23.8 23.8 ...

This chunk took 0 minutes

VARIABLE DEFINITIONS

[1] IndividID = Meerkat ID

[2] feature.id = DNA sample ID

[3] SampleNo = faecal sample ID

[4] Storage = Sample storage (freezedried or frozen)

[5] Seq_run = Sequencing run

[6] Seq_depth_nospikein = Sequencing depth after internal standard removed

[7] HoursTilFrozen = Number of hours sample was in field before frozen

[8] scaling_factor = Scaling factor to scale sample to internal standard

[9] SampleDate = Date sample was collected

[10] SampleTime = Time sample was collected

[11] SampleTimeCat = Categorical time variable (morning/afternoon)

[12] HoursAfterSunrise = Number of hours after sunrise at sample collection

[13] month = the month the sample was collected

[14] Season = Wet or dry season

[15] Year = Year sample was collected

[16] AgeAtSamplingYears = Meerkat age in years at sample collection

[17] AgeCat = Categorical age variable (young/adult/old)

[18] Sex = Meerkat sex

[19] GroupAtSampling = Social group at sampling

[20] Condition_weight = Residual body mass from GAM (mass ~ age)

[21] SocialStatus = Social status at time of sample collection (dom/sub)

[22] HoursSinceForagingStart = Time meerkat was foraging before sample collection

[23] Temp_max = Maximum temperature on day of sample collection

[24] Temp_min = Minimum temperature on day of sample collection

[25] HourlyTemp = Temperature (to the nearest hour) at sample collection

[26] sum_rainfall_month = Cumulative rainfall in mm over the previous month

Figure 1: Sampling distribution and study design

metadata <- data.frame(sample_data(meerkat_scaled)) #extract metadata as DF

# order meerkat ID my their median sampling date
metadata1 <- metadata %>%
  group_by(IndividID) %>%
  summarize(first = mean(SampleDate))

metadata1 <- metadata1[order(metadata1$first), ]
ID_order <- as.character(metadata1$IndividID)

# order ID by median sampling date
sample_data(meerkat_scaled)$IndividID <-
  factor(sample_data(meerkat_scaled)$IndividID, levels = ID_order) 

This chunk took 0.01 minutes

Figure 1a: Sample timeline

fig1a<-ggplot(data=sample_data(meerkat_scaled),aes(x=SampleDate,y=IndividID,group=IndividID))+
  geom_line(size=0.5,alpha=0.5)+
  geom_point(aes(fill=HoursAfterSunrise),colour="black",pch=21,size=1.5)+
  theme_bw(base_size=14)+
  theme(panel.grid.minor=element_blank(),
        panel.background=element_blank(),axis.line=element_line(colour="black"))+
  theme(axis.text.y=element_blank(),
        axis.ticks.y=element_blank())+
  scale_fill_viridis_c(option="plasma",direction=-1,name="Time of day")+
  scale_x_date(date_breaks = "1 year")+
  theme(axis.text.x = element_text(angle = 45, vjust = 0.9, hjust=1))+
  ylab("Meerkat ID")+
  xlab("")+
  scale_y_discrete(expand=expansion(add=c(0.2,1.5)))+
  theme(legend.position="none")

This chunk took 0 minutes

Figure 1e: Sampling distributions across scales

  • Summarise sampling distribution across the three temporal scales.
hist1<-ggplot(sample_data(meerkat_scaled), aes(x = HoursAfterSunrise))+
  geom_histogram(fill = "grey", col = "black", binwidth = 1, boundary = 0)+
  theme_bw(base_size = 22)+
  xlab("Hours after sunrise")+
  ylab("Frequency")

hist2<-ggplot(sample_data(meerkat_scaled), aes(x = as.factor(month)))+
  geom_histogram(fill = "grey", col = "black", binwidth = 1, stat = "count")+
  theme_bw(base_size = 22)+
  xlab("Month")+
  ylab("Frequency")+
  scale_x_discrete(breaks = c("2","4", "6","8","10","12"))+
  theme(axis.title.y = element_blank())

hist3<-  ggplot(sample_data(meerkat_scaled), aes(x = AgeAtSamplingYears))+
geom_histogram(fill = "grey", col = "black", binwidth = 1, boundary = 0)+
  theme_bw(base_size = 22)+
  xlab("Age (years)")+
  ylab("Frequency")+
  scale_x_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9,10))+
  theme(axis.title.y = element_blank())


Fig1e<- ggarrange(hist1, hist2, hist3, ncol = 3, widths = c(1.2,1,1))
#ggarrange(hist1, hist2, hist3, ncol = 3, widths = c(1.2,1,1))

# save as pdf
#ggsave("FIGURES/fig1e.pdf")

This chunk took 0.01 minutes

Build figure 1

## import Fig1bcd

fig1bcd <- magick::image_read_pdf("C:/Users/risel/Dropbox/Sommer postdoc/Meerkat project/PROJECTS/MeerkatSpikeInData/Analysis updated/TEMPORAL DYNAMICS ANALYSIS/FIGURES/Fig1abc.pdf")

## give it a border

fig1bcd_border <- fig1bcd  %>%
  magick::image_border("black", "15 x 15")

## import fig1e that i saved earlier

fig1e <- magick::image_read_pdf("C:/Users/risel/Dropbox/Sommer postdoc/Meerkat project/PROJECTS/MeerkatSpikeInData/Analysis updated/TEMPORAL DYNAMICS ANALYSIS/FIGURES/fig1e.pdf")

# give border

fig1e_border <- fig1e %>%
  magick::image_border("black", "15 x 15")

# this figure won't wont fully work with reduced datset.

fig1a +annotation_raster(fig1bcd_border, xmin = as.Date("1997-01-01"), xmax = as.Date("2010-01-01"), ymin =  "ID 134", ymax = "ID 231") +
annotation_raster(fig1e_border, xmin = as.Date("2010-01-01"), xmax = as.Date("2020-01-01"), ymin =  "ID 39", ymax = "ID 61") 

# ggsave("FIGURES/Figure_1.pdf", width = 12, height = 6, units = c("in"))

This chunk took 0.11 minutes

Temporal dynamics of bacterial load, alpha and beta diversity

Bacterial load

We model log10-transformed bacterial load (i.e. total 16S copy number per each scaled sample) against time of day, month, and meerkat age using Generalized Additive Mixed Models whilst accounting for methodological variables (storage method, sampling depth, sequencing run, time in field prior to freezing) as fixed linear effects, individual ID and social group as random effects, and including an auto-regression term to account for temporal autocorrelation (Wood 2017). The auto-regression term doesn’t really change models but I included it anyway as a precaution.

Fit GAMM predicting bacterial load

  • Extract metadata as dataframe.
  • Assess data distribution.
  • Scale variables around the mean and log-transform if neccessary.
  • Fit GAMM predicting load.
  • Load is log-transformed because fitting other distributions (Poisson/neg binomial/zero inflated neg binomial) performed worse (although overall results were the same).
metadata<-data.frame(sample_data(meerkat_scaled))

### bacterial load distribution
ggplot(metadata, aes(x = TotalAbundance))+geom_histogram(fill = "grey", col = "black", boundary = 0) + ggtitle("Untransformed")+
ggplot(metadata, aes(x = log10(TotalAbundance)))+geom_histogram(fill = "grey", col = "black", boundary = 0) + ggtitle("Log10-transformed")

This chunk took 0.01 minutes

#scale variables
metadata$Seq_depth_nospikein<-as.numeric(scale(metadata$Seq_depth_nospikein))
metadata$HoursTilFrozen<-as.numeric(scale(metadata$HoursTilFrozen))


## GAMM MODEL

abundance_model <- mgcv::gamm(log10(TotalAbundance)~
    s(HoursAfterSunrise, bs = "cr")+
    s(month, bs = "cc")+
    s(AgeAtSamplingYears, bs = "cr")+
    s(Seq_depth_nospikein, bs = "cr")+
    Storage+
    Seq_run+
    HoursTilFrozen+
    s(IndividID, bs = "re")+
    s(GroupAtSampling, bs = "re"),
  correlation = corARMA(form = ~ 1|Year, p = 3),
  data=metadata,
  family = gaussian)

# GAMM summary

sjPlot::tab_model(abundance_model$gam,  
                  show.stat = TRUE, 
                  show.se = TRUE, 
                  title = "Table S2: GAMM predicting bacterial load. Note that the estimate for smooth terms is actually EDF (estimated degrees of freedom), it is not an 'estimate'. EDF reflects the degree of non-linearity of a curve (1 = linear).",
                  dv.labels = "Bacterial load (log-transformed)",
                   pred.labels = c("Intercept", "Storage (frozen)", "Seq. run (2)", "Seq run (3)", "Seq. run (4)", "Time in field", "Hours after sunrise (non-linear)", "Month (non-linear)", "Age in years (non-linear)", "Seq. depth (non-linear)", "Meerkat ID (random)", "Social group (random)"))
Table S2: GAMM predicting bacterial load. Note that the estimate for smooth terms is actually EDF (estimated degrees of freedom), it is not an ‘estimate’. EDF reflects the degree of non-linearity of a curve (1 = linear).
  Bacterial load (log-transformed)
Predictors Estimates std. Error CI Statistic p
Intercept 4.94 0.04 4.86 – 5.01 125.97 <0.001
Storage (frozen) -0.12 0.04 -0.20 – -0.04 -3.10 0.002
Seq. run (2) -0.00 0.04 -0.08 – 0.08 -0.06 0.955
Seq run (3) -0.02 0.04 -0.11 – 0.06 -0.46 0.642
Seq. run (4) -0.65 0.10 -0.86 – -0.45 -6.26 <0.001
Time in field 0.05 0.03 -0.01 – 0.10 1.74 0.082
Hours after sunrise (non-linear) 4.64 54.36 <0.001
Month (non-linear) 2.06 1.05 0.007
Age in years (non-linear) 1.65 9.10 0.003
Seq. depth (non-linear) 5.11 53.71 <0.001
Meerkat ID (random) 9.64 0.04 0.278
Social group (random) 10.20 0.61 0.001
Observations 1109
R2 0.466
# Temporal effects (without residuals)
gratia::draw(abundance_model$gam, select = c(1:3), ncol = 3, scales = "fixed")

This chunk took 1.62 minutes

Check model

model_check_load<-gratia::appraise(abundance_model$gam)

annotate_figure(model_check_load, top = text_grob("Model check of bacterial load")) # check model

This chunk took 0.03 minutes

Plot estimates for bacterial load

  • Extract confidence intervals per term from GAMM.
  • Generate rug data (for sample distribution layer).
  • Generate partial residuals dataframe per term.
  • Backtransform estimates into bacterial load.
  • Add lines where knots are placed

This chunk took 0 minutes

confint_df_time<-confint(abundance_model$gam, parm = "s(HoursAfterSunrise)",  type = "confidence")
confint_df_season<-confint(abundance_model$gam, parm = "s(month)",  type = "confidence")
confint_df_age<-confint(abundance_model$gam, parm = "s(AgeAtSamplingYears)",  type = "confidence")

confint_df_time$Variable<-"HoursAfterSunrise"
names(confint_df_time)[3]<-"Time"
confint_df_season$Variable<-"month"
names(confint_df_season)[3]<-"Time"
confint_df_age$Variable<-"AgeAtSamplingYears"
names(confint_df_age)[3]<-"Time"

confint_df_load<-rbind(confint_df_time, confint_df_season, confint_df_age)

###########add geom_rug data ##################

rugdata<- data.frame(cbind(metadata$HoursAfterSunrise, metadata$month, metadata$AgeAtSamplingYears))
names(rugdata)<-c("HoursAfterSunrise", "month", "AgeAtSamplingYears")
rugdata<-gather(rugdata, Variable, Time, HoursAfterSunrise:AgeAtSamplingYears, factor_key=TRUE)
rugdata$est = 0
rugdata$Variable<-factor(rugdata$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))

################# add residuals #######

fv <- predict(abundance_model$gam,type="terms") ## get term estimates

prsd_time <- data.frame(as.numeric(residuals(abundance_model$gam,type="working")) + fv[,3])
names(prsd_time)<-"Residuals"
prsd_time$Time<-metadata$HoursAfterSunrise
prsd_time$Variable<-"HoursAfterSunrise"
prsd_time$predicted<-predict(abundance_model$gam,type="response")

prsd_month <-data.frame(as.numeric(residuals(abundance_model$gam,type="working")) + fv[,4])
names(prsd_month)<-"Residuals"
prsd_month$Time<-metadata$month
prsd_month$Variable<-"month"
prsd_month$predicted<-predict(abundance_model$gam,type="response")

prsd_age <- data.frame(as.numeric(residuals(abundance_model$gam,type="working")) + fv[,5])
names(prsd_age)<-"Residuals"
prsd_age$Time<-metadata$AgeAtSamplingYears
prsd_age$Variable<-"AgeAtSamplingYears"
prsd_age$predicted<-predict(abundance_model$gam,type="response")

resids_df<-rbind(prsd_time, prsd_month, prsd_age)
resids_df$est<-resids_df$predicted+resids_df$Residuals

## rearrange level order 
resids_df$Variable<-factor(resids_df$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))
confint_df_load$Variable<-factor(confint_df_load$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))

#### make y axis the real abunance values, not estimates

#10^summary(abundance_model$gam)$p.coeff[1] # mean

confint_df_load$est<-confint_df_load$est+summary(abundance_model$gam)$p.coeff[1] 
confint_df_load$lower<-confint_df_load$lower+summary(abundance_model$gam)$p.coeff[1] 
confint_df_load$upper<-confint_df_load$upper+summary(abundance_model$gam)$p.coeff[1] 

confint_df_load$est<-10^confint_df_load$est
confint_df_load$lower<-10^confint_df_load$lower 
confint_df_load$upper<-10^confint_df_load$upper

resids_df$est<-10^resids_df$est


resids_load<-resids_df

####


P1_resids<-ggplot(subset(confint_df_load, Variable == "HoursAfterSunrise"), aes(x = Time, y = est))+
  theme_light(base_size = 14)+
  geom_point(data = subset(resids_load, Variable =="HoursAfterSunrise"),alpha = 0.5, aes(fill=Time), col = "white", pch = 21, size = 2)+
#  geom_line( size = 3,aes(col = Time))+
  geom_line( size = 1,col = "black", alpha = 0.7)+
  scale_color_viridis_c(option = "plasma", direction= -1)+
  scale_fill_viridis_c(option = "plasma", direction= -1)+
  geom_ribbon(aes(ymin=lower, ymax=upper), linetype=2, alpha=0.3)+
  theme( strip.background = element_blank(),
    strip.text.x = element_blank())+
  ylab("Bacterial abundance")+
  xlab("Hours after sunrise")+

  scale_x_continuous(breaks = int_breaks)+
  theme(legend.position = "none")+
  coord_cartesian(ylim = c(1000,1000000))+
  theme( panel.grid = element_blank(),
    panel.background = element_blank(), axis.line = element_line(colour = "black"))+
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
    labels = trans_format("log10", math_format(10^.x)))+
  geom_vline(xintercept = c(as.numeric(abundance_model$gam$smooth[[1]]$xp)), col = "darkgrey", linetype = "dashed")


P2_resids<-ggplot(subset(confint_df_load, Variable == "month"), aes(x = Time, y = est))+
  theme_light(base_size = 14)+
  geom_jitter(data = subset(resids_load, Variable =="month"),alpha = 0.5, aes(fill=Time), col = "white", pch = 21, size = 2, width = 0.15)+
  
  geom_line( size = 1,col = "black", alpha = 0.7)+
  scale_color_gradientn(colours = c( "seagreen","seagreen","seagreen","seagreen","khaki","khaki","khaki","khaki","khaki","seagreen",  "seagreen"))+
  scale_fill_gradientn(colours = c( "seagreen","seagreen","seagreen","seagreen","khaki","khaki","khaki","khaki","khaki","seagreen",  "seagreen"))+
  geom_ribbon(aes(ymin=lower, ymax=upper), linetype=2, alpha=0.3)+
  theme( strip.background = element_blank(),
    strip.text.x = element_blank())+
  ylab("Model estimate")+
  xlab("Month")+
  coord_cartesian(ylim = c(1000,1000000))+
  theme(axis.title.y = element_blank(), 
    axis.text.y = element_blank())+
  scale_x_continuous(breaks = int_breaks)+
  theme(legend.position = "none")+
  # ylim(-2,1)+
  theme( panel.grid = element_blank(),
    panel.background = element_blank(), axis.line = element_line(colour = "black"))+
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
    labels = trans_format("log10", math_format(10^.x)))+
  geom_vline(xintercept = c(as.numeric(abundance_model$gam$smooth[[2]]$xp)), col = "darkgrey", linetype = "dashed")


P3_resids<-ggplot(subset(confint_df_load, Variable == "AgeAtSamplingYears"), aes(x = Time, y = est))+
  theme_light(base_size = 14)+
  geom_point(data = subset(resids_load, Variable =="AgeAtSamplingYears"),alpha = 0.5, aes(fill=Time), col = "white", pch = 21, size = 2)+
  geom_line( size = 1,col = "black", alpha = 0.7)+
  scale_color_gradientn(colours = c( "blue","blue","darkgray","darkgray","darkgray","red", "red", "red", "red", "red", "red"))+
  scale_fill_gradientn(colours = c( "blue","blue","darkgray","darkgray","darkgray","red", "red", "red", "red", "red", "red"))+
  geom_ribbon(aes(ymin=lower, ymax=upper), linetype=2, alpha=0.3)+
  theme( strip.background = element_blank(),
    strip.text.x = element_blank())+
  ylab("Model estimate")+
  xlab("Age (years)")+
  coord_cartesian(ylim = c(1000,1000000))+
  theme(axis.title.y = element_blank(), 
    axis.text.y = element_blank())+
  scale_x_continuous(breaks = int_breaks)+
  theme(legend.position = "none")+
  # ylim(-2,1)+
  theme( panel.grid = element_blank(),
    panel.background = element_blank(), axis.line = element_line(colour = "black"))+
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
    labels = trans_format("log10", math_format(10^.x)))+
  geom_vline(xintercept = c(as.numeric(abundance_model$gam$smooth[[3]]$xp)), col = "darkgrey", linetype = "dashed")

Fig2abc<- ggarrange(P1_resids, P2_resids, P3_resids, ncol = 3, align = "h", widths = c(1.2,1,1))

This chunk took 0.02 minutes

Partial R^2 of terms

  • Remove variables and examine drop in R^2.
# just methods 

abundance_model_methods <- mgcv::gamm(log10(TotalAbundance)~
    s(Seq_depth_nospikein, bs = "cr")+
    Storage+
    Seq_run+
    HoursTilFrozen,
  correlation = corARMA(form = ~ 1|Year, p = 3),
  data=metadata,
  family = gaussian)

paste("The partial R2 for methods is", summary(abundance_model_methods$gam)$r.sq)
## [1] "The partial R2 for methods is 0.313956729949413"
## no methods
abundance_model_nomethods <- mgcv::gamm(log10(TotalAbundance)~
    s(HoursAfterSunrise, bs = "cr")+
    s(month, bs = "cc")+
    s(AgeAtSamplingYears, bs = "cr")+
   #    s(Seq_depth_nospikein, bs = "cr")+
    #Storage+
  #  Seq_run+
   # HoursTilFrozen,
    s(IndividID, bs = "re")+
   s(GroupAtSampling, bs = "re"),
  correlation = corARMA(form = ~ 1|Year, p = 3),
  data=metadata,
  family = gaussian)



### minus time of day

load_r2_time<-mgcv::gamm(log10(TotalAbundance)~
   # s(HoursAfterSunrise, bs = "cr")+
    s(month, bs = "cc")+
    s(AgeAtSamplingYears, bs = "cr")+
    s(Seq_depth_nospikein, bs = "cr")+
    Storage+
    Seq_run+
    HoursTilFrozen+
    s(IndividID, bs = "re")+
    s(GroupAtSampling, bs = "re"),
  correlation = corARMA(form = ~ 1|Year, p = 3),
  data=metadata,
  family = gaussian)


paste("The partial R2 for time of day is", summary(abundance_model$gam)$r.sq - summary(load_r2_time$gam)$r.sq)
## [1] "The partial R2 for time of day is 0.123427692985502"
# minus month

load_r2_month<-mgcv::gamm(log10(TotalAbundance)~
    s(HoursAfterSunrise, bs = "cr")+
  #  s(month, bs = "cc")+
    s(AgeAtSamplingYears, bs = "cr")+
    s(Seq_depth_nospikein, bs = "cr")+
    Storage+
    Seq_run+
    HoursTilFrozen+
    s(IndividID, bs = "re")+
    s(GroupAtSampling, bs = "re"),
  correlation = corARMA(form = ~ 1|Year, p = 3),
  data=metadata,
  family = gaussian)


paste("The partial R2 for month is", summary(abundance_model$gam)$r.sq - summary(load_r2_month$gam)$r.sq)
## [1] "The partial R2 for month is 0.0106767169116505"
# minus age

load_r2_age<-mgcv::gamm(log10(TotalAbundance)~
    s(HoursAfterSunrise, bs = "cr")+
   s(month, bs = "cc")+
   # s(AgeAtSamplingYears, bs = "cr")+
    s(Seq_depth_nospikein, bs = "cr")+
    Storage+
    Seq_run+
    HoursTilFrozen+
    s(IndividID, bs = "re")+
    s(GroupAtSampling, bs = "re"),
  correlation = corARMA(form = ~ 1|Year, p = 3),
  data=metadata,
  family = gaussian)


paste("The partial R2 for age is", summary(abundance_model$gam)$r.sq - summary(load_r2_age$gam)$r.sq)
## [1] "The partial R2 for age is 0.00964358646132946"
## id and social group
abundance_model_ID <- mgcv::gamm(log10(TotalAbundance)~
    s(HoursAfterSunrise, bs = "cr")+
    s(month, bs = "cc")+
    s(AgeAtSamplingYears, bs = "cr")+
       s(Seq_depth_nospikein, bs = "cr")+
    Storage+
    Seq_run+
    HoursTilFrozen,
  #  s(IndividID, bs = "re")+
  #  s(GroupAtSampling, bs = "re"),
  correlation = corARMA(form = ~ 1|Year, p = 3),
  data=metadata,
  family = gaussian)


paste("The partial R2 for meerkat ID is", summary(abundance_model$gam)$r.sq - summary(abundance_model_ID$gam)$r.sq)
## [1] "The partial R2 for meerkat ID is 0.0211040082314377"

This chunk took 6.15 minutes

Sensitivity analyses for bacterial load (Fig. S10)

  • Select 20 samples per hour and repeat model to test for effects of unequal sampling distribution.
metadata$TimeInterval <- cut(metadata$HoursAfterSunrise, breaks = c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))

table(metadata$TimeInterval)
## 
##   (0,1]   (1,2]   (2,3]   (3,4]   (4,5]   (5,6]   (6,7]   (7,8]   (8,9]  (9,10] 
##      35     145     204     203     109      42       4       6      20      57 
## (10,11] (11,12] (12,13] (13,14] (14,15] 
##      71      77      77      55       4
# remove intervals with less than 20 samples

newdata<-subset(metadata, TimeInterval != "(6,7]" & TimeInterval != "(7,8]" & TimeInterval != "(14,15]" )
table(newdata$TimeInterval)
## 
##   (0,1]   (1,2]   (2,3]   (3,4]   (4,5]   (5,6]   (6,7]   (7,8]   (8,9]  (9,10] 
##      35     145     204     203     109      42       0       0      20      57 
## (10,11] (11,12] (12,13] (13,14] (14,15] 
##      71      77      77      55       0
newdata$TimeInterval<-factor(newdata$TimeInterval)

# sample 20 rows per time inetrval
set.seed(999)
newdata1<-newdata%>% group_by(TimeInterval) %>% sample_n(20)

## repeat model


reduced_model_load<-mgcv::gamm(log10(TotalAbundance)~
    s(HoursAfterSunrise, bs = "cr")+
    s(month, bs = "cc")+
    s(AgeAtSamplingYears, bs = "cr")+
    s(Seq_depth_nospikein, bs = "cr")+
    Storage+
    Seq_run+
    HoursTilFrozen,
  correlation = corARMA(form = ~ 1|Year, p = 3),
  data=newdata1,
  family = gaussian)



figS10a<-draw(reduced_model_load$gam, select = c(1:3), ncol = 3, scales = "fixed") 
figS10a<-annotate_figure(figS10a, top = text_grob("a) 20 samples per hour", size = 16, face = "bold"))
figS10b<-draw(abundance_model_nomethods$gam, scales = "fixed", ncol = 3, select = c(1:3)) 
figS10b<-annotate_figure(figS10b, top = text_grob("b) Methods excluded", size = 16, face = "bold"))


ggarrange(figS10a, figS10b, ncol = 1)

This chunk took 0.06 minutes

Model validation

  • Create test data (70% of data) and training data (30%) and calculate R^2.
  • Do this 100 times and get average R^2.
# Test predictive power of model
# Create the training and test datasets

rsq_list<-list()

for(i in 1:100){
  # Step 1: Get row numbers for the training data
  trainRowNumbers <- createDataPartition(metadata$Storage, p=0.7, list=FALSE)
  
  # Step 2: Create the training  dataset
  trainData <- metadata[trainRowNumbers,]
  dim(trainData)
  
  # Step 3: Create the test dataset
  testData <- metadata[-trainRowNumbers,]
  dim(testData)
  
  # new model of train data
  
abundance_model_train <- mgcv::gamm(log10(TotalAbundance)~
    s(HoursAfterSunrise, bs = "cr")+
    s(month, bs = "cc")+
    s(AgeAtSamplingYears, bs = "cr")+
    s(Seq_depth_nospikein, bs = "cr")+
    Storage+
    Seq_run+
    HoursTilFrozen+
    s(IndividID, bs = "re")+
    s(GroupAtSampling, bs = "re"),
  correlation = corARMA(form = ~ 1|Year, p = 3),
  data=trainData,
  family = gaussian)
  
  
  # Predict on testData
  testData$predicted <- mgcv::predict.gam(abundance_model_train$gam, testData)
  ggplot(testData, aes(x = predicted, y = log10(TotalAbundance)))+geom_point()+geom_smooth(method = "lm")
  rsq_list[[i]]<-summary(lm(log10(TotalAbundance) ~ predicted, data =testData))$r.sq
  
}

abundance_rsqs<-do.call(c, rsq_list)

summary(abundance_rsqs)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3229  0.4021  0.4366  0.4340  0.4603  0.5246
# R2 distribution of 100 test models

# black line is mean for test datasets
# red line is true R2 from full dataset

ggplot()+aes(abundance_rsqs)+ 
  geom_histogram(fill = "grey", col = "black")+
  theme_light(base_size = 10)+
  xlab("R^2 distribution for test dataset - bacterial load")+
  ylab("Frequency")+
  geom_vline(xintercept = mean(abundance_rsqs), size = 2, linetype = "dashed")+
  geom_vline(xintercept = summary(abundance_model$gam)$r.sq , size = 2, linetype = "dashed", col = "red")

# one sided t-test 
t.test(abundance_rsqs, mu = summary(abundance_model$gam)$r.sq, alternative = "two.sided")
## 
##  One Sample t-test
## 
## data:  abundance_rsqs
## t = -8.5314, df = 99, p-value = 1.722e-13
## alternative hypothesis: true mean is not equal to 0.4662944
## 95 percent confidence interval:
##  0.4265305 0.4415368
## sample estimates:
## mean of x 
## 0.4340336

This chunk took 93.35 minutes

Alpha diversity

We model observed ASV richness against time of day, month, and meerkat age using Generalized Additive Mixed Models whilst accounting for methodological variables (storage method, sampling depth, and sequencing run) as fixed linear effects, individual ID and social group as random effects, and including a autoregression term to account for temporal autocorrelation (Wood 2017).

Estimating alpha diversity

  • Estimate ASV richness and Shannon diversity
  • Scale variables around the mean and log10 transform if necessary.
# estimate alpha diversity
sample_data(meerkat_scaled)$Observed<-phyloseq::estimate_richness(meerkat_scaled, measures="Observed")
sample_data(meerkat_scaled)$Observed<-sample_data(meerkat_scaled)$Observed$Observed

sample_data(meerkat_scaled)$Shannon<-phyloseq::estimate_richness(meerkat_scaled, measures="Shannon")
sample_data(meerkat_scaled)$Shannon<-sample_data(meerkat_scaled)$Shannon$Shannon

## extract metadata as dataframe
metadata<-data.frame(sample_data(meerkat_scaled))

alpha_histograms<-ggarrange(ggplot(metadata, aes(x = Observed))+geom_histogram(fill = "grey", col = "black", boundary = 0) + ggtitle("ASV richness"),
  ggplot(metadata, aes(x = Shannon))+geom_histogram(fill = "grey", col = "black", boundary = 0) + ggtitle("Shannon"))

annotate_figure(alpha_histograms, top = text_grob("Distribution of alpha diversity measures", size = 16, face = "bold"))

# scale variables around the mean

metadata$Seq_depth_nospikein<-as.numeric(scale(metadata$Seq_depth_nospikein))
metadata$HoursTilFrozen<-as.numeric(scale(metadata$HoursTilFrozen))

This chunk took 0.29 minutes

Fit GAMM to observed ASV richness

m_alpha <- mgcv::gamm(Observed~
    s(HoursAfterSunrise, bs = "cr")+
    s(month, bs = "cc")+
    s(AgeAtSamplingYears, bs = "cr")+
    s(Seq_depth_nospikein, bs = "cr")+
    Storage+
    Seq_run+
    HoursTilFrozen+
    s(IndividID, bs = "re")+
    s(GroupAtSampling, bs = "re"),
  correlation = corARMA(form = ~ 1|Year, p = 3),
  data=metadata)


sjPlot::tab_model(m_alpha$gam,  show.stat = TRUE, show.se = TRUE, title = "Table S3: GAMM predicting observed ASV richness. Note that the estimate for smooth terms is actually EDF (estimated degrees of freedom), it is not an 'estimate'. EDF reflects the degree of non-linearity of a curve (1 = linear)",
                   dv.labels = "Observed ASV richness",
                   pred.labels = c("Intercept", "Storage (frozen)", "Seq. run (2)", "Seq run (3)", "Seq. run (4)", "Time in field", "Hours after sunrise (non-linear)", "Month (non-linear)", "Age in years (non-linear)", "Seq. depth (non-linear)", "Meerkat ID (random)", "Social group (random)"))
Table S3: GAMM predicting observed ASV richness. Note that the estimate for smooth terms is actually EDF (estimated degrees of freedom), it is not an ‘estimate’. EDF reflects the degree of non-linearity of a curve (1 = linear)
  Observed ASV richness
Predictors Estimates std. Error CI Statistic p
Intercept 153.23 7.27 138.97 – 167.49 21.08 <0.001
Storage (frozen) -8.29 6.59 -21.22 – 4.65 -1.26 0.209
Seq. run (2) 32.70 8.42 16.16 – 49.23 3.88 <0.001
Seq run (3) 25.18 8.64 8.22 – 42.14 2.91 0.004
Seq. run (4) 119.10 19.12 81.59 – 156.62 6.23 <0.001
Time in field 1.79 4.82 -7.67 – 11.25 0.37 0.711
Hours after sunrise (non-linear) 3.60 20.80 <0.001
Month (non-linear) 2.92 3.39 <0.001
Age in years (non-linear) 2.68 3.76 0.072
Seq. depth (non-linear) 3.73 11.03 <0.001
Meerkat ID (random) 54.32 0.33 0.001
Social group (random) 0.00 0.00 0.616
Observations 1109
R2 0.298
draw(m_alpha$gam, select = c(1:3), ncol = 3, scales = "fixed")

This chunk took 1.93 minutes

Check model

model_check_alpha<-gratia::appraise(m_alpha$gam)

annotate_figure(model_check_alpha, top = text_grob("Model check of observed ASV richness")) # check model

This chunk took 0.03 minutes

Plot effects for each temporal variable with partial residuals

  • Extract confidence intervals for each term usinging gratia’s confint() function.
  • Extract residuals using ‘working’ residuals
  • rbind into a dataframe.
# generate table of model estimates
confint_df_time<-confint(m_alpha$gam, parm = "s(HoursAfterSunrise)",  type = "confidence")
confint_df_season<-confint(m_alpha$gam, parm = "s(month)",  type = "confidence")
confint_df_age<-confint(m_alpha$gam, parm = "s(AgeAtSamplingYears)",  type = "confidence")

confint_df_time$Variable<-"HoursAfterSunrise"
names(confint_df_time)[3]<-"Time"
confint_df_season$Variable<-"month"
names(confint_df_season)[3]<-"Time"
confint_df_age$Variable<-"AgeAtSamplingYears"
names(confint_df_age)[3]<-"Time"

confint_df_alpha<-rbind(confint_df_time, confint_df_season, confint_df_age)
confint_df_alpha$Variable<-factor(confint_df_alpha$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))

### make y axis the real values, not estimates, by adding the model intercept (mean)
confint_df_alpha$est<-confint_df_alpha$est+summary(m_alpha$gam)$p.coeff[1] 
confint_df_alpha$lower<-confint_df_alpha$lower+summary(m_alpha$gam)$p.coeff[1] 
confint_df_alpha$upper<-confint_df_alpha$upper+summary(m_alpha$gam)$p.coeff[1] 

## parital residuals

fv <- predict(m_alpha$gam,type="terms") ## get term estimates

prsd_time <- data.frame(as.numeric(residuals(m_alpha$gam,type="working")) + fv[,3])
names(prsd_time)<-"Residuals"
prsd_time$Time<-metadata$HoursAfterSunrise
prsd_time$Variable<-"HoursAfterSunrise"
prsd_time$predicted<-predict(m_alpha$gam,type="response")

prsd_month <-data.frame(as.numeric(residuals(m_alpha$gam,type="working")) + fv[,4])
names(prsd_month)<-"Residuals"
prsd_month$Time<-metadata$month
prsd_month$Variable<-"month"
prsd_month$predicted<-predict(m_alpha$gam,type="response")


prsd_age <- data.frame(as.numeric(residuals(m_alpha$gam,type="working")) + fv[,5])
names(prsd_age)<-"Residuals"
prsd_age$Time<-metadata$AgeAtSamplingYears
prsd_age$Variable<-"AgeAtSamplingYears"
prsd_age$predicted<-predict(m_alpha$gam,type="response")


resids_df<-rbind(prsd_time, prsd_month, prsd_age)
resids_df$est<-resids_df$predicted+resids_df$Residuals

## rearrange level order 
resids_df$Variable<-factor(resids_df$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))

resids_alpha<-resids_df

This chunk took 0.02 minutes

  • Plot Figure 2b-d
  • Add lines where knots are placed
alpha1_resids<-ggplot(subset(confint_df_alpha, Variable == "HoursAfterSunrise"), aes(x = Time, y = est))+
  theme_light(base_size = 14)+
  geom_point(data = subset(resids_alpha, Variable =="HoursAfterSunrise"),alpha = 0.5, aes(fill=Time), col = "white", pch = 21, size = 2)+
  geom_line( size = 1, col = "black", alpha = 0.7)+
  geom_ribbon(aes(ymin=lower, ymax=upper), linetype=2, alpha=0.3)+
  theme( strip.background = element_blank(),
    strip.text.x = element_blank())+
  ylab("Observed ASV richness")+
  xlab("Hours after sunrise")+
  ylim(0,400)+
  scale_x_continuous(breaks = int_breaks)+
  theme(legend.position = "none")+
  scale_color_viridis_c(option = "plasma", direction= -1)+
  scale_fill_viridis_c(option = "plasma", direction= -1)+
  theme( panel.grid = element_blank(),
    panel.background = element_blank(), axis.line = element_line(colour = "black"))+
    geom_vline(xintercept = c(as.numeric(m_alpha$gam$smooth[[1]]$xp)), col = "darkgrey", linetype = "dashed")


alpha2_resids<-ggplot(subset(confint_df_alpha, Variable == "month"), aes(x = Time, y = est))+
  theme_light(base_size = 14)+
  geom_jitter(data = subset(resids_alpha, Variable =="month"),alpha = 0.5, aes(fill=Time), col = "white", pch = 21, size = 2, width = 0.15)+
  geom_line( size = 1, col = "black", alpha = 0.7)+
  geom_ribbon(aes(ymin=lower, ymax=upper), linetype=2, alpha=0.3)+
  ylim(0,400)+
  xlab("Month")+
  theme( strip.background = element_blank(),
    strip.text.x = element_blank())+
  theme( axis.title.y= element_blank(), axis.text.y = element_blank())+
  scale_x_continuous(breaks = int_breaks)+
  theme(legend.position = "none")+
  scale_colour_gradientn(colours = c( "seagreen","seagreen","seagreen","seagreen","khaki","khaki","khaki","khaki","khaki","seagreen",  "seagreen"))+
  scale_fill_gradientn(colours = c( "seagreen","seagreen","seagreen","seagreen","khaki","khaki","khaki","khaki","khaki","seagreen",  "seagreen"))+
  theme( panel.grid = element_blank(),
    panel.background = element_blank(), axis.line = element_line(colour = "black"))+
      geom_vline(xintercept = c(as.numeric(m_alpha$gam$smooth[[2]]$xp)), col = "darkgrey", linetype = "dashed")



alpha3_resids<-ggplot(subset(confint_df_alpha, Variable == "AgeAtSamplingYears"), aes(x = Time, y = est))+
  theme_light(base_size = 14)+
  geom_point(data = subset(resids_alpha, Variable =="AgeAtSamplingYears"),alpha = 0.5, aes(fill=Time), col = "white", pch = 21, size = 2)+
  geom_line( size = 1, col = "black", alpha = 0.7)+
  geom_ribbon(aes(ymin=lower, ymax=upper), linetype=2, alpha=0.3)+
  ylim(0,400)+
  xlab("Age (years)")+
  theme( strip.background = element_blank(),
    strip.text.x = element_blank())+
  theme(axis.title.y= element_blank(), axis.text.y = element_blank())+
  scale_x_continuous(breaks = int_breaks)+
  theme(legend.position = "none")+
  scale_colour_gradientn(colours = c( "blue","blue","darkgray","darkgray","darkgray","darkgray", "red", "red", "red", "red", "red", "red"))+
  scale_fill_gradientn(colours = c( "blue","blue","darkgray","darkgray","darkgray","darkgray", "red", "red", "red", "red", "red", "red"))+
  theme( panel.grid = element_blank(),
    panel.background = element_blank(), axis.line = element_line(colour = "black"))+
      geom_vline(xintercept = c(as.numeric(m_alpha$gam$smooth[[3]]$xp)), col = "darkgrey", linetype = "dashed")



Fig2def<-ggarrange(alpha1_resids, alpha2_resids, alpha3_resids, ncol = 3, widths = c(1.2,1,1))

This chunk took 0 minutes

Partial R^2 of terms

  • Remove variables and examine drop in R^2.
## methods

partial_r2_methods <- mgcv::gamm(Observed~
    s(Seq_depth_nospikein, bs = "cr")+
    Storage+
    Seq_run+
    HoursTilFrozen,
  correlation = corARMA(form = ~ 1|Year, p = 3),
  data=metadata)

paste("The partial R2 for methods is", summary(partial_r2_methods$gam)$r.sq)
## [1] "The partial R2 for methods is 0.166001049590537"
## no methods
partial_r2_nomethods <- mgcv::gamm(Observed~
    s(HoursAfterSunrise, bs = "cr")+
    s(month, bs = "cc")+
    s(AgeAtSamplingYears, bs = "cr")+
    s(IndividID, bs = "re")+
   s(GroupAtSampling, bs = "re"),
  correlation = corARMA(form = ~ 1|Year, p = 3),
  data=metadata,
  family = gaussian)


## Partial R2 time of day

partial_r2_time <- mgcv::gamm(Observed~
    s(month, bs = "cc")+
    s(AgeAtSamplingYears, bs = "cr")+
    s(Seq_depth_nospikein, bs = "cr")+
    Storage+
    Seq_run+
    HoursTilFrozen+
    s(IndividID, bs = "re")+
    s(GroupAtSampling, bs = "re"),
  correlation = corARMA(form = ~ 1|Year, p = 3),
  data=metadata)

paste("The partial R2 for time of day is", summary(m_alpha$gam)$r.sq - summary(partial_r2_time$gam)$r.sq)
## [1] "The partial R2 for time of day is 0.0581281837543095"
## Partial R2 month

partial_r2_month <- mgcv::gamm(Observed~
    s(HoursAfterSunrise, bs = "cr")+
    s(AgeAtSamplingYears, bs = "cr")+
    s(Seq_depth_nospikein, bs = "cr")+
    Storage+
    Seq_run+
    HoursTilFrozen+
    s(IndividID, bs = "re")+
    s(GroupAtSampling, bs = "re"),
  correlation = corARMA(form = ~ 1|Year, p = 3),
  data=metadata)

paste("The partial R2 for month is", summary(m_alpha$gam)$r.sq - summary(partial_r2_month$gam)$r.sq)
## [1] "The partial R2 for month is 0.0310567707985488"
## Partial R2 age

partial_r2_age <- mgcv::gamm(Observed~
    s(HoursAfterSunrise, bs = "cr")+
    s(month, bs = "cc")+
    s(Seq_depth_nospikein, bs = "cr")+
    Storage+
    Seq_run+
    HoursTilFrozen+
    s(IndividID, bs = "re")+
    s(GroupAtSampling, bs = "re"),
  correlation = corARMA(form = ~ 1|Year, p = 3),
  data=metadata)

paste("The partial R2 for age is", summary(m_alpha$gam)$r.sq - summary(partial_r2_age$gam)$r.sq)
## [1] "The partial R2 for age is 0.00704094084021467"
## id and social group

partial_r2_ID <- mgcv::gamm(Observed~
    s(HoursAfterSunrise, bs = "cr")+
    s(month, bs = "cc")+
    s(AgeAtSamplingYears, bs = "cr")+
    s(Seq_depth_nospikein, bs = "cr")+
    Storage+
    Seq_run+
    HoursTilFrozen,
  correlation = corARMA(form = ~ 1|Year, p = 3),
  data=metadata)

paste("The partial R2 for ID and social group is", summary(m_alpha$gam)$r.sq - summary(partial_r2_ID$gam)$r.sq)
## [1] "The partial R2 for ID and social group is 0.044227390218747"

This chunk took 5.87 minutes

Sensitivity analyses for alpha diversity (Fig. S11)

  • Select 20 samples per hour and repeat model to test for effects of unequal sampling distribution.
metadata$TimeInterval <- cut(metadata$HoursAfterSunrise, breaks = c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))

table(metadata$TimeInterval)
## 
##   (0,1]   (1,2]   (2,3]   (3,4]   (4,5]   (5,6]   (6,7]   (7,8]   (8,9]  (9,10] 
##      35     145     204     203     109      42       4       6      20      57 
## (10,11] (11,12] (12,13] (13,14] (14,15] 
##      71      77      77      55       4
unique(metadata$TimeInterval)
##  [1] (1,2]   (13,14] (2,3]   (5,6]   (3,4]   (10,11] (4,5]   (11,12] (12,13]
## [10] (0,1]   (9,10]  (8,9]   (14,15] (7,8]   (6,7]  
## 15 Levels: (0,1] (1,2] (2,3] (3,4] (4,5] (5,6] (6,7] (7,8] (8,9] ... (14,15]
newdata<-subset(metadata, TimeInterval != "(6,7]" & TimeInterval != "(7,8]" & TimeInterval != "(14,15]" )
table(newdata$TimeInterval)
## 
##   (0,1]   (1,2]   (2,3]   (3,4]   (4,5]   (5,6]   (6,7]   (7,8]   (8,9]  (9,10] 
##      35     145     204     203     109      42       0       0      20      57 
## (10,11] (11,12] (12,13] (13,14] (14,15] 
##      71      77      77      55       0
newdata$TimeInterval<-factor(newdata$TimeInterval)

set.seed(999)
newdata1<-newdata%>% group_by(TimeInterval) %>% sample_n(20)

## repeat model

dim(newdata1)
## [1] 240  30
reduced_model_alpha<-mgcv::gamm(Observed~
    s(HoursAfterSunrise, bs = "cr")+
    s(month, bs = "cc")+
    s(AgeAtSamplingYears, bs = "cr")+
    s(Seq_depth_nospikein, bs = "cr")+
    Storage+
    Seq_run+
    HoursTilFrozen,
  correlation = corARMA(form = ~ 1|Year, p = 3),
  data=newdata1,
  family = gaussian)

figS11a<-draw(reduced_model_alpha$gam, select = c(1:3), ncol = 3, scales = "fixed") # Fig S11a
figS11a<-annotate_figure(figS11a, top = text_grob("a) 20 samples per hour", size = 14, face = "bold"))

figS11b<-draw(partial_r2_nomethods$gam, scales = "fixed", ncol = 3, select = c(1:3)) # Fig S11b
figS11b<-annotate_figure(figS11b, top = text_grob("b) Methods excluded", size = 14, face = "bold"))


ggarrange(figS11a, figS11b, ncol = 1)

This chunk took 0.05 minutes

Model validation

  • Create test data (70% of data) and training data (30%) and calculate R^2.
  • Do this 100 times and get average.
# Test predictive power of model
# Create the training and test datasets

rsq_list<-list()

for(i in 1:100){
  
  # Step 1: Get row numbers for the training data
  trainRowNumbers <- createDataPartition(metadata$Storage, p=0.7, list=FALSE)
  
  # Step 2: Create the training  dataset
  trainData <- metadata[trainRowNumbers,]
  
  # Step 3: Create the test dataset
  testData <- metadata[-trainRowNumbers,]
  dim(testData)
  
  m_alpha_train <- mgcv::gamm(Observed~
    s(HoursAfterSunrise, bs = "cr")+
    s(month, bs = "cc")+
    s(AgeAtSamplingYears, bs = "cr")+
    s(Seq_depth_nospikein, bs = "cr")+
    Storage+
    Seq_run+
    HoursTilFrozen+
    s(IndividID, bs = "re")+
    s(GroupAtSampling, bs = "re"),
  correlation = corAR1(), #residual correlation structure
  data=trainData)
  
  
  # Predict on testData
  testData$predicted <- mgcv::predict.gam(m_alpha_train$gam, testData)
  ggplot(testData, aes(x = predicted, y = Observed))+geom_point()+geom_smooth(method = "lm")
  rsq_list[[i]]<-summary(lm(Observed ~ predicted, data =testData))$r.sq
  
}

alpha_rsqs<-do.call(c, rsq_list)

summary(alpha_rsqs)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1343  0.1983  0.2383  0.2318  0.2658  0.3019
# black line is mean R2 for test data
# red line is true R2 from all data

ggplot()+aes(alpha_rsqs)+ 
  geom_histogram(fill = "grey", col = "black")+
  theme_light(base_size = 10)+
  xlab("R square distribution")+
  ylab("Frequency")+
  geom_vline(xintercept = mean(alpha_rsqs), size = 2, linetype = "dashed")+
  geom_vline(xintercept = summary(m_alpha$gam)$r.sq , size = 2, linetype = "dashed", col = "red")

# one sided t-test 
t.test(alpha_rsqs, mu = summary(m_alpha$gam)$r.sq, alternative = "two.sided")
## 
##  One Sample t-test
## 
## data:  alpha_rsqs
## t = -15.881, df = 99, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0.2977478
## 95 percent confidence interval:
##  0.2235439 0.2400265
## sample estimates:
## mean of x 
## 0.2317852

This chunk took 124.73 minutes

Fitting GAMM to Shannon diversity

m_shannon <- mgcv::gamm(Shannon~
    s(HoursAfterSunrise, bs = "cr")+
    s(month, bs = "cc")+
    s(AgeAtSamplingYears, bs = "cr")+
    s(Seq_depth_nospikein, bs = "cr")+
    Storage+
    Seq_run+
    HoursTilFrozen+
    s(IndividID, bs = "re")+
    s(GroupAtSampling, bs = "re"),
  correlation = corARMA(form = ~ 1|Year, p = 3),
  data=metadata)

## model summary

sjPlot::tab_model(m_shannon$gam,  show.stat = TRUE, show.se = TRUE, title = "Extra table: GAMM predicting Shannon diversity. Note that the estimate for smooth terms is actually EDF (estimated degrees of freedom), it is not an 'estimate'. EDF reflects the degree of non-linearity of a curve (1 = linear)",
                   dv.labels = "Shannon diversity",
                   pred.labels = c("Intercept", "Storage (frozen)", "Seq. run (2)", "Seq run (3)", "Seq. run (4)", "Time in field", "Hours after sunrise (non-linear)", "Month (non-linear)", "Age in years (non-linear)", "Seq. depth (non-linear)", "Meerkat ID (random)", "Social group (random)"))
Extra table: GAMM predicting Shannon diversity. Note that the estimate for smooth terms is actually EDF (estimated degrees of freedom), it is not an ‘estimate’. EDF reflects the degree of non-linearity of a curve (1 = linear)
  Shannon diversity
Predictors Estimates std. Error CI Statistic p
Intercept 3.23 0.06 3.11 – 3.35 52.39 <0.001
Storage (frozen) -0.31 0.06 -0.43 – -0.20 -5.40 <0.001
Seq. run (2) 0.32 0.07 0.17 – 0.46 4.32 <0.001
Seq run (3) 0.31 0.08 0.16 – 0.46 4.09 <0.001
Seq. run (4) 0.83 0.15 0.53 – 1.13 5.45 <0.001
Time in field 0.03 0.04 -0.05 – 0.11 0.76 0.446
Hours after sunrise (non-linear) 2.82 4.59 0.009
Month (non-linear) 1.37 0.40 0.077
Age in years (non-linear) 3.46 6.50 <0.001
Seq. depth (non-linear) 1.00 21.32 <0.001
Meerkat ID (random) 23.02 0.12 0.067
Social group (random) 0.00 0.00 0.911
Observations 1109
R2 0.152
# model estimates (plot)
shannon_effects<-draw(m_shannon$gam, select = c(1:3), ncol = 3, scales = "fixed")
annotate_figure(shannon_effects, top = text_grob("GAMM estimates for Shannon diversity", size = 14, face = "bold")) # check model

This chunk took 2.1 minutes

Beta diversity

We next explore shifts in beta diversity across the three temporal scales by ordinating taxa composition using a Multi-Dimensional Scaling (MDS) using Weighted Unifrac distance. We also present results for unweighted unifrac, and results of PERMANOVAS.

Ordinate with Weighted Unifrac

  • Filter dataset to only ASVs with over 50 reads
  • Ordinate and check relative contribution of axes
scaled_core<-microbiome::core(meerkat_scaled, detection = 50, prevalence = 0)

paste("This filtering step removed", 1-round(sum(sample_sums(scaled_core))/sum(sample_sums(meerkat_scaled)), 3), "% of reads")
## [1] "This filtering step removed 0.002 % of reads"

This chunk took 0.02 minutes

ord_wunifrac_mds<-ordinate(scaled_core, "MDS", "wunifrac")
ord_unifrac_mds<-ordinate(scaled_core, "MDS", "unifrac")

dist_wunifrac<-distance(scaled_core, method="wunifrac")
dist_unifrac<-distance(scaled_core, method="unifrac")

This chunk took 29.19 minutes

Axis contributions to variation

qplot(y = ord_wunifrac_mds$values$Relative_eig[1:10], x = factor(1:10),  geom="col", ylab = "Relative eingenvalue", xlab = "Axis",  main = "Scree plot")+ ggtitle("Weighted Unifrac axis contributions")+ylim(0,0.65)+
qplot(y = ord_unifrac_mds$values$Relative_eig[1:10], x = factor(1:10),  geom="col", ylab = "Relative eingenvalue", xlab = "Axis",  main = "Scree plot")+ ggtitle("Unweighted Unifrac axis contributions")+ylim(0,0.65)

This chunk took 0.01 minutes

Plotting beta diversity

  • Add MDS scores to sample metadata.
  • Split continous variables into categories for plotting.
  • For each plot, generate centroid (‘cent’) and segement (‘segs’) dataframes.
### ADD COLUMNS TO METADATA

sample_data(meerkat_scaled)$MDS1<-data.frame(ord_wunifrac_mds$vectors)$Axis.1
sample_data(meerkat_scaled)$MDS2<-data.frame(ord_wunifrac_mds$vectors)$Axis.2
sample_data(meerkat_scaled)$MDS3<-data.frame(ord_wunifrac_mds$vectors)$Axis.3
sample_data(meerkat_scaled)$MDS4<-data.frame(ord_wunifrac_mds$vectors)$Axis.4


## unweighted unifrac

sample_data(meerkat_scaled)$MDS1_UU<-data.frame(ord_unifrac_mds$vectors)$Axis.1
sample_data(meerkat_scaled)$MDS2_UU<-data.frame(ord_unifrac_mds$vectors)$Axis.2


### convert to dataframe 

metadata<-data.frame(sample_data(meerkat_scaled))


########

cent <- aggregate(cbind(MDS1, MDS2) ~ SampleTimeCat, data = metadata, FUN = mean)

beta1<-ggplot(metadata, aes(x = MDS1, y = MDS2)) +
  theme_bw(base_size = 14)+
  stat_ellipse(geom = "polygon",aes(col = SampleTimeCat), alpha = 0, size = 1)+
  geom_point(aes(fill = HoursAfterSunrise), pch = 21, size = 2, alpha = 0.8, col = "white") + # spiders
  #geom_point(aes(fill = HoursAfterSunrise), pch = 21, size = 1.5) + 
  scale_fill_viridis_c(option = "plasma", direction= -1)+
  scale_color_manual(values = c("darkmagenta", "darkgoldenrod1"))+
  geom_point(data = cent, size = 6, aes(col = SampleTimeCat), stroke = 1, alpha = 0.8) +  
  geom_point(data = cent, size = 6, col = "black", stroke = 1, pch = 21, alpha = 0.8) +  
  theme(legend.position = "none")+
  theme(plot.margin = unit(c(0.2,0.3,0.3,0.4), "cm"))+
  ylab("Axis 2 (17.1%)")+
  xlab("Axis 1 (25.8%)")+
    theme( panel.grid = element_blank())


#### season
cent <- aggregate(cbind(MDS1, MDS2) ~ Season, data = metadata, FUN = mean)


beta2<-ggplot(metadata, aes(x = MDS1, y = MDS2)) +
  theme_bw(base_size = 14)+
    stat_ellipse(geom = "polygon",aes(col = Season), alpha = 0, size = 1)+
 geom_point(aes(fill = month), pch = 21, size = 2, alpha = 0.8, col = "white") + 
  scale_fill_gradientn(colours = c( "seagreen","seagreen","seagreen","seagreen","khaki","khaki","khaki","khaki","khaki","seagreen",  "seagreen"))+
  # geom_point(aes(fill = Season), pch = 21, size = 1.5) + 
#  scale_fill_manual(values = c( "seagreen","khaki" ))+
  scale_color_manual(values = c( "seagreen","khaki"))+
 geom_point(data = cent, size = 6, aes(col = Season), stroke = 1, alpha = 0.8) +  
  geom_point(data = cent, size = 6, col = "black", stroke = 1, pch = 21, alpha = 0.7) +   
  theme(axis.text.y = element_blank(), axis.title.y = element_blank())+
  theme(legend.position = "none")+
   xlab("Axis 1 (25.8%)")+
    theme( panel.grid = element_blank())


#### age
cent <- aggregate(cbind(MDS1, MDS2) ~ AgeCat, data = metadata, FUN = mean)


beta3<-ggplot(metadata, aes(x = MDS1, y = MDS2)) +
  theme_bw(base_size = 14)+
      stat_ellipse(geom = "polygon",aes(col = AgeCat), alpha = 0, size = 1)+
 geom_point(aes(fill = AgeAtSamplingYears),  pch = 21, size = 2, alpha = 0.8, col = "white") + # spiders
  #geom_point(aes(fill = AgeCat), pch = 21, size = 1.5) + 
  scale_fill_gradientn(colours = c( "blue","blue","darkgray","darkgray","darkgray","darkgray", "red", "red", "red", "red", "red", "red"))+
  scale_color_manual(values = c( "red", "blue", "darkgrey"))+
 geom_point(data = cent, size = 6, aes(col = AgeCat), stroke = 1, alpha = 0.8) +  
  geom_point(data = cent, size = 6, col = "black", stroke = 1, pch = 21, alpha = 0.7) +   
  theme(axis.text.y = element_blank(), axis.title.y = element_blank())+
  theme(legend.position = "none")+
   xlab("Axis 1 (25.8%)")+
    theme( panel.grid = element_blank())


Fig2ghi<-ggarrange(beta1, beta2, beta3, ncol = 3, widths = c(1.2,1,1))

This chunk took 0.12 minutes

Unweighted unifrac

cent <- aggregate(cbind(MDS1_UU, MDS2_UU) ~ SampleTimeCat, data = metadata, FUN = mean)

beta1_UU<-ggplot(metadata, aes(x = MDS1_UU, y = MDS2_UU)) +
  theme_bw(base_size = 12)+
  stat_ellipse(geom = "polygon",aes(col = SampleTimeCat), alpha = 0, size = 1)+
  geom_point(aes(fill = HoursAfterSunrise), pch = 21, size = 2, alpha = 0.8, col = "white") + # spiders
  #geom_point(aes(fill = HoursAfterSunrise), pch = 21, size = 1.5) + 
  scale_fill_viridis_c(option = "plasma", direction= -1)+
  scale_color_manual(values = c("darkmagenta", "darkgoldenrod1"))+
  geom_point(data = cent, size = 6, aes(col = SampleTimeCat), stroke = 1, alpha = 0.8) +  
  geom_point(data = cent, size = 6, col = "black", stroke = 1, pch = 21, alpha = 0.8) +  
  #theme(legend.position = "none")+
  theme(plot.margin = unit(c(0.2,0.3,0.3,0.4), "cm"))+
  ylab("Axis 2 (9%)")+
  xlab("Axis 1 (16%)")+
    theme( panel.grid = element_blank())


#### season
cent <- aggregate(cbind(MDS1_UU, MDS2_UU) ~ Season, data = metadata, FUN = mean)


beta2_UU<-ggplot(metadata, aes(x = MDS1_UU, y = MDS2_UU)) +
  theme_bw(base_size = 12)+
    stat_ellipse(geom = "polygon",aes(col = Season), alpha = 0, size = 1)+
 geom_point(aes(fill = month), pch = 21, size = 2, alpha = 0.8, col = "white") + 
  scale_fill_gradientn(colours = c( "seagreen","seagreen","seagreen","seagreen","khaki","khaki","khaki","khaki","khaki","seagreen",  "seagreen"))+
  scale_color_manual(values = c( "seagreen","khaki"))+
 geom_point(data = cent, size = 6, aes(col = Season), stroke = 1, alpha = 0.8) +  
  geom_point(data = cent, size = 6, col = "black", stroke = 1, pch = 21, alpha = 0.7) +   
  theme(axis.text.y = element_blank(), axis.title.y = element_blank())+
 # theme(legend.position = "none")+
   xlab("Axis 1 (16%)")+
    theme( panel.grid = element_blank())


#### age
cent <- aggregate(cbind(MDS1_UU, MDS2_UU) ~ AgeCat, data = metadata, FUN = mean)


beta3_UU<-ggplot(metadata, aes(x = MDS1_UU, y = MDS2_UU)) +
  theme_bw(base_size = 12)+
      stat_ellipse(geom = "polygon",aes(col = AgeCat), alpha = 0, size = 1)+
 geom_point(aes(fill = AgeAtSamplingYears),  pch = 21, size = 2, alpha = 0.8, col = "white") + # spiders
  scale_fill_gradientn(colours = c( "blue","blue","darkgray","darkgray","darkgray","darkgray", "red", "red", "red", "red", "red", "red"))+
  scale_color_manual(values = c( "red", "blue", "darkgrey"))+
 geom_point(data = cent, size = 6, aes(col = AgeCat), stroke = 1, alpha = 0.8) +  
  geom_point(data = cent, size = 6, col = "black", stroke = 1, pch = 21, alpha = 0.7) +   
  theme(axis.text.y = element_blank(), axis.title.y = element_blank())+
  #theme(legend.position = "none")+
   xlab("Axis 1 (16%)")+
    theme( panel.grid = element_blank())

ggarrange(beta1_UU, beta2_UU, beta3_UU, ncol = 3, widths = c(1.2,1,1))

This chunk took 0.03 minutes

Marginal PERMANOVAS

metadata<-data.frame(sample_data(scaled_core))
dim(metadata)
## [1] 1109   29
perm_wunifrac_marginal<-adonis2(dist_wunifrac~
    Storage+
    Seq_run+
    HoursTilFrozen+
    Seq_depth_nospikein + 
    month+
    AgeAtSamplingYears+
    HoursAfterSunrise+
      GroupAtSampling+
      IndividID, 
  data = metadata,
  by = "margin")


perm_unifrac_marginal<-adonis2(dist_unifrac~
    Storage+
    Seq_run+
    HoursTilFrozen+
    Seq_depth_nospikein + 
    month+
    AgeAtSamplingYears+
    HoursAfterSunrise+
      GroupAtSampling+
      IndividID, 
  data = metadata,
  by = "margin")

This chunk took 233.6 minutes

Table S4: Beta diversity permanovas

# weighted unifrac
perm_wunifrac_marginal
# unweighted unifrac
perm_unifrac_marginal

This chunk took 0.01 minutes

Figure 2: Temporal dynamics of load, alpha diversity, and beta diversity

ggarrange(Fig2abc,Fig2def,Fig2ghi ,ncol = 1)

#ggsave("FIGURES/figure2.pdf",width = 9, height = 9, units = c("in"))

This chunk took 0.18 minutes

Figure 3: Identifying genera driving temporal shifts in beta diversity

# replace unknown genera with eg 'unknown Lachnospiraceae genus'
taxtable<-data.frame(scaled_core@tax_table@.Data)
taxtable$Family<-as.character(taxtable$Family)
taxtable$Genus<-as.character(taxtable$Genus)

taxtable$Genus<-ifelse(is.na(taxtable$Genus) , paste("Unknown", taxtable$Family,  sep = " "), taxtable$Genus)

taxtable$Genus<-ifelse(taxtable$Genus== "uncultured" , paste("Unknown", taxtable$Family,  sep = " "), taxtable$Genus)
taxtable$Genus<-ifelse(taxtable$Genus== "uncultured bacterium" , paste("Unknown", taxtable$Family, sep = " "), taxtable$Genus)
row.names(taxtable)<-taxa_names(scaled_core)
taxtable<-as.matrix(taxtable)
tax_table(scaled_core)<-taxtable

#scaled_core<-tip_glom(scaled_core, h = 0.2)
scaled_core_genus<-speedyseq::tax_glom(scaled_core, taxrank = "Genus")

#### remove some unknown or strange genera

scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "Unknown uncultured bacterium")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "uncultured")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "uncultured organism")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "uncultured bacterium")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "Unknown uncultured")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "gut metagenome")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "uncultured Chloroflexi bacterium")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "metagenome")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "uncultured Firmicutes bacterium")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "uncultured rumen bacterium")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "uncultured soil bacterium")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "Unknown Unknown Family")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus !=  "BCf9-17 termite group")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus !=  "Blattella germanica (German cockroach)")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus !=  "CPla-4 termite group" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus !=  "DSSF69" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus !=   "E1B-B3-114"  )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus !=   "Family XIII AD3011 group"  )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus !=   "FFCH7168" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus !=   "GCA-900066225"   )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus !=   "hydrothermal vent metagenome"   )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus !=   "Incertae Sedis"   )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus !=   "M2PB4-61 termite group" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus !=   "OLB13"  )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus !=  "OM27 clade"  )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus !=  "Allorhizobium-Neorhizobium-Pararhizobium-Rhizobium"  )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus !=  "OM27 clade"  )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "CHKCI001"  )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "CHKCI001"  )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "Termite planctomycete cluster" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "uncultured Lachnospiraceae bacterium")
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "Unknown gir-aah93h0" )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus !=  "Unknown JG30-KF-CM45"  )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "Unknown Rs-E47 termite group"   )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus != "UBA1819"   )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus !=  "ZOR0006"   )
scaled_core_genus<-subset_taxa(scaled_core_genus, Genus !=  "GCA-900066575" )

taxtable<-data.frame(scaled_core_genus@tax_table@.Data)$Genus

taxa_names(scaled_core_genus)<-taxtable

#taxa_names(scaled_core_genus)=="Unknown Bacillaceae"

This chunk took 0.11 minutes

dominant_phylo<-scaled_core_genus
sample_data(dominant_phylo)<-sample_data(dominant_phylo)[,"feature.id"]

# melt data
psmelt_df<-speedyseq::psmelt(dominant_phylo)
head(psmelt_df)
dominant_taxa_df<-psmelt_df %>% 
  group_by(feature.id) %>% 
  arrange(feature.id, -Abundance) %>% 
  dplyr::mutate(rank=rank(-Abundance))

dominant_taxa_df<-subset(dominant_taxa_df, rank ==1)
dominant_taxa_df$Genus<-factor(dominant_taxa_df$Genus)

# add to metadata 
sample_data(scaled_core_genus)$DomGenus <-vlookup(sample_data(scaled_core_genus)$feature.id, dominant_taxa_df, lookup_column = "Sample", result_column = "Genus")

## colour by focus genera (we justify this list later)
# but for now want the colours to match later figures of the focus genera

top_genera<-c( "Clostridium sensu stricto 1" ,
"Blautia" ,
"Unknown Lachnospiraceae",
 "Enterococcus"  ,
"[Ruminococcus] torques group",
"Bacteroides",  
"Paeniclostridium",
"Romboutsia" ,
"Lachnoclostridium",  
"Lactococcus"  ,
"Peptococcus",
"Helicobacter" ,
 "Unknown Bacillaceae",
 "Raoultibacter"  ,
 "Cellulomonas", 
"Eubacterium"  ,
"Other" )

sample_data(scaled_core_genus)$DomGenus_plot<-ifelse(sample_data(scaled_core_genus)$DomGenus %in% top_genera, as.character(sample_data(scaled_core_genus)$DomGenus), "Other")
sample_data(scaled_core_genus)$DomGenus_plot<-factor(sample_data(scaled_core_genus)$DomGenus_plot, levels =top_genera)

This chunk took 0.02 minutes

## colours scale

mypaly<-brewer.pal(12,"Paired")

mypaly[16]<-"darkgreen"
mypaly[5]<-"red"
mypaly[8]<-"darkorange"
mypaly[7]<-"darkblue"
mypaly[9]<-"magenta"
mypaly[10]<-"cyan"
mypaly[11]<-"#ffe20b"
mypaly[6]<-"#ae0001"
mypaly[12]<-"#795548"
mypaly[13]<-"peachpuff"
mypaly[14]<-"blue"
mypaly[15]<-"mediumorchid"
mypaly[15]<-"#8B008B"
mypaly[4]<-"forestgreen"
mypaly[17]<-"grey"

mypaly<-mypaly[c(1,4,3,2,5:8, 10,9,11:17)]

This chunk took 0 minutes

#Merge by genus
meerkat_genus <- scaled_core_genus %>%
 speedyseq::tax_glom(taxrank = "Genus") %>%                     # agglomerate at phylum level
  transform_sample_counts(function(x) {x/sum(x)} ) %>% # Transform to rel. abundance
  psmelt() %>%                                         # Melt to long format
  arrange(Genus)                                      # Sort data frame alphabetically by family

head(meerkat_genus)
dim(meerkat_genus)
## [1] 648765     40
# generate column for time interval
meerkat_genus$TimeInterval<-cut(meerkat_genus$HoursAfterSunrise, breaks = c(0,0.5,1,1.5,2,2.5,3,3.5,4,4.5,5,5.5,6,6.5,7,7.5,8,8.5,9,9.5,10,10.5,11,11.5,12,12.5,13,13.5,14,14.5))

#head(meerkat_genus)
meerkat_genus$Genus_plot<-ifelse(meerkat_genus$Genus %in% top_genera == TRUE, as.character(meerkat_genus$Genus), "Other")


meerkat_genus$Genus_plot<-factor(meerkat_genus$Genus_plot, levels = top_genera)

# merge
merged_samples<-meerkat_genus %>% group_by(TimeInterval, Genus_plot) %>% dplyr::summarize(mean_rel_abundance = sum(Abundance))

# barplot by half an hour intervals

barplot_by_time<- ggplot( merged_samples, aes(x = TimeInterval, y = mean_rel_abundance, fill = Genus_plot)) + 
  geom_col( width = 1, position = "fill", col = "black") +
    scale_fill_manual(values = mypaly)+
  ylab("Relative abundance")+
  xlab("30 minute time intervals after sunrise")+
  theme_light(base_size = 14)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))+
  theme(legend.position = "none")+
   theme(plot.margin=unit(c(0,0.2,0.2,0.2),"cm"))

sample_data(meerkat_scaled)$TimeInterval<- cut(sample_data(meerkat_scaled)$HoursAfterSunrise, breaks = c(0,0.5,1,1.5,2,2.5,3,3.5,4,4.5,5,5.5,6,6.5,7,7.5,8,8.5,9,9.5,10,10.5,11,11.5,12,12.5,13,13.5,14,14.5))

hist_plot<-data.frame(table(sample_data(meerkat_scaled)$TimeInterval))

histogram<-ggplot(hist_plot, aes(x = Var1, y = Freq))+
  geom_bar(stat = "identity")+
  geom_text(aes(label=Freq), position=position_dodge(width=0.9), vjust=-0.25)+
  theme_bw(base_size = 14)+
  theme(axis.text.x = element_blank())+
  theme(axis.title.x = element_blank())+
  ylab("No. samples")+
  ylim(0,150)+
   theme(plot.margin=unit(c(0.5,0.2,0,0.2),"cm"))

Fig3_2<-ggarrange(histogram, barplot_by_time, ncol = 1, heights = c(0.3,1), 
                  legend = FALSE, labels = c("a)"),   font.label = list(size = 20), hjust = 1)

This chunk took 2.1 minutes

Statistics using envfit()

  • Keep only relevant biological variables
  • Use envfit() to model centroids using continuous variables for time of day and age
  • Month has to be categorical (dry/wet) since it can’t have a linear relationship with microbiome
  • For plotting, use categorical variables so the direction of the effects are clear
  • Extract vector centroids
  • Keep only variables that are significant/have detectable effect sizes
  • Plot ordination with arrows pointing towards centroids of variables of interest

Axes 1 and 2 using continuous variables

metadata<-data.frame(sample_data(scaled_core_genus))

vectors_df<-ord_wunifrac_mds$vectors

## model using continuous variables for time of day and age (to get partial R2 for each)

# axes 1 and 2 using continuous variables
(fit_continuous <- envfit(vectors_df, metadata[,c("HoursAfterSunrise",  "AgeAtSamplingYears", "Season",  "Storage",  "Seq_run", "Seq_depth_nospikein")], perm = 999, na.rm = TRUE,  choices=c(1,2)))
## 
## ***VECTORS
## 
##                       Axis.1   Axis.2     r2 Pr(>r)    
## HoursAfterSunrise    0.83127  0.55586 0.2695  0.001 ***
## AgeAtSamplingYears  -0.95336  0.30184 0.0012  0.482    
## Seq_depth_nospikein -0.81101 -0.58503 0.0495  0.001 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
## 
## ***FACTORS:
## 
## Centroids:
##                     Axis.1  Axis.2
## SeasonDRY WINTER   -0.0031 -0.0139
## SeasonWET SUMMER    0.0019  0.0084
## StorageFREEZEDRIED -0.0059 -0.0233
## StorageFROZEN       0.0083  0.0328
## Seq_runRUN1        -0.0258  0.0199
## Seq_runRUN2        -0.0014 -0.0123
## Seq_runRUN3         0.0247 -0.0067
## Seq_runRUN4         0.0135 -0.0019
## 
## Goodness of fit:
##             r2 Pr(>r)    
## Season  0.0040  0.015 *  
## Storage 0.0261  0.001 ***
## Seq_run 0.0188  0.001 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999

This chunk took 0.11 minutes

Axes 3 and 4 using continuous variables

# axes 3 and 4 nusing continuous variables

(fit_continuous <- envfit(vectors_df, metadata[,c("HoursAfterSunrise",  "AgeAtSamplingYears", "Season",  "Storage",  "Seq_run", "Seq_depth_nospikein")], perm = 999, na.rm = TRUE,  choices=c(3,4)))
## 
## ***VECTORS
## 
##                       Axis.3   Axis.4     r2 Pr(>r)    
## HoursAfterSunrise   -0.99910  0.04235 0.0029  0.222    
## AgeAtSamplingYears  -0.91956 -0.39295 0.0428  0.001 ***
## Seq_depth_nospikein  0.95102  0.30914 0.0061  0.022 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
## 
## ***FACTORS:
## 
## Centroids:
##                     Axis.3  Axis.4
## SeasonDRY WINTER    0.0118  0.0051
## SeasonWET SUMMER   -0.0071 -0.0031
## StorageFREEZEDRIED  0.0112 -0.0076
## StorageFROZEN      -0.0157  0.0107
## Seq_runRUN1        -0.0280 -0.0061
## Seq_runRUN2         0.0236 -0.0085
## Seq_runRUN3        -0.0039  0.0108
## Seq_runRUN4         0.0299  0.0193
## 
## Goodness of fit:
##             r2 Pr(>r)    
## Season  0.0074  0.001 ***
## Storage 0.0190  0.001 ***
## Seq_run 0.0431  0.001 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999

This chunk took 0.06 minutes

  • For plots we use categorical variables for arrows since this is more intuitive
metadata<-metadata[,c("SampleTimeCat",  "AgeCat", "Season",  "Storage",  "Seq_run", "Seq_depth_nospikein")]

This chunk took 0 minutes

Axes 1 and 2 using categorical variables

fit <- envfit(vectors_df, metadata, perm = 999, na.rm = TRUE,  choices=c(1,2))
df<-data.frame(vectors_df)
df$Dom_taxa<-sample_data(scaled_core_genus)$DomGenus_plot
df$TotalAbundance<-sample_data(scaled_core_genus)$TotalAbundance

#Get the vectors for bioenv.fit
df_biofit<-vegan::scores(fit,display=c("factors"), choices=1:2)

## scaling factor for arrows to fill 30% of plot
mul <- ordiArrowMul(df_biofit, fill = 0.3)
df_biofit <- df_biofit * mul                    # Scale the biplot scores
df_biofit<-as.data.frame(df_biofit)


# remove methodological variables from biofit dataframe, 
#as I am just interested in biological variables after controling for methods
#head(df_biofit)


df_biofit<- df_biofit[c("SampleTimeCatAFTERNOON","SampleTimeCatMORNING"),]
row.names(df_biofit)<-c("Afternoon", "Morning")



### spider plot
cent <- aggregate(cbind(Axis.1, Axis.2) ~ Dom_taxa, data = df, FUN = mean)

segs <- merge(df, setNames(cent, c('Dom_taxa','oNMDS1','oNMDS2')),
  by = 'Dom_taxa', sort = FALSE)

df1<-df
df_biofit1<-df_biofit
cent1<-cent
segs1<-segs

# order taxa so that they at least slightly match later figures!
df1$Dom_taxa<-factor(df1$Dom_taxa,levels=top_genera)
segs1$Dom_taxa<-factor(segs1$Dom_taxa,levels=top_genera)


P1<-ggplot(df1, aes(x = Axis.1, y = Axis.2)) +
  theme_light(base_size = 14)+
 geom_point(aes(fill = Dom_taxa), size = 3, pch = 21, alpha = 0.8, col = "white") + # spiders
  scale_fill_manual(values = mypaly[c(-16)])+
  scale_color_manual(values = mypaly[c(-16)])+
  geom_jitter(data = cent1, size = 7, aes(fill = Dom_taxa), col = "black", pch = 21,  stroke = 1.5, width = 0.02, height = 0.02) +  
  geom_segment(data=df_biofit1, aes(x = 0, y = 0, xend = Axis.1*0.7, yend = Axis.2*0.7),
    arrow = arrow(length = unit(0.5, "cm"), type = "closed"), lwd = 1,  col = "black")+
  geom_label(data=df_biofit1,  
    alpha = 0.9, col = "black", size = 4, fill = "yellow", 
    hjust = c(0,1), 
    vjust = c(1,1),
    aes(Axis.1*0.7,Axis.2*0.7, label = rownames(df_biofit1)))+
  xlab("Axis 1 (25.8%)")+
  ylab("Axis 2 (17.1%)")+
  guides(fill = guide_legend(override.aes = list(size = 6), ncol = 5))+
  theme(legend.title = element_blank())+
  annotate("text", x = 0, y = 0, label = "X", size = 7)+
  theme(plot.margin = margin(0.2,0.2,1,0.2, "cm"))+
   annotate(geom="text", x=-0.3, y=-0.27, label="Hours after sunrise: R^2 = 0.27, p < 0.001",
             color="black", size = 5, hjust = 0)+
  xlim(-0.3,0.3)

This chunk took 0.04 minutes

Axes 3 and 4 using categorical variables

metadata<-data.frame(sample_data(scaled_core_genus))

metadata<-metadata[,c("SampleTimeCat",  "AgeCat", "Season", "Storage",  "Seq_run", "Seq_depth_nospikein")]

vectors_df<-ord_wunifrac_mds$vectors

# Model variables contributing to axis 3 and 4

fit <- envfit(vectors_df, metadata, perm = 999, na.rm = TRUE,  choices=c(3,4))# axes 3 and 4
df<-data.frame(vectors_df)
df$Dom_taxa<-sample_data(scaled_core_genus)$DomGenus_plot
df$TotalAbundance<-sample_data(scaled_core_genus)$TotalAbundance

#Get the vectors for bioenv.fit
df_biofit<-vegan::scores(fit,display=c("factors"))

## scaling factor for arrows to fill 80% of plot
mul <- ordiArrowMul(df_biofit, fill = 0.3)
df_biofit <- df_biofit * mul                    # Scale the biplot scores
df_biofit<-as.data.frame(df_biofit)

# remove methodological variables from biofit dataframe, 
#as I am just interested in biological variables after controling for methods
df_biofit<- df_biofit[c("AgeCatMIDDLEAGED", "AgeCatOLD", "AgeCatYOUNG", "SeasonDRY", "SeasonWET"),]
row.names(df_biofit)<-c("Adult", "Old", "Young", "Dry season", "Wet season" )

### spider plot
cent <- aggregate(cbind(Axis.3, Axis.4) ~ Dom_taxa, data = df, FUN = mean)

segs <- merge(df, setNames(cent, c('Dom_taxa','oNMDS1','oNMDS2')),
  by = 'Dom_taxa', sort = FALSE)

df2<-df
df_biofit2<-df_biofit
cent2<-cent
segs2<-segs


# order taxa so that they at least slightly match later figures!
df2$Dom_taxa<-factor(df2$Dom_taxa,levels=top_genera)
segs2$Dom_taxa<-factor(segs2$Dom_taxa,levels=top_genera)


P2<-ggplot(df2, aes(x = Axis.3, y = Axis.4)) +
  theme_light(base_size = 14)+
  geom_point(aes(fill = Dom_taxa), size = 3, pch = 21, alpha = 0.8, col = "white") + # spiders
   scale_fill_manual(values = mypaly[c(-16)])+
  scale_color_manual(values = mypaly[c(-16)])+
   annotate("text", x = 0, y = 0, label = "X", size = 6)+
 geom_point(data = cent2, size = 7, aes(fill = Dom_taxa), col = "black", pch = 21,  stroke = 1.5) +  
  geom_segment(data=df_biofit2, aes(x = 0, y = 0, xend = Axis.3*0.5, yend = Axis.4*0.5),
    arrow = arrow(length = unit(0.5, "cm"), type = "closed"), lwd = 0.5,  col = "black")+
  geom_label(data=as.data.frame(df_biofit2), #position=position_jitter(width=0.01, height = 0.01),#labels
    alpha = 0.9, col = "black", size = 4, fill = "yellow", 
    vjust = c(-0.2, 1, 1, -0.5, 1.5),
    hjust = c(1,0,0,0.5,0.7),
    aes(Axis.3*0.5,Axis.4*0.5, label = rownames(df_biofit2)))+
  xlab("Axis 3 (11.7%)")+
  ylab("Axis 4 (7.1%)")+
  guides(fill = guide_legend(override.aes = list(size = 7), ncol = 5))+
  theme(legend.title = element_blank())+
  theme(plot.margin = margin(0.2,0.2,1,0.2, "cm"))+
  annotate(geom="text", x=-0.3, y=-0.19, label="Age: R^2 = 0.043, p < 0.001 \nSeason: R^2 = 0.007, p = 0.003",
             color="black", size = 5, hjust = 0)




Fig3_1<-ggarrange(P1,P2, ncol = 2, nrow = 1, common.legend = T, legend = "bottom", labels = c("b)","c)"),   
                  font.label = list(size = 20))


ggarrange(Fig3_2,Fig3_1, ncol = 1, common.legend = T, heights = c(1,1.3))+
  theme(plot.margin = margin(0.1,0.1,0.1,1, "cm")) 

#ggsave("FIGURES/Figure3.pdf", width = 11, height = 12.5, units = "in")

This chunk took 0.06 minutes

Differential abundance analysis for all genera with > 15% prevalence

Differential abundance analysis for morning/afternoon

scaled_core_genus_0.2_original<- microbiome::core(scaled_core_genus, detection = 0, prevalence =0.16) 

scaled_core_genus_0.2<-scaled_core_genus_0.2_original

sample_data(scaled_core_genus_0.2)<-sample_data(scaled_core_genus_0.2)[,c("feature.id", "SampleTimeCat")]

table(sample_data(scaled_core_genus)$SampleTimeCat)
## 
## AFTERNOON   MORNING 
##       366       743
genus0.2_df<-speedyseq::psmelt(scaled_core_genus_0.2)

head(genus0.2_df)
## non-parametric

models_by_genus <- genus0.2_df %>%
  group_by(OTU) %>%
  do(model1 = tidy(wilcox.test(log10(Abundance+1)~ SampleTimeCat, data = ., exact=FALSE, conf.int=TRUE, conf.level=0.95), conf.int=TRUE)) %>%   
  gather(model_name, model, -OTU) %>%                        
  unnest()

models_by_genus$p.adj<- p.adjust(models_by_genus$p.value, method = p.adjust.methods, n = length(models_by_genus$p.value))

models_by_genus$Sig<-models_by_genus$p.adj<0.05

models_by_genus_time<-models_by_genus


P_time<-ggplot(models_by_genus_time, aes(y = reorder(OTU, -estimate), x = estimate))+
    geom_vline(xintercept = 0, col = "black")+
  geom_point(aes(col = Sig), size = 3)+  
   geom_errorbarh(aes(xmin=conf.low, xmax=conf.high, height = 0, col = Sig), size = 1.5)+
  theme_bw(base_size = 16)+
  labs(color = "Significant")+
  theme(axis.title = element_blank())+
 theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))+
   theme(plot.margin=unit(c(2,0.2,0.2,0.2),"cm"))+
  xlim(c(min(models_by_genus_time$conf.low),1.3))+
  ggtitle("a) Morning <--> Afternoon")+
 theme(plot.title = element_text(hjust = 1, size = 14, face = "bold"))

This chunk took 0.21 minutes

Differential abundance analysis for dry/wet season

scaled_core_genus_0.2<-scaled_core_genus_0.2_original

sample_data(scaled_core_genus_0.2)<-sample_data(scaled_core_genus_0.2)[,c("feature.id", "Season")]
table(sample_data(scaled_core_genus)$Season)
## 
## DRY WINTER WET SUMMER 
##        418        691
genus0.2_df<-speedyseq::psmelt(scaled_core_genus_0.2)

head(genus0.2_df)
genus0.2_df$Season<-factor(genus0.2_df$Season, levels = c("WET SUMMER", "DRY WINTER"))



## non-parametric

models_by_genus <- genus0.2_df %>%
  group_by(OTU) %>%
  do(model1 = tidy(wilcox.test(log10(Abundance+1)~ Season, data = . , exact=TRUE, conf.int=TRUE, conf.level=0.95), conf.int=TRUE)) %>%   
  gather(model_name, model, -OTU) %>%                        
  unnest()

models_by_genus$p.adj<- p.adjust(models_by_genus$p.value, method = p.adjust.methods, n = length(models_by_genus$p.value))

models_by_genus$Sig<-models_by_genus$p.adj<0.05


models_by_genus_season<-models_by_genus

# positive values higher in wet season

P_season<-ggplot(models_by_genus_season, aes(y = reorder(OTU, -estimate), x = estimate))+
    geom_vline(xintercept = 0, col = "black")+
  geom_point(aes(col = Sig), size = 3)+  
 geom_errorbarh(aes(xmin=conf.low, xmax=conf.high, height = 0, col = Sig), size = 1.5)+
  theme_bw(base_size = 16)+
  theme(axis.title = element_blank())+
 theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))+
   theme(plot.margin=unit(c(2,0.2,0.2,0.2),"cm"))+
  xlim(c(min(models_by_genus_time$conf.low), max(models_by_genus_time$conf.high)))+
  ggtitle("b) Dry <--> Wet")+
 theme(plot.title = element_text(hjust = 1.2, size = 14, face = "bold"))

This chunk took 0.19 minutes

Differential abundance analysis for young/adult

scaled_core_genus_0.2<-scaled_core_genus_0.2_original

## make a comparison group 2-4 years old (to reduce unequal sample sizes)

sample_data(scaled_core_genus_0.2)$AgeCat <- as.factor(ifelse(
  sample_data(scaled_core_genus_0.2)$AgeAtSamplingYears >2 & sample_data(scaled_core_genus_0.2)$AgeAtSamplingYears <4, 
  "ADULT",  as.character(sample_data(scaled_core_genus_0.2)$AgeCat)))


table(sample_data(scaled_core_genus_0.2)$AgeCat)
## 
##      ADULT MIDDLEAGED        OLD      YOUNG 
##        236        391         97        385
###

sample_data(scaled_core_genus_0.2)<-sample_data(scaled_core_genus_0.2)[,c("feature.id", "AgeCat")]

genus0.2_df<-speedyseq::psmelt(scaled_core_genus_0.2)

head(genus0.2_df)
genus0.2_df<-subset(genus0.2_df, AgeCat != "MIDDLEAGED" & AgeCat != "OLD" )

genus0.2_df$AgeCat<-factor(genus0.2_df$AgeCat)


## non-parametric

models_by_genus <- genus0.2_df %>%
  group_by(OTU) %>%
  do(model1 = tidy(wilcox.test(log10(Abundance+1)~ AgeCat, data = ., exact=FALSE, conf.int=TRUE, conf.level=0.95 ), conf.int=TRUE)) %>%   
  gather(model_name, model, -OTU) %>%                        
  unnest()

models_by_genus$p.adj<- p.adjust(models_by_genus$p.value, method = p.adjust.methods, n = length(models_by_genus$p.value))

models_by_genus$Sig<-models_by_genus$p.adj<0.05


models_by_genus_young<-models_by_genus


## if positive, more prevalent in adults

P_young<-ggplot(models_by_genus_young, aes(y = reorder(OTU, -estimate), x = estimate))+
  geom_vline(xintercept = 0, col = "black")+
  geom_point(aes(col = Sig), size = 3)+  
  geom_errorbarh(aes(xmin=conf.low, xmax=conf.high, height = 0, col = Sig), size = 1.5)+
  theme_bw(base_size = 16)+
  theme(axis.title = element_blank())+
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))+
  theme(plot.margin=unit(c(2,0.2,0.2,0.2),"cm"))+
  xlim(c(min(models_by_genus_time$conf.low), 1.3))+
    ggtitle("c) Young <--> Adult")+
  theme(plot.title = element_text(hjust = 0.9, size = 14, face = "bold"))

This chunk took 0.11 minutes

Differential abundance analysis for adult/old

scaled_core_genus_0.2<-scaled_core_genus_0.2_original

## make a comparison group 2-4 years old (to reduce unequal sample sizes

sample_data(scaled_core_genus_0.2)$AgeCat <- as.factor(ifelse(
  sample_data(scaled_core_genus_0.2)$AgeAtSamplingYears >2 & sample_data(scaled_core_genus_0.2)$AgeAtSamplingYears <4, 
  "ADULT",  as.character(sample_data(scaled_core_genus_0.2)$AgeCat)))

table(sample_data(scaled_core_genus_0.2)$AgeCat)
## 
##      ADULT MIDDLEAGED        OLD      YOUNG 
##        236        391         97        385
###

sample_data(scaled_core_genus_0.2)<-sample_data(scaled_core_genus_0.2)[,c("feature.id", "AgeCat")]

genus0.2_df<-speedyseq::psmelt(scaled_core_genus_0.2)

genus0.2_df<-subset(genus0.2_df, AgeCat != "MIDDLEAGED" & AgeCat != "YOUNG" )

genus0.2_df$AgeCat<-factor(genus0.2_df$AgeCat, levels = c("OLD","ADULT"))

unique(genus0.2_df$AgeCat)
## [1] OLD   ADULT
## Levels: OLD ADULT
## non-parametric

models_by_genus <- genus0.2_df %>%
  group_by(OTU) %>%
  do(model1 = tidy(wilcox.test(log10(Abundance+1)~ AgeCat, data = ., exact=TRUE, conf.int=TRUE, conf.level=0.95 ), conf.int=TRUE)) %>%   
  gather(model_name, model, -OTU) %>%                        
  unnest()

models_by_genus$p.adj<- p.adjust(models_by_genus$p.value, method = p.adjust.methods, n = length(models_by_genus$p.value))

models_by_genus$Sig<-models_by_genus$p.adj<0.05


models_by_genus_old<-models_by_genus

P_old<-ggplot(models_by_genus_old, aes(y = reorder(OTU, -estimate), x = estimate))+
  geom_vline(xintercept = 0, col = "black")+
  geom_point(aes(col = Sig), size = 3)+  
  geom_errorbarh(aes(xmin=conf.low, xmax=conf.high, height = 0, col = Sig), size = 1.5)+
  theme_bw(base_size = 16)+
  theme(axis.title = element_blank())+
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))+
  theme(plot.margin=unit(c(2,0.2,0.2,0.2),"cm"))+
  xlim(c(min(models_by_genus_time$conf.low), max(models_by_genus_old$conf.high)))+
  ggtitle("d) Adult <--> Old")+
  theme(plot.title = element_text(hjust = 0.7, size = 14, face = "bold"))


ggarrange(P_time, P_season, P_young, P_old, ncol = 4, common.legend = T, legend = "right")

#ggsave("FIGURES/wilcox_test_estimates.pdf", width = 20, height = 22, units = "in")

This chunk took 0.11 minutes

Selecting 16 focus genera

This chunk took 0 minutes

Make table of genus characteristics

  • Calculate prevalence, abundance, and number of samples where that genus is dominant
# calculate prevalence per genus

genus_info_df<-data.frame(microbiome::prevalence(scaled_core_genus))
names(genus_info_df)<-"Prevalence"
genus_info_df$Genus<-row.names(genus_info_df)
genus_info_df<-genus_info_df[,c(2,1)]

### add mean abundance as column

genus_info_df$Abundance<-taxa_sums(scaled_core_genus)
genus_info_df$RelAbundance<-taxa_sums(scaled_core_genus)/sum(taxa_sums(scaled_core_genus))


####### add dominant taxa as column
dominant_phylo<-scaled_core_genus
sample_data(dominant_phylo)<-sample_data(dominant_phylo)[,"feature.id"]

# melt data
psmelt_df<-speedyseq::psmelt(dominant_phylo)
#head(psmelt_df)

dominant_taxa_df<-psmelt_df %>% 
  group_by(feature.id) %>% 
  arrange(feature.id, -Abundance) %>% 
  dplyr::mutate(rank=rank(-Abundance))

dominant_taxa_df<-subset(dominant_taxa_df, rank ==1)
dominant_taxa_df$Genus<-factor(dominant_taxa_df$Genus)

# add to metadata 
sample_data(scaled_core)$DomGenus <-vlookup(sample_data(scaled_core)$feature.id, dominant_taxa_df, lookup_column = "Sample", result_column = "Genus")

dominant_genus_freq<-data.frame(table(dominant_taxa_df$OTU))
names(dominant_genus_freq)<-c("Genus", "Dominant")

genus_info_df$Dominant<-vlookup(genus_info_df$Genus, dominant_genus_freq, lookup_column = "Genus", result_column = "Dominant")

genus_info_df<-genus_info_df[order(-genus_info_df$RelAbundance),]
#head(genus_info_df, 10) # ordered by relative abundance

This chunk took 0.01 minutes

  • Keep only genera with over 60% prevalence
top_genera_candidates<-subset(genus_info_df, Prevalence > 0.6)
top_genera_candidates<-top_genera_candidates[order(-top_genera_candidates$Abundance),]

DT::datatable(top_genera_candidates)
### based on differential abundance analysis, make a list of genera to  exclude

top_genera<-c( "Clostridium sensu stricto 1" ,
               "Blautia" ,
               "Unknown Lachnospiraceae",
               "Enterococcus"  ,
               "[Ruminococcus] torques group",
               "Bacteroides",  
               "Paeniclostridium",
               "Romboutsia" ,
               "Lachnoclostridium",  
               "Lactococcus"  ,
               "Peptococcus",
               "Helicobacter" ,
               "Unknown Bacillaceae",
               "Raoultibacter"  ,
               "Cellulomonas", 
               "Eubacterium"  ,
               "Other" )

top_genera
##  [1] "Clostridium sensu stricto 1"  "Blautia"                     
##  [3] "Unknown Lachnospiraceae"      "Enterococcus"                
##  [5] "[Ruminococcus] torques group" "Bacteroides"                 
##  [7] "Paeniclostridium"             "Romboutsia"                  
##  [9] "Lachnoclostridium"            "Lactococcus"                 
## [11] "Peptococcus"                  "Helicobacter"                
## [13] "Unknown Bacillaceae"          "Raoultibacter"               
## [15] "Cellulomonas"                 "Eubacterium"                 
## [17] "Other"
focus_taxa_df<-genus_info_df[top_genera[-17],]

focus_taxa_df$Prevalence<-round(focus_taxa_df$Prevalence, 3)
focus_taxa_df$RelAbundance<-round(focus_taxa_df$RelAbundance, 3)
row.names(focus_taxa_df)<-1:16

This chunk took 0 minutes

Prevalence and abundance of 16 focus genera

# info on 16 focus genera

DT::datatable(focus_taxa_df)

This chunk took 0 minutes

Figure 4: Temporal dynamics of 16 focus genera

core_genus <- prune_taxa(top_genera, scaled_core_genus)

paste("Core taxa make up",round(sum(taxa_sums(core_genus))/sum(taxa_sums(meerkat_scaled)),2), "% of reads")
## [1] "Core taxa make up 0.75 % of reads"

This chunk took 0.03 minutes

## make vector of genus names
taxonomy<-data.frame(core_genus@tax_table@.Data)
taxa_names(core_genus)<-taxonomy$Genus

## make vector of variables to include in model

variables_all<-c("HoursAfterSunrise", "month", "AgeAtSamplingYears", 
  "IndividID", "GroupAtSampling",
  "Storage", "Seq_depth_nospikein", "Seq_run", "HoursTilFrozen")


### loop to model each genus seperatly and extract estimates
taxanames<-as.character(taxa_names(core_genus))

This chunk took 0 minutes

base_genus_confint<-list()

for (i in 1:length(taxanames)){
  tryCatch({ #catch errors
    #print(i)
  
    taxa1<-prune_taxa(taxanames[i], core_genus)
   # taxa1<-prune_taxa("Clostridium sensu stricto 1"   , core_genus)
    sample_data(taxa1)$taxa_i_abundance<-sample_sums(taxa1)
    metadata<-data.frame(sample_data(taxa1))
    metadata<-metadata[,c(variables_all, "taxa_i_abundance", "Year")]
    
    #scale variables
    metadata$Seq_depth_nospikein<-as.numeric(scale(metadata$Seq_depth_nospikein))
    
    # fit gamm    
    m_taxa <- mgcv::gamm(log10(taxa_i_abundance+1)~
        s(HoursAfterSunrise, bs = "cr")+
        s(month, bs = "cc")+
        s(AgeAtSamplingYears, bs = "cr")+
        s(Seq_depth_nospikein, bs = "cr")+
       Storage+
        Seq_run+
        HoursTilFrozen+
        s(IndividID, bs = "re")+
        s(GroupAtSampling, bs = "re"),
      correlation = corARMA(form = ~ 1|Year, p = 3),
      data=metadata,
      family = gaussian)
    
    #draw(m_taxa$gam, select = c(1:3), ncol = 3, scales = "fixed")+geom_hline(yintercept = 0)
    
    confint_df_time<-confint(m_taxa$gam, parm = "s(HoursAfterSunrise)",  type = "confidence")
    confint_df_season<-confint(m_taxa$gam, parm = "s(month)",  type = "confidence")
    confint_df_age<-confint(m_taxa$gam, parm = "s(AgeAtSamplingYears)",  type = "confidence")
    
    confint_df_time$Taxa<-taxanames[i]
    confint_df_time$Variable<-"HoursAfterSunrise"
    names(confint_df_time)[3]<-"Time"
    confint_df_time$p.val<-data.frame(summary(m_taxa$gam)$s.table)$p.value[1]
    confint_df_season$Taxa<-taxanames[i]
    confint_df_season$Variable<-"month"
    names(confint_df_season)[3]<-"Time"
   confint_df_season$p.val<-data.frame(summary(m_taxa$gam)$s.table)$p.value[2]

    confint_df_age$Taxa<-taxanames[i]
    confint_df_age$Variable<-"AgeAtSamplingYears"
    names(confint_df_age)[3]<-"Time"
    confint_df_age$p.val<-data.frame(summary(m_taxa$gam)$s.table)$p.value[3]

    
    confint_df<-rbind(confint_df_time, confint_df_season, confint_df_age)
    confint_df$R2<-summary(m_taxa$gam)$r.sq
    
    # convert into real estimate by adding the intercapt (= mean abundance)
    
    confint_df$est_real<-confint_df$est+summary(m_taxa$gam)$p.coeff[1] 
    confint_df$lower_real<- confint_df$lower+summary(m_taxa$gam)$p.coeff[1] 
    confint_df$upper_real<- confint_df$upper+summary(m_taxa$gam)$p.coeff[1] 
    
    confint_df
    base_genus_confint[[i]]<-confint_df
    
  }, error=function(e){})
}

names(base_genus_confint)<-taxanames

base_genus_confint[1]
## $Helicobacter
## # A tibble: 600 x 15
##    smooth by_variable  Time    est    se  crit  lower upper Taxa  Variable
##    <chr>  <fct>       <dbl>  <dbl> <dbl> <dbl>  <dbl> <dbl> <chr> <chr>   
##  1 s(Hou~ <NA>        0.225 -0.353 0.267  1.96 -0.876 0.169 Heli~ HoursAf~
##  2 s(Hou~ <NA>        0.296 -0.332 0.256  1.96 -0.834 0.170 Heli~ HoursAf~
##  3 s(Hou~ <NA>        0.366 -0.310 0.246  1.96 -0.792 0.172 Heli~ HoursAf~
##  4 s(Hou~ <NA>        0.437 -0.288 0.236  1.96 -0.750 0.173 Heli~ HoursAf~
##  5 s(Hou~ <NA>        0.507 -0.267 0.226  1.96 -0.709 0.176 Heli~ HoursAf~
##  6 s(Hou~ <NA>        0.578 -0.245 0.216  1.96 -0.669 0.179 Heli~ HoursAf~
##  7 s(Hou~ <NA>        0.649 -0.224 0.207  1.96 -0.629 0.182 Heli~ HoursAf~
##  8 s(Hou~ <NA>        0.719 -0.202 0.198  1.96 -0.590 0.186 Heli~ HoursAf~
##  9 s(Hou~ <NA>        0.790 -0.180 0.190  1.96 -0.552 0.191 Heli~ HoursAf~
## 10 s(Hou~ <NA>        0.860 -0.159 0.181  1.96 -0.515 0.197 Heli~ HoursAf~
## # ... with 590 more rows, and 5 more variables: p.val <dbl>, R2 <dbl>,
## #   est_real <dbl>, lower_real <dbl>, upper_real <dbl>

This chunk took 29.15 minutes

confint_summary<-do.call(rbind, base_genus_confint)
confint_summary$Variable<-factor(confint_summary$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))
names(confint_summary)[which(colnames(confint_summary)=="p.val")]<-"p.val.all"
confint_summary$Significant<-confint_summary$p.val.all <0.05

## add abundance for plot

abundance_df<-data.frame(taxa_sums(core_genus))
names(abundance_df)<-"Abundance"
abundance_df$Genus<-row.names(abundance_df)
confint_summary$Abundance<-as.numeric(vlookup(confint_summary$Taxa, abundance_df, lookup_column = "Genus", result_column = "Abundance"))
confint_summary<-data.frame(confint_summary)
confint_summary<-dplyr::arrange(confint_summary, desc(Abundance))
confint_summary$Taxa<-factor(confint_summary$Taxa, levels = rev(top_genera[-17]))

This chunk took 0 minutes

taxanames<-as.character(taxa_names(core_genus))
base_genus_confint_storage<-list()

for (i in 1:length(taxanames)){
  tryCatch({ #catch errors
    #print(i)
    
    taxa1<-prune_taxa(taxanames[i], core_genus)
    # taxa1<-prune_taxa("Clostridium sensu stricto 1"   , core_genus)
    sample_data(taxa1)$taxa_i_abundance<-sample_sums(taxa1)
    metadata<-data.frame(sample_data(taxa1))
    metadata<-metadata[,c(variables_all, "taxa_i_abundance", "Year")]
    
    #scale variables
    metadata$Seq_depth_nospikein<-as.numeric(scale(metadata$Seq_depth_nospikein))
    
    # fit gamm    
    m_taxa <- mgcv::gamm(log10(taxa_i_abundance+1)~
                           s(HoursAfterSunrise, bs = "cr", by = Storage)+
                           s(month, bs = "cc",  by = Storage)+
                           s(AgeAtSamplingYears, bs = "cr",  by = Storage)+
                           s(Seq_depth_nospikein, bs = "cr",  by = Storage)+
                           Seq_run+
                           HoursTilFrozen+
                           s(IndividID, bs = "re")+
                           s(GroupAtSampling, bs = "re"),
                         correlation = corARMA(form = ~ 1|Year, p = 3),
                         data=metadata,
                         family = gaussian)
    
    summary(m_taxa$gam)
    
    #draw(m_taxa$gam, select = c(1:3), ncol = 3, scales = "fixed")+geom_hline(yintercept = 0)
    
    confint_df_time<-confint(m_taxa$gam, parm = "s(HoursAfterSunrise)",  type = "confidence")
    confint_df_season<-confint(m_taxa$gam, parm = "s(month)",  type = "confidence")
    confint_df_age<-confint(m_taxa$gam, parm = "s(AgeAtSamplingYears)",  type = "confidence")
    
    confint_df_time$Taxa<-taxanames[i]
    confint_df_time$Variable<-"HoursAfterSunrise"
    names(confint_df_time)[3]<-"Time"
    ### add p value
    confint_df_time$p.val<-NA
    confint_df_time$p.val[1:200]<-data.frame(summary(m_taxa$gam)$s.table)$p.value[1]
    confint_df_time$p.val[201:400]<-data.frame(summary(m_taxa$gam)$s.table)$p.value[2]
   
    
    confint_df_season$Taxa<-taxanames[i]
    confint_df_season$Variable<-"month"
    names(confint_df_season)[3]<-"Time"
    ### add p value
    confint_df_season$p.val<-NA
    confint_df_season$p.val[1:200]<-data.frame(summary(m_taxa$gam)$s.table)$p.value[3]
    confint_df_season$p.val[201:400]<-data.frame(summary(m_taxa$gam)$s.table)$p.value[4]
    
    confint_df_age$Taxa<-taxanames[i]
    confint_df_age$Variable<-"AgeAtSamplingYears"
    names(confint_df_age)[3]<-"Time"
    ### add p value
    confint_df_age$p.val<-NA
    confint_df_age$p.val[1:200]<-data.frame(summary(m_taxa$gam)$s.table)$p.value[5]
    confint_df_age$p.val[201:400]<-data.frame(summary(m_taxa$gam)$s.table)$p.value[6]
    
    
    confint_df<-rbind(confint_df_time, confint_df_season, confint_df_age)
    confint_df$R2<-summary(m_taxa$gam)$r.sq
    

   base_genus_confint_storage[[i]]<-confint_df
    
  }, error=function(e){})
}

names(base_genus_confint_storage)<-taxanames

#saveRDS(base_genus_confint_storage, "R_OUTPUT/base_genus_confint_storage.RDS")

This chunk took 51.95 minutes

confint_genus_storage<-do.call(rbind, base_genus_confint_storage)
confint_genus_storage$Variable<-factor(confint_genus_storage$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))
confint_genus_storage$Taxa<-factor(confint_genus_storage$Taxa, levels = top_genera[-17])
storage_model_summary<-distinct(confint_genus_storage, Storage, Taxa, Variable, .keep_all = T )

storage_model_frozen<-subset(storage_model_summary, Storage=="FROZEN")
storage_model_frozen<-storage_model_frozen[,c("Variable","Taxa","p.val")]
dim(confint_summary)
## [1] 9600   17
confint_summary<-merge(confint_summary, storage_model_frozen, by = c("Variable", "Taxa"))
names(confint_summary)[which(colnames(confint_summary)=="p.val")]<-"p.val_frozen"

storage_model_freezedried<-subset(storage_model_summary, Storage=="FREEZEDRIED")
storage_model_freezedried<-storage_model_freezedried[,c("Variable","Taxa","p.val")]
dim(confint_summary)
## [1] 9600   18
confint_summary<-merge(confint_summary, storage_model_freezedried, by = c("Variable", "Taxa"))
names(confint_summary)[which(colnames(confint_summary)=="p.val")]<-"p.val_freezedried"

# add column for whether trend is robust
confint_summary$Sig_robust<-ifelse(confint_summary$p.val_frozen < 0.05 & confint_summary$p.val_freezedried < 0.05, TRUE, FALSE)

This chunk took 0.01 minutes

Figure 4e-g: Back-transformed model estimates for genus-level abundance across temporal scales

  • Add bacterial load to figure
### rugdata
metadata<-data.frame(sample_data(core_genus))
rugdata<- data.frame(cbind(metadata$HoursAfterSunrise, metadata$month, metadata$AgeAtSamplingYears))
names(rugdata)<-c("HoursAfterSunrise", "month", "AgeAtSamplingYears")
rugdata<-gather(rugdata, Variable, Time, HoursAfterSunrise:AgeAtSamplingYears, factor_key=TRUE)
rugdata$est = 0
rugdata$Variable<-factor(rugdata$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))

##########

confint_summary_fig4b<-confint_summary[,c("Variable", "Taxa", "Time", "est_real","Sig_robust", "Abundance")]
confint_df_load_fig4b<-confint_df_load[,c("Variable", "Time", "est")]
confint_df_load_fig4b$Taxa <- "Bacterial load"
confint_df_load_fig4b$Sig_robust <- TRUE
confint_df_load_fig4b$Abundance <- 43124192 # same abundance of clostridium

confint_df_load_fig4b<-confint_df_load_fig4b[,c(1,4,2,3,5,6)]
names(confint_df_load_fig4b)<-c("Variable", "Taxa", "Time", "est_real","Sig_robust", "Abundance")
confint_df_load_fig4b$est_real<-log10(confint_df_load_fig4b$est_real)

confint_summary_fig4b<-rbind(confint_summary_fig4b,confint_df_load_fig4b )

confint_summary_fig4b$Taxa<-factor(confint_summary_fig4b$Taxa, levels = rev(c(top_genera[-17], "Bacterial load")))


Fig4def<-ggplot(confint_summary_fig4b, aes(x = Time, y = 10^est_real, group = Taxa))+
  theme_light(base_size = 14)+
 geom_line( aes(alpha = Sig_robust, col = Taxa, size = Abundance))+
  #geom_ribbon( aes(ymin = 10^lower_real, ymax = 10^upper_real, fill = Taxa, alpha = Sig_robust, size = 1))+
  scale_size(range = c(1, 1.5))+
  facet_wrap(~Variable, scales = "free_x")+
  geom_rug(data = rugdata, aes(x = Time, y = NULL), inherit.aes = F, size = 0.1)+
  scale_color_manual(values = rev(c(mypaly[-17], "darkgrey")),  labels = c(top_genera[-17], "Bacterial load") )+
  ylab("Estimated abundance")+
  scale_x_continuous(breaks = int_breaks)+
  theme(panel.grid = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line(colour = "black"))+
theme(panel.spacing = unit(1.2, "lines"))+
 theme(plot.margin=unit(c(0.2,0.2,0.2,0.6),"cm"))+
  theme(plot.title = element_text(size=10))+
  guides(size=FALSE, abundance = FALSE, alpha = FALSE)+
  guides(color = guide_legend(override.aes = list(size = 5)))+
  #theme(legend.position = "none")+
  theme(strip.background = element_blank(),
        strip.text.x = element_blank())+
  theme(axis.title.x = element_blank())+
  scale_alpha_manual(values = c(0,0.8))+
  scale_y_log10(limits = c(0.7,1e6), breaks = trans_breaks("log10", function(x) 10^x),
              labels = trans_format("log10", math_format(10^.x)))+
    guides(color = guide_legend(override.aes = list( size = 4, col = mypaly, 
                                                    labels = c(top_genera))))+
   labs(color='Genus') 

This chunk took 0 minutes

Figure 4e-g: Untransformed model estimates for genus-level abundance across temporal scales

  • Remove bacterial load
confint_sumamry_fig4a<-confint_summary[,c("Variable", "Taxa", "Time", "est","Sig_robust", "Abundance")]
confint_df_load_fig4a<-confint_df_load[,c("Variable", "Time", "est")]
confint_df_load_fig4a$Taxa <- "Bacterial load"
confint_df_load_fig4a$Sig_robust <- FALSE
confint_df_load_fig4a$Abundance <- 43124192 # same abundance of clostridium
str(confint_df_load_fig4a)
## tibble [600 x 6] (S3: confint.gam/evaluated_1d_smooth/evaluated_smooth/tbl_df/tbl/data.frame)
##  $ Variable  : Factor w/ 3 levels "HoursAfterSunrise",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Time      : num [1:600] 0.225 0.296 0.366 0.437 0.507 ...
##  $ est       : num [1:600] 106239 107397 108566 109743 110927 ...
##  $ Taxa      : chr [1:600] "Bacterial load" "Bacterial load" "Bacterial load" "Bacterial load" ...
##  $ Sig_robust: logi [1:600] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Abundance : num [1:600] 43124192 43124192 43124192 43124192 43124192 ...
str(confint_sumamry_fig4a)
## 'data.frame':    9600 obs. of  6 variables:
##  $ Variable  : Factor w/ 3 levels "HoursAfterSunrise",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ Taxa      : Factor w/ 16 levels "Eubacterium",..: 12 12 12 12 12 12 12 12 12 12 ...
##  $ Time      : num  0.066 0.114 0.162 0.21 0.258 ...
##  $ est       : num  -0.466 -0.436 -0.405 -0.374 -0.344 ...
##  $ Sig_robust: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ Abundance : num  8788264 8788264 8788264 8788264 8788264 ...
confint_df_load_fig4a<-confint_df_load_fig4a[,c(1,4,2,3,5,6)]
names(confint_df_load_fig4a)<-c("Variable", "Taxa", "Time", "est","Sig_robust", "Abundance")
confint_df_load_fig4a$est<-0

confint_sumamry_fig4a<-rbind(confint_sumamry_fig4a,confint_df_load_fig4a )


confint_sumamry_fig4a$Taxa<-factor(confint_sumamry_fig4a$Taxa, levels = rev(c(top_genera[-17], "Bacterial load")))

confint_sumamry_fig4a$est<-ifelse(confint_sumamry_fig4a$Taxa=="Raoultibacter", confint_sumamry_fig4a$est+0.03, confint_sumamry_fig4a$est)

Fig4abc<-ggplot(confint_sumamry_fig4a, aes(x = Time, y = est, group = Taxa))+
  theme_light(base_size = 14)+
 geom_line( aes(alpha = Sig_robust, col = Taxa, size = Abundance))+
 #geom_line( aes( col = Taxa))+
  scale_size(range = c(1, 1.5))+
  facet_wrap(~Variable, scales = "free_x")+
  geom_rug(data = rugdata, aes(x = Time, y = NULL), inherit.aes = F, size = 0.1)+
  scale_color_manual(values = rev(c(mypaly[-17], "darkgrey")),  labels = c(top_genera[-17], "Bacterial load") )+
  ylab("Model estimate (log10)")+
  scale_x_continuous(breaks = int_breaks)+
  theme(panel.grid = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line(colour = "black"))+
theme(panel.spacing = unit(1.2, "lines"))+
 theme(plot.margin=unit(c(0.2,0.2,0.2,0.4),"cm"))+
  theme(plot.title = element_text(size=10))+
  guides(size=FALSE, alpha = FALSE, abundance = FALSE)+
  theme(strip.background = element_blank(),
        strip.text.x = element_blank())+
  theme(axis.text.x = element_blank(),
        axis.title.x = element_blank())+
  geom_hline(yintercept = 0, linetype = "dashed")+
  scale_alpha_manual(values = c(0,0.8))+
    guides(color = guide_legend(override.aes = list( size = 4, col = mypaly, 
                                                    labels = c(top_genera))))+
   labs(color='Genus') 

ggarrange(Fig4abc, Fig4def, ncol = 1, heights = c(1,1), legend = "right", common.legend = T)

#ggsave("FIGURES/figure4.pdf",width = 10, height = 5, units = "in")

This chunk took 0.08 minutes

Supplementary figures: Dynamics by storage type

confint_genus_storage<-do.call(rbind, base_genus_confint_storage)
confint_genus_storage$Variable<-factor(confint_genus_storage$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))

confint_genus_storage$Taxa<-factor(confint_genus_storage$Taxa, levels = top_genera[-17])

confint_genus_storage$Sig <-confint_genus_storage$p.val < 0.05



### rugdata
metadata<-data.frame(sample_data(core_genus))
rugdata<- data.frame(cbind(metadata$HoursAfterSunrise, metadata$month, metadata$AgeAtSamplingYears))
names(rugdata)<-c("HoursAfterSunrise", "month", "AgeAtSamplingYears")
rugdata<-gather(rugdata, Variable, Time, HoursAfterSunrise:AgeAtSamplingYears, factor_key=TRUE)
rugdata$est = 0
rugdata$Variable<-factor(rugdata$Variable, levels = c("HoursAfterSunrise", "month", "AgeAtSamplingYears"))

### Diurnal

diurnal_storage<-subset(confint_genus_storage, Variable == "HoursAfterSunrise")
head(diurnal_storage)
ggplot(diurnal_storage, aes(x = Time, y = est, group = Storage))+
  geom_line(aes(col = Storage, linetype = Sig) ,size = 1)+
  geom_ribbon(aes(ymin=lower, ymax=upper, fill = Storage), linetype=2, alpha=0.5)+
  facet_wrap(~Taxa, scales = "free", ncol = 4)+
  theme_bw(base_size = 14)+
  scale_color_manual(values = c("deeppink", "deepskyblue3"))+
  scale_fill_manual(values = c( "deeppink", "deepskyblue3"))+
  ylab("Model estimate")+
  xlab("Hours after sunrise")+
  guides(linetype=guide_legend(title="Significant (p < 0.05)"))

#ggsave("FIGURES/FigS4_diurnal.pdf", width = 12, height = 8, units = "in")

#### seasonal

seasonal_storage<-subset(confint_genus_storage, Variable == "month")


ggplot(seasonal_storage, aes(x = Time, y = est, group = Storage))+
  geom_line(aes(col = Storage, linetype = Sig) ,size = 1)+
  geom_ribbon(aes(ymin=lower, ymax=upper, fill = Storage), linetype=2, alpha=0.5)+
  facet_wrap(~Taxa, scales = "free", ncol = 4)+
  theme_bw(base_size = 14)+
  scale_color_manual(values = c("deeppink", "deepskyblue3"))+
  scale_fill_manual(values = c( "deeppink", "deepskyblue3"))+
  ylab("Model estimate")+
  xlab("Month")+
  guides(linetype=guide_legend(title="Significant (p < 0.05)"))+
  scale_x_continuous(breaks = c(1,2,3,4,5,6,7,8,9,10, 11, 12 ))

#ggsave("FIGURES/FigS5_seasonal.pdf", width = 12, height = 8, units = "in")


##### age 


age_storage<-subset(confint_genus_storage, Variable == "AgeAtSamplingYears")


ggplot(age_storage, aes(x = Time, y = est, group = Storage))+
  geom_line(aes(col = Storage, linetype = Sig) ,size = 1)+
  geom_ribbon(aes(ymin=lower, ymax=upper, fill = Storage), linetype=2, alpha=0.5)+
  facet_wrap(~Taxa, scales = "free", ncol = 4)+
  theme_bw(base_size = 14)+
  scale_color_manual(values = c("deeppink", "deepskyblue3"))+
  scale_fill_manual(values = c( "deeppink", "deepskyblue3"))+
  ylab("Model estimate")+
  xlab("Age at sampling (years)")+
  guides(linetype=guide_legend(title="Significant (p < 0.05)"))+
  scale_x_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9,10 ))

#ggsave("FIGURES/FigS6_age.pdf", width = 12, height = 8, units = "in")

This chunk took 0.14 minutes

Sample distributions by storage

This chunk took 0 minutes

Figure 5: Biological predictors of diversity and genus-level abundance

This chunk took 0 minutes

Figure 5a: Base models without biological predictors

This section fits base GAMMs (methodological and temporal variables only) and full GAMMs (including biological variables) to all microbial phenotypes, and extracts model summaries.

  • Make list of phenotypes (genus-level phenotypes and community-wide phenotypes)
  • Add these values as columns to metadata
  • Build ‘base models’ (models with just methodological and temporal predictors) for each phenotype
  • Extract summary statistics for temporal variables
  • This is kind of a repeat of earlier code, but we extract different information from the models (eg 95% CIs instead of predicted estimates).
### add ordination axes
sample_data(core_genus)$MDS1_W<-data.frame(ord_wunifrac_mds$vectors)$Axis.1
sample_data(core_genus)$MDS1_UW<-data.frame(ord_unifrac_mds$vectors)$Axis.1

diversity_names<-c("TotalAbundance", "Observed", "Shannon", "MDS1_W", "MDS1_UW")
taxanames
##  [1] "Helicobacter"                 "Cellulomonas"                
##  [3] "[Ruminococcus] torques group" "Unknown Lachnospiraceae"     
##  [5] "Blautia"                      "Lachnoclostridium"           
##  [7] "Bacteroides"                  "Paeniclostridium"            
##  [9] "Romboutsia"                   "Enterococcus"                
## [11] "Unknown Bacillaceae"          "Lactococcus"                 
## [13] "Clostridium sensu stricto 1"  "Peptococcus"                 
## [15] "Eubacterium"                  "Raoultibacter"
phenotypes<-c(taxanames, diversity_names)

# make empty list to store results
  
base_model_summary_list<-list()


for (i in 1:length(phenotypes)){
  
  tryCatch({ #catch errors
    
    #print(i)
    
    metadata<-data.frame(sample_data(core_genus))
   # str(metadata)
    asvtable<-data.frame(t(otu_table(core_genus)))
    asvtable_log<-  log10(asvtable+1)
    dim(asvtable_log)
    names(asvtable_log)<-taxanames
    
    metadata_all<-merge(metadata, asvtable_log, by = 0)
    
   row.names(metadata_all)<-metadata_all$feature.id

    
    #scale variables
    metadata_all$Seq_depth_nospikein<-as.numeric(scale(metadata_all$Seq_depth_nospikein))
    metadata_all$HoursTilFrozen<-as.numeric(scale(metadata_all$HoursTilFrozen))
    metadata_all$TotalAbundance<-log10(metadata_all$TotalAbundance+1) #log10 transform load
    metadata_all$Observed<-scale(metadata_all$Observed) # scale observed
    metadata_all$Shannon<-scale(metadata_all$Shannon) # scale shannon

    
    # fit gamm
    m_taxa <- mgcv::gamm(metadata_all[, phenotypes[i]]~
        s(HoursAfterSunrise, bs = "cr")+
        s(month, bs = "cc")+
        s(AgeAtSamplingYears, bs = "cr")+
        s(Seq_depth_nospikein, bs = "cr")+
        Storage+
        Seq_run+
        HoursTilFrozen+
        s(IndividID, bs = "re")+
       s(GroupAtSampling, bs = "re"),
     correlation = corARMA(form = ~ 1|Year, p = 3),
      data=metadata_all,
      family = gaussian)
    
    summary<- summary(m_taxa$gam) 

    
    fixed_terms<-data.frame(extract_fixed(m_taxa$gam))[,c(2,5:7)] ## need to load the function here
    row.names(fixed_terms)<-extract_fixed(m_taxa$gam)$term
    names(fixed_terms)<-c("Effect", "P.val", "lower_CI", "upper_CI")
    
    smoothed_terms<-data.frame(summary$s.table)
    names(smoothed_terms)<-c("lower_CI", "upper_CI", "Effect", "P.val")
    smoothed_terms$lower_CI <-0
    smoothed_terms$upper_CI <-0
    
    results_df<-rbind(fixed_terms, smoothed_terms)
    results_df$R.sq<-summary$r.sq
    results_df$Effect_abs<-abs( results_df$Effect)
    results_df$Variable<-row.names(results_df)
    results_df$Taxa<-phenotypes[i]
    
    base_model_summary_list[[i]]<-results_df
    
  }, error=function(e){})
}

names(base_model_summary_list)<-phenotypes

#saveRDS(base_model_summary_list, "R_OUTPUT/base_models.RDS")

This chunk took 36.93 minutes

base_model_summary<-do.call(rbind, base_model_summary_list)
base_model_summary$Significance<-base_model_summary$P.val < 0.01 # add column = whether significant
base_model_summary<-subset(base_model_summary, Variable !="Intercept") # remove intercept row

base_model_summary$Variable<- factor(base_model_summary$Variable, levels = c( # order variables
  "s(HoursAfterSunrise)", "s(month)" , "s(AgeAtSamplingYears)",   
  "s(IndividID)", "s(GroupAtSampling)",
  "StorageFROZEN","s(Seq_depth_nospikein)", "Seq_runRUN2", "Seq_runRUN3","Seq_runRUN4", "HoursTilFrozen"))

This chunk took 0 minutes

  • Split base models by storage type to assess how robust associations are
# make PS object for frozen and freezedried samples

core_genus_frozen<-subset_samples(core_genus, Storage == "FROZEN")
core_genus_freezedried<-subset_samples(core_genus, Storage == "FREEZEDRIED")

# put in a list

ps_list<-list(core_genus_frozen, core_genus_freezedried)
names(ps_list)<-c("FROZEN", "FREEZEDRIED")

# make empty lists to store results

base_model_storage_i<-list()
base_model_storage_k<-list()

# loops loops loops

for (k in 1:length(ps_list)){
  
  core_genus_storage<- ps_list[[k]]
 # core_genus_storage<- ps_list[[2]]
  
  for (i in 1:length(phenotypes)){
  
  tryCatch({ #catch errors
    
    #print(i)
    
    # generate metadata dataframe
    metadata<-data.frame(sample_data(core_genus_storage))
    asvtable<-data.frame(t(otu_table(core_genus_storage))) # add genus-level abundances to dataframe
    asvtable_log<-  log10(asvtable+1)
    names(asvtable_log)<-taxanames
    metadata_all<-merge(metadata, asvtable_log, by = 0)
    row.names(metadata_all)<-metadata_all$feature.id

    #scale variables
    metadata_all$Seq_depth_nospikein<-as.numeric(scale(metadata_all$Seq_depth_nospikein))
    metadata_all$HoursTilFrozen<-as.numeric(scale(metadata_all$HoursTilFrozen))
    metadata_all$TotalAbundance<-log10(metadata_all$TotalAbundance+1) #log10 transform load
    metadata_all$Observed<-scale(metadata_all$Observed) # scale observed
    metadata_all$Shannon<-scale(metadata_all$Shannon) # scale shannon

    
    # fit gamm
    m_taxa <- mgcv::gamm(metadata_all[, phenotypes[i]]~
   # m_taxa <- mgcv::gamm(metadata_all[, phenotypes[13]]~
        s(HoursAfterSunrise, bs = "cr")+
        s(month, bs = "cc")+
        s(AgeAtSamplingYears, bs = "cr")+
        s(Seq_depth_nospikein, bs = "cr")+
        Seq_run+
        HoursTilFrozen+
        s(IndividID, bs = "re")+
       s(GroupAtSampling, bs = "re"),
       correlation = corARMA(form = ~ 1|Year, p = 3),
      data=metadata_all,
      family = gaussian)
    
    # extract summary statistics
    summary<- summary(m_taxa$gam) 
    fixed_terms<-data.frame(extract_fixed(m_taxa$gam))[,c(2,5:7)] ## need to load the function here
    row.names(fixed_terms)<-extract_fixed(m_taxa$gam)$term
    names(fixed_terms)<-c("Effect", "P.val", "lower_CI", "upper_CI")
    
    smoothed_terms<-data.frame(summary$s.table)
    names(smoothed_terms)<-c("lower_CI", "upper_CI", "Effect", "P.val")
    smoothed_terms$lower_CI <-0
    smoothed_terms$upper_CI <-0
    
    results_df<-rbind(fixed_terms, smoothed_terms)
    results_df$R.sq<-summary$r.sq
    results_df$Effect_abs<-abs( results_df$Effect)
    results_df$Variable<-row.names(results_df)
    results_df$Taxa<-phenotypes[i]
    results_df$Type <- as.character(metadata_all$Storage[1])
    
    base_model_storage_i[[i]]<-results_df
    
  }, error=function(e){})
    
  }
    
    base_model_storage_df<-do.call(rbind, base_model_storage_i)
    
    base_model_storage_k[[k]] <-base_model_storage_df
    
}

base_model_storage_k_df<-do.call(rbind, base_model_storage_k)


#saveRDS(base_model_storage_k_df, "R_OUTPUT/base_models_by_storage.RDS")

This chunk took 21.31 minutes

  • Add whether phenotype is robust
base_model_summary_frozen<-subset(base_model_storage_k_df, Type == "FROZEN")
base_model_summary_freezedried<-subset(base_model_storage_k_df, Type == "FREEZEDRIED")

frozen_df<-base_model_summary_frozen[,c("Taxa", "Variable", "P.val")]
names(frozen_df)[3]<-"P.val_frozen"
freezedried_df<-base_model_summary_freezedried[,c("Taxa", "Variable", "P.val")]
names(freezedried_df)[3]<-"P.val_freezedried"


base_model_summary1<- merge(base_model_summary, frozen_df, by = c("Taxa", "Variable"))
base_model_summary2<- merge(base_model_summary1, freezedried_df, by = c("Taxa", "Variable"))

base_model_summary2$Sig_robust<- base_model_summary2$P.val_frozen < 0.05 & base_model_summary2$P.val_freezedried < 0.05

This chunk took 0 minutes

Fig. 5b and c: Fixed-effect models including biological predictors

  • Add biological predictors to models and remove temporal smooths
  • Variance partitioning by comparing R^2 of full and reduced models.
# make empty list to store results
  
full_model_summary_list<-list()


for (i in 1:length(phenotypes)){
  
  tryCatch({ #catch errors
    
    #print(i)
    
    metadata<-data.frame(sample_data(core_genus))
   # str(metadata)
    asvtable<-data.frame(t(otu_table(core_genus)))
    asvtable_log<-  log10(asvtable+1)
    head(asvtable_log)
    names(asvtable_log)<-taxanames
    metadata_all<-merge(metadata, asvtable_log, by = 0)
    row.names(metadata_all)<-metadata_all$feature.id

    
        #scale variables
    metadata_all$Seq_depth_nospikein<-as.numeric(scale(metadata_all$Seq_depth_nospikein))
    metadata_all$HoursTilFrozen<-as.numeric(scale(metadata_all$HoursTilFrozen))
    
    metadata_all$TotalAbundance<-log10(metadata_all$TotalAbundance+1) #log10 transform load
    metadata_all$Observed<-scale(metadata_all$Observed) # scale observed
    metadata_all$Shannon<-scale(metadata_all$Shannon) # scale shannon
    
    metadata_all$Condition_weight<-as.numeric(scale(metadata_all$Condition_weight))
    metadata_all$Temp_min<-as.numeric(scale(metadata_all$Temp_min))
    metadata_all$Temp_max<-as.numeric(scale(metadata_all$Temp_max))
    metadata_all$HourlyTemp<-as.numeric(scale(metadata_all$HourlyTemp))
    metadata_all$HoursSinceForagingStart<-as.numeric(scale(log10(metadata_all$HoursSinceForagingStart+0.1)))
    metadata_all$sum_rainfall_month<-as.numeric(scale(log10(metadata_all$sum_rainfall_month+0.1)))
    metadata_all$SocialStatus<-factor(metadata_all$SocialStatus)
    

# fit gamm minus temporal variables
    m_taxa <- mgcv::gamm(metadata_all[, phenotypes[i]]~
       # s(HoursAfterSunrise, bs = "cr")+
      #  s(month, bs = "cc")+
      #  s(AgeAtSamplingYears, bs = "cr")+
        s(Seq_depth_nospikein, bs = "cr")+
        Storage+
        Seq_run+
        s(IndividID, bs = "re")+
        s(GroupAtSampling, bs = "re")+
        HoursTilFrozen+
        Sex+
        SocialStatus+
        Condition_weight+
        HoursSinceForagingStart+
        HourlyTemp+
        Temp_max+
        Temp_min+
        sum_rainfall_month,
      correlation = corARMA(form = ~ 1|Year, p = 3),
      data=metadata_all,
      family = gaussian)
    
    summary<- summary(m_taxa$gam) 

    
    fixed_terms<-data.frame(extract_fixed(m_taxa$gam))[,c(2,5:7)] ## need to load the function here
    row.names(fixed_terms)<-extract_fixed(m_taxa$gam)$term
    names(fixed_terms)<-c("Effect", "P.val", "lower_CI", "upper_CI")
    
    smoothed_terms<-data.frame(summary$s.table)
    names(smoothed_terms)<-c("lower_CI", "upper_CI", "Effect", "P.val")
    smoothed_terms$lower_CI <-0
    smoothed_terms$upper_CI <-0
    
    results_df<-rbind(fixed_terms, smoothed_terms)
    results_df$R.sq<-summary$r.sq
    results_df$Effect_abs<-abs( results_df$Effect)
    results_df$Variable<-row.names(results_df)
    results_df$Taxa<-phenotypes[i]
    
    
    #### model with only methods
    
    m_taxa_methods <- mgcv::gamm(metadata_all[, phenotypes[i]]~
        s(Seq_depth_nospikein, bs = "cr")+
        Storage+
        Seq_run+
        HoursTilFrozen,
      correlation = corARMA(form = ~ 1|Year, p = 3),
      data=metadata_all,
      family = gaussian)
    
    results_df$R.sq_methods<-summary(m_taxa_methods$gam)$r.sq
    
    ### model with all terms
    
        m_taxa_full <- mgcv::gamm(metadata_all[, phenotypes[i]]~
        s(HoursAfterSunrise, bs = "cr")+
        s(month, bs = "cc")+
        s(AgeAtSamplingYears, bs = "cr")+
        s(Seq_depth_nospikein, bs = "cr")+
        Storage+
        Seq_run+
        s(IndividID, bs = "re")+
        s(GroupAtSampling, bs = "re")+
        HoursTilFrozen+
        Sex+
        SocialStatus+
        Condition_weight+
        HoursSinceForagingStart+
        HourlyTemp+
        Temp_max+
        Temp_min+
        sum_rainfall_month,
      correlation = corARMA(form = ~ 1|Year, p = 3),
      data=metadata_all,
      family = gaussian)
    
    results_df$R.sq_full<-summary(m_taxa_full$gam)$r.sq
    

    full_model_summary_list[[i]]<-results_df
    
  }, error=function(e){})
}

names(full_model_summary_list)<-phenotypes

#saveRDS(full_model_summary_list, "R_OUTPUT/full_models.RDS")

This chunk took 62.82 minutes

full_model_summary<-do.call(rbind, full_model_summary_list)

full_model_summary$Significance<-full_model_summary$P.val < 0.01
full_model_summary<-subset(full_model_summary, Variable !="Intercept")

full_model_summary$Variable<- factor(full_model_summary$Variable, levels = c(
  "s(HoursAfterSunrise)", "s(month)" , "s(AgeAtSamplingYears)",   
  "s(IndividID)", "s(GroupAtSampling)",
  "sum_rainfall_month",  "Temp_max", "Temp_min", "HourlyTemp",
  "SexM",   "Condition_weight",  "SocialStatusSubordinate",    "HoursSinceForagingStart", 
  "StorageFROZEN","s(Seq_depth_nospikein)", "Seq_runRUN2", "Seq_runRUN3","Seq_runRUN4", "HoursTilFrozen"))

This chunk took 0 minutes

  • split by storage to assess robustness
full_model_storage_i<-list()
full_model_storage_k<-list()


for (k in 1:length(ps_list)){
  
  core_genus_storage<- ps_list[[k]]
 # core_genus_storage<- ps_list[[1]]
  
  for (i in 1:length(phenotypes)){
  
  tryCatch({ #catch errors
    
  #  #print(i)
    
    metadata<-data.frame(sample_data(core_genus_storage))
   # str(metadata)
    asvtable<-data.frame(t(otu_table(core_genus_storage)))
    asvtable_log<-  log10(asvtable+1)
    dim(asvtable_log)
    names(asvtable_log)<-taxanames
    metadata_all<-merge(metadata, asvtable_log, by = 0)
    row.names(metadata_all)<-metadata_all$feature.id
    
     #scale variables
    metadata_all$Seq_depth_nospikein<-as.numeric(scale(metadata_all$Seq_depth_nospikein))
    metadata_all$HoursTilFrozen<-as.numeric(scale(metadata_all$HoursTilFrozen))
    
    metadata_all$TotalAbundance<-log10(metadata_all$TotalAbundance+1) #log10 transform load
    metadata_all$Observed<-scale(metadata_all$Observed) # scale observed
    metadata_all$Shannon<-scale(metadata_all$Shannon) # scale shannon
    
    metadata_all$Condition_weight<-as.numeric(scale(metadata_all$Condition_weight))
    metadata_all$Temp_min<-as.numeric(scale(metadata_all$Temp_min))
    metadata_all$Temp_max<-as.numeric(scale(metadata_all$Temp_max))
    metadata_all$HourlyTemp<-as.numeric(scale(metadata_all$HourlyTemp))
    metadata_all$HoursSinceForagingStart<-as.numeric(scale(log10(metadata_all$HoursSinceForagingStart+0.1)))
    metadata_all$sum_rainfall_month<-as.numeric(scale(log10(metadata_all$sum_rainfall_month+0.1)))
    metadata_all$SocialStatus<-factor(metadata_all$SocialStatus)

    
    # fit gamm
    m_taxa <- mgcv::gamm(metadata_all[, phenotypes[i]]~
       # s(HoursAfterSunrise, bs = "cr")+
                           #s(month, bs = "cc")+
                           #s(AgeAtSamplingYears, bs = "cr")+
                           s(Seq_depth_nospikein, bs = "cr")+
                           #Storage+
                           Seq_run+
                           s(IndividID, bs = "re")+
                           s(GroupAtSampling, bs = "re")+
                           HoursTilFrozen+
                           Sex+
                           SocialStatus+
                           Condition_weight+
                           HoursSinceForagingStart+
                           HourlyTemp+
                           Temp_max+
                           Temp_min+
                           sum_rainfall_month,
                         correlation = corARMA(form = ~ 1|Year, p = 3),
                         data=metadata_all,
                         family = gaussian)
    
    summary<- summary(m_taxa$gam) 

    
    fixed_terms<-data.frame(extract_fixed(m_taxa$gam))[,c(2,5:7)] ## need to load the function here
    row.names(fixed_terms)<-extract_fixed(m_taxa$gam)$term
    names(fixed_terms)<-c("Effect", "P.val", "lower_CI", "upper_CI")
    
    smoothed_terms<-data.frame(summary$s.table)
    names(smoothed_terms)<-c("lower_CI", "upper_CI", "Effect", "P.val")
    smoothed_terms$lower_CI <-0
    smoothed_terms$upper_CI <-0
    
    results_df<-rbind(fixed_terms, smoothed_terms)
    results_df$R.sq<-summary$r.sq
    results_df$Effect_abs<-abs( results_df$Effect)
    results_df$Variable<-row.names(results_df)
    results_df$Taxa<-phenotypes[i]
    results_df$Type <- names(ps_list[k])
    
    full_model_storage_i[[i]]<-results_df
    
  }, error=function(e){})
    
  }
    
    full_model_storage_df<-do.call(rbind, full_model_storage_i)
    
    full_model_storage_k[[k]] <-full_model_storage_df
    
}

full_model_storage_k_df<-do.call(rbind, full_model_storage_k)

#saveRDS(full_model_storage_k_df, "R_OUTPUT/full_models_by_storage.RDS")

#table(full_model_storage_k_df$Taxa)

This chunk took 12.03 minutes

  • Add whether phenotype is robust
full_model_summary_frozen<-subset(full_model_storage_k_df, Type == "FROZEN")
full_model_summary_freezedried<-subset(full_model_storage_k_df, Type == "FREEZEDRIED")

frozen_df<-full_model_summary_frozen[,c("Taxa", "Variable", "P.val")]
names(frozen_df)[3]<-"P.val_frozen"
freezedried_df<-full_model_summary_freezedried[,c("Taxa", "Variable", "P.val")]
names(freezedried_df)[3]<-"P.val_freezedried"


full_model_summary1<- merge(full_model_summary, frozen_df, by = c("Taxa", "Variable"))
full_model_summary2<- merge(full_model_summary1, freezedried_df, by = c("Taxa", "Variable"))

full_model_summary2$Sig_robust<- full_model_summary2$P.val_frozen < 0.05 & full_model_summary2$P.val_freezedried < 0.05

#full_model_summary2

This chunk took 0 minutes

Merge all model stats

#full_model_df$R.sq_partial<-full_model_df$R.sq - full_model_df$R.sq_methods
full_model_summary2$r2_fixed <-full_model_summary2$R.sq - full_model_summary2$R.sq_methods
full_model_summary2$r2_smooth <-full_model_summary2$R.sq_full - full_model_summary2$R.sq


### order taxa 
full_model_summary2$Taxa <- factor(full_model_summary2$Taxa, levels = c(rev(top_genera[-17]), rev(diversity_names)))
unique(full_model_summary2$Taxa)
##  [1] [Ruminococcus] torques group Bacteroides                 
##  [3] Blautia                      Cellulomonas                
##  [5] Clostridium sensu stricto 1  Enterococcus                
##  [7] Eubacterium                  Helicobacter                
##  [9] Lachnoclostridium            Lactococcus                 
## [11] MDS1_UW                      MDS1_W                      
## [13] Observed                     Paeniclostridium            
## [15] Peptococcus                  Raoultibacter               
## [17] Romboutsia                   Shannon                     
## [19] TotalAbundance               Unknown Bacillaceae         
## [21] Unknown Lachnospiraceae     
## 21 Levels: Eubacterium Cellulomonas Raoultibacter ... TotalAbundance
#full_model_summary2$Taxa <- factor(full_model_summary2$Taxa, levels = c(rev(top_genera[-16])))

### add third significance group

full_model_summary2$Sig_robust <-ifelse(full_model_summary2$Sig_robust == FALSE & full_model_summary2$P.val < 0.05, "Overall significant", as.character(full_model_summary2$Sig_robust))
full_model_df<-full_model_summary2

## base models 
base_model_summary2$Taxa <- factor(base_model_summary2$Taxa, levels = c(rev(top_genera[-17]), rev(diversity_names)))

### add third significance group

base_model_summary2$Sig_robust <-ifelse(base_model_summary2$Sig_robust == FALSE & base_model_summary2$P.val < 0.05, "Overall significant", as.character(base_model_summary2$Sig_robust))

base_model_summary_df<-base_model_summary2

This chunk took 0 minutes

Plotting estimates

fixed_df<-subset(full_model_df, 
              #   Variable != "s(HoursAfterSunrise)" & 
              #     Variable != "s(month)" & 
              #     Variable != "s(AgeAtSamplingYears)"& 
                   Variable != "s(IndividID)"& 
                  Variable != "s(GroupAtSampling)" &
                  Variable != "s(Seq_depth_nospikein)"& 
                   Variable != "Storage" &
                   Variable != "Seq_runRUN2" &
                   Variable != "Seq_runRUN3" &
                   Variable != "Seq_runRUN4" &
                   Variable != "StorageFROZEN" &
                   Variable != "HoursTilFrozen")

fixed_df$Variable<-factor(fixed_df$Variable)
unique(fixed_df$Variable)
## [1] Condition_weight        HourlyTemp              HoursSinceForagingStart
## [4] SexM                    SocialStatusSubordinate sum_rainfall_month     
## [7] Temp_max                Temp_min               
## 8 Levels: sum_rainfall_month Temp_max Temp_min HourlyTemp ... HoursSinceForagingStart
levels(fixed_df$Variable) <- c("Rainfall", "Max. temp.", "Min. temp.", 
                              "Sampling temp.", 
                               "Sex (Male)", "Body condition", "Social status (Sub.)",  "Time foraging")


################


smooth_df<-subset(base_model_summary_df, Variable == "s(HoursAfterSunrise)" |  Variable == "s(month)" |  Variable == "s(AgeAtSamplingYears)")

smooth_df$Variable<-factor(smooth_df$Variable, levels = c("s(HoursAfterSunrise)", "s(month)", "s(AgeAtSamplingYears)"))

levels(smooth_df$Variable) <-c("Hours after sunrise", "Month", "Age (years)")



P1<- ggplot(smooth_df, aes(x = Effect, y = Taxa, fill = Sig_robust))+
  theme_bw(base_size = 12)+
  #geom_errorbarh(aes(xmin=upper_CI, xmax=lower_CI, height = 0), alpha = 0.5, col = "black", size = 1)+
  geom_point( pch = 21, alpha = 0.9, col = "black", size = 2)+
  facet_wrap(~Variable, nrow = 1, labeller = label_wrap_gen(width=10))+
  scale_size(range = c(1.5,4))+
  scale_fill_manual(values = c("darkgray" , "deepskyblue", "blue"))+
  scale_color_manual(values = c("darkgray", "deepskyblue", "blue"))+
 theme(legend.position = "none") +
  theme(axis.title = element_blank())+
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))+
  theme( # remove the vertical grid lines
    panel.grid.major.x = element_blank() ,
    panel.grid.minor.x = element_blank() ,
    # explicitly set the horizontal lines (or they will disappear too)
    panel.grid.major.y = element_line( size=.05, color="lightgrey" )) + 
  theme(  plot.margin = margin(0.2, 0.1,0.5,0, "cm"))




P2<-ggplot(fixed_df, aes(x = Effect, y = Taxa, fill = Sig_robust))+
  theme_bw(base_size = 12)+
  geom_vline(xintercept = 0, alpha = 0.3, size = 0.3)+
  geom_errorbarh(aes(xmax=upper_CI+0.08, xmin=lower_CI-0.08, height = 0, col = Sig_robust), size = 1)+
  geom_point(size = 2, pch = 21, alpha = 0.9, col = "black")+
  facet_wrap(~Variable, nrow = 1, labeller = label_wrap_gen(width=10), scales = "free_x")+
  scale_size(range = c(1.5,3))+
  scale_fill_manual(values = c("darkgray","pink" , "red"))+
  scale_color_manual(values = c("darkgray","pink" , "red"))+
 theme(legend.position = "none")+
 theme(axis.title.y=element_blank(),
  axis.text.y=element_blank(),
   axis.ticks.y=element_blank())+
  theme(axis.title = element_blank())+
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))+
  theme( # remove the vertical grid lines
    panel.grid.major.x = element_blank() ,
    panel.grid.minor.x = element_blank() ,
    # explicitly set the horizontal lines (or they will disappear too)
    panel.grid.major.y = element_line( size=.05, color="lightgrey" ))


## Plot R2 contributions of methods and biological variables

r2_partial_df<-distinct(full_model_df, Taxa, .keep_all = T)
head(r2_partial_df)
R2_df<-gather(r2_partial_df, Type, R2, c("R.sq_methods","r2_fixed", "r2_smooth"), factor_key=TRUE)

R2_df$R2<-ifelse(R2_df$R2 <0, 0, R2_df$R2)

P3<-ggplot(R2_df, aes(x = R2, y = Taxa, fill = Type))+
  geom_bar(stat = "identity")+
  scale_fill_manual(values = c("grey", "red", "dodgerblue"))+
  theme_bw()+
  theme(axis.text.y = element_blank(),
        axis.title.y = element_blank())+
  theme(legend.position = "none")+
  theme(  plot.margin = margin(1.6, 0.2,0.4,0.1, "cm"))+
 # geom_vline(xintercept = 0.1, linetype = "dashed", size = 1)+
   theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
         axis.title.x = element_blank(),
         panel.grid = element_blank())


ggarrange(P1,P2,P3, ncol = 3, widths = c(3.5,5,1.5))

#ggsave("FIGURES/Figure5.pdf", width = 13, height = 6, units = "in")

This chunk took 0.05 minutes

Figure 6: Do diurnal oscillations decay with age?

taxanames<-as.character(taxa_names(core_genus))
confint_list_age<-list()
#model_stats_list<-list()


for (i in 1:length(taxanames)){
  
  tryCatch({ #catch errors
    
    #print(i)
    #taxa1<-prune_taxa("Lachnoclostridium", core_genus)
    taxa1<-prune_taxa(taxanames[i], core_genus)
    sample_data(taxa1)$taxa_i_abundance<-sample_sums(taxa1)
    metadata<-data.frame(sample_data(taxa1))
    metadata$Seq_depth_nospikein<-as.numeric(scale(metadata$Seq_depth_nospikein))
    
    m_taxa <- mgcv::gamm(log10(taxa_i_abundance+1)~
        s(HoursAfterSunrise, bs = "cr", by = AgeCat)+
        s(month, bs = "cc")+
        # s(AgeAtSamplingYears, bs = "cr")+
        s(Seq_depth_nospikein, bs = "cr")+
        Storage+
        Seq_run+
        HoursTilFrozen+
        s(IndividID, bs = "re")+
        s(GroupAtSampling, bs = "re"),
      correlation = corARMA(form = ~ 1|Year, p = 3),
      data=metadata,
      family = gaussian)
    
    confint_df_time<-confint(m_taxa$gam, parm = "s(HoursAfterSunrise)",  type = "confidence")
    confint_df_time$Taxa<-taxanames[i]
    names(confint_df_time)[3]<-"Time"
    confint_list_age[[i]]<-confint_df_time
    
  }, error=function(e){})
}

names(confint_list_age)<-taxanames

This chunk took 31.19 minutes

confint_summary_age<-do.call(rbind, confint_list_age)

unique(confint_summary_age$Taxa)
##  [1] "Helicobacter"                 "Cellulomonas"                
##  [3] "[Ruminococcus] torques group" "Unknown Lachnospiraceae"     
##  [5] "Blautia"                      "Lachnoclostridium"           
##  [7] "Bacteroides"                  "Paeniclostridium"            
##  [9] "Romboutsia"                   "Enterococcus"                
## [11] "Unknown Bacillaceae"          "Lactococcus"                 
## [13] "Clostridium sensu stricto 1"  "Peptococcus"                 
## [15] "Eubacterium"                  "Raoultibacter"
unique(confint_summary_age$AgeCat)
## [1] MIDDLEAGED OLD        YOUNG     
## Levels: MIDDLEAGED OLD YOUNG
confint_summary_age$AgeCat<-factor(confint_summary_age$AgeCat, levels = c("YOUNG", "MIDDLEAGED", "OLD"))
levels(confint_summary_age$AgeCat)<-c("Young", "Adult", "Old")

abundance_df<-data.frame(taxa_sums(core_genus))
names(abundance_df)<-"Abundance"
abundance_df$Genus<-row.names(abundance_df)

confint_summary_age$Abundance<-vlookup(confint_summary_age$Taxa, abundance_df, lookup_column = "Genus", result_column = "Abundance")
confint_summary_age<-arrange(confint_summary_age, desc(Abundance))
confint_summary_age$Taxa<-factor(confint_summary_age$Taxa, levels = rev(top_genera[-17]))

This chunk took 0 minutes

Building figure

sample_data(scaled_core)$AgeCat<-factor(sample_data(scaled_core)$AgeCat, levels = c("YOUNG", "MIDDLEAGED", "OLD"))
levels(sample_data(scaled_core)$AgeCat)<-c("Young", "Adult", "Old")

age_histogram<-ggplot(sample_data(scaled_core), aes(x = HoursAfterSunrise))+
  geom_histogram(fill = "grey", col = "black", binwidth = 0.5)+
  facet_wrap(~AgeCat, scales = "free_y")+
  theme_minimal(base_size = 14)+
  theme( strip.background = element_blank(),
    strip.text.x = element_blank())+
  theme(plot.margin = unit(c(0.2,7.4, 0,0.3), "cm"))+
  theme(axis.title.x = element_blank(), 
        axis.text.x = element_blank())+
  ylab("Frequency")+
   theme( # remove the vertical grid lines
    panel.grid.major.x = element_blank() ,
    panel.grid.minor.x = element_blank() ,
    # explicitly set the horizontal lines (or they will disappear too)
    panel.grid.major.y = element_line( size=.05, color="lightgrey" ))


####################


confint_summary_age_sig<-subset(confint_summary_age, Taxa != "Enterococcus" & Taxa != "Lactococcus" & Taxa != "Eubacterium")

unique(confint_summary_age$Taxa)
##  [1] Clostridium sensu stricto 1  Blautia                     
##  [3] Unknown Lachnospiraceae      Bacteroides                 
##  [5] [Ruminococcus] torques group Enterococcus                
##  [7] Paeniclostridium             Romboutsia                  
##  [9] Lachnoclostridium            Lactococcus                 
## [11] Peptococcus                  Helicobacter                
## [13] Unknown Bacillaceae          Cellulomonas                
## [15] Raoultibacter                Eubacterium                 
## 16 Levels: Eubacterium Cellulomonas Raoultibacter ... Clostridium sensu stricto 1
Fig6abc<-ggplot(confint_summary_age_sig, aes(x = Time, y = est, col = Taxa))+
  facet_wrap(~AgeCat)+
   geom_line( aes(col = Taxa, size = Abundance))+
  scale_size(range = c(1, 2))+
  ylim(-1.6,1.1)+
  scale_color_manual(values = rev(mypaly[c(-4, -10, -16, -17)]), labels = top_genera[c(-4, -10, -16, -17)])+
  theme_bw(base_size = 14)+
  ylab("Model estimate (log10)")+
  xlab("Hours after sunrise")+
  theme(plot.title = element_text(size=10))+
  guides(size=FALSE)+
  #guides(color = guide_legend(override.aes = list(size = 3))) +
  theme(strip.background = element_blank(),
        strip.text.x = element_blank())+
 # theme(legend.position = "none")+
  geom_hline(yintercept = 0, linetype = "dashed") +
  guides(color = guide_legend(override.aes = list( size = 4, col = mypaly[c(-4, -10, -16, -17)], 
     labels = top_genera[c(-4, -10, -16, -17)])))+
   labs(color='Genus')+
  theme(plot.margin = unit(c(0,0.2, 0.2,0.3), "cm"))

Fig6top<-ggarrange(age_histogram, Fig6abc, ncol = 1, heights = c(0.5,2))

This chunk took 0.02 minutes

A focus on five genera

  • Look more closely at Clostridium, Bacteroides, Paeniclostridium, Cellulomonas, and Raoultibacter
clostridium_age<-subset(confint_summary_age, Taxa == "Clostridium sensu stricto 1")

clostridium<-ggplot(clostridium_age, aes(x = Time, y = est, col = AgeCat))+
  geom_line(size = 1.5)+
  geom_ribbon(aes(ymin=lower, ymax=upper, fill = AgeCat), linetype=2, alpha=0.1)+
  coord_cartesian(ylim = c(-2,1.5))+
  scale_color_manual(values = c( "blue", "darkgrey", "red"))+
  scale_fill_manual(values = c( "blue", "darkgrey", "red"))+
  theme_bw(base_size = 14)+
  ylab("Model estimate (log10)")+
  xlab("")+
  # theme(legend.position = "none")+
  ggtitle("d) Clostridium")+
  theme(plot.title = element_text(size=12))+
  geom_hline(yintercept = 0, linetype = "dashed")+
  labs(col='Age category')+
    guides(fill = FALSE)


bacteroides_age<-subset(confint_summary_age, Taxa == "Bacteroides")

bacteroides<-ggplot(bacteroides_age, aes(x = Time, y = est, col = AgeCat))+
  geom_line(size = 1.5)+
  geom_ribbon(aes(ymin=lower, ymax=upper, fill = AgeCat), linetype=2, alpha=0.1)+
  coord_cartesian(ylim = c(-2,1.5))+
  scale_color_manual(values = c( "blue", "darkgrey", "red"))+
  scale_fill_manual(values = c( "blue", "darkgrey", "red"))+
  theme_bw(base_size = 14)+
  theme(axis.title.y = element_blank(), axis.text.y = element_blank())+
  xlab("")+
 # theme(legend.position = "none")+
  ggtitle("e) Bacteroides")+
  theme(plot.title = element_text(size=12)) +
  geom_hline(yintercept = 0, linetype = "dashed")+
  labs(col='Age category')+
    guides(fill = FALSE)

blautia_age<-subset(confint_summary_age, Taxa == "Blautia")

blautia<-ggplot(blautia_age, aes(x = Time, y = est, col = AgeCat))+
  geom_line(size = 1.5)+
  geom_ribbon(aes(ymin=lower, ymax=upper, fill = AgeCat), linetype=2, alpha=0.1)+
  coord_cartesian(ylim = c(-2,1.5))+
  scale_color_manual(values = c( "blue", "darkgrey", "red"))+
  scale_fill_manual(values = c( "blue", "darkgrey", "red"))+
  theme_bw(base_size = 14)+
  theme(axis.title.y = element_blank(), axis.text.y = element_blank())+
  xlab("Hours after sunrise")+
  # theme(legend.position = "none")+
  ggtitle("f) Blautia")+
  theme(plot.title = element_text(size=12))+
  geom_hline(yintercept = 0, linetype = "dashed")+
  labs(col='Age category')+
    guides(fill = FALSE)

ruminococcus_age<-subset(confint_summary_age, Taxa == "[Ruminococcus] torques group")

ruminococcus<-ggplot(ruminococcus_age, aes(x = Time, y = est, col = AgeCat))+
  geom_line(size = 1.5)+
  geom_ribbon(aes(ymin=lower, ymax=upper, fill = AgeCat), linetype=2, alpha=0.1)+
  coord_cartesian(ylim = c(-2,1.5))+
  scale_color_manual(values = c( "blue", "darkgrey", "red"))+
  scale_fill_manual(values = c( "blue", "darkgrey", "red"))+
  theme_bw(base_size = 14)+
  theme(axis.title.y = element_blank(), axis.text.y = element_blank())+
  xlab("Hours after sunrise")+
  # theme(legend.position = "none")+
  ggtitle("f) Ruminococcus")+
  theme(plot.title = element_text(size=12))+
  geom_hline(yintercept = 0, linetype = "dashed")+
  labs(col='Age category')+
    guides(fill = FALSE)

paeniclostridium_age<-subset(confint_summary_age, Taxa == "Paeniclostridium")

paeniclostridium<-ggplot(paeniclostridium_age, aes(x = Time, y = est, col = AgeCat))+
  geom_line(size = 1.5)+
  geom_ribbon(aes(ymin=lower, ymax=upper, fill = AgeCat), linetype=2, alpha=0.1)+
  coord_cartesian(ylim = c(-2,1.5))+
  scale_color_manual(values = c( "blue", "darkgrey", "red"))+
  scale_fill_manual(values = c( "blue", "darkgrey", "red"))+
  theme_bw(base_size = 14)+
  theme(axis.title.y = element_blank(), axis.text.y = element_blank())+
  xlab("")+
  # theme(legend.position = "none")+
  ggtitle("f) Paeniclostridium")+
  theme(plot.title = element_text(size=12))+
  geom_hline(yintercept = 0, linetype = "dashed")+
  labs(col='Age category')+
    guides(fill = FALSE)



cellulomonas_age<-subset(confint_summary_age, Taxa == "Cellulomonas")

cellulomonas<-ggplot(cellulomonas_age, aes(x = Time, y = est, col = AgeCat))+
  geom_line(size = 1.5)+
  geom_ribbon(aes(ymin=lower, ymax=upper, fill = AgeCat), linetype=2, alpha=0.1)+
  coord_cartesian(ylim = c(-2,1.5))+
  scale_color_manual(values = c( "blue", "darkgrey", "red"))+
  scale_fill_manual(values = c( "blue", "darkgrey", "red"))+
  theme_bw(base_size = 14)+
  theme(axis.title.y = element_blank(), axis.text.y = element_blank())+
  xlab("")+
  # theme(legend.position = "none")+
  ggtitle("g) Cellulomonas")+
  theme(plot.title = element_text(size=12))+
  geom_hline(yintercept = 0, linetype = "dashed")+
  labs(col='Age category')+
    guides(fill = FALSE)

raoultibacter_age<-subset(confint_summary_age, Taxa == "Raoultibacter")

raoultibacter<-ggplot(raoultibacter_age, aes(x = Time, y = est, col = AgeCat))+
  geom_line(size = 1.5)+
  geom_ribbon(aes(ymin=lower, ymax=upper, fill = AgeCat), linetype=2, alpha=0.1)+
  coord_cartesian(ylim = c(-2,1.5))+
  scale_color_manual(values = c( "blue", "darkgrey", "red"))+
  scale_fill_manual(values = c( "blue", "darkgrey", "red"))+
  theme_bw(base_size = 14)+
  theme(axis.title.y = element_blank(), axis.text.y = element_blank())+
  xlab("")+
  # theme(legend.position = "none")+
  ggtitle("h) Raoultibacter")+
  theme(plot.title = element_text(size=12))+
  geom_hline(yintercept = 0, linetype = "dashed")+
  labs(col='Age category')+
    guides(fill = FALSE)


Fig6def<-ggarrange(clostridium, bacteroides,paeniclostridium, cellulomonas, raoultibacter, ncol = 5,  common.legend = TRUE, legend = "right", widths = c(1.2,1,1,1,1))+
  theme(plot.margin = unit(c(0.2,3.5,0.2,0.2), "cm"))


ggarrange(Fig6top, Fig6def, ncol = 1, heights = c(1.5,1))

#ggsave("FIGURES/Figure6.pdf", width = 13, height = 8, units = "in")

This chunk took 0.09 minutes

Technical replication

meerkat_replicates<-readRDS("technical_replicate_data_phyloseq.RDS")

meerkat_replicates
## phyloseq-class experiment-level object
## otu_table()   OTU Table:          [ 3739 taxa and 32 samples ]:
## sample_data() Sample Data:        [ 32 samples by 6 sample variables ]:
## tax_table()   Taxonomy Table:     [ 3739 taxa by 7 taxonomic ranks ]:
## phy_tree()    Phylogenetic Tree:  [ 3739 tips and 3738 internal nodes ]:
## taxa are rows
### add total abundance of reads to dataframe
sample_data(meerkat_replicates)$TotalAbundance<-sample_sums(meerkat_replicates)

# estimate alpha diversity
sample_data(meerkat_replicates)$Observed<-phyloseq::estimate_richness(meerkat_replicates, measures="Observed")
sample_data(meerkat_replicates)$Observed<-sample_data(meerkat_replicates)$Observed$Observed

sample_data(meerkat_replicates)$Shannon<-phyloseq::estimate_richness(meerkat_replicates, measures="Shannon")
sample_data(meerkat_replicates)$Shannon<-sample_data(meerkat_replicates)$Shannon$Shannon

####### beta diversity 

# weighted unifrac

ord_wunifrac_mds<-ordinate(meerkat_replicates, "MDS", "wunifrac")
wunifrac_dist<-distance(meerkat_replicates, method = "wunifrac")

sample_data(meerkat_replicates)$MDS1_WU<-data.frame(ord_wunifrac_mds$vectors)$Axis.1
sample_data(meerkat_replicates)$MDS2_WU<-data.frame(ord_wunifrac_mds$vectors)$Axis.2


# unweighted

ord_uwunifrac_mds<-ordinate(meerkat_replicates, "MDS", "unifrac")
unifrac_dist<-distance(meerkat_replicates, method = "unifrac")


sample_data(meerkat_replicates)$MDS1_UU<-data.frame(ord_uwunifrac_mds$vectors)$Axis.1
sample_data(meerkat_replicates)$MDS2_UU<-data.frame(ord_uwunifrac_mds$vectors)$Axis.2

## jaccard

ord_jaccard_mds<-ordinate(meerkat_replicates, "MDS", "jaccard")
jaccard_dist<-distance(meerkat_replicates, method = "jaccard")


sample_data(meerkat_replicates)$MDS1_JC<-data.frame(ord_jaccard_mds$vectors)$Axis.1
sample_data(meerkat_replicates)$MDS2_JC<-data.frame(ord_jaccard_mds$vectors)$Axis.2

## morisita


ord_morisita_mds<-ordinate(meerkat_replicates, "MDS", "morisita")
morisita_dist<-distance(meerkat_replicates, method = "morisita")


sample_data(meerkat_replicates)$MDS1_Mor<-data.frame(ord_morisita_mds$vectors)$Axis.1
sample_data(meerkat_replicates)$MDS2_Mor<-data.frame(ord_morisita_mds$vectors)$Axis.2


replicates_df<-data.frame(sample_data(meerkat_replicates))


load_model_re<-lmer(TotalAbundance~1+(1|SampleNo), data = replicates_df)
observed_model_re<-lmer(Observed~1 + (1|SampleNo), data = replicates_df)
shannon_model_re<-lmer(Shannon~1 + (1|SampleNo), data = replicates_df)
MDS1_WU_re<-lmer(MDS1_WU~1 + (1|SampleNo), data = replicates_df)
MDS1_UU_re<-lmer(MDS1_UU~1 + (1|SampleNo), data = replicates_df)
MDS1_JC_re<-lmer(MDS2_JC~1 + (1|SampleNo), data = replicates_df)
MDS1_Mor_re<-lmer(MDS1_Mor~1 + (1|SampleNo), data = replicates_df)

## calculate contribution of sample on all diversity measures


performance::icc(load_model_re)
## # Intraclass Correlation Coefficient
## 
##      Adjusted ICC: 0.904
##   Conditional ICC: 0.904
performance::icc(observed_model_re)
## # Intraclass Correlation Coefficient
## 
##      Adjusted ICC: 0.983
##   Conditional ICC: 0.983
performance::icc(shannon_model_re)
## # Intraclass Correlation Coefficient
## 
##      Adjusted ICC: 0.971
##   Conditional ICC: 0.971
performance::icc(MDS1_WU_re)
## # Intraclass Correlation Coefficient
## 
##      Adjusted ICC: 0.996
##   Conditional ICC: 0.996
performance::icc(MDS1_UU_re)
## # Intraclass Correlation Coefficient
## 
##      Adjusted ICC: 0.988
##   Conditional ICC: 0.988
performance::icc(MDS1_JC_re)
## # Intraclass Correlation Coefficient
## 
##      Adjusted ICC: 0.989
##   Conditional ICC: 0.989
performance::icc(MDS1_Mor_re)
## # Intraclass Correlation Coefficient
## 
##      Adjusted ICC: 0.992
##   Conditional ICC: 0.992
adonis_weighted<-vegan::adonis(wunifrac_dist ~ SampleNo, data = replicates_df)
adonis_unweighted<-vegan::adonis(unifrac_dist ~ SampleNo, data = replicates_df)
adonis_jaccard<-vegan::adonis(jaccard_dist ~ SampleNo, data = replicates_df)
adonis_morisita<-vegan::adonis(morisita_dist ~ SampleNo, data = replicates_df)

adonis_weighted
## 
## Call:
## vegan::adonis(formula = wunifrac_dist ~ SampleNo, data = replicates_df) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##           Df SumsOfSqs  MeanSqs F.Model      R2 Pr(>F)    
## SampleNo  15   2.14509 0.143006  38.282 0.97289  0.001 ***
## Residuals 16   0.05977 0.003736         0.02711           
## Total     31   2.20486                  1.00000           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis_unweighted
## 
## Call:
## vegan::adonis(formula = unifrac_dist ~ SampleNo, data = replicates_df) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##           Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)    
## SampleNo  15    6.5459 0.43639  4.4011 0.80492  0.001 ***
## Residuals 16    1.5865 0.09915         0.19508           
## Total     31    8.1324                 1.00000           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis_jaccard
## 
## Call:
## vegan::adonis(formula = jaccard_dist ~ SampleNo, data = replicates_df) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##           Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)    
## SampleNo  15   11.6524 0.77683  8.9632 0.89365  0.001 ***
## Residuals 16    1.3867 0.08667         0.10635           
## Total     31   13.0391                 1.00000           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis_morisita
## 
## Call:
## vegan::adonis(formula = morisita_dist ~ SampleNo, data = replicates_df) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##           Df SumsOfSqs MeanSqs F.Model     R2 Pr(>F)    
## SampleNo  15    9.6782 0.64521  366.58 0.9971  0.001 ***
## Residuals 16    0.0282 0.00176         0.0029           
## Total     31    9.7064                 1.0000           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################### plot  #########################
####################### plot  #########################
####################### plot  #########################

### colours

mypal =     pal_lancet("lanonc", alpha = 1)(7)
mypal5<- pal_rickandmorty("schwifty")(8)
mypal6<-brewer.pal(8,"BrBG")[-1]
mypalx<-c(mypal, mypal5, mypal6)


rep_load<-ggplot(sample_data(meerkat_replicates), aes(x = fct_reorder(SampleNo, TotalAbundance, .fun = "median"), 
                                        y = TotalAbundance, group = SampleNo, shape = Storage))+
  geom_line(aes(col = SampleNo), size = 2)+
  geom_jitter( width = 0, height = 0.01, size = 4, aes(fill = SampleNo), col = "black")+
  ylab("Bacterial load estimate")+
  xlab("Sample")+
  theme_bw(base_size = 14)+
  scale_fill_manual(values = mypalx)+
  scale_color_manual(values = mypalx)+
  theme(axis.text.x = element_blank())+
  scale_shape_manual(values = c(21,24))+
  guides(fill = guide_legend(override.aes=list(shape=21)))

#### alpha diversity

rep_observed<-ggplot(sample_data(meerkat_replicates), aes(x = fct_reorder(SampleNo, Observed, .fun = "median"), 
                                                          y = Observed, group = SampleNo, shape = Storage))+
  geom_line(aes(col = SampleNo), size = 2)+
  geom_jitter(width = 0, height = 0.01, size = 4, aes(fill = SampleNo), col = "black")+
  ylab("Observed richness")+
  xlab("Sample")+
  theme_bw(base_size = 14)+
  scale_fill_manual(values = mypalx)+
  scale_color_manual(values = mypalx)+
  theme(axis.text.x = element_blank())+
  scale_shape_manual(values = c(21,24))+
  guides(fill = guide_legend(override.aes=list(shape=21)))

rep_shannon<-ggplot(sample_data(meerkat_replicates), aes(x = fct_reorder(SampleNo, Shannon, .fun = "median"), 
                                                          y = Shannon, group = SampleNo, shape = Storage))+
  geom_line(aes(col = SampleNo), size = 2)+
  geom_jitter( width = 0, height = 0.01, size = 4, aes(fill = SampleNo), col = "black")+
  ylab("Shannon index")+
  xlab("Sample")+
  theme_bw(base_size = 14)+
  scale_fill_manual(values = mypalx)+
  scale_color_manual(values = mypalx)+
  theme(axis.text.x = element_blank())+
  scale_shape_manual(values = c(21,24))+
  guides(fill = guide_legend(override.aes=list(shape=21)))

### axis 

rep_MDS1_WU<-ggplot(sample_data(meerkat_replicates), aes(x = fct_reorder(SampleNo, MDS1_WU, .fun = "median"), 
                                                         y =  MDS1_WU, group = SampleNo, shape = Storage))+
  geom_line(aes(col = SampleNo), size = 2)+
  geom_jitter( width = 0, height = 0.01, size = 4, aes(fill = SampleNo), col = "black")+
  ylab("Axis 1 Weighted Unifrac")+
  xlab("Sample")+
  theme_bw(base_size = 14)+
  scale_fill_manual(values = mypalx)+
  scale_color_manual(values = mypalx)+
  theme(axis.text.x = element_blank())+
  scale_shape_manual(values = c(21,24))+
  guides(fill = guide_legend(override.aes=list(shape=21)))



rep_MDS1_UU<-ggplot(sample_data(meerkat_replicates), aes(x = fct_reorder(SampleNo, MDS1_UU, .fun = "median"), 
                                                          y =  MDS1_UU, group = SampleNo, shape = Storage))+
  geom_line(aes(col = SampleNo), size = 2)+
  geom_jitter( width = 0, height = 0.01, size = 4, aes(fill = SampleNo), col = "black")+
  ylab("Axis 1 Unweighted Unifrac")+
  xlab("Sample")+
  theme_bw(base_size = 14)+
  scale_fill_manual(values = mypalx)+
  scale_color_manual(values = mypalx)+
  theme(axis.text.x = element_blank())+
  scale_shape_manual(values = c(21,24))+
  guides(fill = guide_legend(override.aes=list(shape=21)))


rep_MDS1_JC<-ggplot(sample_data(meerkat_replicates), aes(x = fct_reorder(SampleNo, MDS1_JC, .fun = "median"), 
                                                         y =  MDS1_JC, group = SampleNo, shape = Storage))+
  geom_line(aes(col = SampleNo), size = 2)+
  geom_jitter( width = 0, height = 0.01, size = 4, aes(fill = SampleNo), col = "black")+
  ylab("Axis 1 Jaccard")+
  xlab("Sample")+
  theme_bw(base_size = 14)+
  scale_fill_manual(values = mypalx)+
  scale_color_manual(values = mypalx)+
  theme(axis.text.x = element_blank())+
  scale_shape_manual(values = c(21,24))+
  guides(fill = guide_legend(override.aes=list(shape=21)))



rep_MDS1_Mor<-ggplot(sample_data(meerkat_replicates), aes(x = fct_reorder(SampleNo, MDS1_Mor, .fun = "median"), 
                                                         y =  MDS1_Mor, group = SampleNo, shape = Storage))+
  geom_line(aes(col = SampleNo), size = 2)+
  geom_jitter( width = 0, height = 0.01, size = 4, aes(fill = SampleNo), col = "black")+
  ylab("Axis 1 Morisita")+
  xlab("Sample")+
  theme_bw(base_size = 14)+
  scale_fill_manual(values = mypalx)+
  scale_color_manual(values = mypalx)+
  theme(axis.text.x = element_blank())+
  scale_shape_manual(values = c(21,24))+
  guides(fill = guide_legend(override.aes=list(shape=21)))

ggarrange(rep_load, rep_observed, 
            rep_MDS1_JC, 
          rep_MDS1_Mor, rep_MDS1_UU, 
          rep_MDS1_WU,
         # rep_weighted, rep_unweighted, rep_jaccard,
          nrow = 2, ncol = 3, 
          common.legend = T, legend = "right", 
          labels = c("a)", "b)", "c)", "d)", "e)", "f)"), 
          font.label = list(size = 20))

###### ordination plots


rep_weighted<-ggplot(sample_data(meerkat_replicates), aes(x = MDS1_WU, y = MDS2_WU, group = SampleNo, shape = Storage))+
  geom_line(aes(col = SampleNo), size = 2)+
  geom_point(aes(fill = SampleNo), size = 4)+
  theme_bw(base_size = 14)+
  scale_fill_manual(values = mypalx)+
  scale_color_manual(values = mypalx)+
  scale_shape_manual(values = c(21,24))+
  guides(fill = guide_legend(override.aes=list(shape=21)))+
  xlab("Axis 1 WU (36%)")+
  ylab("Axis 2 WU (13%)")


# unweighted


rep_unweighted<-ggplot(sample_data(meerkat_replicates), aes(x = MDS1_UU, y = MDS2_UU, group = SampleNo, shape = Storage))+
  geom_line(aes(col = SampleNo), size = 2)+
  geom_point(aes(fill = SampleNo), size = 4)+
  theme_bw(base_size = 14)+
  scale_fill_manual(values = mypalx)+
  scale_color_manual(values = mypalx)+
  scale_shape_manual(values = c(21,24))+
  guides(fill = guide_legend(override.aes=list(shape=21)))+
  xlab("Axis 1 UU (19%)")+
  ylab("Axis 2 UU (11%)")


## jaccard

ord_jaccard_mds$values$Relative_eig[1:4]
## [1] 0.14039363 0.09086992 0.08670688 0.07489021
rep_jaccard<-ggplot(sample_data(meerkat_replicates), aes(x = MDS1_JC, y = MDS2_JC, group = SampleNo, shape = Storage))+
  geom_line(aes(col = SampleNo), size = 2)+
  geom_point(aes(fill = SampleNo), size = 4)+
  theme_bw(base_size = 14)+
  scale_fill_manual(values = mypalx)+
  scale_color_manual(values = mypalx)+
  scale_shape_manual(values = c(21,24))+
  guides(fill = guide_legend(override.aes=list(shape=21)))+
  xlab("Axis 1 JC (14%)")+
  ylab("Axis 2 JC (9%)")

## morisita

ord_morisita_mds$values$Relative_eig[1:4]
## [1] 0.2374819 0.1885098 0.1392811 0.1335871
rep_morisita<-ggplot(sample_data(meerkat_replicates), aes(x = MDS1_Mor, y = MDS2_Mor, group = SampleNo, shape = Storage))+
  geom_line(aes(col = SampleNo), size = 2)+
  geom_point(aes(fill = SampleNo), size = 4)+
  theme_bw(base_size = 14)+
  scale_fill_manual(values = mypalx)+
  scale_color_manual(values = mypalx)+
  scale_shape_manual(values = c(21,24))+
  guides(fill = guide_legend(override.aes=list(shape=21)))+
  xlab("Axis 1 Morisita (24%)")+
  ylab("Axis 2 Morista (19%)")

ggarrange(rep_jaccard, rep_morisita, rep_unweighted, rep_weighted, ncol = 2, nrow = 2, common.legend = T, legend = "right")

This chunk took 0.19 minutes

Pilot study

## pilot study 

pilot<- readRDS("pilot_study_data_phyloseq.RDS")

pilot
## phyloseq-class experiment-level object
## otu_table()   OTU Table:          [ 899 taxa and 18 samples ]:
## sample_data() Sample Data:        [ 18 samples by 3 sample variables ]:
## tax_table()   Taxonomy Table:     [ 899 taxa by 7 taxonomic ranks ]:
## phy_tree()    Phylogenetic Tree:  [ 899 tips and 898 internal nodes ]:
## taxa are rows
pilot_rarefied<-rarefy_even_depth(pilot, sample.size = min(sample_sums(pilot)), rngsee = 100, replace = TRUE, trimOTUs=TRUE,verbose=TRUE)

core_microbiome<-speedyseq::tax_glom(pilot_rarefied, taxrank = "Family")
core_microbiome<-microbiome::core(core_microbiome, detection = 500, prevalence = 0.6)
speedyseq::plot_bar(core_microbiome, fill = "Family")+scale_fill_manual(values = mypaly)+facet_wrap(~Storage, scales = "free")

### beta diversity 

ord_wunifrac_mds<-ordinate(pilot_rarefied, "MDS", "wunifrac")
ord_unifrac_mds<-ordinate(pilot_rarefied, "MDS", "unifrac")

dist_wunifrac<-distance(pilot_rarefied, method="wunifrac")
dist_unifrac<-distance(pilot_rarefied, method="unifrac")

sample_data(pilot_rarefied)$MDS1<-data.frame(ord_wunifrac_mds$vectors)$Axis.1
sample_data(pilot_rarefied)$MDS2<-data.frame(ord_wunifrac_mds$vectors)$Axis.2
sample_data(pilot_rarefied)$MDS3<-data.frame(ord_wunifrac_mds$vectors)$Axis.3
sample_data(pilot_rarefied)$MDS1_UU<-data.frame(ord_unifrac_mds$vectors)$Axis.1
sample_data(pilot_rarefied)$MDS2_UU<-data.frame(ord_unifrac_mds$vectors)$Axis.2


ggplot(sample_data(pilot_rarefied), aes(x = MDS1, y = MDS2, group = IndividID, fill = Storage))+
  geom_line(size = 2)+
  geom_point( size = 4, pch = 21)+
  theme_bw(base_size = 14)+
  scale_fill_manual(values = mypalx)+
  scale_color_manual(values = mypalx)

ggplot(sample_data(pilot_rarefied), aes(x = MDS1_UU, y = MDS2_UU, group = IndividID, fill = Storage))+
  geom_line(size = 2)+
  geom_point( size = 4, pch = 21)+
  theme_bw(base_size = 14)+
  scale_fill_manual(values = mypalx)+
  scale_color_manual(values = mypalx)

metadata<-data.frame(sample_data(pilot_rarefied))

adonis2(dist_wunifrac~
    Storage+
    IndividID, 
    data = metadata,
     by = "margin")
adonis2(dist_unifrac~
          Storage+
          IndividID, 
        data = metadata,
        by = "margin")

This chunk took 0.04 minutes

How long did this script take?

end_time<-Sys.time()

round((end_time - start_time), 3)
## Time difference of 12.483 hours

This chunk took 0 minutes

Acknowledgements

Thanks to those who made their data and code open access for myself and others to copy and build upon. Also thanks to the many blog authors for providing explanations of code and stats that helped to build this analysis. Special thanks to Gavin Simpson for his blogs on GAMs, gratia and vegan (https://fromthebottomoftheheap.net/), which I found especially useful throughout this analysis.

Do me a favour

If you are using this data and/or code please pay it forward and put your own data and code online in a format that is easily reproducible. People who use the line ‘data/code available on reasonable request’ are takers - don’t be that person. I am a researcher that supports equal opportunities for everyone, and stand in solidarity with those negatively affected by political systems and power structures that promote sexist, racist, colonial, and homophobic attitudes.