---
title: 'Supplementary Information'
subtitle: 'Long-term post-fire resilience of the Ericaceous Belt, Bale Mountains, Ethiopia'
author: "Graciela Gil-Romera, Carole Adolf, Blas M. Benito, Lucas Bittner, Maria U. Johansson, David A. Grady, Henry F. Lamb, Bruk Lemma, Mekbib Fekadu, Bruno Glaser, Betelhem Mekonnen, Miguel Sevilla-Callejo, Michael Zech, Wolfgang Zech, Georg Miehe"
output:
pdf_document:
fig_caption: yes
fig_width: 9
highlight: tango
includes:
in_header: header.tex
keep_tex: yes
latex_engine: xelatex
number_sections: yes
toc: yes
toc_depth: 2
df_print: kable
html_notebook:
fig_caption: yes
highlight: haddock
toc: yes
toc_depth: 3
code_folding: show
urlcolor: blue
citation_package: natbib
---
```{r setup, echo=FALSE, message=FALSE, error=FALSE, warning=FALSE, results="hide"}
options(scipen=999)
#checking if required packages are installed, and installing them if not
#checking if required packages are installed, and installing them if not
list.of.packages <- c("png", "grid", "ggplot2", "tidyr", "viridis", "nlme", "ggdendro", "cowplot", "formatR", "rioja", "knitr")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, dep = TRUE)
#loading libraries
library(png)
library(grid)
library(ggplot2)
library(tidyr)
library(viridis)
library(nlme)
library(ggdendro)
library(cowplot)
library(formatR)
library(rioja)
library(knitr)
# setting code font size in output pdf, from https://stackoverflow.com/a/46526740
def.chunk.hook <- knitr::knit_hooks$get("chunk")
knitr::knit_hooks$set(chunk = function(x, options) {
x <- def.chunk.hook(x, options)
ifelse(options$size != "normalsize", paste0("\\", options$size,"\n\n", x, "\n\n \\normalsize"), x)
})
#trying to line-wrap code in pdf output
#from https://github.com/yihui/knitr-examples/blob/master/077-wrap-output.Rmd
knitr::opts_chunk$set(echo = TRUE, fig.pos = "h")
opts_chunk$set(tidy.opts = list(width.cutoff = 80), tidy = FALSE)
#loading functions (should be in the same folder!)
source("functions.R")
#loading data (should be in the same folder!)
load("data.RData")
#subsetting data
#removing first 60 samples
bale.data <- bale.data[61:nrow(bale.data), ]
#removing ages older than 13.7k
bale.data <- bale.data[bale.data$age <= 13.779,]
```
**Note**: The PDF version of this document only shows the most relevant code used in our analyses. The complete code can be found in the Rmd version, available at https://github.com/ggilromera/BaleFire .
#Study site and core retrieval#
Garba Guracha lake is 500 x 300 m (ca. 0,15 $km^{2}$) in size, with a maximum water depth of 6 m (reference). We recovered duplicate cores (BAL-GGU-1A, BAL-GGU-1B) using a Livingstone piston corer in February 2017, operated from a raft anchored at 5 m water depth. We reached a depth of 1680 cm (including water depth), similar to the depth previously published by Tiercelin et al. (2008). Both cores contain two sections between 12 and 13 m depth that could not be analysed as they are mainly coarse sand and gravel and could not be properly split when opening the cores or sampled. In the current study we present results from 1178 to 0 cm.
Further information on core retrieval can be found in Bittner et al.(submitted).
#Pollen and charcoal sampling#
We have analized samples between 1548 and 1472 cm and between 1178 and 0 cm, leaving out the unopen sections of the record. We analyzed thus 1525 samples from which we are presenting in this study 1179, *i.e.* from 1178 to 0 cm. We have made this decision for the sake of having a continuous record of the fire and vegetation variables.
Charcoal particles (>150 µm) have been shown to be good proxies for local (*e.g.* Clark and Royall [1] or Peters and Higuera [2]) and sometimes even regional fire regimes [3, 4]. Pollen source area (PSA) is both site and taxon-dependent as pollen productivity and dispersal patterns vary among plant species, and the size and nature of the lake and its catchment are approximately proportional to the pollen source area [5], [6]. *Erica* pollen grains are poorly dispersed, and their PSA has been proved to be essentially local [7], [8].
Charcoal particles were sampled at contiguous intervals of core GGU1B (except in the coarse sand and gravel sections), making a total of 1525 samples analysed. Charcoal particles were isolated by soaking 1- 2 cm3 sediment samples in 6% $H^{2}O^{2}$ for 48h, sieving at 150 μm and counting under a binocular microscope (×40). Charcoal counting was accomplished according to existing literature, counting opaque, angular particles [1]. We analysed and counted 275 samples of *Erica* fossil pollen grains using a modified version of the laboratory protocol of Moore (1981) and adding Lycopodium spores in a known number to estimate pollen accumulation rates (PAR) (Stockmarr, 1971). Both charcoal and pollen values were transformed to influx (accumulation rates measured as particles/$cm^{2}$ yr; CHAR and PAR respectively) in order 1) to account for the effect of different accumulation rates, 2) CHAR and PAR are better measurements of biomass burning and 3) to make them comparable when proceeding with numerical analyses.
#Chronology: age model for BAL17-GGU-1B core#
A full description of the methods used and the results obtained for the depth-age model of the GGU-IB core can be found in Bittner et al. [9]. The chronology is based on 24 radiocarbon-dated samples of bulk sediment, charcoal particles and n-alkanes from core BAL-GGU-1B and 23 samples from the surface core BAL-GGU-1A, dated by 210Pb-137Cs techniques. The age-depth model was built with the Bayesian approach implemented in the R package Bacon [10]. A summary of the samples dated and the model obtained can be found in ESM Table 1 and ESM Fig 1.
\newpage
\includepdf[pages=-]{ESM_table1.pdf}
```{r, fig.height=6, fig.cap="ESM Figure 1: Bayesian age-depth model for Garba Guracha lake (3950m asl) performed with the dates presented in ESM Table 1. More details are found in Bittner et al (7).", echo=FALSE}
grid.raster(readPNG("age_depth.png"))
```
#Clustering CHAR data#
Following Legendre and Gallagher [11], charcoal accumulation rate was square-root transformed to improve the distribution of its values, and then standardized to make them unitless.
```{r, size="small"}
#subsetting bale.data and removing NAs
bale.charcoal <- na.omit(
bale.data[, c("age", "charcoal.acc.rate")]
)
#log-transform to reduce skewness
charcoal.log <- log10(bale.charcoal$charcoal.acc.rate + 1)
#standardising for clustering
charcoal.scaled <- data.frame(scale(charcoal.log))
```
Zones for the square-root transformed CHAR and burned area sequences were calculated using constrained hierarchical clustering and checked for significance using B-Stick analysis by using the R package Rioja [11].
```{r, height=3, size="small", fig.cap="Broken stick analysis on the constrained hierarchical clustering solution provided by chclust (package rioja)."}
#Computing distance matrix for clustering with CHAR and BA only
charcoal.dist <- dist(charcoal.scaled)
#constrained hierarchical clustering
charcoal.cluster <- chclust(charcoal.dist, method="coniss")
#broken-stick analysis
charcoal.cluster.bstick <- bstick(charcoal.cluster)
```
```{r, echo=FALSE}
#finding number of groups programmatically
charcoal.cluster.bstick$minimum <- charcoal.cluster.bstick$dispersion + charcoal.cluster.bstick$bstick
number.of.groups <- charcoal.cluster.bstick[which.min(charcoal.cluster.bstick$minimum), "nGroups"]
#Separate cluster zones
charcoal.cluster.zones <- cutree(charcoal.cluster, k=number.of.groups)
#Joining grouping vector with bale.charcoal
bale.charcoal$clustering.zone <- charcoal.cluster.zones
```
\newpage
\blandscape
\thispagestyle{empty}
```{r, fig.height=5, fig.width=10, fig.cap="From top to bottom: Dendrogram of the cluster analysis showing the most relevant zones (GGU1 to GGU5); Erica PAR (grains/$cm^{2}$/year): CHAR (particles/$cm^{2}$/year); Fire number (fire/1000km2/year); FRP: Fire Radiative Power (MW/1000 $km^{2}$/year)x 103; Burned area (km2/1000$km^{2}$/year; values on top of histogram indicate mean annual burned areas per zone).", echo=FALSE, message=FALSE, warning=FALSE}
plotDiagram(pollen.data=bale.data,
fire.data=bale.charcoal,
fire.cluster=charcoal.cluster,
dendrogram=TRUE,
title="Garba Guracha (3950 masl)",
title.size=16,
text.size=6,
width=12,
height=6,
filename="Chronostratigraphical_diagram.pdf")
```
\vfill
\raisebox{0.05cm}{\makebox[\linewidth]{\thepage}}
\elandscape
```{r, echo=FALSE, message=FALSE, error=FALSE, results="hide", warning=FALSE}
#removing objects we don't need any longer
rm(charcoal.cluster, charcoal.log, fire.rotation.period, charcoal.scaled, fire.cluster.zones, fire.dist, min.age, max.age, zone, number.of.groups, fire.cluster.bstick)
#subsetting *Erica* and charcoal and rounding ages
erica <- na.omit(bale.data[, c("age","ericaceae.par")])
erica$age <- round(erica$age, 3)
char <- na.omit(bale.data[, c("age", "charcoal.acc.rate")])
char$age <- round(char$age, 3)
```
#Interpolation of charcoal into regular time intervals#
To facilitate further analyses we interpolated the charcoal record into 10 years time-intervals. We modeled charcoal as a function of age with *loess*, by finding the span value (parameters that controls de degree of smoothing) that maximized the correlation between observed and predicted charcoal values. We implemented the maximization algorithm within the function *interpolateDatasets*, which requires the definition of a time step to define the regular time intervals over which to interpolate the model result.
```{r, size="small", message=FALSE, warning=FALSE, error=FALSE, results="hide"}
#requires time to find optimum span value
char.interpolated<-interpolateDatasets(
datasets.list=list(char=char),
age.column.name="age",
interpolation.time.step=0.01 #ka
)
```
The interpolation of charcoal into 10 years time-intervals only generated 248 new samples. To improve the quality of the interpolation, we replaced the interpolated values with the observed ones where possible, as shown below.
```{r, size="small"}
#matching decimal positions in ages
char.interpolated$age<-round(char.interpolated$age, 2)
char$age <- round(char$age, 2)
#for every age in char.interpolated
for(i in char.interpolated$age){
#getting observed value for the given age
observed.i <- char[char$age == i,"charcoal.acc.rate"]
#if observed.i is not empty
if(length(observed.i) > 0){
#replacing interpolated by observed
char.interpolated[
char.interpolated$age == i,
"charcoal.acc.rate"
] <- max(observed.i) #if two or more observed values with same age
}
}
```
**Figure 4** shows the comparison between observed and interpolated charcoal time-series.
```{r, echo=FALSE, figh.height=4, fig.cap="Comparison of observed (gray) and interpolated (red) charcoal time series. Panel a shows the complete time series, while panel b shows zooms on a particular time period to show fine-grain details of the interpolation."}
#plot to compare observed and interpolated char curves
plot.1 <- ggplot() +
geom_line(data=char, aes(x=age, y=charcoal.acc.rate), col="gray70", size=0.6) +
geom_line(data=char.interpolated, aes(x=age, y=charcoal.acc.rate), color="red4", size=0.5, alpha=0.5) +
ylab("CHAR (particles/cm2 yr)") +
scale_x_continuous(breaks=seq(0, ceiling(max(bale.data$age)), by=1)) +
xlab("Age (ka BP)")
plot.1.zoom <- ggplot() +
geom_line(data=char, aes(x=age, y=charcoal.acc.rate), col="gray70", size=0.6) +
geom_line(data=char.interpolated, aes(x=age, y=charcoal.acc.rate), color="red4", size=0.5, alpha=0.5) +
ylab("") +
scale_x_continuous(breaks=seq(0, ceiling(max(bale.data$age)), by=1)) +
xlab("Age (ka BP)") +
coord_cartesian(xlim=c(12, 13.5), ylim=c(0, 20))
plot_grid(plot.1, plot.1.zoom, rel_widths = c(1, 0.6), labels = "auto")
#matching erica ages with char.interpolated ages
erica <- erica[erica$age >= min(char.interpolated$age), ]
erica <- erica[erica$age <= max(char.interpolated$age), ]
```
#Generalised Least Squares#
GLS provides robust estimates of regression parameters even when model residuals are heteroscedastic [12], a common problem of linear models fitted on time series data. We used the *gls* function of the **nlme** package to fit three different sets of models, one assessing the relationship between synchronous values of charcoal and Erica PAR, and two others assessing time-delayed relationships between charcoal and Erica, and viceversa.
##Synchronous model: concurrent effect of charcoal accumulation rate on Erica abundance##
The code below shows the procedure used to fit the synchronous model (**Equation 1**), where Erica abundance was used as response variable, and charcoal accumulation rate (interpolated to regular time) as predictor. Intercept was left free, under the assumption that under zero fire, Erica abundances would likely be higher than zero.
**Equation 1**: $$CHAR = \alpha + \beta PAR + \epsilon$$
Where:
+ $\alpha$ represents the intercept.
+ $\beta$ is the coefficient estimate.
+ $\epsilon$ is the error term.
```{r, size="small", results="hide", message=FALSE, error=FALSE, warning=FALSE}
#rounding erica ages
erica$age <- round(erica$age, 2)
#pairing samples of erica and interpolated charcoal
erica.char <- merge(erica, char.interpolated, by="age")
#fitting GLS model
erica.char.gls <- gls(ericaceae.par ~ charcoal.acc.rate,
data=erica.char)
#model summary
summary(erica.char.gls)
#pseudo R2 (gls doesn't provide R2)
erica.char.gls.R2 <- cor(erica.char$ericaceae.par,
predict(erica.char.gls))^2
```
The model showed a pseudo R-squared value (*gls* does not compute R-squared) of `r round(erica.char.gls.R2, 3)`. **Figure 5** shows the data and the fitted model.
```{r, fig.height=4.5, echo=FALSE, fig.cap="Erica abundances paired with synchronous samples of charcoal accumulation rate. Straight line shows the fit of the GLS model shown above.."}
#adding predicted and residual values to erica.char
erica.char$ericaceae.par.predicted <- predict(erica.char.gls)
#plotting erica vs char
ggplot(data=erica.char, aes(x=charcoal.acc.rate, y=ericaceae.par)) +
geom_point(shape=21, fill="gray50", color="black", size=4, alpha=0.5) +
geom_line(aes(x=charcoal.acc.rate, y=ericaceae.par.predicted), size=2, color="red4", alpha=0.6) +
xlab("CHAR (particles/cm2 yr)") +
ylab("Erica PAR (pollen grains/cm2yr)") +
ggtitle("Erica vs. CHAR") +
theme(text=element_text(size=12), plot.title=element_text(size = 16))
```
###Generating time-delayed (lagged) data##
The analysis described in the next section require the data to be organized in time-lags, which involves aligning the samples of a given response variable with antecedent values of a given predictor or predictors. In our data, if we consider first Erica abundances (PAR) to be the response variable, and interpolated charcoal accumulation rates (CHAR) to be the predictor, for a lag of 10 years, the first PAR sample, with age 0.23 ka BP, has to be paired with the antecedent CHAR sample, with age 0.24 ka BP. The process is repeated for every sample and every lag, in our case until 100 lags (1000 years) are considered.
For this study we generate two time-lagged datasets, one were Erica samples are paired with antecedent charcoal samples (named *lag.data.backward* in the code below), and one were charcoal samples are paired with antecedent Erica samples (named *lag.data.forward* in the code). We selected *backward* and *forward* as names because we were considering Erica samples as reference. Therefore, the backward dataset is to assess the effect of "past" charcoal values on Erica, while the forward dataset is to assess the effect of Erica on "future" charcoal values.
The code below uses the custom functions **backwardsLags** and **forwardLags** to generate the datasets.
```{r, size="small"}
#100 lags
lags<-1:101
#forward dataset
lag.data.forward <- forwardLags(
lags=lags,
reference.data=erica,
data.to.lag=char.interpolated
)
#backward dataset
lag.data.backward <- backwardLags(
lags=lags,
reference.data=erica,
data.to.lag=char.interpolated
)
```
**Figure 6** shows both datasets for lags 1 (10 years), 25 (250 years), 50 (500 years), 75 (750 years), and 100 (1000 years). It also shows an advance of the models that will be fitted in the following section.
```{r, echo=FALSE, fig.height=11, fig.cap="Lags 1, 25, 50, 75, , and 100 of the backward and forward datasets. Lag number can be found withing the gray strips. They have to be multiplied by 10 to convert lags into years. Lines represent linear models equivalent to those fitted in the following section."}
#preparing plotting data (8 lags only)
temp.forward <- lag.data.forward
temp.backward <- lag.data.backward
temp.forward$lag <- as.numeric(temp.forward$lag)
temp.backward$lag <- as.numeric(temp.backward$lag)
temp.forward <- temp.forward[temp.forward$lag %in% c(1, 25, 50, 75, 100),]
temp.backward <- temp.backward[temp.backward$lag%in% c(1, 25, 50, 75, 100),]
plot.past <- ggplot(data=temp.backward, aes(x=charcoal.acc.rate, y=ericaceae.par, group=lag)) +
geom_point(shape=21, fill="gray50", color="black", size=2, alpha=0.5) +
facet_wrap("lag", ncol=1) +
xlab("CHAR (particles/cm2 yr)") +
ylab("Erica PAR (pollen grains/cm2yr)") +
ggtitle("Backward (PAR ~ CHAR)") +
theme(text=element_text(size=12),
plot.title=element_text(size = 16), legend.position="none") +
geom_smooth(method = lm, size=2, color="red4", se=FALSE, aes(alpha=0.5))
plot.future <- ggplot(data=temp.forward, aes(x=ericaceae.par, y=charcoal.acc.rate)) + geom_point(shape=21, fill="gray50", color="black", size=2, alpha=0.5) +
facet_wrap("lag", ncol=1) +
xlab("Erica PAR (pollen grains/cm2yr)") +
ylab("CHAR (particles/cm2 yr)") +
ggtitle("Forward (CHAR ~ PAR)") +
theme(text=element_text(size=12), plot.title=element_text(size = 16), legend.position="none") +
geom_smooth(method = lm, size=2, color="red4", se=FALSE, aes(alpha=0.5))
print(cowplot::plot_grid(plot.past, plot.future, align="h", ncol=2))
```
##Asynchronous models: time-delayed links between Erica abundance and charcoal accumulation rate##
We aim to answer the questions: 1) Does Erica abundance have an effect on subsequent charcoal accumulation rates? 2) Are there time-delayed effects of charcoal accumulation rates on Erica abundances?;
To answer these questions we fitted two sets of *asynchronous* models described in Equations 2 and 3.
**Equation 2**: $$CHAR_{t} = \alpha + \beta PAR_{t+lag} + \epsilon$$
**Equation 3**: $$PAR_{t} = \alpha + \beta CHAR_{t+lag} + \epsilon$$
Where:
+ $t$ is the age of the given response sample.
+ $lag$, with values between 10 and 1000 (in 10 years time-steps), represents the time-span in between the response samples and the antecedent samples of the given predictor.
Equations 2 and 3 were fitted once per lag on standardized data with generalised least squares (GLS) by using the gls function of the R package nlme [31]. Pseudo R-squared and standardized coefficient estimates with their respective confidence intervals were used to assess goodness of fit. 100 null models on permutated response variables were fitted for each equation to assess statistical significance.
We fitted the models through the custom function *modelLagData*, that requires a formula, and a lagged dataset. The function automatically fits one GLS model per lag, and stores pseudo R-squared values and standardized coefficient estimates.
```{r, size="small"}
#fitting a GLS model per lag on backward datasets
backward.results <- modelLagData(
model.formula="ericaceae.par ~ charcoal.acc.rate",
lagged.data=lag.data.backward
)
#for the forward dataset
forward.results <- modelLagData(
model.formula="charcoal.acc.rate ~ ericaceae.par",
lagged.data=lag.data.forward
)
```
\newpage
To test the deviation of coefficient estimates and pseudo R-squared from random results we fitted each model 1000 times on permuted values of the response. The quantiles 0.05 and 0.95 of the resulting coefficient estimates and pseudo R-squared values were used as reference limits to differentiate random from non-random results. This operation was performed through the custom function *modelRandomLagData*, which takes a lagged dataset, a formula, and a number of iterations, and fits as many GLS models as iterations indicated, with the particularity that on each model the response is permutated. Aggregated pseudo R-squared values and standardized coefficient estimates of these models provide a robust null model to test the significance of our findings.
```{r, cache=TRUE, size="small"}
backward.results.random <- modelRandomLagData(
lagged.data=lag.data.backward,
model.formula="ericaceae.par ~ charcoal.acc.rate",
iterations=20
)
forward.results.random <- modelRandomLagData(
lagged.data=lag.data.forward,
model.formula="charcoal.acc.rate ~ ericaceae.par",
iterations=20
)
```
Coefficient estimates with their standard errors and pseudo R-squared values were extracted from each model and plotted against lagged age to facilitate the interpretation of the results. **Figure 7** shows the results of the fitted models.
```{r, echo=FALSE, fig.height=6, fig.cap="Results of the time-lagged models. Left panel represents models fitted with Equation 1, that is, the influence of antecedent values of CHAR on the pollen abundance of Erica (PAR) across time-lags up to 1000 years. The right panel represents the influence of antecedent values of Erica PAR on CHAR over the same time-lags. Yellow strips represent standardized coefficients and pseudo R-squared values obtained by models with permutated responses. Data not intersecting yellow strips is interpreted as statistically significant."}
plotModelOutput(forward.results=forward.results,
forward.results.random=forward.results.random,
backward.results=backward.results,
backward.results.random=backward.results.random,
filename="Figure_2.pdf",
width=9,
height=6,
title.size=16,
text.size=10)
```
\newpage
#Bibliography#
[1] J. S. Clark and P. D. Royall, “Local and regional sediment charcoal evidence for fire regimes in presettlement north-eastern North America,” Journal of Ecology, vol. 84, pp. 365–382, 1996.
[2] P. E. Higuera, M. E. Peters, L. B. Brubaker, and D. G. Gavin, “Understanding the origin and analysis of sediment-charcoal records with a simulation model,” Quaternary Science Reviews, vol. 26, no. 13–14, pp. 1790–1809, Jul. 2007.
[3] C. Adolf et al., “The sedimentary and remote-sensing reflection of biomass burning in Europe,” Global Ecol Biogeogr, vol. 27, no. 2, pp. 199–212, Feb. 2018.
[4] Oris, F. et al., ‘Charcoal dispersion and deposition in boreal lakes from 3 years of monitoring: Differences between local and regional fires’, Geophysical Research Letters, vol. 41, no.2, p. 6743-6752. doi: 10.1002/2014GL060984, Aug. 2014.
[5] M. B. Davis, “On the theory of pollen analysis,” 1963.
[6] M. B. Davis, “Redepostion of pollen grains in lake sediments” Limnology and Oceanography, vol. 18, no. 1, pp. 44–52, Jan. 1973.
[7] L. Schüler, A. Hemp, and H. Behling, “Relationship between vegetation and modern pollen-rain along an elevational gradient on Kilimanjaro, Tanzania,” The Holocene, p. 702-713, Mar. 2014.
[8] R. Bonnefille and G. Riollet, “The Kashiru Pollen Sequence (Burundi) Palaeoclimatic Implications for the last 40,000 yr B.P. in Tropical Africa,” Quaternary Research, vol. 30, pp. 19–35, 1988.
[9] L. Bittner, M. Bliedtner, G. Gil-Romera, David Grady, H. Lamb, B. Glaser, S. Szidat, G. Salazar, R. Zech, M. Zech, “Revisiting Lake Garba Guracha in the afro-alpine Bale Mountains of Ethiopia - rationale, chronology and geochemistry,” In prep.
[10] M. Blaauw and J. A. Christen, “Flexible paleoclimate age-depth models using an autoregressive gamma process,” Bayesian Anal., vol. 6, no. 3, pp. 457–474, Sep. 2011.
[11] P. Legendre and E. Gallagher, “Ecologically meaningful transformations for ordination of species data,” Oecologia, vol. 129, pp. 271–280, Oct. 2001.
[12] S. Juggins and M. S. Juggins, “Package ‘rioja,’” 2013.
[13] A. C. Aitken, “IV.—On Least Squares and Linear Combination of Observations,” Proceedings of the Royal Society of Edinburgh, vol. 55, pp. 42–48, 1936.