Assessment of spatial autocorrelation

R codes to calculate Moran’s I statistic at different distances to determine if spatial autocorrelation was present in the system and at which scale for each risk-taking behavior. When spatial autocorrelation was detected, we took it into account by adapting our sampling site identity variable by grouping together sampling sites that are spatially non-independent. We runned new mixed models with this adapted version of sampling site identity as a random effect to see if different results between the models that considered or not spatial autocorrelation.

Libraries to install

library(ape)
library(pgirmess)
library(dplyr)
library(lme4)
library(MuMIn)
library(tidyverse)

Load datasets in R

df <- read.table("Behaviors.txt", header = T) # behavior measurements

LatLong <- read.table("LatLong.txt", header = T) # Coordinates for each sampling site

Sum of active defensive behaviors

Creation of dataset

We created a dataset with the mean of the behavior for each sampling site. We only kept the first behavioral test performed in the field for each turtle.

df.sp <-subset(df, Tank.Site!="T")
df.sp <- df.sp %>%
   arrange(ID, Year, JulianDay) %>%
    group_by(ID) %>%
    slice_head(n=1)

df.unique <- aggregate( active.defenses ~ Site, df.sp, mean )

We merged the coordinates with the dataset previously created

df.unique.xy <- merge(df.unique, LatLong, by = "Site", all=TRUE)

Calculating Moran’s I statistic

We tested for spatial autocorrelation by calculating Moran’s I statistic at different distance with ape and pgirmess packages.

Global index

First, we will checked if there is spatial autocorrelation along the system. If yes, we can after determine at which scale.

# create distance matrix from latitude and longitude values. 
df.dist <- as.matrix(dist(cbind(df.unique.xy$Long, df.unique.xy$Lat)))

# verify matrix dimensions identical to lat/long vector length.
dim(df.dist)

## [1] 22 22

# invert matrix values to map weights related to proximity.
samp_dists_inv <- 1/df.dist
samp_dists_inv[sapply(samp_dists_inv, is.infinite)] <- 0 #replace infinite values with zeroes.

The dimension is 22 x 22 given that we have 22 sampling sites and one measure per site.

# maximum distance between sampling sites (in meters). 
max_dist<-max(dist(cbind(df.unique.xy$Long, df.unique.xy$Lat)))
max_dist

## [1] 133332.3

This distance correspond to the distance between the two furthest sites: RR1 and CB1 sites.

moran_i_one <- ape::Moran.I(df.unique.xy$active.defenses, samp_dists_inv)   
moran_i_one

## $observed
## [1] 0.1779486
## 
## $expected
## [1] -0.04761905
## 
## $sd
## [1] 0.08173191
## 
## $p.value
## [1] 0.005782832

This test tell us that we have spatial autocorrelation in the system.

Index at different distances

# bind latitude and longitude point data vector into object.
xy2 = cbind(df.unique.xy$Long, df.unique.xy$Lat)

# create correlogram 
pgi_cor_site <- pgirmess::correlog(coords=xy2, z=df.unique.xy$active.defenses, 
                              method="Moran", nbclass=18) 
pgi_cor_site

## Moran I statistic 
##       dist.class        coef     p.value  n
##  [1,]   4925.544  0.65374234 0.003245742 30
##  [2,]  12263.076  0.05212543 0.334317372 36
##  [3,]  19600.607 -0.12617681 0.636472820 42
##  [4,]  26938.139  0.15487802 0.187433538 38
##  [5,]  34275.670  0.29256386 0.074013377 36
##  [6,]  41613.201  0.18108148 0.151562534 42
##  [7,]  48950.733 -0.19077234 0.727285768 38
##  [8,]  56288.264 -0.29117430 0.833899465 34
##  [9,]  63625.796 -0.00745516 0.426683337 28
## [10,]  70963.327  0.02020934 0.396834893 22
## [11,]  78300.859  0.05014161 0.351993066 22
## [12,]  85638.390  0.04876616 0.326846594 20
## [13,]  92975.921 -0.51305914 0.922867965 22
## [14,] 100313.453 -0.70031047 0.971662754 18
## [15,] 107650.984 -0.69533715 0.932499465 12
## [16,] 114988.516 -0.23903400 0.589588443 10
## [17,] 122326.047 -1.00800640 0.903416917  6
## [18,] 129663.579 -0.80445394 0.944656256  6

dist.class is the distance in meters between pairs of sites.

coef is the Moran’s I correlation coefficient

Given that the dataset have only one observation per site, we are not able to go under 4000 meters for the distance class. This is the maximum number of class that we can have.

Visualization of the correlogram

plot(pgi_cor_site)

Red dots are the significant correlations. The only significant correlation coefficient is the one at ~ 5 km. Thus, we need to adapt our model to consider that all sites that are located within 5 km of each other are non-independent. In our case, it’s correspond to two pairs of sites: RR3-1 with RR3-2 and RR2-2-2019 with RR2-2020. It is especially important to consider spatial autocorrelation for the sum of active defensive behaviors given that it is the only behavior of which human activities have an effect.

R Codes for this section are based from this tutorial: https://rpubs.com/Averysaurus/spatial_epi_4

Mixed model including a correction for spatial autocorrelation

Preparation of the data for the mixed models

We want to make sure that the data are in the good format for the analyses.

# Risk-taking behaviors that will be used as response variable in their respective mixed model.

df$active.defenses <- as.numeric(df$active.defenses)
df$escape <- as.numeric(df$escape)
df$log.escape <- log(df$escape + 1) # log(x+1) transformation of the escape latency to be able to use a Gaussian distribution in the mixed model.
df$emergence <- as.numeric(df$emergence)

# Predictor variables

df$Year <- as.factor(df$Year)
df$JulianDay <- as.numeric(df$JulianDay)
df$Hour.active.defenses <- as.numeric(df$Hour.active.defenses)
df$Hour.Platform <- as.numeric(df$Hour.Platform)
df$Order.trial <- as.numeric(df$Order.trial)
df$CL <- as.numeric(df$CL)
df$Wind <- as.numeric(df$Wind)
df$ID.Temperature <- as.numeric(df$ID.Temperature)

df$Distance.channel <- as.numeric(df$Distance.channel)
df$Vessels.activities.2019 <- as.numeric(df$Vessels.activities.2019)

df$House.200 <- as.numeric(df$House.200)
df$House.300 <- as.numeric(df$House.300)
df$House.400 <- as.numeric(df$House.400)

df$Urban.200 <- as.numeric(df$Urban.200)
df$Urban.600 <- as.numeric(df$Urban.600)
df$Urban.900 <- as.numeric(df$Urban.900)

All continuous predictor variables were standardized (mean zero, unit variance) before model selection.

df$JulianDay.scaled <- scale(df$JulianDay, center=TRUE, scale=TRUE)
df$Hour.active.defenses.scaled <- scale(df$Hour.active.defenses, center=TRUE, scale=TRUE)
df$Hour.Platform.scaled <- scale(df$Hour.Platform, center=TRUE, scale=TRUE)
df$Order.trial.scaled <- scale(df$Order.trial, center=TRUE, scale=TRUE)
df$CL.scaled <- scale(df$CL, center=TRUE, scale=TRUE)
df$Wind.scaled <- scale(df$Wind, center=TRUE, scale=TRUE)
df$ID.Temperature.scaled <- scale(df$ID.Temperature, center=TRUE, scale=TRUE)

df$Distance.channel.scaled <- scale(df$Distance.channel, center=TRUE, scale=TRUE)
df$Vessels.activities.2019.scaled <- scale(df$Vessels.activities.2019, center=TRUE, scale=TRUE)

df$House.200.scaled <- scale(df$House.200, center=TRUE, scale=TRUE)
df$House.300.scaled <- scale(df$House.300, center=TRUE, scale=TRUE)
df$House.400.scaled <- scale(df$House.400, center=TRUE, scale=TRUE)

df$Urban.200.scaled <- scale(df$Urban.200, center=TRUE, scale=TRUE)
df$Urban.600.scaled <- scale(df$Urban.600, center=TRUE, scale=TRUE)
df$Urban.900.scaled <- scale(df$Urban.900, center=TRUE, scale=TRUE)

In the dataset, the variable spatial.cor is the variable that take into account spatial autocorrelation. It’s correspond to sampling site identity, but, that grouped RR3-1 with RR3-2 and RR2-2-2019 with RR2-2020 to consider these two pairs of sites as the same location.

So, sampling site identity is replaced by this variable that consider spatial autocorrelation.

Verification of the assumptions with the initial model

mod.ad.full <- lmer(active.defenses ~ House.200.scaled + Distance.channel.scaled + Vessels.activities.2019.scaled + Order.trial.scaled + Sex + Tank.Site + Year + JulianDay.scaled + Hour.active.defenses.scaled + ID.Temperature.scaled + CL.scaled + (1|spatial.cor) + (1|ID), data = df, REML = TRUE, na.action=na.exclude)

summary(mod.ad.full)

## Linear mixed model fit by REML ['lmerMod']
## Formula: active.defenses ~ House.200.scaled + Distance.channel.scaled +  
##     Vessels.activities.2019.scaled + Order.trial.scaled + Sex +  
##     Tank.Site + Year + JulianDay.scaled + Hour.active.defenses.scaled +  
##     ID.Temperature.scaled + CL.scaled + (1 | spatial.cor) + (1 |      ID)
##    Data: df
## 
## REML criterion at convergence: 2264.5
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.47881 -0.52828 -0.06374  0.47668  2.85233 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  ID          (Intercept) 0.44938  0.6704  
##  spatial.cor (Intercept) 0.04379  0.2093  
##  Residual                0.32472  0.5698  
## Number of obs: 889, groups:  ID, 701; spatial.cor, 20
## 
## Fixed effects:
##                                 Estimate Std. Error t value
## (Intercept)                     0.831420   0.085567   9.717
## House.200.scaled               -0.106519   0.055101  -1.933
## Distance.channel.scaled        -0.150617   0.058653  -2.568
## Vessels.activities.2019.scaled  0.204995   0.068168   3.007
## Order.trial.scaled              0.019732   0.033280   0.593
## SexM                            0.300576   0.074484   4.035
## Tank.SiteT                     -0.200451   0.125913  -1.592
## Year2020                        0.050374   0.080185   0.628
## JulianDay.scaled               -0.006547   0.068583  -0.095
## Hour.active.defenses.scaled     0.014557   0.037260   0.391
## ID.Temperature.scaled           0.005203   0.046040   0.113
## CL.scaled                       0.118214   0.039838   2.967
## 
## Correlation of Fixed Effects:
##             (Intr) H.200. Dstn.. V..201 Ordr.. SexM   Tnk.ST Yr2020 JlnDy.
## Hs.200.scld  0.134                                                        
## Dstnc.chnn.  0.132  0.105                                                 
## Vssl..2019. -0.063 -0.070 -0.480                                          
## Ordr.trl.sc  0.179 -0.032 -0.026  0.068                                   
## SexM        -0.566  0.022 -0.016  0.008 -0.041                            
## Tank.SiteT  -0.144  0.015 -0.143  0.043  0.080 -0.039                     
## Year2020    -0.385 -0.251 -0.148  0.091 -0.128  0.038 -0.251              
## JulnDy.scld  0.113  0.213  0.255 -0.298 -0.088  0.063 -0.402  0.062       
## Hr.ctv.dfn.  0.093  0.035  0.074 -0.077  0.135  0.012 -0.542  0.182  0.145
## ID.Tmprtr.s  0.012  0.075 -0.105 -0.050 -0.005  0.006  0.367 -0.292 -0.427
## CL.scaled   -0.258 -0.052 -0.035  0.009 -0.043  0.377 -0.061  0.069  0.090
##             Hr.c.. ID.Tm.
## Hs.200.scld              
## Dstnc.chnn.              
## Vssl..2019.              
## Ordr.trl.sc              
## SexM                     
## Tank.SiteT               
## Year2020                 
## JulnDy.scld              
## Hr.ctv.dfn.              
## ID.Tmprtr.s -0.031       
## CL.scaled    0.061 -0.036

plot(mod.ad.full)

hist(residuals(mod.ad.full))

qqnorm(resid(mod.ad.full))

Everything seems fine. GVIF were already verified and the change of a random variable does not affect them.

Model selection - Random effect

First, we are testing if the identity of the turtle and the sampling site (with spatial autocorrelation) need to be added in the initial model. We are using likelihood ratio tests to test if the addition of these random variables make a significant effect on the initial model. We are using a dummy variable (same value for all the observations) to create a null mixed model to compared with the different combinations of mixed models.

REML is set at FALSE for the selection of random and fixed effects

## Null mixed model
mod.ad.null <- lmer(active.defenses ~ House.200.scaled + Distance.channel.scaled + Vessels.activities.2019.scaled + Order.trial.scaled + Sex + Tank.Site + Year + JulianDay.scaled + Hour.active.defenses.scaled + ID.Temperature.scaled + CL.scaled + (1|dummy), data = df, na.action=na.exclude, control=lmerControl(check.nlev.gtr.1="ignore"), REML = FALSE)

## only turtle identity as random variable
mod.ad.dummy.ID <- lmer(active.defenses ~ House.200.scaled + Distance.channel.scaled + Vessels.activities.2019.scaled + Order.trial.scaled + Sex + Tank.Site + Year + JulianDay.scaled + Hour.active.defenses.scaled + ID.Temperature.scaled + CL.scaled + (1|dummy) + (1|ID), data = df, control=lmerControl(check.nlev.gtr.1="ignore"), na.action=na.exclude, REML = FALSE)

## only spatial autocorrelation variable as random variable
mod.ad.dummy.spatial <- lmer(active.defenses ~ House.200.scaled + Distance.channel.scaled + Vessels.activities.2019.scaled + Order.trial.scaled + Sex + Tank.Site + Year + JulianDay.scaled + Hour.active.defenses.scaled + ID.Temperature.scaled + CL.scaled + (1|dummy) + (1|spatial.cor), data = df, control=lmerControl(check.nlev.gtr.1="ignore"), na.action=na.exclude, REML = FALSE)

## Turtle identity without the dummy variable
mod.ad.ID <- lmer(active.defenses ~ House.200.scaled + Distance.channel.scaled + Vessels.activities.2019.scaled + Order.trial.scaled + Sex + Tank.Site + Year + JulianDay.scaled + Hour.active.defenses.scaled + ID.Temperature.scaled + CL.scaled + (1|ID), data = df,  na.action=na.exclude, REML = FALSE)

## Turtle and spatial autocorrelation without the dummy variable
mod.ad.ID.spatial <- lmer(active.defenses ~ House.200.scaled + Distance.channel.scaled + Vessels.activities.2019.scaled + Order.trial.scaled + Sex + Tank.Site + Year + JulianDay.scaled + Hour.active.defenses.scaled + ID.Temperature.scaled + CL.scaled + (1|spatial.cor) + (1|ID), data = df, na.action=na.exclude, REML = FALSE)

Now, we will performed likelihood ratio tests between the mixed models. The less complex model is always first.

anova(mod.ad.null, mod.ad.dummy.ID)

anova(mod.ad.null, mod.ad.dummy.spatial)

anova(mod.ad.ID, mod.ad.ID.spatial)

According to these tests, turtle identity and the sampling site identity adjusted for spatial autocorrelation need to be keep in our models.

Model selection - Predictor variables

We are selecting the final model with the backward selection procedure. At each step, we deleted the variable with the higher p value. The deletion of each predictor variable is confirmed with a likelihood ratio test. We need to create a new dataset at each step to have only the rows with complete observations for all the preditors. It allow us to perform the likelihood ratio test that does not run between two models with not the same number of observations. In this document, compared to the MixedModels_Behaviors_CodeR script, we did not show all the steps to keep it concise. However, we followed the same steps here.

We started with the same full model:

mod.ad.full <- lmer(active.defenses ~ House.200.scaled + Distance.channel.scaled + Vessels.activities.2019.scaled + Order.trial.scaled + Sex + Tank.Site + Year + JulianDay.scaled + Hour.active.defenses.scaled + ID.Temperature.scaled + CL.scaled + (1|spatial.cor) + (1|ID), data = df, na.action=na.exclude, REML = FALSE)
summary(mod.ad.full)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: active.defenses ~ House.200.scaled + Distance.channel.scaled +  
##     Vessels.activities.2019.scaled + Order.trial.scaled + Sex +  
##     Tank.Site + Year + JulianDay.scaled + Hour.active.defenses.scaled +  
##     ID.Temperature.scaled + CL.scaled + (1 | spatial.cor) + (1 |      ID)
##    Data: df
## 
##      AIC      BIC   logLik deviance df.resid 
##   2245.4   2317.3  -1107.7   2215.4      874 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.50251 -0.54736 -0.06119  0.49942  2.86678 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  ID          (Intercept) 0.44643  0.6682  
##  spatial.cor (Intercept) 0.02343  0.1531  
##  Residual                0.32272  0.5681  
## Number of obs: 889, groups:  ID, 701; spatial.cor, 20
## 
## Fixed effects:
##                                 Estimate Std. Error t value
## (Intercept)                     0.826735   0.078395  10.546
## House.200.scaled               -0.116107   0.049519  -2.345
## Distance.channel.scaled        -0.147885   0.050080  -2.953
## Vessels.activities.2019.scaled  0.195535   0.057023   3.429
## Order.trial.scaled              0.022488   0.032764   0.686
## SexM                            0.299781   0.073855   4.059
## Tank.SiteT                     -0.182090   0.122561  -1.486
## Year2020                        0.064574   0.077589   0.832
## JulianDay.scaled                0.011709   0.062194   0.188
## Hour.active.defenses.scaled     0.018011   0.036512   0.493
## ID.Temperature.scaled           0.006366   0.044338   0.144
## CL.scaled                       0.114868   0.039364   2.918
## 
## Correlation of Fixed Effects:
##             (Intr) H.200. Dstn.. V..201 Ordr.. SexM   Tnk.ST Yr2020 JlnDy.
## Hs.200.scld  0.137                                                        
## Dstnc.chnn.  0.166  0.118                                                 
## Vssl..2019. -0.086 -0.088 -0.483                                          
## Ordr.trl.sc  0.188 -0.043 -0.033  0.083                                   
## SexM        -0.612  0.035 -0.018  0.010 -0.038                            
## Tank.SiteT  -0.152  0.006 -0.169  0.050  0.069 -0.040                     
## Year2020    -0.412 -0.256 -0.176  0.126 -0.113  0.040 -0.245              
## JulnDy.scld  0.122  0.222  0.273 -0.294 -0.105  0.071 -0.437  0.051       
## Hr.ctv.dfn.  0.094  0.049  0.081 -0.082  0.138  0.013 -0.537  0.180  0.130
## ID.Tmprtr.s  0.009  0.084 -0.118 -0.068  0.003  0.006  0.381 -0.296 -0.463
## CL.scaled   -0.276 -0.049 -0.036  0.011 -0.043  0.375 -0.062  0.070  0.105
##             Hr.c.. ID.Tm.
## Hs.200.scld              
## Dstnc.chnn.              
## Vssl..2019.              
## Ordr.trl.sc              
## SexM                     
## Tank.SiteT               
## Year2020                 
## JulnDy.scld              
## Hr.ctv.dfn.              
## ID.Tmprtr.s -0.036       
## CL.scaled    0.063 -0.047

Final model

mod.ad.final.spatial <- lmer(active.defenses ~ House.200.scaled + Distance.channel.scaled + Vessels.activities.2019.scaled + Sex + Tank.Site + CL.scaled + (1|spatial.cor) + (1|ID), data = df, REML = TRUE, na.action=na.exclude)
summary(mod.ad.final.spatial)

## Linear mixed model fit by REML ['lmerMod']
## Formula: active.defenses ~ House.200.scaled + Distance.channel.scaled +  
##     Vessels.activities.2019.scaled + Sex + Tank.Site + CL.scaled +  
##     (1 | spatial.cor) + (1 | ID)
##    Data: df
## 
## REML criterion at convergence: 2744.4
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.30633 -0.62139 -0.08848  0.52228  2.97238 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  ID          (Intercept) 0.33844  0.5818  
##  spatial.cor (Intercept) 0.05263  0.2294  
##  Residual                0.42383  0.6510  
## Number of obs: 1091, groups:  ID, 714; spatial.cor, 20
## 
## Fixed effects:
##                                Estimate Std. Error t value
## (Intercept)                     0.85624    0.07580  11.296
## House.200.scaled               -0.09769    0.05157  -1.894
## Distance.channel.scaled        -0.12227    0.05814  -2.103
## Vessels.activities.2019.scaled  0.19559    0.06587   2.969
## SexM                            0.32911    0.07000   4.701
## Tank.SiteT                     -0.21706    0.06759  -3.212
## CL.scaled                       0.11303    0.03734   3.027
## 
## Correlation of Fixed Effects:
##             (Intr) H.200. Dstn.. V..201 SexM   Tnk.ST
## Hs.200.scld  0.020                                   
## Dstnc.chnn.  0.025  0.014                            
## Vssl..2019. -0.003  0.053 -0.444                     
## SexM        -0.595  0.012 -0.025  0.028              
## Tank.SiteT  -0.117  0.045 -0.028 -0.029 -0.015       
## CL.scaled   -0.254 -0.071 -0.049  0.032  0.365  0.011

We needed to fit the REML to TRUE to calculate the summary statistics of the final model.

Calculation of the marginal and conditional variance explained by the final model

r.squaredGLMM(mod.ad.final.spatial)

##             R2m       R2c
## [1,] 0.07396558 0.5183702

marginal: only for the fixed effect. conditional: both fixed and random effect.

Calculation of the 95% confidence intervals

confint(mod.ad.final.spatial, level = 0.95, method = "Wald")

##                                      2.5 %       97.5 %
## .sig01                                  NA           NA
## .sig02                                  NA           NA
## .sigma                                  NA           NA
## (Intercept)                     0.70767922  1.004809193
## House.200.scaled               -0.19875785  0.003387839
## Distance.channel.scaled        -0.23622655 -0.008318579
## Vessels.activities.2019.scaled  0.06648181  0.324702171
## SexM                            0.19190980  0.466311225
## Tank.SiteT                     -0.34952563 -0.084591235
## CL.scaled                       0.03983751  0.186215760

We obtained very similar results compared to the original model that did not consider the presence of spatial autocorrelation. We found that the variance explained by sampling site identity was highly similar with or without the adaptation for spatial autocorrelation (without correction for spatial autocorrelation: variance = 0.051, standard deviation = 0.226; with the correction: variance = 0.053, standard deviation = 0.229). We also obtained similar results by having the same significant variables with similar effect size in the final model after model selection.

Escape latency

Creation of dataset

We created a dataset with the mean of the behavior for each sampling site. We only kept the first behavioral test performed in the field for each turtle.

df.sp <- df %>% drop_na(escape)
df.sp <-subset(df.sp, Tank.Site!="T")
df.sp <- df.sp %>%
   arrange(ID, Year, JulianDay) %>%
    group_by(ID) %>%
    slice_head(n=1)

df.unique <- aggregate(escape ~ Site, df.sp, mean )

We merged the coordinates with the dataset previously created

df.unique.xy <- merge(df.unique, LatLong, by = "Site", all=TRUE)

Calculating Moran’s I statistic

We tested for spatial autocorrelation by calculating Moran’s I statistic at different distance with ape and pgirmess packages.

Global index

First, we will checked if there is spatial autocorrelation along the system. If yes, we can after determine at which scale.

# create distance matrix from latitude and longitude values. 
df.dist <- as.matrix(dist(cbind(df.unique.xy$Long, df.unique.xy$Lat)))

# verify matrix dimensions identical to lat/long vector length.
dim(df.dist)

## [1] 22 22

# invert matrix values to map weights related to proximity.
samp_dists_inv <- 1/df.dist
samp_dists_inv[sapply(samp_dists_inv, is.infinite)] <- 0 #replace infinite values with zeroes.

The dimension is 22 x 22 given that we have 22 sampling sites and one measure per site.

# maximum distance between sampling sites (in meters). 
max_dist<-max(dist(cbind(df.unique.xy$Long, df.unique.xy$Lat)))
max_dist

## [1] 133332.3

This distance correspond to the distance between the two furthest sites: RR1 and CB1 sites.

moran_i_one <- ape::Moran.I(df.unique.xy$escape, samp_dists_inv)   
moran_i_one

## $observed
## [1] -0.01343005
## 
## $expected
## [1] -0.04761905
## 
## $sd
## [1] 0.08071731
## 
## $p.value
## [1] 0.6718833

This test tell us that we do not have spatial autocorrelation in the system for escape latency. Thus, we can stop here.

Emergence of the turtle after escaping

Creation of dataset

We created a dataset with the mean of the behavior for each sampling site. We only kept the first behavioral test performed in the field for each turtle.

df.sp <- df %>% drop_na(emergence)
df.sp <-subset(df.sp, Tank.Site!="T")
df.sp <- df.sp %>%
   arrange(ID, Year, JulianDay) %>%
    group_by(ID) %>%
    slice_head(n=1)

df.unique <- aggregate(emergence ~ Site, df.sp, mean )

We merged the coordinates with the dataset previously created

df.unique.xy <- merge(df.unique, LatLong, by = "Site", all=TRUE)

Calculating Moran’s I statistic

We tested for spatial autocorrelation by calculating Moran’s I statistic at different distance with ape and pgirmess packages.

Global index

First, we will checked if there is spatial autocorrelation along the system. If yes, we can after determine at which scale.

# create distance matrix from latitude and longitude values. 
df.dist <- as.matrix(dist(cbind(df.unique.xy$Long, df.unique.xy$Lat)))

# verify matrix dimensions identical to lat/long vector length.
dim(df.dist)

## [1] 22 22

# invert matrix values to map weights related to proximity.
samp_dists_inv <- 1/df.dist
samp_dists_inv[sapply(samp_dists_inv, is.infinite)] <- 0 #replace infinite values with zeroes.

The dimension is 22 x 22 given that we have 22 sampling sites and one measure per site.

# maximum distance between sampling sites (in meters). 
max_dist<-max(dist(cbind(df.unique.xy$Long, df.unique.xy$Lat)))
max_dist

## [1] 133332.3

This distance correspond to the distance between the two furthest sites: RR1 and CB1 sites.

moran_i_one <- ape::Moran.I(df.unique.xy$emergence, samp_dists_inv)   
moran_i_one

## $observed
## [1] 0.2256922
## 
## $expected
## [1] -0.04761905
## 
## $sd
## [1] 0.08052985
## 
## $p.value
## [1] 0.0006890164

This test tell us that we have spatial autocorrelation in the system.

Index at different distances

# bind latitude and longitude point data vector into object.
xy2 = cbind(df.unique.xy$Long, df.unique.xy$Lat)

# create correlogram 
pgi_cor_site <- pgirmess::correlog(coords=xy2, z=df.unique.xy$emergence, 
                              method="Moran", nbclass=18) 
pgi_cor_site

## Moran I statistic 
##       dist.class       coef    p.value  n
##  [1,]   4925.544  0.5265536 0.01124073 30
##  [2,]  12263.076  0.4265098 0.02307077 36
##  [3,]  19600.607  0.3428017 0.03916783 42
##  [4,]  26938.139  0.4666098 0.01184642 38
##  [5,]  34275.670  0.3077532 0.06183919 36
##  [6,]  41613.201  0.3063449 0.05315229 42
##  [7,]  48950.733 -0.2185344 0.76936446 38
##  [8,]  56288.264 -0.3218387 0.86755632 34
##  [9,]  63625.796 -0.3320568 0.84153375 28
## [10,]  70963.327 -0.1041240 0.56024045 22
## [11,]  78300.859 -0.1995201 0.67957162 22
## [12,]  85638.390 -0.5453957 0.91918655 20
## [13,]  92975.921 -0.7738750 0.99012738 22
## [14,] 100313.453 -0.7070348 0.97573166 18
## [15,] 107650.984 -0.4396790 0.80637743 12
## [16,] 114988.516 -0.7131062 0.90127054 10
## [17,] 122326.047 -0.5316877 0.75645859  6
## [18,] 129663.579 -0.5818050 0.88460531  6

dist.class is the distance in meters between pairs of sites.

coef is the Moran’s I correlation coefficient

Given that the dataset have only one observation per site, we are not able to go under 4000 meters for the distance class. This is the maximum number of class that we can have.

Visualization of the correlogram

plot(pgi_cor_site)

Red dots are the significant correlations. We can see here that spatial autocorrelation is detected up to ~ 27 km (and even 40 km). However, for this behavior, we did not previously detected an effect of any variable quantifying human disturbance and sampling site identity as a random effect. Thus, it is not relevant to adjust our model for the spatial autocorrelation detected.