In this document we will regenerate the empirical analysis from the epm article.

The epm R package is available on CRAN. Source code, bug reporting and additional documentation can be found on the epm GitHub page.

To install the epm package from Github, you can do:

remotes::install_github('ptitle/epm')

Let’s start by loading a few R packages.

We will load the epm package of course, but also the sf package for handling spatial vector data. We will also load the ape package to read in a phylogeny a bit later. We will also load the rnaturalearth R package to access some basemaps.

library(epm)

# other spatial packages
library(sf)
library(ape)
library(rnaturalearth)

# packages for spatial statistics
library(spdep)
library(spatialreg)

First step: Read in all data types


Geographic data

Let’s load a landmass dataset from NaturalEarth. This is higher resolution than the worldmap bundled with the epm package, and will be helpful for making good-looking maps.

land <- ne_download(scale = 50, type = 'land', category = 'physical')
land <- st_as_sf(land)

For species geographic ranges, we have extracted the geographic ranges for North American squirrels from the IUCN mammals dataset, and provide this squirrel-specific dataset alongside this demo as a .rds file.

Let’s read that in.

squirrelIUCN <- readRDS('iucn_squirrels.rds')
squirrelIUCN

And let’s create a vector of the species names that are included.

squirrelSp <- sort(unique(squirrelIUCN$binomial))

Let’s now pull each of these species range polygons and store them separately in a list.

spList <- vector('list', length(squirrelSp))
names(spList) <- squirrelSp

for (i in 1:length(squirrelSp)) {
    ind <- which(squirrelIUCN$binomial == squirrelSp[i])
    spList[[i]] <- squirrelIUCN[ind,]
}

Projecting to equal area

We will generally want to work in an equal area projection, such that all grid cells represent the same area. For North America, the North America Albers Equal Area Projection is a reasonable choice.

EAproj <- '+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs'

# Let's now transform each range polygon to this projection. 
spListEA <- lapply(spList, function(x) st_transform(x, crs = EAproj))

We need to make some adjustments to account for synonymy issues. Here, we will replace Neotamias with Tamias.

names(spList) <- gsub('Neotamias', 'Tamias', names(spList))
names(spListEA) <- gsub('Neotamias', 'Tamias', names(spListEA))

We will also replace all spaces with underscores (not strictly necessary, but this will match the phylogenetic data later on).

names(spList) <- gsub('\\s+', '_', names(spList))
names(spListEA) <- gsub('\\s+', '_', names(spListEA))

Phylogeny

We will use the mammal phylogeny from Upham et al. 2019. There may be more accurate and/or more complete phylogenies for particular groups of organisms, but we will use this one in the demo, as it will be useful for most mammal groups.

We downloaded the phylogeny from https://data.vertlife.org/, and more specifically the file DNAonly_4098sp_topoFree_FBDasZhouEtAI.zip.

For this demo, we reduced this set of 10k trees to a random subset of 100, and we calculated the maximum clade credibility tree, so as to have a single tree that best represents this distribution. There are other ways to summarize tree distributions, and it may also be desirable to perform analyses across a set of trees, but that is beyond the scope of this demo.

Let’s load the MCC tree:

mammaltree <- read.tree('Upham_mammals_allExtant_mcctree.tre')
mammaltree
## 
## Phylogenetic tree with 4099 tips and 4098 internal nodes.
## 
## Tip labels:
##   Zaglossus_bruijnii, Tachyglossus_aculeatus, Ornithorhynchus_anatinus, Rhyncholestes_raphanurus, Lestoros_inca, Caenolestes_fuliginosus, ...
## Node labels:
##   1, 1, 1, 1, 1, 1, ...
## 
## Rooted; includes branch lengths.

How many taxa are shared between our ICUN range data and the phylogeny?

length(intersect(names(spListEA), mammaltree$tip.label))
## [1] 143

We could subset the phylogeny to squirrels here. However, this will also happen automatically later, so is not absolutely necessary.


Morphological data

We will read in geometric morphometric data for squirrels from Zelditch et al. 2017. The data table we are reading in here contains species means, with species names as rownames.

traits <- read.csv('squirrelShapeData.csv', stringsAsFactors = FALSE, row.names = 1, header = FALSE)
dim(traits)
## [1] 168 196
traits[1:5, 1:5]

How many taxa are shared between the IUCN range data and the the morphological data?

length(intersect(names(spListEA), rownames(traits)))
## [1] 116

We could subset the morphological data to squirrels for which we have geographic information. However, this will also happen automatically later.

And how many taxa are shared across the 3 datasets?

length(Reduce(intersect, list(names(spListEA), mammaltree$tip.label, rownames(traits))))
## [1] 115



Now we are ready to create an epmGrid object.

We will opt for 50km resolution hexagonal grids, using the percentOverlap approach for conversion to grid.

We used the function interactiveExtent() to draw our own extent. This function provided a map that we used to draw our own extent polygon, and it returned that polygon in the WKT format. We then copy/pasted that polygon definition here.

extentPoly <- "POLYGON ((-2434716 4964421, -4517026 2930550, -777888.2 -3950842, 1318106 -4062455, 2529022 -2468645, 3617975 -1827107, 3157318 1750423, 2430347 2726832, 133667 4876265, -2434716 4964421))"

Now we can create an object of class epmGrid. The function createEPMgrid() will take as input the IUCN geographic range polygons, as well as some specifications, such as resolution, cell type, approach for converting polygons to a gridded system, and whether or not we want to retain small-ranged taxa that would otherwise disappear (we do).

squirrelEPM <- createEPMgrid(spListEA, resolution = 50000, method = 'percentOverlap', percentThreshold = 0.25, cellType = 'hexagon', retainSmallRanges = TRUE, extent = extentPoly)
## 3 small-ranged species were preserved:
##  Tamias_palmeri
##  Tamiasciurus_mearnsi
##  Urocitellus_brunneus
## 
## 99 species are being dropped.

We can see a summary by just providing the name of the object

squirrelEPM
## 
##  Summary of epm object:
## 
##  metric: spRichness 
##  grid type:  hexagon 
##  number of grid cells: 9320 
##  grid resolution: 50000 by 50000 
##  projected: TRUE 
##  crs: +proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs 
## 
##  number of unique species: 87 (richness range: 1 - 13) 
##  data present: No 
##  phylogeny present: No

Adding phylogenetic and trait data

Now that we have an epmGrid object containing the geographic data, we can add in phylogenetic and morphological attributes.

We’ve already read in the phylogeny and the morphological data, so let’s add them.

squirrelEPM <- addPhylo(squirrelEPM, mammaltree)
## Warning in addPhylo(squirrelEPM, mammaltree): 4023 species were pruned from the phylogeny because they lack geographic data.
squirrelEPM <- addTraits(squirrelEPM, data = traits)
## Warning in addTraits(squirrelEPM, data = traits): 99 species were dropped from the trait data because they lack geographic data.

We got some warnings about lots of species being dropped, but that is expected, as the phylogeny and trait data contained many taxa that were either not squirrels, and/or did not occur within the extent polygon we had defined.

We can see in the summary that now it lists info regarding phylogeny and traits.

squirrelEPM
## 
##  Summary of epm object:
## 
##  metric: spRichness 
##  grid type:  hexagon 
##  number of grid cells: 9320 
##  grid resolution: 50000 by 50000 
##  projected: TRUE 
##  crs: +proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs 
## 
##  number of unique species: 87 (richness range: 1 - 13) 
##  data present: Yes 
##  number of species shared between data and grid: 69 
##  phylogeny present: Yes (1 tree) 
##  number of species shared between phylogeny and grid: 76
plot(squirrelEPM, title = 'sp richness')


Calculating grid metrics

We will now calculate a couple of diversity metrics based on morphological and phylogenetic information.

Here, we want to be sure that the geographic, morphological and phylogenetic metrics are all based on the same set of taxa.

We will therefore use the function reduceToCommonTaxa() to do just that: It will identify the set of taxa that are shared across all aspects of the epmGrid object, and will retain only those taxa.

squirrelEPM <- reduceToCommonTaxa(squirrelEPM)
squirrelEPM
## 
##  Summary of epm object:
## 
##  metric: spRichness 
##  grid type:  hexagon 
##  number of grid cells: 9120 
##  grid resolution: 50000 by 50000 
##  projected: TRUE 
##  crs: +proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs 
## 
##  number of unique species: 68 (richness range: 1 - 12) 
##  data present: Yes 
##  number of species shared between data and grid: 68 
##  phylogeny present: Yes (1 tree) 
##  number of species shared between phylogeny and grid: 68

We can see that the object now contains 68 taxa that have geographic, morphological and phylogenetic information.

Now let’s calculate some diversity metrics. We will calculate

  • morphological range, which is the morphological distance between the most dissimilar pair of species in each grid cell,
  • mean nearest-neighbor distance, which is the average similarity of each species to its most similar species in morphospace, and
  • mean patristic nearest neighbor distance, which is the average phylogenetic distance between each species and the most closely related species that also occurs in the grid cell.
morphRange <- gridMetrics(squirrelEPM, metric = 'range', verbose = FALSE)
morphMeanNNdist <- gridMetrics(squirrelEPM, metric = 'mean_NN_dist', verbose = FALSE)
phyloMinNNdist <- gridMetrics(squirrelEPM, metric = 'meanPatristicNN', verbose = FALSE)

Let’s plot them. Instead of the included low resolution worldmap, we will use the higher resolution basemap we downloaded at the beginnning of this tutorial.

Here we will pull out a subset of that map that is clipped to a 500km buffer around our epmGrid object.

# This code works for a hexagonal-cell epmGrid
landEA <- st_transform(land, EAproj)
landclip <- st_intersection(st_cast(st_geometry(landEA), 'MULTILINESTRING'), st_convex_hull(st_combine(st_buffer(squirrelEPM[[1]], dist = 500000))))
# This code works for a square-cell epmGrid
landEA <- st_transform(land, EAproj)
rasterBuff <- st_as_sf(terra::as.polygons(terra::ext(squirrelEPM[[1]])))
rasterBuff <- st_buffer(rasterBuff, dist = 500000)
st_crs(rasterBuff) <- st_crs(landEA)
landclip <- st_intersection(st_cast(st_geometry(landEA), 'MULTILINESTRING'), rasterBuff)
par(mfrow = c(1,3), xpd = NA)
plotdat <- plot(morphRange, lwd = 0.1, border = gray(0.9), use_tmap = FALSE, legend = FALSE, basemap = 'none')
plot(landclip, add = TRUE, lwd = 0.1)
addLegend(morphRange, params = plotdat, location = 'left', adj = 0.2)
title(main = 'morphological range', line = -1)

plotdat <- plot(morphMeanNNdist, lwd = 0.1, border = gray(0.9), use_tmap = FALSE, legend = FALSE, basemap = 'none')
plot(landclip, add = TRUE, lwd = 0.1)
addLegend(morphMeanNNdist, params = plotdat, location = 'left', adj = 0.2)
title(main = 'morph. NN dist', line = -1)

plotdat <- plot(phyloMinNNdist, lwd = 0.1, border = gray(0.9), use_tmap = FALSE, legend = FALSE, basemap = 'none')
plot(landclip, add = TRUE, lwd = 0.1)
addLegend(phyloMinNNdist, params = plotdat, location = 'left', adj = 0.2)
title(main = 'phylo NN dist', line = -1)

Extracting data from epmGrids

Each of these grid metrics is now its own epmGrid object. Since we would like to perform analyses on these, we will need grid cell values for these metrics at the same grid cells.

We will use the function tableFromEpmGrid() to extract values from all of these objects, at 1000 randomly selected locations. We will also include the original squirrelEPM object to get species richness.

cellDF <- tableFromEpmGrid(squirrelEPM, morphRange, morphMeanNNdist, phyloMinNNdist, minTaxCount = 2, n = 1000)
dim(cellDF)
## [1] 1000    6
# Let's rename the header squirrelEPM to spRichness.
colnames(cellDF)[3] <- 'spRichness'

head(cellDF)

In the resulting table, we can see that the function has returned the coordinates of cells, along with the values of the grid objects at those cells.


Now we can start exploring the relationships between these diversity metrics and species richness.



The first thing we would like to plot are morphological range, morphological mean NN distance, and phylogenetic mean NN distance, binned by species richness.

Since these metrics do not make sense for grid cell communities of 1 species, we will start at 2. For each value of species richness, we will store the diversity metric values found at the corresponding cells.

richnessRange <- 2:max(cellDF$spRichness)

rangeList   <- vector('list', length(richnessRange)) 
meanNNlist  <- vector('list', length(richnessRange))
phyloNNlist <- vector('list', length(richnessRange))

for (i in 1:length(richnessRange)) { 
    rangeList[[i]] <- cellDF[cellDF$spRichness == richnessRange[i], 'morphRange']
    meanNNlist[[i]] <- cellDF[cellDF$spRichness == richnessRange[i], 'morphMeanNNdist']
    phyloNNlist[[i]] <- cellDF[cellDF$spRichness == richnessRange[i], 'phyloMinNNdist']
}

Let’s create one of these plots. We will plot the one for mean morphological nearest neighbor distance.

We plot species richness on a log scale, and we will jitter the data to avoid some overplotting, and we will also plot the mean for each value of species richness.

plot.new()
plot.window(xlim = log(range(richnessRange)), ylim = range(meanNNlist))
axis(1, at = log(richnessRange), labels = richnessRange, lwd = 0, lwd.ticks = 1)
mtext('log species richness', side = 1, line = 2.5)
axis(2, lwd = 0, lwd.ticks = 1)
mtext('shape NN dist', side = 2, line = 2.5)
box(which = 'plot', bty = 'l')

# plot the points
points(jitter(log(cellDF$spRichness), factor = 0.5), cellDF$morphMeanNNdist, col = adjustcolor('coral', alpha.f = 0.5))

# add per-richness means 
points(log(richnessRange), sapply(meanNNlist, mean), pch = 8, cex = 1.5)

From this graphic, we can see that as species richness increases, the more morphologically similar species in those grid cells are on average more similar in high richness cells than in low richness cells. In other words, as species richness goes up, the packing of species in the morphospace also goes up.

Let’s now fit a linear model fitted to the data.

lm1 <- lm(morphMeanNNdist ~ log(spRichness), data = cellDF)

summary(lm1)
## 
## Call:
## lm(formula = morphMeanNNdist ~ log(spRichness), data = cellDF)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.068755 -0.011055  0.001252  0.008885  0.081593 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.139126   0.002734   50.89   <2e-16 ***
## log(spRichness) -0.041125   0.001875  -21.93   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02407 on 998 degrees of freedom
## Multiple R-squared:  0.3253, Adjusted R-squared:  0.3246 
## F-statistic: 481.1 on 1 and 998 DF,  p-value: < 2.2e-16

There appears to be a significant relationship between these two variables, but it is possible that the spatial structure of the data might be biasing some of our parameters.

Accounting for spatial autocorrelation

With geographic datasets, grid cells that are geographically close will be more likely to be similar in value than cells that are further apart. This spatial autocorrelation means that neighboring grid cells may not be statistically independent, and this can potentially influence our interpretation of the relationships between variables.

First, let’s quantify the degree of spatial autocorrelation in our datasets. We will assesss the degree of spatial autocorrelation in the residuals of the linear model that we just fit between morphological nearest neighbor and log species richness.

We will plot a correlogram, which plots Moran’s I, a measure of autocorrelation, in relation to spatial distance classes. Specifically, we will assess spatial autocorrelation in the residuals of our linear model in 100km increments.

We will use the correlog function from the ncf R package.

co1 <- ncf::correlog(cellDF$x, cellDF$y, residuals(lm1), increment = 100000, resamp = 0, latlon = F, na.rm = TRUE)

plot(co1$mean.of.class, co1$correlation, type = 'b', cex = 1, ylab = 'correlation', xlab = 'distance class (m)', ylim = c(-3, 3), pch = 20)
abline(h = 0, lty = 2)
title('OLS regression')

Here a test is performed for each distance class to determine whether there is statistically significant autocorrelation. We can see that a majority of distance classes are significantly different from zero.

Given that we are focused on geographically close grid cells being similar, we will concern ourselves primarily with the positive values at smaller distance classes on the left, and we won’t worry too much about the larger negative correlations at greater distances on the right.

Spatial modeling

We will now fit a spatial simultaneous autoregressive (SAR) model. We will use functions from the spdep and spatialreg R packages. this type of model incorporates the spatial structure of the data into the error term.

The spatial structure is defined by an identification of which grid cell is considered a neighbor to another grid cell. We will test several different ways of defining these cell neighborhoods, and we will ultimately pick the best one via AIC model selection.

Let’s generate these neighbor structures.

# define distance options in meters (as our coordinates are in m)
d <- seq(50000, 250000, by = 50000)
neiStyle = 'W'

neiList <- vector('list', (length(d) * length(neiStyle)))
counter <- 1

# a couple of distance-based neighbors
for (i in 1:length(d)) {
    for (j in 1:length(neiStyle)) {
        tmp <- dnearneigh(as.matrix(cellDF[, c('x', 'y')]), d1 = 0, d2 = d[i], longlat = FALSE)
        neiList[[counter]] <- nb2listw(tmp, style=neiStyle, zero.policy = TRUE)
        names(neiList)[counter] <- paste0('dist_', d[i], '_', neiStyle[j])
        counter <- counter + 1
    }
}

# one graph-based neighborhood
for (j in 1:length(neiStyle)) {
    tmp <- graph2nb(gabrielneigh(as.matrix(cellDF[, c('x', 'y')])), sym = TRUE)
    neiList[[counter]] <- nb2listw(tmp, style=neiStyle[j], zero.policy=TRUE)
    names(neiList)[counter] <- paste0('gabriel', '_', neiStyle[j])
    counter <- counter + 1
}

# another graph-based neighborhood
for (j in 1:length(neiStyle)) {
    tmp <- graph2nb(relativeneigh(as.matrix(cellDF[, c('x', 'y')])), sym = TRUE)
    neiList[[counter]] <- nb2listw(tmp, style=neiStyle[j], zero.policy=TRUE)
    names(neiList)[counter] <- paste0('relative', '_', neiStyle[j])
    counter <- counter + 1
}

# another graph-based neighborhood
for (j in 1:length(neiStyle)) {
    tmp <-  graph2nb(soi.graph(tri2nb(as.matrix(cellDF[, c('x', 'y')])), as.matrix(cellDF[, c('x', 'y')])))
    neiList[[counter]] <- nb2listw(tmp, style=neiStyle[j], zero.policy=TRUE)
    names(neiList)[counter] <- paste0('sphereInfluence', '_', neiStyle[j])
    counter <- counter + 1
}

# k nearest neighbors
k <- seq(1, 20, by = 1)
for (i in 1:length(k)) {
    for (j in 1:length(neiStyle)) {
        tmp <- knn2nb(knearneigh(as.matrix(cellDF[, c('x', 'y')]), k = k[i], longlat = FALSE), sym = TRUE)
        neiList[[counter]] <- nb2listw(tmp, style=neiStyle[j], zero.policy = TRUE)
        names(neiList)[counter] <- paste0('kNN', k[i], '_', neiStyle[j])
        counter <- counter + 1
    }
}

Now we will define a function that, for a given fitted linear model, will fit SAR models with these different neighbor structures, and determine which is the best performing model, which will then be returned.

We will select the best model by minimizing the AIC.

SARmodelSelect <- function(lm) {

    SARmodelSelect <- as.data.frame(matrix(nrow = length(neiList), ncol = 3))
    colnames(SARmodelSelect) <- c('nb','AIC')
    for (i in 1:length(neiList)) {
        # message('\t', names(neiList)[i])
        sar_e <- errorsarlm(lm, listw = neiList[[i]], na.action = na.omit, zero.policy = TRUE)
        SARmodelSelect[i, 'nb'] <- names(neiList)[i]
        SARmodelSelect[i, 'AIC'] <- AIC(sar_e)
    }

    # add delta AIC and AIC weights
    SARmodelSelect$dAIC <- SARmodelSelect$AIC - min(SARmodelSelect$AIC)
    SARmodelSelect$wAIC = exp(-0.5 * SARmodelSelect$dAIC) / sum(exp(-0.5 * SARmodelSelect$dAIC))
    SARmodelSelect$dAIC <- round(SARmodelSelect$dAIC, 2)
    SARmodelSelect$wAIC <- round(SARmodelSelect$wAIC, 4)
    SARmodelSelect <- SARmodelSelect[order(SARmodelSelect$AIC),]
    
    # model selection via AIC
    bestNei <- neiList[[SARmodelSelect[1,1]]]

    message('\t\tbest spatial weights: ', SARmodelSelect[1,1])

    # fit best SAR model
    errorsarlm(lm, listw = bestNei, na.action = na.omit, zero.policy = TRUE)
}

Now we will apply this function to each of our 3 response variables.

rangeSAR <- SARmodelSelect(lm(morphRange ~ log(spRichness), data = cellDF))
##      best spatial weights: sphereInfluence_W
morphNNdistSAR <- SARmodelSelect(lm(morphMeanNNdist ~ log(spRichness), data = cellDF))
##      best spatial weights: sphereInfluence_W
phyloNNdistSAR <- SARmodelSelect(lm(phyloMinNNdist ~ log(spRichness), data = cellDF))
##      best spatial weights: sphereInfluence_W

Let’s again plot a correlogram, but this time we’ll show side-by-side the spatial autocorrelation in the residuals from a standard linear model, and in the residuals of the SAR model.

lm1 <- lm(morphMeanNNdist ~ log(spRichness), data = cellDF)

# Let's compare correlograms
co1 <- ncf::correlog(cellDF$x, cellDF$y, residuals(lm1), increment = 100000, resamp = 0, latlon = F, na.rm=TRUE)
co2 <- ncf::correlog(cellDF$x, cellDF$y, residuals(morphNNdistSAR), increment = 100000, resamp = 0, latlon = F, na.rm=TRUE)

par(mfrow=c(1,2))

plot(co1$mean.of.class, co1$correlation, type = 'b', cex = 1, ylab = 'correlation', xlab = 'distance class (m)', ylim = c(-3, 3), pch = 20)
abline(h = 0, lty = 2)
title('OLS regression')

plot(co2$mean.of.class, co2$correlation, type = 'b', cex = 1, ylab = 'correlation', xlab = 'distance class (m)', ylim = c(-3, 3), pch = 20)
abline(h = 0, lty = 2)
title('SAR')

We can see that the SAR model dramatically reduced the degree of spatial autocorrelation.

Let’s plot the residuals of the models geographically.

divCol <- colorRampPalette(c('#998ec3', '#f7f7f7', '#f1a340'))
pts <- st_as_sf(cellDF, coords = c('x', 'y'), crs = st_crs(EAproj))
pts$lm_residuals <- residuals(lm1)
pts$sar_residuals <- residuals(morphNNdistSAR)

plot(pts[, c('lm_residuals', 'sar_residuals')], pal = divCol, pch = 20, cex = 0.75, key.pos = 4)

Let’s look at the model summary.

summary(morphNNdistSAR, Nagelkerke = TRUE)
## 
## Call:errorsarlm(formula = lm, listw = bestNei, na.action = na.omit, 
##     zero.policy = TRUE)
## 
## Residuals:
##         Min          1Q      Median          3Q         Max 
## -0.08747089 -0.00587442  0.00044461  0.00511269  0.10963006 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                   Estimate Std. Error z value  Pr(>|z|)
## (Intercept)      0.1418007  0.0034726  40.834 < 2.2e-16
## log(spRichness) -0.0427022  0.0021729 -19.652 < 2.2e-16
## 
## Lambda: 0.72565, LR test value: 750.52, p-value: < 2.22e-16
## Asymptotic standard error: 0.018835
##     z-value: 38.526, p-value: < 2.22e-16
## Wald statistic: 1484.3, p-value: < 2.22e-16
## 
## Log likelihood: 2683.931 for error model
## ML residual variance (sigma squared): 0.00021785, (sigma: 0.01476)
## Nagelkerke pseudo-R-squared: 0.68145 
## Number of observations: 1000 
## Number of parameters estimated: 4 
## AIC: -5359.9, (AIC for lm: -4611.3)

Comparing to the linear model above, we can see that our model parameters are very similar, indicating that the spatial structure in our dataset does not appear to be dramatically biasing our understanding of the relationship between these two variables.

We can take the fitted values under the SAR models as values for our response variables where the effect of spatial autocorrelation has been removed. Let’s again organize those into lists by number of species.

richnessRange <- 2:max(cellDF$spRichness)

rangeListFitted <- vector('list', length(richnessRange))
meanNNlistFitted <- vector('list', length(richnessRange))
phyloNNlistFitted <- vector('list', length(richnessRange))

for (i in 1:length(richnessRange)) {
    rangeListFitted[[i]] <- rangeSAR$fitted.values[cellDF$spRichness == richnessRange[i]]
    meanNNlistFitted[[i]] <- morphNNdistSAR$fitted.values[cellDF$spRichness == richnessRange[i]]
    phyloNNlistFitted[[i]] <- phyloNNdistSAR$fitted.values[cellDF$spRichness == richnessRange[i]]
}

For morphological nearest neighbor, let’s compare the plot for the original data and for the SAR fitted values. We’ll plot the SAR fitted data in blue.

par(mfrow = c(1,2))

plot.new()
plot.window(xlim = log(range(richnessRange)), ylim = range(unlist(c(meanNNlist, meanNNlistFitted))))
axis(1, at = log(min(cellDF$spRichness):max(cellDF$spRichness)), labels = min(cellDF$spRichness):max(cellDF$spRichness), lwd = 0, lwd.ticks = 1)
mtext('log species richness', side = 1, line = 2.5)
axis(2, lwd = 0, lwd.ticks = 1)
mtext('raw values', side = 2, line = 2.5)
box(which = 'plot', bty = 'l')

# plot the points
points(jitter(log(cellDF$spRichness), factor = 0.5), cellDF$morphMeanNNdist, col = adjustcolor('gray70', alpha.f = 0.5))

# add per-richness means 
points(log(richnessRange), sapply(meanNNlist, mean), pch = 8)



# fitted values
plot.new()
plot.window(xlim = log(range(richnessRange)), ylim = range(unlist(c(meanNNlist, meanNNlistFitted))))
axis(1, at = log(min(cellDF$spRichness):max(cellDF$spRichness)), labels = min(cellDF$spRichness):max(cellDF$spRichness), lwd = 0, lwd.ticks = 1)
mtext('log species richness', side = 1, line = 2.5)
axis(2, lwd = 0, lwd.ticks = 1)
mtext('SAR fitted values', side = 2, line = 2.5)
box(which = 'plot', bty = 'l')

for (i in 1:length(richnessRange)) {
    points(jitter(rep(log(richnessRange[i]), length(meanNNlistFitted[[i]])), factor = 0.5), meanNNlistFitted[[i]], col = adjustcolor('gray70', alpha.f = 0.75))
}

# add per-richness means 
points(log(richnessRange), sapply(meanNNlistFitted, mean), pch = 8)