In this document we will showcase a number of features of the epm R package, via an empirical demonstration for North American chipmunks, using datasets that are bundled with the epm R package. This demo is not exhaustive in terms of what the epm package can do, but covers the main features. In a real empirical analysis, other data preparation steps will be required to generate the inputs needed for the epm package.
The epm R package is available on CRAN. Source code, bug reporting and additional documentation can be found on the epm GitHub page.
More in-depth documentation is available via the Github wiki page.
To install the epm package from Github, you can do:
remotes::install_github("ptitle/epm")
Some features in epm benefit from other R packages if they are present, but they are not required. To take full advantage of all the options in epm, consider installing the R packages tmap, data.table and spdep.
Let’s load the epm R package We will also load the sf and terra packages for handling spatial datasets, and the ape package for handling phylogenies.
library(epm)
library(sf)
library(terra)
library(ape)
For this tutorial, we have downloaded geographic range polygons from IUCN and extracted the geographic ranges for 23 chipmunk species, which are contained in a list object called tamiasPolyList
.
Geographic analyses of biodiversity patterns should preferentially be conducted with spatial data in an equal area projection, so that latitudinal effects on area do not affect diversity patterns. We have therefore projected the IUCN range polygons to the North America Albers Equal Area Projection.
Here is what the first range polygon looks like:
tamiasPolyList[[1]]
Importantly, these list items are named according to the taxon names.
names(tamiasPolyList)
## [1] "Tamias_palmeri" "Tamias_durangae" "Tamias_merriami"
## [4] "Tamias_obscurus" "Tamias_canipes" "Tamias_alpinus"
## [7] "Tamias_amoenus" "Tamias_cinereicollis" "Tamias_dorsalis"
## [10] "Tamias_minimus" "Tamias_ochrogenys" "Tamias_panamintinus"
## [13] "Tamias_quadrimaculatus" "Tamias_quadrivittatus" "Tamias_ruficaudus"
## [16] "Tamias_rufus" "Tamias_senex" "Tamias_siskiyou"
## [19] "Tamias_sonomae" "Tamias_speciosus" "Tamias_striatus"
## [22] "Tamias_townsendii" "Tamias_umbrinus"
We will work with a geometric morphometric dataset from Zelditch et al. 2017. This data table contains species means, with species names as rownames. It has 28 rows (species) and 196 columns. The object is called tamiasTraits
.
tamiasTraits[1:5, 1:5]
We will also make use of the chipmunk phylogeny from Zelditch et al. 2015. The tree is an object of class phylo
and is called tamiasTree
.
tamiasTree
##
## Phylogenetic tree with 25 tips and 24 internal nodes.
##
## Tip labels:
## Tamias_siskiyou, Tamias_amoenus, Tamias_sonomae, Tamias_senex, Tamias_townsendii, Tamias_ochrogenys, ...
##
## Rooted; includes branch lengths.
Spatial data, trait data, and/or phylogenetic data do not need to have identical species to be used in the epm R package. However it is always a good idea to examine this sort of thing to avoid any surprises down the road.
How many species are shared between the range data and the trait dataset?
length(intersect(names(tamiasPolyList), rownames(tamiasTraits)))
## [1] 23
And which species are unique to each dataset? Here, we will use the function setdiff
, which tells us what elements are in the first item, but not the second item. Therefore, the order of the inputs matters.
setdiff(names(tamiasPolyList), rownames(tamiasTraits))
## character(0)
setdiff(rownames(tamiasTraits), names(tamiasPolyList))
## [1] "Tamias_bulleri" "Tamias_sibiricus"
## [3] "Tamiasciurus_douglasii" "Tamiasciurus_hudsonicus"
## [5] "Tamiasciurus_mearnsi"
We see that all species with geographic ranges are present in the trait dataset, but there are 5 that we have trait data for but no geographic range, and this includes 3 species from the genus Tamiasciurus, which are not chipmunks!
What about with the phylogeny?
length(intersect(names(tamiasPolyList), tamiasTree$tip.label))
## [1] 23
And which species are unique to each dataset?
setdiff(names(tamiasPolyList), tamiasTree$tip.label)
## character(0)
setdiff(tamiasTree$tip.label, names(tamiasPolyList))
## [1] "Tamias_bulleri" "Tamias_sibiricus"
How many species are found across all datasets?
length(Reduce(intersect, list(names(tamiasPolyList), rownames(tamiasTraits),
tamiasTree$tip.label)))
## [1] 23
There are a number of decisions to make when creating an epmGrid object. You can choose between hexagonal or square grid cells, you can choose from two approaches for converting polygons to grids, you need to pick a reasonable and appropriate spatial resolution, you can specify a spatial extent. You need to decide how small-ranged species should be handled. Check out the Github wiki for descriptions of these options.
Drawing your own extent
The default option when creating an epmGrid object is to let the epm package determine the extent as the smallest extent needed to contain all of the geographic ranges. Sometimes, however, it is preferable to choose your own extent. A simple way to do this is to draw it on a map, using the interactiveExtent
function.
extentPoly <- interactiveExtent(tamiasPolyList)
This function will pop up a coarse species richness map, on which you will be able to click and create a polygon. Once completed, your custom polygon will be saved to the object you assigned it to. The polygon is saved in two formats:
These polygons will be in the same units as the input spatial data. As our spatial data are in an equal area projection in meters, then that is what format the returned polygons are in.
This is what the extentPoly
object looks like:
$poly
Geometry set for 1 feature
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -2848013 ymin: -1955334 xmax: 3109146 ymax: 3672902
CRS: +proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs
POLYGON ((-2263262 3672902, -2811466 2795775, -...
$wkt
[1] "POLYGON ((-2263262 3672902, -2811466 2795775, -2848013 -18343.59, -1970886 -1553317, -1422681 -1918787, -143536.2 -1955334, 1281796 -1443676, 2085830 -456907.5, 3109146 1297348, 1903095 2320664, -2263262 3672902))"
Note that if you see the message Failed to compute min/max, no valid pixels found in sampling. (GDAL error 1)
, you can safely ignore it.
We can plot it on a map to confirm that it makes sense (We will use a world map that is included with the epm package).
plot(extentPoly$poly, border = "red", lwd = 2)
plot(st_transform(epm:::worldmap, st_crs(extentPoly$poly)), add = TRUE,
lwd = 0.5)
We can copy/paste the wkt string into our R script so that in the future we can rerun this code and use the same extent.
extentWKT <- "POLYGON ((-2263262 3672902, -2811466 2795775, -2848013 -18343.59, -1970886 -1553317, -1422681 -1918787, -143536.2 -1955334, 1281796 -1443676, 2085830 -456907.5, 3109146 1297348, 1903095 2320664, -2263262 3672902))"
Now we can create an object of class epmGrid
. This is the main object type in the epm package. The function createEPMgrid()
will take as input the IUCN geographic range polygons, as well as some specifications, such as resolution, cell type, approach for converting polygons to a gridded system, and whether or not we want to retain small-ranged taxa that would otherwise disappear.
Here we will opt for hexagonal grid cells at a spatial resolution of 50km. We will use the custom extent, and we will use the percentOverlap approach for converting to grid.
tamiasEPM <- createEPMgrid(tamiasPolyList, resolution = 50000,
cellType = "hexagon", method = "percentOverlap", extent = extentWKT)
## 1 small-ranged species was preserved:
## Tamias_palmeri
We can see a summary by just providing the name of the object.
tamiasEPM
##
## Summary of epm object:
##
## metric: spRichness
## grid type: hexagon
## number of grid cells: 4919
## grid resolution: 50000 by 50000
## projected: TRUE
## crs: +proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs
##
## number of unique species: 23 (richness range: 1 - 6)
## data present: No
## phylogeny present: No
And plotting it will generate a plot of species richness.
plot(tamiasEPM)
Later, we will see how to enhance these plots and adjust various details.
Note that we could have generated a square grid epmGrid object, which uses the terra package under the hood. As a result, creation of the epmGrid object is faster and utilizes less memory.
tamiasEPM2 <- createEPMgrid(tamiasPolyList, resolution = 50000,
cellType = "square", method = "percentOverlap", extent = extentWKT)
## 1 small-ranged species was preserved:
## Tamias_palmeri
And here is the summary:
tamiasEPM2
##
## Summary of epm object:
##
## metric: spRichness
## grid type: square
## number of grid cells: 13680
## grid resolution: 50000 by 50000
## projected: TRUE
## crs: +proj=aea +lat_0=40 +lon_0=-96 +lat_1=20 +lat_2=60 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs
##
## number of unique species: 23 (richness range: 1 - 6)
## data present: No
## phylogeny present: No
We can see how specific species have been encoded in the epmGrid object. For example, what does Tamias dorsalis look like?
We’ll plot the original range polygon and then overlay the gridded version. We’ll do this for both hexagonal and square grids
par(mfrow = c(1, 2))
plot(st_geometry(tamiasPolyList[["Tamias_dorsalis"]]))
plotSpRange(tamiasEPM, taxon = "Tamias_dorsalis", alpha = 0.75,
add = TRUE)
plot(st_geometry(tamiasPolyList[["Tamias_dorsalis"]]))
plotSpRange(tamiasEPM2, taxon = "Tamias_dorsalis", alpha = 0.75,
add = TRUE)
It is also possible to build an epmGrid object from species occurrence records rather than from polygons. (Species-specific raster grids are also accepted, as is a presence/absence matrix of species at sites. See the examples in the documentation for the createEPMgrid
function.)
To demonstrate this, we will use the rgbif R package to download occurrence records for the genus Tamias. Using the rgbif package requires a bit of setup, see the documentation.
library(rgbif)
# the taxon key for genus Tamias is 2437422
# We will request records for genus Tamias, that have
# coordinates
occ_download(pred("taxonKey", 2437422), pred("hasGeospatialIssue",
FALSE), pred("hasCoordinate", TRUE), pred("occurrenceStatus",
"PRESENT"), format = "SIMPLE_CSV")
# you get a unique id number to download the results.
occdat <- occ_download_get("XXX-XXXXXXX") %>%
occ_download_import()
occdat <- as.data.frame(occdat)
Now that we have downloaded occurrence records, we will reorganize into a table of taxon names and coordinates. For a real analysis, further filtering and quality control would be pursued.
occdat <- occdat[, c("genus", "species", "decimalLongitude",
"decimalLatitude")]
# keep only records that have a species listed
occdat <- occdat[occdat$species != "", ]
# The species field is actually genus and species
occdat$gensp <- gsub("\\s+", "_", occdat$species)
occdat <- occdat[, c("gensp", "decimalLongitude", "decimalLatitude")]
# let's see how many records we have for each species
table(occdat$gensp)
##
## Eutamias_ateles Tamias_alpinus Tamias_amoenus
## 22 500 8395
## Tamias_aristus Tamias_bulleri Tamias_canipes
## 3 152 307
## Tamias_cinereicollis Tamias_dorsalis Tamias_durangae
## 886 3955 311
## Tamias_eviensis Tamias_merriami Tamias_minimus
## 1 2041 15066
## Tamias_obscurus Tamias_ochrogenys Tamias_palmeri
## 593 329 315
## Tamias_panamintinus Tamias_quadrimaculatus Tamias_quadrivittatus
## 576 505 2300
## Tamias_ruficaudus Tamias_rufus Tamias_senex
## 601 563 1451
## Tamias_sibiricus Tamias_siskiyou Tamias_sonomae
## 14004 573 1144
## Tamias_speciosus Tamias_striatus Tamias_townsendii
## 3419 31571 3735
## Tamias_umbrinus
## 3698
We will need to rename our columns so that epm recognizes them.
colnames(occdat) <- c("taxon", "long", "lat")
We will now convert this table of coordinates to a spatial points sf object. Because our coordinates are unprojected longitude/latitude, we will specify our CRS as 4326.
occSF <- st_as_sf(occdat, coords = c("long", "lat"), crs = 4326)
occSF
It is best to work with equal area projections for diversity metrics, so we will project our points to an equal area projection. We will use the same one that was used for the geographic range polygons.
EAproj <- st_crs(tamiasPolyList[[1]])
occSF <- st_transform(occSF, EAproj)
And now we can provide this spatial points object to the createEPMgrid
function. We will use the same custom extent to filter out non-North American taxa and will choose larger grid cells.
tamiasEPM_occ <- createEPMgrid(occSF, resolution = 1e+05, cellType = "hexagon",
extent = extentWKT)
## Detected 28 taxa with point data.
##
## 1 species is being dropped.
plot(tamiasEPM_occ)
Again, some quality control may be warranted. But this epmGrid object can now be used in all subsequent epm functions.
Now that we have an epmGrid object containing the geographic data, we can add in phylogenetic and morphological attributes.
We’ve already read in the phylogeny and the morphological data, so let’s add them.
tamiasEPM <- addPhylo(tamiasEPM, tamiasTree)
## Warning in addPhylo(tamiasEPM, tamiasTree): 2 species were pruned from the phylogeny because they lack geographic data.
tamiasEPM <- addTraits(tamiasEPM, data = tamiasTraits)
## Warning in addTraits(tamiasEPM, data = tamiasTraits): 5 species were dropped from the trait data because they lack geographic data.
We got some warnings about species being dropped, but that is expected.
We can see in the summary that now it lists info regarding phylogeny and traits.
tamiasEPM
##
## Summary of epm object:
##
## metric: spRichness
## grid type: hexagon
## number of grid cells: 4919
## grid resolution: 50000 by 50000
## projected: TRUE
## crs: +proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs
##
## number of unique species: 23 (richness range: 1 - 6)
## data present: Yes
## number of species shared between data and grid: 23
## phylogeny present: Yes (1 tree)
## number of species shared between phylogeny and grid: 23
At this point, it might be a good idea to write this object to file so that we don’t have to recreate it every time we want to do an analysis in a fresh R session.
# Save
write.epmGrid(tamiasEPM, "tamiasHexEPM.rds")
In a future R session, this would be read in with:
tamiasEPM <- read.epmGrid("tamiasHexEPM.rds")
We will now calculate a couple of diversity metrics based on morphological and phylogenetic information.
But first, the epmGrid object by default retains taxa that have geographic data, even if those taxa are not present in the phylogenetic or morphological datasets. This allows for greater flexibility, but may lead to potentially misleading results if not appreciated.
Here, we want to be sure that the geographic, morphological and phylogenetic metrics are all based on the same set of taxa.
We will therefore use the function reduceToCommonTaxa()
to do just that: It will identify the set of taxa that are shared across all aspects of the epmGrid object, and will retain only those taxa.
tamiasEPM <- reduceToCommonTaxa(tamiasEPM)
tamiasEPM
##
## Summary of epm object:
##
## metric: spRichness
## grid type: hexagon
## number of grid cells: 4919
## grid resolution: 50000 by 50000
## projected: TRUE
## crs: +proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs
##
## number of unique species: 23 (richness range: 1 - 6)
## data present: Yes
## number of species shared between data and grid: 23
## phylogeny present: Yes (1 tree)
## number of species shared between phylogeny and grid: 23
In this particular case, nothing looks obviously different because there was already nearly complete overlap across data types.
Now that we have an epmGrid object that contains trait and phylogenetic data, we can calculate diversity metrics. A number of metrics are available via the gridMetrics()
function, however it is also possible to implement a metric of your choosing with the customGridMetric()
function.
For instance, let’s calculate morphological mean nearest neighbor and phylogenetic mean nearest neighbor. These represent the average similarity of each species to its most similar species in morphospace and phylogenetic space.
morphNN <- gridMetrics(tamiasEPM, metric = "mean_NN_dist")
## ...dropping species that are not in trait data...
## ...calculating multivariate metric: mean_NN_dist...
phyloNN <- gridMetrics(tamiasEPM, metric = "meanPatristicNN")
## ...dropping species that are not in phylo data...
## ...calculating phylo metric: meanPatristicNN...
We will dive into plotting objects in more detail below, but note that simply running plot(morphNN)
will produce a map of the diversity metric.
We could also calculate a metric not available in the gridMetrics()
function.
For example, the dispRity R package has a number of relevant metrics that we may be interested in. Let’s say we would like to calculate the median distance from the centroid coordinates for a set of species in a grid cell.
We can define a simple function that applies that metric to a set of taxa. In order for the epm package to properly interpret this function, we need to refer to our morphological dataset as dat
.
Since this is a measure that requires at least 2 species to be calculated, we will also specify minTaxCount = 2
.
library(dispRity)
f <- function(sp) {
summary(dispRity(dat[sp, ], metric = c(median, centroids)))$obs
}
xx <- customGridMetric(tamiasEPM, fun = f, minTaxCount = 2, metricName = "medianCentroid")
Functions are also available in the epm package to calculate beta diversity metrics based on species presence/absence (betadiv_taxonomic
), phylogenetic information (betadiv_phylogenetic
) and based on trait data (betadiv_traits
). These functions calculate some metric for a grid cell along with the neighboring grid cells defined by some radius.
For instance, here we will calculate phylogenetic turnover with a radius of 70km.
beta <- betadiv_phylogenetic(tamiasEPM, radius = 70000, component = "full")
## Registered S3 method overwritten by 'spdep':
## method from
## plot.mst ape
## gridcell neighborhoods: median 6, range 1 - 6 cells
plot(beta, lwd = 0.25)
With the epm package, it is possible to integrate phylogenetic uncertainty by calculating diversity metrics across a set of trees, and then summarizing the result.
First, let’s replace the single tree that is currently associated with the chipmunk epmGrid object with a distribution of 10 trees. These come from the mammal phylogeny from Upham et al. 2019.
tamiasTreeSet
## 10 phylogenetic trees
tamiasEPM <- addPhylo(tamiasEPM, tamiasTreeSet, replace = TRUE)
## Warning in addPhylo(tamiasEPM, tamiasTreeSet, replace = TRUE): 2 species were pruned from the phylogeny because they lack geographic data.
tamiasEPM
##
## Summary of epm object:
##
## metric: spRichness
## grid type: hexagon
## number of grid cells: 4919
## grid resolution: 50000 by 50000
## projected: TRUE
## crs: +proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs
##
## number of unique species: 23 (richness range: 1 - 6)
## data present: Yes
## number of species shared between data and grid: 23
## phylogeny present: Yes (10 trees)
## number of species shared between phylogeny and grid: 23
Notice that now it states that there are 10 trees.
For instance, let’s calculate per-grid cell phylogenetic diversity, using as input a distribution of 10 trees.
xx <- gridMetrics(tamiasEPM, metric = "pd")
The resulting object is a list of 10 epmGrid objects, essentially gridMetrics() was run on each tree.
sapply(xx, class)
## [1] "epmGrid" "epmGrid" "epmGrid" "epmGrid" "epmGrid" "epmGrid" "epmGrid"
## [8] "epmGrid" "epmGrid" "epmGrid"
We can now summarize these results, for instance by taking the mean value for each grid cell from these 10 inputs.
xxMean <- summarizeEpmGridList(xx, fun = mean)
The result is now a single epmGrid object, similarly to when we ran gridMetrics with a single tree as input.
If you plot an epmGrid
object after it was created with createEPMgrid()
, it will plot the species richness. Let’s see both hexagonal and square versions.
plot(tamiasEPM)
plot(tamiasEPM2)
And of course, an epmGrid that results from the gridMetrics()
function will plot the values for that metric.
plot(morphNN)
These plots have used all defaults, but there are a lot of options that can be manipulated.
A simple world basemap is included in the epm package, which is what has been plotted beneath the epmGrid object. Since morphological nearest neighbor distance makes no sense for single species cells, those cells have been grayed out. If you have it installed, plotting is handled by the tmap package. Although this leads to nice plots, it can be difficult to customize plots in the ways that we might be used to.
If we set use_tmap = FALSE
, then plotting will be handled by the sf package (since this is a hexagon-based grid, otherwise it would be handled by the terra package for square grids). We will switch to that to more easily customize our plot.
For one thing, the worldmap included with epm is coarse-scale, therefore, it would be possible to plot a higher quality basemap, and to then plot the epmGrid object with add = TRUE
and basemap = 'none'
.
We can also customize the legend. We will use the epm function addLegend()
that will give us quite a bit of flexibility. For defining the location of the legend, we could opt for some predefined locations like “top”, “bottom”, “topleft”, etc. Or we could specify coordinates to place it exactly where we want it. To do so, we would provide a vector of 4 values: min X, max X, min Y, max Y.
Now we will plot the epmGrid again, but this time we will omit the legend, which we will add separately. We will also make the cell borders thinner (lwd
) and we will make the cell borders light gray.
plotdat <- plot(morphNN, lwd = 0.1, border = gray(0.9), use_tmap = FALSE,
legend = FALSE)
addLegend(morphNN, params = plotdat, location = "bottom", cex = 0.5,
adj = 0.2, labelDist = 0.25)
title(main = "morphological NN dist", line = -1, cex.main = 0.75)
So far, this tutorial has focused on creating epmGrid objects and calculating various diversity metrics. Now we will see how to extract information from epmGrid objects for analysis.
Exporting a species occurrence matrix
A common format for analyses of species communities is a presence/absence matrix of sites (rows) by species (columns). The function generateOccurrenceMatrix()
will convert a epmGrid object to this format. You can use the function coordsFromEpmGrid()
to also return the coordinates (grid cell centroids) of each grid cell (in the same order as the matrix rows). These could then be used as input for other R packages and approaches.
siteMat <- generateOccurrenceMatrix(tamiasEPM, sites = "all")
siteMat[1:5, 1:5]
## Tamias_alpinus Tamias_amoenus Tamias_canipes Tamias_cinereicollis
## site1 0 0 0 0
## site2 0 0 0 0
## site3 0 0 0 0
## site4 0 0 0 0
## site5 0 0 0 0
## Tamias_dorsalis
## site1 0
## site2 0
## site3 0
## site4 0
## site5 0
siteCoords <- coordsFromEpmGrid(tamiasEPM, sites = "all")
head(siteCoords)
## x y
## [1,] -2223013 166428.2
## [2,] -2223013 253030.8
## [3,] -2223013 339633.3
## [4,] -2223013 3024312.1
## [5,] -2223013 3110914.6
## [6,] -2198013 123127.0
Querying species by location
Using the function extractFromEpmGrid()
, it is possible to get the species that are found at certain cells, at certain coordinates, or within a provided polygon. Let’s define a polygon and then get the species that are found within.
Note that coordinates are provided as c(x,y)
or c(long, lat)
. We also need to provide points or polygons in the same units as the epmGrid. Here, these polygon vertices are in meters and in the same equal area projection as the input data.
# define some points and convert to polygon via minimum
# convex hull
pts <- rbind.data.frame(c(-1631943, -214170), c(-1631943, 191789),
c(-1208890, 191789), c(-1208890, -214170), c(-1631943, -214170))
colnames(pts) <- c("x", "y")
ptsSF <- st_as_sf(pts, coords = 1:2, crs = st_crs(tamiasEPM[[1]]))
poly <- st_convex_hull(st_union(ptsSF))
Plot the polygon to make sure it makes sense.
plot(tamiasEPM, use_tmap = FALSE)
plot(poly, add = TRUE, border = "red", lwd = 2)