This is supporting information for the Methods in Ecology and Evolution article entitled, Integrated SDM database: enhancing the relevance and utility of species distribution models in conservation management, by Veronica F. Frans*, Amélie A. Augé, Jim Fyfe, Yuqian Zhang, Nathan McNally, Hendrik Edelhoff, Niko Balkenhol, & Jan O. Engler. Please contact the corresponding author (*) for any inquiries.
Additional evaluations of the mainland prediction were done using the SDM predictions and additional functions within Maxent: coefficient of variation (CV), multivariate environmental similarity surface (MESS), most dissimilar variable (MoD), and analysis of limiting factors (Elith, Kearney & Phillips 2010).
As our example SDM algorithm (Maxent) does not generate uncertainty measures as an output (e.g. confidence/credible intervals), using multiple Maxent runs from subsets of the occurrence data allows us to use other statistical tests, such as CV. CV (in percent) is calculated as (standard deviation / mean) X 100, and is done using the raster
package. Across runs and for each pixel of prediction values, CV informs us on the spread of a prediction from the mean (for that pixel). When mapped, areas with a relatively high CV compared to other pixels would indicate a greater level of dispersion around the mean (higher uncertainty), while a relatively low CV indicates a lower level of dispersion around the mean (lower uncertainty). Here, CV is calculated for each state (100 iterations each).
MESS compares variable values in the entire training area (the species’ current range used to train the model) against variable values across the entire prediction area (the species’ predicted range for the model prediction of suitability). Its output is a raster of the prediction area with negative and positive values, showing a pixel’s degree of similarity (positive values) and dissimilarity (negative values) from the training area. This can be used as a measure of extrapolation error.
Corresponding to the MESS grid output is the MOD grid. This indicates which variable is the most ‘responsible’ for the resulting MESS value (i.e. it is mainly this variable x’s value in location z that can cause extrapolation errors in the SDM).
MESS and MOD provide information based on the entire training area, and are not particular to the species’ locations (the presence points where a model is trained) or the SDM. Maxent (Philips and Dudik 2008) offers a way to also evaluate which variables negatively affected suitability (or probability of presence) predictions in the model prediction, based on how the model is trained, using an analysis of limiting factors. In our article, this is what we refer to as potential restoration features, as the output is a raster map of the prediction area indicating which variable at which pixel is the most limiting for each behavioural state. If the value of that variable is improved at that location, then the suitability would also increase (Elith, Kearney & Phillips 2010). This is particular to each model run, so we calculated the mode (most frequent) limiting variable across runs for each of the three behavioural states.
The final outputs of this script are:
Three raster layers of the CV of the prediction area (one for each behavioural state).
A raster MESS grid of the prediction area (mainland New Zealand).
A raster MOD grid of the prediction area.
Three raster layers of the most frequent limiting factors across the prediction area (one for each behavioural state).
The script presented here was done using R (version 4.0.5; R Core Team 2021) and its packages. We used Maxent v. 3.3.3k (Phillips, Anderson, & Schapire, 2006) and are accessing the Maxent.jar file from a local directory.
# load libraries
library('raster') # raster data
library('rgdal') # input/output; predictions; reading ASCII files
library('RColorBrewer') # colours for graphics
Change raster options to maximise memory
# modify max memory from 10gb to 30gb (this is a 32GB RAM computer)
rasterOptions(maxmemory = 3e10)
We create and use multiple folders to organise our inputs and outputs. Similar directories can be made anywhere by only changing the root directory object, dir
.
# Root directory
dir <- c('G:\\R\\NZSL_MSSDM_MCA_2019_NZ\\')
# Data directory
dat.dir <- paste0(dir,'data\\')
# Maxent.jar file directory
maxent.dir <- paste0(dir,'maxent')
# Maxent results directory
maxout.dir <- paste0(dir,'m_output\\')
# Maxent cache files (training and prediction environmental variables)
# these are a type of compressed format, created in the first model run (runs faster)
proj.cache <- paste0(dat.dir,'maxent.cache\\projection\\')
train.cache <- paste0(dat.dir,'maxent.cache\\training\\')
# Intermediate data and layers
dir.create(paste0(dir,'data\\intermediate'),recursive=TRUE)
int.dir <- paste0(dir,'data\\intermediate\\')
# Final layers
dir.create(paste0(dir,'layers'),recursive=TRUE)
lay.dir <- paste0(dir,'layers\\')
# Final tables
dir.create(paste0(dir,'tables'),recursive=TRUE)
tab.dir <- paste0(dir,'tables\\')
# Final figures
dir.create(paste0(dir,'figures'),recursive=TRUE)
fig.dir <- paste0(dir,'figures\\')
# colours for variables
var_cols <-c('#88CCEE','#44AA99','#117733','#999933','#DDCC77','#CC6677',
'#882255','#AA4499')
Load rasters and project. The default projection is set to NZDG2000/New Zealand Transverse Mercator 2000 (EPSG: 2193).
# load raster and project
get.ras <- function(r_name,
# defaults
ras_dir=dat.dir, prj=c('+init=epsg:2193')){
# read raster
r <- raster(paste0(ras_dir,'\\',r_name))
# project and return raster
crs(r) <- CRS(paste0(prj))
return(r)
}
raster
packageTo calculate CV, we gather the 100 predictions per behavioural state and use the calc()
and cv()
functions from the raster
package.
# calculate CV across 100 predictions per state
for(state in 1:3){
# output directory
maxent.out <- paste0(dir,'m_output\\',state)
# CV per state
# empty vector
maxent_rasters <- list()
# For-loop
for (run in 1:100){
# Directory
runs.out <- paste0(maxent.out,"\\",run,"\\")
# Search for ".asc" files and stack
filenames <- list.files(path=paste0(runs.out),
pattern=paste0("projection_S",state,"_",
run,".asc"),full.names=TRUE)
maxent_rasters[[run]] <- filenames
}
# stack
maxent_100_rasters <- do.call(stack, maxent_rasters)
# calculate mean
cv_proj <- calc(x=maxent_100_rasters, fun=cv)
# save
writeRaster(cv_proj,paste0(dat.dir,"\\S",state,"_CV.tif"),
options="COMPRESS=LZW",overwrite=TRUE)
}
Next, create a raster stack to extract the saved CV layers.
# Empty vector
s123_cv <- list()
# For-loop (S1-3):
for(state in 1:3){
raster_names <- list.files(path=paste0(dat.dir),
pattern=paste0("\\S",state,"_CV.tif"),
full.names=TRUE)
# Add to list
s123_cv[[state]] <- raster_names
}
# stack
s123_cv <- do.call(stack,s123_cv)
Save a plot of CV.
# plot
plt <- spplot(s123_cv,
main=c('Coefficient of variation'),
maxpixels=500000,
col.regions=c(rev(brewer.pal(10,'Spectral'))),
cuts=9,
par.settings = list(panel.background=list(col='black'))
)
# save
png(width=2400,height=1500,
paste0(fig.dir,'CV_NZ.png'),res=300)
plt
dev.off()
Plot here.
knitr::include_graphics(paste0(fig.dir,'CV_NZ.png'))
First, get usage information from Maxent.jar for calculating MESS.
Note that MESS and MOD can also be calculated using the mess()
function in the dismo
package. Here, we chose to use Maxent.jar since we were working with a finer resolution at a large scale, so Maxent’s compressed cache files (.mxe
) allowed us to run the model faster.
# Usage information:
command <- paste0(#initiate java and allocate memory
'java -mx16000m -cp ',
#maxent directory and command
maxent.dir,'\\maxent.jar density.tools.Novel')
system(sprintf(command))
Usage: Novel [-m basemask] [-c colors] [-w] basedir projdir outfile
[1] 0
Then run according to those parameters, where:
# Run MESS and MOD analysis
command <- paste0(#initiate java and allocate memory
'java -mx16000m -cp ',
#maxent directory and command
maxent.dir,'\\maxent.jar density.tools.Novel ',
#training area data
train.cache,' ',
#prediction area data
proj.cache,' ',
#output directory and file name
#MOD is automatically saved as '*_limiting.asc'
lay.dir,'\\NZ_AI_MESS.asc')
system(sprintf(command))
The output grid for the limiting factor will have numbers corresponding to the variable names in alphabetical order, where:
Next, load rasters and plot.
# grab MESS and MOD rasters
MESS <- get.ras('NZ_AI_MESS.asc', ras_dir = lay.dir)
MOD <- get.ras('NZ_AI_MESS_limiting.asc', ras_dir = lay.dir)
Plot MESS.
# plot
plt <- spplot(MESS,
main=c('MESS grid'),
maxpixels=500000,
col.regions=c(rev(brewer.pal(10,'Spectral'))),
cuts=9,
par.settings = list(panel.background=list(col='black'))
)
# save
png(width=1800,height=1800,
paste0(fig.dir,'MESS_grid_NZ.png'),res=300)
plt
dev.off()
Display here.
knitr::include_graphics(paste0(fig.dir,'MESS_grid_NZ.png'))
Plot MOD.
# plot
var_labs <- c('cliff','coast\ndist.','forest\ndist.',
'grass\ndist.','land\ncover','sand\ndist.',
'slope','water\ndist.',' ')
plt <- spplot(MOD,
main=c('MOD grid'),
maxpixels=500000,
cuts=7,
col.regions=var_cols,
colorkey=list(space='bottom',
breaks=list(0,1,2,3,4,5,6,7,8),
labels=list(at=c(0,1,1.8,2.8,3.6,4.4,5.3,6.2,7),
labels=var_labs,
cex=0.8)),
par.settings = list(panel.background=list(col='black'))
)
# save
png(width=1800,height=1800,
paste0(fig.dir,'MoD_grid_NZ.png'),res=300)
plt
dev.off()
Display here.
knitr::include_graphics(paste0(fig.dir,'MoD_grid_NZ.png'))
To calculate limiting factors, Maxent needs its background information data (.lambdas and sampleAverages.csv files) from the state/run folders, since the evaluations are done for each prediction (100 runs from each state). We moved these from our directories of the runs and put them into a folder for each state.
This code was adapted from the multi-state SDM tutorial in Frans et al. (2018).
# lambdas files
for(state in 1:3){
# create a new directory to store the .lambdas and sample average files
dir.create(paste0(maxout.dir,state,'\\lambdas'), recursive=TRUE)
lam.dir <- paste0(maxout.dir,state,'\\lambdas\\')
# lambdas files
for(run in 1:100){
# select all files of '.lambdas' extension
filenames <- list.files(paste0(maxout.dir,state,'\\',run),
pattern='*.lambdas',full.names=TRUE)
# rename the file with state number added to it
renamed <- file.rename(from=filenames,
to=sub(pattern='species.lambdas',
replacement=paste0('S',state,'_run',run,'.lambdas'),
filenames))
# find the renamed files again
new_filenames <- list.files(paste0(maxout.dir,state,'\\',run),
pattern='*.lambdas',full.names=TRUE)
# copy the '.lambdas' files and move to a new folder
file.copy(from=new_filenames,to=lam.dir,overwrite=TRUE,
recursive=FALSE,copy.mode=TRUE)
# sample average files
# select all files of '.lambdas' extension
filenames <- list.files(paste0(maxout.dir,state,'\\',run),
pattern='*sampleAverages.csv',full.names=TRUE)
# rename the file with state number added to it
renamed <- file.rename(from=filenames,
to=sub(pattern='species_sampleAverages.csv',
replacement=paste0('S',state,'_run',run,
'_sampleAverages.csv'),
filenames))
# find the renamed files again
new_filenames <- list.files(paste0(maxout.dir,state,'\\',run),
pattern='*sampleAverages.csv',full.names=TRUE)
# copy the '.lambdas' files and move to a new folder
file.copy(from=new_filenames,to=lam.dir,overwrite=TRUE,
recursive=FALSE,copy.mode=TRUE)
}
}
First, get usage information from Maxent.jar for calculating the limiting factor grid.
Note that the limiting factor grid can also be calculated using the limiting()
function in the dismo
package. Here, we chose to use Maxent.jar since we were working with a finer resolution at a large scale, so Maxent’s compressed cache files (.mxe
) allowed us to run the model faster.
# Usage information:
command <- paste0(#initiate java and allocate memory
'java -mx16000m -cp ',
#maxent directory and command
maxent.dir,'\\maxent.jar density.tools.LimitingFactor')
system(sprintf(command))
Usage: density.tools.LimitingFactor lambdafile projectiondirectory outfile
[1] 0
Then run according to those parameters, where:
Additional options are added to indicate categorical variables.
# limiting factor analysis across each state and run
for(state in 1:3){
# lambda file directory and output
lam.dir <- paste0(maxout.dir,state,'\\lambdas\\')
for (run in 1:100){
command <- paste0(#initiate java and allocate memory
'java -mx16000m -cp ',
# maxent directory and command
maxent.dir,'\\maxent.jar density.tools.LimitingFactor ',
#lambda file name
lam.dir,'\\*',run,'.lambdas ',
#prediction area data
proj.cache,' ',
#output directory and file name (by run)
lam.dir,'NZ_limiting_',run,'.asc',
#option: indicate categorical variable file names
' -t cliff_edges -t landcover',
#option: don't save image--just the ASCII
' nopictures')
system(sprintf(command))
}
}
The limiting factor grids will have numbers corresponding to the variable names in alphabetical order, where:
Next, stack the limiting factor grids and calculate the mode (most frequent limiting factor across runs) for each state.
# empty list
state_lims <- list()
# limiting factors and mode
for(state in 1:3){
# output directory
lam.dir <- paste0(maxout.dir,state,'\\lambdas\\')
# Stack all .asc files in state folder
lim_rasters <- stack(list.files(path=lam.dir,pattern='.asc',
full.names=TRUE))
# calculate mode
mode_lim <- calc(x=lim_rasters, fun=modal)
# add to list
state_lims[[state]] <- mode_lim
# save
writeRaster(mode_lim,paste0(dat.dir,'S',state,'_limiting_mode.tif'),
options='COMPRESS=LZW',overwrite=TRUE)
}
# stack the modes
limits <- do.call(stack, state_lims)
# rename layers
names(limits) <- c('S1_limits', 'S2_limits', 'S3_limits')
# plot
var_labs <- c('cliff','coast\ndist.','forest\ndist.',
'grass\ndist.','land\ncover','sand\ndist.',
'slope','water\ndist.',' ')
plt <- spplot(limits,
main=c('Most frequent limiting factors (mode)'),
maxpixels=500000,
cuts=7,
col.regions=var_cols,
colorkey=list(space='bottom',
breaks=list(0,1,2,3,4,5,6,7,8),
labels=list(labels=var_labs,
at=seq(0,8,by=1),
cex = .8)),
par.settings = list(panel.background=list(col='black'))
)
# save
png(width=2400,height=1500,
paste0(fig.dir,'limiting_factors_NZ.png'),res=300)
plt
dev.off()
Plot here
knitr::include_graphics(paste0(fig.dir,'limiting_factors_NZ.png'))
# save workspace to load later if needed
save.image('G:/R/mess_mod_lim_NZ.RData')
This concludes the evaluation of multi-state SDM uncertainties.
Next, we integrated these with human impact and locations of inquiry evaluations into an integrated SDM database. See Appendix S5.
Elith, J., M. Kearney and S. Phillips. 2010. The art of modelling range-shifting species. Methods in Ecology and Evolution 1:330-342. https://doi.org/10.1111/j.2041-210X.2010.00036.x
Phillips, S. J., Anderson, R. P., & Schapire, R. E. (2006). Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190(3), 231–259. https://doi.org/10.1016/j.ecolmodel.2005.03.026
Phillips, S.J. & Dudik, M. (2008) Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography, 31 (2), 161-175. https://doi.org/10.1111/j.0906-7590.2008.5203.x