This is supporting information for the Methods in Ecology and Evolution article entitled, Integrated SDM database: enhancing the relevance and utility of species distribution models in conservation management, by Veronica F. Frans*, Amélie A. Augé, Jim Fyfe, Yuqian Zhang, Nathan McNally, Hendrik Edelhoff, Niko Balkenhol, & Jan O. Engler. Please contact the corresponding author (*) for any inquiries.

1. Methods summary

Additional evaluations of the mainland prediction were done using the SDM predictions and additional functions within Maxent: coefficient of variation (CV), multivariate environmental similarity surface (MESS), most dissimilar variable (MoD), and analysis of limiting factors (Elith, Kearney & Phillips 2010).

As our example SDM algorithm (Maxent) does not generate uncertainty measures as an output (e.g. confidence/credible intervals), using multiple Maxent runs from subsets of the occurrence data allows us to use other statistical tests, such as CV. CV (in percent) is calculated as (standard deviation / mean) X 100, and is done using the raster package. Across runs and for each pixel of prediction values, CV informs us on the spread of a prediction from the mean (for that pixel). When mapped, areas with a relatively high CV compared to other pixels would indicate a greater level of dispersion around the mean (higher uncertainty), while a relatively low CV indicates a lower level of dispersion around the mean (lower uncertainty). Here, CV is calculated for each state (100 iterations each).

MESS compares variable values in the entire training area (the species’ current range used to train the model) against variable values across the entire prediction area (the species’ predicted range for the model prediction of suitability). Its output is a raster of the prediction area with negative and positive values, showing a pixel’s degree of similarity (positive values) and dissimilarity (negative values) from the training area. This can be used as a measure of extrapolation error.

Corresponding to the MESS grid output is the MOD grid. This indicates which variable is the most ‘responsible’ for the resulting MESS value (i.e. it is mainly this variable x’s value in location z that can cause extrapolation errors in the SDM).

MESS and MOD provide information based on the entire training area, and are not particular to the species’ locations (the presence points where a model is trained) or the SDM. Maxent (Philips and Dudik 2008) offers a way to also evaluate which variables negatively affected suitability (or probability of presence) predictions in the model prediction, based on how the model is trained, using an analysis of limiting factors. In our article, this is what we refer to as potential restoration features, as the output is a raster map of the prediction area indicating which variable at which pixel is the most limiting for each behavioural state. If the value of that variable is improved at that location, then the suitability would also increase (Elith, Kearney & Phillips 2010). This is particular to each model run, so we calculated the mode (most frequent) limiting variable across runs for each of the three behavioural states.

The final outputs of this script are:

  1. Three raster layers of the CV of the prediction area (one for each behavioural state).

  2. A raster MESS grid of the prediction area (mainland New Zealand).

  3. A raster MOD grid of the prediction area.

  4. Three raster layers of the most frequent limiting factors across the prediction area (one for each behavioural state).

2. R Setup

The script presented here was done using R (version 4.0.5; R Core Team 2021) and its packages. We used Maxent v. 3.3.3k (Phillips, Anderson, & Schapire, 2006) and are accessing the Maxent.jar file from a local directory.

2.1. Libraries

# load libraries
  library('raster')       # raster data
  library('rgdal')        # input/output; predictions; reading ASCII files
  library('RColorBrewer') # colours for graphics

2.1.1 Options

Change raster options to maximise memory

# modify max memory from 10gb to 30gb (this is a 32GB RAM computer)
  rasterOptions(maxmemory = 3e10)

2.2. Directories

We create and use multiple folders to organise our inputs and outputs. Similar directories can be made anywhere by only changing the root directory object, dir.

# Root directory
  dir <- c('G:\\R\\NZSL_MSSDM_MCA_2019_NZ\\')

# Data directory
  dat.dir <- paste0(dir,'data\\')

# Maxent.jar file directory
  maxent.dir <- paste0(dir,'maxent')
  
# Maxent results directory
  maxout.dir <- paste0(dir,'m_output\\')
  
# Maxent cache files (training and prediction environmental variables)
  # these are a type of compressed format, created in the first model run (runs faster)
  proj.cache <- paste0(dat.dir,'maxent.cache\\projection\\')
  train.cache <- paste0(dat.dir,'maxent.cache\\training\\')
        
# Intermediate data and layers
  dir.create(paste0(dir,'data\\intermediate'),recursive=TRUE)
  int.dir <- paste0(dir,'data\\intermediate\\')
  
# Final layers
  dir.create(paste0(dir,'layers'),recursive=TRUE)
  lay.dir <- paste0(dir,'layers\\')
  
# Final tables
  dir.create(paste0(dir,'tables'),recursive=TRUE)
  tab.dir <- paste0(dir,'tables\\')
  
# Final figures
  dir.create(paste0(dir,'figures'),recursive=TRUE)
  fig.dir <- paste0(dir,'figures\\')

2.3 Custom colours

# colours for variables
  var_cols <-c('#88CCEE','#44AA99','#117733','#999933','#DDCC77','#CC6677',
               '#882255','#AA4499')

2.4 Custom functions

2.4.1 Load raster and ensure prediction

Load rasters and project. The default projection is set to NZDG2000/New Zealand Transverse Mercator 2000 (EPSG: 2193).

# load raster and project
  get.ras <- function(r_name,
                      # defaults
                      ras_dir=dat.dir, prj=c('+init=epsg:2193')){
      # read raster
        r <- raster(paste0(ras_dir,'\\',r_name))
      
      # project and return raster
        crs(r) <- CRS(paste0(prj))
        return(r)
  }

3. Calculate CV

3.1 Calculation from raster package

To calculate CV, we gather the 100 predictions per behavioural state and use the calc() and cv() functions from the raster package.

# calculate CV across 100 predictions per state
  for(state in 1:3){ 

      # output directory
        maxent.out <- paste0(dir,'m_output\\',state)
        
      # CV per state
        
        # empty vector
          maxent_rasters <- list()
        
        # For-loop
          for (run in 1:100){
               # Directory
                 runs.out <- paste0(maxent.out,"\\",run,"\\")
                 
               # Search for ".asc" files and stack
                 filenames <- list.files(path=paste0(runs.out),
                                         pattern=paste0("projection_S",state,"_",
                                                        run,".asc"),full.names=TRUE)  
                 maxent_rasters[[run]] <- filenames
               }
          # stack
            maxent_100_rasters <- do.call(stack, maxent_rasters)
            
          # calculate mean
            cv_proj <- calc(x=maxent_100_rasters, fun=cv)
  
          # save
            writeRaster(cv_proj,paste0(dat.dir,"\\S",state,"_CV.tif"),
                        options="COMPRESS=LZW",overwrite=TRUE)  
  }

3.2 Plot results

Next, create a raster stack to extract the saved CV layers.

# Empty vector
  s123_cv <- list()

# For-loop (S1-3): 
  for(state in 1:3){
                    raster_names <- list.files(path=paste0(dat.dir),
                                       pattern=paste0("\\S",state,"_CV.tif"),
                                       full.names=TRUE)
                    # Add to list
                    s123_cv[[state]] <- raster_names
  }

# stack
  s123_cv <- do.call(stack,s123_cv)

Save a plot of CV.

# plot
  plt <- spplot(s123_cv,
               main=c('Coefficient of variation'),
               maxpixels=500000,
               col.regions=c(rev(brewer.pal(10,'Spectral'))),
               cuts=9,
               par.settings = list(panel.background=list(col='black'))
               )

# save
  png(width=2400,height=1500,
      paste0(fig.dir,'CV_NZ.png'),res=300)
  plt
  dev.off()

Plot here.

knitr::include_graphics(paste0(fig.dir,'CV_NZ.png'))

4. Calulate MESS and MOD

4.1 Running analysis from Maxent.jar

First, get usage information from Maxent.jar for calculating MESS.

Note that MESS and MOD can also be calculated using the mess() function in the dismo package. Here, we chose to use Maxent.jar since we were working with a finer resolution at a large scale, so Maxent’s compressed cache files (.mxe) allowed us to run the model faster.

# Usage information:
  command <- paste0(#initiate java and allocate memory
                    'java -mx16000m -cp ',
                    #maxent directory and command
                    maxent.dir,'\\maxent.jar density.tools.Novel')
  system(sprintf(command))
Usage: Novel [-m basemask] [-c colors] [-w] basedir projdir outfile
[1] 0

Then run according to those parameters, where:

  • basedir is the directory for the training area data.
  • projdir is the directory for the prediction area data.
  • outfile is the directory, file name and file extension for the output grids.
# Run MESS and MOD analysis
  command <- paste0(#initiate java and allocate memory
                    'java -mx16000m -cp ',
                    #maxent directory and command
                    maxent.dir,'\\maxent.jar density.tools.Novel ',
                    #training area data
                    train.cache,' ',
                    #prediction area data
                    proj.cache,' ',
                    #output directory and file name
                    #MOD is automatically saved as '*_limiting.asc'
                    lay.dir,'\\NZ_AI_MESS.asc')
  system(sprintf(command))

The output grid for the limiting factor will have numbers corresponding to the variable names in alphabetical order, where:

  • 0 = cliff_edges
  • 1 = coast_dist
  • 2 = forest_dist
  • 3 = grass_dist
  • 4 = landcover
  • 5 = sand_dist
  • 6 = slope
  • 7 = water_dist

4.2 Plot results

Next, load rasters and plot.

# grab MESS and MOD rasters
  MESS <- get.ras('NZ_AI_MESS.asc', ras_dir = lay.dir)
  MOD <- get.ras('NZ_AI_MESS_limiting.asc', ras_dir = lay.dir)

Plot MESS.

# plot
  plt <- spplot(MESS,
               main=c('MESS grid'),
               maxpixels=500000,
               col.regions=c(rev(brewer.pal(10,'Spectral'))),
               cuts=9,
               par.settings = list(panel.background=list(col='black'))
               )

# save
  png(width=1800,height=1800,
      paste0(fig.dir,'MESS_grid_NZ.png'),res=300)
  plt
  dev.off()

Display here.

knitr::include_graphics(paste0(fig.dir,'MESS_grid_NZ.png'))

Plot MOD.

# plot
  var_labs <- c('cliff','coast\ndist.','forest\ndist.',
                'grass\ndist.','land\ncover','sand\ndist.',
                'slope','water\ndist.',' ')
 
  plt <- spplot(MOD,
               main=c('MOD grid'),
               maxpixels=500000,
               cuts=7,
               col.regions=var_cols,
               colorkey=list(space='bottom',
                             breaks=list(0,1,2,3,4,5,6,7,8),
                             labels=list(at=c(0,1,1.8,2.8,3.6,4.4,5.3,6.2,7),
                                         labels=var_labs,
                             cex=0.8)),
               par.settings = list(panel.background=list(col='black'))
               )

# save
  png(width=1800,height=1800,
      paste0(fig.dir,'MoD_grid_NZ.png'),res=300)
  plt
  dev.off()

Display here.

knitr::include_graphics(paste0(fig.dir,'MoD_grid_NZ.png'))

5. Limiting factor grids

5.1 Extracting Maxent background information

To calculate limiting factors, Maxent needs its background information data (.lambdas and sampleAverages.csv files) from the state/run folders, since the evaluations are done for each prediction (100 runs from each state). We moved these from our directories of the runs and put them into a folder for each state.

This code was adapted from the multi-state SDM tutorial in Frans et al. (2018).

# lambdas files
  for(state in 1:3){
      # create a new directory to store the .lambdas and sample average files
        dir.create(paste0(maxout.dir,state,'\\lambdas'), recursive=TRUE) 
        lam.dir <- paste0(maxout.dir,state,'\\lambdas\\')
        
        # lambdas files
          for(run in 1:100){
          # select all files of '.lambdas' extension
            filenames <- list.files(paste0(maxout.dir,state,'\\',run),
                                    pattern='*.lambdas',full.names=TRUE)
            
          # rename the file with state number added to it
            renamed <- file.rename(from=filenames,
                                  to=sub(pattern='species.lambdas',
                                  replacement=paste0('S',state,'_run',run,'.lambdas'),
                                  filenames))
          
          # find the renamed files again
            new_filenames <- list.files(paste0(maxout.dir,state,'\\',run),
                                        pattern='*.lambdas',full.names=TRUE)
          
          # copy the '.lambdas' files and move to a new folder
            file.copy(from=new_filenames,to=lam.dir,overwrite=TRUE,
                      recursive=FALSE,copy.mode=TRUE)
            
        # sample average files
          # select all files of '.lambdas' extension
            filenames <- list.files(paste0(maxout.dir,state,'\\',run),
                                    pattern='*sampleAverages.csv',full.names=TRUE)
            
          # rename the file with state number added to it
            renamed <- file.rename(from=filenames,
                                  to=sub(pattern='species_sampleAverages.csv',
                                  replacement=paste0('S',state,'_run',run,
                                                     '_sampleAverages.csv'),
                                  filenames))
          
          # find the renamed files again
            new_filenames <- list.files(paste0(maxout.dir,state,'\\',run),
                                        pattern='*sampleAverages.csv',full.names=TRUE)
          
          # copy the '.lambdas' files and move to a new folder
            file.copy(from=new_filenames,to=lam.dir,overwrite=TRUE,
                      recursive=FALSE,copy.mode=TRUE)
       }
      }

5.2 Running analysis from Maxent.jar

First, get usage information from Maxent.jar for calculating the limiting factor grid.

Note that the limiting factor grid can also be calculated using the limiting() function in the dismo package. Here, we chose to use Maxent.jar since we were working with a finer resolution at a large scale, so Maxent’s compressed cache files (.mxe) allowed us to run the model faster.

# Usage information:
  command <- paste0(#initiate java and allocate memory
                    'java -mx16000m -cp ',
                    #maxent directory and command
                    maxent.dir,'\\maxent.jar density.tools.LimitingFactor')
  system(sprintf(command))
Usage: density.tools.LimitingFactor lambdafile projectiondirectory outfile
[1] 0

Then run according to those parameters, where:

  • lambdafile is the directory for the .lambdas files.
  • projectiondirectory is the directory for the prediction area data.
  • outfile is the directory, file name and file extension for the output grids.

Additional options are added to indicate categorical variables.

# limiting factor analysis across each state and run
  for(state in 1:3){
        # lambda file directory and output
          lam.dir <- paste0(maxout.dir,state,'\\lambdas\\')
  for (run in 1:100){
        command <- paste0(#initiate java and allocate memory
                          'java -mx16000m -cp ',
                          # maxent directory and command
                          maxent.dir,'\\maxent.jar density.tools.LimitingFactor ',
                          #lambda file name
                          lam.dir,'\\*',run,'.lambdas ',
                          #prediction area data
                          proj.cache,' ',
                          #output directory and file name (by run)
                          lam.dir,'NZ_limiting_',run,'.asc',
                          #option: indicate categorical variable file names
                          ' -t cliff_edges -t landcover',
                          #option: don't save image--just the ASCII
                          ' nopictures')
        system(sprintf(command))
  }
  }

The limiting factor grids will have numbers corresponding to the variable names in alphabetical order, where:

  • 0 = cliff_edges
  • 1 = coast_dist
  • 2 = forest_dist
  • 3 = grass_dist
  • 4 = landcover
  • 5 = sand_dist
  • 6 = slope
  • 7 = water_dist

Next, stack the limiting factor grids and calculate the mode (most frequent limiting factor across runs) for each state.

# empty list
  state_lims <- list()

# limiting factors and mode
  for(state in 1:3){ 

      # output directory
        lam.dir <- paste0(maxout.dir,state,'\\lambdas\\')
        
      # Stack all .asc files in state folder
        lim_rasters <- stack(list.files(path=lam.dir,pattern='.asc',
                              full.names=TRUE))
            
        # calculate mode
          mode_lim <- calc(x=lim_rasters, fun=modal)
          
        # add to list
          state_lims[[state]] <- mode_lim

        # save
          writeRaster(mode_lim,paste0(dat.dir,'S',state,'_limiting_mode.tif'),
                      options='COMPRESS=LZW',overwrite=TRUE)  
  }

# stack the modes
  limits <- do.call(stack, state_lims)
  
# rename layers
  names(limits) <- c('S1_limits', 'S2_limits', 'S3_limits')

5.3 Plot results

# plot
  var_labs <- c('cliff','coast\ndist.','forest\ndist.',
                'grass\ndist.','land\ncover','sand\ndist.',
                'slope','water\ndist.',' ')
 
  plt <- spplot(limits,
               main=c('Most frequent limiting factors (mode)'),
               maxpixels=500000,
               cuts=7,
               col.regions=var_cols,
               colorkey=list(space='bottom',
                             breaks=list(0,1,2,3,4,5,6,7,8),
                             labels=list(labels=var_labs,
                                         at=seq(0,8,by=1),
                                    cex = .8)),
               par.settings = list(panel.background=list(col='black'))
               )

# save
  png(width=2400,height=1500,
      paste0(fig.dir,'limiting_factors_NZ.png'),res=300)
  plt
  dev.off()

Plot here

knitr::include_graphics(paste0(fig.dir,'limiting_factors_NZ.png'))

6. Save workspace

# save workspace to load later if needed
  save.image('G:/R/mess_mod_lim_NZ.RData')

This concludes the evaluation of multi-state SDM uncertainties.

Next, we integrated these with human impact and locations of inquiry evaluations into an integrated SDM database. See Appendix S5.

7. References

Elith, J., M. Kearney and S. Phillips. 2010. The art of modelling range-shifting species. Methods in Ecology and Evolution 1:330-342. https://doi.org/10.1111/j.2041-210X.2010.00036.x

Phillips, S. J., Anderson, R. P., & Schapire, R. E. (2006). Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190(3), 231–259. https://doi.org/10.1016/j.ecolmodel.2005.03.026

Phillips, S.J. & Dudik, M. (2008) Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography, 31 (2), 161-175. https://doi.org/10.1111/j.0906-7590.2008.5203.x