Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models

Hao, Tianxiao; Elith, Jane; Lahoz‐Monfort, José J.; Guillera‐Arroita, Gurutzeta

doi:10.5061/dryad.tqjq2bvv2

Published March 12, 2020 | Version v1

Dataset Open

1. University of Melbourne

Predictive performance is important to many applications of species distribution models (SDMs). The SDM 'ensemble' approach, which combines predictions across different modelling methods, is believed to improve predictive performance, and is used in many recent SDM studies. Here, we aim to compare the predictive performance of ensemble species distribution models to that of individual models, using a large presence-absence dataset of eucalypt tree species. To test model performance, we divided our dataset into calibration and evaluation folds using two spatial blocking strategies (checkerboard-pattern and latitudinal slicing). We calibrated and cross-validated all models within the calibration folds, using both repeated random division of data (a common approach) and spatial blocking. Ensembles were built using the software package 'biomod2', with standard ("untuned") settings. Boosted regression tree (BRT) models were also fitted to the same data, tuned according to published procedures. We then used evaluation folds to compare ensembles against both their component untuned individual models, and against the BRTs. We used area under the receiver-operating characteristic curve (AUC) and log-likelihood for assessing model performance. In all our tests, ensemble models performed well, but not consistently better than their component untuned individual models or tuned BRTs across all tests. Moreover, choosing untuned individual models with best cross-validation performance also yielded good external performance, with blocked cross-validation proving better suited for this choice, in this study, than repeated random cross-validation. The latitudinal slice test was only possible for four species; this showed some individual models, and particularly the tuned one, performing better than ensembles. This study shows no particular benefit to using ensembles over individual tuned models. It also suggests that further robust testing of performance is required for situations where models are used to predict to distant places or environments.
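
As a rough illustration of the evaluation ideas named in the description (checkerboard spatial blocking, AUC, and log-likelihood), the following Python sketch may help. It is not taken from the code package, which is not reproduced here: the function names, the block size, and the unweighted-mean ensemble rule are all assumptions made for illustration, and the study's own ensembles were built with 'biomod2'.

```python
import math


def checkerboard_fold(x, y, block_size):
    """Assign a site to fold 0 or 1 from the parity of its grid cell.

    Hypothetical helper: x and y are coordinates in any consistent unit,
    block_size is the side length of one checkerboard square.
    """
    col = math.floor(x / block_size)
    row = math.floor(y / block_size)
    return (col + row) % 2


def auc(labels, scores):
    """AUC as the share of presence/absence pairs ranked correctly.

    Ties between a presence score and an absence score count as half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bernoulli_log_likelihood(labels, probs, eps=1e-12):
    """Sum of log-probabilities of the observed presences and absences."""
    total = 0.0
    for l, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += math.log(p) if l == 1 else math.log(1.0 - p)
    return total


def mean_ensemble(*model_probs):
    """Unweighted mean of per-site predictions (an assumed combination rule)."""
    return [sum(ps) / len(ps) for ps in zip(*model_probs)]


if __name__ == "__main__":
    # Made-up observations and predictions from two hypothetical models.
    observed = [1, 1, 0, 0]
    model_a = [0.9, 0.4, 0.5, 0.1]
    model_b = [0.7, 0.6, 0.2, 0.3]
    ensemble = mean_ensemble(model_a, model_b)
    for name, pred in [("A", model_a), ("B", model_b), ("ensemble", ensemble)]:
        print(name, auc(observed, pred), bernoulli_log_likelihood(observed, pred))
```

Comparing each individual model and the ensemble on the same held-out fold, as in the loop at the end, mirrors the kind of comparison the description outlines, although the actual analysis involves many species, repeated folds, and tuned BRTs.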

Notes

See readme.txt file in the data package

Funding provided by: Australian Research Council
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100000923
Award Number: DE160100904

Funding provided by: Australian Research Council
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100000923
Award Number: DP160101003

Files

Files (36.3 kB)

Name	Size
Hao_et_al_Ecography_code_package.zip (md5:75e058a2e8414198e8a0c1f9616f54d0)	36.3 kB

Additional details

Is cited by: 10.1111/ecog.04890 (DOI); 10.1111/2041-210X.12242 (DOI)