sperrorest is a flexible interface for multiple types of parallelized spatial and non-spatial cross-validation and bootstrap error estimation, and for parallelized permutation-based assessment of spatial variable importance.
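The difference between non-spatial and spatial partitioning can be pictured with a few lines of base R. This is only a sketch on hypothetical toy coordinates, not the package's own implementation: random fold assignment stands in for non-spatial cross-validation, and k-means clustering of the coordinates stands in for a spatial partitioning. The names pts, fold_random and fold_spatial are made up for illustration.

```r
set.seed(42)

# Hypothetical toy data: 60 sample locations with x/y coordinates
pts <- data.frame(x = runif(60), y = runif(60))
k <- 5

# Non-spatial partitioning: random assignment to k folds
fold_random <- sample(rep(seq_len(k), length.out = nrow(pts)))

# Spatial partitioning: k-means clustering of the coordinates,
# so each fold is a spatially compact group of points
fold_spatial <- kmeans(pts[, c("x", "y")], centers = k)$cluster

# Row indices of the test set for each fold
test_random  <- split(seq_len(nrow(pts)), fold_random)
test_spatial <- split(seq_len(nrow(pts)), fold_spatial)

lengths(test_random)   # 12 points per fold
lengths(test_spatial)  # fold sizes depend on the clusters
```

With spatially clustered test sets, training and test points are farther apart, which is why spatial error estimates tend to be less optimistic than non-spatial ones.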
sperrorest(formula, data, coords = c("x", "y"), model_fun, model_args = list(), pred_fun = NULL, pred_args = list(), smp_fun = partition_cv, smp_args = list(), train_fun = NULL, train_param = NULL, test_fun = NULL, test_param = NULL, err_fun = err_default, imp_variables = NULL, imp_permutations = 1000, importance = !is.null(imp_variables), distance = FALSE, par_args = list(par_mode = "foreach", par_units = NULL, par_option = NULL), do_gc = 1, progress = "all", out_progress = "", benchmark = FALSE, ...)
Argument | Description |
---|---|
formula | A formula specifying the variables used by the |
data | a |
coords | vector of length 2 defining the variables in |
model_fun | Function that fits a predictive model, such as |
model_args | Arguments to be passed to |
pred_fun | Prediction function for a fitted model object created by |
pred_args | (optional) Arguments to |
smp_fun | A function for sampling training and test sets from |
smp_args | (optional) Arguments to be passed to |
train_fun | (optional) A function for resampling or subsampling the training sample in order to achieve, e.g., uniform sample sizes on all training sets, or maintaining a certain ratio of positives and negatives in training sets. E.g. resample_uniform or resample_strat_uniform. |
train_param | (optional) Arguments to be passed to |
test_fun | (optional) Like |
test_param | (optional) Arguments to be passed to |
err_fun | A function that calculates selected error measures from the known responses in |
imp_variables | (optional; used if |
imp_permutations | (optional; used if |
importance | logical (default: |
distance | logical (default: |
par_args | list of parallelization parameters: |
do_gc | numeric (default: 1): defines frequency of memory garbage collection by calling gc; if |
progress | character (default: |
out_progress | only used if |
benchmark | (optional) logical (default: |
... | Further options passed to makeCluster for |
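As an illustration of the err_fun argument, the following sketch defines a custom error function for a numeric response. It assumes that the function is called with the observed responses and the predictions, in that order, as err_default is; the name err_regression and the chosen measures are hypothetical.

```r
# Hypothetical custom error function for a numeric response.
# Assumption: called as err_fun(obs, pred), like err_default.
err_regression <- function(obs, pred) {
  resid <- obs - pred
  list(
    bias  = mean(resid),          # mean error
    rmse  = sqrt(mean(resid^2)),  # root mean squared error
    mae   = mean(abs(resid)),     # mean absolute error
    count = length(obs)           # number of observations
  )
}

# Stand-alone check with made-up numbers
err_regression(obs = c(1, 2, 3), pred = c(1, 2, 5))
```

Such a function would then be supplied as err_fun = err_regression in the sperrorest call.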
A list (object of class sperrorest) with (up to) six components:

- a sperrorestreperror object containing predictive performances at the repetition level
- a sperroresterror object containing predictive performances at the fold level
- a represampling() object
- a sperrorestimportance object containing permutation-based variable importances at the fold level
- a sperrorestbenchmark object containing information on the system the code is running on, starting and finishing times, number of available CPU cores, parallelization mode, number of parallel units, and runtime performance
- a sperrorestpackageversion object containing information about the sperrorest package version
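The idea behind the permutation-based importances stored in the sperrorestimportance component can be sketched in base R. This is a generic illustration on the built-in mtcars data with a single hypothetical test fold, not the package's internal procedure: a predictor is permuted in the test data, the model is used to predict again, and the resulting increase in error is averaged over several permutations.

```r
set.seed(1)

# Toy data from base R; the last 10 rows act as one hypothetical test fold
train <- mtcars[1:22, ]
test  <- mtcars[23:32, ]
fit   <- lm(mpg ~ wt + hp + qsec, data = train)

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
base_err <- rmse(test$mpg, predict(fit, newdata = test))

# For each predictor: permute it in the test data, predict again, and
# average the increase in RMSE over n_perm permutations
n_perm <- 50
importance <- sapply(c("wt", "hp", "qsec"), function(v) {
  increase <- replicate(n_perm, {
    permuted <- test
    permuted[[v]] <- sample(permuted[[v]])
    rmse(test$mpg, predict(fit, newdata = permuted)) - base_err
  })
  mean(increase)
})

importance
```

A larger average increase in error indicates that the model relies more heavily on that predictor.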
By default, sperrorest runs in parallel on all cores using foreach with the future backend. If this is not desired, specify par_units in par_args or set par_mode = "sequential".
Available parallelization modes include par_mode = "apply" (calls pbmclapply on Unix, parApply on Windows) and par_mode = "future" (future_lapply). For the latter and for par_mode = "foreach", par_option can be specified; it defaults to multiprocess and cluster, respectively. See plan for further details.
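The settings described above are passed as a list through par_args. The sketch below builds two such lists and, only if a version of sperrorest that still has a par_args argument is installed, runs a small sequential cross-validation on the ecuador data. The object names and the reduced formula are chosen for illustration; the guard reflects the assumption that other package versions may use a different interface.

```r
# Run everything in the current R session, without parallelization
par_sequential <- list(par_mode = "sequential")

# Keep the default "foreach" mode but limit it to two parallel units
par_two_units <- list(par_mode = "foreach", par_units = 2)

# Illustrative call; skipped unless sperrorest with a par_args
# argument is available
if (requireNamespace("sperrorest", quietly = TRUE) &&
    "par_args" %in% names(formals(sperrorest::sperrorest))) {
  data(ecuador, package = "sperrorest")
  res <- sperrorest::sperrorest(
    slides ~ dem + slope, data = ecuador,
    model_fun = glm, model_args = list(family = "binomial"),
    pred_fun = predict, pred_args = list(type = "response"),
    smp_fun = sperrorest::partition_cv,
    smp_args = list(repetition = 1:2, nfold = 4),
    par_args = par_sequential, progress = FALSE
  )
  summary(res$error_rep)
}
```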
A custom prediction function passed to pred_fun that relies on several user-defined helper functions must define those helpers inside its own body, so that pred_fun is a single self-contained function.
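A minimal sketch of such a self-contained prediction function follows. The function mypred_glm and its two helpers are hypothetical; the point is only that the helpers live inside the function body, presumably so that the function carries everything it needs when it is evaluated on parallel workers.

```r
# Hypothetical prediction function: its helpers are defined inside
# its body, so the function is self-contained
mypred_glm <- function(object, newdata) {
  # helper 1: linear predictor on the link scale
  linear_predictor <- function(object, newdata) {
    predict(object, newdata = newdata, type = "link")
  }
  # helper 2: inverse logit
  inv_logit <- function(eta) 1 / (1 + exp(-eta))

  inv_logit(linear_predictor(object, newdata))
}

# Stand-alone check on base R data: compare with predict(type = "response")
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
all.equal(unname(mypred_glm(fit, mtcars)),
          unname(predict(fit, mtcars, type = "response")))
```

The function would then be supplied as pred_fun = mypred_glm.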
Brenning, A. 2012. Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: the R package 'sperrorest'. 2012 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 23-27 July 2012, p. 5372-5375.
Brenning, A. 2005. Spatial prediction models for landslide hazards: review, comparison and evaluation. Natural Hazards and Earth System Sciences, 5(6): 853-862.
Brenning, A., S. Long & P. Fieguth. Forthcoming. Detecting rock glacier flow structures using Gabor filters and IKONOS imagery. Submitted to Remote Sensing of Environment.
Russ, G. & A. Brenning. 2010a. Data mining in precision agriculture: Management of spatial information. In 13th International Conference on Information Processing and Management of Uncertainty, IPMU 2010; Dortmund; 28 June - 2 July 2010. Lecture Notes in Computer Science, 6178 LNAI: 350-359.
Russ, G. & A. Brenning. 2010b. Spatial variable importance assessment for yield prediction in Precision Agriculture. In Advances in Intelligent Data Analysis IX, Proceedings, 9th International Symposium, IDA 2010, Tucson, AZ, USA, 19-21 May 2010. Lecture Notes in Computer Science, 6065 LNCS: 184-195.
# NOT RUN {
##------------------------------------------------------------
## Classification tree example using non-spatial partitioning
## setup and default parallel mode ("foreach")
##------------------------------------------------------------

library(sperrorest)
data(ecuador) # Muenchow et al. (2012), see ?ecuador
fo <- slides ~ dem + slope + hcurv + vcurv + log.carea + cslope

library(rpart)
# predicted probability of the second class
mypred_part <- function(object, newdata) predict(object, newdata)[, 2]
ctrl <- rpart.control(cp = 0.005) # show the effects of overfitting
fit <- rpart(fo, data = ecuador, control = ctrl)

### Non-spatial 5-repeated 10-fold cross-validation:
par_nsp_res <- sperrorest(data = ecuador, formula = fo,
  model_fun = rpart,
  model_args = list(control = ctrl),
  pred_fun = mypred_part,
  progress = TRUE,
  smp_fun = partition_cv,
  smp_args = list(repetition = 1:5, nfold = 10))
summary(par_nsp_res$error_rep)
summary(par_nsp_res$error_fold)
summary(par_nsp_res$represampling)
# plot(par_nsp_res$represampling, ecuador)

### Spatial 5-repeated 10-fold cross-validation:
par_sp_res <- sperrorest(data = ecuador, formula = fo,
  model_fun = rpart,
  model_args = list(control = ctrl),
  pred_fun = mypred_part,
  progress = TRUE,
  smp_fun = partition_kmeans,
  smp_args = list(repetition = 1:5, nfold = 10))
summary(par_sp_res$error_rep)
summary(par_sp_res$error_fold)
summary(par_sp_res$represampling)
# plot(par_sp_res$represampling, ecuador)

# Compare training and test AUROC for both partitioning schemes
smry <- data.frame(
  nonspat_training = unlist(summary(par_nsp_res$error_rep, level = 1)$train_auroc),
  nonspat_test = unlist(summary(par_nsp_res$error_rep, level = 1)$test_auroc),
  spatial_training = unlist(summary(par_sp_res$error_rep, level = 1)$train_auroc),
  spatial_test = unlist(summary(par_sp_res$error_rep, level = 1)$test_auroc))
boxplot(smry, col = c('red','red','red','green'),
  main = 'Training vs. test, nonspatial vs. spatial',
  ylab = 'Area under the ROC curve')

##------------------------------------------------------------
## Logistic regression example (glm) using partition_cv
## and computation of permutation based variable importance
##------------------------------------------------------------

data(ecuador)
fo <- slides ~ dem + slope + hcurv + vcurv + log.carea + cslope

out <- sperrorest(data = ecuador, formula = fo,
  model_fun = glm,
  model_args = list(family = "binomial"),
  pred_fun = predict,
  pred_args = list(type = "response"),
  smp_fun = partition_cv,
  smp_args = list(repetition = 1:2, nfold = 4),
  par_args = list(par_mode = "future"),
  importance = TRUE, imp_permutations = 10)
summary(out$error_rep)
summary(out$importance)
# }