partition_cv_strat
Creates a set of sample indices corresponding to cross-validation test and training sets.
partition_cv_strat(data, coords = c("x", "y"), nfold = 10, return_factor = FALSE, repetition = 1, seed1 = NULL, strat)
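Argument | Description
---|---
data | data.frame containing at least the columns specified by coords
coords | vector of length 2 defining the variables in data that contain the x and y coordinates of the sample locations
nfold | number of partitions (folds) in the cross-validation
return_factor | if FALSE (default), return a represampling object; if TRUE, return the partitioning as factor vectors instead
repetition | numeric vector: cross-validation repetitions to be generated. Note that this is not the number of repetitions, but the indices of these repetitions. E.g., use
seed1 | seed used to initialize the random number generator (via set.seed) before sampling; if NULL (default), no seed is set
strat | character: column in data containing the factor variable over which the partitioning is stratified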
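The note on repetition (indices, not a count) can be made concrete with a small base-R sketch. It is hypothetical: make_repetitions is not part of sperrorest, and the seeding scheme (repetition r seeded with seed1 + r) is an assumption chosen for illustration. Under that assumption, each repetition depends only on its own index, so repetition 3 comes out the same whether it is generated alone or as part of 1:5, and the result is keyed by the index, much like parti[['1']] in the example.

```r
# Hypothetical sketch (not part of sperrorest). ASSUMPTION: repetition r is
# seeded with seed1 + r, so each repetition depends only on its own index.
# Note: set.seed() resets the global RNG state.
make_repetitions <- function(n, nfold, repetition, seed1 = 123) {
  reps <- lapply(repetition, function(r) {
    set.seed(seed1 + r)
    # Random fold label (1..nfold, dealt as evenly as possible) for n rows
    sample(rep_len(seq_len(nfold), n))
  })
  names(reps) <- as.character(repetition)  # list keyed by repetition index
  reps
}

all_five <- make_repetitions(n = 20, nfold = 5, repetition = 1:5)
only_third <- make_repetitions(n = 20, nfold = 5, repetition = 3)
names(all_five)                                # "1" "2" "3" "4" "5"
identical(all_five[["3"]], only_third[["3"]])  # TRUE
```

Indexing repetitions this way lets a long run be split into batches (e.g. 1:100, then 101:200) without regenerating or duplicating earlier repetitions.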
A represampling object; see also partition_cv.

partition_cv_strat works like partition_cv, but the partitioning is stratified with respect to the variable data[, strat]: cross-validation partitioning is done separately within each subset data[data[, strat] == lev, ] (for each lev in levels(data[, strat])), and the k-th folds of all levels are then combined into the k-th cross-validation fold.
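The stratification rule above can be sketched in base R. This is a minimal illustration, not sperrorest's implementation: strat_fold_ids and strat_cv_folds are hypothetical helpers, and fold labels are assumed to be dealt as evenly as possible within each level before being shuffled. Because every level receives its own labels 1..nfold, the k-th fold collects roughly the k-th share of every level; the train/test pairs mirror the structure accessed in the example (parti[['1']][[1]]$train).

```r
# Hypothetical helper (not part of sperrorest): assign a fold label to each
# observation, separately within each level of the stratification variable.
strat_fold_ids <- function(strat, nfold) {
  strat <- as.factor(strat)
  fold <- integer(length(strat))
  for (lev in levels(strat)) {
    rows <- which(strat == lev)
    # Deal labels 1..nfold as evenly as possible over this level's rows,
    # shuffling which row receives which label.
    fold[rows[sample.int(length(rows))]] <- rep_len(seq_len(nfold), length(rows))
  }
  fold
}

# Combine the k-th fold of every level into the k-th cross-validation fold
# and return train/test row indices for each fold.
strat_cv_folds <- function(strat, nfold) {
  fold <- strat_fold_ids(strat, nfold)
  lapply(seq_len(nfold), function(k) {
    list(train = which(fold != k), test = which(fold == k))
  })
}

set.seed(1)
y <- factor(rep(c("FALSE", "TRUE"), times = c(80, 20)))
folds <- strat_cv_folds(y, nfold = 5)
table(y[folds[[1]]$test])  # 16 FALSE, 4 TRUE (same in every test fold)
idx <- folds[[1]]$train
mean(y[idx] == "TRUE") / mean(y == "TRUE")  # 1 here
```

The ratio is exactly 1 in this toy case only because both level sizes are divisible by nfold; with uneven level sizes the fold sizes must be rounded, which is why the stratified ratio in the example is very close to, but not exactly, 1.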
See also: sperrorest, as.resampling, resample_strat_uniform
library(sperrorest)
data(ecuador)

# Stratified cross-validation:
parti <- partition_cv_strat(ecuador, strat = 'slides', nfold = 5, repetition = 1)
idx <- parti[['1']][[1]]$train
mean(ecuador$slides[idx] == 'TRUE') / mean(ecuador$slides == 'TRUE')
#> [1] 0.9996672
# very close to 1: stratification preserves the class proportion,
# up to rounding of the fold sizes

# Non-stratified cross-validation:
parti <- partition_cv(ecuador, nfold = 5, repetition = 1)
idx <- parti[['1']][[1]]$train
mean(ecuador$slides[idx] == 'TRUE') / mean(ecuador$slides == 'TRUE')
#> [1] 1.002166
# close to 1 because of the large sample size, but subject to random variation