Create/coerce and print resampling objects, e.g., partitionings or bootstrap samples derived from a data set.
as.resampling(object, ...)

# S3 method for default
as.resampling(object, ...)

# S3 method for factor
as.resampling(object, ...)

# S3 method for list
as.resampling(object, ...)

validate.resampling(object)

is.resampling(x, ...)

# S3 method for resampling
print(x, ...)
object | depending on the function/method, a list or a vector of type factor defining a partitioning of the dataset. |
---|---|
... | currently not used. |
x | object of class resampling. |
as.resampling methods: An object of class resampling.
A resampling object is a list of lists defining a set of training and test samples.

In the case of k-fold cross-validation partitioning, for example, the corresponding resampling object would be of length k, i.e. contain k lists. Each of these k lists defines a training set of size n(k-1)/k (where n is the overall sample size) and a test set of size n/k. The resampling object does, however, not contain the data itself, but only indices between 1 and n identifying the selection (see Examples).
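For illustration, a minimal sketch of inspecting this structure, assuming the ecuador data set used in the Examples and 5-fold partitioning via partition_cv():

library(sperrorest)
data(ecuador)
parti <- partition_cv(ecuador, nfold = 5)  # represampling: a list of resampling objects
res <- parti[[1]]                          # resampling object of the first repetition
length(res)                                # 5: one list per fold
names(res[[1]])                            # "train" "test": index vectors, no data
max(res[[1]]$train) <= nrow(ecuador)       # TRUE: indices lie between 1 and n
intersect(res[[1]]$train, res[[1]]$test)   # integer(0): train and test are disjoint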
Another example is bootstrap resampling: represampling_bootstrap with argument oob = TRUE generates one resampling object per repetition, with the indices of a bootstrap sample in the train component and the indices of the corresponding out-of-bag sample in the test component (see Examples below).
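A hedged sketch of this behaviour, again using the ecuador data from the Examples:

boot <- represampling_bootstrap(ecuador, oob = TRUE)
res <- boot[[1]]                                  # first (and only) repetition
length(res[[1]]$train) == nrow(ecuador)           # TRUE: bootstrap sample of size n
anyDuplicated(res[[1]]$train) > 0                 # TRUE: drawn with replacement
length(intersect(res[[1]]$train, res[[1]]$test))  # 0: test set is strictly out-of-bag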
as.resampling.factor: For each factor level of the input variable, as.resampling.factor determines the indices of samples in this level (= test samples) and outside this level (= training samples). Empty levels of object are dropped without warning.
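A small hypothetical factor illustrates this leave-one-level-out behaviour; the index values shown as comments are what the description above implies:

f <- factor(c("a", "a", "b", "b", "b", "c"))
smp <- as.resampling(f)
length(smp)      # 3: one train/test pair per factor level
smp[[1]]$test    # 1 2       -- samples in the first level, "a"
smp[[1]]$train   # 3 4 5 6   -- samples outside level "a"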
as.resampling.list checks whether the list in object has a valid resampling object structure (with components train and test etc.) and assigns the class attribute 'resampling' if successful.
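A minimal sketch of building and coercing such a list by hand (the index values are arbitrary here):

res <- list(
  fold1 = list(train = c(3, 4, 5, 6), test = c(1, 2)),
  fold2 = list(train = c(1, 2, 5, 6), test = c(3, 4))
)
res <- as.resampling(res)  # list method: validates structure, assigns class 'resampling'
is.resampling(res)         # TRUE
summary(res)               # training/test set sizes per fold, as in the Examples below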
represampling, partition_cv, partition_kmeans, represampling_bootstrap, etc.
data(ecuador) # Muenchow et al. (2012), see ?ecuador

# Partitioning by elevation classes in 200 m steps:
parti <- factor(as.character(floor(ecuador$dem / 200)))
smp <- as.resampling(parti)
summary(smp)
#>    n.train n.test
#> 10     600    151
#> 11     585    166
#> 12     660     91
#> 13     641    110
#> 14     727     24
#> 15     747      4
#> 8      730     21
#> 9      567    184
# Compare:
summary(parti)
#>  10  11  12  13  14  15   8   9 
#> 151 166  91 110  24   4  21 184

# k-fold (non-spatial) cross-validation partitioning:
parti <- partition_cv(ecuador)
parti <- parti[[1]] # the first (and only) resampling object in parti
# data corresponding to the test sample of the first fold:
str(ecuador[parti[[1]]$test, ])
#> 'data.frame':    76 obs. of  13 variables:
#>  $ x             : num  714042 715282 713962 713412 714902 ...
#>  $ y             : num  9558482 9557602 9561082 9560472 9559262 ...
#>  $ dem           : num  2408 2837 1839 1869 2363 ...
#>  $ slope         : num  24.1 34.4 63.4 16.9 50.7 ...
#>  $ hcurv         : num  0.00659 -0.02191 -0.04951 -0.00156 -0.01407 ...
#>  $ vcurv         : num  0.01041 -0.00579 -0.00529 0.00406 0.00547 ...
#>  $ carea         : num  773 2247 3282 2421 519 ...
#>  $ cslope        : num  27.5 37.7 29.1 31 35.3 ...
#>  $ distroad      : num  300 300 173 300 300 ...
#>  $ slides        : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
#>  $ distdeforest  : num  300 300 33.8 52.5 300 ...
#>  $ distslidespast: num  6 100 100 39 59 100 37 100 25 0 ...
#>  $ log.carea     : num  2.89 3.35 3.52 3.38 2.72 ...
# the corresponding training sample - larger:
str(ecuador[parti[[1]]$train, ])
#> 'data.frame':    675 obs. of  13 variables:
#>  $ x             : num  712882 715232 715392 715042 715382 ...
#>  $ y             : num  9560002 9559582 9560172 9559312 9560142 ...
#>  $ dem           : num  1912 2199 1989 2320 2021 ...
#>  $ slope         : num  25.6 23.2 40.5 42.9 42 ...
#>  $ hcurv         : num  -0.00681 -0.00501 -0.01919 -0.01106 0.00958 ...
#>  $ vcurv         : num  -0.00029 -0.00649 -0.04051 -0.04634 0.02642 ...
#>  $ carea         : num  5577 1399 351155 501 671 ...
#>  $ cslope        : num  34.4 30.7 32.8 33.9 41.6 ...
#>  $ distroad      : num  300 300 300 300 300 ...
#>  $ slides        : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
#>  $ distdeforest  : num  15 300 300 300 300 9.15 300 300 300 0 ...
#>  $ distslidespast: num  9 21 40 100 21 2 100 100 41 5 ...
#>  $ log.carea     : num  3.75 3.15 5.55 2.7 2.83 ...

# Bootstrap training sets, out-of-bag test sets:
parti <- represampling_bootstrap(ecuador, oob = TRUE)
parti <- parti[[1]] # the first (and only) resampling object in parti
# out-of-bag test sample: approx. one-third of nrow(ecuador):
str(ecuador[parti[[1]]$test, ])
#> 'data.frame':    290 obs. of  13 variables:
#>  $ x             : num  715232 715042 715382 712802 714842 ...
#>  $ y             : num  9559582 9559312 9560142 9559952 9558892 ...
#>  $ dem           : num  2199 2320 2021 1838 2483 ...
#>  $ slope         : num  23.2 42.9 42 52.1 68.8 ...
#>  $ hcurv         : num  -0.00501 -0.01106 0.00958 0.00183 -0.04921 ...
#>  $ vcurv         : num  -0.00649 -0.04634 0.02642 -0.09203 -0.12438 ...
#>  $ carea         : num  1399 501 671 634 754 ...
#>  $ cslope        : num  30.7 33.9 41.6 30.3 53.7 ...
#>  $ distroad      : num  300 300 300 300 300 ...
#>  $ slides        : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
#>  $ distdeforest  : num  300 300 300 9.15 300 ...
#>  $ distslidespast: num  21 100 21 2 100 5 20 100 100 100 ...
#>  $ log.carea     : num  3.15 2.7 2.83 2.8 2.88 ...
# bootstrap training sample: same size as nrow(ecuador):
str(ecuador[parti[[1]]$train, ])
#> 'data.frame':    751 obs. of  13 variables:
#>  $ x             : num  715382 713132 715432 714892 715722 ...
#>  $ y             : num  9558062 9560672 9558592 9559282 9557532 ...
#>  $ dem           : num  2799 1897 2622 2374 3097 ...
#>  $ slope         : num  50.6 24.7 19 42.8 28.8 ...
#>  $ hcurv         : num  -0.00812 0.00306 -0.00301 -0.01057 0.02327 ...
#>  $ vcurv         : num  -0.02208 -0.00436 -0.01099 0.02427 0.01833 ...
#>  $ carea         : num  951 2603 1146 381 300 ...
#>  $ cslope        : num  50.9 29 20.1 27.2 25.9 ...
#>  $ distroad      : num  300 10 300 300 300 300 300 300 300 300 ...
#>  $ slides        : Factor w/ 2 levels "FALSE","TRUE": 1 2 1 2 1 1 2 2 2 1 ...
#>  $ distdeforest  : num  300 166 300 300 300 ...
#>  $ distslidespast: num  100 2 26 46 100 55 100 100 11 45 ...
#>  $ log.carea     : num  2.98 3.42 3.06 2.58 2.48 ...