Create/coerce and print resampling objects, e.g., partitionings or bootstrap samples derived from a data set.

as.resampling(object, ...)

# S3 method for default
as.resampling(object, ...)

# S3 method for factor
as.resampling(object, ...)

# S3 method for list
as.resampling(object, ...)

validate.resampling(object)

is.resampling(x, ...)

# S3 method for resampling
print(x, ...)

Arguments

object

depending on the function/method, a list or a vector of type factor defining a partitioning of the dataset.

...

currently not used.

x

object of class resampling.

Value

as.resampling methods: An object of class resampling.

Details

A resampling object is a list of lists defining a set of training and test samples.

In the case of k-fold cross-validation partitioning, for example, the corresponding resampling object would be of length k, i.e. contain k lists. Each of these k lists defines a training set of size n(k-1)/k (where n is the overall sample size) and a test set of size n/k. The resampling object does not, however, contain the data itself, but only indices between 1 and n identifying the selection (see Examples).
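
As a minimal sketch (with made-up indices for a hypothetical data set of n = 9 observations and k = 3 folds), such an object can be built by hand and then coerced with as.resampling; the numbers are purely illustrative:

# hypothetical 3-fold partitioning of n = 9 observations
res <- list(
  list(train = c(4, 5, 6, 7, 8, 9), test = c(1, 2, 3)),
  list(train = c(1, 2, 3, 7, 8, 9), test = c(4, 5, 6)),
  list(train = c(1, 2, 3, 4, 5, 6), test = c(7, 8, 9))
)
res <- as.resampling(res)  # validates the structure and assigns class 'resampling'
is.resampling(res)         # should be TRUE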

Another example is bootstrap resampling. represampling_bootstrap with argument oob = TRUE generates represampling objects with the indices of a bootstrap sample in the train component and the indices of the out-of-bag sample in the test component (see Examples below).

as.resampling.factor: For each level of the input factor, as.resampling.factor determines the indices of samples in this level (= test samples) and outside this level (= training samples). Empty levels of object are dropped without warning.

as.resampling.list checks whether the list in object has a valid resampling object structure (with components train and test etc.) and assigns the class attribute 'resampling' if successful.
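
A minimal sketch of the factor method on a toy factor; the index comments follow from the description above, but the exact ordering of list elements is an assumption here:

# toy factor with three levels
f <- factor(c("a", "a", "b", "b", "b", "c"))
smp <- as.resampling(f)
smp[[1]]$test   # observations in the first level, e.g. c(1, 2) for level "a"
smp[[1]]$train  # observations outside that level, e.g. c(3, 4, 5, 6)
summary(smp)    # one row per factor level with training and test sample sizes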

See also

represampling, partition_cv, partition_kmeans, represampling_bootstrap, etc.

Examples

data(ecuador)  # Muenchow et al. (2012), see ?ecuador

# Partitioning by elevation classes in 200 m steps:
parti <- factor(as.character(floor(ecuador$dem / 200)))
smp <- as.resampling(parti)
summary(smp)
#>    n.train n.test
#> 10     600    151
#> 11     585    166
#> 12     660     91
#> 13     641    110
#> 14     727     24
#> 15     747      4
#> 8      730     21
#> 9      567    184

# Compare:
summary(parti)
#>  10  11  12  13  14  15   8   9
#> 151 166  91 110  24   4  21 184
# k-fold (non-spatial) cross-validation partitioning:
parti <- partition_cv(ecuador)
parti <- parti[[1]]  # the first (and only) resampling object in parti

# data corresponding to the test sample of the first fold:
str(ecuador[parti[[1]]$test, ])
#> 'data.frame':   76 obs. of  13 variables:
#>  $ x             : num  714042 715282 713962 713412 714902 ...
#>  $ y             : num  9558482 9557602 9561082 9560472 9559262 ...
#>  $ dem           : num  2408 2837 1839 1869 2363 ...
#>  $ slope         : num  24.1 34.4 63.4 16.9 50.7 ...
#>  $ hcurv         : num  0.00659 -0.02191 -0.04951 -0.00156 -0.01407 ...
#>  $ vcurv         : num  0.01041 -0.00579 -0.00529 0.00406 0.00547 ...
#>  $ carea         : num  773 2247 3282 2421 519 ...
#>  $ cslope        : num  27.5 37.7 29.1 31 35.3 ...
#>  $ distroad      : num  300 300 173 300 300 ...
#>  $ slides        : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
#>  $ distdeforest  : num  300 300 33.8 52.5 300 ...
#>  $ distslidespast: num  6 100 100 39 59 100 37 100 25 0 ...
#>  $ log.carea     : num  2.89 3.35 3.52 3.38 2.72 ...

# the corresponding training sample - larger:
str(ecuador[parti[[1]]$train, ])
#> 'data.frame':   675 obs. of  13 variables:
#>  $ x             : num  712882 715232 715392 715042 715382 ...
#>  $ y             : num  9560002 9559582 9560172 9559312 9560142 ...
#>  $ dem           : num  1912 2199 1989 2320 2021 ...
#>  $ slope         : num  25.6 23.2 40.5 42.9 42 ...
#>  $ hcurv         : num  -0.00681 -0.00501 -0.01919 -0.01106 0.00958 ...
#>  $ vcurv         : num  -0.00029 -0.00649 -0.04051 -0.04634 0.02642 ...
#>  $ carea         : num  5577 1399 351155 501 671 ...
#>  $ cslope        : num  34.4 30.7 32.8 33.9 41.6 ...
#>  $ distroad      : num  300 300 300 300 300 ...
#>  $ slides        : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
#>  $ distdeforest  : num  15 300 300 300 300 9.15 300 300 300 0 ...
#>  $ distslidespast: num  9 21 40 100 21 2 100 100 41 5 ...
#>  $ log.carea     : num  3.75 3.15 5.55 2.7 2.83 ...
# Bootstrap training sets, out-of-bag test sets:
parti <- represampling_bootstrap(ecuador, oob = TRUE)
parti <- parti[[1]]  # the first (and only) resampling object in parti

# out-of-bag test sample: approx. one-third of nrow(ecuador):
str(ecuador[parti[[1]]$test, ])
#> 'data.frame':   290 obs. of  13 variables:
#>  $ x             : num  715232 715042 715382 712802 714842 ...
#>  $ y             : num  9559582 9559312 9560142 9559952 9558892 ...
#>  $ dem           : num  2199 2320 2021 1838 2483 ...
#>  $ slope         : num  23.2 42.9 42 52.1 68.8 ...
#>  $ hcurv         : num  -0.00501 -0.01106 0.00958 0.00183 -0.04921 ...
#>  $ vcurv         : num  -0.00649 -0.04634 0.02642 -0.09203 -0.12438 ...
#>  $ carea         : num  1399 501 671 634 754 ...
#>  $ cslope        : num  30.7 33.9 41.6 30.3 53.7 ...
#>  $ distroad      : num  300 300 300 300 300 ...
#>  $ slides        : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
#>  $ distdeforest  : num  300 300 300 9.15 300 ...
#>  $ distslidespast: num  21 100 21 2 100 5 20 100 100 100 ...
#>  $ log.carea     : num  3.15 2.7 2.83 2.8 2.88 ...

# bootstrap training sample: same size as nrow(ecuador):
str(ecuador[parti[[1]]$train, ])
#> 'data.frame':   751 obs. of  13 variables:
#>  $ x             : num  715382 713132 715432 714892 715722 ...
#>  $ y             : num  9558062 9560672 9558592 9559282 9557532 ...
#>  $ dem           : num  2799 1897 2622 2374 3097 ...
#>  $ slope         : num  50.6 24.7 19 42.8 28.8 ...
#>  $ hcurv         : num  -0.00812 0.00306 -0.00301 -0.01057 0.02327 ...
#>  $ vcurv         : num  -0.02208 -0.00436 -0.01099 0.02427 0.01833 ...
#>  $ carea         : num  951 2603 1146 381 300 ...
#>  $ cslope        : num  50.9 29 20.1 27.2 25.9 ...
#>  $ distroad      : num  300 10 300 300 300 300 300 300 300 300 ...
#>  $ slides        : Factor w/ 2 levels "FALSE","TRUE": 1 2 1 2 1 1 2 2 2 1 ...
#>  $ distdeforest  : num  300 166 300 300 300 ...
#>  $ distslidespast: num  100 2 26 46 100 55 100 100 11 45 ...
#>  $ log.carea     : num  2.98 3.42 3.06 2.58 2.48 ...