Generates n bootstrap samples of data and returns the bootstrapped data frames as a list-variable.
bootstrap(data, n, size)
Argument | Description |
---|---|
data | A data frame. |
n | Number of bootstraps to be generated. |
size | Optional, size of the bootstrap samples. May either be a number between 1 and |
A tibble with one column: a list-variable strap, which contains resample-objects of class sj_resample.
These resample-objects are lists with three elements:
the original data frame, data
the row numbers id, i.e. the row numbers of data, indicating the rows that were resampled with replacement
the resample.id, indicating the index of the resample (i.e. the position of the sj_resample-object in the list strap)
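The three-element structure described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: make_resample() is a hypothetical helper, and the built-in mtcars data stands in for data.

```r
# Hypothetical helper (not part of the package): builds one list with the
# three elements described above.
make_resample <- function(data, resample.id) {
  list(
    data = data,                                   # the original data frame
    id = sample(nrow(data), size = nrow(data),     # resampled row numbers,
                replace = TRUE),                   # drawn with replacement
    resample.id = resample.id                      # position in the list
  )
}

set.seed(123)
strap <- lapply(1:3, function(i) make_resample(mtcars, i))

# The resampled data frame is materialized only when needed,
# by indexing the original data with the stored row numbers.
rs <- strap[[2]]
resampled <- rs$data[rs$id, , drop = FALSE]
nrow(resampled) == nrow(mtcars)
```

Storing row numbers next to the original data frame, rather than a full copy of every resample, means the rows are only duplicated at the moment a resample is actually used.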
By default, each bootstrap sample has the same number of observations as data. To generate samples in which no observation is drawn more than once (i.e. sampling without replacement), use size to get bootstrapped data with a specific number of observations. However, specifying the size-argument is much less memory-efficient than the default bootstrap with replacement. Hence, it is recommended to omit the size-argument unless it is really needed.
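The difference between the default behaviour and the size-argument can be illustrated with a minimal base R sketch. The helper draw_ids() is hypothetical, and treating a value below 1 as a proportion is an assumption based on the ".7" call in the examples, not a statement about the package's internals.

```r
# Hypothetical helper: returns row numbers for one sample.
draw_ids <- function(n_rows, size = NULL) {
  if (is.null(size)) {
    # default: same number of observations, drawn with replacement
    return(sample(n_rows, size = n_rows, replace = TRUE))
  }
  # assumption: values below 1 are read as a proportion of the rows
  if (size < 1) size <- as.integer(n_rows * size)
  # with an explicit size, no observation is drawn more than once
  sample(n_rows, size = size, replace = FALSE)
}

set.seed(1)
ids_default <- draw_ids(100)        # 100 row numbers, duplicates possible
ids_fixed   <- draw_ids(100, 60)    # 60 distinct row numbers
ids_prop    <- draw_ids(100, 0.5)   # half of the rows, all distinct

length(ids_default)
anyDuplicated(ids_fixed) == 0
length(ids_prop)
```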
This function applies nonparametric bootstrapping, i.e. the function
draws samples with replacement.
There is an as.data.frame- and a print-method to get or print the resampled data frames. See 'Examples'. The as.data.frame-method automatically applies whenever coercion is done because a data frame is required as input. See 'Examples' in boot_ci.
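How such automatic coercion can work is sketched below with a stand-in S3 class. The class name demo_resample and its method are hypothetical and only mimic the behaviour described above. The sketch relies on lm() building its model frame through as.data.frame() when data is a classed object that is not a data frame.

```r
# Hypothetical stand-in class, mimicking a resample-object.
set.seed(42)
rs <- structure(
  list(
    data = mtcars,
    id = sample(nrow(mtcars), replace = TRUE),
    resample.id = 1L
  ),
  class = "demo_resample"
)

# S3 method: materialize the resampled rows on coercion.
as.data.frame.demo_resample <- function(x, ...) {
  x$data[x$id, , drop = FALSE]
}

# explicit coercion
df <- as.data.frame(rs)

# implicit coercion: lm() needs a data frame and converts `rs` itself
fit <- lm(mpg ~ wt, data = rs)
nobs(fit) == nrow(mtcars)
```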
See also boot_ci to calculate confidence intervals from bootstrap samples.
library(sjstats)  # provides bootstrap(), boot_se() and the efc data
data(efc)
bs <- bootstrap(efc, 5)

# now run models for each bootstrapped sample
lapply(bs$strap, function(x) lm(neg_c_7 ~ e42dep + c161sex, data = x))
#> [[1]]
#>
#> Call:
#> lm(formula = neg_c_7 ~ e42dep + c161sex, data = x)
#>
#> Coefficients:
#> (Intercept)       e42dep      c161sex
#>      6.9210       1.4884       0.3424
#>
#>
#> [[2]]
#>
#> Call:
#> lm(formula = neg_c_7 ~ e42dep + c161sex, data = x)
#>
#> Coefficients:
#> (Intercept)       e42dep      c161sex
#>      5.8124       1.6404       0.6793
#>
#>
#> [[3]]
#>
#> Call:
#> lm(formula = neg_c_7 ~ e42dep + c161sex, data = x)
#>
#> Coefficients:
#> (Intercept)       e42dep      c161sex
#>       6.571        1.393        0.586
#>
#>
#> [[4]]
#>
#> Call:
#> lm(formula = neg_c_7 ~ e42dep + c161sex, data = x)
#>
#> Coefficients:
#> (Intercept)       e42dep      c161sex
#>      6.6124       1.5940       0.3681
#>
#>
#> [[5]]
#>
#> Call:
#> lm(formula = neg_c_7 ~ e42dep + c161sex, data = x)
#>
#> Coefficients:
#> (Intercept)       e42dep      c161sex
#>      6.7139       1.6056       0.2693
#>
#>

# generate bootstrap samples with 600 observations for each sample
bs <- bootstrap(efc, 5, 600)

# generate bootstrap samples with 70% observations of the original sample size
bs <- bootstrap(efc, 5, .7)

# compute standard error for a simple vector from bootstraps
# use the `as.data.frame()`-method to get the resampled
# data frame
bs <- bootstrap(efc, 100)
bs$c12hour <- unlist(lapply(bs$strap, function(x) {
  mean(as.data.frame(x)$c12hour, na.rm = TRUE)
}))

# or as tidyverse-approach
library(dplyr)
library(purrr)
bs <- efc %>%
  bootstrap(100) %>%
  mutate(
    c12hour = map_dbl(strap, ~mean(as.data.frame(.x)$c12hour, na.rm = TRUE))
  )

# bootstrapped standard error
boot_se(bs, c12hour)
#>      term  std.err
#> 1 c12hour 1.735893

#> [1] 1.691623