cvCovEst() identifies the optimal covariance matrix estimator from among a set of candidate estimators.

cvCovEst(
  dat,
  estimators = c(linearShrinkEst, thresholdingEst, sampleCovEst),
  estimator_params = list(linearShrinkEst = list(alpha = 0), thresholdingEst =
    list(gamma = 0)),
  cv_loss = cvMatrixFrobeniusLoss,
  cv_scheme = "v_fold",
  mc_split = 0.5,
  v_folds = 10L,
  center = TRUE,
  scale = FALSE,
  parallel = FALSE
)

Arguments

dat

A numeric data.frame, matrix, or similar object.

estimators

A list of estimator functions to be considered in the cross-validated estimator selection procedure.

estimator_params

A named list of arguments corresponding to the hyperparameters of the covariance matrix estimators in estimators. The name of each list element should match the name of an estimator passed to estimators. Each element of estimator_params is itself a named list, with the names corresponding to a given estimator's hyperparameter(s). Each hyperparameter may be given as a single numeric or as a numeric vector. If an estimator needs no hyperparameters, it need not be listed.
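As an illustration of the structure described above, the following sketch builds an estimator_params list with vector-valued hyperparameters. The grid values are arbitrary choices for demonstration, and the count of candidates assumes (as the Examples output suggests) that each hyperparameter value is evaluated as a separate candidate. The call to cvCovEst() is guarded so that the snippet also runs where the package is not installed.

```r
# Hyperparameter grids, keyed by estimator name. The values are illustrative.
estimator_params <- list(
  linearShrinkEst = list(alpha = c(0.25, 0.5, 0.75)),
  thresholdingEst = list(gamma = c(0.2, 0.4))
)

# One candidate per hyperparameter value (each estimator here has exactly one
# hyperparameter), plus sampleCovEst, which takes none and so is not listed.
grid_sizes <- vapply(
  estimator_params,
  function(params) length(params[[1]]),
  integer(1)
)
n_candidates <- sum(grid_sizes) + 1L

# The call itself only runs when the cvCovEst package is installed.
if (requireNamespace("cvCovEst", quietly = TRUE)) {
  library(cvCovEst)
  fit <- cvCovEst(
    dat = mtcars,
    estimators = c(linearShrinkEst, thresholdingEst, sampleCovEst),
    estimator_params = estimator_params
  )
  print(fit$risk_df)
}
```

Under the assumption stated above, risk_df would be expected to contain n_candidates rows, one for each estimator and hyperparameter combination.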

cv_loss

A function indicating the loss function to be used. This defaults to the Frobenius loss, cvMatrixFrobeniusLoss(). An observation-based version, cvFrobeniusLoss(), is also available. Additionally, cvScaledMatrixFrobeniusLoss() is included for situations in which dat's variables are on different scales.

cv_scheme

A character indicating the cross-validation scheme to be employed. There are two options: (1) V-fold cross-validation, via "v_fold"; and (2) Monte Carlo cross-validation, via "mc". Defaults to V-fold cross-validation.

mc_split

A numeric between 0 and 1 indicating the proportion of observations to be included in the validation set of each Monte Carlo cross-validation fold.

v_folds

An integer larger than or equal to 1 indicating the number of folds to use for cross-validation. The default is 10, regardless of the choice of cross-validation scheme.
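To make the difference between the two schemes concrete, here is a minimal base R sketch of how validation sets could be formed under each one. It is not the package's internal implementation; the variable names are hypothetical, and it assumes that v_folds also sets the number of Monte Carlo splits.

```r
set.seed(123)
n <- nrow(mtcars)   # 32 observations
v_folds <- 10L
mc_split <- 0.5

# V-fold: every observation is assigned to exactly one validation fold.
fold_id <- sample(rep(seq_len(v_folds), length.out = n))
v_fold_validation <- split(seq_len(n), fold_id)

# Monte Carlo: each split independently draws a fraction mc_split of the
# observations for validation, so validation sets may overlap across splits.
mc_validation <- replicate(
  v_folds,
  sort(sample(n, size = floor(mc_split * n))),
  simplify = FALSE
)

lengths(v_fold_validation)  # fold sizes of 3 or 4
lengths(mc_validation)      # 16 observations in every split
```

In the V-fold case the validation sets partition the data, while in the Monte Carlo case the size of each validation set is governed by mc_split alone.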

center

A logical indicating whether to center the columns of dat to have mean zero.

scale

A logical indicating whether to scale the columns of dat to have unit variance.
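One consequence of setting both center and scale to TRUE is that the sample covariance matrix of the transformed data equals the sample correlation matrix of the original data. The base R sketch below checks this, assuming the standardization behaves like base R's scale(); it does not call the package.

```r
# Standardize the columns: mean zero, unit variance.
dat_std <- scale(mtcars, center = TRUE, scale = TRUE)

# The covariance of standardized data is the correlation of the raw data.
cov_std <- cov(dat_std)
same_as_cor <- isTRUE(all.equal(cov_std, cor(mtcars)))
print(same_as_cor)

# Every diagonal entry (variance) is 1 after scaling.
print(unname(diag(cov_std)))
```

This is consistent with the Examples output, where the selected estimator is sampleCovEst and the returned estimate has ones on its diagonal.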

parallel

A logical option indicating whether to run the main cross-validation loop with future_lapply(). This is passed directly to cross_validate().

Value

A list of results containing the following elements:

  • estimate - A matrix corresponding to the estimate of the optimal covariance matrix estimator.

  • estimator - A character indicating the optimal estimator and corresponding hyperparameters, if any.

  • risk_df - A tibble providing the cross-validated risk estimates of each estimator.

  • cv_df - A tibble providing each estimator's loss over the folds of the cross-validation procedure.

  • args - A named list containing arguments passed to cvCovEst.

Examples

cvCovEst(
  dat = mtcars,
  estimators = c(
    linearShrinkLWEst, thresholdingEst, sampleCovEst
  ),
  estimator_params = list(
    thresholdingEst = list(gamma = seq(0.1, 0.3, 0.1))
  ),
  center = TRUE,
  scale = TRUE
)
#> $estimate
#>             mpg        cyl       disp         hp        drat         wt
#> mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684  0.68117191 -0.8676594
#> cyl  -0.8521620  1.0000000  0.9020329  0.8324475 -0.69993811  0.7824958
#> disp -0.8475514  0.9020329  1.0000000  0.7909486 -0.71021393  0.8879799
#> hp   -0.7761684  0.8324475  0.7909486  1.0000000 -0.44875912  0.6587479
#> drat  0.6811719 -0.6999381 -0.7102139 -0.4487591  1.00000000 -0.7124406
#> wt   -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065  1.0000000
#> qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476 -0.1747159
#> vs    0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846 -0.5549157
#> am    0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113 -0.6924953
#> gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013 -0.5832870
#> carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980  0.4276059
#>             qsec         vs          am       gear        carb
#> mpg   0.41868403  0.6640389  0.59983243  0.4802848 -0.55092507
#> cyl  -0.59124207 -0.8108118 -0.52260705 -0.4926866  0.52698829
#> disp -0.43369788 -0.7104159 -0.59122704 -0.5555692  0.39497686
#> hp   -0.70822339 -0.7230967 -0.24320426 -0.1257043  0.74981247
#> drat  0.09120476  0.4402785  0.71271113  0.6996101 -0.09078980
#> wt   -0.17471588 -0.5549157 -0.69249526 -0.5832870  0.42760594
#> qsec  1.00000000  0.7445354 -0.22986086 -0.2126822 -0.65624923
#> vs    0.74453544  1.0000000  0.16834512  0.2060233 -0.56960714
#> am   -0.22986086  0.1683451  1.00000000  0.7940588  0.05753435
#> gear -0.21268223  0.2060233  0.79405876  1.0000000  0.27407284
#> carb -0.65624923 -0.5696071  0.05753435  0.2740728  1.00000000
#>
#> $estimator
#> [1] "sampleCovEst, hyperparameters = NA"
#>
#> $risk_df
#> # A tibble: 5 x 3
#>   estimator         hyperparameters      cv_risk
#>   <chr>             <chr>                  <dbl>
#> 1 sampleCovEst      hyperparameters = NA    65.9
#> 2 thresholdingEst   gamma = 0.3             66.0
#> 3 thresholdingEst   gamma = 0.1             66.1
#> 4 linearShrinkLWEst hyperparameters = NA    66.2
#> 5 thresholdingEst   gamma = 0.2             67.0
#>
#> $cv_df
#> # A tibble: 50 x 4
#>    estimator         hyperparameters       loss  fold
#>    <chr>             <chr>                <dbl> <int>
#>  1 linearShrinkLWEst hyperparameters = NA 112.      1
#>  2 thresholdingEst   gamma = 0.1          107.      1
#>  3 thresholdingEst   gamma = 0.2          107.      1
#>  4 thresholdingEst   gamma = 0.3          106.      1
#>  5 sampleCovEst      hyperparameters = NA 106.      1
#>  6 linearShrinkLWEst hyperparameters = NA  21.3     2
#>  7 thresholdingEst   gamma = 0.1           20.9     2
#>  8 thresholdingEst   gamma = 0.2           21.3     2
#>  9 thresholdingEst   gamma = 0.3           20.6     2
#> 10 sampleCovEst      hyperparameters = NA  20.4     2
#> # … with 40 more rows
#>
#> $args
#> $args$cv_loss
#> <quosure>
#> expr: ^cvMatrixFrobeniusLoss
#> env:  0x7fa3ef508758
#>
#> $args$cv_scheme
#> [1] "v_fold"
#>
#> $args$mc_split
#> [1] 0.5
#>
#> $args$v_folds
#> [1] 10
#>
#> $args$center
#> [1] TRUE
#>
#> $args$scale
#> [1] TRUE
#>
#> $args$parallel
#> [1] FALSE
#>
#>
#> attr(,"class")
#> [1] "cvCovEst"