Select a set of predictors with minimal multicollinearity, using the variance inflation factor (VIF) as the criterion for removing collinear variables. The algorithm: (i) computes the VIF of each variable from the correlation matrix of the variables selected in ...; (ii) ranks the VIF values and removes the variable with the highest VIF; and (iii) repeats the procedure until the highest remaining VIF is less than or equal to max_vif.
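The elimination loop can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name vif_prune is hypothetical, each VIF is taken as the corresponding diagonal element of the inverse correlation matrix, and the built-in mtcars data set stands in for real data.

```r
# Hypothetical helper illustrating VIF-based backward elimination.
# This is NOT the metan source code.
vif_prune <- function(data, max_vif = 10, use = "pairwise.complete.obs") {
  # Start from every numeric column
  vars <- names(data)[vapply(data, is.numeric, logical(1))]
  removed <- character(0)
  repeat {
    r <- cor(data[vars], use = use)
    # VIF of variable j is the j-th diagonal element of the inverse
    # correlation matrix
    vif <- setNames(diag(solve(r)), vars)
    # Stop when the threshold is met or only one variable is left
    if (max(vif) <= max_vif || length(vars) == 1) break
    worst <- names(which.max(vif))
    removed <- c(removed, worst)
    vars <- setdiff(vars, worst)
  }
  list(selected = vars, removed = removed, max_vif = max(vif))
}

res <- vif_prune(mtcars, max_vif = 5)
res$selected
res$removed
```

The VIFs are recomputed after every removal because dropping one variable changes the VIF of all the others.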

non_collinear_vars(
  .data,
  ...,
  max_vif = 10,
  missingval = "pairwise.complete.obs"
)

Arguments

.data

The data set containing the variables.

...

Variables to be submitted to selection. If ... is not supplied, all the numeric variables in .data are used. Otherwise, it must be a single variable name or a comma-separated list of unquoted variable names.

max_vif

The VIF threshold, that is, the largest variance inflation factor accepted in the set of selected predictors. Defaults to 10.

missingval

How to deal with missing values. For more information, please see cor().
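As a small illustration of why this choice matters, the sketch below compares two of the options that cor() documents for its use argument. It assumes that missingval is forwarded to that argument (the default, "pairwise.complete.obs", is one of its documented values); the toy data frame x is made up for the example.

```r
# Toy data with missing values in two columns (illustrative only)
x <- data.frame(
  a = c(1, 2, 3, 4, NA),
  b = c(2, 1, 4, 3, 5),
  c = c(NA, 2, 2, 5, 4)
)

# Each correlation uses every row where both variables are observed
pw <- cor(x, use = "pairwise.complete.obs")

# Every correlation uses only the rows with no missing value at all
cc <- cor(x, use = "complete.obs")

pw["a", "b"]
cc["a", "b"]
```

The two settings can give different correlation matrices, and therefore different VIF values, because they are computed from different subsets of rows.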

Value

A data frame showing the number of selected predictors, maximum VIF value, condition number, determinant value, selected predictors and removed predictors from the original set of variables.
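The diagnostics reported in the output can be approximated by hand from a correlation matrix. The sketch below assumes that the condition number is the ratio of the largest to the smallest eigenvalue of the correlation matrix, a common convention that the package may or may not follow exactly; the mtcars columns are chosen only for illustration.

```r
# Correlation matrix of a few numeric variables (illustrative choice)
r <- cor(mtcars[c("mpg", "hp", "wt", "qsec")])

# Eigenvalues of the symmetric correlation matrix
ev <- eigen(r, symmetric = TRUE, only.values = TRUE)$values

# Assumed definition: largest eigenvalue divided by the smallest
cond_number <- max(ev) / min(ev)

# Determinant: near 0 signals strong collinearity, 1 means no correlation
determinant <- det(r)

# VIFs: diagonal of the inverse correlation matrix
vif <- diag(solve(r))

cond_number
determinant
max(vif)
```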

Examples

# \donttest{
library(metan)

# All numeric variables
non_collinear_vars(data_ge2)
#>          Parameter                                       values
#> 1       Predictors                                           10
#> 2              VIF                                         7.16
#> 3 Condition Number                                       56.797
#> 4      Determinant                                 0.0008810515
#> 5         Selected PERK, EP, CDED, NKR, PH, NR, TKW, EL, CD, ED
#> 6          Removed                          EH, CL, CW, KW, NKE

# Select variables and set the VIF threshold to 5
non_collinear_vars(data_ge2, EH, CL, CW, KW, NKE, max_vif = 5)
#>          Parameter          values
#> 1       Predictors               4
#> 2              VIF           2.934
#> 3 Condition Number          11.248
#> 4      Determinant    0.2400583901
#> 5         Selected NKE, EH, CL, CW
#> 6          Removed              KW
# }