`hcai_impute` adds imputation steps to an existing recipe. It currently supports mean (numeric only), new_category (categorical only), bagged trees, and k-nearest neighbors (knn).
hcai_impute(recipe, nominal_method = "new_category", numeric_method = "mean", numeric_params = NULL, nominal_params = NULL)
recipe | A recipe object. Imputation will be added to the sequence of operations for this recipe. |
---|---|
nominal_method | Defaults to `"new_category"`. |
numeric_method | Defaults to `"mean"`. |
numeric_params | A named list with parameters to use with the chosen imputation method on numeric data. Options are |
nominal_params | A named list with parameters to use with the chosen imputation method on nominal data. Options are |
An updated version of `recipe` with the new step added to the sequence of existing steps.
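
To make the two default methods concrete, here is a minimal base-R sketch of what "mean" and "new_category" imputation amount to. It is an illustration only and does not reproduce the package's internals: the helper names `impute_mean` and `impute_new_category` are hypothetical, and the `"missing"` label is taken from the "Filling NA with missing" line in the printed recipe.

```r
# Illustrative sketch only; helper names are hypothetical, not package functions.
d <- data.frame(
  age = c(30, NA, 50),
  hemoglobin_category = c("Low", NA, "High"),
  stringsAsFactors = FALSE
)

# "mean": replace each missing numeric value with the mean of the observed values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# "new_category": treat missingness as its own category, labelled "missing" here.
impute_new_category <- function(x, label = "missing") {
  x[is.na(x)] <- label
  x
}

d$age <- impute_mean(d$age)
d$hemoglobin_category <- impute_new_category(d$hemoglobin_category)
print(d)
```

The numeric column keeps its type and the categorical column gains one extra value, which is why the first method is limited to numeric data and the second to categorical data.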
library(recipes)
library(healthcareai)  # provides hcai_impute() and missingness()

n <- 100
set.seed(9)
d <- tibble::tibble(patient_id = 1:n,
                    age = sample(c(30:80, NA), size = n, replace = TRUE),
                    hemoglobin_count = rnorm(n, mean = 15, sd = 1),
                    hemoglobin_category = sample(c("Low", "Normal", "High", NA),
                                                 size = n, replace = TRUE),
                    disease = ifelse(hemoglobin_count < 15, "Yes", "No")
)

# Initialize
my_recipe <- recipe(disease ~ ., data = d)

# Create recipe
my_recipe <- my_recipe %>%
  hcai_impute()
my_recipe
#> Data Recipe
#>
#> Inputs:
#>
#>       role #variables
#>    outcome          1
#>  predictor          4
#>
#> Operations:
#>
#> Mean Imputation for all_numeric(), -all_outcomes()
#> Filling NA with missing for all_nominal(), -all_outcomes()

# Train recipe
trained_recipe <- prep(my_recipe, training = d)

# Apply recipe
data_modified <- bake(trained_recipe, newdata = d)
missingness(data_modified)
#>              variable percent_missing
#> 1          patient_id               0
#> 2                 age               0
#> 3    hemoglobin_count               0
#> 4 hemoglobin_category               0
#> 5             disease               0

# Specify methods (steps are appended to the existing recipe):
my_recipe <- my_recipe %>%
  hcai_impute(numeric_method = "bagimpute",
              nominal_method = "new_category")
my_recipe
#> Data Recipe
#>
#> Inputs:
#>
#>       role #variables
#>    outcome          1
#>  predictor          4
#>
#> Operations:
#>
#> Mean Imputation for all_numeric(), -all_outcomes()
#> Filling NA with missing for all_nominal(), -all_outcomes()
#> Bagged tree imputation for all_numeric(), -all_outcomes()
#> Filling NA with missing for all_nominal(), -all_outcomes()

# Specify methods and params:
my_recipe <- my_recipe %>%
  hcai_impute(numeric_method = "knnimpute",
              numeric_params = list(knn_K = 4))
my_recipe
#> Data Recipe
#>
#> Inputs:
#>
#>       role #variables
#>    outcome          1
#>  predictor          4
#>
#> Operations:
#>
#> Mean Imputation for all_numeric(), -all_outcomes()
#> Filling NA with missing for all_nominal(), -all_outcomes()
#> Bagged tree imputation for all_numeric(), -all_outcomes()
#> Filling NA with missing for all_nominal(), -all_outcomes()
#> 4-nearest neighbor imputation for all_numeric(), -all_outcomes()
#> Filling NA with missing for all_nominal(), -all_outcomes()
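
The example trains the recipe with `prep()` and then applies it with `bake()`. The point of that split is that fill values are estimated once from the training data and reused on whatever data is baked later. The base-R sketch below mimics that behaviour for mean imputation under simplifying assumptions; `learned_mean`, `apply_mean`, and the toy data frames are hypothetical names introduced for illustration, not part of the package.

```r
# Illustrative sketch only; names and data are hypothetical.
train <- data.frame(age = c(40, 60, NA, 50))
new_data <- data.frame(age = c(NA, 70))

# "prep" analogue: estimate the fill value from the training data only.
learned_mean <- mean(train$age, na.rm = TRUE)

# "bake" analogue: apply the stored value to any data set.
apply_mean <- function(x, fill) {
  x[is.na(x)] <- fill
  x
}

train$age <- apply_mean(train$age, learned_mean)
new_data$age <- apply_mean(new_data$age, learned_mean)

print(learned_mean)
print(new_data)
```

Here the missing value in `new_data` is filled with the training mean rather than with a mean computed from `new_data` itself, so new observations never influence the imputation values.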