step_missing.Rd
step_missing creates a specification of a recipe that will replace NA values with a new factor level, missing.
step_missing(recipe, ..., role = NA, trained = FALSE, na_percentage = NULL, skip = FALSE)
Argument | Description |
---|---|
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
... | One or more selector functions to choose which variables are affected by the step. See |
role | Not used by this step since no new variables are created. |
trained | A logical to indicate whether the number of NA values has been counted in preprocessing. |
na_percentage | A named numeric vector of NA percentages. This is |
skip | A logical. Should the step be skipped when the recipe is baked? |
An updated version of recipe with the new step added to the sequence of existing steps (if any). For the tidy method, a tibble with columns terms (the selectors or variables selected) and value (the NA counts).
NA values are counted when the recipe is trained using prep.recipe. bake.recipe then fills in the missing values for the new data.
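
The two phases described above can be illustrated with a minimal base R sketch. This is not the package's internal code: the helper fill_missing and the toy data frame are hypothetical, and treating character and factor columns as the nominal variables is an assumption made for illustration. The sketch records NA percentages as a named numeric vector (the "training" phase) and then assigns the NA entries of each nominal column to a new "missing" level (the "baking" phase).

```r
# Illustrative sketch in base R; not the package's internal code.
# The data frame and the helper fill_missing() are hypothetical.
d <- data.frame(
  hemoglobin_category = c("Low", NA, "High", "Normal", NA),
  disease = c("Yes", "No", NA, "No", "Yes"),
  hemoglobin_count = c(14.2, 15.1, NA, 15.8, 14.9),
  stringsAsFactors = FALSE
)

# Assumption: character and factor columns are the nominal variables.
is_nominal <- vapply(
  d,
  function(col) is.character(col) || is.factor(col),
  logical(1)
)
nominal <- names(d)[is_nominal]

# "Training": a named numeric vector of NA percentages per nominal column.
na_percentage <- vapply(
  d[nominal],
  function(col) mean(is.na(col)) * 100,
  numeric(1)
)

# "Baking": add a "missing" level and assign it to the NA entries.
fill_missing <- function(col) {
  col <- factor(col)
  levels(col) <- c(levels(col), "missing")
  col[is.na(col)] <- "missing"
  col
}

baked <- d
baked[nominal] <- lapply(d[nominal], fill_missing)
```

In this sketch the numeric column hemoglobin_count is left untouched, so its NA remains; only the nominal columns gain the extra level.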
library(recipes)
n = 100
d <- tibble::tibble(encounter_id = 1:n,
                    patient_id = sample(1:20, size = n, replace = TRUE),
                    hemoglobin_count = rnorm(n, mean = 15, sd = 1),
                    hemoglobin_category = sample(c("Low", "Normal", "High", NA),
                                                 size = n, replace = TRUE),
                    disease = ifelse(hemoglobin_count < 15, "Yes", "No")
)

# Initialize
my_recipe <- recipe(disease ~ ., data = d)

# Create recipe
my_recipe <- my_recipe %>%
  step_missing(all_nominal())
my_recipe
#> Data Recipe
#>
#> Inputs:
#>
#>       role #variables
#>    outcome          1
#>  predictor          4
#>
#> Operations:
#>
#> Filling NA with missing for all_nominal()