missingness.Rd
Finds the percent of NAs in a vector or in each column of a data frame or matrix. Possible mis-coded missing values are searched for, and a warning is issued if any are found.
missingness(d, return_df = TRUE, to_search = c("NA", "NAs", "na", "NaN", "?", "??", "nil", "NULL", " ", ""))
d | A data frame or matrix |
---|---|
return_df | If TRUE (default) a data frame is returned, which generally makes reading the output easier. If variable names are so long that the data frame gets wrapped poorly, set this to FALSE. |
to_search | A vector of strings that might represent missingness. If found in d, a warning is issued. |
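
The search that to_search drives can be illustrated in base R. This is only a sketch of the idea, not the package's implementation: find_miscoded is a hypothetical helper, and restricting the search to character and factor columns is an assumption made so that a numeric NaN is not reported as the string "NaN".

```r
# Hypothetical helper (not part of the package): report which candidate
# strings actually occur in the character or factor columns of d.
find_miscoded <- function(d,
                          to_search = c("NA", "NAs", "na", "NaN", "?", "??",
                                        "nil", "NULL", " ", "")) {
  d <- as.data.frame(d, stringsAsFactors = FALSE)
  # Assumption: only text-like columns are searched.
  text_cols <- Filter(function(col) is.character(col) || is.factor(col), d)
  values <- unique(unlist(lapply(text_cols, as.character), use.names = FALSE))
  # Keep the candidates that appear verbatim in the data.
  intersect(to_search, values)
}

d <- data.frame(x = c("a", "nil", "b"), y = c(1, NaN, 3), z = c(1:2, NA))
found <- find_miscoded(d)
if (length(found) > 0) {
  message("Strings that may represent missing values: ",
          paste(found, collapse = ", "))
}
```

Passing a different to_search vector, for example c("unknown", "-"), would look for those strings instead of the defaults.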
A data frame with two columns: variable names in d and the percent of entries in each variable that are missing.
d <- data.frame(x = c("a", "nil", "b"), y = c(1, NaN, 3), z = c(1:2, NA))
missingness(d)
#> Warning: Found these strings that may represent missing values: "nil". If they do represent missingness, replace them with NA with: `make_na(d, c("nil"))`
#> # A tibble: 3 x 2
#>   variable percent_missing
#> * <chr>              <dbl>
#> 1 x                    0
#> 2 y                   33.3
#> 3 z                   33.3
missingness(d) %>% plot()
#> Warning: Found these strings that may represent missing values: "nil". If they do represent missingness, replace them with NA with: `make_na(d, c("nil"))`
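
The percentages reported for this example can be cross-checked with base R alone. The snippet below is a rough sketch, not the function's actual code; percent_missing is a hypothetical name, and it relies on is.na() treating NaN as missing.

```r
# Hypothetical helper (not part of the package): percent of NA entries
# in each column of a data frame or matrix.
percent_missing <- function(d) {
  d <- as.data.frame(d)
  # is.na() returns a logical matrix; column means give the NA fraction.
  100 * colMeans(is.na(d))
}

d <- data.frame(x = c("a", "nil", "b"), y = c(1, NaN, 3), z = c(1:2, NA))
pm <- percent_missing(d)
print(round(pm, 1))
```

Column x comes out at 0 because "nil" is an ordinary string until it is converted to NA, which is why the warning suggests make_na.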