This function replaces specific values of variables with NA. set_na_if() is a scoped variant of set_na(), where values are replaced with NA only for those variables that match the logical condition of predicate.

    set_na(x, ..., na, drop.levels = TRUE, as.tag = FALSE)
    set_na_if(x, predicate, na, drop.levels = TRUE, as.tag = FALSE)

Argument | Description |
---|---|
x | A vector or data frame. |
... | Optional, unquoted names of variables that should be selected for further processing. Required, if |
na | Numeric vector with values that should be replaced with NA values, or a character vector if values of factors or character vectors should be replaced. For labelled vectors, may also be the name of a value label. In this case, the associated values for the value labels in each vector will be replaced with |
drop.levels | Logical, if |
as.tag | Logical, if |
predicate | A predicate function to be applied to the columns. The variables for which |

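The page gives no example for set_na_if(), so the following is a minimal sketch of how the predicate argument might be used. It assumes the sjmisc package is installed and exports set_na_if() with the signature shown in the usage above; the data frame df and its columns are hypothetical.

```r
# Sketch only: assumes sjmisc is installed and provides set_na_if()
# with the signature shown in the usage section.
library(sjmisc)

# Hypothetical example data: one numeric and one character column
df <- data.frame(
  score = c(1, 3, 5, 3),
  group = c("a", "b", "c", "3"),
  stringsAsFactors = FALSE
)

# is.numeric() is TRUE only for 'score', so only that column is processed;
# 'group' should be returned unchanged, including its "3" entry
result <- set_na_if(df, is.numeric, na = 3)
result
```

Passing a predicate such as is.numeric avoids listing column names by hand when only columns of one type should be touched.
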
Returns x, with all values in na replaced by NA. If x is a data frame, the complete data frame x is returned, with NAs set for the variables specified in ...; if ... is not specified, the replacement applies to all variables in the data frame.

set_na() replaces all values defined in na with the related NA or tagged NA value (see tagged_na). Tagged NAs work exactly like regular R missing values, except that they store one additional byte of information: a tag, which is usually a letter ("a" to "z") or a digit character ("0" to "9").

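To make the notion of a tag concrete, here is a short sketch using the haven package, which the examples on this page also load. It assumes haven is installed; the vector x is hypothetical.

```r
# Sketch only: assumes the haven package is installed
library(haven)

# Hypothetical vector with two tagged missing values and one regular NA
x <- c(1, 2, tagged_na("a"), tagged_na("z"), NA)

is.na(x)              # tagged NAs count as missing, like a regular NA
na_tag(x)             # extracts the tag; NA for untagged elements
is_tagged_na(x, "a")  # TRUE only for elements carrying the tag "a"
print_tagged_na(x)    # prints the vector with tags made visible
```

Because is.na() treats tagged and regular missing values alike, existing code that checks for missingness keeps working; the tag only becomes visible through the haven helpers.
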
Different NA values for different variables
If na is a named vector and as.tag = FALSE, the names indicate variable names, and the associated values indicate those values that should be replaced by NA in the related variable. For instance, set_na(x, na = c(v1 = 4, v2 = 3)) would replace all 4 in v1 with NA and all 3 in v2 with NA.

If na is a named list and as.tag = FALSE, it is possible to replace several different values by NA separately for each variable. For example, set_na(x, na = list(v1 = c(1, 4), v2 = 5:7)) would replace all 1 and 4 in v1 with NA and all 5 to 7 in v2 with NA.

See also 'Details' in get_na.

Labels of values that are replaced with NA and are no longer used will be removed from x; other value and variable label attributes are preserved. For more details on labelled data, see the vignette Labelled Data and the sjlabelled-Package.

See also replace_na to replace NAs with specific values, rec for general recoding of variables and recode_to for re-shifting value ranges. See get_na to get the values of missing values in labelled vectors.

    # create random variable
    dummy <- sample(1:8, 100, replace = TRUE)
    # show value distribution
    table(dummy)
    #> dummy
    #>  1  2  3  4  5  6  7  8 
    #> 15 12 12 13 12 10 15 11 

    # set value 1 and 8 as missings
    dummy <- set_na(dummy, na = c(1, 8))
    # show value distribution, including missings
    table(dummy, useNA = "always")
    #> dummy
    #>    2    3    4    5    6    7 <NA> 
    #>   12   12   13   12   10   15   26 

    # add named vector as further missing value
    set_na(dummy, na = c("Refused" = 5), as.tag = TRUE)
    #>  [1] NA  3  7 NA  7  4 NA NA NA  4 NA  6  4 NA  3 NA  3  7  6 NA  6 NA  7 NA  6
    #> [26]  7  6 NA  7  4  2  3 NA NA NA  6  7 NA  2 NA  4 NA  4 NA  2 NA  3  7  7  7
    #> [51] NA  4  6 NA  2  2  2  4  6 NA NA  7  3  6  2 NA  3  2  3  2  3  2 NA  4 NA
    #> [76] NA NA  7  7  3  4 NA  7  4 NA NA  6 NA  7 NA NA NA  4  2  2  4  3  3 NA NA
    #> attr(,"labels")
    #> Refused 
    #>      NA 

    # see different missing types
    library(haven)
    library(sjlabelled)
    print_tagged_na(set_na(dummy, na = c("Refused" = 5), as.tag = TRUE))
    #>  [1]    NA     3     7 NA(5)     7     4    NA NA(5)    NA     4    NA     6
    #> [13]     4    NA     3 NA(5)     3     7     6    NA     6    NA     7    NA
    #> [25]     6     7     6    NA     7     4     2     3 NA(5)    NA    NA     6
    #> [37]     7    NA     2 NA(5)     4 NA(5)     4    NA     2    NA     3     7
    #> [49]     7     7    NA     4     6    NA     2     2     2     4     6 NA(5)
    #> [61] NA(5)     7     3     6     2    NA     3     2     3     2     3     2
    #> [73]    NA     4    NA    NA    NA     7     7     3     4 NA(5)     7     4
    #> [85]    NA    NA     6    NA     7 NA(5) NA(5)    NA     4     2     2     4
    #> [97]     3     3    NA NA(5)

    # create sample data frame
    dummy <- data.frame(
      var1 = sample(1:8, 100, replace = TRUE),
      var2 = sample(1:10, 100, replace = TRUE),
      var3 = sample(1:6, 100, replace = TRUE)
    )
    # set value 2 and 4 as missings
    dummy %>% set_na(na = c(2, 4)) %>% head()
    #>   var1 var2 var3
    #> 1    8    7    1
    #> 2    3    5    1
    #> 3    3   NA   NA
    #> 4   NA   NA   NA
    #> 5    5    9    5
    #> 6   NA    8    1

    #> $var1
    #>  2  4 
    #> NA NA 
    #> 
    #> $var2
    #>  2  4 
    #> NA NA 
    #> 
    #> $var3
    #>  2  4 
    #> NA NA 
    #> 

    #> $var1
    #> [1] "NA(2)" "NA(4)"
    #> 
    #> $var2
    #> [1] "NA(2)" "NA(4)"
    #> 
    #> $var3
    #> [1] "NA(2)" "NA(4)"
    #> 

    data(efc)
    dummy <- data.frame(
      var1 = efc$c82cop1,
      var2 = efc$c83cop2,
      var3 = efc$c84cop3
    )
    # check original distribution of categories
    lapply(dummy, table, useNA = "always")
    #> $var1
    #> 
    #>    1    2    3    4 <NA> 
    #>    3   97  591  210    7 
    #> 
    #> $var2
    #> 
    #>    1    2    3    4 <NA> 
    #>  186  547  130   39    6 
    #> 
    #> $var3
    #> 
    #>    1    2    3    4 <NA> 
    #>  516  252   82   52    6 
    #> 

    # set 3 to NA for two variables
    lapply(set_na(dummy, var1, var3, na = 3), table, useNA = "always")
    #> $var1
    #> 
    #>    1    2    4 <NA> 
    #>    3   97  210  598 
    #> 
    #> $var2
    #> 
    #>    1    2    3    4 <NA> 
    #>  186  547  130   39    6 
    #> 
    #> $var3
    #> 
    #>    1    2    4 <NA> 
    #>  516  252   52   88 
    #> 

    # if 'na' is a named vector *and* 'as.tag = FALSE', different NA-values
    # can be specified for each variable
    set.seed(1)
    dummy <- data.frame(
      var1 = sample(1:8, 10, replace = TRUE),
      var2 = sample(1:10, 10, replace = TRUE),
      var3 = sample(1:6, 10, replace = TRUE)
    )
    dummy
    #>    var1 var2 var3
    #> 1     3    3    6
    #> 2     3    2    2
    #> 3     5    7    4
    #> 4     8    4    1
    #> 5     2    8    2
    #> 6     8    5    3
    #> 7     8    8    1
    #> 8     6   10    3
    #> 9     6    4    6
    #> 10    1    8    3

    # Replace "3" in var1 with NA, "5" in var2 and "6" in var3
    set_na(dummy, na = c(var1 = 3, var2 = 5, var3 = 6))
    #>    var1 var2 var3
    #> 1    NA    3   NA
    #> 2    NA    2    2
    #> 3     5    7    4
    #> 4     8    4    1
    #> 5     2    8    2
    #> 6     8   NA    3
    #> 7     8    8    1
    #> 8     6   10    3
    #> 9     6    4   NA
    #> 10    1    8    3

    # if 'na' is a named list *and* 'as.tag = FALSE', for each
    # variable different multiple NA-values can be specified
    set_na(dummy, na = list(var1 = 1:3, var2 = c(7, 8), var3 = 6))
    #>    var1 var2 var3
    #> 1    NA    3   NA
    #> 2    NA    2    2
    #> 3     5   NA    4
    #> 4     8    4    1
    #> 5    NA   NA    2
    #> 6     8    5    3
    #> 7     8   NA    1
    #> 8     6   10    3
    #> 9     6    4   NA
    #> 10   NA   NA    3

    # drop unused factor levels when being set to NA
    x <- factor(c("a", "b", "c"))
    x
    #> [1] a b c
    #> Levels: a b c
    set_na(x, na = "b", as.tag = TRUE)
    #> [1] a    <NA> c   
    #> attr(,"labels")
    #>  b 
    #> NA 
    #> Levels: a c
    set_na(x, na = "b", drop.levels = FALSE, as.tag = TRUE)
    #> [1] a    <NA> c   
    #> attr(,"labels")
    #>  b 
    #> NA 
    #> Levels: a b c

    # set_na() can also remove a missing by defining the value label
    # of the value that should be replaced with NA. This is in particular
    # helpful if a certain category should be set as NA, however, this category
    # is assigned with different values across variables
    x1 <- sample(1:4, 20, replace = TRUE)
    x2 <- sample(1:7, 20, replace = TRUE)
    x1 <- set_labels(x1, labels = c("Refused" = 3, "No answer" = 4))
    x2 <- set_labels(x2, labels = c("Refused" = 6, "No answer" = 7))
    tmp <- data.frame(x1, x2)
    get_labels(tmp)
    #> $x1
    #> [1] "Refused"   "No answer"
    #> 
    #> $x2
    #> [1] "Refused"   "No answer"
    #> 
    table(tmp, useNA = "always")
    #>       x2
    #> x1     1 2 3 4 5 6 7 <NA>
    #>   1    0 1 0 2 0 0 0    0
    #>   2    0 0 1 2 0 1 0    0
    #>   3    2 0 2 0 2 0 2    0
    #>   4    1 1 1 1 0 0 1    0
    #>   <NA> 0 0 0 0 0 0 0    0

    #> $x1
    #> [1] "Refused"
    #> 
    #> $x2
    #> [1] "Refused"
    #> 
    table(set_na(tmp, na = "No answer"), useNA = "always")
    #>       x2
    #> x1     1 2 3 4 5 6 <NA>
    #>   1    0 1 0 2 0 0    0
    #>   2    0 0 1 2 0 1    0
    #>   3    2 0 2 0 2 0    2
    #>   <NA> 1 1 1 1 0 0    1

    # show values
    tmp
    #>    x1 x2
    #> 1   2  4
    #> 2   3  7
    #> 3   2  4
    #> 4   1  2
    #> 5   4  1
    #> 6   3  1
    #> 7   4  3
    #> 8   1  4
    #> 9   3  5
    #> 10  2  3
    #> 11  4  7
    #> 12  3  3
    #> 13  4  4
    #> 14  3  3
    #> 15  3  5
    #> 16  4  2
    #> 17  1  4
    #> 18  2  6
    #> 19  3  1
    #> 20  3  7
    set_na(tmp, na = c("Refused", "No answer"))
    #>    x1 x2
    #> 1   2  4
    #> 2  NA NA
    #> 3   2  4
    #> 4   1  2
    #> 5  NA  1
    #> 6  NA  1
    #> 7  NA  3
    #> 8   1  4
    #> 9  NA  5
    #> 10  2  3
    #> 11 NA NA
    #> 12 NA  3
    #> 13 NA  4
    #> 14 NA  3
    #> 15 NA  5
    #> 16 NA  2
    #> 17  1  4
    #> 18  2 NA
    #> 19 NA  1
    #> 20 NA NA