prop() calculates the proportion of a value or category in a variable. props() does the same, but allows multiple logical conditions in one statement. Both are similar to mean() with logical predicates; unlike mean(), however, prop() and props() also work with grouped data frames.
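As a point of comparison, the mean() idiom mentioned above can be sketched in base R. The vector `x` is a made-up toy example, not part of any package data, and the snippet only illustrates the idea of a proportion as the mean of a logical vector; it does not claim to show how prop() is implemented.

```r
# Hypothetical toy vector (an assumption for illustration only)
x <- c(1, 2, 1, 3, NA, 1, 2, 4)

# A logical predicate yields TRUE/FALSE/NA; its mean is the share of TRUE values
# among the non-missing observations
p_one <- mean(x == 1, na.rm = TRUE)

round(p_one, 4)
#> [1] 0.4286
```

Three of the seven non-missing values equal 1, hence 3/7.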

prop(data, ..., weight.by = NULL, na.rm = TRUE, digits = 4)

props(data, ..., na.rm = TRUE, digits = 4)

Arguments

data

A data frame. May also be a grouped data frame (see 'Examples').

...

One or more value pairs of comparisons (logical predicates). Put variable names on the left-hand side and the values to match on the right-hand side. Expressions may be quoted or unquoted. See 'Examples'.

weight.by

Vector of weights that will be applied to weight all observations. Must be a vector of the same length as the input vector. Default is NULL, so no weights are used.

na.rm

Logical, whether to remove NA values from the vector when the proportion is calculated. na.rm = FALSE gives you the raw percentage of a value in a vector, na.rm = TRUE the valid percentage.

digits

Number of digits for returned values.
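The difference between the raw and the valid proportion described for na.rm can be sketched in base R. The toy vector `x` and the variable names are assumptions made for illustration; the snippet mirrors the description above (missing values either stay in the denominator or are dropped from it) and is not taken from the function's source.

```r
# Hypothetical toy vector with one missing value
x <- c(1, 2, 1, 3, NA, 1, 2, 4)
matches <- sum(x == 1, na.rm = TRUE)   # 3 observations equal 1

# "Raw" proportion: missing values remain in the denominator
raw_prop <- matches / length(x)

# "Valid" proportion: missing values are dropped from the denominator
valid_prop <- matches / sum(!is.na(x))

round(c(raw = raw_prop, valid = valid_prop), digits = 4)
#>    raw  valid 
#> 0.3750 0.4286 
```

The raw proportion is never larger than the valid one, which matches the pattern in the examples, where the na.rm = FALSE result for e42dep == 1 is slightly smaller.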

Value

For one condition, a numeric value with the proportion of the values inside a vector. For more than one condition, a tibble with one column of conditions and one column with proportions. For grouped data frames, returns a tibble with one column per group with grouping categories, followed by one column with proportions per condition.
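To illustrate the shape of a per-group result, here is a minimal base R sketch using tapply(). The data frame `df` and its columns `group` and `score` are hypothetical, and the snippet returns a named array rather than the tibble described above; it only shows the underlying idea of one proportion per grouping category.

```r
# Hypothetical toy data (names are assumptions for illustration)
df <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  score = c(1, 3, 4, 2, 2, 5)
)

# Share of observations with score > 2, computed separately per group
by_group <- tapply(df$score > 2, df$group, mean, na.rm = TRUE)

round(by_group, 4)
#>      a      b 
#> 0.6667 0.3333 
```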

Details

prop() only allows one logical statement per comparison, while props() allows multiple logical statements per comparison. In turn, prop() supports weighting of observations before proportions are calculated, and its comparisons may also be quoted, i.e. passed as character vectors (see 'Examples').
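One common way to read "weighting before calculating proportions" is as a weighted share: the sum of the weights of matching observations divided by the sum of all weights. The base R sketch below works under that assumption; the vectors `x` and `w` are made up, and whether prop() computes weighted proportions exactly this way is not confirmed here.

```r
# Hypothetical toy values and weights (assumptions for illustration)
x <- c(1, 2, 1, 3, 1)
w <- c(2, 1, 1, 1, 1)

cond <- x == 1

# Unweighted share of matching observations
unweighted_prop <- mean(cond)

# Weighted share: weights of matching observations over the total weight
weighted_prop <- sum(w[cond]) / sum(w)

round(c(unweighted = unweighted_prop, weighted = weighted_prop), 4)
#> unweighted   weighted 
#>     0.6000     0.6667 
```

Because the first matching observation carries twice the weight of the others, the weighted share (4/6) is larger than the unweighted one (3/5).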

Examples

library(sjstats)
data(efc)

# proportion of value 1 in e42dep
prop(efc, e42dep == 1)
#> [1] 0.0733

# expression may also be completely quoted
prop(efc, "e42dep == 1")
#> [1] 0.0733

# use "props()" for multiple logical statements
props(efc, e17age > 70 & e17age < 80)
#> [1] 0.3199

# proportion of value 1 in e42dep, and all values greater
# than 2 in e42dep, including missing values. will return a tibble
prop(efc, e42dep == 1, e42dep > 2, na.rm = FALSE)
#> # A tibble: 2 x 2
#>   condition   prop
#>   <chr>      <dbl>
#> 1 e42dep==1 0.0727
#> 2 e42dep>2  0.672

# for factors or character vectors, use quoted or unquoted values
library(sjmisc)
# convert numeric to factor, using labels as factor levels
efc$e16sex <- to_label(efc$e16sex)
efc$n4pstu <- to_label(efc$n4pstu)

# get proportion of female older persons
prop(efc, e16sex == female)
#> [1] 0.6715

# get proportion of male older persons
prop(efc, e16sex == "male")
#> [1] 0.3285

# "props()" needs quotes around non-numeric factor levels
props(efc,
  e17age > 70 & e17age < 80,
  n4pstu == 'Care Level 1' | n4pstu == 'Care Level 3'
)
#> # A tibble: 2 x 2
#>   condition                              prop
#>   <chr>                                 <dbl>
#> 1 e17age>70&e17age<80                   0.320
#> 2 n4pstu==CareLevel1|n4pstu==CareLevel3 0.314

# also works with pipe-chains
library(dplyr)
efc %>% prop(e17age > 70)
#> [1] 0.8092

efc %>% prop(e17age > 70, e16sex == 1)
#> # A tibble: 2 x 2
#>   condition  prop
#>   <chr>     <dbl>
#> 1 e17age>70 0.809
#> 2 e16sex==1 0

# and with group_by
efc %>%
  group_by(e16sex) %>%
  prop(e42dep > 2)
#> # A tibble: 2 x 2
#>   `elder's gender` `e42dep>2`
#>   <chr>                 <dbl>
#> 1 male                  0.685
#> 2 female                0.674

efc %>%
  select(e42dep, c161sex, c172code, e16sex) %>%
  group_by(c161sex, c172code) %>%
  prop(e42dep > 2, e16sex == 1)
#> # A tibble: 6 x 4
#>   `carer's gender` `carer's level of education` `e42dep>2` `e16sex==1`
#>   <chr>            <chr>                             <dbl>       <dbl>
#> 1 Male             low level of education            0.683           0
#> 2 Male             intermediate level of education   0.659           0
#> 3 Male             high level of education           0.787           0
#> 4 Female           low level of education            0.710           0
#> 5 Female           intermediate level of education   0.593           0
#> 6 Female           high level of education           0.688           0

# same for "props()"
efc %>%
  select(e42dep, c161sex, c172code, c12hour, n4pstu) %>%
  group_by(c161sex, c172code) %>%
  props(
    e42dep > 2,
    c12hour > 20 & c12hour < 40,
    n4pstu == 'Care Level 1' | n4pstu == 'Care Level 3'
  )
#> # A tibble: 6 x 5
#>   `carer's gender` `carer's level of educatio~ `e42dep>2` `c12hour>20&c12hour<~
#>   <chr>            <chr>                            <dbl>                 <dbl>
#> 1 Male             low level of education           0.683                 0.244
#> 2 Male             intermediate level of educ~      0.659                 0.176
#> 3 Male             high level of education          0.787                 0.149
#> 4 Female           low level of education           0.710                 0.196
#> 5 Female           intermediate level of educ~      0.593                 0.150
#> 6 Female           high level of education          0.688                 0.202
#> # ... with 1 more variable: `n4pstu==CareLevel1|n4pstu==CareLevel3` <dbl>