count_data returns the number and percentage of observations for
categorical variables.
count_data(data, ..., na.rm = FALSE)
data: A data frame.
...: One or more unquoted (categorical) column names from the data frame, separated by commas.
na.rm: Logical. Should missing values (including NaN) be removed?
The data frame can be grouped using dplyr's group_by() so that the
number and percentage of observations are calculated within each level
of the grouping variable.
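The effect of grouping on the percentages can be sketched with plain dplyr. This is only an illustration, not the actual implementation of count_data: it assumes that pct is the count divided by the total count within each group, times 100, which agrees with the grouped example output on this page (2067 / 3101 ≈ 66.7). The `toy` data frame and its columns are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; not the quote_source data set used in the examples
toy <- tibble(
  group  = c("a", "a", "a", "b", "b"),
  answer = c("yes", "no", "yes", "yes", NA)
)

# Assumed equivalent of `toy %>% group_by(group) %>% count_data(answer)`:
# count each combination, then divide by the total of the enclosing group
pct_within <- toy %>%
  count(group, answer) %>%
  group_by(group) %>%
  mutate(pct = n / sum(n) * 100)

# Without grouping, the denominator is the total number of rows instead
pct_overall <- toy %>%
  count(group, answer) %>%
  mutate(pct = n / sum(n) * 100)
```

In `pct_within` the percentages sum to 100 within each level of `group`; in `pct_overall` they sum to 100 across the whole data frame.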
# Load dplyr for access to the %>% operator and group_by()
library(dplyr)
# 1 variable
count_data(quote_source, source)
#> # A tibble: 2 × 3
#>   source         n   pct
#>   <chr>      <int> <dbl>
#> 1 Bin Laden   3101  48.9
#> 2 Washington  3242  51.1
# 2 variables
count_data(quote_source, source, sex)
#> # A tibble: 6 × 4
#>   source     sex        n     pct
#>   <chr>      <chr>  <int>   <dbl>
#> 1 Bin Laden  female  2067 32.6
#> 2 Bin Laden  male    1029 16.2
#> 3 Bin Laden  NA         5  0.0788
#> 4 Washington female  2206 34.8
#> 5 Washington male    1031 16.3
#> 6 Washington NA         5  0.0788
# Ignore missing values
count_data(quote_source, source, sex, na.rm = TRUE)
#> # A tibble: 4 × 4
#>   source     sex        n   pct
#>   <chr>      <chr>  <int> <dbl>
#> 1 Bin Laden  female  2067  32.6
#> 2 Bin Laden  male    1029  16.2
#> 3 Washington female  2206  34.8
#> 4 Washington male    1031  16.3
# Use group_by() to get percentages within each group
quote_source %>%
  group_by(source) %>%
  count_data(sex)
#> # A tibble: 6 × 4
#> # Groups: source [2]
#>   source     sex        n    pct
#>   <chr>      <chr>  <int>  <dbl>
#> 1 Bin Laden  female  2067 66.7
#> 2 Bin Laden  male    1029 33.2
#> 3 Bin Laden  NA         5  0.161
#> 4 Washington female  2206 68.0
#> 5 Washington male    1031 31.8
#> 6 Washington NA         5  0.154