We have created a function that takes one grouping variable and one summary variable. It would make sense to take multiple grouping variables instead of just one. Quoting and unquoting multiple variables is pretty much the same process as for single arguments:
Unquoting multiple arguments requires a variant of !!, the big bang operator !!!.
Quoting multiple arguments can be done in two ways: internal quoting with the plural variant enquos() and external quoting with vars().
The dot-dot-dot argument is one of the nicest aspects of the R language. A function that takes ... accepts any number of arguments, named or unnamed. As a programmer you can do three things with ...:
Evaluate the arguments contained in the dots and materialise them in a list by forwarding the dots to list():
materialise <- function(data, ...) {
  dots <- list(...)
  dots
}
The names of the dots conveniently become the names of the list:
materialise(mtcars, 1 + 2, important_name = letters)
#> [[1]]
#> [1] 3
#>
#> $important_name
#> [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
#> [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
Quote the arguments in the dots with enquos():
capture <- function(data, ...) {
  dots <- enquos(...)
  dots
}
All arguments passed to ... are automatically quoted and returned as a list. The names of the arguments become the names of that list:
capture(mtcars, 1 + 2, important_name = letters)
#> [[1]]
#> <quosure>
#> expr: ^1 + 2
#> env: global
#>
#> $important_name
#> <quosure>
#> expr: ^letters
#> env: global
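To make the contrast between evaluating and quoting concrete, the sketch below calls both helpers with a symbol that does not exist as a variable. The name some_column is hypothetical and assumed to be undefined in your session. list() has to evaluate its arguments and fails, whereas enquos() only captures the expressions:

```r
library(rlang)

materialise <- function(data, ...) list(...)
capture <- function(data, ...) enquos(...)

# Evaluating the dots fails because `some_column` is not a variable
res <- tryCatch(
  materialise(mtcars, some_column),
  error = function(e) "error"
)
res
#> [1] "error"

# Quoting the dots succeeds: the expressions are captured, not evaluated
quos <- capture(mtcars, some_column, total = 1 + 2)
quo_get_expr(quos[[1]])
#> some_column

# A captured expression can still be evaluated later on demand
eval_tidy(quos$total)
#> [1] 3
```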
Forward the dots to another function:
forward <- function(data, ...) {
  forwardee(...)
}
When dots are forwarded the names of arguments in ... are matched to the arguments of the forwardee:
forwardee <- function(foo, bar, ...) {
  list(foo = foo, bar = bar, ...)
}
Let’s call the forwarding function with a bunch of named and unnamed arguments:
forward(mtcars, bar = 100, 1, 2, 3)
#> $foo
#> [1] 1
#>
#> $bar
#> [1] 100
#>
#> [[3]]
#> [1] 2
#>
#> [[4]]
#> [1] 3
The unnamed argument 1 was matched to foo positionally. The named argument bar was matched to bar. The remaining arguments were passed in order.
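The same matching rules apply when the forwardee is an existing function. As a small sketch in base R, the hypothetical helper shout() below forwards its dots to paste(), so a named sep argument in the dots ends up matched to the sep parameter of paste():

```r
# Hypothetical helper: forwards all of its arguments to base::paste()
shout <- function(...) {
  toupper(paste(...))
}

# Unnamed arguments are passed in order
shout("tidy", "eval")
#> [1] "TIDY EVAL"

# The named argument `sep` is matched to the `sep` parameter of paste()
shout("tidy", "eval", sep = "-")
#> [1] "TIDY-EVAL"
```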
For the purpose of writing tidy eval functions the last two techniques are important. There are two distinct situations:
You don’t need to modify the arguments in any way and just want to pass them through. Then simply forward ... to other quoting functions in the ordinary way.
You’d like to change the argument names (which become column names in dplyr::mutate() calls) or modify the arguments themselves (for instance to negate a dplyr::select() selection). In that case you’ll need to use enquos() to quote the arguments in the dots. You’ll then pass the quoted arguments to other quoting functions by forwarding them with the help of !!!.
...
If you are not modifying the arguments in ... in any way and just want to pass them to another quoting function, just forward ... like usual! There is no need for quoting and unquoting because of the magic of forwarding. The arguments in ... are transported to their final destination where they will be quoted.
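As a minimal sketch of plain forwarding, the hypothetical helper first_rows() below passes its dots straight to dplyr::select(), which does the quoting itself. The helper name and its .n argument are made up for this illustration:

```r
library(dplyr)

# Hypothetical helper: the dots are forwarded untouched to select(),
# which quotes them at their final destination
first_rows <- function(.data, ..., .n = 3) {
  .data %>%
    select(...) %>%
    head(.n)
}

out <- first_rows(mtcars, cyl, am)
names(out)
#> [1] "cyl" "am"
nrow(out)
#> [1] 3
```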
The function grouped_mean() is still going to need some remodelling because it is good practice to take all important named arguments before the dots. Let’s start by swapping group_var and summary_var:
grouped_mean <- function(data, summary_var, group_var) {
  summary_var <- enquo(summary_var)
  group_var <- enquo(group_var)
  data %>%
    group_by(!!group_var) %>%
    summarise(mean = mean(!!summary_var))
}
Then we replace group_var with ... and pass it to group_by():
grouped_mean <- function(data, summary_var, ...) {
  summary_var <- enquo(summary_var)
  data %>%
    group_by(...) %>%
    summarise(mean = mean(!!summary_var))
}
It is good practice to make one final adjustment. Because arguments in ... can have arbitrary names, we don’t want to “use up” valid names. In tidyverse packages we use the convention of prefixing named arguments with a dot so that conflicts are less likely:
grouped_mean <- function(.data, .summary_var, ...) {
  .summary_var <- enquo(.summary_var)
  .data %>%
    group_by(...) %>%
    summarise(mean = mean(!!.summary_var))
}
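The following base R sketch illustrates the kind of conflict that the dot prefix avoids. The functions with_plain() and with_dot() are hypothetical and only report the names found in the dots:

```r
# Hypothetical helpers that return the names of the arguments captured in `...`
with_plain <- function(data, ...) names(list(...))
with_dot <- function(.data, ...) names(list(...))

# `data = 2` is matched to the `data` parameter, so it never reaches the dots
with_plain(1, data = 2)
#> NULL

# With the dot prefix, `data` remains available as a name in the dots
with_dot(1, data = 2)
#> [1] "data"
```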
Let’s check that this function now works with any number of grouping variables:
grouped_mean(mtcars, disp, cyl, am)
#> # A tibble: 6 x 3
#> # Groups: cyl [?]
#> cyl am mean
#> <dbl> <dbl> <dbl>
#> 1 4 0 136.
#> 2 4 1 93.6
#> 3 6 0 205.
#> 4 6 1 155
#> # ... with 2 more rows
grouped_mean(mtcars, disp, cyl, am, vs)
#> # A tibble: 7 x 4
#> # Groups: cyl, am [?]
#> cyl am vs mean
#> <dbl> <dbl> <dbl> <dbl>
#> 1 4 0 1 136.
#> 2 4 1 0 120.
#> 3 4 1 1 89.8
#> 4 6 0 1 205.
#> # ... with 3 more rows
When we need to modify the arguments or their names, we can’t simply forward the dots. We’ll have to quote and unquote with the plural variants of enquo() and !!, namely enquos() and !!!.
While the singular enquo() returns a single quoted argument, the plural variant enquos() returns a list of quoted arguments. Let’s use it to quote the dots:
grouped_mean2 <- function(data, summary_var, ...) {
  summary_var <- enquo(summary_var)
  group_vars <- enquos(...)
  data %>%
    group_by(!!group_vars) %>%
    summarise(mean = mean(!!summary_var))
}
grouped_mean2() now accepts and automatically quotes any number of grouping variables. However it doesn’t work quite yet:
grouped_mean2(mtcars, disp, cyl, am)
#> Error in mutate_impl(.data, dots): Column `structure(list(~cyl, ~am), .Names = c("", ""), class = "quosures")` must be length 32 (the number of rows) or one, not 2
Instead of forwarding the individual arguments to group_by() we have passed the list of arguments itself! Unquoting is not the right operation here. Fortunately tidy eval provides a special operator that makes it easy to forward a list of arguments.
The unquote-splice operator !!! takes each element of a list and unquotes it as an independent argument to the surrounding function call. The arguments are spliced into the function call. This is just what we need for forwarding multiple quoted arguments.
Let’s use qq_show() to observe the difference between !! and !!! in a group_by() expression. We can only use enquos() within a function so let’s create a list of quoted names for the purpose of experimenting:
vars <- list(
  quote(cyl),
  quote(am)
)
qq_show() shows the difference between unquoting a list and unquote-splicing a list:
rlang::qq_show(group_by(!!vars))
#> group_by(<list: cyl, am>)
rlang::qq_show(group_by(!!!vars))
#> group_by(cyl, am)
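In practice the list to splice often starts out as a character vector of column names rather than hand-written quote() calls. As a sketch, and assuming the rlang helper syms() (which this section does not otherwise use), the strings can be turned into symbols and spliced in the same way:

```r
library(dplyr)
library(rlang)

# Column names stored as strings
cols <- c("cyl", "am")

# syms() converts each string to a symbol, giving a list like the one above
group_syms <- syms(cols)

# Splice the symbols into group_by() as individual arguments
by_groups <- mtcars %>% group_by(!!!group_syms)
group_vars(by_groups)
#> [1] "cyl" "am"
```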
When we use the unquote operator !!, group_by() gets a list of expressions. When we unquote-splice with !!!, the expressions are forwarded as individual arguments to group_by(). Let’s use the latter to fix grouped_mean2():
grouped_mean2 <- function(.data, .summary_var, ...) {
  summary_var <- enquo(.summary_var)
  group_vars <- enquos(...)
  .data %>%
    group_by(!!!group_vars) %>%
    summarise(mean = mean(!!summary_var))
}
The quote and unquote version of grouped_mean() does a bit more work but is functionally identical to the forwarding version:
grouped_mean(mtcars, disp, cyl, am)
#> # A tibble: 6 x 3
#> # Groups: cyl [?]
#> cyl am mean
#> <dbl> <dbl> <dbl>
#> 1 4 0 136.
#> 2 4 1 93.6
#> 3 6 0 205.
#> 4 6 1 155
#> # ... with 2 more rows
grouped_mean2(mtcars, disp, cyl, am)
#> # A tibble: 6 x 3
#> # Groups: cyl [?]
#> cyl am mean
#> <dbl> <dbl> <dbl>
#> 1 4 0 136.
#> 2 4 1 93.6
#> 3 6 0 205.
#> 4 6 1 155
#> # ... with 2 more rows
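Rather than comparing the printed output by eye, the equivalence can also be checked programmatically. The sketch below repeats both definitions so that it is self-contained, and compares the results as plain data frames:

```r
library(dplyr)
library(rlang)

# Forwarding version
grouped_mean <- function(.data, .summary_var, ...) {
  .summary_var <- enquo(.summary_var)
  .data %>%
    group_by(...) %>%
    summarise(mean = mean(!!.summary_var))
}

# Quote-and-unquote version
grouped_mean2 <- function(.data, .summary_var, ...) {
  summary_var <- enquo(.summary_var)
  group_vars <- enquos(...)
  .data %>%
    group_by(!!!group_vars) %>%
    summarise(mean = mean(!!summary_var))
}

forwarded <- as.data.frame(grouped_mean(mtcars, disp, cyl, am))
spliced <- as.data.frame(grouped_mean2(mtcars, disp, cyl, am))

# Both approaches should yield the same columns and values
same <- isTRUE(all.equal(forwarded, spliced))
same
#> [1] TRUE
```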
When does it become useful to do all this extra work? Whenever you need to modify the arguments or their names.
Up to now we have used the quote-and-unquote pattern to pass quoted arguments to other quoting functions “as is”. With this simple and powerful pattern you can extract complex combinations of quoting verbs into reusable functions.
However tidy eval provides much more flexibility. It is a general purpose meta-programming framework that makes it easy to modify quoted arguments before evaluation. In this section you’ll learn about basic metaprogramming patterns.
Functions like grouped_mean() create new columns in the data frame. It might be helpful to automatically create names that reflect the meaning of those columns. In this section you’ll learn how to create default names for quoted arguments and how to unquote names.
If you are familiar with dplyr you have probably noticed that new columns are given default names when you don’t supply one explicitly to mutate() or summarise(). These default names are not practical for further manipulation but they are helpful to remind rushed users what their new column is about:
starwars %>% summarise(average = mean(height, na.rm = TRUE))
#> # A tibble: 1 x 1
#> average
#> <dbl>
#> 1 174.
starwars %>% summarise(mean(height, na.rm = TRUE))
#> # A tibble: 1 x 1
#> `mean(height, na.rm = TRUE)`
#> <dbl>
#> 1 174.
You can create default names by applying quo_name() to any expression:
var1 <- quote(height)
var2 <- quote(mean(height))
quo_name(var1)
#> [1] "height"
quo_name(var2)
#> [1] "mean(height)"
Including automatically quoted arguments:
arg_name <- function(var) {
  var <- enquo(var)
  quo_name(var)
}
arg_name(height)
#> [1] "height"
arg_name(mean(height))
#> [1] "mean(height)"
Lists of quoted expressions require a different approach because we don’t want to override user-supplied names. The easiest way is to call enquos() with .named = TRUE. With this option, all unnamed arguments get a default name:
args_names <- function(...) {
  vars <- enquos(..., .named = TRUE)
  names(vars)
}
args_names(mean(height), weight)
#> [1] "mean(height)" "weight"
args_names(avg = mean(height), weight)
#> [1] "avg" "weight"
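If your installed rlang is a recent release (an assumption about your setup), its documentation points to as_label() and as_name() as the more specific successors of quo_name(). As a hedged sketch, as_label() produces a default label for any expression while as_name() only accepts a symbol or a string:

```r
library(rlang)

var <- quote(mean(height))

# as_label() turns any expression into a descriptive string
lab <- as_label(var)
lab
#> [1] "mean(height)"

# as_name() converts a symbol to a string
nm <- as_name(quote(height))
nm
#> [1] "height"

# as_name() is strict: a call is not a valid name, so this is an error
res <- tryCatch(as_name(var), error = function(e) "not a symbol")
res
#> [1] "not a symbol"
```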
Argument names are one of the most common occurrences of quotation in R. There is no fundamental difference between these two ways of creating a "Mickey" string:
names(c(Mickey = NA))
#> [1] "Mickey"
quo_name(quote(Mickey))
#> [1] "Mickey"
Where there is quotation it is natural to have unquotation. For this reason, tidy eval makes it possible to use !! to unquote names. Unfortunately we’ll have to use a somewhat peculiar syntax to unquote names because using complex expressions on the left-hand side of = is not valid R code:
nm <- "Mickey"
args_names(!!nm = 1)
#> Error: <text>:2:17: unexpected '='
#> 1: nm <- "Mickey"
#> 2: args_names(!!nm =
#> ^
Instead you’ll have to unquote on the LHS of :=. This vestigial operator is interpreted by tidy eval functions in exactly the same way as = but with !! support:
nm <- "Mickey"
args_names(!!nm := 1)
#> [1] "Mickey"
Another way of achieving the same result is to splice a named list of arguments:
args <- setNames(list(1), nm)
args_names(!!!args)
#> [1] "Mickey"
This works because !!! uses the names of the list as argument names. This is a great pattern when you are dealing with multiple arguments:
nms <- c("Mickey", "Minnie")
args <- setNames(list(1, 2), nms)
args_names(!!!args)
#> [1] "Mickey" "Minnie"
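Both techniques work the same way in dplyr verbs. The sketch below uses summarise() on mtcars, with column names chosen purely for illustration:

```r
library(dplyr)

# Unquote a single name on the left-hand side of `:=`
nm <- "avg_mpg"
one <- summarise(mtcars, !!nm := mean(mpg))
names(one)
#> [1] "avg_mpg"

# Splice a named list of quoted expressions: the list names become column names
exprs <- list(avg_mpg = quote(mean(mpg)), max_hp = quote(max(hp)))
many <- summarise(mtcars, !!!exprs)
names(many)
```

The second call should produce one column per element of the list, named after the list names.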
Now that we know how to unquote argument names, let’s apply informative prefixes to the names of the columns created in grouped_mean(). We’ll start with the summary variable, whose name we unquote with !! and :=:
grouped_mean2 <- function(.data, .summary_var, ...) {
  summary_var <- enquo(.summary_var)
  group_vars <- enquos(...)
  # Get and modify the default name
  summary_nm <- quo_name(summary_var)
  summary_nm <- paste0("avg_", summary_nm)
  .data %>%
    group_by(!!!group_vars) %>%
    summarise(!!summary_nm := mean(!!summary_var))  # Unquote the name
}
grouped_mean2(mtcars, disp, cyl, am)
#> # A tibble: 6 x 3
#> # Groups: cyl [?]
#> cyl am avg_disp
#> <dbl> <dbl> <dbl>
#> 1 4 0 136.
#> 2 4 1 93.6
#> 3 6 0 205.
#> 4 6 1 155
#> # ... with 2 more rows
names(grouped_mean2(mtcars, disp, cyl, am))
#> [1] "cyl" "am" "avg_disp"
Regarding the grouping variables, this is a case where explicitly quoting and unquoting ... pays off because we need to change the names of the list of quoted dots. We give the dots default names with .named = TRUE:
grouped_mean2 <- function(.data, .summary_var, ...) {
  summary_var <- enquo(.summary_var)
  # Quote the dots with default names
  group_vars <- enquos(..., .named = TRUE)
  summary_nm <- quo_name(summary_var)
  summary_nm <- paste0("avg_", summary_nm)
  # Modify the names of the list of quoted dots
  names(group_vars) <- paste0("groups_", names(group_vars))
  .data %>%
    group_by(!!!group_vars) %>%  # Unquote-splice as usual
    summarise(!!summary_nm := mean(!!summary_var))
}
grouped_mean2(mtcars, disp, cyl, am)
#> # A tibble: 6 x 3
#> # Groups: groups_cyl [?]
#> groups_cyl groups_am avg_disp
#> <dbl> <dbl> <dbl>
#> 1 4 0 136.
#> 2 4 1 93.6
#> 3 6 0 205.
#> 4 6 1 155
#> # ... with 2 more rows
names(grouped_mean2(mtcars, disp, cyl, am))
#> [1] "groups_cyl" "groups_am" "avg_disp"
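Because .named = TRUE only fills in missing names, a name supplied by the user is kept and then prefixed like the others. The sketch below repeats the definition so that it runs on its own. The name cylinders is an arbitrary choice for illustration:

```r
library(dplyr)
library(rlang)

grouped_mean2 <- function(.data, .summary_var, ...) {
  summary_var <- enquo(.summary_var)
  # Quote the dots; unnamed arguments get a default name
  group_vars <- enquos(..., .named = TRUE)
  summary_nm <- paste0("avg_", quo_name(summary_var))
  names(group_vars) <- paste0("groups_", names(group_vars))
  .data %>%
    group_by(!!!group_vars) %>%
    summarise(!!summary_nm := mean(!!summary_var))
}

# The user-supplied name `cylinders` takes precedence over the default `cyl`
out <- grouped_mean2(mtcars, disp, cylinders = cyl)
names(out)
```

The first column should be named groups_cylinders rather than groups_cyl.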
The quote-and-unquote pattern is a powerful and versatile technique. In this section we’ll use it for modifying quoted arguments.
Say we would like a version of grouped_mean() where we take multiple summary variables rather than multiple grouping variables. We could start by replacing summary_var with the ... argument:
grouped_mean3 <- function(.data, .group_var, ...) {
  group_var <- enquo(.group_var)
  summary_vars <- enquos(..., .named = TRUE)
  .data %>%
    group_by(!!group_var) %>%
    summarise(!!!summary_vars)  # How do we take the mean?
}
The quoting part is easy. But how do we go about taking the average of each argument before passing them on to summarise()? We’ll have to modify the list of summary variables.
expr()
Quoting and unquoting is an effective technique for modifying quoted expressions. But we’ll need to add one more function to our toolbox to work around the lack of unquoting support in quote().
As we saw, the fundamental quoting function in R is quote(). All it does is return its quoted argument:
quote(mean(mass))
#> mean(mass)
quote() does not support quasiquotation but tidy eval provides a variant that does. With expr(), you can quote expressions with full unquoting support:
vars <- list(quote(mass), quote(height))
expr(mean(!!vars[[1]]))
#> mean(mass)
expr(group_by(!!!vars))
#> group_by(mass, height)
Note what just happened: by quoting-and-unquoting, we have expanded existing quoted expressions! This is the key to modifying expressions before passing them on to other quoting functions. For instance we could loop over the summary variables and unquote each of them in a mean() expression:
purrr::map(vars, function(var) expr(mean(!!var, na.rm = TRUE)))
#> [[1]]
#> mean(mass, na.rm = TRUE)
#>
#> [[2]]
#> mean(height, na.rm = TRUE)
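To see that the expanded expressions are ordinary calls that can be evaluated against data, here is a small sketch. It uses base lapply() instead of purrr::map(), evaluates with rlang's eval_tidy(), and relies on a made-up data frame df:

```r
library(rlang)

vars <- list(quote(mass), quote(height))

# Wrap each symbol in a call to mean()
calls <- lapply(vars, function(var) expr(mean(!!var, na.rm = TRUE)))

# Made-up data, for illustration only
df <- data.frame(mass = c(70, NA, 90), height = c(170, 180, NA))

# Evaluate each expanded expression with `df` as the data mask
results <- vapply(calls, function(cl) eval_tidy(cl, data = df), numeric(1))
results
#> [1]  80 175
```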
Let’s fix grouped_mean3() using this pattern:
grouped_mean3 <- function(.data, .group_var, ...) {
  group_var <- enquo(.group_var)
  summary_vars <- enquos(..., .named = TRUE)
  # Wrap the summary variables with mean()
  summary_vars <- purrr::map(summary_vars, function(var) {
    expr(mean(!!var, na.rm = TRUE))
  })
  # Prefix the names with `avg_`
  names(summary_vars) <- paste0("avg_", names(summary_vars))
  .data %>%
    group_by(!!group_var) %>%
    summarise(!!!summary_vars)
}
grouped_mean3(starwars, species, height)
#> # A tibble: 38 x 2
#> species avg_height
#> <chr> <dbl>
#> 1 Aleena 79
#> 2 Besalisk 198
#> 3 Cerean 198
#> 4 Chagrian 196
#> # ... with 34 more rows
grouped_mean3(starwars, species, height, mass)
#> # A tibble: 38 x 3
#> species avg_height avg_mass
#> <chr> <dbl> <dbl>
#> 1 Aleena 79 15
#> 2 Besalisk 198 102
#> 3 Cerean 198 82
#> 4 Chagrian 196 NaN
#> # ... with 34 more rows