In the introductory vignette we learned that creating tidy eval functions boils down to a single pattern: quote and unquote. In this vignette we’ll apply this pattern in a series of recipes for dplyr.
This vignette is organised so that you can quickly find your way to a copy-paste solution when you face an immediate problem.
enquo() and !! - Quote and unquote arguments
We start with a quick recap of the introductory vignette. Creating a function around dplyr pipelines involves three steps: abstraction, quoting, and unquoting.
Abstraction step
First identify the varying parts:
df1 %>% group_by(x1) %>% summarise(mean = mean(y1))
df2 %>% group_by(x2) %>% summarise(mean = mean(y2))
df3 %>% group_by(x3) %>% summarise(mean = mean(y3))
df4 %>% group_by(x4) %>% summarise(mean = mean(y4))
And abstract those away with informative argument names:
data %>% group_by(group_var) %>% summarise(mean = mean(summary_var))
And wrap in a function:
grouped_mean <- function(data, group_var, summary_var) {
  data %>%
    group_by(group_var) %>%
    summarise(mean = mean(summary_var))
}
Quoting step
Identify all the arguments where the user is allowed to refer to data frame columns directly. The function can’t evaluate these arguments right away. Instead they should be automatically quoted. Apply enquo() to these arguments:
group_var <- enquo(group_var)
summary_var <- enquo(summary_var)
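To make the quoting step concrete, the sketch below shows what enquo() captures. The helper name capture_var() is hypothetical and only serves the illustration; the block assumes rlang is installed.

```r
library(rlang)

# Hypothetical helper: quotes its argument instead of evaluating it
capture_var <- function(var) {
  enquo(var)
}

quoted <- capture_var(height / 100)

# The result is a quosure: the expression plus the environment it came from
is_quosure(quoted)

# The captured expression is the code the caller typed, left unevaluated
quo_get_expr(quoted)
```

Note that height does not need to exist anywhere: nothing is evaluated until the quosure is unquoted inside another quoting function.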
Unquoting step
Identify where these variables are passed to other quoting functions and unquote with !!. In this case we pass group_var to group_by() and summary_var to summarise():
data %>%
  group_by(!!group_var) %>%
  summarise(mean = mean(!!summary_var))
We end up with a function that automatically quotes its arguments group_var and summary_var and unquotes them when they are passed to other quoting functions:
grouped_mean <- function(data, group_var, summary_var) {
  group_var <- enquo(group_var)
  summary_var <- enquo(summary_var)
  data %>%
    group_by(!!group_var) %>%
    summarise(mean = mean(!!summary_var))
}
grouped_mean(mtcars, cyl, mpg)
#> # A tibble: 3 x 2
#> cyl mean
#> <dbl> <dbl>
#> 1 4 26.7
#> 2 6 19.7
#> 3 8 15.1
quo_name() - Create default column names
Use quo_name() to transform a quoted expression to a column name:
simple_var <- quote(height)
quo_name(simple_var)
#> [1] "height"
These names are only a default stopgap. For more complex uses, you’ll probably want to let the user override the default. Here is a case where the default name is clearly suboptimal:
complex_var <- quote(mean(height, na.rm = TRUE))
quo_name(complex_var)
#> [1] "mean(height, na.rm = TRUE)"
:= and !! - Unquote column names
In expressions like c(name = NA), the argument name is quoted. Because of the quoting it’s not possible to make an indirect reference to a variable that contains a name:
name <- "the real name"
c(name = NA)
#> name
#> NA
In tidy eval functions it is possible to unquote argument names with !!. However, you need the special := operator:
rlang::qq_show(c(!!name := NA))
#> c("the real name" := NA)
This unusual operator is needed because using ! on the left-hand side of = is not valid R code:
rlang::qq_show(c(!!name = NA))
#> Error: <text>:1:25: unexpected '='
#> 1: rlang::qq_show(c(!!name =
#> ^
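qq_show() only displays the result of unquoting. As a small sketch of := at work in a real call, rlang::list2() can stand in for a dplyr verb, since it supports the same syntax for unquoting names. The variable nm is illustrative.

```r
library(rlang)

nm <- "the real name"

# `!!nm` is unquoted on the left-hand side of `:=`,
# so the element is named after the contents of `nm`
x <- list2(!!nm := NA)

names(x)
```

The same left-hand-side unquoting is what the dplyr verbs rely on when a column name is stored in a variable.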
Let’s use this !! technique to pass custom column names to group_by() and summarise():
grouped_mean <- function(data, group_var, summary_var) {
  group_var <- enquo(group_var)
  summary_var <- enquo(summary_var)

  # Create default column names
  group_nm <- quo_name(group_var)
  summary_nm <- quo_name(summary_var)

  # Prepend with an informative prefix
  group_nm <- paste0("group_", group_nm)
  summary_nm <- paste0("mean_", summary_nm)

  data %>%
    group_by(!!group_nm := !!group_var) %>%
    summarise(!!summary_nm := mean(!!summary_var))
}
grouped_mean(mtcars, cyl, mpg)
#> # A tibble: 3 x 2
#> group_cyl mean_mpg
#> <dbl> <dbl>
#> 1 4 26.7
#> 2 6 19.7
#> 3 8 15.1
... - Forward multiple arguments
We have created a function that takes one grouping variable and one summary variable. It would make sense to take multiple grouping variables instead of just one. Let’s adjust our function with a ... argument.
Replace group_var by ...:
function(data, ..., summary_var)
Swap ... and summary_var because arguments on the right-hand side of ... are harder to pass. They can only be passed with their full name explicitly specified, while arguments on the left-hand side can be passed without a name:
function(data, summary_var, ...)
It’s good practice to prefix named arguments with a . to reduce the risk of conflicts between your arguments and the arguments passed to ...:
function(.data, .summary_var, ...)
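The difference between arguments placed before and after ... can be checked in base R. This is a minimal sketch with two hypothetical functions, after_dots() and before_dots(), that only report whether summary_var was matched.

```r
# `summary_var` comes after `...`: it can only be matched by its full name
after_dots <- function(data, ..., summary_var) {
  !missing(summary_var)
}

# `summary_var` comes before `...`: it can also be matched by position
before_dots <- function(data, summary_var, ...) {
  !missing(summary_var)
}

after_dots(1, 2)                # FALSE: `2` is swallowed by `...`
after_dots(1, summary_var = 2)  # TRUE: matched by its full name
before_dots(1, 2)               # TRUE: matched by position
```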
Because of the magic of dots forwarding we don’t have to use the quote-and-unquote pattern. We can just pass ... to other quoting functions like group_by():
grouped_mean <- function(.data, .summary_var, ...) {
  summary_var <- enquo(.summary_var)
  .data %>%
    group_by(...) %>% # Forward `...`
    summarise(mean = mean(!!summary_var))
}
grouped_mean(mtcars, disp, cyl, am)
#> # A tibble: 6 x 3
#> # Groups: cyl [?]
#> cyl am mean
#> <dbl> <dbl> <dbl>
#> 1 4 0 136.
#> 2 4 1 93.6
#> 3 6 0 205.
#> 4 6 1 155
#> # ... with 2 more rows
Forwarding ... is straightforward but has the downside that you can’t modify the arguments or their names.
enquos() and !!! - Quote and unquote multiple arguments
Quoting and unquoting multiple variables with ... is pretty much the same process as for single arguments:
Quoting multiple arguments can be done in two ways: internal quoting with the plural variant enquos() and external quoting with vars(). Use internal quoting when your function takes expressions with ... and external quoting when your function takes a list of expressions.
Unquoting multiple arguments requires a variant of !!, the unquote-splice operator !!!, which unquotes each element of a list as an independent argument in the surrounding function call.
Quote the dots with enquos() and unquote-splice them with !!!:
grouped_mean2 <- function(.data, .summary_var, ...) {
  summary_var <- enquo(.summary_var)
  group_vars <- enquos(...) # Get a list of quoted dots
  .data %>%
    group_by(!!!group_vars) %>% # Unquote-splice the list
    summarise(mean = mean(!!summary_var))
}
grouped_mean2(mtcars, disp, cyl, am)
#> # A tibble: 6 x 3
#> # Groups: cyl [?]
#> cyl am mean
#> <dbl> <dbl> <dbl>
#> 1 4 0 136.
#> 2 4 1 93.6
#> 3 6 0 205.
#> 4 6 1 155
#> # ... with 2 more rows
The quote-and-unquote pattern does more work than simply forwarding ... yet produces the same result here. Don’t do this extra work unless you need to modify the arguments or their names.
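As a quick check of that equivalence, the sketch below runs both approaches on mtcars and compares the results. The function names forward_dots() and quote_dots() are hypothetical, and the comparison goes through as.data.frame() to sidestep any difference in tibble attributes.

```r
library(dplyr)

# Variant 1: forward `...` untouched
forward_dots <- function(.data, .summary_var, ...) {
  summary_var <- enquo(.summary_var)
  .data %>%
    group_by(...) %>%
    summarise(mean = mean(!!summary_var))
}

# Variant 2: quote the dots, then unquote-splice them
quote_dots <- function(.data, .summary_var, ...) {
  summary_var <- enquo(.summary_var)
  group_vars <- enquos(...)
  .data %>%
    group_by(!!!group_vars) %>%
    summarise(mean = mean(!!summary_var))
}

a <- forward_dots(mtcars, disp, cyl, am)
b <- quote_dots(mtcars, disp, cyl, am)

# TRUE when both variants return the same values
same <- isTRUE(all.equal(as.data.frame(a), as.data.frame(b)))
same
```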
expr() - Modify quoted arguments
Modifying quoted expressions is often necessary when dealing with multiple arguments. Say we’d like a grouped_mean() variant that takes multiple summary variables rather than multiple grouping variables. We need to somehow take the mean() of each summary variable.
One easy way is to use the quote-and-unquote pattern with expr(). This function is just like quote() from base R. It plainly returns your argument, quoted:
quote(height)
#> height
expr(height)
#> height
quote(mean(height))
#> mean(height)
expr(mean(height))
#> mean(height)
But expr() has a twist. It has full unquoting support:
vars <- list(quote(height), quote(mass))
expr(mean(!!vars[[1]]))
#> mean(height)
expr(group_by(!!!vars))
#> group_by(height, mass)
You can loop over a list of arguments and modify each of them:
purrr::map(vars, function(var) expr(mean(!!var, na.rm = TRUE)))
#> [[1]]
#> mean(height, na.rm = TRUE)
#>
#> [[2]]
#> mean(mass, na.rm = TRUE)
This makes it easy to take multiple summary variables and wrap each of them in a call to mean() before unquote-splicing within summarise():
grouped_mean3 <- function(.data, .group_var, ...) {
  group_var <- enquo(.group_var)
  summary_vars <- enquos(...) # Get a list of quoted summary variables

  summary_vars <- purrr::map(summary_vars, function(var) {
    expr(mean(!!var, na.rm = TRUE))
  })

  .data %>%
    group_by(!!group_var) %>%
    summarise(!!!summary_vars) # Unquote-splice the list
}
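The definition above is not followed by a call, so here is a hedged usage sketch on mtcars. The function is repeated under the hypothetical name grouped_means_dots() to keep the block self-contained, and the summary variables are passed with explicit names so that the output columns do not depend on how default names are generated.

```r
library(dplyr)

grouped_means_dots <- function(.data, .group_var, ...) {
  group_var <- enquo(.group_var)
  summary_vars <- enquos(...)

  # Wrap each quoted variable in a call to mean(); names are preserved
  summary_vars <- purrr::map(summary_vars, function(var) {
    rlang::expr(mean(!!var, na.rm = TRUE))
  })

  .data %>%
    group_by(!!group_var) %>%
    summarise(!!!summary_vars)
}

# One row per value of cyl, one column per summary variable
out <- grouped_means_dots(mtcars, cyl, disp = disp, mpg = mpg)
names(out)
```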
vars() - Quote multiple arguments externally
How could we take multiple summary variables in addition to multiple grouping variables? Internal quoting with ... has a major disadvantage: the arguments in ... can only have one purpose. If you need to quote multiple sets of variables you have to delegate the quoting to another function. That’s the purpose of vars(), which quotes its arguments and returns a list:
vars(species, gender)
#> [[1]]
#> <quosure>
#> expr: ^species
#> env: global
#>
#> [[2]]
#> <quosure>
#> expr: ^gender
#> env: global
The arguments can be complex expressions and have names:
vars(h = height, m = mass / 100)
#> $h
#> <quosure>
#> expr: ^height
#> env: global
#>
#> $m
#> <quosure>
#> expr: ^mass / 100
#> env: global
When the quoting is external you don’t use enquos(). Simply take lists of expressions in your function and forward the lists to other quoting functions with !!!:
grouped_mean3 <- function(data, group_vars, summary_vars) {
  stopifnot(
    is.list(group_vars),
    is.list(summary_vars)
  )

  summary_vars <- purrr::map(summary_vars, function(var) {
    expr(mean(!!var, na.rm = TRUE))
  })

  data %>%
    group_by(!!!group_vars) %>%
    summarise(n = n(), !!!summary_vars)
}
grouped_mean3(starwars, vars(species, gender), vars(height))
#> # A tibble: 43 x 4
#> # Groups: species [?]
#> species gender n `mean(height, na.rm = TRUE)`
#> <chr> <chr> <int> <dbl>
#> 1 Aleena male 1 79
#> 2 Besalisk male 1 198
#> 3 Cerean male 1 198
#> 4 Chagrian male 1 196
#> # ... with 39 more rows
grouped_mean3(starwars, vars(gender), vars(height, mass))
#> # A tibble: 5 x 4
#> gender n `mean(height, na.rm = TRUE… `mean(mass, na.rm = TRU…
#> <chr> <int> <dbl> <dbl>
#> 1 female 19 165. 54.0
#> 2 hermaphrodite 1 175 1358
#> 3 male 62 179. 81.0
#> 4 none 2 200 140
#> # ... with 1 more row
One advantage of vars() is that it lets users specify their own names:
grouped_mean3(starwars, vars(gender), vars(h = height, m = mass))
#> # A tibble: 5 x 4
#> gender n h m
#> <chr> <int> <dbl> <dbl>
#> 1 female 19 165. 54.0
#> 2 hermaphrodite 1 175 1358
#> 3 male 62 179. 81.0
#> 4 none 2 200 140
#> # ... with 1 more row
enquos(.named = TRUE) - Automatically add default names
If you pass .named = TRUE to enquos(), the unnamed expressions are automatically given default names:
f <- function(...) names(enquos(..., .named = TRUE))
f(height, mean(mass))
#> [1] "height" "mean(mass)"
User-supplied names are never overridden:
f(height, m = mean(mass))
#> [1] "height" "m"
This is handy when you need to modify the names of quoted expressions. In this example we’ll ensure the list is named before adding a prefix:
grouped_mean2 <- function(.data, .summary_var, ...) {
  summary_var <- enquo(.summary_var)
  group_vars <- enquos(..., .named = TRUE) # Ensure quoted dots are named

  # Prefix the names of the list of quoted dots
  names(group_vars) <- paste0("group_", names(group_vars))

  .data %>%
    group_by(!!!group_vars) %>% # Unquote-splice the list
    summarise(mean = mean(!!summary_var))
}
grouped_mean2(mtcars, disp, cyl, am)
#> # A tibble: 6 x 3
#> # Groups: group_cyl [?]
#> group_cyl group_am mean
#> <dbl> <dbl> <dbl>
#> 1 4 0 136.
#> 2 4 1 93.6
#> 3 6 0 205.
#> 4 6 1 155
#> # ... with 2 more rows
One big downside of this technique is that all arguments get a prefix, including the arguments that were given specific names by the user:
grouped_mean2(mtcars, disp, c = cyl, a = am)
#> # A tibble: 6 x 3
#> # Groups: group_c [?]
#> group_c group_a mean
#> <dbl> <dbl> <dbl>
#> 1 4 0 136.
#> 2 4 1 93.6
#> 3 6 0 205.
#> 4 6 1 155
#> # ... with 2 more rows
In general it’s better to preserve the names explicitly passed by the user. To do that we can’t automatically add default names with enquos(), because once the list is fully named we have no way of detecting which arguments were passed with an explicit name. We’ll have to add default names manually with quos_auto_name().
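Without .named = TRUE, enquos() leaves unnamed arguments with an empty name, which is what makes explicit names detectable. A minimal sketch, using a hypothetical helper has_explicit_name():

```r
library(rlang)

# Hypothetical helper: TRUE for arguments the caller named explicitly
has_explicit_name <- function(...) {
  names(enquos(...)) != ""
}

has_explicit_name(height, m = mean(mass))
#> [1] FALSE  TRUE
```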
quos_auto_name() - Manually add default names
It can be helpful to add default names to the list of quoted dots manually, for instance when the list was created with vars().
Let’s add default names manually with quos_auto_name() to lists of externally quoted variables. We’ll detect unnamed arguments and only add a prefix to this subset of arguments. This way we preserve user-supplied names:
grouped_mean3 <- function(data, group_vars, summary_vars) {
  stopifnot(
    is.list(group_vars),
    is.list(summary_vars)
  )

  # Detect and prefix unnamed arguments:
  unnamed <- names(summary_vars) == ""

  # Add the default names:
  summary_vars <- rlang::quos_auto_name(summary_vars)

  prefixed_nms <- paste0("mean_", names(summary_vars)[unnamed])
  names(summary_vars)[unnamed] <- prefixed_nms

  # Expand the argument _after_ giving the list its default names
  summary_vars <- purrr::map(summary_vars, function(var) {
    expr(mean(!!var, na.rm = TRUE))
  })

  data %>%
    group_by(!!!group_vars) %>%
    summarise(n = n(), !!!summary_vars) # Unquote-splice the renamed list
}
Note how we add the default names before wrapping the arguments in a mean() call. This way we avoid including mean() in the name:
quo_name(quote(mass))
#> [1] "mass"
quo_name(quote(mean(mass, na.rm = TRUE)))
#> [1] "mean(mass, na.rm = TRUE)"
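To see quos_auto_name() in isolation, here is a small sketch that uses rlang::quos() as a stand-in for vars(). Only the unnamed element receives a default name; the explicit name m is left alone.

```r
library(rlang)

quoted <- quos(height, m = mass)
names(quoted)
#> [1] ""  "m"

# Fill in default names for the unnamed elements only
named <- quos_auto_name(quoted)
names(named)
#> [1] "height" "m"
```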
We get nicely prefixed default names:
grouped_mean3(starwars, vars(gender), vars(height, mass))
#> # A tibble: 5 x 4
#> gender n mean_height mean_mass
#> <chr> <int> <dbl> <dbl>
#> 1 female 19 165. 54.0
#> 2 hermaphrodite 1 175 1358
#> 3 male 62 179. 81.0
#> 4 none 2 200 140
#> # ... with 1 more row
And the user is able to fully override the names:
grouped_mean3(starwars, vars(gender), vars(h = height, m = mass))
#> # A tibble: 5 x 4
#> gender n h m
#> <chr> <int> <dbl> <dbl>
#> 1 female 19 165. 54.0
#> 2 hermaphrodite 1 175 1358
#> 3 male 62 179. 81.0
#> 4 none 2 200 140
#> # ... with 1 more row
select() recipes
TODO
filter() recipes
TODO
https://stackoverflow.com/questions/51902438/rlangsym-in-anonymous-functions
This is overall a good answer but the issue is not the nested mutate. It’s the unquoting inside an anonymous function, as you have shown with the !!i example. Unquoting happens immediately and anonymous functions create a future scope, so there’s a timing problem.
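The timing problem can be reproduced without dplyr. This is a minimal sketch, not the code from the linked question: expr() processes !! as soon as the expression is captured, before the anonymous function exists, so the unquoted value comes from the surrounding scope rather than from the function's argument.

```r
library(rlang)

x <- "outer"

# `!!x` is evaluated right away, while expr() captures the code.
# At that point the function has not been created, so `x` is "outer".
fn_expr <- expr(function(x) !!x)
fn <- eval(fn_expr)

# The argument is ignored: the body was already replaced by "outer"
result <- fn("inner")
result
#> [1] "outer"
```

When printing fn_expr interactively, R may still display the original source text because of the stored source reference, so calling the function is the more reliable way to observe the effect.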