Standard Non-Standard Evaluation: Tidy Eval

2018/03/13

Introduction

Hadley Wickham is creating awesome new resources for learning tidy eval: new Advanced R chapters, a youtube video, and talks.

I’ve been slowly working my way through this material, but you never really learn anything until you teach it. So I thought I’d share my understanding of tidy eval and how I’ve used in in my new package leadr.

Typed Words and Variables

The central issue tidy eval solves, as I understand it, is creating the distinction between typed words and variables.

For example:

bar <- 4
foo(bar)

The actual typed word is bar, but the value of variable bar is 4.

Sometimes, however, we want the typed word to have meaning. For example,

library(tidyverse)
iris %>%
  group_by(Species) %>%
  summarise(n = n())
## # A tibble: 3 x 2
##   Species        n
##   <fct>      <int>
## 1 setosa        50
## 2 versicolor    50
## 3 virginica     50

Note that Species has no value in the current R environment:

Species
## Error in eval(expr, envir, enclos): object 'Species' not found

Yet in the function group_by the typed word Species to has special meaning: it refers to that column in the iris dataset.

The rest of the post will be looking at various ways we can give typed words meaning.

Strings

Extending the example in the previous section, say we don’t know the group_by variable ahead time and we read it in as a string:

group_var <- "Species"
iris %>%
  group_by(group_var) %>%
  summarise(n = n())
## Error in grouped_df_impl(data, unname(vars), drop): Column `group_var` is unknown

Reading the error message, group_by thinks that group_var is the column name. Remember, group_by just reads the words typed between the parentheses. It doesn’t care about the value of the variable.

So we need a way to tell group_by that it should use the value of group_var, not the actual thing typed out between the parentheses.

We can do that by combining rlang::sym1 with !!. sym makes the value of the variable a set of typed words, and !! tells group_by2 that it should use the value of the variable (which is a set of typed words) instead of the typed words actually in the parentheses.

library(rlang)
group_var <- "Species"
group_var <- sym(group_var)
iris %>%
  group_by(!!group_var) %>%
  summarise(n = n())
## # A tibble: 3 x 2
##   Species        n
##   <fct>      <int>
## 1 setosa        50
## 2 versicolor    50
## 3 virginica     50

It bears repeating: !! tells group_by to use the value of the variable between the parentheses, and not the actual typed words between the parentheses.

We can also see this from group_by’s perspective with expr. This is useful for debugging when the expressions get more complicated:

group_var <- "Species"
group_var <- sym(group_var)
expr(
  iris %>%
    group_by(!!group_var) %>%
    summarise(n = n())
)
## iris %>% group_by(Species) %>% summarise(n = n())

expr shows us the expression being evaluated.

And the sym/!! pair works with functions too.

grouper <- function(data, group_var) {
  group_var <- sym(group_var)
  data %>%
    group_by(!!group_var) %>%
    summarise(n = n())
}
grouper(iris, "Species")
## # A tibble: 3 x 2
##   Species        n
##   <fct>      <int>
## 1 setosa        50
## 2 versicolor    50
## 3 virginica     50

Expressions

What we if want to use Species directly with grouper like we do with group_by? That is, we’d like to type in Species in the group_var argument for grouper.

grouper(iris, Species)
## Error in is_symbol(x): object 'Species' not found

Clearly we need to change something. sym converts a string to a set of typed words, but our argument is different. Instead of passing a string, we are passing the typed word Species which has no value in the environment.

The solution is rlang::expr.

group_var <- expr(Species)
iris %>%
  group_by(!!group_var) %>%
  summarise(n = n())
## # A tibble: 3 x 2
##   Species        n
##   <fct>      <int>
## 1 setosa        50
## 2 versicolor    50
## 3 virginica     50

The variable group_var now represents the typed word Species.

print(group_var)
## Species

Of course we still need !! to tell group_by to use the value of group_var and not the typed word group_var in the parentheses.

Again, we can verify/debug this process by enclosing everything in expr:

expr(
  iris %>%
  group_by(!!group_var) %>%
  summarise(n = n())
)
## iris %>% group_by(Species) %>% summarise(n = n())

Without !!, group_by thinks we typed in group_var:

expr(
  iris %>%
  group_by(group_var) %>%
  summarise(n = n())
)
## iris %>% group_by(group_var) %>% summarise(n = n())

Functions

Great, let’s wrap this into a function!

grouper <- function(data, group_var) {
  group_expr <- expr(group_var)
  data %>%
    group_by(!!group_expr) %>%
    summarise(n = n())
}
grouper(iris, Species)
## Error in grouped_df_impl(data, unname(vars), drop): Column `group_var` is unknown

Once we think about this, we start to understand how literal expr is. Before with group_var <- expr(Species), expr created an expression from the typed word Species that we typed between the parentheses. Here, group_expr <- expr(group_var) is creating an expression from the typed word group_var.

We can work around it like this:

grouper <- function(data, group_var) {
  data %>%
    group_by(!!group_var) %>%
    summarise(n = n())
}
grouper(iris, expr(Species))
## # A tibble: 3 x 2
##   Species        n
##   <fct>      <int>
## 1 setosa        50
## 2 versicolor    50
## 3 virginica     50

But we don’t want to have to wrap with expr. The whole point is to be able to write Species without worrying about strings or functions.

Quosures

We need a way to tell expr to not be so literal: don’t make the typed words between the parentheses the expression. Rather, make the value of that variable (which itself is a set of typed words) the expression.

We can solve this with rlang::enquo.

grouper <- function(data, group_var) {
  group_quo <- enquo(group_var)
  print(group_quo)
  data %>%
    group_by(!!group_quo) %>%
    summarise(n = n())
}
grouper(iris, Species)
## <quosure>
##   expr: ^Species
##   env:  global
## # A tibble: 3 x 2
##   Species        n
##   <fct>      <int>
## 1 setosa        50
## 2 versicolor    50
## 3 virginica     50

Notice that enquo creates an object called a quosure that contains the expression (the typed word Species) and the environment in which it was typed.

This is crucial to solve the expr problem. enquo isn’t being so literal to think that the words typed in this environment (like group_var) is the actual name to use. Rather, it looks to the previous environment global to see that we actually typed Species.

Nested Functions

Suppose grouper is part of a larger function:

super_grouper <- function(data, super_group_var) {
  # do other stuff
  grouper(data, super_group_var)
}
super_grouper(iris, Species)
## <quosure>
##   expr: ^super_group_var
##   env:  0x7ff7362e52d0
## Error in grouped_df_impl(data, unname(vars), drop): Column `super_group_var` is unknown

Here, grouper (within super_grouper) is trying to group on the column super_group_var. This lets us know the limitations of enquo.

Look to the printed quosure object. First, notice that the environment is no longer the global environment where we actually typed Species. Rather, it is in the super_grouper environment where the typed name is super_group_var.

Therefore, enquo can only find the typed name from the parent environment.

To make this work, grouper needs to think that the typed word in the parent environment is Species. That’s exactly the problem same problem we solved with !! and enquo previously!

super_grouper <- function(data, super_group_var) {
  # do other stuff
  grouper(data, !!enquo(super_group_var))
}
super_grouper(iris, Species)
## <quosure>
##   expr: ^Species
##   env:  global
## # A tibble: 3 x 2
##   Species        n
##   <fct>      <int>
## 1 setosa        50
## 2 versicolor    50
## 3 virginica     50

Parsing Expressions

When reading and working with data, sometimes we need to convert strings to an R expression that can be evaluted.

These expressions are different from the typed words we made from strings with rlang::sym3, because these expressions need to be evaluated as valid R code.

In this section, we’ll use rlang::parse_expr to solve a situation I encountered when writing leadr::peak4 using the techniques described in this Stackoverflow answer.

Suppose we are given the following data:

col_names <- c("Sepal.Length", "Sepal.Width")
col_values <- c(5.0, 3.6)

We need to use this data to filter the iris dataset. If we were to do it manually, it would look like this:

iris %>% 
  filter(Sepal.Length == 5.0 & Sepal.Width == 3.6)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1            5         3.6          1.4         0.2  setosa

We could handle it as before:

iris %>%
  filter(!!sym(col_names[1]) == col_values[1] &
         !!sym(col_names[2]) == col_values[2])
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1            5         3.6          1.4         0.2  setosa

But there’s a catch. The length of col_names and col_values are not known ahead of time.

Unknown Number of Columns

If the input data has multiple columns, we need to programmatically add a statement like col_names[n] == col_value[n] for each pair of name and value. The collapse parameter in the base function paste does this5:

collapsed <- paste(col_names, "==", col_values, collapse = "&")
collapsed
## [1] "Sepal.Length == 5&Sepal.Width == 3.6"

collapsed is a string, so we might try sym:

iris %>%
  filter(!!sym(collapsed))
## Error in filter_impl(.data, quo): Evaluation error: object 'Sepal.Length == 5&Sepal.Width == 3.6' not found.

But it doesn’t work, because sym creates the “name” of some object (that may or may not exist in the current environment), not code to be evaluated.

We need collapsed to be evaluated as a valid R expression. We do this with rlang::parse_expr:

iris %>%
  filter(!!parse_expr(collapsed))
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1            5         3.6          1.4         0.2  setosa

Using expr, we verify that the previous expression is evaluated as intended:

expr(
  iris %>%
    filter(!!parse_expr(collapsed))
)
## iris %>% filter(Sepal.Length == 5 & Sepal.Width == 3.6)

This is exactly what we typed manually at the beginning of this section.

Conclusions

Tidy eval really clicked for me when I thought about the difference between the typed words and variables. I’m not sure if this metaphor holds up to the formal definitions, but it has helped understand enough tidy eval to use it in my own work.

I read a lot on the topic, but didn’t include everything in the post. Here’s some more recommended reading:


  1. The base function as.name works as well if you don’t want to import rlang.

  2. Or any other tidy eval function.

  3. As I learned by asking in this stackoverflow question and in this Github issue.

  4. This is the source code.

  5. The nice thing about this solution is that we can replace "&" with "||". Other methods such as !!! only support & filters.