13 Basics

Likely without realizing it, you’ve been using two types of variables in R. First, there are what we’ll call environment variables. When you create an environment variable, you bind a value to a name in the current environment. For example, the following code creates an environment variable named y that refers to 1.
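The code that originally followed this paragraph is not present here. A minimal sketch of the assignment it describes would be:

```r
# Bind the value 1 to the name y in the current environment.
y <- 1
```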

Now, when we evaluate y, we get whatever value y refers to.
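As a small illustration (repeating the assignment so the snippet stands on its own), evaluating the name returns the bound value:

```r
y <- 1

# Evaluating the name looks up the value bound to it.
y
#> [1] 1
```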

The second type of variable we’ll call data variables. Data variables refer to columns in data frames, and only make sense in the context of those data frames. For example, manufacturer is a data variable that exists in the context of mpg. dplyr, ggplot2, and other tidyverse packages understand this, which is why functions like select() behave as you want.

Instead of figuring out what manufacturer refers to in the environment, select() looks for a column named “manufacturer” in mpg.
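A short sketch of this behavior, assuming the mpg dataset from ggplot2 and the pipe from dplyr (the name manufacturers is only used for illustration):

```r
library(dplyr)
mpg <- ggplot2::mpg

# manufacturer is not defined in the environment, but select()
# finds it as a column of mpg.
manufacturers <- mpg %>%
  select(manufacturer)

names(manufacturers)
#> [1] "manufacturer"
```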

Let’s take a look at our failed function from the intro again.

group_by() thinks that var_group is a data variable, and so it looks inside mpg for a column called “var_group”, doesn’t find it, and so throws an error.
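The function from the intro is not reproduced in this section. The sketch below is a hypothetical reconstruction: the function name grouped_mean and the argument var_summary are assumptions, and only var_group comes from the text. It wraps the call in tryCatch() so the failure can be inspected without stopping the script.

```r
library(dplyr)
mpg <- ggplot2::mpg

# Hypothetical reconstruction of the failing function.
# var_group is passed to group_by() without any tidy evaluation operator.
grouped_mean <- function(var_group, var_summary) {
  mpg %>%
    group_by(var_group) %>%
    summarize(mean = mean(var_summary))
}

# Capture the error instead of letting it stop the script.
result <- tryCatch(
  grouped_mean(var_group = manufacturer, var_summary = cty),
  error = function(e) e
)

# Check whether the call failed.
inherits(result, "error")
```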

We actually want var_group to behave as a hybrid between an environment variable and a data variable. Like an environment variable, we want it to refer to another value (manufacturer). Then, we want group_by() to treat that value (manufacturer) as a data variable and look inside mpg for the matching column.
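One way to get this hybrid behavior is the embrace operator {{ }}, which the later sections rely on. The sketch below assumes a version of rlang that supports {{ }} (0.4.0 or later) and reuses the hypothetical names grouped_mean and var_summary.

```r
library(dplyr)
mpg <- ggplot2::mpg

# Embracing an argument tells group_by() and summarize() to use the
# expression the caller supplied (manufacturer, cty) as a data variable.
grouped_mean <- function(var_group, var_summary) {
  mpg %>%
    group_by({{ var_group }}) %>%
    summarize(mean = mean({{ var_summary }}))
}

by_manufacturer <- grouped_mean(var_group = manufacturer, var_summary = cty)
```

The result should match what you get from calling group_by(manufacturer) and summarize(mean = mean(cty)) on mpg directly.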

13.2 Forwarding ...

You might have noticed that some functions, like scoped verbs and the purrr functions, take ... as a final argument, allowing you to specify any number of additional arguments. You can use ... in your own functions. There are two common use-cases.

13.2.1 Passing full expressions

Functions like filter() take expressions, like year == 1999 or manufacturer == "audi". If you want to build a function that takes full expressions, you can use ....
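A minimal sketch of such a function follows. The name mpg_filter comes from the text below. Its body is an assumption about how it was defined.

```r
library(dplyr)
mpg <- ggplot2::mpg

# Everything supplied in ... is handed to filter() unchanged.
mpg_filter <- function(...) {
  mpg %>%
    filter(...)
}

audi <- mpg_filter(manufacturer == "audi")
```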

... can take any number of arguments, so we can filter by an unlimited number of conditions.
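For instance, assuming the same sketch of mpg_filter() (redefined here so the snippet runs on its own), two conditions can be passed at once:

```r
library(dplyr)
mpg <- ggplot2::mpg

# Assumed definition: forward all conditions to filter().
mpg_filter <- function(...) {
  mpg %>%
    filter(...)
}

# Each comma-separated expression becomes a separate filter() condition.
audi_1999 <- mpg_filter(manufacturer == "audi", year == 1999)
```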

mpg_filter() forwards ... to filter(), which allows filter() to act on the contents of ... just as it would outside of the function.

Here’s another example that uses select().
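The original example is not present here. A hedged sketch along the same lines, using the hypothetical name mpg_select, could be:

```r
library(dplyr)
mpg <- ggplot2::mpg

# Hypothetical helper: forward any column selections to select().
mpg_select <- function(...) {
  mpg %>%
    select(...)
}

# Select three columns by name.
selected <- mpg_select(manufacturer, model, cty)

names(selected)
#> [1] "manufacturer" "model"        "cty"
```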

13.3 Assigning names

When you want to pass the name of a column into your function, you need to:

  • Embrace the name with {{ }}.
  • Use := instead of = to assign the name.

You have to use := instead of plain = because R’s syntax only allows a bare name on the left-hand side of = in a function call, so {{ }} cannot appear there.
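The two steps above can be sketched as follows. The function summary_mean and its arguments var_summary and name_summary are hypothetical names chosen for illustration.

```r
library(dplyr)
mpg <- ggplot2::mpg

# name_summary is embraced on the left of := to name the new column.
# var_summary is embraced on the right to pick the column to average.
summary_mean <- function(df, var_summary, name_summary) {
  df %>%
    summarize({{ name_summary }} := mean({{ var_summary }}))
}

cty_summary <- summary_mean(mpg, var_summary = cty, name_summary = cty_mean)

names(cty_summary)
#> [1] "cty_mean"
```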

(:= is called the walrus operator because it looks like a sideways walrus.)

13.4 Recoding

Say you want to recode a variable:
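The original code is not present here. Based on the mapping quoted later in this section, a likely sketch recodes the drv column of mpg with dplyr's recode():

```r
library(dplyr)
mpg <- ggplot2::mpg

# Replace the one-character drv codes with descriptive labels.
mpg_recoded <- mpg %>%
  mutate(drv = recode(drv, "f" = "front", "r" = "rear", "4" = "four"))

sort(unique(mpg_recoded$drv))
#> [1] "four"  "front" "rear"
```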

It’s often a good idea to store your recode mapping as a vector in your parameters section. To get this to work, you’ll need another tidyeval operator, !!!.
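A sketch of this pattern, assuming the vector name recode_drv used below and the same mapping:

```r
library(dplyr)
mpg <- ggplot2::mpg

# Parameters: the recode mapping, stored once as a named vector.
recode_drv <- c("f" = "front", "r" = "rear", "4" = "four")

# !!! splices each element of recode_drv into recode() as its own
# named argument.
mpg_spliced <- mpg %>%
  mutate(drv = recode(drv, !!!recode_drv))
```

This should give the same result as writing the three name-value pairs directly inside recode().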

!!! has two tasks:

  • Unpack recode_drv so that each element is passed as a separate argument to recode() (i.e., "f" = "front", "r" = "rear", "4" = "four" instead of c("f" = "front", "r" = "rear", "4" = "four")).
  • Make sure that recode() treats the individual elements as data variables.