4 Vector functions

Vector functions take a vector as input and produce a vector of the same length as output.

Vector functions make working with vectors easy. For example, log10(), like most mathematical functions in R, is a vector function, which allows you to take the log of each element in a vector all at once.
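As a quick illustration, here is log10() applied to a made-up vector x (the values are only an example):

```r
# x is a hypothetical example vector
x <- c(1, 10, 100, 1000)

# log10() is applied to every element and returns a vector of the same length
log_x <- log10(x)
log_x
```

The result has one element for each element of x.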

The simple mathematical operators are also vector functions:
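A small sketch with two made-up vectors, x and y, shows the operators working element by element:

```r
# x and y are hypothetical example vectors of the same length
x <- c(1, 2, 3)
y <- c(10, 20, 30)

# + adds the vectors element by element
sum_xy <- x + y

# * multiplies each element of x by 2
doubled <- x * 2

sum_xy
doubled
```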

In contrast, functions that can only take a length one input and produce a length one output are called scalar functions.
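For instance, a function built around if and else is typically a scalar function, because if evaluates a single TRUE or FALSE. The function describe_sign() below is a hypothetical example written for this illustration, not one defined elsewhere in the text:

```r
# describe_sign() is a hypothetical scalar function: if/else evaluates a
# single condition, so x is expected to be a length-one value
describe_sign <- function(x) {
  if (x < 0) {
    "negative"
  } else {
    "non-negative"
  }
}

describe_sign(-5)
describe_sign(3)
```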

As you’ll see in the next section, the distinction between scalar and vector functions is important when working with tibbles.

4.2 Vector functions and mutate()

Recall that a tibble is a list of vectors. Each column of the tibble is a vector, and all these vectors have to be the same length. New columns must also be vectors of the same length, which means that when you use mutate() to create a new column, mutate() has to create a new vector of the correct length.
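You can check this structure directly. The sketch below builds a small stand-in tibble (the temperature values are made up) and confirms that it is a list whose columns all have the same length:

```r
library(dplyr)

# Hypothetical stand-in for df: one temperature column (degrees Fahrenheit)
df <- tibble(temperature = c(32, 50, 86))

# A tibble is a list under the hood
is.list(df)

# Every column has the same length, equal to the number of rows
lengths(df)
nrow(df)
```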

You can explicitly give mutate() a vector of the correct length.
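A minimal sketch, assuming a three-row stand-in for df with made-up temperatures and a hypothetical column name new_column:

```r
library(dplyr)

# Hypothetical stand-in for df: three rows, one temperature column
df <- tibble(temperature = c(32, 50, 86))

# The new column is supplied as a length-3 vector, one value per row
with_vector <- df %>% mutate(new_column = c(1, 2, 3))
with_vector
```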

You can also give mutate() a single value, and it will repeat that value until it has a vector of the correct length.
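The same kind of sketch with a single value, again using a made-up three-row df and a hypothetical column name:

```r
library(dplyr)

# Hypothetical stand-in for df: three rows, one temperature column
df <- tibble(temperature = c(32, 50, 86))

# The single value 1 is recycled to fill all three rows
with_constant <- df %>% mutate(new_column = 1)
with_constant
```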

However, if you try to give mutate() a vector with a length other than 1 or nrow(df), you’ll get an error:
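The sketch below reproduces this situation with a made-up three-row df. The call is wrapped in tryCatch() only so that the script keeps running and the error can be inspected; calling mutate() directly would stop with the error.

```r
library(dplyr)

# Hypothetical stand-in for df: three rows, one temperature column
df <- tibble(temperature = c(32, 50, 86))

# A length-2 vector is neither length 1 nor nrow(df), so mutate() fails;
# tryCatch() captures the error condition instead of halting
result <- tryCatch(
  df %>% mutate(new_column = c(1, 2)),
  error = function(e) e
)

# TRUE: mutate() signalled an error
inherits(result, "error")
```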

mutate() doesn’t know how to turn a length 2 vector into a vector that has a value for each row in the tibble.

As you already know, you usually create new columns by applying functions to existing ones. Say we want to convert our temperatures from Fahrenheit to Celsius.

When you reference a column inside mutate(), you reference the entire vector. So when we pass temperature to fahrenheit_to_celcius(), we pass the entire temperature vector.

Mathematical operations are vectorized, so fahrenheit_to_celcius() takes the temperature vector and returns a vector of the same length.

mutate() then takes this new vector and successfully adds a column to the tibble.
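The steps above can be sketched as follows. The body of fahrenheit_to_celcius() is an assumption based on the standard conversion formula, and both the temperatures in df and the column name temperature_celsius are made up for illustration:

```r
library(dplyr)

# Assumed definition: the standard Fahrenheit-to-Celsius formula.
# Subtraction, multiplication, and division are all vectorized,
# so the function works on a whole vector at once.
fahrenheit_to_celcius <- function(degrees_fahrenheit) {
  (degrees_fahrenheit - 32) * 5 / 9
}

# Hypothetical stand-in for df
df <- tibble(temperature = c(32, 50, 86))

# mutate() passes the entire temperature column to the function and
# receives back a vector with one value per row
converted <- df %>%
  mutate(temperature_celsius = fahrenheit_to_celcius(temperature))
converted
```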

You can probably predict now what will happen if we try to use our scalar function, recommendation_1(), in the same way:

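mutate() passes the entire temperature vector to recommendation_1(), which can’t handle a vector. In R 4.2.0 and later, an if statement whose condition has length greater than one raises an error; in earlier versions, it issues a warning and uses only the first element, so recommendation_1() only processes the first element of temperature.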

However, because of how mutate() behaves when given a single value, the recommendation for the first temperature is copied for every single row.

This isn’t very helpful, because now our tibble gives the same recommendation for every temperature.

4.3 Vectorizing if-else statements

There are several ways to vectorize recommendation_1() so that it gives an accurate recommendation for each temperature in df.

First, there’s a vectorized if-else function called if_else():
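A minimal sketch of if_else() inside mutate(). The 60-degree cutoff, the labels, and the temperatures are all made up for illustration:

```r
library(dplyr)

# Hypothetical stand-in for df
df <- tibble(temperature = c(32, 50, 86))

# if_else() checks the condition for every element of temperature and
# picks the second argument where it is TRUE, the third where it is FALSE
with_label <- df %>%
  mutate(label = if_else(temperature >= 60, "warm", "cold"))
with_label
```

Unlike a plain if-else statement, if_else() returns one result per element of its input.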

However, to rewrite recommendation_1() using if_else(), we’d need to nest if_else() repeatedly, and the function would become difficult to read. Another vector function, case_when(), is a better option.
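The sketch below shows the general shape of a case_when() version. The function name recommendation_sketch(), the temperature cutoffs, and the recommendation text are all hypothetical; they are not taken from recommendation_1().

```r
library(dplyr)

# Hypothetical vectorized recommendation function. case_when() checks the
# conditions in order for each element and uses the first one that is TRUE;
# the final TRUE acts as a catch-all for everything else.
recommendation_sketch <- function(x) {
  case_when(
    x >= 90 ~ "locate air conditioning",
    x >= 60 ~ "go outside",
    x >= 30 ~ "wear a jacket",
    TRUE    ~ "stay inside"
  )
}

# Hypothetical stand-in for df
df <- tibble(temperature = c(95, 70, 40, 10))

# Each row now gets its own recommendation
recommended <- df %>%
  mutate(recommendation = recommendation_sketch(temperature))
recommended
```

Each condition sits on its own line, which keeps the function readable where nested if_else() calls would not be.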

For other helpful vector functions, take a look at the “Vector Functions” section of the dplyr cheat sheet.