# 2 Vectors, lists, and tibbles

In R, vectors are the most common data structure. In this book, we’ll often draw a vector as a sequence of orange cells, where each cell represents one element of the vector. As you’ll see, different kinds of vectors can hold different kinds of elements.

There are two kinds of vectors: *atomic vectors* and *lists*. *Tibbles* are a specific kind of list.

In this chapter, we’ll cover these three data structures, explaining how they differ and showing you how to manipulate each one.

## 2.1 Atomic vectors

Atomic vectors are the “atoms” of R—the simple building blocks upon which all else is built. There are four types of atomic vector that are important for data analysis:

- **integer** vectors (`<int>`) contain integers.
- **double** vectors (`<dbl>`) contain real numbers.
- **character** vectors (`<chr>`) contain strings made with `""`.
- **logical** vectors (`<lgl>`) contain `TRUE` or `FALSE`.

Integer atomic vectors contain only integers, double atomic vectors contain only doubles, and so on. Together, integer and double vectors are known as *numeric* vectors. All vectors can also contain the missing value `NA`.

In R, single numbers, logicals, and strings are just atomic vectors of length 1, so typing a lone string such as `"a"` creates a character vector. Likewise, a lone number such as `1.5` creates a double vector.
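As a quick check of the point above, the following sketch inspects two length-1 vectors. The values `"a"` and `1.5` and the names `x` and `y` are chosen only for illustration.

```r
# A single string is already a character vector of length 1.
x <- "a"
typeof(x)
#> [1] "character"
length(x)
#> [1] 1

# A single number is a double vector of length 1.
y <- 1.5
typeof(y)
#> [1] "double"
length(y)
#> [1] 1
```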

To create atomic vectors with more than one element, use `c()` to combine values.

```
logical <- c(TRUE, FALSE, FALSE)
double <- c(1.5, 2.8, pi)
character <- c("this", "is", "a character", "vector")
```

To create an integer vector by hand, you’ll need to add `L` to the end of each number, as in `c(1L, 2L, 3L)`.

Without the `L`s, R will create a vector of doubles.
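The difference is easy to see side by side. This is a minimal sketch; the names `integer_vec` and `double_vec` and their values are hypothetical.

```r
# With the L suffix, each number is stored as an integer.
integer_vec <- c(1L, 2L, 3L)
typeof(integer_vec)
#> [1] "integer"

# Without the suffix, the same numbers are stored as doubles.
double_vec <- c(1, 2, 3)
typeof(double_vec)
#> [1] "double"
```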

### 2.1.1 Properties

Vectors (both atomic vectors and lists) all have two key properties: *type* and *length*.

You can check the type of any vector with `typeof()`.

Use `length()` to find a vector’s length.
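For instance, applying both functions to the `logical` and `character` vectors created earlier in this section might look like this:

```r
# Vectors as defined earlier in this section.
logical <- c(TRUE, FALSE, FALSE)
character <- c("this", "is", "a character", "vector")

typeof(logical)
#> [1] "logical"
length(logical)
#> [1] 3

typeof(character)
#> [1] "character"
length(character)
#> [1] 4
```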

Atomic vectors can also have names.

```
v_named <- c(guava = 2, pineapple = 4, dragonfruit = 1)
v_named
#>       guava   pineapple dragonfruit
#>           2           4           1
```

You can access a vector’s names with `names()`.
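Applied to `v_named` from above, a short sketch of `names()` looks like this. The result is itself a character vector.

```r
v_named <- c(guava = 2, pineapple = 4, dragonfruit = 1)

# names() returns the names as a character vector.
fruit_names <- names(v_named)
fruit_names
#> [1] "guava"       "pineapple"   "dragonfruit"
```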

### 2.1.2 Subsetting

Suppose `v` is an atomic vector of doubles. We can *subset* `v` to select specific elements, ignoring the others.

The operators `[` and `[[` subset vectors. Use `[` to select multiple elements, and `[[` to select just one. We’ll cover four ways to use `[` and then discuss `[[`.

**Positive integers**

Subset with a vector of positive integers to extract elements by position.

Note that, in R, indices start at 1, not 0, so `v[c(1, 2)]` extracts the first two elements of `v`.

You can also use `:` to create a vector of adjacent integers. For example, `v[1:3]` selects the first three elements of `v`.
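Here is a minimal sketch of subsetting by position. The values of `v` are hypothetical, since the original values are not shown here.

```r
# Hypothetical values for `v`, chosen only for illustration.
v <- c(1, 2.7, 4, 5)

# Extract the first two elements by position.
first_two <- v[c(1, 2)]
first_two
#> [1] 1.0 2.7

# 1:3 is shorthand for c(1L, 2L, 3L).
first_three <- v[1:3]
first_three
#> [1] 1.0 2.7 4.0
```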

**Negative integers**

Subset with a vector of negative integers to exclude elements. For example, `v[-c(1, 3)]` removes the first and third elements of `v`.
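A short sketch of exclusion by position, again with hypothetical values for `v`. Note that subsetting returns a new vector and leaves `v` itself unchanged.

```r
# Hypothetical values for `v`, chosen only for illustration.
v <- c(1, 2.7, 4, 5)

# Drop the first and third elements; keep everything else.
without_first_third <- v[-c(1, 3)]
without_first_third
#> [1] 2.7 5.0

# The original vector still has all four elements.
length(v)
#> [1] 4
```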

**Names**

If a vector has names, you can subset with a character vector.

```
v_named <- c(guava = 2, pineapple = 4, dragonfruit = 1)
v_named[c("guava", "dragonfruit")]
#>       guava dragonfruit
#>           2           1
```

**Logical vectors**

If you supply a vector of `TRUE`s and `FALSE`s, `[` will select the elements that correspond to the `TRUE`s.

For example, if `v` has four elements, `v[c(TRUE, FALSE, TRUE, FALSE)]` extracts just the first and third elements.

You’ll rarely subset by typing out `TRUE`s and `FALSE`s. Instead, you’ll typically create a logical vector with a function or condition.

For example, `v[v > 2]` selects just the elements of `v` greater than 2.

The condition `v > 2` results in a logical vector the same length as `v`.

`[` then uses this logical vector to subset `v`, resulting in just the elements of `v` greater than 2.
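Both forms of logical subsetting can be sketched as follows. The values of `v` are hypothetical.

```r
# Hypothetical values for `v`, chosen only for illustration.
v <- c(1, 2.7, 4, 5)

# Typing out the logical vector by hand.
first_third <- v[c(TRUE, FALSE, TRUE, FALSE)]
first_third
#> [1] 1 4

# The condition produces one TRUE or FALSE per element of v.
is_large <- v > 2
is_large
#> [1] FALSE  TRUE  TRUE  TRUE

# Using the condition directly inside `[`.
large <- v[v > 2]
large
#> [1] 2.7 4.0 5.0
```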

Suppose `v_missing` has `NA`s.

We can pass `!is.na(v_missing)` into `[` to extract just the non-`NA` elements.
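A small sketch of this pattern, with hypothetical values for `v_missing`:

```r
# Hypothetical values, chosen only for illustration.
v_missing <- c(1.1, NA, 5, 6, NA)

# is.na() flags the missing elements; `!` flips the flags.
is.na(v_missing)
#> [1] FALSE  TRUE FALSE FALSE  TRUE

# Keep only the elements that are not NA.
v_complete <- v_missing[!is.na(v_missing)]
v_complete
#> [1] 1.1 5.0 6.0
```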

**Select single values with [[**

Unlike `[`, `[[` can only extract single elements.

You’ll get an error if you try to use `[[` to select more than one element.
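The sketch below shows both behaviors on a hypothetical `v`. The `tryCatch()` wrapper is only there so that the error does not stop the script; the name `result` is an illustration.

```r
# Hypothetical values for `v`, chosen only for illustration.
v <- c(1, 2.7, 4, 5)

# Extract a single element.
second <- v[[2]]
second
#> [1] 2.7

# Asking `[[` for two elements fails; catch the error to inspect it.
result <- tryCatch(
  v[[c(1, 2)]],
  error = function(e) "error"
)
result
#> [1] "error"
```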

Use `[[` instead of `[` if you want to make it clear that your code only selects one item. As you’ll see in the Lists section, the distinction between `[[` and `[` is more important for lists than for atomic vectors.

### 2.1.3 Applying functions

Vectors are central to programming in R, and so many R functions are designed to work with vectors of any length.

You already saw how to call `typeof()` to return the type of a vector.

`sum()` sums a vector’s elements. You can use `sum()` with numeric (i.e., double and integer) vectors as well as with logical vectors. When applied to a logical vector, `sum()` returns the number of `TRUE`s.

`mean()` works similarly. When applied to a logical vector, it returns the proportion of `TRUE`s.
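As an illustration, here are `sum()` and `mean()` applied to a numeric vector and to a logical vector derived from it. The values of `v` are hypothetical.

```r
# Hypothetical values for `v`, chosen only for illustration.
v <- c(1, 2.7, 4, 5)

# On a numeric vector: the total and the average.
total <- sum(v)
average <- mean(v)

# On a logical vector: TRUE counts as 1 and FALSE as 0,
# so sum() counts the TRUEs and mean() gives their proportion.
n_large <- sum(v > 2)
n_large
#> [1] 3
prop_large <- mean(v > 2)
prop_large
#> [1] 0.75
```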

The Base R Cheat Sheet has some other basic helpful functions, particularly under the *Vector Functions* and *Maths Functions* sections.

### 2.1.4 Augmented vectors

Augmented vectors are atomic vectors with additional metadata. There are four important augmented vectors:

- **factors** `<fct>`, which are used to represent categorical variables that can take one of a fixed and known set of possible values (called the levels).
- **ordered factors** `<ord>`, which are like factors but where the levels have an intrinsic ordering (i.e., it’s reasonable to say that one level is “less than” or “greater than” another level).
- **dates** `<date>`, which record a date.
- **date-times** `<dttm>`, which are also known as POSIXct and record a date and a time.

For now, you just need to recognize these when you encounter them. You’ll learn how to create each type of augmented vector later in the course.

## 2.2 Lists

Unlike atomic vectors, which can only contain a single type, lists can contain any collection of R objects.
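To make the contrast concrete, here is a minimal sketch of a list that mixes types. The name `my_list` and its contents are hypothetical. It also shows why the distinction between `[` and `[[` matters more for lists: `[` returns a smaller list, while `[[` returns the element itself.

```r
# A list can hold elements of different types, including other lists.
my_list <- list(1.5, "a", c(TRUE, FALSE), list(1L, 2L))

typeof(my_list)
#> [1] "list"
length(my_list)
#> [1] 4

# `[` keeps the list wrapper; `[[` pulls the element out.
typeof(my_list[2])
#> [1] "list"
typeof(my_list[[2]])
#> [1] "character"
```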

### 2.2.1 Basics

The following reading will introduce you to lists.

- Recursive vectors (lists)[r4ds-20.5]

### 2.2.2 Flattening

You can flatten a list into an atomic vector with `unlist()`.

`unlist()` returns an atomic vector even if the original list contains other lists or vectors.
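A short sketch of flattening, using a hypothetical nested list of numbers:

```r
# A list containing a number, another list, and a vector.
nested <- list(1, list(2, 3), c(4, 5))

# unlist() flattens every level into one atomic vector.
flat <- unlist(nested)
flat
#> [1] 1 2 3 4 5
typeof(flat)
#> [1] "double"
```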

## 2.3 Tibbles

Tibbles are actually lists. Every tibble is a list of vectors, and these vectors form the tibble’s columns.

Take the tibble `mpg`.

```
mpg
#> # A tibble: 234 x 11
#>   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class
#>   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>
#> 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
#> 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
#> 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
#> 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
#> 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
#> 6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…
#> # … with 228 more rows
```

Each variable in `mpg` (`manufacturer`, `model`, `displ`, etc.) is a vector. `manufacturer` is a character vector, `displ` is a double vector, and so on.
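You can confirm this structure yourself. The sketch below assumes the ggplot2 package is installed, since that is where the `mpg` dataset lives.

```r
library(ggplot2)  # provides the `mpg` dataset

# A tibble is a list under the hood, with one element per column.
typeof(mpg)
#> [1] "list"
length(mpg)
#> [1] 11

# Each column is an ordinary atomic vector.
typeof(mpg[["manufacturer"]])
#> [1] "character"
typeof(mpg[["displ"]])
#> [1] "double"
```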

### 2.3.1 Creation

There are two ways to create tibbles by hand. First, you can use `tibble()`.

```
my_tibble <-
  tibble(
    x = c(1, 9, 5),
    y = c(TRUE, FALSE, FALSE),
    z = c("apple", "pear", "banana")
  )
my_tibble
#> # A tibble: 3 x 3
#>       x y     z
#>   <dbl> <lgl> <chr>
#> 1     1 TRUE  apple
#> 2     9 FALSE pear
#> 3     5 FALSE banana
```

`tibble()` takes individual vectors and turns them into a tibble.

Second, you can use `tribble()`.

```
my_tibble <-
  tribble(
    ~x, ~y,    ~z,
    1,  TRUE,  "apple",
    9,  FALSE, "pear",
    5,  FALSE, "banana"
  )
my_tibble
#> # A tibble: 3 x 3
#>       x y     z
#>   <dbl> <lgl> <chr>
#> 1     1 TRUE  apple
#> 2     9 FALSE pear
#> 3     5 FALSE banana
```

Typically, it will be obvious whether it’s better to use `tibble()` or `tribble()`. One representation will either be much easier to type or much clearer than the other.

### 2.3.2 Variables

There are several ways to extract variables out of tibbles. Tibbles are lists, so `[[` and `$` still work.

Use `pull()`, the dplyr equivalent, when you want to use a pipe.

Note that `pull()`, like `[[` and `$`, will return just the vector of values for a given column, while `select()` returns a tibble.
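The following sketch compares the four approaches on the small tibble created in the previous section. It assumes the tibble and dplyr packages are installed; the names `x_pulled` and `x_selected` are only for illustration.

```r
library(tibble)
library(dplyr)

my_tibble <-
  tibble(
    x = c(1, 9, 5),
    y = c(TRUE, FALSE, FALSE),
    z = c("apple", "pear", "banana")
  )

# `[[` and `$` both return the column as a plain vector.
my_tibble[["x"]]
#> [1] 1 9 5
my_tibble$x
#> [1] 1 9 5

# pull() does the same job inside a pipe.
x_pulled <- my_tibble %>% pull(x)

# select() keeps the result as a one-column tibble.
x_selected <- my_tibble %>% select(x)
is_tibble(x_selected)
#> [1] TRUE
```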

### 2.3.3 Dimensions

Printing a tibble tells you the column names and overall dimensions.

```
mpg
#> # A tibble: 234 x 11
#>   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class
#>   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>
#> 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
#> 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
#> 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
#> 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
#> 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
#> 6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…
#> # … with 228 more rows
```

To access the dimensions directly, you have three options: `dim()`, `nrow()`, and `ncol()`.
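One way to read the dimensions in code uses the base R functions `dim()`, `nrow()`, and `ncol()`. This sketch assumes those are the three options meant here, and that ggplot2 is installed to provide `mpg`.

```r
library(ggplot2)  # provides the `mpg` dataset

# Rows and columns together.
dim(mpg)
#> [1] 234  11

# Rows only.
nrow(mpg)
#> [1] 234

# Columns only.
ncol(mpg)
#> [1] 11
```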

To get the variable names, use `names()`: