Data @ Reed

Data types and accessing data points

Vectors

A vector is just a sequence of same-type objects (character strings, integers, logicals, etc.) stored in R. For example, the letters vector built into R contains the lowercase letters:

c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", 
"t", "u", "v", "w", "x", "y", "z")

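Because every element of a vector must share one type, R quietly converts mixed input to a single common type. A minimal sketch of that behavior (the variable name mixed is only for illustration):

```r
# letters is a built-in character vector of length 26
typeof(letters)   # "character"
length(letters)   # 26

# Mixing types in c() coerces every element to one common type,
# here character, because "a" cannot be represented as a number.
mixed <- c(1, "a", TRUE)
typeof(mixed)     # "character"
mixed             # "1" "a" "TRUE"
```
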
Vectors can also have names attached to each element, like the following vector of animals where the name of each element is the first letter of the animal:

animals <- c("M" = "Mountain beaver", "C" = "Cow", "G" = "Grey wolf", "H" = "Horse")
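To inspect or change the names on a vector, base R provides names() and unname(). The sketch below reuses the animals vector; the counts vector and its names are made up for illustration.

```r
animals <- c("M" = "Mountain beaver", "C" = "Cow", "G" = "Grey wolf", "H" = "Horse")

names(animals)     # "M" "C" "G" "H"
unname(animals)    # the same values with the names stripped

# Names can also be assigned after the vector is created.
# `counts` is a hypothetical example vector, not part of the original text.
counts <- c(2, 650, 40)
names(counts) <- c("beaver", "cow", "wolf")
counts["cow"]      # the element named "cow", with value 650
```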

You can access a subset of a vector using numerical or named indices in brackets. For example, to access the Horse element of the animals vector, either of these works:

animals[4]
animals["H"]

To get both the cow and the grey wolf, any of these would work:

animals[c(2,3)]
animals[2:3]
animals[c("C", "G")]
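One way to confirm that the three forms really are interchangeable is to compare their results with identical(). This is only a quick check; the by_* variable names are made up here.

```r
animals <- c("M" = "Mountain beaver", "C" = "Cow", "G" = "Grey wolf", "H" = "Horse")

by_position <- animals[c(2, 3)]
by_range    <- animals[2:3]
by_name     <- animals[c("C", "G")]

identical(by_position, by_range)   # TRUE
identical(by_position, by_name)    # TRUE

# The result follows the order of the indices you give
animals[c("G", "C")]               # Grey wolf first, then Cow
```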

Important: indices in R start at 1 (some other languages, such as Python, start at 0), so to get the first item in a vector you have to use vector[1] instead of vector[0].
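A point worth knowing when coming from a 0-indexed language: indexing with 0 in R does not raise an error. It returns an empty vector, which can hide the mistake. A short illustration with the animals vector:

```r
animals <- c("M" = "Mountain beaver", "C" = "Cow", "G" = "Grey wolf", "H" = "Horse")

animals[1]                 # first element: "Mountain beaver"
animals[0]                 # no error, just an empty (zero-length) vector
length(animals[0])         # 0
animals[length(animals)]   # last element: "Horse"
```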

Lists

A list is the most flexible type of data object that R has. The objects inside don’t have to be of the same type, and you can nest lists within each other.

z <- list(list("a", 2), c(3, 4))

In this list z, the first element is a list containing the letter “a” and the number 2, and the second element is the numeric vector c(3,4). Much like vectors, a list can have named elements.

There are two main methods for extracting elements from a list: single- and double-bracket. A single-bracket subset will always return a list. For example, running z[2] will return a list containing one object, the vector c(3, 4). A double-bracket subset will go one level deeper into the structure, so z[[2]] would return just the vector c(3, 4).
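The difference between the two bracket styles is easiest to see by checking the class of each result. The sketch below uses the list z from above; the named list at the end uses made-up names purely for illustration.

```r
z <- list(list("a", 2), c(3, 4))

class(z[2])       # "list": a list of length 1 that wraps the vector
class(z[[2]])     # "numeric": the vector c(3, 4) itself
z[[2]][1]         # 3, the first element of that vector
z[[1]][[1]]       # "a", reaching into the nested list

# A named list (hypothetical names) can be indexed by name as well
named <- list(letter = "a", numbers = c(3, 4))
named[["numbers"]]   # the vector c(3, 4), extracted by name
```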

Further reading on list structure.

Data frames

A data frame can be thought of as a 2-dimensional table of data, where each column has its own consistent data type (character, numeric, logical) all the way down. Technically, a data frame is just a list of vectors that are all the same length. To access the third column (Petal.Length) of the built-in dataset iris as a vector, any of these methods will work:

iris[[3]]
iris[["Petal.Length"]]
iris[ ,3]
iris$Petal.Length

This last method will probably be most useful, because it’s shorter to type.
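Since a data frame is a list of columns, the list rules for brackets carry over. The following check, which assumes iris is the ordinary base R data frame (not a tibble), shows that the four forms return the same vector, while a single bracket without a comma keeps the data frame wrapper:

```r
# A data frame is a list under the hood
is.list(iris)                                   # TRUE

# All four forms give the same numeric vector
identical(iris[[3]], iris[["Petal.Length"]])    # TRUE
identical(iris[[3]], iris[, 3])                 # TRUE
identical(iris[[3]], iris$Petal.Length)         # TRUE

# Single brackets without a comma behave as they do on a list:
# the result is still a data frame, with one column
class(iris[3])                                  # "data.frame"
```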

To access a specific cell of a data frame, the bracket notation data[i,j] is helpful. This will access the cell in the i-th row and the j-th column. If you leave i blank, it will return the entire j-th column (like the third example above); vice versa, if you leave j blank it will return the entire i-th row.
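The row and column notation can be tried directly on iris. This is a small sketch; the choice of rows and columns is arbitrary.

```r
iris[1, 3]           # 1.4: row 1, column 3 (Petal.Length)
iris[1, "Species"]   # columns can be given by name: setosa (a factor)
iris[1, ]            # entire first row, returned as a one-row data frame
dim(iris[1, ])       # 1 5
head(iris[, 3])      # first few values of the third column, as a vector

# Both positions accept vectors, so several rows and columns can be taken at once
iris[1:3, c("Sepal.Length", "Species")]
```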

Further reading on data frame structure.