Data types and accessing data points
Vectors
A vector is a sequence of objects that all have the same type (character strings, integers, logicals, etc.). For example, the letters vector built into R contains the lowercase letters of the alphabet:
c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s",
"t", "u", "v", "w", "x", "y", "z")
Vectors can also have a name attached to each element, as in the following vector of animals, where the name of each element is the first letter of the animal:
animals <- c("M" = "Mountain beaver", "C" = "Cow", "G" = "Grey wolf", "H" = "Horse")
You can access a subset of a vector using numerical or named indices in brackets. For example, to access the Horse element of the animals vector, either of these works:
animals[4]
animals["H"]
To get both the cow and the grey wolf, any of these would work:
animals[c(2,3)]
animals[2:3]
animals[c("C", "G")]
Important: indices in R start at 1 (some other languages, like Python, start at 0), so to get the first item in a vector you have to use vector[1] instead of vector[0].
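The indexing rules above can be checked with a minimal sketch that reuses the animals vector from this section. The variable names by_position, by_name and zero_index are only illustrative.

```r
# Rebuild the named vector from the text above.
animals <- c("M" = "Mountain beaver", "C" = "Cow", "G" = "Grey wolf", "H" = "Horse")

# Positional and named indexing return the same element, name included.
by_position <- animals[4]
by_name <- animals["H"]
identical(by_position, by_name)

# Index 0 does not raise an error; it silently selects nothing.
zero_index <- animals[0]
length(zero_index)

# The first element lives at index 1.
animals[1]
```

Because animals[0] returns an empty vector rather than failing, an off-by-one mistake carried over from a zero-indexed language can go unnoticed, so it is worth checking lengths when results look odd.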
Lists
A list is the most flexible type of data object that R has. The objects inside don’t have to be of the same type, and you can nest lists within each other.
z <- list(list("a", 2), c(3, 4))
In this list z, the first element is a list containing the letter “a” and the number 2, and the second element is the numeric vector c(3,4). Much like vectors, a list can have named elements.
There are two main methods for extracting elements from a list: single-bracket and double-bracket subsetting. A single-bracket subset always returns a list. For example, running z[2] returns a list containing one object, the vector c(3,4). A double-bracket subset goes one level deeper into the structure, so z[[2]] returns just the vector c(3,4).
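The difference between the two bracket styles is easiest to see by inspecting what each one returns. The following is a small sketch using the list z from above; the names single, double and named, and the element names first and second, are hypothetical and chosen only for illustration.

```r
z <- list(list("a", 2), c(3, 4))

single <- z[2]    # still a list, of length 1, wrapping the vector
double <- z[[2]]  # the numeric vector itself

class(single)
class(double)

# Double brackets can be chained to reach into the nested list.
z[[1]][[1]]

# With named elements, [[ ]] and $ both extract by name.
named <- list(first = list("a", 2), second = c(3, 4))
named[["second"]]
named$second
```

A practical consequence is that arithmetic such as double * 2 works, whereas single * 2 fails, because single is a list rather than a numeric vector.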
Data frames
A data frame can be thought of as a two-dimensional table of data, where each column has its own consistent data type (character, numeric, logical) all the way down. Technically, a data frame is just a list of vectors that are all the same length. To access the third column (Petal.Length) of the built-in dataset iris as a vector, any of these methods will work:
iris[[3]]
iris[["Petal.Length"]]
iris[ ,3]
iris$Petal.Length
The last method is usually the most convenient, because it is the shortest to type.
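As a quick check that the four forms above are interchangeable, the sketch below compares their results on iris. The variable names are illustrative only.

```r
# iris ships with R in the datasets package, so no setup is needed.
by_position <- iris[[3]]
by_name     <- iris[["Petal.Length"]]
by_comma    <- iris[, 3]
by_dollar   <- iris$Petal.Length

# All four return the same numeric vector.
identical(by_position, by_name)
identical(by_name, by_comma)
identical(by_comma, by_dollar)

# Because a data frame is a list of columns, single brackets behave as
# they do for lists: the result is still a data frame with one column.
is.list(iris)
class(iris[3])
```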
To access a specific cell of a data frame, use the bracket notation data[i,j]. This accesses the cell in the i-th row and the j-th column. If you leave i blank, it returns the entire j-th column (like the third example above); conversely, if you leave j blank, it returns the entire i-th row.
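The row and column forms of the bracket notation can be tried on iris. This is a minimal sketch; the names cell, column and row are illustrative.

```r
# A single cell: row 1, column 3 (Petal.Length of the first flower).
cell <- iris[1, 3]

# Row index left blank: the whole third column, returned as a vector.
column <- iris[, 3]

# Column index left blank: the whole first row.
row <- iris[1, ]

cell
length(column)
class(row)
```

Note that a row comes back as a one-row data frame rather than a vector, since its columns can have different types.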