Data @ Reed

Modifying rows and columns

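The dplyr package provides convenient functions for modifying the rows and columns of a data frame. Base R bracket notation can handle many of the same tasks, so both approaches are shown below.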

Rows

To add rows, you can use bind_rows() to attach data to the bottom of your data set. Columns are matched by name, and any column missing from one of the data frames is filled with NA. This is especially useful when you have a list of many data frames, for example after reading in multiple files of the same format that are separated by county or age group.

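library(dplyr)
# Three new flowers, using the same column names as iris
new_data <- data.frame(Sepal.Length = c(3, 4, 5),
                       Sepal.Width = c(5, 4, 3),
                       Petal.Length = c(1, 2, 3),
                       Petal.Width = c(3, 2, 1),
                       # Same factor levels as iris, so Species stays a factor after binding
                       Species = factor(c("setosa", "versicolor", "virginica"),
                                        levels = levels(iris$Species)))
# Stack new_data under the 150 rows of iris (153 rows in total)
bind_rows(iris, new_data)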
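Binding a whole list of data frames works the same way. In this minimal sketch the list is simulated with base R's split(), standing in for a list you might build by reading several files (for example with lapply() over a vector of file names, which is an assumption and not shown here). The .id argument records which list element each row came from; the column name "group" is our own choice.

```r
library(dplyr)

# Stand-in for data read from several files: one data frame per species
iris_pieces <- split(iris, iris$Species)

# bind_rows() accepts a list of data frames and stacks them in order;
# .id adds a column holding each list element's name
combined <- bind_rows(iris_pieces, .id = "group")

nrow(combined)          # 150, the same as iris
unique(combined$group)  # "setosa" "versicolor" "virginica"
```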

To select a subset of rows by position, you can use bracket notation or the slice() function.

iris[20:30, ]       # base R: rows 20 through 30
slice(iris, 20:30)  # dplyr: the same rows
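A quick check that the two approaches above select the same data, as a sketch. The column values match; the printed row labels can differ, since bracket indexing keeps the original labels (20 to 30). Negative indices drop rows instead of keeping them, in both slice() and brackets.

```r
library(dplyr)

by_bracket <- iris[20:30, ]
by_slice <- slice(iris, 20:30)

# Same 11 rows with the same column values (row names aside)
nrow(by_slice)                                   # 11
identical(as.list(by_bracket), as.list(by_slice))  # TRUE
rownames(by_bracket)[1]                          # "20"

# Negative indices drop the listed rows
nrow(slice(iris, -(1:5)))  # 145
```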

To select a subset of rows that meet a condition, use the filter() function. For example, we might want to look only at the biggest versicolor flowers, say those with a petal length greater than 4.5. Conditions separated by commas must all be true for a row to be kept.

filter(iris, Species == "versicolor", Petal.Length > 4.5)
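A short sketch of how conditions combine in filter(): a comma acts like & (AND), | means OR, and %in% matches any of several values. The choice of setosa and virginica is only an illustration.

```r
library(dplyr)

# Comma-separated conditions are combined with AND
big_versicolor <- filter(iris, Species == "versicolor", Petal.Length > 4.5)
same_rows <- filter(iris, Species == "versicolor" & Petal.Length > 4.5)
identical(big_versicolor, same_rows)  # TRUE

# | keeps rows where either condition holds; %in% is a shorthand for this case
either <- filter(iris, Species == "setosa" | Species == "virginica")
two_species <- filter(iris, Species %in% c("setosa", "virginica"))
nrow(two_species)  # 100 (50 flowers of each species)
```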

Columns

To pick out columns by position or name, you can use bracket notation or the select() function. Say we only want the species and petal measurements of iris, because we don’t care about sepal size. Any of these commands returns those three columns:

iris[, 3:5]                                       # base R, by position
select(iris, 3:5)                                 # dplyr, by position
select(iris, Petal.Length, Petal.Width, Species)  # dplyr, by name
select(iris, -c(1, 2))                            # dplyr, dropping columns 1 and 2
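Two further details, sketched below. select() also accepts helpers such as starts_with() (a tidyselect helper that dplyr re-exports), which picks columns by name prefix. And when you ask for a single column, bracket notation returns a plain vector by default, while select() always returns a data frame.

```r
library(dplyr)

# starts_with() matches every column whose name begins with "Petal"
petals <- select(iris, starts_with("Petal"), Species)
names(petals)  # "Petal.Length" "Petal.Width" "Species"

# One column: brackets drop to a vector unless drop = FALSE is given
class(iris[, 3])                # "numeric"
class(iris[, 3, drop = FALSE])  # "data.frame"
class(select(iris, 3))          # "data.frame"
```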

To add a column from data you already have as a vector, you can use dollar-sign ($) notation. If we have a vector of planting dates called x, with one value per flower (150 values for iris), we can assign it to a new column called plant_date. That column didn’t exist before this step, but afterward it shows up as part of the data set.

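x <- seq(as.Date("2023-04-01"), by = "day", length.out = nrow(iris))  # hypothetical dates, one per row
iris$plant_date <- x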
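A sketch of two related points, working on a copy called flowers so the built-in iris stays unchanged. The dates are hypothetical, made up only for the example. Double brackets do the same job as $ but take the column name as a string, which helps when the name is stored in a variable. A vector whose length doesn’t fit the number of rows is rejected with an error.

```r
# Hypothetical planting dates, one per row of iris (150 rows)
x <- seq(as.Date("2023-04-01"), by = "day", length.out = nrow(iris))

flowers <- iris                # copy, so iris itself is not modified
flowers$plant_date <- x        # dollar-sign notation

col_name <- "plant_date_2"     # hypothetical column name held in a variable
flowers[[col_name]] <- x       # double brackets accept the name as a string

# 7 values cannot fill 150 rows, so R raises an error and adds nothing
result <- tryCatch({
  flowers$bad <- 1:7
  "added"
}, error = function(e) "error")
result  # "error"
```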

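Sometimes you’ll want to create new data from your existing columns. For example, if we’re interested in the ratio of petal length to petal width for each flower, we need to divide Petal.Length by Petal.Width row by row across the whole data set. The mutate() function adds the result as a new column, which we call Petal.Ratio. Like the other dplyr functions above, mutate() returns a new data frame rather than changing iris, so assign the result (for example, iris <- mutate(...)) if you want to keep it.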

mutate(iris, Petal.Ratio = Petal.Length / Petal.Width)
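A sketch showing that mutate() can create several columns in one call, with later columns allowed to use ones defined earlier in the same call. The Long.Petal column and its threshold of 4 are hypothetical, chosen only to illustrate this.

```r
library(dplyr)

iris_ratio <- mutate(iris,
                     Petal.Ratio = Petal.Length / Petal.Width,
                     # hypothetical flag built from the column created just above
                     Long.Petal = Petal.Ratio > 4)

# The result is a new data frame; iris itself has no Petal.Ratio column
"Petal.Ratio" %in% names(iris)        # FALSE
"Petal.Ratio" %in% names(iris_ratio)  # TRUE
```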