
Restructuring with `pivot_wider()` and `pivot_longer()`

One useful data skill is being able to move from untidy data to tidy data and back again. The functions that do this, pivot_wider() and pivot_longer(), come from the tidyr package, which is part of the tidyverse along with many other helpful packages. To access these tools, install and load tidyverse:

install.packages("tidyverse")  # only needed once per R installation
library(tidyverse)             # load the packages in each new session

With the tools loaded, you can restructure the summary dataset from above. Begin with the first, tidy version:

penguins_sum

## # A tibble: 9 × 3
## # Groups:   island [3]
##   island     year mean_body_mass_g
##   <fct>     <int>            <dbl>
## 1 Biscoe     2007            4741.
## 2 Biscoe     2008            4628.
## 3 Biscoe     2009            4793.
## 4 Dream      2007            3684.
## 5 Dream      2008            3779.
## 6 Dream      2009            3691.
## 7 Torgersen  2007            3763.
## 8 Torgersen  2008            3856.
## 9 Torgersen  2009            3489.

To make the untidy version, “pivot” this data from long to wide format using the pivot_wider() function from tidyr:

penguins_wide <- penguins_sum %>%
  pivot_wider(id_cols = island,
              names_from = year,
              values_from = mean_body_mass_g)

penguins_wide

## # A tibble: 3 × 4
## # Groups:   island [3]
##   island    `2007` `2008` `2009`
##   <fct>      <dbl>  <dbl>  <dbl>
## 1 Biscoe     4741.  4628.  4793.
## 2 Dream      3684.  3779.  3691.
## 3 Torgersen  3763.  3856.  3489.

Looking more closely at pivot_wider()

id_cols (read: ID columns) are the variables that, together, uniquely identify each row in the new, wider data. A column used in names_from or values_from should not be listed here.
names_from is the variable in the old data whose values become the column names in the new data.
values_from is the variable in the old data whose values fill the cells of the new data.
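To see the three arguments at work in isolation, here is a minimal sketch on a small hand-built table. The name penguins_small and its contents are assumptions for illustration: the values are typed in from the rounded output printed earlier, so they are not the exact means stored in penguins_sum.

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for penguins_sum, typed in by hand from the
# rounded values printed above (two islands, two years).
penguins_small <- tibble(
  island = c("Biscoe", "Biscoe", "Dream", "Dream"),
  year = c(2007L, 2008L, 2007L, 2008L),
  mean_body_mass_g = c(4741, 4628, 3684, 3779)
)

wide_small <- pivot_wider(
  penguins_small,
  id_cols = island,               # one row per island in the result
  names_from = year,              # the distinct years become column names
  values_from = mean_body_mass_g  # the means fill the new cells
)

wide_small
```

The result should have one row per island and one column per year. Because column names are always text, the years appear as the names `2007` and `2008` rather than as integer values.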

Given this untidy table, you can tidy the data by pivoting from “wide” to “long” using pivot_longer().

penguins_wide %>%
  pivot_longer(cols = c("2007", "2008", "2009"), 
               names_to = "year",
               values_to = "mean_body_mass_g")

## # A tibble: 9 × 3
## # Groups:   island [3]
##   island    year  mean_body_mass_g
##   <fct>     <chr>            <dbl>
## 1 Biscoe    2007             4741.
## 2 Biscoe    2008             4628.
## 3 Biscoe    2009             4793.
## 4 Dream     2007             3684.
## 5 Dream     2008             3779.
## 6 Dream     2009             3691.
## 7 Torgersen 2007             3763.
## 8 Torgersen 2008             3856.
## 9 Torgersen 2009             3489.

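As you might have noticed, pivot_longer() and pivot_wider() are essentially inverse operations. Pivoting a widened dataset back to a longer format recovers the original values, and vice versa. Column types can change along the way, however: year started as an integer column and came back as a character column, because it passed through the column names.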
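The sketch below shows one way to get year back as an integer during the round trip, using the names_transform argument of pivot_longer(). The table long_small is a hypothetical stand-in built by hand from the rounded values printed earlier, and cols = !island is simply a shorter way of selecting every column except island.

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for penguins_sum (rounded values, two islands, two years).
long_small <- tibble(
  island = c("Biscoe", "Biscoe", "Dream", "Dream"),
  year = c(2007L, 2008L, 2007L, 2008L),
  mean_body_mass_g = c(4741, 4628, 3684, 3779)
)

# Long to wide: one column per year.
wide_small <- pivot_wider(
  long_small,
  names_from = year,
  values_from = mean_body_mass_g
)

# Wide to long again, converting the former column names back to integers.
round_trip <- pivot_longer(
  wide_small,
  cols = !island,                             # every column except island
  names_to = "year",
  names_transform = list(year = as.integer),  # "2007" becomes 2007L
  values_to = "mean_body_mass_g"
)

# Compare the round-tripped table with the starting table.
all.equal(as.data.frame(round_trip), as.data.frame(long_small))
```

Without names_transform, the comparison would flag year as a character column in one table and an integer column in the other.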

If you are interested in learning more about tidy data and pivoting, see the Tidy Data chapter in R for Data Science.
