pivot_wider() and pivot_longer()
One useful data skill is being able to move from untidy data to tidy data and back again. The functions that allow you to do this, pivot_wider() and pivot_longer(), are part of the tidyr package. This package is included in tidyverse, along with many other helpful packages. To access these tools, install and load tidyverse:
install.packages("tidyverse")
library(tidyverse)
With the tools loaded, you can restructure the summary dataset from above. Begin with the first, tidy version:
penguins_sum
## # A tibble: 9 × 3
## # Groups: island [3]
## island year mean_body_mass_g
## <fct> <int> <dbl>
## 1 Biscoe 2007 4741.
## 2 Biscoe 2008 4628.
## 3 Biscoe 2009 4793.
## 4 Dream 2007 3684.
## 5 Dream 2008 3779.
## 6 Dream 2009 3691.
## 7 Torgersen 2007 3763.
## 8 Torgersen 2008 3856.
## 9 Torgersen 2009 3489.
To make the untidy version, “pivot” this data from long to wide format using the pivot_wider() function from tidyr:
penguins_wide <- penguins_sum %>%
  pivot_wider(id_cols = island,
              names_from = year,
              values_from = mean_body_mass_g)
penguins_wide
## # A tibble: 3 × 4
## # Groups: island [3]
## island `2007` `2008` `2009`
## <fct> <dbl> <dbl> <dbl>
## 1 Biscoe 4741. 4628. 4793.
## 2 Dream 3684. 3779. 3691.
## 3 Torgersen 3763. 3856. 3489.
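Because penguins_sum was built earlier, the following is a minimal, self-contained sketch of the same reshaping. The table penguins_sum_demo is a hypothetical stand-in typed in by hand from the rounded values printed above, with island stored as plain text and no grouping. It also leaves out id_cols, which by default covers every column not used in names_from or values_from:

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for penguins_sum: rounded values copied from the
# printed output, island as character, no grouping.
penguins_sum_demo <- tibble(
  island = rep(c("Biscoe", "Dream", "Torgersen"), each = 3),
  year = rep(2007:2009, times = 3),
  mean_body_mass_g = c(4741, 4628, 4793, 3684, 3779, 3691, 3763, 3856, 3489)
)

# id_cols is omitted, so island (the only column not named in names_from
# or values_from) identifies each row of the wide result.
penguins_wide_demo <- penguins_sum_demo %>%
  pivot_wider(names_from = year,
              values_from = mean_body_mass_g)

penguins_wide_demo
```

The result should have one row per island and one column per year, matching the shape of penguins_wide.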
Looking more closely at pivot_wider():
id_cols (read: ID columns) are the variables that, together, identify each row in the new, wider data. A column used in names_from or values_from should not be listed here.
names_from determines which variable from the old data supplies the names of the columns in the new data.
values_from is the variable in the old data whose values fill the cells in the new data.
Given this untidy table, you can tidy the data by pivoting from “wide” to “long” using pivot_longer().
penguins_wide %>%
  pivot_longer(cols = c("2007", "2008", "2009"),
               names_to = "year",
               values_to = "mean_body_mass_g")
## # A tibble: 9 × 3
## # Groups: island [3]
## island year mean_body_mass_g
## <fct> <chr> <dbl>
## 1 Biscoe 2007 4741.
## 2 Biscoe 2008 4628.
## 3 Biscoe 2009 4793.
## 4 Dream 2007 3684.
## 5 Dream 2008 3779.
## 6 Dream 2009 3691.
## 7 Torgersen 2007 3763.
## 8 Torgersen 2008 3856.
## 9 Torgersen 2009 3489.
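Listing every year column by name gets tedious as a table grows. As a hedged alternative, cols also accepts tidyselect expressions, so the columns to pivot can be described as "everything except island". The sketch below assumes a hand-built table, penguins_wide_demo, which is a hypothetical stand-in for penguins_wide using the rounded values printed above:

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for penguins_wide (rounded values, no grouping).
penguins_wide_demo <- tibble(
  island = c("Biscoe", "Dream", "Torgersen"),
  `2007` = c(4741, 3684, 3763),
  `2008` = c(4628, 3779, 3856),
  `2009` = c(4793, 3691, 3489)
)

# !island selects every column except island, so the year columns do not
# have to be named one by one.
penguins_long_demo <- penguins_wide_demo %>%
  pivot_longer(cols = !island,
               names_to = "year",
               values_to = "mean_body_mass_g")

penguins_long_demo
```

This form keeps working if another year column is added later, at the cost of silently pivoting any new column that is not a year.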
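As you might have noticed, pivot_longer() and pivot_wider() are inverse operations. Pivoting a widened dataset to a longer format gives you back the original data, and vice versa. The one difference here is the type of year, which returns as character (<chr>) rather than integer (<int>) because it was rebuilt from column names.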
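If the integer year is needed after the round trip, one option is the names_transform argument of pivot_longer(). The sketch below is an illustration on a hand-built table, not the original penguins_sum: penguins_demo is a hypothetical stand-in with rounded values, island stored as text, and no grouping.

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for the tidy summary table.
penguins_demo <- tibble(
  island = rep(c("Biscoe", "Dream", "Torgersen"), each = 3),
  year = rep(2007:2009, times = 3),
  mean_body_mass_g = c(4741, 4628, 4793, 3684, 3779, 3691, 3763, 3856, 3489)
)

# Widen, then lengthen again; names_transform converts the year labels
# (which are column names, hence text) back to integers.
round_trip <- penguins_demo %>%
  pivot_wider(names_from = year,
              values_from = mean_body_mass_g) %>%
  pivot_longer(cols = !island,
               names_to = "year",
               values_to = "mean_body_mass_g",
               names_transform = list(year = as.integer))

# Compare the round-tripped table with the starting table.
all.equal(as.data.frame(round_trip), as.data.frame(penguins_demo))
```

Converting inside pivot_longer() keeps the type fix next to the reshaping step; converting afterwards with as.integer() on the year column would work as well.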
If you are interested in learning more about tidy data and pivoting, see the Tidy Data chapter in R for Data Science.