Tidy data principles, reshaping data, and tidyr

Packages in the tidyverse, like ggplot2 (useful for visualizing data) and dplyr (a key tool in data wrangling), are built to work with so-called “tidy” data.

Illustrations from the Openscapes blog Tidy Data for reproducibility, efficiency, and collaboration by Julia Lowndes and Allison Horst

One of the common descriptions of tidy data is often summarized with an adaptation of the Leo Tolstoy quote “Happy families are all alike; every unhappy family is unhappy in its own way.”

Illustrations from the Openscapes blog Tidy Data for reproducibility, efficiency, and collaboration by Julia Lowndes and Allison Horst

With a consistent set of rules for formatting data, the ideas and tools used to conduct one data analysis can be more easily transferred to another.

Illustrations from the Openscapes blog Tidy Data for reproducibility, efficiency, and collaboration by Julia Lowndes and Allison Horst

The tidyr package will prove useful in data restructuring or “tidying”. All of the packages in the tidyverse, including ggplot2 and dplyr, as well as many community-contributed packages, are designed to work with data in tidy format.

You can see what tidy and untidy data look might look like in practice in the tidying example section, and learn how to turn untidy data into tidy data in the Restructuring with pivot_wider() and pivot_longer() section.