Advanced data wrangling

The Wrangling Data section of these tutorials have introduced all sorts of ways to transform, manipulate, and reshape data. These tools will be sufficient to handle many data analysis projects, but if you are looking to learn more, or do something beyond the scope of those tools, below is a compilation of links to some additional resources.

Doing things many times with purrr

In R, there are many ways of doing things repeatedly; you may have heard of for loops, the apply family, or mapping functions. The “tidy approach” to running a function many times, or “iterating,” is implemented in the purrr package via map functions. Dr. Rebecca Barter wrote a wonderful introduction to purrr that we recommend if you want to learn about vectors, lists, data frames, and iterating over them. To learn more about iteration generally, check out the R for Data Science chapter on the topic.

Writing your own functions

As you do more data analysis, you may find yourself finding uses for the same code time and time again. Rather than copying and pasting each time you need that code, you could automate that task by writing your own function. The R for Data Science book outlines three advantages of writing a custom function over copying and pasting:

  1. You can give a function an evocative name that makes your code easier to understand.

  2. As requirements change, you only need to update code in one place, instead of many.

  3. You eliminate the chance of making incidental mistakes when you copy and paste (i.e. updating a variable name in one place, but not in another).

We recommend the R4DS Chapter if you are interested in learning more about writing your own functions!

Working with time and date data with lubridate

If you have worked with time and/or date data, you may have found it trickier than expected. As the relevant R for Data Science chapter notes about dates and times: “You use them all the time in your regular life, and they don’t seem to cause much confusion. However, the more you learn about dates and times, the more complicated they seem to get.” We recommend the chapter linked above if you need to work with time and/or data and don’t know where to start.

Working with categories using forcats

In R, data can be classified in a number of ways: logical (true/false), numeric (numbers), or character / string (words). Another data type is a “factor”, or categorical data. When visualizing or presenting categorical data, the order of those categories is often important.

By default, R will order characters alphabetically; if you are graphing categorical data, this ordering may not make sense. (Example: we generally consider days of the week in a non-alphabetical order.) You can reorder categories in your data using forcats. Our brief tutorial on how to refactor levels within plots can be found here

You can learn more about factors, how to make your variables into factors, and how to change the order of factors in the R for Data Science chapter.