Data @ Reed

Useful R Packages

Packages are bundles of specialized code that extend R beyond its basic functions. Installing a package adds extra functions (and often datasets) that can help you analyze or visualize your data more easily. Anyone can write a package, and packages range from general-purpose to very specific, so whatever your task, someone may already have written a package that makes it easier for you.

The terms "package" and "library" are often used interchangeably (strictly speaking, a library is the folder on your computer where installed packages live). In R, you run install.packages() the first time you need a package, then you run library() to load it. install.packages() is like buying a book: you only need to do it once. library() is like taking the book off the shelf: you need to do it every time you start a new R session and want to use that book.
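Here is a minimal sketch of the install-once, load-every-time pattern. The helper name load_or_install() is hypothetical (it is not part of any package), and cluster is used only because it ships with most R installations. Note that install.packages() needs the package name in quotes, while library() also accepts a bare name.

```r
# Hypothetical helper (not from any package): install a package only if it
# is missing, then load it for the current session.
load_or_install <- function(pkg) {
  # requireNamespace() returns FALSE instead of erroring when pkg is absent
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # "buying the book": only needed once
  }
  library(pkg, character.only = TRUE)  # "taking it off the shelf": every session
}

# The everyday version is simply:
#   install.packages("cluster")  # once
#   library(cluster)             # at the top of each script
load_or_install("cluster")
```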


The one package to rule them all

The first step in all of your scripts will likely be this line of code:

library(tidyverse)

{tidyverse} is a meta-package: loading it attaches several other packages in a single step. When you run the line above, it automatically loads the following core packages for you:

  • ggplot2
  • dplyr
  • tidyr
  • readr
  • purrr
  • tibble
  • stringr
  • forcats
  • lubridate (part of the core set since tidyverse 2.0.0)
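To see the meta-package at work, here is a small sketch using R's built-in mtcars data: after one library(tidyverse) call, functions from tibble, dplyr, and stringr can be mixed in a single pipeline without loading each package separately. The object name four_cyl is just an illustrative choice.

```r
library(tidyverse)

four_cyl <- mtcars %>%
  rownames_to_column("car") %>%        # tibble: turn row names into a column
  filter(cyl == 4) %>%                 # dplyr: keep 4-cylinder cars only
  mutate(car = str_to_upper(car)) %>%  # stringr: upper-case the car names
  arrange(desc(mpg))                   # dplyr: best fuel economy first

head(four_cyl, 3)
```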

Below are categories of other useful packages. If you are trying to load data from an online database (e.g., the US Census), be sure to check out the Direct Data Access packages: there may be a package that loads your data straight into R, without the need to download it from the website first.

Loading Data

Package Name | What It Does | Learn More
readr | for loading .csv, .tsv, .txt, and other delimited text files | readr documentation
readxl | for loading .xls and .xlsx Excel files | readxl documentation
haven | for loading SAS, SPSS, and Stata files | haven documentation
jsonlite | for importing JSON and converting it to R data types | jsonlite documentation
googlesheets4 | for loading data from Google Sheets through your Google account | googlesheets4 documentation
rvest | for web scraping | rvest documentation
duckdb | for datasets too big to load comfortably into memory; DuckDB can query large files on disk without reading them all into R first | duckdb documentation
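A minimal readr sketch: it writes a tiny CSV to a temporary file (the bird-count data is made up purely for illustration) and reads it back. read_csv() guesses each column's type, so the ISO-formatted dates come back as Date values rather than text. The other loaders follow a similar one-call pattern, for example readxl::read_excel(), haven::read_dta() or haven::read_sav(), and jsonlite::fromJSON().

```r
library(readr)

# Made-up example data written to a temporary file so the code is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c(
  "site,date,count",
  "A,2023-05-01,12",
  "B,2023-05-02,7"
), path)

# read_csv() guesses column types; show_col_types = FALSE (readr >= 2.0)
# just silences the column-specification message
birds <- read_csv(path, show_col_types = FALSE)
birds
```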

Formatting Data

Package Name | What It Does | Learn More
dplyr | the most commonly used tools for data manipulation (filtering, selecting, summarizing, joining) | dplyr documentation
tidyr | tools for tidying data, including pivoting tables from wide to long format and vice versa | tidyr documentation
janitor | for cleaning up messy data, such as standardizing column names | janitor documentation
stringr | helpful functions for manipulating strings | stringr documentation
scales | for controlling how numbers appear on plot axes and legends (rounding, percentages, commas, and more) | scales documentation
lubridate | a must-have package for parsing and working with dates and times | lubridate documentation
data.table | fast functions for manipulating and analyzing large data sets | data.table documentation
broom | for turning statistical model output into tidy, tidyverse-friendly data frames | broom documentation
purrr | tools for working with functions and vectors, helpful for converting lists of lists into data frames | purrr documentation
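Here is a short sketch combining three of these packages on a made-up table of test scores (the student names and numbers are invented): tidyr's pivot_longer() reshapes one-column-per-year data into long format, dplyr averages it per student, and lubridate parses dates written in different orders into the same Date.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Invented wide-format data: one column per year
scores <- tibble::tibble(
  student = c("ana", "ben"),
  `2022`  = c(81, 75),
  `2023`  = c(88, 79)
)

# tidyr: wide -> long, one row per student-year
long <- scores %>%
  pivot_longer(cols = -student, names_to = "year", values_to = "score") %>%
  mutate(year = as.integer(year))

# dplyr: average score per student
avg <- long %>%
  group_by(student) %>%
  summarise(mean_score = mean(score), .groups = "drop")

# lubridate: different input orders, same Date
d1 <- ymd("2023-09-05")
d2 <- mdy("September 5, 2023")
```

In the same spirit, janitor::clean_names() would turn messy column headers like "First Name" into first_name.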

Creating Nice Plots & Tables

Package Name | What It Does | Learn More
ggplot2 | the best package for making your graphs look nice; plots are built layer by layer | ggplot2 documentation
gt | for making "great tables", and it follows through on that promise | gt documentation
gtsummary | works with gt to display publication-ready summaries of regressions, descriptive statistics, and more | gtsummary documentation
viridis | colorblind-friendly color palettes that also read well in grayscale | viridis documentation
RColorBrewer | ColorBrewer color palettes (sequential, diverging, and qualitative) | RColorBrewer documentation
ggpubr | ggplot2 customizations that help make publication-ready figures | ggpubr documentation
patchwork | works well with ggplot2 to combine multiple plots or tables into one figure or page | patchwork documentation
gridExtra | helps arrange multiple plots or tables in one figure or page | gridExtra documentation
wesanderson | color palettes based on each Wes Anderson movie | wesanderson documentation
plotly | for making your graphs interactive; works well with the shiny package | plotly documentation
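A minimal ggplot2 sketch using the mpg data that ships with ggplot2. It uses scale_fill_viridis_d(), a viridis palette built into ggplot2 itself, and scales::percent to label the y-axis as percentages. Saving to a temporary PDF is just one choice so the example runs even without a screen.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "fill") +                   # stacked bars, each scaled to 100%
  scale_y_continuous(labels = scales::percent) +  # 0.25 is shown as "25%"
  scale_fill_viridis_d() +                        # colorblind-friendly viridis colors
  labs(x = "Vehicle class", y = "Share of models", fill = "Drive train")

out <- tempfile(fileext = ".pdf")
ggsave(out, plot = p, width = 7, height = 4)
```

With patchwork loaded, two such plots can be placed side by side with p1 + p2 or p1 | p2.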

Useful Stats Packages

Package Name | What It Does | Learn More
stats | comes with R and loads automatically; the main source of core statistical functions such as lm(), glm(), and t.test() | stats documentation
lme4 | for fitting linear and generalized linear mixed-effects models | lme4 documentation
lmerTest | adds p-values and significance tests for linear mixed-effects models fit with lme4 | lmerTest documentation
MASS | functions and datasets from Venables and Ripley's Modern Applied Statistics with S, including negative binomial, ordinal logistic, and robust regression | MASS documentation
Hmisc | many miscellaneous additional functions for statistical analysis | Hmisc documentation
FactoMineR | for multivariate exploratory data analysis (e.g., PCA) | FactoMineR documentation
outliers | a collection of specific tests for detecting outliers | outliers documentation
vegan | for ordination analyses and diversity statistics, particularly useful in ecology | vegan documentation
car | extra tools for regression analysis, such as Type II/III ANOVA tables and variance inflation factors | car documentation
cluster | tools for performing cluster analysis | cluster documentation
forcats | tools for working with categorical variables (factors) | forcats documentation
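Because stats is loaded automatically, a quick analysis needs no library() call at all. This sketch uses the built-in mtcars data: lm() regresses fuel economy (mpg) on weight (wt, in 1000 lbs), and t.test() runs a Welch two-sample test comparing automatic (am = 0) and manual (am = 1) cars. The choice of variables is purely illustrative.

```r
# No library() call needed: stats is attached in every R session
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)     # intercept and slope (change in mpg per 1000 lbs)
confint(fit)  # 95% confidence intervals for both

# Welch two-sample t-test (the t.test() default): mpg by transmission type
tt <- t.test(mpg ~ am, data = mtcars)
tt$p.value
```

If broom is installed, broom::tidy(fit) returns the same coefficients as a data frame, which is handy for building tables and plots.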

Direct Data Access

Package Name | Database Accessed | Learn More
tidycensus | US Census | tidycensus documentation
rnoaa* | National Oceanic and Atmospheric Administration | rnoaa documentation
COVID19 | daily updates on COVID-19 data | COVID19 documentation
wbstats | World Bank data | wbstats documentation
tidyquant | stock market data | tidyquant documentation
crimedata | Crime Open Database | crimedata documentation
eurostat | Eurostat Open Data | eurostat documentation
WDI | World Bank World Development Indicators | WDI documentation
imf.data | International Monetary Fund | imf.data documentation
fredr | Federal Reserve Economic Data (FRED) | fredr documentation
googleanalyticsR | Google Analytics | googleanalyticsR documentation

* The rnoaa developers are working on a replacement, but the package is still usable.