ggplot2
ggplot2 is an R package for data visualization that makes appealing and effective graphs using a standardized syntax.
To run these examples on your own, load palmerpenguins, a package that contains a dataset about penguin populations on islands in Antarctica. You also need to load the ggplot2 package, which is part of the tidyverse collection of packages. (You used other tidyverse packages in the Transforming data with dplyr section.) You can also load ggplot2 on its own, using library(ggplot2).
To learn more about packages and the palmerpenguins package, see our Meet the Palmer Penguins page.
library(palmerpenguins)
library(tidyverse)
Before graphing with ggplot2, the data needs to be “tidy”. (Read more about tidy data on our tidy data page or from the tidyverse team.) For this example, the palmerpenguins data is already tidy.
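As a quick check before plotting, you can look at the first few rows of the data. This is only a sketch of one way to inspect the dataset; it assumes the palmerpenguins package is installed and uses its penguins data frame, where each row is one penguin and each column is one variable.

```r
library(palmerpenguins)

# Show the first rows: one penguin per row, one variable per column
head(penguins)

# Number of rows and columns in the dataset
dim(penguins)

# Column names you can later use inside aes()
names(penguins)
```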
There are three main pieces that make up a ggplot:
Data: Specify the data that you want ggplot() to plot using the following syntax:
data = ___
Aesthetics: For ggplot() to know which variables you want to use, specify them with some combination of aesthetics, such as:
aes(x = ___ , y = ___ , color = ___ , fill = ___ )
Anything that you put inside of aes() should be a variable (a column) in your data. The aesthetics that you use will vary with graph type. For example, a histogram shows the distribution of one variable, so it needs only x = ___ rather than all of the aesthetics listed above; a bar plot may use fill = ___, whereas a scatterplot would use color = ___.
Geometry: You can think of the geom as specifying what shape your data will take. It appears in ggplot2 code as variations on:
geom_ ___()
The ggplot2 package includes a number of standard geometries. Some common geoms include geom_point() for a scatterplot, geom_histogram() for a histogram, and geom_boxplot() for a boxplot.
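To see how the three pieces fit together, here is a small sketch that uses the same data with two different geoms. The variable choices (body_mass_g, species) and the bins = 30 setting are illustrative assumptions, not the only reasonable options. Notice that the histogram maps only x, while the boxplot maps x, y, and fill.

```r
library(palmerpenguins)
library(ggplot2)

# Histogram: one variable, so only the x aesthetic is mapped
hist_plot <- ggplot(data = penguins,
                    mapping = aes(x = body_mass_g)) +
  geom_histogram(bins = 30)  # bins = 30 is an arbitrary illustrative choice

# Boxplot: a categorical x, a numeric y, and fill mapped to a variable
box_plot <- ggplot(data = penguins,
                   mapping = aes(x = species, y = body_mass_g, fill = species)) +
  geom_boxplot()

# Printing a plot object draws it
hist_plot
box_plot
```

Rows with missing values are dropped when the plots are drawn, so R may print a warning about removed rows.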
You can use these three pieces to create a template for making graphs with ggplot():
ggplot(data = ___ ,
       mapping = aes(x = ___ , y = ___ , color = ___ , fill = ___ )) +
  geom_ ___()
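As an illustration, the template can be filled in to make a scatterplot of the penguins data. The variables chosen here (flipper_length_mm, body_mass_g, species) are just one possible combination, and the fill aesthetic is left out because a scatterplot typically uses color instead.

```r
library(palmerpenguins)
library(ggplot2)

# data: the penguins data frame
# aesthetics: flipper length on x, body mass on y, species as color
# geometry: points, which makes a scatterplot
scatter <- ggplot(data = penguins,
                  mapping = aes(x = flipper_length_mm,
                                y = body_mass_g,
                                color = species)) +
  geom_point()

# Printing the object draws the plot
scatter
```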
Note that in ggplot2, layers are added to the graph with + rather than chained with the pipe %>%. When debugging ggplot2 code, check that you have added layers with + rather than piping components together.
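The two operators can appear in the same piece of code, which is where mix-ups tend to happen. The sketch below assumes you want to plot only the Gentoo penguins: the pipe passes data through the dplyr step and into ggplot(), and from ggplot() onward every layer is added with +. The filter condition and variable choices are illustrative.

```r
library(palmerpenguins)
library(dplyr)
library(ggplot2)

gentoo_plot <- penguins %>%
  filter(species == "Gentoo") %>%          # pipe: data flows from step to step
  ggplot(mapping = aes(x = bill_length_mm,
                       y = bill_depth_mm)) +  # from here on, use +
  geom_point()

gentoo_plot
```

Because the data arrives through the pipe, the data = ___ argument is not written inside ggplot() here. Replacing the + before geom_point() with %>% would produce an error rather than a plot.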
The next several sections provide examples of these commonly-used data visualizations: