Linegraphs

Linegraphs also show the relationship between two continuous variables, most often how one of them changes over time. You can create a linegraph in much the same way as a scatterplot, by swapping out geom_point() for geom_line().

The code below is very similar to that in the Scatterplots section, with two changes: (1) time (year) is now on the x-axis and (2) a different geom is used to represent the same data.

ggplot(data = penguins, 
       mapping = aes(x = year, y = body_mass_g, color = island)) +
  geom_line()

The above graph looks a bit odd because there are multiple data points (body mass, y) at each time point (year, x). Linegraphs work best when your data has one y value per x value. In this example, that means you would want one value of body_mass_g for each island in each year. You can achieve this by taking the mean of body_mass_g for each combination of island and year:

penguins_sum <- penguins %>%
  filter(!is.na(body_mass_g)) %>%
  group_by(island, year) %>%
  summarize(mean_body_mass_g = mean(body_mass_g))

penguins_sum
## # A tibble: 9 × 3
## # Groups:   island [3]
##   island     year mean_body_mass_g
##   <fct>     <int>            <dbl>
## 1 Biscoe     2007            4741.
## 2 Biscoe     2008            4628.
## 3 Biscoe     2009            4793.
## 4 Dream      2007            3684.
## 5 Dream      2008            3779.
## 6 Dream      2009            3691.
## 7 Torgersen  2007            3763.
## 8 Torgersen  2008            3856.
## 9 Torgersen  2009            3489.

(In the above code, note the removal of missing values for body_mass_g before calculating the mean body mass.)
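
To see why the missing values have to be removed first, here is a minimal sketch using a short vector of illustrative body masses (made-up values, not rows from penguins). By default, mean() returns NA as soon as any value is missing:

```r
# Illustrative values only; the NA stands in for a penguin with no recorded mass
body_mass <- c(3750, NA, 3250)

mean_with_na <- mean(body_mass)                  # NA: one missing value hides the mean
mean_without_na <- mean(body_mass, na.rm = TRUE) # 3500: NA is dropped before averaging

mean_with_na
mean_without_na
```

Inside summarize(), writing mean(body_mass_g, na.rm = TRUE) would be an alternative to the filter() step used above.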

Now you can make the same linegraph using the new penguins_sum dataset, with mean_body_mass_g in place of body_mass_g:

ggplot(data = penguins_sum, 
       mapping = aes(x = year, y = mean_body_mass_g, color = island)) +
  geom_line()
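
If you want each yearly mean to stand out, one option is to layer geom_point() on top of the lines. The sketch below also sets the x-axis breaks explicitly, in case the default axis labels fractional years. So that it runs with only ggplot2 loaded, it rebuilds the summary from the rounded means printed above in a data frame called island_means, a name made up for this example.

```r
library(ggplot2)

# Rounded means copied from the penguins_sum output above;
# island_means is a hypothetical name used only in this sketch
island_means <- data.frame(
  island = rep(c("Biscoe", "Dream", "Torgersen"), each = 3),
  year = rep(2007:2009, times = 3),
  mean_body_mass_g = c(4741, 4628, 4793,
                       3684, 3779, 3691,
                       3763, 3856, 3489)
)

p <- ggplot(data = island_means,
            mapping = aes(x = year, y = mean_body_mass_g, color = island)) +
  geom_line() +
  geom_point() +                          # mark each yearly mean
  scale_x_continuous(breaks = 2007:2009)  # put axis breaks at whole years
p
```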

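You may encounter problems with your linegraphs when your x-variable is categorical. In that case ggplot2 treats each category as its own group, so geom_line() finds only one point per group and has nothing to connect. This issue can be resolved by adding group = 1 into your aes() call, which places all observations in a single group so that they are joined by one line. (If you want a separate line for each level of another variable, such as island, map group to that variable instead.)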
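
As a minimal sketch of the group = 1 fix, the code below uses the three Biscoe means from the penguins_sum output above (rounded) and stores year as a factor so that the x-variable is categorical. The names biscoe_means, p_no_group, and p_grouped are made up for this example.

```r
library(ggplot2)

# Biscoe means copied (rounded) from the penguins_sum output above;
# year is converted to a factor to make the x-variable categorical
biscoe_means <- data.frame(
  year = factor(c(2007, 2008, 2009)),
  mean_body_mass_g = c(4741, 4628, 4793)
)

# Without group = 1: each year is its own group, so no line is drawn
p_no_group <- ggplot(data = biscoe_means,
                     mapping = aes(x = year, y = mean_body_mass_g)) +
  geom_line()

# With group = 1: all three points share one group and are connected
p_grouped <- ggplot(data = biscoe_means,
                    mapping = aes(x = year, y = mean_body_mass_g, group = 1)) +
  geom_line()
p_grouped
```

Printing p_no_group instead should give an empty plot, along with a message suggesting that you adjust the group aesthetic.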