Linegraphs also show the relationship between two continuous variables, often to show progression over time. You can create a linegraph in much the same way as a scatterplot by swapping out geom_point() for geom_line().
The code below is very similar to that in the Scatterplots section, with two changes: (1) time (year) is now on the x-axis, and (2) a different geom is used to represent the same data.
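library(ggplot2)
library(palmerpenguins)  # provides the penguins dataset
ggplot(data = penguins,
       mapping = aes(x = year, y = body_mass_g, color = island)) +
  geom_line()  # joins each island's points in order of year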
The graph above looks a bit odd because there are multiple data points (body mass, y) at each time point (year, x). geom_line() joins all of an island's points in order of year, so the many body-mass values within a single year show up as vertical zigzags. Linegraphs work best when your data has one y value per x value. In this example, that means one value of body_mass_g for each island in each year. You can achieve this by taking the mean of body_mass_g for each combination of island and year:
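library(dplyr)
penguins_sum <- penguins %>%
  filter(!is.na(body_mass_g)) %>%                  # drop rows with missing body mass
  group_by(island, year) %>%                        # one group per island-year pair
  summarize(mean_body_mass_g = mean(body_mass_g))   # one mean per group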
penguins_sum
## # A tibble: 9 × 3
## # Groups: island [3]
## island year mean_body_mass_g
## <fct> <int> <dbl>
## 1 Biscoe 2007 4741.
## 2 Biscoe 2008 4628.
## 3 Biscoe 2009 4793.
## 4 Dream 2007 3684.
## 5 Dream 2008 3779.
## 6 Dream 2009 3691.
## 7 Torgersen 2007 3763.
## 8 Torgersen 2008 3856.
## 9 Torgersen 2009 3489.
(In the code above, note that missing values of body_mass_g are removed before the mean body mass is calculated; otherwise mean() would return NA for any island-year group that contains a missing value.)
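As an alternative to filtering first, mean() can skip missing values itself through its na.rm = TRUE argument. The sketch below compares the two approaches on a small hypothetical toy tibble (made-up values, not the penguins data). It also passes .groups = "drop" to summarize() (available in dplyr 1.0.0 and later), which returns an ungrouped result instead of the "Groups: island" tibble shown above.

```r
library(dplyr)

# Hypothetical toy data (not the penguins dataset): two islands, two years,
# two body-mass measurements per island-year, one of them missing (NA)
toy <- tibble(
  island      = rep(c("A", "B"), each = 4),
  year        = rep(c(2007L, 2007L, 2008L, 2008L), times = 2),
  body_mass_g = c(3500, 3700, 3600, 3800, 4500, 4700, 4600, NA)
)

# Approach used in the text: drop missing values, then average per group
via_filter <- toy %>%
  filter(!is.na(body_mass_g)) %>%
  group_by(island, year) %>%
  summarize(mean_body_mass_g = mean(body_mass_g), .groups = "drop")

# Alternative: let mean() skip missing values within each group
via_na_rm <- toy %>%
  group_by(island, year) %>%
  summarize(mean_body_mass_g = mean(body_mass_g, na.rm = TRUE), .groups = "drop")

via_na_rm
```

The two approaches give the same result here. They differ only when every value in a group is missing: filtering drops that island-year entirely, whereas na.rm = TRUE keeps it with a mean of NaN.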
Now you can make the same linegraph using the new penguins_sum dataset, with mean_body_mass_g in place of body_mass_g:
ggplot(data = penguins_sum,
       mapping = aes(x = year, y = mean_body_mass_g, color = island)) +
  geom_line()  # now one point per island per year
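To check that summarizing really gives each line one point per year, you can inspect the data ggplot2 hands to geom_line() with layer_data(). The sketch below is self-contained under a few assumptions: it uses hypothetical toy values instead of penguins, base R's aggregate() in place of the dplyr pipeline (so only ggplot2 is needed), and a hypothetical helper, points_per_x(), that tabulates points per line (group) at each x value.

```r
library(ggplot2)

# Hypothetical toy data (values made up): two islands, three years,
# two raw measurements per island-year
raw <- data.frame(
  island = rep(c("A", "B"), each = 6),
  year = rep(rep(2007:2009, each = 2), times = 2),
  body_mass_g = c(3500, 3700, 3600, 3800, 3550, 3750,
                  4500, 4700, 4600, 4800, 4550, 4750)
)

# One mean per island-year, computed with base R's aggregate()
summ <- aggregate(body_mass_g ~ island + year, data = raw, FUN = mean)

# Hypothetical helper: count how many points each line (group) has at each x
points_per_x <- function(p) {
  d <- layer_data(p)   # data ggplot2 passes to the first layer (geom_line)
  table(d$group, d$x)  # rows = groups (one per island), columns = years
}

p_raw  <- ggplot(raw,  aes(x = year, y = body_mass_g, color = island)) + geom_line()
p_summ <- ggplot(summ, aes(x = year, y = body_mass_g, color = island)) + geom_line()

points_per_x(p_raw)   # 2 points per line per year, hence the vertical zigzags
points_per_x(p_summ)  # 1 point per line per year, hence a clean line
```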
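You may run into problems with linegraphs when your x-variable is categorical (for example, a factor or character vector). By default, ggplot2 groups the data by the discrete variables in the plot, including x, so each group contains a single point and geom_line() has nothing to connect. Adding group = 1 to your aes() call places all points in one group, so they are joined into a single line.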
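A minimal sketch of the group = 1 fix, assuming a small hypothetical data frame whose year column is stored as a factor (the values are rounded versions of the Biscoe means printed earlier). Rather than drawing the plots, it uses layer_data() to show how ggplot2 assigns groups with and without group = 1.

```r
library(ggplot2)

# Hypothetical summary with a categorical x-axis: year stored as a factor
df <- data.frame(
  year = factor(c("2007", "2008", "2009")),
  mean_body_mass_g = c(4741, 4628, 4793)
)

# Default grouping: the discrete x splits the points into one group per level,
# so each group holds a single point and geom_line() has nothing to join
p_default <- ggplot(df, aes(x = year, y = mean_body_mass_g)) +
  geom_line()

# group = 1 places every point in the same group, so they are connected
p_fixed <- ggplot(df, aes(x = year, y = mean_body_mass_g, group = 1)) +
  geom_line()

layer_data(p_default)$group  # three groups, one per year
layer_data(p_fixed)$group    # a single group
```

If you also map color to a variable (as with island earlier), group = 1 would join every point into one line that zigzags across colors; mapping group to the same variable instead, e.g. group = island, draws one line per color.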