Reordering Categories in a Plot

When building plots with categorical data, ggplot will default to ordering categories alphabetically; this may not make sense for your data. (Example: a barplot showing days of the week on the x-axis; alphabetical order is not the correct ordering/sequencing.) Within the forcats package, there is a family of functions that helps with these types of problems.

Reordering by another variable

Barplots

Make a plot comparing how many penguins of each species are in the palmerpenguins dataset. Notice that the bars are ordered alphabetically, when perhaps you would rather have them ordered by n, the number of penguins of each species.

penguins %>% 
  count(species) %>% 
  ggplot(aes(x = species, y = n)) +
  geom_col()

If you wish to order a categorical variable (species) by some quantitative variable (n) fct_reorder() is a useful option. Generally the syntax is as follows:

fct_reorder(.f = categorical_variable_to_be_ordered, .x = quantitative_variable_to_order_other_variable_by)

To make a change to the “factor levels” of the data, use this function inside of mutate:

penguins %>% 
  count(species) %>% 
  mutate(species_ordered = fct_reorder(.f = species, .x = n)) %>%
  ggplot(aes(x = species_ordered, y = n)) +
  geom_col()

Note the change in the order of the bars in the graph produced by the code below. To order the bars in descending rather than ascending (numeric) order, set the .desc argument to TRUE in fct_reorder():

penguins %>% 
  count(species) %>% 
  mutate(species_ordered = fct_reorder(.f = species, .x = n, .desc = TRUE)) %>%
  ggplot(aes(x = species_ordered, y = n)) +
  geom_col()

Boxplots

Boxplots can be trickier because there may not be some explicit variable in the dataset that fct_reorder can use as a reference. You can use fct_reorder to order boxplots by some function, which is generally a summary function that can be applied to your y-variable. (A few examples include max, min, median, and mean.)

For example, start with a plot featuring multiple boxplots (one for each species), where the boxplots are ordered alphabetically:

penguins %>% 
  ggplot(aes(x = species, y = bill_length_mm)) +
  geom_boxplot()

For contrast, create a plot with multiple boxplots ordered by the median value of bill_length_mm.

The below code uses the argument .fun to reorder by median. (Note that the function is written as median and not as median().) You may notice the specification na.rm = T; this removes missing data from the calculation of median. If there are missing values in your data, it may cause the reordering to not function properly.

penguins %>% 
  mutate(species = fct_reorder(.f = species, .x = bill_length_mm, .fun = median, na.rm = T)) %>% 
  ggplot(aes(x = species, y = bill_length_mm)) +
  geom_boxplot()

Similar to the earlier example, you can put groups in descending order using .desc:

penguins %>% 
  mutate(species = fct_reorder(.f = species, .x = bill_length_mm, .fun = median, na.rm = T, .desc = T)) %>% 
  ggplot(aes(x = species, y = bill_length_mm)) +
  geom_boxplot()

Reordering manually

Barplots

For a different example, create a new demo dataset called days_of_week, showing the hours worked on each day:

days_of_week <- tibble(
  days = c('Fri', 'Sat', 'Sun', 'Mon', 'Tue', 'Wed', 'Thu'),
  hours_worked = seq(20,50, 5)
)

days_of_week
## # A tibble: 7 x 2
##   days  hours_worked
##   <chr>        <dbl>
## 1 Fri             20
## 2 Sat             25
## 3 Sun             30
## 4 Mon             35
## 5 Tue             40
## 6 Wed             45
## 7 Thu             50

Plotting this data using a barplot yields:

days_of_week %>% 
  ggplot(aes(x = days, y = hours_worked)) +
  geom_col()

The columns are automatically sorted alphabetically, but days of the week are ordinal data, meaning that the order of the values is important. Here fct_reorder is not the correct tool because order is not defined by another variable, but by external information. (In this case, knowledge of a calendar.)

The forcats function fct_relevel will help here. The syntax is generally:

fct_relevel(.f = factor_that_needs_reordering, levels = levels_to_order_by)

The below code creates a vector with days (levels) in the desired order, and then feeds that vector into the fct_relevel call.

day_order <- c('Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat')

days_of_week %>% 
  mutate(days = fct_relevel(.f = days, levels = day_order)) %>% 
  ggplot(aes(x = days, y = hours_worked)) +
  geom_col()

After reordering factors, the bar order in the bargraph has changed.

Boxplots

To see how this works with Boxplots, make some changes to the demo dataset:

days_of_week_new <- tibble(
  days = rep(c('Fri', 'Sat', 'Sun', 'Mon', 'Tue', 'Wed', 'Thu'), 10),
  hours_worked = rnorm(70, 25, 7)
)

Plotting once again yields a x-axis ordered alphabetically, which is not what we want.

days_of_week_new %>% 
  ggplot(aes(x = days, y = hours_worked)) +
  geom_boxplot()

Use the same vector describing the levels inside of fct_relevel to get:

days_of_week_new %>% 
  mutate(days = fct_relevel(.f = days, levels = day_order )) %>% 
  ggplot(aes(x = days, y = hours_worked)) +
  geom_boxplot()