Reordering geom_bar and geom_col by Count or Value

One of the things I’m always looking up with ggplot2 is how to reorder the bars in my bar charts by their length (i.e. the count/frequency or the value, depending on whether you’re using geom_bar or geom_col). A Google search turns up several different solutions, but in this post I will document what I’ve found to be the cleanest and simplest one.¹

Reordering geom_bar by count

By default, the bars are arranged by the order (levels) of the factor variable.² For example, below I plot the count (i.e. frequency) of each of the 8 groups (“A” through “H”) and they are arranged in alphabetical order along the x-axis.

suppressPackageStartupMessages({
  library(ggplot2)
  library(dplyr)
})
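# Create example dataset
set.seed(42) # Arbitrary seed so that sample() gives the same values on every run
df = data.frame(group = c(rep("A", 4), rep("B", 2), rep("C", 1), rep("D", 3),
                          rep("E", 5), rep("F", 1), rep("G", 1), rep("H", 4)),
                value = sample(1:21)) # 21 rows, so each row gets a unique value from 1 to 21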
# Default order
df %>%
  ggplot(aes(x = group)) + 
  geom_bar()

To reorder the bars by their length, I will use the forcats package. For geom_bar, the length of the bar reflects the frequency of the group, so we need to reorder the levels of our factor variable by their frequency. This can be done using the fct_infreq function.

library(forcats)
# Highest frequency to lowest frequency
df %>%
  ggplot(aes(x = fct_infreq(group))) +
  geom_bar() +
  labs(x = "group")

To reverse the order, use fct_rev.

# Lowest frequency to highest frequency
df %>%
  ggplot(aes(x = fct_rev(fct_infreq(group)))) +
  geom_bar() +
  labs(x = "group")
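
To check the ordering without drawing the plot, it can help to look at the factor levels directly. The sketch below rebuilds only the group column from the example dataset (as a standalone vector, which is my own shortcut and not part of the plots above) and prints the levels that fct_infreq and fct_rev produce.

```r
library(forcats)

# Same group column as the example dataset, as a standalone vector
group <- c(rep("A", 4), rep("B", 2), rep("C", 1), rep("D", 3),
           rep("E", 5), rep("F", 1), rep("G", 1), rep("H", 4))

# Levels ordered from most frequent to least frequent
by_freq <- fct_infreq(group)
levels(by_freq)

# Counts per level, listed in the same order as the bars
table(by_freq)

# Reversed: least frequent level first
levels(fct_rev(by_freq))
```

“E” has the most rows (5), so it should come first in by_freq. Groups with equal counts (such as “A” and “H”) are tied, so I would not rely on their relative position.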

Reordering geom_col by value

For geom_col, the lengths of the bars correspond to values for each group, so we need to reorder the factor levels by that value. Using the same example dataset, let’s say we calculate the mean value for every group and plot the corresponding bar chart with geom_col. To order by the length of the bar, we can use the fct_reorder function from the forcats package. For fct_reorder, we need to tell the function our factor variable (“group”) and the values we want to reorder it by (the column corresponding to the y-axis, i.e. “value”). By default, fct_reorder sorts the levels in ascending order of those values.

library(forcats)
df_means = df %>%
  group_by(group) %>%
  summarize(value = mean(value))
df_means %>%
  ggplot(aes(x = fct_reorder(group, value), y = value)) +
  geom_col() +
  labs(x = "group")

If we want to reverse the order to be descending, we can again use the fct_rev function.

df_means %>%
  ggplot(aes(x = fct_rev(fct_reorder(group, value)), y = value)) +
  geom_col() +
  labs(x = "group")
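
As an alternative to wrapping the call in fct_rev, fct_reorder also has a .desc argument that sorts in descending order directly. Here is a minimal sketch using a made-up four-row summary table (toy_means is hypothetical, not the df_means from above) so that the result is easy to check by eye.

```r
library(forcats)

# Hypothetical summary table with one value per group
toy_means <- data.frame(group = c("A", "B", "C", "D"),
                        value = c(3, 1, 4, 2))

# Ascending by value (the default)
levels(fct_reorder(toy_means$group, toy_means$value))
# "B" "D" "A" "C"

# Descending by value, two ways that agree for this data
desc_a <- fct_rev(fct_reorder(toy_means$group, toy_means$value))
desc_b <- fct_reorder(toy_means$group, toy_means$value, .desc = TRUE)
levels(desc_a)
levels(desc_b)
# "C" "A" "D" "B" in both cases
```

Inside a plot, this would look like aes(x = fct_reorder(group, value, .desc = TRUE), y = value).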

Bonus Tips with the forcats Package

Manually set levels of a factor

Instead of using the fct_infreq or fct_reorder functions to set the order of the levels according to the frequency or value, you can also set the order of the factor levels manually. I can’t say that this is something I’ve had to do often, but I will document it here in case it is helpful.

In the forcats package, the function to manually reorder the levels is fct_relevel. One neat thing is that you don’t have to specify all the levels in your reordering, but can move one or more levels individually to a specified position, as shown in the examples below.

df %>%
  ggplot(aes(x = fct_relevel(group, "D"))) + # Move "D" group to the front
  geom_bar() +
  labs(x = "group")

df %>%
  ggplot(aes(x = fct_relevel(group, c("A", "D"), after = 4))) + # Move "A" and "D" groups to fifth and sixth place (after the fourth remaining level)
  geom_bar() +
  labs(x = "group")
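
To see exactly where fct_relevel puts each level, you can print the levels instead of plotting. This sketch again uses a standalone copy of the group column (my own shortcut); the after = Inf call at the end is an extra illustration that was not used in the plots above.

```r
library(forcats)

# Same group column as the example dataset, as a standalone vector
group <- c(rep("A", 4), rep("B", 2), rep("C", 1), rep("D", 3),
           rep("E", 5), rep("F", 1), rep("G", 1), rep("H", 4))

# Move "D" to the front; the other levels keep their order
levels(fct_relevel(group, "D"))
# "D" "A" "B" "C" "E" "F" "G" "H"

# Move "A" and "D" so they sit after the fourth of the remaining levels
levels(fct_relevel(group, c("A", "D"), after = 4))
# "B" "C" "E" "F" "A" "D" "G" "H"

# after = Inf moves a level to the very end
levels(fct_relevel(group, "A", after = Inf))
# "B" "C" "D" "E" "F" "G" "H" "A"
```

Note that after counts positions among the levels that are left once the moved levels have been taken out, which is why “A” and “D” end up in fifth and sixth place.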

Lump Infrequent Categories Together

The forcats package also offers useful functions to lump infrequent categories (levels) in your factor variable together. For example, using the fct_lump_n function, we can keep only the n most frequent categories and lump the rest of them into a category named “Other.”

Similar functions also exist to lump the categories under different criteria (e.g. fct_lump_min, fct_lump_prop, fct_lump_lowfreq).

df %>%
  ggplot(aes(x = fct_lump_n(fct_infreq(group), n = 5))) + # Only keep the 5 most frequent categories and lump the rest into "Other"
  geom_bar() +
  labs(x = "group")

# Highlight "Other" category
df %>%
  mutate(group = fct_lump_n(fct_infreq(group), n = 5), # Only keep the 5 most frequent categories and lump the rest into "Other"
         highlight = fct_other(group, keep = "Other", other_level = "Top N Groups")) %>% # Create a factor variable with two levels to use as fill/highlight
  ggplot(aes(x = group, fill = highlight)) + 
  geom_bar()
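
To check which groups end up in “Other,” you can tabulate the lumped factor. The sketch below uses a standalone copy of the group column (my own shortcut) and compares fct_lump_n with fct_lump_min; the threshold min = 3 is just a value I picked for illustration.

```r
library(forcats)

# Same group column as the example dataset, as a standalone vector
group <- c(rep("A", 4), rep("B", 2), rep("C", 1), rep("D", 3),
           rep("E", 5), rep("F", 1), rep("G", 1), rep("H", 4))

# Keep the 5 most frequent groups; C, F, and G (1 row each) become "Other"
lumped_n <- fct_lump_n(group, n = 5)
table(lumped_n)

# Keep groups that appear at least 3 times; B, C, F, and G become "Other"
lumped_min <- fct_lump_min(group, min = 3)
table(lumped_min)
```

With this data, “Other” holds 3 rows in the first case and 5 rows in the second, and it is placed last among the levels.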

For more about the forcats package, check out its documentation.

¹ Credit goes to Claus Wilke for sharing his slides, which are primarily what I’ve based this post on.
² Technically, the group variable I use in my example dataset is of type “character,” but ggplot2 appears to automatically convert it to a factor variable with levels following an alphabetical order.