As mentioned in the previous chapter, R sorts the levels of a factor alphabetically by default. In this chapter we discuss working with factors using the forcats package, which is helpful when you are managing categorical variables.
Don’t assign levels directly with levels()<-. Instead, use fct_recode(), which changes the names of levels:
```
## [1] Physics Math Chemistry
## Levels: Chemistry Physics Math
```
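The call that produced this output is not shown. As a minimal sketch, assuming a hypothetical vector of course codes (the codes "G234", "G452" and "G136" are made up for illustration), the following gives the same values and level order, and shows why positional assignment with levels()<- is risky:

```r
library(forcats)

# Hypothetical course codes; the original input vector is not shown in the text.
x <- factor(c("G234", "G452", "G136"))

# fct_recode() takes new_name = "old_name" pairs, so each label is tied
# to the level it replaces and the level order is left unchanged.
y <- fct_recode(x, Physics = "G234", Math = "G452", Chemistry = "G136")
y

# Positional assignment matches names to the *sorted* levels
# ("G136", "G234", "G452"), so every label here lands on the wrong code.
bad <- x
levels(bad) <- c("Physics", "Math", "Chemistry")
bad
```

Because fct_recode() names each old level explicitly, a change in the underlying level order cannot silently shift the labels.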
For binned, ordinal data with levels out of order, use fct_relevel() to set the correct order.
```r
library(tibble)
library(forcats)  # provides fct_relevel()
library(ggplot2)

Births2015 <- tibble(
  MotherAge = c("15-19 years", "20-24 years", "25-29 years", "30-34 years",
                "35-39 years", "40-44 years", "45-49 years",
                "50 years and over", "Under 15 years"),
  Num = c(229.715, 850.509, 1152.311, 1094.693, 527.996,
          111.848, 8.171, 0.754, 2.5)
)

# Move "Under 15 years" to the front so the bins appear in age order
ggplot(Births2015, aes(fct_relevel(MotherAge, "Under 15 years"), Num)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(breaks = seq(0, 1250, 250)) +
  ggtitle("United States Births, 2015", subtitle = "in thousands") +
  theme_grey(16) +
  # labs() refers to the aesthetics, not the flipped display:
  # x is MotherAge and y is Num
  labs(x = "mother age", y = "count")
```
The following examples show three ways to use fct_relevel().

Using fct_relevel() to move levels to the beginning:

```
## [1] A B C move1 D E move2 F
## Levels: move1 move2 A B C D E F
```

Using fct_relevel() to move levels after an item (by position):

```
## [1] A B C move1 D E move2 F
## Levels: A B C D move1 move2 E F
```

Using fct_relevel() to move levels to the end:

```
## [1] A B C move1 D E move2 F
## Levels: A B C D E F move1 move2
```
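The calls behind these three outputs are not shown. The sketch below reproduces the printed level orders, assuming an input vector x reconstructed from the printed values; the after argument of fct_relevel() controls where the moved levels land.

```r
library(forcats)

# Assumed input, reconstructed from the printed values above.
x <- factor(c("A", "B", "C", "move1", "D", "E", "move2", "F"))

# Default (after = 0): the named levels go to the front.
to_front <- fct_relevel(x, "move1", "move2")

# after = 4: the named levels are placed after the 4th remaining level ("D").
after_4 <- fct_relevel(x, "move1", "move2", after = 4)

# after = Inf: the named levels go to the end.
to_end <- fct_relevel(x, "move1", "move2", after = Inf)

levels(to_front)
levels(after_4)
levels(to_end)
```

In all three cases only the levels change; the values of x stay in their original row order.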
If the row order is correct, use fct_inorder() to set the level order to the row order.
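As a small illustration of keeping the row order, the sketch below compares the default alphabetical levels with those produced by fct_inorder(). The size vector is a hypothetical example, not data from this chapter.

```r
library(forcats)

# Hypothetical data whose rows already appear in the intended order.
size <- factor(c("small", "medium", "large", "medium", "small"))

# Default factor levels are alphabetical.
levels(size)

# fct_inorder() orders levels by first appearance in the data.
size_inorder <- fct_inorder(size)
levels(size_inorder)
```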
Unbinned, nominal data should usually be sorted by frequency, which can be done with fct_infreq() (the default is decreasing order of frequency).
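A minimal sketch of frequency ordering on unbinned data, using a hypothetical color vector with one element per observation. Combining it with fct_rev() gives increasing order instead.

```r
library(forcats)

# Hypothetical unbinned data: one element per observation.
color <- factor(c("red", "blue", "red", "green", "red", "blue"))

# Most frequent level first: red (3), blue (2), green (1).
by_freq <- fct_infreq(color)
levels(by_freq)

# Reverse to get increasing frequency, e.g. for a flipped bar chart.
by_freq_rev <- fct_rev(by_freq)
levels(by_freq_rev)
```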
For binned, nominal data that should be sorted by frequency, use fct_reorder() with the count as the ordering variable. More generally, you can pass a summary function such as mean or median as .fun inside fct_reorder().
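The text refers to an example that reorders by count. A minimal sketch of that idea follows, with hypothetical data frames and column names (fruit, count, group, value); the second part shows a summary function passed as .fun.

```r
library(forcats)

# Hypothetical binned data: one row per category with a precomputed count.
binned <- data.frame(
  fruit = factor(c("apple", "banana", "cherry", "date")),
  count = c(12, 30, 7, 18)
)

# Order levels by count; .desc = TRUE puts the largest count first.
binned$fruit <- fct_reorder(binned$fruit, binned$count, .desc = TRUE)
levels(binned$fruit)

# With several rows per level, .fun summarises the values within each level.
scores <- data.frame(
  group = factor(c("a", "a", "b", "b", "c", "c")),
  value = c(1, 9, 4, 4, 8, 10)
)
by_mean <- fct_reorder(scores$group, scores$value, .fun = mean)
levels(by_mean)  # group means: a = 5, b = 4, c = 9
```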
For prominent NA bars that should not be eliminated, use fct_explicit_na(x) to turn the NAs into a real factor level. Reversing the level order with fct_rev(x) alone doesn’t help, because NA is not a level until it is made explicit.
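```r
library(dplyr)
library(forcats)
library(ggplot2)

df <- data.frame(
  temperature = factor(c("cold", "warm", "hot", NA)),
  count = c(15, 5, 22, 12)
)

# Make NA a real level, then place it explicitly with fct_relevel().
# Note: fct_explicit_na() is deprecated as of forcats 1.0.0;
# fct_na_value_to_level() is its replacement.
df %>%
  mutate(temperature = fct_explicit_na(temperature, "NA") %>%
           fct_relevel("NA", "hot", "warm", "cold")) %>%
  ggplot(aes(x = temperature, y = count)) +
  geom_col() +
  coord_flip() +
  theme_grey(16) +
  labs(x = "temperature")
```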
When analyzing a categorical variable, the first step is always to decide whether it is ordinal or nominal.
fct_recode(x, …) – change names of levels
fct_inorder(x) – set level order of x to row order
fct_relevel(x, …) – manually set the order of levels of x
fct_reorder(x, y) – reorder x by y
fct_infreq(x) – order the levels of x by decreasing frequency
fct_rev(x) – reverse the order of factor levels of x
fct_explicit_na(x) – turn NAs into a real factor level
Sometimes you want to convert a continuous variable into a categorical variable. For example, you might want to assign grades to the final scores of a course. In the following example, we randomly generate a data set of test scores and assign grades based on some thresholds. To do so, we apply the function
cut. (You can similarly use
```
##   testscore grade
## 1        94     A
## 2        89     B
## 3        74     C
## 4        86     B
## 5        76     C
## 6        89     B
```
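The code that produced this table is not shown. A sketch of the cut() step follows, using the six scores from the printed output rather than random draws. The cutoffs (70, 80, 90) and the extra "D" label are assumptions, chosen so that the result is consistent with the grades shown.

```r
# Scores copied from the printed output above (not randomly generated here).
testscore <- c(94, 89, 74, 86, 76, 89)

# Assumed thresholds: below 70 is D, 70-79 is C, 80-89 is B, 90 and up is A.
# right = FALSE makes each interval closed on the left, e.g. [80, 90).
grade <- cut(
  testscore,
  breaks = c(-Inf, 70, 80, 90, Inf),
  labels = c("D", "C", "B", "A"),
  right = FALSE
)

scores <- data.frame(testscore, grade)
scores
```

cut() returns a factor whose levels follow the order of the breaks ("D", "C", "B", "A" here), so the result can be reordered with the forcats functions described above.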