mutate()
Another common data wrangling task is creating a new variable with the mutate() function. When creating a new variable, you provide a name for the new column and a method for calculating the new value.
Continuing with the penguins data from palmerpenguins, the code below creates a new column containing each penguin's body mass in kilograms:
penguins %>%
  mutate(body_mass_kg = body_mass_g / 1000)
## # A tibble: 344 × 9
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## 7 Adelie Torgersen 38.9 17.8 181 3625
## 8 Adelie Torgersen 39.2 19.6 195 4675
## 9 Adelie Torgersen 34.1 18.1 193 3475
## 10 Adelie Torgersen 42 20.2 190 4250
## # … with 334 more rows, and 3 more variables: sex <fct>, year <int>,
## # body_mass_kg <dbl>
The syntax for mutating a column follows the pattern mutate(new_column_name = expression), where expression is an instruction for computing the new values, typically from values in existing columns. In the above example, new_column_name is body_mass_kg, and expression is body_mass_g / 1000.
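One point the pattern above leaves implicit is that mutate() returns a new tibble rather than changing the data in place. The sketch below illustrates this on a small hand-built tibble; the name toy and its three rows (copied from the first complete rows of the output above) are assumptions made for illustration, not part of palmerpenguins.

```r
library(dplyr)

# Hypothetical toy data: three body masses taken from the output shown above
toy <- tibble(
  species = c("Adelie", "Adelie", "Adelie"),
  body_mass_g = c(3750L, 3800L, 3250L)
)

# mutate() returns a new tibble; assign the result to keep the new column
toy_kg <- toy %>%
  mutate(body_mass_kg = body_mass_g / 1000)

names(toy)     # unchanged: "species" "body_mass_g"
names(toy_kg)  # "species" "body_mass_g" "body_mass_kg"
```

If the result is not assigned to a name, it is only printed and the new column is lost.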
mutate() can also overwrite an existing column. Suppose you realized that all flipper measurements were 4 mm short of the true length; you could use mutate() with the existing column name to adjust the data:
penguins %>%
  mutate(flipper_length_mm = flipper_length_mm + 4)
## # A tibble: 344 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <dbl> <int>
## 1 Adelie Torgersen 39.1 18.7 185 3750
## 2 Adelie Torgersen 39.5 17.4 190 3800
## 3 Adelie Torgersen 40.3 18 199 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 197 3450
## 6 Adelie Torgersen 39.3 20.6 194 3650
## 7 Adelie Torgersen 38.9 17.8 185 3625
## 8 Adelie Torgersen 39.2 19.6 199 4675
## 9 Adelie Torgersen 34.1 18.1 197 3475
## 10 Adelie Torgersen 42 20.2 194 4250
## # … with 334 more rows, and 2 more variables: sex <fct>, year <int>
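In the output above, flipper_length_mm is now shown as <dbl> rather than <int>. This happens because the literal 4 is a double in R, so adding it to an integer column produces a double. The sketch below, on a hypothetical one-column tibble named toy, compares adding 4 with adding the integer literal 4L; the column names flipper_dbl and flipper_int are made up for the comparison.

```r
library(dplyr)

# Hypothetical toy data: three flipper lengths taken from the output above
toy <- tibble(flipper_length_mm = c(181L, 186L, 195L))

adjusted <- toy %>%
  mutate(
    flipper_dbl = flipper_length_mm + 4,   # 4 is a double, so the result is a double
    flipper_int = flipper_length_mm + 4L   # 4L is an integer, so the result stays an integer
  )

class(adjusted$flipper_dbl)  # "numeric"
class(adjusted$flipper_int)  # "integer"
```

Either version gives the same corrected lengths; the choice only matters if later code expects an integer column.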
You can also combine mutate() with other functions. The code below calculates the total body mass of all penguins on each island. Because mutate() keeps every row, each penguin's row receives the total for its island.
penguins %>%
  group_by(island) %>%
  mutate(island_penguin_mass = sum(body_mass_g, na.rm = TRUE))
## # A tibble: 344 × 9
## # Groups: island [3]
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## 7 Adelie Torgersen 38.9 17.8 181 3625
## 8 Adelie Torgersen 39.2 19.6 195 4675
## 9 Adelie Torgersen 34.1 18.1 193 3475
## 10 Adelie Torgersen 42 20.2 190 4250
## # … with 334 more rows, and 3 more variables: sex <fct>, year <int>,
## # island_penguin_mass <int>
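To make the behavior of a grouped mutate() concrete, the sketch below compares it with summarise() on a small hypothetical tibble. The Torgersen values come from the output above, while the Biscoe values are made up for illustration. The ungroup() call at the end is an optional extra step, not part of the original example, that removes the grouping so later steps are not computed per island.

```r
library(dplyr)

# Hypothetical toy data: Torgersen masses from the output above, Biscoe masses invented
toy <- tibble(
  island = c("Torgersen", "Torgersen", "Biscoe", "Biscoe"),
  body_mass_g = c(3750L, 3800L, 4500L, NA)
)

# Grouped mutate(): keeps all four rows and repeats each island's total
with_totals <- toy %>%
  group_by(island) %>%
  mutate(island_penguin_mass = sum(body_mass_g, na.rm = TRUE)) %>%
  ungroup()

# Grouped summarise(): collapses the data to one row per island
island_totals <- toy %>%
  group_by(island) %>%
  summarise(island_penguin_mass = sum(body_mass_g, na.rm = TRUE))

nrow(with_totals)    # 4
nrow(island_totals)  # 2
```

Use mutate() when the group total is needed alongside each observation, for example to compute each penguin's share of its island's mass; use summarise() when only the per-island totals are needed.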
It may be useful to give R rules for creating new variables. For example, the code below uses the case_when() function to divide all penguins into flipper length categories, based on the mean flipper length of the dataset (approximately 201 mm). You can think of case_when() as a multilevel if statement. Essentially, the case_when() function in the code below is saying that, for each observation (row), when the variable flipper_length_mm meets a certain condition (greater than, equal to, or less than 201 mm), the new column should contain the respective category: "long", "average", or "short".
penguins %>%
  mutate(flipper_category =
           case_when(flipper_length_mm > 201 ~ "long",
                     flipper_length_mm == 201 ~ "average",
                     flipper_length_mm < 201 ~ "short"))
## # A tibble: 344 × 9
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## 7 Adelie Torgersen 38.9 17.8 181 3625
## 8 Adelie Torgersen 39.2 19.6 195 4675
## 9 Adelie Torgersen 34.1 18.1 193 3475
## 10 Adelie Torgersen 42 20.2 190 4250
## # … with 334 more rows, and 3 more variables: sex <fct>, year <int>,
## # flipper_category <chr>
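Row 4 of the data has a missing flipper length. A comparison with NA matches none of the three conditions, so case_when() returns NA for that row. The sketch below shows one way to give such rows an explicit label by adding a final catch-all condition, TRUE. The tibble toy, the value 210, and the label "unknown" are assumptions chosen for illustration.

```r
library(dplyr)

# Hypothetical toy data: one short, one average, one long, and one missing flipper length
toy <- tibble(flipper_length_mm = c(181L, 201L, 210L, NA))

categorised <- toy %>%
  mutate(
    flipper_category = case_when(
      flipper_length_mm > 201  ~ "long",
      flipper_length_mm == 201 ~ "average",
      flipper_length_mm < 201  ~ "short",
      TRUE                     ~ "unknown"  # catch-all for rows matching no earlier rule
    )
  )

categorised$flipper_category  # "short" "average" "long" "unknown"
```

Conditions are checked in order and the first match wins, so the catch-all must come last. Without it, the last row would be NA instead of "unknown".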