Deciphering error messages in R, especially the first time you see them, can be perplexing and time-consuming. Here our team offers some tips to hopefully make your debugging experiences a bit less painful.
First, we recommend looking for common, small errors.
Start by checking your code for:
Capitalization (R is case-sensitive, so data and Data are distinct.)
%>%, used to connect code: Do you have the connections you need?
ggplot() code: Do you have the correct number of plus signs (+, used to connect plotting code)?
Sometimes, your R session can get a bit clogged up; in that case, saving your work and restarting RStudio may fix the problem. (Yes, sometimes “try turning it off and on again” is the solution.)
If you have checked the above list and you are still encountering an error, or your code isn’t functioning properly, consider the below four approaches for further troubleshooting.
Reading the details behind a function can help with troubleshooting. While you can also look up documentation on the internet, most R packages and functions have help files that can be accessed through the R Console.
Once you have loaded a package into your environment, you can type ?function_name into the console and hit enter/return. This will load the help file for that function in the RStudio Help panel, which contains several useful pieces of information, such as the function's usage, its arguments, and examples.
For example, running the following code will bring up the help file for the mean() function:
?mean
A fairly common mistake when using the mean() function is to run something like this:
mean(1, 2, 3)
## [1] 1
The above code will not produce an error, but it returns “1”, which we know is not the mean of 1, 2, and 3. The help file specifies that all of the values to be averaged should be in the first argument (before any commas); in the mean() help documentation, you can see this in both the “Usage” and “Examples” sections. The correct code uses the c() function to combine 1, 2, and 3 into one argument:
mean(c(1, 2, 3))
## [1] 2
The above code works properly, returning a mean value of 2.
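As a short sketch of why the first call silently returns 1: assuming plain numbers are handled by base R's default method, mean.default(), the extra values 2 and 3 are matched to that method's other arguments (trim and na.rm) rather than being treated as data. The object names wrong and right below are only for illustration.

```r
# Show the arguments the default mean() method accepts:
# only x holds the data; trim and na.rm are options.
args(mean.default)

wrong <- mean(1, 2, 3)     # only the first value, 1, is treated as the data
right <- mean(c(1, 2, 3))  # all three values are passed together as one vector

print(wrong)
print(right)
```

Comparing the two results side by side is a quick way to confirm that the values really are reaching the function as a single argument.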
A fairly common error message is something along the lines of “argument is not ____”, “non- ____ argument to a ___ function”, or “NAs introduced by coercion”.
These errors often have to do with an argument or variable being the wrong type. For example, you might be trying to perform a mathematical operation (e.g. calculating the mean) on a character/string variable (e.g. text data, or numbers stored as text). There are a couple of good ways to check variable types:
Use class() to check the types of vectors and values. For example, running the following chunk returns “numeric”, “character”, and “logical”:
class(1)
## [1] "numeric"
class("a")
## [1] "character"
class(NA)
## [1] "logical"
You can use the same function to check the class of a variable in a dataset, using the following syntax:
class(data_frame_name$variable_name)
If you want an overview of all the variables in your dataset, you can use the str() (structure) function:
str(data_frame_name)
The above code will give a breakdown of all the variables and their corresponding types.
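To make this concrete, here is a minimal sketch using a small made-up data frame. The name example_df and its columns are hypothetical and exist only for illustration.

```r
# A tiny, made-up data frame with three differently typed columns.
example_df <- data.frame(
  id     = 1:3,
  name   = c("a", "b", "c"),
  passed = c(TRUE, FALSE, NA),
  stringsAsFactors = FALSE  # keep text columns as character in all R versions
)

# One line per variable, showing its type and first few values.
str(example_df)

# The class of every column, collected into a named character vector.
column_classes <- sapply(example_df, class)
print(column_classes)
```

The sapply() call is an optional extra: it gives a compact list of types when a dataset has too many columns to scan comfortably in the str() output.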
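Some common variable types you will see are numeric (shown by str() as num), integer (int), character (chr), logical (logi), and factor (Factor).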
In order to fix an error with variable types, you will need to know how to change the type (or class) of a variable. The suite of as.* functions is quite useful. Generally the syntax is as.desired_type(object_that_needs_converting).
For example, if a column in a dataset has numbers, but R is treating them as characters (type “chr”) instead, use the function as.numeric():
numeric_var <- as.numeric(character_var)
Inside of a dataframe, the code may look something like this:
new_data_frame <- old_data_frame %>%
  mutate(numeric_var = as.numeric(character_var))
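The sketch below shows what the conversion looks like on a small vector, including what happens when an entry cannot be read as a number. The values are made up for illustration, and suppressWarnings() is used here only to keep the output quiet; without it, R would warn “NAs introduced by coercion”.

```r
# Hypothetical example: numbers stored as text, plus one entry that is not a number.
character_var <- c("10", "2.5", "seven")
print(class(character_var))

# Entries that cannot be converted become NA (normally with a warning).
numeric_var <- suppressWarnings(as.numeric(character_var))
print(class(numeric_var))

# Check which entries failed to convert.
print(is.na(numeric_var))
```

Checking is.na() after a conversion is a simple way to spot values that need cleaning before you compute anything with them.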
This ties in nicely to the previous section, which detailed how to access the help files of a function. Help files will almost always describe what variable types are expected by a function. If you’re trying to troubleshoot a function that isn’t working, checking the help file and your variable types is a good place to start.
A couple of good practices for locating the problem in a chunk of code:
Read error messages closely for clues. While not all error messages are helpful, they will often specify which function or operation is causing your code to fail.
Look behind you using traceback(). After getting an error, run the function traceback() in the console; this will show the steps that led up to the error. This can be especially useful when a long chunk of code is throwing an error.
Add checkpoints to your code via print() statements. By adding lines like print("got to here!") throughout your code, you may be able to identify exactly where your code is failing. (Note: print() statements will not fix errors, but may prove helpful in isolating errors.)
A variation on print() statements: if you are creating objects in your code, you can print those objects when they are created to check if the output matches what you expect.
Use # at the start of a line to comment out lines of your code. (When lines are commented out, R ignores that code.) If your code runs successfully after you comment out a line, there is a good chance the problem is happening in that line of code.
Sometimes errors can arise when you compute summary statistics on a dataset. For example, the code below attempts to compute the mean of bill length and body mass in the penguins dataset.
penguins %>%
  summarize(mean_bill_length = mean(bill_length_mm),
            mean_body_mass = mean(body_mass_g))
## # A tibble: 1 x 2
##   mean_bill_length mean_body_mass
##              <dbl>          <dbl>
## 1               NA             NA
The resulting dataset shows NA values where means should be. Why? There are NAs in the columns used for this calculation, and R cannot calculate the mean of missing values. Because R is by default cautious about missing values, a single NA is enough to disrupt calculations. To change the way R approaches missing values, use the argument na.rm = TRUE. This tells the mean() function to remove the NAs before making the calculation. Note the change when na.rm is incorporated into the code:
penguins %>%
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
            mean_body_mass = mean(body_mass_g, na.rm = TRUE))
## # A tibble: 1 x 2
##   mean_bill_length mean_body_mass
##              <dbl>          <dbl>
## 1             43.9          4202.
Excluding missing values from calculations can be very useful when computing summary statistics, and na.rm can be included in functions like mean(), median(), sum(), min(), and max(), among others. If your dataset includes missing values and you are encountering errors in your code, double-check to make sure R is handling NAs as you expect.
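One way to do that double-check is sketched below, using a short made-up vector rather than the penguins data. The name bill_lengths and its values are hypothetical.

```r
# Hypothetical measurements with one missing value.
bill_lengths <- c(39.1, NA, 40.3)

# Count the missing values before computing anything.
n_missing <- sum(is.na(bill_lengths))
print(n_missing)

# With the default settings, a single NA makes the result NA.
mean_default <- mean(bill_lengths)
print(mean_default)

# With na.rm = TRUE, the NA is dropped before the mean is taken.
mean_without_na <- mean(bill_lengths, na.rm = TRUE)
print(mean_without_na)
```

Counting the NAs first makes it clear how many observations are being left out when na.rm = TRUE is used, which is worth knowing before you report a summary statistic.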