Troubleshooting

Deciphering error messages in R, especially the first time you see them, can be perplexing and time-consuming. Here our team offers some tips to hopefully make your debugging experiences a bit less painful.

First steps

First, we recommend looking for common, small errors.

Start by checking your code for:

Misspellings: Are words, functions, and names of datasets and variables spelled correctly? (Note that objects in R are case-sensitive; data and Data are distinct.)
Commas: Any missing, any extra?
Parentheses / brackets: Any unmatched? (This could be too few, or too many)
Pipes %>%, used to connect code: Do you have the connections you need?
Within ggplot() code: Do you have the correct number of plus signs +? (used to connect plotting code)
Packages and data: Have you loaded everything you are using? (If not, you may get an error like “could not find function” or “object not found”)
File paths: When loading in data or objects, are you asking R to look in the correct place?
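As a small illustration of the first and sixth items in the checklist, the sketch below triggers an “object not found” error and a “could not find function” error on purpose. The names my_data, My_data, and not_a_real_function are made up for this example; tryCatch() is used only so the messages can be inspected without stopping the script.

```r
# Hypothetical object name, used only for illustration.
my_data <- c(1, 2, 3)

# R is case-sensitive: `My_data` was never created, so this raises an error.
# tryCatch() captures the error so its message can be inspected.
object_msg <- tryCatch(
  mean(My_data),
  error = function(e) conditionMessage(e)
)
object_msg

# Calling a function that is misspelled, or whose package is not loaded,
# raises a different error.
function_msg <- tryCatch(
  not_a_real_function(my_data),
  error = function(e) conditionMessage(e)
)
function_msg

# exists() is a quick way to check whether a name is defined.
exists("my_data")
exists("My_data")
```

Both messages name the object or function R could not find, which is usually the quickest clue to a misspelling or a missing library() call.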

Sometimes, your R session can get a bit clogged up; in that case, saving your work and restarting RStudio may fix the problem. (Yes, sometimes “try turning it off and on again” is the solution.)

If you have checked the above list and you are still encountering an error, or your code isn’t functioning properly, consider the four approaches below for further troubleshooting.

Getting information about a function in RStudio

Reading the details behind a function can help with troubleshooting. While you can also look up documentation on the internet, most R packages and functions have help files that can be accessed through the R Console.

Once you have loaded a package into your environment, you can type ?function_name into the console and hit enter/return. This will load the help file for that function in the RStudio Help panel, containing several useful pieces of information:

Usage: how the function is called, including its arguments and their default values
Arguments: a more detailed breakdown of the function’s inputs/arguments. This is particularly useful for finding out what types of objects the function accepts as arguments
Value: A description of what the output of the function will be
Examples: Some code that shows off a common way to use the function

For example, running the following code will bring up the help file for the mean() function:

?mean

A fairly common problem with using the mean() function is to run something like this:

mean(1, 2, 3)

## [1] 1

The above code will not produce an error, but it returns “1”, which we know is not the mean of 1, 2, and 3. The help file specifies that all of the values to be averaged must be supplied in the first argument (before any commas), so only the 1 is treated as data here. (In the mean() help documentation, you can see this in both the “Usage” and the “Examples” sections.) The correct code uses the c() function to combine 1, 2, and 3 into one argument:

mean(c(1, 2, 3))

## [1] 2

The above code works properly, returning a mean value of 2.
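To see why the first call quietly returns 1, it can help to look at the argument list directly. The sketch below assumes plain numeric input, for which mean() hands off to mean.default(); the variable names arg_names, wrong, and right are made up for this example.

```r
# For plain numbers, mean() dispatches to mean.default().
# Its arguments, in order, can be listed with formals().
arg_names <- names(formals(mean.default))
arg_names

# Only the first value (1) is treated as the data `x`; the 2 and 3 are
# matched by position to the next two arguments instead.
wrong <- mean(1, 2, 3)
wrong

# Combining the values with c() passes all three as `x`.
right <- mean(c(1, 2, 3))
right
```

Comparing the argument names with the order of the values in the call is often enough to spot this kind of silent mistake.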

Troubleshooting variable types

A fairly common error or warning message is something along the lines of:

“argument is not ____” or “non-____ argument to a ____ function” or “NAs introduced by coercion”

These errors often have to do with an argument or variable being the wrong type. For example, you might be trying to perform a mathematical operation (e.g. calculating the mean) on a character/string variable (e.g. text data, or numbers stored as text). There are a couple of good ways to check variable types:

Use class() to check the types of vectors and values. For example, running the following chunk returns “numeric”, “character”, and “logical”:

class(1)

## [1] "numeric"

class("a")

## [1] "character"

class(NA)

## [1] "logical"

You can use the same function to check the class of a variable in a dataset, using the following syntax:

class(data_frame_name$variable_name)

If you want an overview of all the variables in your dataset, you can use the str() (structure) function:

str(data_frame_name)

The above code will give a breakdown of all the variables and their corresponding types.

Some common variable types you will see are:

“int” (integer)
“num” (numeric)
“chr” (character)
“Factor with [x] levels”
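As a rough sketch of what this looks like in practice, the example below builds a small made-up data frame (survey and its columns are hypothetical) containing one column of each type listed above, then inspects it with str() and class().

```r
# A small, made-up data frame used only for illustration.
survey <- data.frame(
  id = 1:3,                               # integer
  height_cm = c(170.5, 165.2, 180.0),     # numeric
  name = c("Ana", "Ben", "Cy"),           # character
  group = factor(c("a", "b", "a")),       # factor with 2 levels
  stringsAsFactors = FALSE                # keep `name` as character in older R
)

# Overview of every column and its type.
str(survey)

# The class of each column, one at a time.
col_classes <- sapply(survey, class)
col_classes
```

The abbreviations printed by str() (int, num, chr, Factor) correspond to the full class names returned by class().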

In order to fix an error with variable types, you will need to know how to change the type (or class) of a variable. The suite of as.* functions is quite useful. Generally the syntax is as.desired_type(object_that_needs_converting).

For example, if a column in a dataset has numbers, but R is treating them as characters (type “chr”) instead, use the function as.numeric():

numeric_var <- as.numeric(character_var)

Inside a data frame, the code may look something like this:

new_data_frame <- old_data_frame %>% 
  mutate(numeric_var = as.numeric(character_var))

This ties in nicely to the previous section, which detailed how to access the help files of a function. Help files will almost always describe what variable types are expected by a function. If you’re trying to troubleshoot a function that isn’t working, checking the help file and your variable types is a good place to start.
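A minimal sketch of the conversion described above, using made-up values: the vector below stores numbers as text, plus one entry that is not a number at all. tryCatch() is used only to capture the warning text so it can be inspected.

```r
# Numbers stored as text, plus one entry that cannot be converted.
character_var <- c("1", "2.5", "three")
class(character_var)

# mean() on character data does not work; capture the warning it raises.
mean_warning <- tryCatch(
  mean(character_var),
  warning = function(w) conditionMessage(w)
)
mean_warning

# as.numeric() converts what it can. "three" cannot be converted, so it
# becomes NA and R raises a warning (suppressed here to keep output short).
numeric_var <- suppressWarnings(as.numeric(character_var))
numeric_var
class(numeric_var)

# Check which entries failed to convert.
failed <- character_var[is.na(numeric_var)]
failed
```

Checking for new NAs after a conversion is a cheap way to find values, such as typos or stray text, that need cleaning before analysis.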

Locating the problem in a chunk of code

A couple of good practices for locating the problem in a chunk of code:

Read error messages closely for clues. While not all error messages are helpful, they will often specify which function or operation is causing your code to fail.
Look behind you using traceback(). After getting an error, run the function traceback() in the console; this will show the steps that led up to the error. This can be especially useful when a long chunk of code is throwing an error.
Add checkpoints to your code via print() statements. By adding lines like print("got to here!") throughout your code, you may be able to identify exactly where your code is failing. (Note: print() statements will not fix errors, but may prove helpful in isolating them.)

A variation on print() statements: if you are creating objects in your code, you can print those objects when they are created to check if the output matches what you expect.

Isolate code through comments. Depending on what type of code you’re writing, you can place a # at the start of a line to comment out lines of your code. (When lines are commented out, R ignores that code.) If your code runs successfully after you comment out a line, there is a good chance the problem is happening in that line of code.
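The checkpoint idea can be sketched as follows. The function summarise_values() is hypothetical and contains a deliberate bug; tryCatch() is used here only so the error message can be captured and the script keeps running.

```r
# Hypothetical function with a deliberate bug, used only for illustration.
summarise_values <- function(x) {
  print("checkpoint 1: starting")
  x <- as.numeric(x)
  print(paste("checkpoint 2: x is now", class(x)))
  total <- sum(x)
  print("checkpoint 3: about to add the label")
  total + "cm"  # bug: adding text to a number
}

# In an interactive session you could call summarise_values() directly and
# then run traceback() after the error. Here the error message is captured
# instead.
error_message <- tryCatch(
  summarise_values(c("1", "10", "100")),
  error = function(e) conditionMessage(e)
)
error_message
```

All three checkpoints print before the error, which narrows the problem down to the code after the last checkpoint. Commenting out that final line and rerunning would confirm it.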

Being aware of NAs

Sometimes errors can arise when you compute summary statistics on a dataset. For example, the below code attempts to compute the mean of bill length and body mass in the penguins dataset.

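# Assumption: penguins comes from the palmerpenguins package
library(dplyr)
library(palmerpenguins)

penguins %>%
  summarize(mean_bill_length = mean(bill_length_mm),
            mean_body_mass = mean(body_mass_g))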

## # A tibble: 1 x 2
##   mean_bill_length mean_body_mass
##              <dbl>          <dbl>
## 1               NA             NA

The resulting dataset shows NA values where means should be. Why? There are NAs in the columns used for this calculation, and R cannot calculate the mean of missing values. Because R is cautious about missing values by default, a single NA is enough to make the result NA. To change the way R approaches missing values, use the argument na.rm = TRUE. This tells the mean() function to remove the NAs before making the calculation. Note the change when na.rm is incorporated into the code:

penguins %>%
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
            mean_body_mass = mean(body_mass_g, na.rm = TRUE))

## # A tibble: 1 x 2
##   mean_bill_length mean_body_mass
##              <dbl>          <dbl>
## 1             43.9          4202.

Excluding missing values from calculations can be very useful when computing summary statistics, and na.rm can be included in functions like mean(), median(), sum(), min(), and max(), among others. If your dataset includes missing values and you are encountering errors or unexpected results in your code, double-check that R is handling NAs as you expect.
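One way to double-check is to count the missing values before summarizing. The sketch below uses a short made-up vector (bill_length and its values are hypothetical, not taken from the penguins data) so that it runs without any packages.

```r
# Made-up measurements with one missing value.
bill_length <- c(39.1, 39.5, NA, 36.7)

# How many values are missing?
n_missing <- sum(is.na(bill_length))
n_missing

# By default, a single NA makes the result NA.
mean_default <- mean(bill_length)
mean_default

# With na.rm = TRUE, the NA is dropped before the calculation.
mean_no_na <- mean(bill_length, na.rm = TRUE)
mean_no_na

# The same argument works in other summary functions.
max_no_na <- max(bill_length, na.rm = TRUE)
max_no_na
```

Knowing how many values were dropped also helps when reporting results, since a mean based on three observations is not the same as one based on four.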