Data @ Reed

Working with flat files

Reading plain-text tables

There are two common ways to read plain-text tables (also called “flat files”): base R and the readr package. readr takes an extra setup step (running library(readr) or calling functions as readr::function()), but offers some benefits over base R, including:

  • Faster run time (not an issue unless you have some big files)
  • Automatically parses some column types, like datetimes
  • Doesn’t automatically convert strings to factors (base R did this by default before version 4.0.0)

Here’s a table listing some of the functions that you can use to read files, based on the type of file.

Type of file | Base R code | readr code
CSV (comma-separated values) | read.csv("file.csv") | read_csv("file.csv")
TSV (tab-separated values) | read.delim("file.txt", sep = "\t") | read_tsv("file.txt")
Other character-delimited file | read.delim("file.txt", sep = ";") | read_delim("file.txt", delim = ";")
Fixed-width text file (FWF) | read.fwf("file.txt", widths = c(5,10,9,3,1,1)) | read_fwf("file.txt", col_positions = fwf_widths(c(5,10,9,3,1,1)))
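
As a minimal sketch of the base R readers in the table above, the following writes three small sample files to a temporary directory and reads them back. The file contents, column names, and widths are made up for illustration. Note that the base R readers expect sep to be a single character, and that strip.white = TRUE is passed here to drop the padding around fixed-width text fields.

```r
# Sample files written to a temporary directory; all data here is made up.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,Ana,90.5", "2,Ben,78"), csv_path)

semi_path <- tempfile(fileext = ".txt")
writeLines(c("id;name;score", "1;Ana;90.5", "2;Ben;78"), semi_path)

fwf_path <- tempfile(fileext = ".txt")
# Each line: a 5-character id followed by a name padded to 10 characters.
writeLines(sprintf("%05d%-10s", 1:2, c("Ana", "Ben")), fwf_path)

# Comma-separated file
csv_data <- read.csv(csv_path, stringsAsFactors = FALSE)

# Semicolon-delimited file (sep must be a single character in base R)
semi_data <- read.delim(semi_path, sep = ";", stringsAsFactors = FALSE)

# Fixed-width file: widths are given in characters, one per column
fwf_data <- read.fwf(
  fwf_path,
  widths = c(5, 10),
  col.names = c("id", "name"),
  strip.white = TRUE,
  stringsAsFactors = FALSE
)

str(csv_data)
str(fwf_data)
```

With readr installed, the same files could be read with read_csv(), read_delim(), and read_fwf() as listed in the table.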

To use the readr functions, either run library(readr) first or call them with the readr:: prefix (for example, readr::read_csv()). With the base R functions in R versions before 4.0.0, you’ll most likely want to include the argument stringsAsFactors = FALSE inside the function to simplify the data wrangling steps that follow. From R 4.0.0 onward, this is already the default.
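
The effect of stringsAsFactors can be seen in a small sketch like the one below. The sample file and its column names are hypothetical, and the readr part only runs if the package happens to be installed.

```r
# A tiny made-up CSV file in a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,group", "Ana,a", "Ben,b"), csv_path)

# Same file, read with and without string-to-factor conversion.
as_strings <- read.csv(csv_path, stringsAsFactors = FALSE)
as_factors <- read.csv(csv_path, stringsAsFactors = TRUE)

class(as_strings$group)  # "character"
class(as_factors$group)  # "factor"

# readr keeps strings as character columns without any extra argument.
if (requireNamespace("readr", quietly = TRUE)) {
  with_readr <- suppressMessages(readr::read_csv(csv_path))
  print(class(with_readr$group))
}
```

Setting the argument explicitly, as above, keeps the result the same regardless of which R version runs the script.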

Writing plain-text tables

Writing plain-text tables can mostly be done with base R functions. Fixed-width files require a separate package (gdata) and make column boundaries less obvious than delimited files do, so stick to the CSV format when possible.

CSV: write.csv(object, "file_name", row.names = FALSE)

TSV: write.table(object, "file_name", sep = "\t", row.names = FALSE)

Other character delimited: write.table(object, "file_name", sep = ";", row.names = FALSE)

Fixed-width: gdata::write.fwf(object, "file_name")

The FWF option automatically determines column widths from the data, but you can override this with the function’s arguments (for example, width).
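
A round trip with the base R writers might look like the sketch below. The data frame and file paths are invented for illustration. Passing row.names = FALSE keeps R's row numbers out of the file, so the header lines up with the data columns.

```r
# A small made-up data frame to write out.
scores <- data.frame(
  id = 1:3,
  name = c("Ana", "Ben", "Cy"),
  score = c(90.5, 78, 84.25),
  stringsAsFactors = FALSE
)

csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".txt")

# Write the same table as CSV and as TSV, without row names.
write.csv(scores, csv_path, row.names = FALSE)
write.table(scores, tsv_path, sep = "\t", row.names = FALSE)

# Read both files back and compare them with the original.
csv_back <- read.csv(csv_path, stringsAsFactors = FALSE)
tsv_back <- read.delim(tsv_path, stringsAsFactors = FALSE)

all.equal(scores, csv_back)
all.equal(scores, tsv_back)
```

Reading a file back right after writing it is a cheap way to confirm that the delimiter and header came out as intended.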