Tabular Data Best Practices
Spreadsheets are widely used in research for data entry and storage. Creating a well-formed spreadsheet makes your data shareable and reusable later. Below are best practices to consider.
Create Good Variable Names
- Make them short but meaningful
- Do not use spaces, instead use underscores
- Do no use special characters & , % # ; * : ( ) ! @ $ ^ ~ ’ { } [ ] ? < > - + /
Enter Dates as Text
- Excel automatically reformats dates, which can then display differently on Windows and Macs
- Use the form YYYYMMDD (ISO 8601)
- Excel will treat it as a text field; sorting alphabetically will also sort chronologically
Use Consistent Codes for
- Subject/participant identifiers
- Categorical variables
- Missing values
Put One Item per Cell
- Break up multiple measurements into multiple cells
- Do not include units in the cell, put these in the column names or, better yet, a data dictionary
- Do not use fonts, italics, bold, or cell colors to represent data
- No empty cells, use a standard code to indicate missing data
Make It a Rectangle
- Put variables in columns and experimental unit/observation/measurement in rows
- Only use one row for variable names
- Only one rectangle per spreadsheet
Use Version Control to Track Changes
- Use a version control system like Git
- Upload files to Google Drive or the Open Science Framework for automatic version tracking
- At a minimum, use a suffix in the file name to represent the version number
- Example: ProjectName_Instrument_Condition_YYYYMMD_v01.txt
- Never call a file “final”
Getting Help
- When searching the Internet for help include your Excel version and operating system