Moving Data into Stata Using Stata
Stata has three major commands for importing data from other programs: infile, import delimited, and infix. In addition, if the program the data is currently in holds it in a table (spreadsheets count), copying and pasting into Stata's Data Editor will often work. The format of the data to be imported determines which command you ultimately use. All of the following commands should be typed into the Command window. These commands will generally refuse to run if you already have data in memory, which safeguards you from accidentally losing what you've been working so hard on; add the clear option if you really do want to replace the data in memory.
When your data is exported in a delimited ASCII text format (an option every spreadsheet program provides), you only need the import delimited command (formerly the insheet command). The import delimited command can figure out for itself whether the file is comma-delimited (.csv) or tab-delimited (.tab, .tsv, or .dat). If the variable names are not in the file, import delimited gives the variables generic names (v1, v2, v3, and so on), which you can change later with the rename command. If the variable names are included, Stata knows to read them from the top line. You can also specify names for the variables by listing the names you want in the command.
To import the file and let Stata name the variables (from the top line if the names are there, otherwise v1, v2, and so on), simply type import delimited using filename.extension
To import the whole thing with your own names, list the names between import delimited and using:
import delimited var1name var2name var3name using filename.extension
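As a minimal sketch, the lines below first write a tiny, hypothetical file survey.csv with no header row (the file name, variable names, and values are made up for illustration), then import it both ways. The varnames(nonames) option tells import delimited explicitly that the first row is data rather than names, and clear lets each import replace whatever is in memory.

```stata
* Create a small, hypothetical CSV file with no header row
file open fh using "survey.csv", write text replace
file write fh "1,34,52000" _n "2,29,61000" _n
file close fh

* Let Stata assign the names (v1, v2, v3), then rename one of them
import delimited using "survey.csv", varnames(nonames) clear
rename v1 id

* Or supply your own names between "import delimited" and "using"
import delimited id age income using "survey.csv", varnames(nonames) clear
describe
```

If your file does have names on its top line, varnames(1) tells Stata to read them from there.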
Stata variable names can be up to 32 characters long, but names longer than eight characters may be abbreviated when Stata displays the data (for example, in the output of list).
Note that if the file is not stored in Stata's current working directory, you will need to tell Stata where to find it. For example, if Stata's current working directory were my Documents folder (often the default) and the file I wanted were in a folder called "Data" inside it, I would type import delimited using Data/filename.extension
Alternatively, you can give the full path to the file, or change the working directory with the cd command. On case-sensitive file systems (such as those typically used on Linux), file paths are case sensitive, so if the folder is called "Data", telling Stata to look in data will not work. You can display the current working directory at any time by typing pwd; the Stata window also shows it, though exactly where depends on your version and operating system.
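A minimal sketch of the two usual ways to point Stata at a file outside the working directory. It assumes a subfolder named Data containing a file filename.csv; both are created here purely for illustration. Forward slashes work as folder separators in Stata on Windows as well as on macOS and Linux.

```stata
* Display the current working directory
pwd

* Illustrative setup: a "Data" subfolder containing a small CSV file
capture mkdir Data
file open fh using "Data/filename.csv", write text replace
file write fh "id,age" _n "1,34" _n
file close fh

* Option 1: give the path relative to the working directory (no leading slash)
import delimited using "Data/filename.csv", varnames(1) clear

* Option 2: change into the folder, import by file name, then change back
cd Data
import delimited using "filename.csv", varnames(1) clear
cd ..
```

A path that begins with a slash, such as /Data/filename.csv, is read from the root of the file system rather than from the working directory, which is why the relative form has no leading slash.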
If your data is less neatly arranged, you will need the infile command. Most often this will be the case if your data is in a raw (.raw) or text (.txt) format. In these cases, data values are usually separated only by white space (rather than commas or tabs), and the file may have other irregular formatting.
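The basic command is infile using filename.extension, but used exactly like that, Stata treats the file as a dictionary (see below). To read the data file itself, you need to tell Stata what it is dealing with by listing the variables between infile and using.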
Stata can read strings (such as names), but you must declare each string variable by putting a string type such as str20 before its name in the command, and any string value that contains spaces must have quote marks around it in the file. Additionally, Stata treats a period (.) or an empty pair of quotes ("") as a missing value; however, a blank space without quotes is simply a separator, so Stata skips over it and fills in the next value. On the plus side, this means that if an observation is broken up onto multiple lines, Stata will have no trouble reading it as a single observation.
To get Stata to read this sort of unformatted file, you would type infile varname1 varname2 ... varnamen using filename.extension
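As a sketch, the lines below write a small, hypothetical free-format file, people.raw (names and values invented for illustration), and read it back. The quoted names, the period marking a missing age, and the third observation split across two lines are all handled by the same infile command.

```stata
* Write a hypothetical free-format file; the third observation spans two lines
file open fh using "people.raw", write text replace
file write fh `""Jane Smith" 34 52000"' _n
file write fh `""John Doe" . 48000"' _n
file write fh `""Ann Lee""' _n
file write fh "29 61000" _n
file close fh

* Declare name as a 20-character string; age and income are numeric by default
infile str20 name age income using "people.raw", clear
list
```

The compound quotes (the backtick-quote and quote-apostrophe pairs) are only needed here so that file write can put literal quote marks into the file; they are not part of the data.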
Conversely, sometimes a text or raw file is organized into neat columns that a human can read without difficulty, even though, say, the string variables have no quote marks. For a computer, this sort of file can be much harder to read. For instance, if the first column were a string variable without quotes (such as names containing spaces), the second column, separated from it by spaces, were a number, and so on, and I simply typed infile using messedupdata.txt, Stata would not read the data correctly.
To tell Stata that your data is in neatly arranged columns, effectively teaching it how to read them, you will need to create a dictionary file.
Once your dictionary file is created, the command becomes infile using mydictionaryfile.dct
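Below is a minimal sketch of a dictionary for the messedupdata.txt example, assuming (hypothetically) that names occupy columns 1 to 12, ages columns 13 to 15, and incomes columns 16 to 21; the variable names and labels are also invented. Each _column() says where a variable starts, and the input formats %12s and %3f say how many columns to read as a string or as a number. Saved as mydictionaryfile.dct, it could look like this:

```stata
infile dictionary using messedupdata.txt {
    _column(1)  str12 name    %12s  "Respondent name"
    _column(13)       age     %3f   "Age in years"
    _column(16)       income  %6f   "Annual income"
}
```

With the dictionary saved, reading the data takes one line:

```stata
* messedupdata.txt (hypothetical) might contain fixed-column lines such as:
*   Jane Smith   34 52000
*   John Doe     41 48000
infile using mydictionaryfile.dct, clear
list
```

Because the dictionary fixes the columns, the unquoted name with a space in it is read as a single string rather than being split at the blank.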
To read more on these commands, look at their help files in Stata.