Dates, Times, & Stata
Data are often imported into Stata with dates stored as strings, such as 10/09/1986 or 9oct1986. While readable to a human, a string like this is of little use to a computer: Stata cannot sort it chronologically or do arithmetic with it. Stata therefore offers tools to turn these date strings into values that still display sensibly to humans but are stored in a numeric format that Stata can work with. Moving from string dates to Stata dates is a three-step process.
Before the process, though, a bit of the theory behind it. Stata's time is zeroed at 01jan1960 00:00:00.000. Every date or time can therefore be represented by how far it is from this origin, with earlier dates stored as negative numbers. One of the datetime formats even accounts for leap seconds. Stata will do this arithmetic for you. When given only a time, such as 1:26 p.m., Stata treats it as falling on 01jan1960 and counts the milliseconds since midnight. Because everything is stored as a number counted from a common origin, Stata can meaningfully subtract or add dates and times to calculate the elapsed time between events.
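The counting described above can be checked directly with display. This is a minimal sketch; the timestamps are made-up examples rather than values from the sample dataset.

```stata
* A minimal sketch of datetime arithmetic; the timestamps are made-up examples.
* clock() returns milliseconds since 01jan1960 00:00:00.000.
display %15.0f clock("01jan1960 00:00:00", "DMYhms")   // 0: the origin itself
display %15.0f clock("02jan1960 00:00:00", "DMYhms")   // 86400000: one day of milliseconds
display %15.0f clock("13:26", "hm")                    // 48360000: 1:26 p.m. on the origin day

* Elapsed time between two events, converted from milliseconds to hours
scalar start  = clock("10jan2008 09:05:00", "DMYhms")
scalar finish = clock("10jan2008 14:35:00", "DMYhms")
display (finish - start) / 3600000                     // 5.5
```

Dividing by 3600000 (milliseconds per hour) is what turns the raw difference into a readable unit.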
Stata offers nine time formats, which all count in slightly different ways. %tc uses 01jan1960 00:00:00.000 as its base and counts in milliseconds, ignoring leap seconds. %tC does the same thing but counts leap seconds. %td counts days since 01jan1960. %tw counts weeks since the first week of 1960, %tm is the equivalent in months, %tq in calendar quarters, and %th in half-years. Additionally, %ty simply stores the calendar year itself (2008 is stored as 2008), and %tg is a generic format whose origin and increment are yours to define.
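To see how the encodings differ, the sketch below takes one illustrative date, 10jan2008, and expresses it in several of the units above using Stata's conversion functions (wofd, mofd, qofd, hofd, yofd, cofd). The date is an arbitrary choice for demonstration.

```stata
* One date, several encodings (10jan2008 is an arbitrary example)
local d = date("10jan2008", "DMY")

display `d'                      // 17541 days since 01jan1960
display %td `d'                  // 10jan2008

display wofd(`d')                // 2497 weeks since 1960w1
display %tw wofd(`d')            // 2008w2

display mofd(`d')                // 576 months since 1960m1
display %tm mofd(`d')            // 2008m1

display qofd(`d')                // 192 quarters since 1960q1
display %tq qofd(`d')            // 2008q1

display hofd(`d')                // 96 half-years since 1960h1
display yofd(`d')                // 2008: %ty values are just the year

display %15.0f cofd(`d')         // 1515542400000 ms since 01jan1960 00:00:00.000
display %tc cofd(`d')            // 10jan2008 00:00:00
```

Because the same number means different things under different formats, a value should only be displayed with the format that matches how it was computed.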
Stata offers a variety of functions for converting string dates to the formats listed above. Using the generate command and the clock and date functions, almost any date or time string can be changed into a numeric version of itself. Once you have the date/time information in Stata as a string, the first step is to generate a new variable with the data converted to numbers. You should already be familiar with the generate command, so all that remains is to get to know the clock and date functions. They can look at strings like 10jan2009 or 14:35:00 or even 12/3/02 2:34pm and convert them into an all-number format. The generic command takes the following form: generate double [newvar name] = clock([old string varname], "[format of string]")
date takes the same basic syntax. Storing the result as a double matters for clock, because millisecond counts are too large for Stata's default float type to hold exactly. The table below from Stata outlines what arguments can go in the [format] section of the command and what they correspond to.
The first example in our sample dataset is a date with the day, then the month, then the year (e.g., 10jan2008). Since this first column has no time component, date will be used. The original variable in this dataset is named "datestr". I'll simply call the target variable "date". Finally, 10jan2008 is the day, the month, and then the year, represented by DMY according to the table above. Thus the command is generate double date = date(datestr, "DMY")
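A self-contained version of this step might look like the following. The two observations are typed in by hand as stand-ins for the sample dataset, which is not reproduced here, and the date function is used because the column has no time component.

```stata
* Stand-in data: two hand-typed day-month-year strings (not the tutorial's file)
clear
input str9 datestr
"10jan2008"
"03feb2009"
end

* Convert the strings to days since 01jan1960
generate double date = date(datestr, "DMY")

* Before formatting, the new variable shows raw day counts (17541 for 10jan2008)
list
```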
If you were to browse your data now, you would see the old string date variable, the two other string variables that have not been used yet, and a new variable called "date" holding numbers that do not yet look like dates. After another example or two, we will change those numbers into something we can read too.
The next example is just a time, in this case using 24-hour time without seconds. The original variable is "timestr" and the new variable will just be "time". I see from the table above that h=hour and m=minute, so my format is "hm". Since I'm dealing with a time, I use clock, not date, as the function. The final command is generate double time = clock(timestr, "hm")
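As a sketch of the same step on hand-typed data (the two times are illustrative, not taken from the sample dataset):

```stata
* Stand-in data: two 24-hour times without seconds
clear
input str5 timestr
"14:35"
"09:05"
end

* With no date in the string, clock() counts milliseconds from midnight on 01jan1960
generate double time = clock(timestr, "hm")

* Show the raw millisecond counts in full rather than in scientific notation
format time %15.0f
list    // 14:35 is stored as 52500000, 09:05 as 32700000
```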
The last example is a set of observations combining a date and a time on a 12-hour clock with inconsistent spacing. While this might look more daunting, it is really just a combination of the previous two processes. Since there is a time component, clock is used. The original variable is "datetimestr" and the target variable will just be called "datetime". The format is "MDYhm" and the full command is generate double datetime = clock(datetimestr, "MDYhm"). Note that if the year is written with only two digits (as in 12/3/02), Stata needs to be told the century, for example with "MD20Yhm"; otherwise the result is missing.
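The sketch below applies the same command to two hand-typed observations with deliberately uneven spacing. The values are invented for illustration and use four-digit years; the final line shows the century prefix that a two-digit year would need.

```stata
* Stand-in data: month/day/year plus a 12-hour time, spaced inconsistently
clear
input str22 datetimestr
"12/3/2002 2:34pm"
"1/15/2008   11:05 am"
end

* clock() reads the am/pm marker along with the hour and minute
generate double datetime = clock(datetimestr, "MDYhm")

* Check the result against datetime literals
assert datetime[1] == tc(03dec2002 14:34:00)
assert datetime[2] == tc(15jan2008 11:05:00)

* A two-digit year needs the century spelled out in the mask
display %tc clock("12/3/02 2:34pm", "MD20Yhm")   // 03dec2002 14:34:00
```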
Now if you browse the data, it should look like this:
All that remains is to tell Stata to display the new dates in a way you can understand. The following table is also taken from Stata and illustrates the formats you have to choose from. If you change your mind, there are ways to switch between the various formats. Simply select the format that corresponds to the function used after generate: %td for date and %tc for clock. The command is format [var] [style]. For the first example, this ends up being format date %td, the second example takes format time %tc, and the third format datetime %tc.
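Formatting changes only how a value is displayed, not what is stored. The sketch below, again on a single made-up observation, formats a datetime and then uses dofc to switch from the millisecond encoding to the daily one, which is one way of moving between formats.

```stata
* Stand-in data: one hand-typed datetime string
clear
input str20 datetimestr
"12/3/2002 2:34pm"
end

generate double datetime = clock(datetimestr, "MDYhm")
format datetime %tc              // displays as 03dec2002 14:34:00

* Switch encodings: dofc() converts milliseconds to days since 01jan1960
generate long dateonly = dofc(datetime)
format dateonly %td              // displays as 03dec2002

list
```

Applying %td directly to the millisecond variable would not work, since the number would be read as a count of days; the conversion function has to come first.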
Assuming that worked, the numeric data (in black) should match the original string data (in red) like so:
Here is a table from the help file that illustrates common string formats and how to convert them to numeric values:
For a more detailed explanation of dates and times in Stata, see help date or here (the same thing but on the internet). Additionally, the official Stata Data Management manual [D] has an entire chapter devoted to dates and times in Stata.