Reshaping datasets
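Depending on the goal of your analysis or visualization, you may need your data in “long” format or in “wide” format. In long format, each observation is one state in one census year; in wide format, each observation is one state, with a separate population variable for each year. Both layouts are illustrated below using the 1990, 2000, and 2010 US Census populations of California and Oregon.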
Example of long format:
STATE_NAME YEAR POP
CA 1990 29,760,021
CA 2000 33,871,648
CA 2010 37,253,956
OR 1990 2,842,321
OR 2000 3,421,399
OR 2010 3,831,074
Example of wide format:
STATE_NAME POP1990 POP2000 POP2010
CA 29,760,021 33,871,648 37,253,956
OR 2,842,321 3,421,399 3,831,074
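Stata's reshape command converts data back and forth between these two formats. In the general form given in the Stata documentation, reshape long stub, i(i) j(j) goes from wide to long, and reshape wide stub, i(i) j(j) goes from long to wide. Here stub is the prefix shared by the wide-format variable names (POP in this example), i() lists the variable(s) that identify each group of observations, and j() names the variable that holds the suffix (here, the year).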
To convert the wide-format example dataset to long format, use:
reshape long POP, i(STATE_NAME) j(YEAR)
This tells Stata to reshape the data to long format. The resulting dataset contains STATE_NAME (the i() variable, which identifies the group each observation belongs to), POP (the values that were spread across POP1990, POP2000, and POP2010), and a new variable YEAR (the j() variable). Stata builds YEAR from the variable names themselves: it removes the stub POP from POP1990, POP2000, and POP2010 and stores the remaining suffixes 1990, 2000, and 2010 as the values of YEAR.
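A minimal, self-contained do-file sketch of the wide-to-long conversion, assuming you type the wide example in by hand with input. The thousands separators are dropped because Stata numeric values cannot contain them, and the populations are stored as long integers because the default float type cannot represent eight-digit counts such as 29760021 exactly.

```stata
* Sketch: enter the wide-format example by hand, then reshape it to long.
clear
* long storage type keeps 8-digit population counts exact (float would round them)
input str2 STATE_NAME long POP1990 long POP2000 long POP2010
"CA" 29760021 33871648 37253956
"OR"  2842321  3421399  3831074
end

* POP is the stub; the suffixes 1990, 2000, 2010 become values of the new variable YEAR
reshape long POP, i(STATE_NAME) j(YEAR)

* Six observations result: one per state per census year
list, sepby(STATE_NAME)
```

Because the suffixes are all digits, YEAR is created as a numeric variable; the string option of reshape is only needed when the suffixes contain letters.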
To convert the long-format example dataset to wide format, use:
reshape wide POP, i(STATE_NAME) j(YEAR)
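In this direction j(YEAR) names a variable that already exists, and its values are appended to the stub POP to form the new variable names. The sketch below assumes the long example is typed in by hand and that its value variable is named POP, since reshape wide POP looks for a variable with exactly that name.

```stata
* Sketch: enter the long-format example by hand, then reshape it to wide.
clear
* long storage type keeps 8-digit population counts exact
input str2 STATE_NAME int YEAR long POP
"CA" 1990 29760021
"CA" 2000 33871648
"CA" 2010 37253956
"OR" 1990  2842321
"OR" 2000  3421399
"OR" 2010  3831074
end

* Values of the existing YEAR variable become suffixes: POP1990, POP2000, POP2010
reshape wide POP, i(STATE_NAME) j(YEAR)

* Two observations result: one per state
list
```

After any reshape, typing reshape long (or reshape wide) with no other arguments converts the data back, because Stata remembers the i() and j() settings from the previous call.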