Economics 312

Spring 2012

Econometric Project #3

Due 6am, February 21

Hill, Griffiths, and Lim, Problem 5.13

Problem 5.13 will be your first opportunity to explore "hedonic models" in which selling prices (often, as here, of houses) are related to the characteristics of the product being sold. (The sample report that I prepared deals with a hedonic model, but with a different dataset.) This problem examines several alternative functional forms and asks you to perform various estimations and tests using a quadratic model of two explanatory variables.

Monte Carlo Simulation of Omitted Variable Bias

The economic return to education has often been estimated by regressing the log of an individual's wage rate on the number of years of education he or she completed. One criticism of this approach is that people with more natural ability, who would tend to earn more regardless of their level of education, also tend to get more education, resulting in omitted-variable bias if ability (which usually cannot be measured) is not included in the equation.
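For reference, the criticism can be written as a standard omitted-variable bias expression. This is only a sketch: the symbols below (the betas and the starred estimator) are notation introduced here, not taken from the assignment, and the result is stated conditional on the observed regressors.

```latex
% Assumed "long" model, with ability included:
%   b_2^* denotes the OLS slope on educ when ability is left out.
\begin{align*}
\ln(\text{wage}_i) &= \beta_1 + \beta_2\,\text{educ}_i + \beta_3\,\text{ability}_i + e_i \\
E(b_2^{*}) - \beta_2 &= \beta_3\,
  \frac{\widehat{\operatorname{cov}}(\text{educ},\text{ability})}
       {\widehat{\operatorname{var}}(\text{educ})}
\end{align*}
```

If ability raises wages (a positive third coefficient) and is positively correlated with education, the short regression overstates the return to education.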

The data set wage_MC.dta contains 2061 observations on log-wage, education, and IQ (a measure of cognitive ability). (The data set is from a well-known paper by David Card.) You are to perform a Monte Carlo study to demonstrate the effect of omitting IQ on the estimated coefficient of education in a wage regression.

  • As a first step, estimate a regression of ln(wage) on educ and IQ. We will take this regression to represent the "true model." The residuals from this regression are not too far from being normally distributed (though we can reject that hypothesis), so we will assume that the underlying error term is normal, with mean zero and standard deviation given by the root MSE of your regression.
  • Construct a do-file to
    • Generate a normal error term with the properties above.
    • Generate a simulated ln(wage) variable assuming that the three regression coefficients are the values estimated above.
    • Regress the simulated ln(wage) variable on educ, leaving IQ out of the regression.
  • Run your do-file repeatedly (for example, with Stata's simulate command) and save the estimated education coefficient and its standard error from each replication.
  • Characterize the distribution of your estimated slope coefficients in relation to the "true" effect of education. Does the bias correspond to what would be predicted by the formula discussed in class (of which HGL's equation (6.23) is a variant)?
  • In your simulation, what proportion of the time does your omitted-variable regression reject the true null hypothesis that the education coefficient equals its value in the "true model"?
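The steps above might be organized along the following lines. This is a minimal sketch, not a model solution: the log-wage variable name lwage, the program name ovbsim, the scalar names, the number of replications, and the seed are all assumptions made for illustration, so check the actual variable names with describe before adapting it.

```stata
* Sketch of the Monte Carlo design; lwage, ovbsim, reps, and seed are assumed.
use wage_MC.dta, clear

* Step 1: the "true model" and its estimated parameters
regress lwage educ IQ
scalar b_cons = _b[_cons]
scalar b_educ = _b[educ]
scalar b_iq   = _b[IQ]
scalar sigma  = e(rmse)        // root MSE, used as the error std. dev.
scalar df_short = e(N) - 2     // residual d.f. of the short regression

* Bias predicted by the omitted-variable formula:
* (coefficient on IQ) x (slope from regressing IQ on educ)
regress IQ educ
scalar bias_pred = scalar(b_iq) * _b[educ]

* Step 2: one replication = new errors, simulated ln(wage), short regression
capture program drop ovbsim
program define ovbsim, rclass
    capture drop u
    capture drop lwage_sim
    generate u = rnormal(0, scalar(sigma))
    generate lwage_sim = scalar(b_cons) + scalar(b_educ)*educ ///
        + scalar(b_iq)*IQ + u
    regress lwage_sim educ
    return scalar b  = _b[educ]
    return scalar se = _se[educ]
end

* Step 3: replicate; simulate replaces the data in memory with the results
simulate b = r(b) se = r(se), reps(1000) seed(12345): ovbsim

* Step 4: compare the simulated slopes with the "true" coefficient
summarize b se
quietly summarize b
display "mean simulated bias: " r(mean) - scalar(b_educ)
display "predicted bias:      " scalar(bias_pred)

* Step 5: share of replications rejecting the true null at the 5% level
generate byte reject = abs((b - scalar(b_educ)) / se) > ///
    invttail(scalar(df_short), 0.025)
summarize reject
```

Because educ and IQ are held fixed across replications, the mean simulated bias should settle close to the predicted value as the number of replications grows. The mean of reject is the rejection proportion.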

Links to Datasets

Problem 5.13: br2.dta br2.def
Monte Carlo wage data: wage_MC.dta

Project Teams

Casey Anderson and Joan Wang
Brett Beutell and Nick Pittman
Martis Buchholz and Brian Moore
Jess Delaney and Mischka Moechtar
Anya Demko and Paige Leishman
Lauren DeRosa and Svetoslav Ivanov
Zach Horváth and Sean Howard
Joseph Warren and Sunny Yang
Allie Hemmings (solo this week)