Weighted Data in Stata
There are four different ways to weight things in Stata. These four weights are frequency weights (fweight or frequency), analytic weights (aweight or cellsize), sampling weights (pweight), and importance weights (iweight).
Frequency weights are the kind you have probably dealt with before. Basically, by adding a frequency weight, you are telling Stata that a single line represents observations for multiple people. The other weighting options are a bit more complicated.
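To make the frequency-weight idea concrete, here is a minimal sketch using made-up data. The variable names y and count are assumptions for illustration only. It compares a frequency-weighted summary against the same summary run after physically duplicating each row with expand.

```stata
* Hypothetical data: each row stands for "count" identical people
clear
input y count
1 3
2 2
4 5
end

* Weighted summary: the 3 rows are treated as 3 + 2 + 5 = 10 observations
summarize y [fweight=count]
scalar mean_weighted = r(mean)
scalar n_weighted = r(N)

* Long form of the same data: duplicate each row "count" times
expand count
summarize y
scalar mean_expanded = r(mean)
scalar n_expanded = r(N)

* The two approaches should report the same N and the same mean
display "weighted: N = " n_weighted ", mean = " mean_weighted
display "expanded: N = " n_expanded ", mean = " mean_expanded
```

The weighted version gives the same answer without inflating the dataset, which is why frequency weights are handy for tabulated data.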
Analytic weights treat each observation as if it were a mean computed from a sample of size n, where n is the value of the weight variable.
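As a rough illustration of analytic weights, suppose each row holds a group mean together with the number of people behind it. The data and the variable names ybar and n below are invented for this sketch. Weighting by n pulls the overall mean toward the larger groups.

```stata
* Hypothetical group-level data: ybar is a group mean, n is the group size
clear
input ybar n
50 10
60 30
80 60
end

* Unweighted: every group counts equally, regardless of its size
summarize ybar
scalar mean_unweighted = r(mean)

* Analytic weights: each group mean counts in proportion to n
summarize ybar [aweight=n]
scalar mean_aweighted = r(mean)

display "unweighted mean of group means: " mean_unweighted
display "aweighted mean of group means:  " mean_aweighted
```

Here the weighted mean is (50*10 + 60*30 + 80*60) / 100 = 71, while the unweighted mean of the three rows is about 63.3.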
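Sampling weights (a.k.a. probability weights) are meant for survey data in which observations were selected with unequal probabilities; each weight is the inverse of the probability that the observation was included in the sample. You can learn more about sampling weights by reading this Demographic and Health Survey help page.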
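A small sketch of sampling weights, assuming a made-up survey in which some respondents had a 50% chance of selection and others a 10% chance. The variable names y, prob, and wt are hypothetical. The weight is built as the inverse of the selection probability and passed to the mean command (summarize does not accept pweights).

```stata
* Hypothetical survey data: prob is each respondent's chance of being sampled
clear
input y prob
10 0.5
12 0.5
20 0.1
22 0.1
end

* Sampling weight = inverse of the selection probability
generate double wt = 1/prob

* Respondents who were unlikely to be sampled stand in for more of the population
mean y [pweight=wt]
scalar mean_pweighted = _b[y]
display "pweighted mean: " mean_pweighted
```

In real survey data the weight variable normally ships with the dataset, so you would use the supplied variable rather than computing one yourself.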
Importance weights, unlike the other three types, do not have a specific formula and can only be used with certain commands; they are primarily useful to programmers and will not be discussed here any further.
Most estimation commands can take the first three types of weights. If you are uncertain whether a command accepts weights, read its help page by typing help [command]. For example, the first thing in help regress is a syntax diagram that includes [weight], which means regress accepts weights. To use a weight, you must have a variable that contains the weight information.
Assuming a command allows weights, you add the weight in square brackets, in the form [weighttype=weightvar], after the variable list and before the comma that introduces any options. For example, presuming I wanted to run a regression and had an analytic weight column called "n", the command would be regress y x1 x2 x3 [aweight=n]
Typing regress y x1 x2 x3 [cellsize=n] runs the exact same command, because cellsize is a synonym for aweight.
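The following is a minimal, self-contained sketch of that syntax on invented data. The variable names and values are assumptions, and only one predictor is used to keep the example short. It fits the same analytically weighted regression with both spellings of the weight type and places an option after the weight.

```stata
* Hypothetical group-level data: y and x1 are group means, n is the group size
clear
input y x1 n
2.0 1 10
2.9 2 40
4.2 3 25
4.8 4 5
6.1 5 20
end

* The weight goes in square brackets, before the comma that starts the options
regress y x1 [aweight=n], level(90)
scalar b_aweight = _b[x1]

* cellsize is a synonym for aweight, so this fits the same model
regress y x1 [cellsize=n], level(90)
scalar b_cellsize = _b[x1]

display "slope with aweight:  " b_aweight
display "slope with cellsize: " b_cellsize
```

The level(90) option is only there to show where options go relative to the weight; it does not affect the coefficients.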