Chi-square tests
Chi-square tests are non-parametric analyses that compare the frequencies observed in a sample with the frequencies expected under a hypothesis. A chi-square goodness-of-fit test looks at one categorical variable, while a chi-square test of independence looks at two.
Chi-square test, goodness of fit
A chi-square goodness-of-fit test compares the observed frequencies of one categorical variable to the frequencies you expect. For this example, we will look at the 1988 NLSW data (National Longitudinal Survey of Young Women, U.S. Bureau of Labor Statistics) and use the csgof package for our analysis.
First, install the csgof package. In Stata, type
findit csgof
and click on "csgof from http://www.ats.ucla.edu/stat/stata/ado/analysis". This will take you to a second screen in Stata; click on "click here to install", install the package, and then return to the command line.
Load the data and inspect it with the browse command (abbreviated br).
sysuse nlsw88
br
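Because the expected percentages passed to csgof have to line up with the categories of race, it can help to check how the variable is coded before running the test. The following is a minimal sketch using only built-in Stata commands. It assumes that csgof matches the percentages to the categories in ascending order of their numeric codes, which you should confirm in the csgof help file.

```stata
* Frequency table of race, displayed with its value labels
tabulate race
* Same table, showing the underlying numeric codes instead of the labels
tabulate race, nolabel
* Storage type, value label, range, and missing values for the variable
codebook race
```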
Based on other information, you hypothesize that most people in this dataset are coded "race = white" (75%), with a smaller share coded "race = black" (20%) and an even smaller share coded "race = other" (5%). Pass these expected percentages to csgof:
csgof race, expperc(75 20 5)
The chi-square statistic is statistically significant, so you reject the null hypothesis that the racial composition of the sample matches the percentages you expected.
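To see where the statistic comes from, you can reproduce the goodness-of-fit calculation by hand: for each category, take (observed - expected)^2 / expected, sum across categories, and compare the total to a chi-square distribution with (number of categories - 1) degrees of freedom. The sketch below applies this standard Pearson formula with built-in commands only. The matrix and scalar names (observed, expperc, n_total, expected_i, stat) are illustrative choices, and this is not necessarily how csgof is implemented internally.

```stata
sysuse nlsw88, clear
* Store the observed count for each category of race in a 3 x 1 matrix
quietly tabulate race, matcell(observed)
scalar n_total = r(N)
* Hypothesized percentages, in the same order as the categories
matrix expperc = (75 \ 20 \ 5)
scalar stat = 0
forvalues i = 1/3 {
    * Expected count = total N times the hypothesized proportion
    scalar expected_i = n_total * expperc[`i', 1] / 100
    scalar stat = stat + (observed[`i', 1] - expected_i)^2 / expected_i
}
* Degrees of freedom = 3 categories - 1 = 2
display "chi2(2) = " stat
display "p-value = " chi2tail(2, stat)
```

The statistic and p-value printed here should agree with the csgof output, up to rounding.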
Chi-square test, independence
Also known as the chi-square test of association, this test examines whether two categorical variables are related. In this example, we will look at Stata's built-in 1978 automobile dataset and see whether there is a relationship between a car's repair record (rep78) and whether it was produced in the US or abroad (foreign).
sysuse auto
tab rep78 foreign, chi2
To see row percentages instead of frequencies, add the row and nofreq options:
tab rep78 foreign, row nofreq chi2
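The chi-square test of independence compares each observed cell count with the count expected if the two variables were unrelated, and the approximation is commonly considered less reliable when expected counts are small (a frequent rule of thumb is below 5). As a hedged sketch, the expected option of tabulate prints those expected counts next to the observed ones so you can check them:

```stata
sysuse auto, clear
* Observed and expected frequencies in each cell, plus the chi-square test
tabulate rep78 foreign, chi2 expected
```

Cars with a missing rep78 value are left out of the table, so the number of observations tabulated can be smaller than the number in the dataset.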