If you want to verify whether a relationship you have observed between two variables is real, you have a variety of statistical tools available. A problem arises, though, when you have measured these variables without much precision, using categorical measurement. The solution is a two-way chi-square test, which, among other things, can be used to check the assumptions you make about the characteristics of people you have just met. "Identify Unexpected Outcomes" [Hack #15] used the one-way chi-square test to make police scheduling decisions based on whether equal numbers of crimes were committed at different times of day. That tool works well when an analytical problem involves a single categorical variable and the question is whether the observed frequencies in its categories match the frequencies you would expect.
You face another common analytic problem when you're curious to know whether two categorical variables are related to each other. Relationships between categorical variables can be examined with the handy two-way chi-square test.
We make assumptions all the time about relationships between these sorts of variables. Many of our common stereotypes about categories of people have implicit hypotheses about these relationships. Here are a few assumptions you might have that imply a relationship between categorical variables:
If you meet a computer programmer at a party and you hold this stereotype about programmers, you might assume that she is familiar with 20-sided dice. If you are wrong, though, that might lead to some awkward conversation. It would be nice to know whether such relationships between these categorical variables really exist. Calculating a two-way chi-square solves this problem and can verify or cast doubt on these assumptions about people.
Answering Relationship Questions
While the one-way chi-square analyzes a single categorical variable, the two-way chi-square analyzes the relationship between two categorical variables. The process is the same: compare the expected frequencies with the actual (observed) frequencies for each category or combination of categories. If these differences, squared and scaled by the expected frequencies, add up to a big number, then something is going on. Here is a categorical relationship question that we might like to have answered. It is similar to other questions about stereotypes that could be explored:
You probably already have some assumption about this, but how would you go about checking the accuracy of such an assumption?
Conduct preliminary analyses
To start, look at Table 2-6 for an example of categorical frequency data for a single categorical variable. This data is fictional, but consistent with published studies, which typically find that Republicans are more likely to be male and that females more commonly identify as Democrats.
In this random sample of 75 Republicans, 45 are male and 30 are female. That's 60 percent male and 40 percent female. Can we conclude that Republicans in general are more likely to be male than female? If there were no difference, we would expect 50 percent males and 50 percent females in our sample.
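Answering that question is a job for the one-way chi-square from [Hack #15]. Here is a minimal sketch, assuming the "no difference" hypothesis of a 50/50 split. The helper name `one_way_chi_square` is hypothetical. The cutoff of 3.84 is the 5 percent critical value for one degree of freedom that this hack uses later.

```python
# One-way chi-square for a single categorical variable (sex among Republicans).
# Counts are from the text: 45 males and 30 females in a sample of 75.
# ASSUMPTION: under "no difference", each category expects half of the sample.

def one_way_chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [45, 30]                              # males, females
n = sum(observed)                                # 75
expected = [n / len(observed)] * len(observed)   # 37.5 in each category

chi_sq = one_way_chi_square(observed, expected)
CRITICAL_DF1 = 3.84                              # 5 percent cutoff, 1 degree of freedom

print(f"chi-square = {chi_sq:.2f}")              # chi-square = 3.00
print("significant" if chi_sq > CRITICAL_DF1 else "not significant")  # not significant
```

Each category is 7.5 away from its expected 37.5, so each contributes 56.25 / 37.5 = 1.5, for a total of 3.0. That is below 3.84, so this sample alone would not justify the conclusion that Republicans in general are more likely to be male.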
This isn't our research question, though.
Compute the two-way chi-square
Our initial question included only Republicans, so while political party might have seemed like a variable in our first analysis, it was really just a description of the population; it did not vary at all. We can add party to our analysis, though, by adding another category (Democrat, for example) and recruiting 75 more participants, and suddenly we have data with two variables. Imagine frequency data as shown in Table 2-7.
Here we have two categorical variables: party affiliation and sex. We could go ahead and use a one-way analysis to look at either of the two rows by themselves. However, a more typical question would be, "Is there a relationship between party and sex?"
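Table 2-7 arrives already tallied. In practice, though, the two variables are usually recorded one person at a time. Here is a minimal sketch of turning such records into a contingency table (a cross-tabulation) with the row and column totals that the next step needs. The records and the `cross_tabulate` helper are hypothetical and are not the data behind Table 2-7.

```python
from collections import Counter

# HYPOTHETICAL raw records, one (party, sex) pair per participant.
records = [
    ("Republican", "male"),
    ("Republican", "female"),
    ("Democrat", "female"),
    ("Democrat", "male"),
    ("Republican", "male"),
    ("Democrat", "female"),
]

def cross_tabulate(pairs):
    """Count each (row, column) combination; return labels, table, and margins."""
    cells = Counter(pairs)                       # missing combinations count as 0
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = [[cells[(r, c)] for c in cols] for r in rows]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return rows, cols, table, row_totals, col_totals

rows, cols, table, row_totals, col_totals = cross_tabulate(records)
for name, counts, total in zip(rows, table, row_totals):
    print(name, dict(zip(cols, counts)), "total:", total)
print("column totals:", dict(zip(cols, col_totals)))
```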
To calculate a standardized measure of the difference between the expected frequencies and the observed frequencies, we use the same formula as with the one-way chi-square. As "Identify Unexpected Outcomes" [Hack #15] demonstrates, for each cell (each square of a table) we take the difference between the observed and expected frequencies, square it, and divide by the expected frequency; then we total these values across all cells. We do the same with the two-way chi-square. The expected frequency in each cell is equal to the number of people in that cell's row multiplied by the number of people in that cell's column, divided by the total sample size. Using the data in Table 2-7, the calculations for expected frequencies are shown in Table 2-8.
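The steps in this paragraph can be sketched in Python as follows. This is an illustration under stated assumptions, not the book's own worked table. The Republican row (45 male, 30 female) comes from the text. The Democrat row (34 male, 41 female) is an ASSUMPTION, because Table 2-7 is not reproduced here; those counts were chosen because they yield a statistic close to the 3.24 this hack reports. The helper names are hypothetical. The p-value uses the identity P(chi-square with 1 df > x) = erfc(sqrt(x / 2)), which holds only for one degree of freedom.

```python
import math

def expected_frequencies(observed):
    """E[i][j] = (row i total * column j total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(observed):
    """Two-way chi-square statistic, its degrees of freedom, and the expected table."""
    expected = expected_frequencies(observed)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof, expected

def p_value_df1(stat):
    """Upper-tail probability for 1 degree of freedom only."""
    return math.erfc(math.sqrt(stat / 2))

# Rows: Republican, Democrat. Columns: male, female.
# ASSUMPTION: the Democrat row is illustrative, not the book's Table 2-7.
observed = [[45, 30], [34, 41]]
CRITICAL_DF1 = 3.84                     # 5 percent critical value for a 2x2 table

stat, dof, expected = chi_square(observed)
print(expected)                         # [[39.5, 35.5], [39.5, 35.5]]
print(f"chi-square = {stat:.2f}, df = {dof}, p = {p_value_df1(stat):.3f}")
print("significant" if stat > CRITICAL_DF1 else "not significant at the 5 percent level")
```

With these counts, every cell is 5.5 away from its expected frequency. The statistic comes out near 3.24, with p of about 0.07, which is below the 3.84 cutoff. If SciPy is available, `scipy.stats.chi2_contingency(observed, correction=False)` should report the same statistic. Its default applies Yates' continuity correction to 2x2 tables, which gives a smaller value. `chi_square` works unchanged for larger tables, but `p_value_df1` does not apply to them.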
Thus, the two-way chi-square calculations look like this:
Determine if the chi-square is big enough
Statisticians know that the critical chi-square value for 2x2 tables (like the one we just analyzed) is 3.84 at the 5 percent level. When there is no real relationship, chi-square values greater than 3.84 turn up by chance only about 5 percent of the time [Hack #15]. Our chi-square value was 3.24, which is less than the critical value of 3.84, so a result this large can occur by chance more than 5 percent of the time (about 7 percent, in fact). We cannot claim statistical significance here. We must conclude that, though our sample seemed to show a relationship between the two categorical variables of party affiliation and sex, it might have occurred because of chance sampling error. In the population from which the sample was drawn, there might not be any relationship.
Why It Works
A two-way chi-square answers this relationship question by looking at differences: the differences between the frequencies we observed and the frequencies we would expect if the two variables were unrelated. This might seem counterintuitive, because most statistics look for differences in order to show, well, a difference, not to show a relationship. But here's the thinking:
The problem solved with this hack was one of knowing whether a stereotype we held was correct. Of course, outside of the real world, in the scientific world, researchers use this tool to explore a wide variety of complex questions. Two-way chi-squares, sometimes called contingency table analyses, are useful anytime you have two categorical variables and want to see whether one variable depends on the other. Our example used variables with just two categories each, but similar analyses can be done on variables with many categories. The technical requirements are a bit more complex, because the critical value depends on the table's degrees of freedom, which equal (rows − 1) × (columns − 1). The procedure, however, is the same.
See Also
