Chi-square is a convenient measure of association between two factors when the factors are not quantitative. It indicates the degree to which the frequencies in a cross-tabulation of the two factors deviate from what they would be if no interrelation existed between the factors. The computed chi-square has a specific level of statistical significance that you can look up in a standard table.
Suppose we ask 300 testers to rate two brands of a product (A and B), both in terms of overall preference and in terms of preference regarding "comfort." By a convenient coincidence, the "comfort" preference divides exactly evenly, with 100 preferring A, 100 preferring B, and 100 having no preference. Cross-tabulating comfort preference (rows) against overall preference (columns) gives:
Comfort preference | Overall A | Overall B | Total |
---|---|---|---|
A | 70 | 30 | 100 |
No preference | 55 | 45 | 100 |
B | 40 | 60 | 100 |
Total | 165 | 135 | 300 |
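As a minimal sketch of how the statistic for this table can be obtained: each cell's expected count under no association is its row total times its column total divided by the grand total, and chi-square sums (observed - expected)² / expected over all cells. The Python below is illustrative only. The names `observed` and `chi_square` are ours, not the author's, and the closed-form p-value shortcut applies only when there are exactly 2 degrees of freedom.

```python
import math

# Observed counts from the table above: rows are the "comfort" preference
# (A, No preference, B); columns are the overall preference (A, B).
observed = [
    [70, 30],
    [55, 45],
    [40, 60],
]

def chi_square(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two factors were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

stat, dof = chi_square(observed)

# With exactly 2 degrees of freedom the upper-tail probability has the
# closed form exp(-x / 2); other dof values need a table or a stats library.
p_value = math.exp(-stat / 2) if dof == 2 else None

print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p_value}")
```

The significance level quoted in the text corresponds to 1 minus this p-value.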
Clearly, a strong association exists between the preference on "comfort" and overall preference; chi-square is 18.2 with 2 degrees of freedom, indicating a significance level of 99%+. However, suppose we become suspicious of the results and, on further investigation, find that the data for two of the rows were recorded under the wrong labels. The table should have looked like this:
Comfort preference | Overall A | Overall B | Total |
---|---|---|---|
A | 70 | 30 | 100 |
B | 55 | 45 | 100 |
No preference | 40 | 60 | 100 |
Total | 165 | 135 | 300 |
Now let us see what we have. The association still looks strong for those who prefer A on comfort, but not for those who prefer B, so we should have a lower chi-square, right?
No. Chi-square is exactly the same as before. As long as the numbers in the cells stay the same, it does not matter how the rows and columns are labeled. Like the Scarecrow in The Wizard of Oz, chi-square does not have a brain. It is merely an algorithm, a mechanical process based on numbers regardless of what they represent. By itself, it can never take the place of a regression or correlation because it cannot describe the relationship; it can only gauge its statistical significance, entirely regardless of logic or sense.
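This can be checked mechanically. In the hedged sketch below, the helper `chi_square` and the dictionary names are illustrative assumptions rather than anything from the original analysis. The row labels never enter the calculation, and shuffling the order of the rows leaves the statistic unchanged as well.

```python
from itertools import permutations

def chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (obs - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )

# Same counts, different row labels (as originally recorded vs. corrected).
original = {"A": [70, 30], "No preference": [55, 45], "B": [40, 60]}
corrected = {"A": [70, 30], "B": [55, 45], "No preference": [40, 60]}

# Only the counts are passed in; the labels are discarded.
stat_original = chi_square(list(original.values()))
stat_corrected = chi_square(list(corrected.values()))

# Reordering the rows does not matter either: collect the statistic for
# all six orderings (rounded to absorb floating-point summation noise).
all_orderings = {
    round(chi_square(list(order)), 9)
    for order in permutations(original.values())
}

print(stat_original == stat_corrected, len(all_orderings))
```

Because the function only ever sees the counts, no relabeling of rows or columns can change its result.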
Chi-square is nonparametric. To describe a relationship in numerical terms, we need numerical values, that is, parameters. If we arbitrarily assign a value of +1 to preference for A and -1 to preference for B (leaving no preference at 0), we can compute a correlation coefficient of r = +.246 for the original tabulation and exactly half that, r = +.123, for the corrected tabulation. The parametric regression/correlation, unlike chi-square, is affected by the way the rows and columns are labeled because each label has a specific value.
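The correlation figures can be reproduced with a short sketch. The +1 and -1 scores come from the text; scoring "No preference" as 0 is an assumption made here, chosen because it recovers the quoted r = +.246. The names `ROW_SCORES`, `COL_SCORES`, and `pearson_r` are hypothetical.

```python
import math

# Scores for the row labels. +1 and -1 come from the text; 0 for
# "No preference" is an assumption made for this sketch.
ROW_SCORES = {"A": 1, "No preference": 0, "B": -1}
COL_SCORES = [1, -1]  # overall preference: A, then B

def pearson_r(labelled_rows):
    """Pearson correlation between row and column scores, weighted by counts."""
    n = sx = sy = sxx = syy = sxy = 0.0
    for label, counts in labelled_rows.items():
        x = ROW_SCORES[label]
        for y, count in zip(COL_SCORES, counts):
            n += count
            sx += count * x
            sy += count * y
            sxx += count * x * x
            syy += count * y * y
            sxy += count * x * y
    cov = sxy / n - (sx / n) * (sy / n)
    var_x = sxx / n - (sx / n) ** 2
    var_y = syy / n - (sy / n) ** 2
    return cov / math.sqrt(var_x * var_y)

original = {"A": [70, 30], "No preference": [55, 45], "B": [40, 60]}
corrected = {"A": [70, 30], "B": [55, 45], "No preference": [40, 60]}

r_original = pearson_r(original)
r_corrected = pearson_r(corrected)

print(f"r (original)  = {r_original:+.3f}")
print(f"r (corrected) = {r_corrected:+.3f}")
```

Here the labels do matter: swapping two of them changes the score attached to each row, and the coefficient changes with it.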
So chi-square is a very useful index when we cannot assign values, but it is very easy to misuse it; it does not have a brain, so the analyst has to use his or her own brain to interpret it correctly.