Hack 10. Know Big When You See It


You've just read about an amazing new scientific discovery, but is such a finding really a big deal? By applying effect size interpretations, you can judge the importance of such announcements (or lack thereof) for yourself.

Something is missing in most reports of scientific findings in nonscientific publications, on TV, on the radio, and (do I even have to mention?) on the Web. Although reports in such media typically do a pretty good job of reporting only findings that are "statistically significant," this is not enough to determine whether anything really important or useful has been discovered. A big drug study can report "significant" results but still not have found anything of interest to the rest of us, or even to other researchers.

As we repeat in many places in this book, significance [Hack #4] means only that what you found is likely to be true about the bigger population you sampled from. The problem is that this fact alone is not nearly enough for you to know whether you should change your behavior, start a new diet, switch drugs, or reinterpret your view of the world.

What you need to know to make decisions about your life and reality in light of any new scientific report is the size of the relationship that has just been brought to light. How much better is brand A than brand B? How big is that SAT difference between boys and girls in meaningful terms? Is it worth it to take that half an aspirin a day, every day, to lower your risk of a heart attack? How much lower is that risk anyway?

The strength of that relationship should be expressed in some standardized way, too. Otherwise, there is no way to really judge how big it is. Using a statistical tool known as effect size will let you know big when you see it.

Seeing Effect Sizes Everywhere

An effect size is a standardized value that indicates the strength of a relationship between two variables. Before we talk about how to recognize or interpret effect sizes, let's begin with some basics about relationships and statistical research.

Statistical research has always been interested in relationships among variables. The correlation coefficient, for example, is an index of the strength and direction of relationships between two sets of scores [Hack #11]. Less obvious, but still valid, examples of statistical procedures that measure relationships include t tests [Hack #17] and analysis of variance, a procedure for comparing more than two groups at one time.

Even procedures that compare different groups are still interested in relationships between variables. With a t test, for instance, a significant result means that it matters which group a person is in. In other words, there is an association between the independent variable (which defines the groups) and the dependent variable (the measured outcome).


Finding or Computing Effect Sizes

This hack is about finding and interpreting effect sizes to judge the implications of scientific findings reported in the popular media or in scientific writings. Often, the effect size is reported directly and you just have to know how to interpret it. Other times, it is not reported, but enough information is provided so that you can figure out what the effect size is.

When effect sizes are reported, they are typically one of three types. They differ depending on the procedure used and the way that procedure quantifies the information of interest. In each case, the effect size can be interpreted as an estimate of the "size of the relationship between variables." Here are the three typical types of effect sizes:


Correlation coefficient

A correlation, symbolized by r, is already a measure of the relationship between two variables and, thus, is an effect size. Because correlations can be negative, though, the value is sometimes squared to produce a value that can never be negative. Thus, the value of r2 is interpreted as the "proportion of variance" shared by the two variables.
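For instance, here is a minimal Python sketch that computes r and r2 by hand; the score lists and variable names are purely hypothetical, made up for illustration:

    # A minimal sketch of r as an effect size: the correlation between two
    # lists of scores and its squared value (the "proportion of variance"
    # the two variables share). The data here are made up for illustration.
    from statistics import mean, stdev

    hours_studied = [2, 4, 5, 7, 8, 10]       # hypothetical variable X
    exam_scores = [55, 60, 70, 72, 80, 85]    # hypothetical variable Y

    def pearson_r(x, y):
        mx, my = mean(x), mean(y)
        # Sample covariance divided by the product of sample standard deviations
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
        return cov / (stdev(x) * stdev(y))

    r = pearson_r(hours_studied, exam_scores)
    print(f"r = {r:.2f}, r-squared = {r**2:.2f}")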


d

This value, symbolized by d strangely enough, summarizes the difference between the two group means used in a t test. It is calculated by dividing the difference between the two group means by the pooled (average) standard deviation of the two groups.
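A minimal Python sketch of that calculation, using made-up group scores, looks like this; the group names and values are hypothetical:

    # d computed as described above: the difference between the two group
    # means divided by the pooled standard deviation. Made-up data.
    from math import sqrt
    from statistics import mean, stdev

    group_a = [12, 14, 15, 16, 18, 20]   # hypothetical treatment group scores
    group_b = [10, 11, 13, 13, 15, 16]   # hypothetical control group scores

    n_a, n_b = len(group_a), len(group_b)
    # Pooled standard deviation: a weighted average of the two group variances
    pooled_sd = sqrt(((n_a - 1) * stdev(group_a) ** 2 +
                      (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2))

    d = (mean(group_a) - mean(group_b)) / pooled_sd
    print(f"d = {d:.2f}")   # about 1.10 for these made-up scores -- a large effect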

Here's an alternative, easy, super-fun, ultra-cool, and neato-swell way to calculate d when all you have is the result of a t test:

d = 2t/√df

where t is the value of the t test and df is its degrees of freedom.

Eta-squared

The effect size most often reported for the results of an analysis of variance is symbolized as η2 (eta-squared). Similar to r2, it is interpreted as the "proportion of variance" in the dependent variable (the outcome variable) accounted for by the independent variable (which group you are in).
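Here is a minimal Python sketch of eta-squared for a one-way analysis of variance: the between-groups sum of squares divided by the total sum of squares. The three groups of scores are made up for illustration:

    # Eta-squared = SS_between / SS_total for a one-way ANOVA. Made-up data.
    from statistics import mean

    groups = [
        [4, 5, 6, 5],    # hypothetical group 1
        [6, 7, 8, 7],    # hypothetical group 2
        [9, 9, 10, 8],   # hypothetical group 3
    ]

    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)

    # Between-groups sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Total sum of squares: how far every score sits from the grand mean
    ss_total = sum((score - grand_mean) ** 2 for score in all_scores)

    eta_squared = ss_between / ss_total
    print(f"eta-squared = {eta_squared:.2f}")   # 0.84 here: group membership explains
                                                # most of the variance in this toy data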

Interpreting Effect Sizes

With levels of significance, statisticians have adopted certain sizes that are "good" to achieve. For example, most statistical researchers hope to achieve a .05 or lower level of significance. With effect sizes, though, there are not always certain values that are clearly good or clearly bad. Still, some standards for small, medium, and large effect sizes have been suggested.

The standards for small, medium, and large are based, for the most part, on the effect sizes normally found in real-world research. If a given effect size is so large that it is rarely found in published research, it is considered big. If an effect size is tiny and easy to find in real-life research, it is considered small.

You should decide for yourself, though, how big an effect size must be to matter to you when interpreting research results. It all depends on the area of investigation. Table 1-10 provides the rules of thumb for how big is big.

Table 1-10. Effect size standards

Effect size        Small      Medium     Large
r                  +/-.10     +/-.30     +/-.50
r2                 .01        .09        .25
d                  .2         .5         .8
η2                 .01        .06        .14
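As a small convenience, here is a hypothetical helper function that applies the d cutoffs from Table 1-10; the function name and example values are my own illustration, not part of the standards themselves:

    # Label a d value using the rules of thumb from Table 1-10.
    def label_d(d):
        size = abs(d)
        if size >= 0.8:
            return "large"
        if size >= 0.5:
            return "medium"
        if size >= 0.2:
            return "small"
        return "smaller than small"

    print(label_d(0.65))   # -> "medium"
    print(label_d(0.06))   # -> "smaller than small" (the aspirin study discussed below)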


Interpreting Research Findings

The advantage of talking about effect sizes when discussing research results is that everyone can get a sense of what impact the given research variable (or intervention, or drug, or teaching technique) is really having on the world. Because effect sizes are typically reported without any probability information (level of significance), they are most useful when provided alongside traditional significance levels. This way, two questions can be answered:

  • Does this relationship probably exist in the population?

  • How big is the relationship?
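Putting the two together, here is a minimal sketch, assuming SciPy is available, that reports both answers for two made-up groups of scores: a p-value for the first question and a d (computed with the shortcut formula shown earlier) for the second.

    # Significance and effect size side by side. The scores are hypothetical.
    from math import sqrt
    from scipy.stats import ttest_ind

    drug = [78, 82, 85, 88, 90, 84, 86, 91]      # hypothetical outcome scores
    placebo = [75, 80, 83, 84, 79, 82, 85, 81]   # hypothetical outcome scores

    t, p = ttest_ind(drug, placebo)              # independent-samples t test
    df = len(drug) + len(placebo) - 2
    d = 2 * t / sqrt(df)                         # shortcut formula for d

    print(f"p = {p:.3f}  (does this relationship probably exist in the population?)")
    print(f"d = {d:.2f}  (how big is the relationship?)")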

Remember our example of whether you should decide to take half an aspirin each day to cut down your chances of having a heart attack? A well-publicized study in the late 1980s found a statistically significant relationship between these two variables. Of course, you should talk with your doctor before you make any sort of decision like this, but you should also have as much information as possible to help you make that decision. Let's use effect size information to help us interpret these sorts of findings.

Here is what was reported in the media:

A sample of 22,071 physicians was randomly divided into two groups. For a long period of time, half took aspirin every day, while the other half took a placebo (which looked and tasted just like aspirin). At the end of the study period (which actually ended early because the effectiveness of aspirin was considered so large), the physicians taking aspirin were about half as likely to have had a heart attack as the physicians taking the placebo: 1.71 percent of the placebo physicians had attacks versus about 1 percent (.94 percent) of the aspirin physicians. The findings were statistically significant.

The "clear" interpretation of such findings is that taking aspirin cuts your chances of a heart attack in half. Assuming that the study was representative and the physicians in the study are like you and me in important ways, this interpretation is fairly correct.

Another way to interpret the findings is to look at the effect size of the aspirin use. Using a formula for proportional comparisons, the effect size for this study is .06 standard deviations, or a d of .06. Applying the effect size standards shown in Table 1-10, this effect size should be interpreted as small (very small, really). This interpretation suggests that there is really quite a tiny relationship between aspirin-taking and heart attacks. The relationship is real, just not very strong.
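For the curious, one common formula for proportional comparisons is the arcsine transformation for two proportions (Cohen's h); applying it to the two reported heart attack rates gives a value in the same neighborhood as the .06 above. This sketch is offered as a plausible reconstruction, not necessarily the exact formula the original researchers used:

    # Converting the two reported heart attack rates into a d-like effect size
    # with the arcsine transformation for proportions (Cohen's h).
    from math import asin, sqrt

    p_placebo = 0.0171   # 1.71 percent of placebo physicians had heart attacks
    p_aspirin = 0.0094   # 0.94 percent of aspirin physicians had heart attacks

    h = 2 * asin(sqrt(p_placebo)) - 2 * asin(sqrt(p_aspirin))
    print(f"effect size of about {h:.2f}")   # roughly .07 -- small by any standard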

One way to think about this is that your chances of having a heart attack during a given period of time are pretty small to begin with: about 98.7 percent of everyone in the study did not have a heart attack, whether they took aspirin or not. Although taking aspirin does lower your chances, they go from small to a little smaller. It is similar to the idea that entering the lottery massively increases your chances of winning compared to those who do not enter, but your chances are still slim.

Why It Works

A researcher can achieve significant results, but still not have found anything for anyone to get excited about. This is because significance tells you only that your sample results probably did not occur by chance. The results are real and likely exist in the population. If you have found evidence of a small relationship between two variables or between the use of a drug and some medical outcome, the relationship might be so small that no one is really interested in it. The effect of the drug might be real, but weak, so it's not worth recommending to patients. The relationship between A and B might be greater than zero, but so tiny as to do little to help understand either variable.

Modern researchers are still interested in whether there is statistical significance in their findings, but they should almost always report and discuss the effect size as well. If the effect size is reported, you can interpret it. If it is not reported, you can often dig the information you need out of published reports of scientific findings and calculate it yourself. The cool part is that you might then know more about the importance of the discovery than the media that reported the findings and, maybe, even the scientists themselves.



