Anytime you have used statistics to summarize observations, you've probably been wrong. If you need to know how close you have come to the truth, use standard errors.

Statisticians are perhaps the only professionals who not only proudly admit that their answers are probably wrong, but will go to great lengths to tell you exactly how wrong they are. When you conduct a survey, record observations, or conduct some sort of experiment, your results describe only your sample: the customers, patients, students, goldfish, or pieces of Kryptonite that you have in front of you. Inferential statistics uses values computed for a sample to estimate what that value would be for the population it is meant to represent. For example, the mean of a sample is a pretty good guess for the mean of the population. The problem is knowing whether to trust your results.

Calibrating Error and Calculating Precision

It is unlikely that the mean of a sample is exactly the same as the mean of the population, but it is likely to be close. If you want to know how far wrong you are, you can calibrate your precision using standard errors. The standard error of the mean gives us an estimate of the distance between our sample mean estimate and the actual population mean.
Fortunately for anyone curious to know how far a statistical finding is from the hidden truth, nearly every popular statistical procedure provides a standard error. After introducing some basic concepts, this hack explains how to apply three of them: the standard error of the mean, the standard error of the proportion, and the standard error of the estimate.
There are three common ways that standard errors are used to gauge the accuracy of the results of statistical analyses. The particular tool you use depends on whether you want to know how close you are to correctly estimating a population mean, a population proportion, or a score predicted for the future.
Mean Estimates

The precision of a sample mean as an estimate of a population mean depends on how much the scores vary and, above all, on sample size. Here's the formula:

$$\text{standard error of the mean} = \frac{\text{standard deviation}}{\sqrt{\text{sample size}}}$$

As the sample size increases, the sample mean tends to fall closer to the true population mean. This makes sense if you think of sample size as the number of independent observations; the more looks you get at something, the more accurate your description will be.
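As a minimal sketch of that calculation, assuming the usual form of the formula (the sample standard deviation divided by the square root of the sample size), the standard error of the mean might be computed as follows. The function name `standard_error_of_mean` and the scores are made up for illustration, and using the n - 1 sample standard deviation is an assumption.

```python
import math
import statistics


def standard_error_of_mean(scores):
    """Sample standard deviation divided by the square root of n."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two observations")
    # statistics.stdev uses the n - 1 (sample) denominator; this is an assumption.
    return statistics.stdev(scores) / math.sqrt(n)


# Hypothetical data: similar spread, but four times as many observations.
small_sample = [2, 4, 4, 4, 5, 5, 7, 9]
large_sample = small_sample * 4

print(standard_error_of_mean(small_sample))
print(standard_error_of_mean(large_sample))
```

With four times as many observations and about the same spread, the second standard error comes out at roughly half the first, which is the square-root relationship at work.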
Proportion Estimates

When a sample of people is surveyed and the results are presented as some percentage or proportion (e.g., "72 percent of all sailors have knee trouble"), that percentage is some distance from the actual percentage you'd find if you surveyed the whole population. If the sample was selected randomly, the standard error of the proportion indicates how close the sample percentage is to the population percentage. The standard error of the proportion is based on sample size and the size of the proportion. Here's the formula:

$$\text{standard error of the proportion} = \sqrt{\frac{(\text{proportion})(1 - \text{proportion})}{\text{sample size}}}$$

As with the standard error of the mean, the standard error of the proportion decreases as the sample size increases. If you are mathematically oriented, you might notice that as the proportion moves away from .50, the number in the top part of the formula becomes smaller. When the calculations are made, then, the further the sample proportion is from .50, the smaller the standard error of the proportion. Another point of interest is that the top part of the formula is an indication of the amount of variability in the sample: (proportion)(1 - proportion) is the square of the standard deviation for proportions, that is, the variance.
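Here is a small sketch of the same idea in code, assuming the standard form of the formula: the square root of (proportion)(1 - proportion) divided by the sample size. The function name and the sample size of 100 are illustrative choices, not part of the hack.

```python
import math


def standard_error_of_proportion(proportion, n):
    """Square root of p(1 - p) / n for a proportion p and sample size n."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return math.sqrt(proportion * (1.0 - proportion) / n)


# Same hypothetical sample size, proportions moving away from .50.
for p in (0.50, 0.72, 0.90):
    print(p, round(standard_error_of_proportion(p, 100), 4))
```

The printed standard errors shrink as the proportion moves away from .50, matching the pattern described above.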
Estimates of Future Performance

In regression analyses, scores on one or more variables are used to estimate scores on another variable [Hack #13]. However, that predicted score is unlikely to be exactly right. Just as we can calculate how far, on average, a sample mean is from a population mean or how far off our survey results are from theoretical population results, we can also say how far off, on average, our regression prediction will be from the actual score a person would get. Here's the formula:

$$\text{standard error of the estimate} = \text{standard deviation} \times \sqrt{1 - \text{correlation}^2}$$

The standard deviation used in the equation is the standard deviation of the criterion variable, which is the one you are predicting. The correlation is the correlation between your predictor(s) and the criterion variable.
Notice with this formula that the larger the correlation, the smaller the standard error of the estimate. This makes sense, because if there is a lot of informational overlap between two variables, you can get a good sense of the score on one variable by looking at the other.
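To see that relationship in numbers, here is a minimal sketch assuming the common simplified form of the formula: the criterion's standard deviation multiplied by the square root of one minus the squared correlation. The function name and the standard deviation of 15 are hypothetical.

```python
import math


def standard_error_of_estimate(criterion_sd, correlation):
    """Criterion standard deviation times sqrt(1 - r squared)."""
    if not -1.0 <= correlation <= 1.0:
        raise ValueError("correlation must be between -1 and 1")
    return criterion_sd * math.sqrt(1.0 - correlation ** 2)


# Hypothetical criterion with a standard deviation of 15 points.
for r in (0.0, 0.5, 0.9):
    print(r, round(standard_error_of_estimate(15.0, r), 2))
```

As the correlation grows, the printed standard error shrinks; a correlation of 0.00 leaves the error as large as the standard deviation itself.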
Using Standard Errors

Here's how to use these tools to state with some confidence the range within which the truth lies. Because sampling errors are approximately normally distributed, the standard error can be used just like a standard deviation to define specific proportions of scores under the normal curve. For example, if we want to provide a range of values that captures the population value 95 percent of the time, we can build a 95 percent confidence interval around our sample value. Based on the normal curve [Hack #23], 1.96 standard errors on either side of the sample value should provide a range of values that we can say with 95 percent certainty contains the population value.

Table 2-11 shows some examples of various standard errors and the use of sample data to produce these confidence intervals [Hack #6]. Notice how a larger sample size creates a sample estimate closer to the population value, and a larger sample size also points to a confidence interval that is more precise.
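The interval itself takes only a line or two to compute. The sketch below assumes the normal-curve approach described above (the sample value plus or minus 1.96 standard errors) and applies it to a hypothetical survey of 30 respondents in which 50 percent chose one answer; the function name `confidence_interval` is made up for illustration.

```python
import math


def confidence_interval(estimate, standard_error, z=1.96):
    """Return (low, high): the estimate plus or minus z standard errors."""
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical survey: 30 respondents, 50 percent choose one answer.
proportion, n = 0.50, 30
se = math.sqrt(proportion * (1 - proportion) / n)
low, high = confidence_interval(proportion, se)
print(f"{low:.2f} to {high:.2f}")  # prints "0.32 to 0.68"
```

Changing `z` gives other confidence levels under the same normal-curve assumption; 1.96 is the value that corresponds to 95 percent.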
Uncle Frank's Campaign for Dogcatcher

As the campaign manager for my Uncle Frank in his recent campaign for dogcatcher, I had an opportunity to use standard errors. Several weeks before the election, I surveyed 30 randomly chosen voters in the town of Tonganoxie, Kansas, where Frank lives. My survey found that 50 percent of respondents said they would vote for him. I warned Uncle Frank that the sample was so small that it was not a very precise reflection of the entire population of voters. After referring to Table 2-11, I determined that if we had surveyed all the voters in town, the percentage saying they would vote for Frank might reasonably be anywhere between about 32 percent and 68 percent, though the most likely value was 50 percent.

Of course, the optimist that is my uncle interpreted this as meaning he might have 68 percent of the vote and a huge lead. He spent the rest of his campaign chest on a giant victory party the night before the election. I, being the realist that I am and knowing my uncle's reputation around town, assumed the true outcome would be in the other direction. It was. That's okay, though. It was a great party.

Why It Works

We can trust the accuracy of standard errors if we accept the following assumptions and apply some common sense:
The formulas are constructed in such a way that if you have little or no information about the population, then the size of the error in your sample estimate is about the size of the standard deviation of the population. Look what happens with the standard error of the mean or the standard error of the proportion when the sample size is 1, or what happens with the standard error of the estimate when the correlation is 0.00. Intuitively, a good formula for figuring the standard error size should produce smaller errors when more is known about the population.
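Those limiting cases can be checked directly. The sketch below assumes the usual forms of the three formulas (standard deviation over the square root of the sample size, the square root of p(1 - p) over the sample size, and standard deviation times the square root of one minus the squared correlation); the standard deviation of 10 and the proportion of .50 are arbitrary values chosen for illustration.

```python
import math

sd = 10.0  # hypothetical standard deviation
p = 0.50   # hypothetical proportion

# Standard error of the mean with a sample size of 1.
se_mean_n1 = sd / math.sqrt(1)

# Standard error of the proportion with a sample size of 1.
se_prop_n1 = math.sqrt(p * (1 - p) / 1)
sd_of_proportion = math.sqrt(p * (1 - p))

# Standard error of the estimate with a correlation of 0.00.
se_est_r0 = sd * math.sqrt(1 - 0.00 ** 2)

print(se_mean_n1 == sd)                # True: error equals the standard deviation
print(se_prop_n1 == sd_of_proportion)  # True: error equals the SD for proportions
print(se_est_r0 == sd)                 # True: error equals the standard deviation
```

In each case, knowing almost nothing (one observation, or a useless predictor) leaves an error as large as the standard deviation.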