Hack 56. Predict the Outcome of a Baseball Game

Turn your radio on in the middle of a baseball game for five seconds and then turn it off. Without hearing the score, you'll be able to name the winner, and you'll be right more than half of the time.

Look, I'm a busy guy. I'm always looking for a way to save time on the less important things in life, such as following my local baseball team, so I'll have more time to spend on the important things in life: friends, family, debating the logic of Holm's sequential Bonferroni procedure as the appropriate follow-up method to analysis of variance, and so on. A case in point happened just the other day. Wanting to know whether the Kansas City Royals would win a baseball game that was in progress, I hardly had time to wait until the game was over. I wanted to know right now!

Much like Veruca Salt and her interest in owning one of Willy Wonka's Oompa-Loompas "now!", I don't have much patience.

Like a bolt from the blue, I realized that I could turn on my car radio for just a few seconds and have enough information to guess the outcome of the game. And I could do that without hearing the score or who was on base.

How It Works

During the first couple hours of a baseball game, turn on the radio broadcast of that game. Listen just long enough to identify the team that is at bat. That team has a greater than 50 percent chance of winning that game.

Why It Works

Baseball is a game where the longer you are on offense, the more runs you can score. As more batters come to the plate in a single inning, the chances of moving runners along the base paths and across home plate increase. Another way to look at it is to imagine the end of an inning that was huge for one team. If a team scored a lot of runs, it must have sent considerably more than the minimum of three batters to the plate in that inning and, consequently, been at bat for proportionately longer than the other team. Over the course of a game, the team that is at bat longest is more likely to score more (or have more productive innings).

Sampling theory [Hack #19] suggests that a sample is most likely to capture the most common elements of a population. Our population here is all the moments during a game that we could listen to. The most common characteristic in the population (in terms of who is at bat) belongs to the team that is at bat the most.

Figure 5-4 suggests a possible distribution of at-bat time for a regulation nine-inning game. In this example, the winning team was on offense 58 percent of the time. In retrospect, tuning in to the broadcast at a random moment had a 58 percent chance of finding the winning team at bat.
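The sampling argument can be checked with a small simulation. This is a minimal Python sketch; the half-inning durations are invented for illustration and are not the data behind Figure 5-4, and the function names are hypothetical. It lays the half-innings out in broadcast order, picks random moments, and counts how often the eventual winner (here, the home team) is batting. The hit rate should land close to the winner's share of total at-bat time, which for these made-up numbers is 73 of 133 minutes, or about 55 percent.

```python
import random

# Invented minutes at bat per half-inning (illustration only; not the
# numbers behind Figure 5-4). The home team wins, so it skips the bottom
# of the ninth and has only eight entries.
VISITOR_MINUTES = [5, 7, 8, 6, 10, 4, 9, 6, 5]
HOME_MINUTES = [12, 6, 9, 15, 5, 8, 11, 7]


def build_timeline(visitor, home):
    """Interleave half-innings in broadcast order: top (visitor), then bottom (home)."""
    timeline = []
    for inning, minutes in enumerate(visitor):
        timeline.append(("visitor", minutes))
        if inning < len(home):
            timeline.append(("home", home[inning]))
    return timeline


def team_at_bat(timeline, moment):
    """Return the team batting `moment` minutes into the game."""
    elapsed = 0
    for team, minutes in timeline:
        elapsed += minutes
        if moment < elapsed:
            return team
    return timeline[-1][0]  # moment equals the game length: last half-inning


def tune_in_hit_rate(timeline, winner, trials=100_000, seed=1):
    """Estimate how often tuning in at a uniformly random moment finds `winner` at bat."""
    rng = random.Random(seed)
    total = sum(minutes for _, minutes in timeline)
    hits = sum(team_at_bat(timeline, rng.uniform(0, total)) == winner
               for _ in range(trials))
    return hits / trials


timeline = build_timeline(VISITOR_MINUTES, HOME_MINUTES)
exact_share = sum(HOME_MINUTES) / (sum(HOME_MINUTES) + sum(VISITOR_MINUTES))
print(f"home team's share of at-bat time: {exact_share:.3f}")
print(f"simulated tune-in hit rate:       {tune_in_hit_rate(timeline, 'home'):.3f}")
```

Because a uniformly random moment is equally likely to fall anywhere on the timeline, the order of the half-innings does not affect the hit rate; only each team's total time at bat does.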

Figure 5-4. Time at bat for winning and losing teams

The accuracy of prediction should be above 50 percent over the long run of baseball broadcasts, but it won't be really, really accurate. This is because the relationship between time at bat and winning is not a perfect correlation [Hack #11]. Players can score quickly (hitting a home run on the first pitch, for example), or they can take their time, getting many hits but stranding many runners and never scoring.

Overall, however, the correlation between the two variables should be positive. Even the perhaps unimpressive 58 percent accuracy in my imagined data in Figure 5-4 means that you will be right 16 percent more often than a blind guess (58 correct calls out of every 100 games instead of 50). With such an advantage at the blackjack tables, you would be a millionaire in a week.
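The "16 percent more often" figure is a ratio of hit rates, not a difference between them. A short Python check using the 58 percent rate from the imagined Figure 5-4 data makes the distinction explicit:

```python
# Hit rates: the at-bat method (imagined Figure 5-4 game) versus a coin flip.
method_rate = 0.58
coin_flip_rate = 0.50

relative_gain = method_rate / coin_flip_rate - 1  # about 0.16: "16 percent more often"
point_gain = method_rate - coin_flip_rate         # about 0.08: 8 percentage points

print(f"relative gain: {relative_gain:.0%}")
print(f"absolute gain: {point_gain * 100:.0f} percentage points")
```

Both numbers describe the same edge; the relative figure is the one quoted in the text.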

Proving It Works

To test the accuracy of my claim, you can use the data that appears in your daily newspaper. While most box scores do not include information about total time at bat for each team, there is a variable that provides almost the same information. There will almost certainly be a "total at-bats" reported. While this statistic is not the same as time spent at bat, it should correlate pretty highly with it. Each day, this information is provided for more than a dozen games, and just a few days' worth of data should be enough to test my theory. Gather the total at-bats for each team, along with which team won the game.
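Once the box-score numbers are collected, scoring the rule "the team with more at-bats wins" is a simple tally. This is a minimal Python sketch; the `BoxScore` record, its field names, and the four games are hypothetical and only illustrate the bookkeeping. Games with equal at-bats make no prediction, so they are counted separately.

```python
from dataclasses import dataclass


@dataclass
class BoxScore:
    """One game from a box score. Field names are hypothetical."""
    home_at_bats: int
    visitor_at_bats: int
    home_won: bool


def tally(games):
    """Score the rule 'the team with more at-bats wins' against actual results.

    Returns (correct, wrong, ties); a tie in at-bats makes no prediction.
    """
    correct = wrong = ties = 0
    for game in games:
        if game.home_at_bats == game.visitor_at_bats:
            ties += 1
        elif (game.home_at_bats > game.visitor_at_bats) == game.home_won:
            correct += 1
        else:
            wrong += 1
    return correct, wrong, ties


# Made-up games, for illustration only.
games = [
    BoxScore(34, 31, True),   # home had more at-bats and won: correct
    BoxScore(30, 36, True),   # home had fewer at-bats but won: wrong
    BoxScore(33, 33, False),  # equal at-bats: no prediction
    BoxScore(29, 35, False),  # visitor had more at-bats and won: correct
]
correct, wrong, ties = tally(games)
print("accuracy, ties counted as misses:", correct / len(games))
print("accuracy, ties dropped:", correct / (correct + wrong))
```

Reporting both accuracy figures keeps it clear how tied games were handled.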

Real-life researchers often don't have access to the variable they would really like to know about, and our use of the number of at-bats instead of time at bat is a good example of this. Instead, we must settle for the next best thing available. Scientists call these substitutes proxy variables or surrogate variables.

My hypothesis is that the team with the most at-bats should win the game more than 50 percent of the time. Out of curiosity, I tested this hypothesis myself. I used the Chicago Cubs as an example, because their stats were readily available on the Web. I arbitrarily chose the 2003 season and the Cubs' first 25 games. An analysis of these games found that the team with the most at-bats won 56 percent of the time. If I had eliminated the three games in which both teams had the same number of at-bats, I could have predicted with 63 percent accuracy.
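The two accuracy figures differ only in their denominators. Assuming the 56 percent figure means 14 correct calls out of 25 games (the only whole number of games that gives exactly 56 percent) and that the three tied games count as misses in that figure, a quick Python check:

```python
from fractions import Fraction

games = 25
tied_games = 3
correct = 56 * games // 100  # 14 games won by the team with more at-bats

with_ties = Fraction(correct, games)                   # ties count as misses
without_ties = Fraction(correct, games - tied_games)   # ties dropped

print(f"{float(with_ties):.1%}")
print(f"{float(without_ties):.1%}")
```

Dropping the ties leaves 14 correct calls out of 22 games, about 63.6 percent, in line with the 63 percent figure above.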

While the team with fewer at-bats sometimes did win the Cubs games, the larger the discrepancy in at-bats, the more likely the team with more at-bats was to win. When the team with more at-bats won, it averaged 4.14 more at-bats than the loser. When the team with fewer at-bats won, it averaged only 2.88 fewer at-bats than the loser.
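The comparison of average at-bat gaps can be computed directly from the same kind of data. This minimal Python sketch assumes each game is stored as a hypothetical `(winner_at_bats, loser_at_bats)` pair; the games below are invented and do not reproduce the Cubs figures.

```python
from statistics import mean

# Invented (winner_at_bats, loser_at_bats) pairs, for illustration only.
GAMES = [(38, 33), (35, 31), (32, 35), (36, 30), (31, 34)]


def average_gaps(games):
    """Average at-bat gap when the team with more at-bats won, and when it lost.

    Ties are skipped. statistics.mean raises StatisticsError on an empty
    group, so each outcome needs at least one game.
    """
    more_won = [w - l for w, l in games if w > l]   # winner had more at-bats
    fewer_won = [l - w for w, l in games if w < l]  # winner had fewer at-bats
    return mean(more_won), mean(fewer_won)


gap_when_more_won, gap_when_fewer_won = average_gaps(GAMES)
print("more at-bats won, average gap:", gap_when_more_won)
print("fewer at-bats won, average gap:", gap_when_fewer_won)
```

A larger average gap in the first group than in the second is the pattern the Cubs data showed (4.14 versus 2.88).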

Other Places It Works

Some people have suggested that in the case of my team, the Kansas City Royals, if I want to be right more than half the time, I should always predict a loss. Yes, yes, very funny.

Where It Doesn't Work

The accuracy of this method should be low if you turn on the radio in the ninth inning, which is why I suggest you try it during the first couple hours of the game. Under the rules of baseball, if the home team is leading after the top of the ninth inning, it does not bat in the bottom of the ninth. It wins. Game over. Because home teams win more often than visiting teams, the winning team often never comes to bat at all in the ninth inning.

This presents an interesting variation of this prediction method that applies only to the ninth inning. Turn on the game in the ninth inning; if your team is batting, things don't look so good. The data presented for the Chicago Cubs that found the winning team occasionally having fewer at-bats than their opponent can be partly explained by the fact that the winning team sometimes bats in only eight innings.
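The ninth-inning rule behind this effect fits in a tiny function. This Python sketch covers only the regulation case described above; extra innings, walk-off endings, and shortened games are deliberately left out, and the function names are hypothetical.

```python
def bottom_of_ninth_played(home_runs, visitor_runs):
    """True if the home team bats in the bottom of the ninth.

    A home team that leads after the top of the ninth does not bat;
    if it is tied or trailing, the bottom half is played.
    """
    return home_runs <= visitor_runs


def home_batting_innings(home_runs_after_top_9, visitor_runs_after_top_9):
    """Innings in which the home team comes to bat in a regulation game."""
    if bottom_of_ninth_played(home_runs_after_top_9, visitor_runs_after_top_9):
        return 9
    return 8


print(home_batting_innings(5, 3))  # home leads: bats in only eight innings
print(home_batting_innings(3, 3))  # tied: bottom of the ninth is played
```

A home team that wins this way gets one fewer half-inning at bat than its opponent, which is why the winner can end up with fewer at-bats.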

This method doesn't work for all sports. In basketball, for example, time of possession wouldn't be expected to correlate positively with points scored and, in the case of high-energy, fast-scoring teams, might even correlate negatively. In football, on the other hand, time of possession is considered a key indicator of quality performance and usually correlates with a win.

Statistics Hacks
Statistics Hacks: Tips & Tools for Measuring the World and Beating the Odds
ISBN: 0596101643
Year: 2004
Pages: 114
Authors: Bruce Frey
