Chapter 50: Analysis of Variance: One-Way ANOVA


Overview

  • The owner of my company, which publishes computer books, wants to know whether the position of our books in the computer book section of bookstores influences sales. More specifically, does it really matter whether the books are placed in the front, back, or middle of the computer book section?

  • If I am determining whether populations have significantly different means, why is the technique called analysis of variance?

  • How can I use the results of one-way ANOVA for forecasting?

We often have several different groups of people or items and want to determine whether data about the groups differs significantly. Here are some examples:

  • Is there a significant difference in the length of time that four doctors keep mothers in the hospital after they give birth?

  • Does the production yield for a new drug depend on whether the size of the container in which the drug is produced is large, small, or medium?

  • Does the drop in blood pressure attained after taking one of four drugs depend on the drug taken?

When you’re trying to determine whether the means in several sets of data that depend on one factor are significantly different, one-way analysis of variance, or ANOVA, is the correct tool to use. In the examples given above, the factors are the doctors, the container size, and the drug, respectively. In analyzing the data, we can choose between two hypotheses:

  • Null hypothesis, which indicates that the means of all groups are identical.

  • Alternative hypothesis, which indicates that at least one group's mean differs from the others.

To test these hypotheses in Microsoft Office Excel 2007, we can use the Anova: Single Factor option in the Data Analysis dialog box. If the p-value computed by Excel is small (this chapter uses a cutoff of 0.15; 0.05 is a more conventional, stricter choice), we reject the null hypothesis and conclude that the means are significantly different. If the p-value is greater than 0.15, we do not have enough evidence to reject the null hypothesis, so we treat the populations as having identical means. Let’s look at an example.

  • The owner of my company, which publishes computer books, wants to know whether the position of our books in the computer book section of bookstores influences sales. More specifically, does it really matter whether the books are placed in the front, back, or middle of the computer book section?

  • The publishing company wants to know whether its books sell better when a display is set up in the front, back, or middle of the computer book section. Weekly sales (in hundreds) were monitored at 12 different stores. At 5 stores, the books were placed in the front; at 4 stores, in the back; and at 3 stores, in the middle. Resulting sales are contained in the Signif worksheet in the file Onewayanova.xlsx, which is shown in Figure 50-1. Does the data indicate that the location of the books has a significant effect on sales?

    Figure 50-1: Book sales data

  • We assume that the 12 stores have similar sales patterns and are approximately the same size. This assumption allows us to use one-way ANOVA because we believe that at most one factor (the position of the display in the computer book section) is affecting sales. (If the stores were different sizes, we would need to analyze our data with two-way ANOVA, which I’ll discuss in Chapter 51, “Randomized Blocks and Two-Way ANOVA.”)

  • To analyze the data, on the Data tab, click Data Analysis, and then select Anova: Single Factor. Fill in the dialog box as shown in Figure 50-2.

    Figure 50-2: Anova: Single Factor dialog box

  • We use the following settings:

    • The data for our input range, including labels, is in cells B3:D8.

    • Select the Labels option because the first row of our input range contains labels.

    • I’ve selected the Columns option because the data is organized in columns.

    • I’ve selected C12 as the upper-left cell of the output range.

    • The selected alpha value affects only the critical F value reported in the output, not the p-value, so you can use the default value.

  • After clicking OK, we obtain the results shown in Figure 50-3.

    Figure 50-3: One-way ANOVA results

  • In cells F16:F18, we see average sales depending on the location of the display. When the display is at the front of the computer book section, average sales are 900; when the display is at the back of the section, sales average 1400; and when the display is in the middle, sales average 1100. Because our p-value of 0.003 (in cell H23) is less than 0.15, we can conclude that these means are significantly different.
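For readers who want to see the arithmetic behind the Excel output, here is a minimal Python sketch of a one-way ANOVA computed from raw data. The function name `one_way_anova` is hypothetical, and the sales figures are illustrative values chosen only so that they reproduce the group sizes, the group means, and the within-groups sum of squares reported in this chapter; the actual worksheet values in Figure 50-1 may differ.

```python
def one_way_anova(groups):
    """Return (group means, SS between, SS within, F) for a dict of samples."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = sum(all_values) / len(all_values)
    means = {name: sum(v) / len(v) for name, v in groups.items()}

    # Variation of the group means around the grand mean, weighted by group size
    ss_between = sum(len(v) * (means[name] - grand_mean) ** 2
                     for name, v in groups.items())
    # Variation of each observation around its own group mean
    ss_within = sum((x - means[name]) ** 2
                    for name, v in groups.items() for x in v)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return means, ss_between, ss_within, f_stat


# HYPOTHETICAL weekly sales in hundreds (5 front, 4 back, 3 middle stores),
# not the worksheet's actual contents.
sales = {
    "front": [7, 10, 8, 9, 11],
    "back": [12, 13, 15, 16],
    "middle": [10, 11, 12],
}

means, ss_between, ss_within, f_stat = one_way_anova(sales)
print(means)
print(f"SS between = {ss_between:.2f}, SS within = {ss_within:.2f}, F = {f_stat:.2f}")
```

The p-value is the upper-tail probability of this F statistic with 2 and 9 degrees of freedom. Outside Excel, a library routine such as SciPy's `scipy.stats.f_oneway` performs the same test and reports the p-value directly.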

  • If I am determining whether populations have significantly different means, why is the technique called analysis of variance?

  • Suppose that the data in our book sales study is the data shown in the worksheet named Insig, shown in Figure 50-4 on the next page (also in the file Onewayanova.xlsx). If we run a one-way ANOVA on this data, we obtain the results shown in Figure 50-5 on the next page.

    Figure 50-4: Book store data for which the null hypothesis is accepted

    Figure 50-5: Anova results accepting the null hypothesis

  • Note that the mean sales for each part of the store are exactly as before, yet our p-value of 0.66 indicates that we cannot reject the null hypothesis, so we conclude that the position of the display in the computer book section doesn’t significantly affect sales. The reason for this seemingly strange result is that the second data set has much more variation in sales at each display position. In the first data set, for example, sales with the display at the front range from 700 to 1100, whereas in the second data set they range from 200 to 2000. The variation of sales within each store position is measured by the within-groups sum of squares: the sum of the squared deviations of each observation from its group mean. This measure is shown in cell D24 in the first data set and in cell F24 in the second. In the first data set, the within-groups sum of squares is only 22, whereas in the second data set it is 574. This large variation among the data points at each store position masks the variation between the groups (store positions) themselves, so for the second data set we cannot conclude that the difference in sales between store positions is significant.
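To see numerically how within-group variation drives the result, the following Python sketch recomputes the F statistic and p-value for both data sets from the summary figures quoted in this chapter (group sizes 5, 4, and 3; group means 9, 14, and 11 in hundreds; within-groups sums of squares 22 and 574). It is an illustration rather than the Excel procedure. The function name `anova_from_summary` is hypothetical, and the closed-form p-value it uses is valid only when there are exactly three groups, that is, two numerator degrees of freedom.

```python
def anova_from_summary(sizes, means, ss_within):
    """One-way ANOVA F test from group sizes, group means, and SS within.

    Assumes exactly three groups, so the F distribution has 2 numerator
    degrees of freedom and its upper tail has a simple closed form.
    """
    if len(sizes) != 3 or len(means) != 3:
        raise ValueError("closed-form p-value here requires exactly 3 groups")

    n_total = sum(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))

    df_between = len(sizes) - 1
    df_within = n_total - len(sizes)
    f_stat = (ss_between / df_between) / (ss_within / df_within)

    # Upper tail of F(2, d): P(F > x) = (1 + 2x/d) ** (-d/2)
    p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
    return f_stat, p_value


sizes = [5, 4, 3]
means = [9, 14, 11]  # hundreds of books per week
for label, ss_within in [("first data set", 22), ("second data set", 574)]:
    f_stat, p_value = anova_from_summary(sizes, means, ss_within)
    print(f"{label}: F = {f_stat:.2f}, p = {p_value:.3f}")
```

The between-groups variation is the same in both runs because the group means are unchanged. Only the denominator of the F ratio grows, which is what pushes the p-value from about 0.003 to about 0.66.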

  • How can I use the results of a one-way ANOVA for forecasting?

  • If there is a significant difference between group means, our best forecast for each group is simply the group’s mean. Therefore, in the first data set, we predict the following:

    • Sales when the display is at the front of the computer book section will be 900 books per week.

    • Sales when the display is at the back will be 1400 books per week.

    • Sales when the display is in the middle will be 1100 books per week.

  • If there is no significant difference between the group means, our best forecast for each observation is simply the overall mean. Thus, in the second data set, we predict weekly sales of 1117, independent of where the books are placed.
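The overall mean is a weighted average of the group means, with each group weighted by its number of stores, not a simple average of the three means. A short Python check follows; it assumes the second data set has the same store counts as the first (5, 4, and 3), which the text implies but does not state.

```python
# Stores per display position (assumed identical in both data sets)
sizes = {"front": 5, "back": 4, "middle": 3}
# Group means in books per week, as reported in the text
means = {"front": 900, "back": 1400, "middle": 1100}

total_stores = sum(sizes.values())
overall_mean = sum(sizes[g] * means[g] for g in sizes) / total_stores
simple_average = sum(means.values()) / len(means)

print(round(overall_mean))    # weighted by store count
print(round(simple_average))  # unweighted, shown only for contrast
```

The weighted figure rounds to 1117, matching the forecast above, whereas the unweighted average of the three means would be about 1133.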

  • We can also estimate the accuracy of our forecasts. The square root of the Within Groups MS (mean square) is the standard deviation of our forecasts from a one-way ANOVA. As shown in Figure 50-6, our standard deviation of forecasts for the first data set is 156 books. By the usual rule of thumb for roughly normally distributed data (about 68 percent of values fall within one standard deviation of the mean and about 95 percent within two), we would expect, for example:

    • During 68 percent of all weeks in which books are placed at the front of the computer book section, sales will be between 900 - 156 = 744 and 900 + 156 = 1056 books.

    • During 95 percent of all weeks in which books are placed at the front of the computer book section, sales will be between 900 - 2(156) = 588 and 900 + 2(156) = 1212 books.

    Figure 50-6: Computation of forecast standard deviation
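The forecast standard deviation and the two ranges above can be reproduced with a few lines of Python. This sketch uses the within-groups sum of squares of 22 reported for the first data set and 12 - 3 = 9 within-groups degrees of freedom; the variable names are illustrative only.

```python
import math

ss_within = 22            # within-groups sum of squares (data in hundreds)
n_obs, n_groups = 12, 3   # 12 stores, 3 display positions
ms_within = ss_within / (n_obs - n_groups)

# Convert from hundreds to books and round to whole books, as in the text
forecast_sd = round(100 * math.sqrt(ms_within))

front_mean = 900
intervals = {k: (front_mean - k * forecast_sd, front_mean + k * forecast_sd)
             for k in (1, 2)}

print(forecast_sd)
print(intervals[1])  # about 68 percent of weeks
print(intervals[2])  # about 95 percent of weeks
```

The same standard deviation applies to the back and middle positions, because the Within Groups MS pools the variation across all three groups.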




Microsoft Press - Microsoft Office Excel 2007. Data Analysis and Business Modeling
ISBN: 0735623961
Year: 2007
Pages: 200
