Excel includes a set of data analysis tools, called the Analysis ToolPak, that you can use to develop complex statistical or engineering analyses. Although this book is not geared toward statisticians, engineers, mathematicians, or academic researchers, this section will describe some of the functions in the Analysis ToolPak that are very helpful for analyzing business data. Because the Analysis ToolPak is not very widely understood, this section details the components of common Analysis ToolPak dialog boxes and demonstrates how to use these dialog boxes in common data analysis scenarios. To use the Analysis ToolPak, simply click Data Analysis on the Tools menu. (See the following if the command is not on the menu.) Click a tool in the list, and then complete the information in the resulting dialog box to perform the analysis.
Installing the Analysis ToolPak
If the Data Analysis command doesn’t appear on the Tools menu, click Add-Ins on the Tools menu, select the Analysis ToolPak check box, and then click OK. If the Analysis ToolPak check box is not visible, be sure you have your original Microsoft Office or Excel installation media handy, double-click the Add/Remove Programs icon in Control Panel, and do one of the following:
For Microsoft Windows 2000, Windows Millennium Edition, and Windows XP:
If you installed Excel as part of Microsoft Office, click Microsoft Office in the Currently Installed Programs box and then click the Change button.
If you installed Excel individually, click the Excel program entry in the Currently Installed Programs box and then click the Change button.
For Microsoft Windows 98 and Windows NT 4.0:
If you installed Excel as part of Microsoft Office, click Microsoft Office on the Install/Uninstall tab and then click the Add/ Remove button.
If you installed Excel individually, click the Excel program entry on the Install/Uninstall tab and then click the Add/ Remove button.
Follow the instructions on the screen. The Analysis ToolPak can be found by expanding the Microsoft Excel for Windows node and then expanding the Add-Ins node.
Some of the useful Analysis ToolPak tools for business analysis include the following:
Descriptive Statistics, which provides a quick list of summarizations such as minimum, maximum, median, mode, and other results.
Histogram, which provides a categorization of data values into similar groupings, or bins.
Moving Average, which provides a series of data values averaged over time, for use in forecasting or trending.
Rank and Percentile, which provides a list of how data values compare or rank against each other.
Sampling, which selects a random number of representative data values. Sampling is especially helpful when you want to quickly analyze a smaller number of data values that you believe are representative of a much larger group of data values.
The rest of this section describes how to use these tools. In any of the dialog boxes for these individual Analysis ToolPak functions, you’ll need to provide the following information:
Input Range In this box, type the cell address for the group of data values that you want to analyze.
Grouped By (part of the Descriptive Statistics and Rank and Percentile dialog boxes). Select the Columns option if your data is grouped by columns; select the Rows option if your data is grouped by rows.
Labels In First Row (simply the Labels option in the Histogram and Sampling dialog boxes). Select this option if the first row or column of your input range contains field names.
Chart Output (part of the Histogram and Moving Average dialog boxes). Select this option to display a chart along with the resulting data table.
Output Range Select this option to place the results in the active worksheet. In the corresponding text box, enter the cell address of the cell where you want the results to start. Excel will display a message before it tries to place results in any cell that already contains a data value.
New Worksheet Ply Use this option to place the results in a new worksheet in the active workbook. Type the name of the worksheet in the corresponding box.
New Workbook Use this option to place the results in a new workbook.
To use the Descriptive Statistics tool, on the Tools menu, click Data Analysis. Click Descriptive Statistics, and then click OK. The Descriptive Statistics dialog box is shown in Figure 3-23.
Figure 3-23: The Descriptive Statistics dialog box.
In addition to the input range and other standard options, use the following options in the dialog box depending on the information you’re analyzing:
Summary Statistics Select this option if you want Excel to produce the mean, standard error, median, mode, standard deviation, variance, kurtosis (a measure of how data is distributed), skewness (the degree of data distribution), and other related statistics for the selected group of cells.
Confidence Level For Mean Select this option if you want to measure the confidence level for the mean summarization. For example, if the mean for a cell group of 10 values is 50, the standard deviation is 38.7, and you want a 95 percent confidence level, the result is 24. This means that 95 percent of the time, any single data value in the cell group should fall between the range of data values 26 and 74.
Kth Largest Use this option to include the kth largest value in the cell group in the result. In the corresponding box, type the number to use for k. If you type 3, for example, the result is the third largest value in the cell group.
Kth Smallest Select this option to include the kth smallest value in the cell group. If you type 5 for example, the results will show the fifth smallest value in the group of cells.
Your Turn
In this exercise, you will run the Descriptive Statistics tool on the total nights booked for all customers for all months.
Open Hotel.xls. If it is already open, close it (do not save it) and open it again.
On the Tools menu, click Data Analysis, click Descriptive Statistics, and then click OK.
Click the Input Range box, and then select cells C2 through C313.
In the Grouped By area, click Columns.
Click the New Worksheet Ply option.
Select the Summary Statistics and Confidence Level For Mean check boxes.
Check the Kth Largest option, and then type 5 in the adjacent box.
Check the Kth Smallest option, and then type 7 in the adjacent box. Compare your results to Figure 3-24.
Figure 3-24: Completing the Descriptive Statistics dialog box.
Click OK, and then compare your results to Figure 3-25.
Figure 3-25: Results of running the Descriptive Statistics tool.
To use the Histogram tool, click Data Analysis on the Tools menu. Click Histogram, and then click OK. The Histogram dialog box is shown in Figure 3-26.
Figure 3-26: The Histogram dialog box.
In addition to the input range and other standard options, use the following options in the dialog box depending on the information you’re analyzing:
Bin Range In the Bin Range box, type the cell address for the group of data values (preferably in ascending order) that you want to use for the histogram grouping bins. If you leave this box blank, Excel will create a set of evenly distributed bins using the data’s minimum and maximum values.
Pareto (Sorted Histogram) Select this option to present the histogram grouping bins in descending order of frequency.
Cumulative Percentage Select this option to display a histogram column for cumulative percentages and to include a cumulative percentage line in the histogram.
Your Turn
You want to generate a histogram to display the frequency of nights booked per month for all of the preferred customers.
Open Hotel.xls. If the file is already open, close it (do not save it) and open it again.
In cell F1, type the number 1. In cell F2, type 2, and so on through the number 20 in cell F20. The values you enter will be used as the bins in the histogram.
On the Tools menu, click Data Analysis.
Click Histogram, and then click OK.
Click the Input Range box, and then select cells C2 through C313.
Click the Bin Range box, and then select cells F1 through F20.
Click the Output Range option, click the Output Range box, and then click cell H1.
Select the Pareto (Sorted Histogram) check box.
Select the Chart Output check box, and then compare your results to Figure 3-27.
Figure 3-27: Completing the Histogram dialog box.
Click OK and compare your results to Figure 3-28.
Figure 3-28: The histogram produced for Hotel.xls.
The number of nights stayed per month that recur most often are 1, 11, 10, and 14.
To use the Moving Average tool, click Data Analysis on the Tools menu. Click Moving Average, and then click OK. The Moving Average dialog box appears, as shown in Figure 3-29.
Figure 3-29: The Moving Average dialog box.
For a moving average, the output range must be on the same worksheet as the input range. For this reason, the New Worksheet Ply and New Workbook options are not available. In addition to the input range and other standard options, the Moving Average dialog box includes two items that you set to perform your analysis:
Interval For this option, enter the number of values that you want to include in the moving average. The default interval is 3.
Standard Errors Use this check box to have Excel generate a two- column table, showing standard error values in the right column.
Your Turn
You want to forecast what room service charges might be for the next year by using a three-month moving average of preferred customer Abercrombie’s room service charges during the past year.
Open Hotel.xls. If it is already open, close it (do not save it) and open it again.
On the Tools menu, click Data Analysis.
Click Moving Average, and then click OK.
Click the Input Range box, and then select cells D2 through D13 in the Total Room Service column.
In the Interval box, type 3.
Click the Output Range box, and then click cell F1.
Select the Chart Output check box, compare your results to Figure 3-30, and then click OK. Compare the chart output to Figure 3-31.
Figure 3-30: Completing the Moving Average dialog box.
Figure 3-31: The chart showing actual and forecasted room service charges.
The numbers starting in cell F1 next to the chart are a result of running the moving average for each month, using the interval you entered. For example, for March, an average of the room service charges for January, February, and March is calculated. For April, an average of the room service charges for February, March, and April is calculated. You don’t see any moving average calculations for January or February because the data for the previous months (November and December of the preceding year) is not available.
It’s hard to say with certainty what preferred customer Abercrombie’s monthly room charges will be next year. However, overall the trend is downward in the last quarter, which could carry over into the next year. Hopefully, you will take some sort of action with this customer to turn things around.
To use the Rank and Percentile tool, click Data Analysis on the Tools menu, click Rank And Percentile, and then click OK. Figure 3-32 shows the Rank And Percentile dialog box.
Figure 3-32: The Rank And Percentile dialog box.
Your Turn
You want to give all preferred customers who spent monthly room service charges in the 90th percentile or higher a special thank-you gift.
Open Hotel.xls. If the file is already open, close it (do not save it) and open it again.
On the Tools menu, click Data Analysis, click Rank And Percentile, and then click OK.
Click the Input Range box, and then select cells D2 through D313 in the Total Room Service column.
Click the Output Range option, click in the Output Range box, click cell F1, and compare your results to Figure 3-33.
Figure 3-33: Completing the Rank And Percentile dialog box.
Click OK, and compare the output to Figure 3-34. The 90th percentile cut-off point is $1,153.40 in any one month. (The numbers in the Point column refer to an item’s order in the list.)
Figure 3-34: All the values above this point are in the 90th percentile.
To use the Sampling tool, click Data Analysis on the Tools menu, click Sampling, and then click OK. The Sampling dialog box appears, as shown in Figure 3-35.
Figure 3-35: The Sampling dialog box.
The sampling method options include the following:
Periodic Use this option if you want to sample the kth value in the group of cells you select for input. Enter the value of k in the Period box.
Random Use this option if you want to sample values randomly. Type the number of sample values in the Number Of Samples box.
Your Turn
In this exercise, you’ll determine the average amount of monthly room charges made by each of your preferred customers. You’ll start by averaging about 10 percent of the data values to see whether they are representative of the 310 data values in the list.
Open Hotel.xls. If it is already open, close it (do not save the file) and open the file again.
On the Tools menu, click Data Analysis, click Sampling, and then click OK.
Click the Input Range box, and then select cells D2 through D313 in the Total Room Service column.
Click the Random option, and in the Number Of Samples box, type 31.
Click the Output Range option, and then click cell F1. Compare your results to Figure 3-36.
Click OK and compare your output to Figure 3-37. Because you’ve used random samples, your results will vary.
Figure 3-36: Completing the Sampling dialog box for a random number of samples.
Figure 3-37: Running the Sampling tool for a random number of samples.
With the randomly picked values selected, right-click anywhere in the status bar, and click Average.
What is the average monthly room service charge? Compare this amount to the average monthly room service charge of $473.35 for all preferred customers. Were the values picked representative? Now try a fixed number of data values.
On the Tools menu, click Data Analysis, click Sampling, and then click OK.
Click the Periodic option, and in the Period box, type 12.
Click the Output Range box, and click cell G1.
Click OK, and compare the output to Figure 3-38.
Figure 3-38: Running the Sampling tool for a periodic number of samples.
With the new sample values selected, right-click in the status bar and click Average.
The average monthly room service charge based on these records is $522.37. Again, compare this to the average monthly room service charge of $473.35 for all of the preferred customers. Perhaps every 12th value that was sampled was not representative either. Experiment with higher values for both the number of random samples and the number of periodic values to see whether you can get closer to $473.35.
Putting It Together
You can use a number of Analysis ToolPak tools together to perform several analyses at once and spot trends. In the following exercise, you will compare the average of monthly room service charges to their rank and percentile.
Open Hotel.xls. If the file is open already, close it (do not save the file) and reopen the file again.
On the Tools menu, click Data Analysis.
Click Descriptive Statistics, and then click OK.
Click the Input Range box, and then select cells D2 through D313 in the Total Room Service column.
Click the Columns and New Worksheet Ply options.
Select the Summary Statistics check box, and then click OK.
Click the worksheet labeled Sheet1.
On the Tools menu, click Data Analysis.
Click Rank And Percentile, and then click OK.
Click the Input Range box, and then select cells D2 through D313 in the Total Room Service column.
Click the Columns and New Worksheet Ply options, and then click OK.
Compare the Descriptive Statistics worksheet’s Mean value (473.3455, or $473.35) to the values in the Rank And Percentile worksheet’s Column1 and Percent columns.
You should notice that the mean (average) of $473.35 falls between the 60.1 and 60.4 percentiles. This is more than 10 percentage points higher than the median (midpoint), which is between $380.36 and $387.84 in the Rank And Percentile worksheet’s Column1 column, or $384.10 in the Descriptive Statistics worksheet’s Median row value.
Is this percentage difference good or bad for business? Although you have some big spenders among your customers, this could actually be bad for business in the long run. Here’s why: if the average was closer to the midpoint, this would most likely mean that most customers were making steady, predictable, evenly distributed room service orders. However, because the average is somewhat higher than the midpoint, this means that many of the preferred customers are making either large or small, less predictable room service orders, which could be difficult to plan for in the long run, especially if those big spenders stop purchasing. How do you correct this? The answer comes in knowing how the hotel chain goes about influencing its preferred customers’ purchasing habits and what it can do to keep the big spenders coming back or making the small spenders purchase more room service.