8.5. Advanced Statistics
Excel offers a smorgasbord of number crunching tools; in fact, it has more than 50 advanced statistical functions that this chapter hasn't even mentioned. Many of these functions will appeal only to those who already understand concepts like standard deviation, theoretical distributions, and dimensionless Pearson product moment correlation coefficients (a subject of keen interest to all six people in the world who currently use such numbers ).
But even if you don't understand any of those concepts (and if you're still wondering whether your inner mathematician may yet emerge), Excel offers two advanced statistical functionsFORECAST( ) and TREND( )that give you a taste of all that awaits the advanced statistician. While both FORECAST( ) and TREND( ) rely on some complex, behindthescenes computations , using them is quite easy, even for nonspecialists.
The FORECAST( ) and TREND( ) functions predict missing values based on existing data. These "missing" values may represent experimental data you didn't collect, or future sales numbers for dates that haven't occurred yet. Excel examines the data you have so far (the more the better), and uses a mathematical technique known as the method of least squares to generate a trend line. Excel then uses this trend line to generate the other values you've requested .
Both FORECAST( ) and TREND( ) work similarly and produce the same answers. The difference is that FORECAST( ) is designed to calculate a single value. The TREND( ) function, on the other hand, is an array function that gives you multiple values. It works a little like the FREQUENCY( ) function described on Section 8.2.8.
The FORECAST( ) and TREND( ) analyses work by trying to spot a relationship between two sets of numbers, designated as the xaxis and yaxis. To predict a new value in the FORECAST( ) function, for instance, you supply the function with a new x value. It then uses the known relationship between the x and y values to guess the corresponding y value. The syntax is as follows :
FORECAST(x, known_x_values, known_y_values)
For example, imagine you're trying to study the relationship between the number of product inquiries received during one month and the actual sales recorded in the next month. You have plenty of historical data, and you'd like to use that to predict how many sales you should expect next month (based on the number of inquiries you've received during the current month). The existing inquiry numbers are the first set of known values, so they make up the xaxis values (the cells A2:A9, as shown in Figure 89). The second set of known values is the actual recorded sales in the following month (the yaxis values, shown in cells B2:B9). You want to predict a new yaxis value for expected sales based on the current inquiries ( cell A12). Here's the final formula:
=FORECAST(A12,B2:B9,A2:A9)

You can also use the TREND( ) function to generate expected sales based on multiple inquiry numbers (as opposed to the single sales value that FORECAST( ) generates). Using the previous example, you could theoretically try to predict sales for the next six months. In truth, that approach doesn't really make sense because, while you have previous sales data to draw from, you don't know how many inquiries you'll receive in future months.
A better idea is to use TREND( ) to project future sales based on a slightly different set of data. You still use previous sales, but rather than using inquiries as your second known factor, you use the progression of time . In effect, you're asking TREND( ) if your sales over periods 1, 2, and 3 were 100, 150, and 200, what would sales look like over periods 4, 5, 6? Without further ado, here's what you need to generate new, predicted sales figures using the TREND( ) function:
Make sure your worksheet has all the data your TREND( ) formula needs.
As shown in Figure 810, you need three groups of numbers: known yvalues (historical sales figures), known xvalues (previous time periods), and the new xvalues (known, new time periods). What you want the formula to figure out are the new yvalues (that is, the sales figures).

Select the range of cells where you want to display the predicted values.
In this example, four new sales figures appear in cells C11:C14.
Enter the array formula using the TREND( ) function.
Here's the formula used in this example:
=TREND(C2:C9, A2:A9, A11:A14)
When you've finished typing in the formula, hit Ctrl+Shift+Enter.
You can't simply press Enter, as that will only enter the formula into the first selected cell. Instead, you need to press Ctrl+Shift+Enter to copy the array formula into all the result cells. You can tell that an array formula is at work by the curly braces {} in the formula bar, which Excel adds automatically.
Excel provides many more advanced statistical functions. For the full list, refer to the Statistical group in the Insert Function dialog box (select Insert Function). Table 83 outlines some of the more commonly used functions in this group.
Power User's Clinic An Advanced Engineering Toolkit 
When you turn on the Analysis ToolPak addin (as explained on Section 7.2.4), you'll find that Excel creates a new category for some very specialized mathematical functions named Engineering. These functions tackle two primary problem areas: imaginary, or complex numbers (numbers that incorporate i , the square root 1); and number conversions between binary, decimal, and octal representations. If you need to use these functions for advanced mathematical spreadsheets, you'll find them quite understandable. On the other hand, if you don't know the difference between a complex conjugate and a vector, they probably won't be of much use. Excel doesn't corner the market on advanced mathematical functions. Many more specialized functions are available if you install addins from other vendors . And in Chapter 26, you'll learn how you can write the custom logic for your own functions using Visual Basic for Applications (VBA). 
Function  Syntax  Description 

AVEDEV( )  AVEDEV(range)  Calculates the average deviation, which is the average of how much a series of data points deviate from the mean. 
CORREL( )  CORREL(range1, range2)  Calculates the correlation coefficient between two sets of data, which is a measure of how the sets are related . The correlation coefficient is always between 1 and 1 (the closer to 1 the stronger the correlation, while 0 means no correlation at all). 
COVAR( )  COVAR(range1, range2)  Calculates the covariance, which is the average of the products of deviation for each data point pair. Covariance is used to determine if a relationship exists between data sets. 
LINEST( )  LINEST(known_y, known_x)  Calculates the bestfit line using the least squares method. This is an array function that returns both the slope and the yintercept. 
NORMDIST( )  NORMDIST(x, mean, standard_dev)  Calculates the normal cumulative distribution for a specified mean and standard deviation. 
NORMINV( )  NORMINV(probability, mean, standard_deviation)  Calculates the inverse of the normal cumulative distribution for a specified mean and standard deviation. 
SLOPE( )  SLOPE(known_y, known_x)  Calculates a regression line through the supplied data points, and returns its slope. 
STDEV( )  STDEV(range)  Calculates the standard deviation of a data set. Blank and text cells are ignored. 
STDEVA( )  STDEVA(range)  Calculates the standard deviation of a data set. Text cells are used as 0 values but blank cells are ignored. 
VAR( )  VAR(range)  Estimates the variance of a sample population. Text cells are used as values but blank cells are ignored. 
VARA( )  VARA(range)  Estimates the variance of a sample population. Text cells are used as values but blank cells are ignored. 