Chapter 46: Using Correlations to Summarize Relationships


Overview

  • How are monthly stock returns for Microsoft, GE, Intel, GM, and Cisco related?

Trend curves are a great help in understanding how two variables are related. Often, however, we need to understand how more than two variables are related. Looking at the correlation between any pair of variables can provide insights into how multiple variables move up and down in value together.

The correlation (usually denoted by r) between two variables (call them x and y) is a unit-free measure of the strength of the linear relationship between x and y. The correlation between any two variables is always between –1 and +1. Although the exact formula used to compute the correlation between two variables isn’t very important, being able to interpret the correlation between the variables is.

A correlation near +1 means that x and y have a strong positive linear relationship. That is, when x is larger than average, y tends to be larger than average, and when x is smaller than average, y also tends to be smaller than average. When a straight line is applied to the data, there will be a straight line with a positive slope that does a good job of fitting the points. As an example, for the data shown in Figure 46-1 (x=units produced and y=monthly production cost), x and y have a correlation of +0.90.

image from book
Figure 46-1: Correlation near +1, indicating that two variables have a strong positive linear relationship

On the other hand, a correlation near –1 means that there is a strong negative linear relationship between x and y. That is, when x is larger than average, y tends to be smaller than average, and when x is smaller than average, y tends to be larger than average. When a straight line is applied to the data, the line will have a negative slope that does a good job of fitting the points. As an example, for the data shown in Figure 46-2 (x= the price charged for a product and y= product demand), x and y have a correlation of –0.94.

image from book
Figure 46-2: Correlation near –1, indicating that two variables have a strong negative linear relationship

A correlation near 0 means that x and y have a weak linear relationship. That is, knowing whether x is larger or smaller than its mean tells you little about whether y will be larger or smaller than its mean. Figure 46-3 shows a graph of the dependence of unit sales (y) on years of sales experience (x). Years of experience and unit sales have a correlation of 0.003. In our data set, the average experience is 10 years. We see that when a person has more than 10 years of sales experience, his or her sales can be either low or high. We also see that when a person has fewer than 10 years of sales experience, sales can be low or high. Although experience and sales have little or no linear relationship, there is a strong non-linear relationship (see the fitted curve) between years of experience and sales. Correlation does not measure the strength of non-linear relationships.

image from book
Figure 46-3: Correlation near 0, indicating a weak linear relationship between two variables

  • How are monthly stock returns for Microsoft, GE, Intel, GM, and Cisco related?

  • The file Stockcorrel.xlsx (see Figure 46-4) shows monthly stock returns for Microsoft, GE, Intel, GM, and Cisco during the 1990s. We can use correlations to try to understand how movements in these stocks are related.

    image from book
    Figure 46-4: Monthly stock returns during the 1990s

  • To find the correlations between each pair of stocks, click Data Analysis in the Analysis group on the Data tab, and then select the Correlation option. You must install the Analysis ToolPak (as described in Chapter 36) before you can use this feature. Click OK, and then fill in the Correlation dialog box as shown in Figure 46-5.

    image from book
    Figure 46-5: Correlation dialog box

  • The easiest way to enter the input range is to select the upper-left cell of the data range (B51) and then press Ctrl+Shift+Right Arrow, followed by Ctrl+Shift+Down Arrow. Check the Labels In First Row box if the first row of the input range contains labels. I entered cell H52 as the upper-left cell of our output range. After clicking OK, we see the results shown in Figure 46-6.

    image from book
    Figure 46-6: Stock return correlations

  • We find, for example, that the correlation between Cisco and Microsoft is 0.513, whereas the correlation between GM and Microsoft is 0.07. The analysis shows that returns on Cisco, Intel, and Microsoft are most closely tied together. Because the correlation between each pair of these stocks is around 0.5, these stocks exhibit a moderate positive relationship. In other words, if one stock does better than average, it is likely (but not certain) that the other stocks will do better than average. Because Cisco, Intel, and Microsoft stock returns are closely tied to technology spending, their fairly strong correlation is not surprising. We find also that the monthly returns on Microsoft and GM are virtually uncorrelated. This relationship indicates that when Microsoft stock does better than average, we really can’t tell whether GM stock will do better or worse than average. Again, this trend is not surprising because GM is not really a high-tech company and is more susceptible to the vagaries of the business cycle.

Filling in the correlation matrix   As you can see in this example, Microsoft Office Excel 2007 left some entries in the correlation matrix blank. For example, the correlation between Microsoft and GE (which is equal to the correlation between GE and Microsoft) is omitted. If you want to fill in the entire correlation matrix, right-click the matrix, and then click Copy. Right-click a blank portion of the worksheet, and then click Paste Special. In the Paste Special dialog box, select Transpose. This flips the data on its side. Now right-click the flipped data, and click Copy. Right-click the original correlation matrix, and click Paste Special again. In the Paste Special dialog box, check the Skip Blanks box, and then click OK. The transposed data is copied to the original matrix, but does not copy the blanks from the transposed data. The full correlation matrix is shown in Figure 46-7.

image from book
Figure 46-7: Complete correlation matrix

Using the CORREL function   As an alternative to using the Correlation option of the Analysis Toolpak, you can use the CORREL function. For example, entering in cell I49 the formula CORREL(E52:E181,F52:F181) confirms that the correlation between monthly returns on Cisco (shown in column F) and GM (shown in column E) is 0.159.

Relationship between correlation and R2   In Chapter 45, “Estimating Straight Line Relationships,” we found an R2 value for units produced and monthly operating cost of 0.688. How is this value related to the correlation between units produced and monthly operating costs? The correlation between two sets of data is simply

image from book

for the trendline, where we choose the sign for the square root to be the same as the sign of the slope of the trendline. Thus the correlation between units produced and monthly operating cost for our Chapter 41, “Consolidating Data,” data is

image from book

Correlation and regression towards the mean   You have probably heard the phrase “regression towards the mean.” Essentially, this means that the predicted value of a dependent variable will be in some sense closer to its average value than the independent variable. More precisely, suppose we try to predict a dependent variable y from an independent variable x. If x is k standard deviations above average, then our prediction for y will be r×k standard deviations above average (here, r=correlation between x and y). Because r is between -1 and +1 this means that y is fewer standard deviations away from the mean than x. This is the real definition of “regression towards the mean.” See Problem 5 for an interesting application of the concept of regression towards the mean.




Microsoft Press - Microsoft Office Excel 2007. Data Analysis and Business Modeling
MicrosoftВ® Office ExcelВ® 2007: Data Analysis and Business Modeling (Bpg -- Other)
ISBN: 0735623961
EAN: 2147483647
Year: 2007
Pages: 200

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net