Problem

You're working with a multivariable problem and want to compute the correlation coefficient between variables.

Solution

Use Excel's built-in functions `CORREL` or `PEARSON`, or use the Correlation tool available in the Analysis ToolPak.

Discussion

In Recipe 8.4, I show you how to perform linear regression where the dependent variable depends on more than one independent variable. The example I discuss in that recipe involves six independent variables. This is a typical example where it's prudent to assess the correlation between independent variables before conducting the regression analysis. This way you can avoid using highly correlated (or colinear) independent variables, which can give you trouble.

Excel provides two built-in functions that allow you to compute the Pearson product-moment correlation coefficients between variables. These functions are `CORREL` and `PEARSON` and, according to Excel's help documents, they perform exactly the same calculation. (I've no idea why there are two functions to do the same thing.)

Figure 5-8 shows the data from the multiple linear regression example I discuss in Chapter 8. For now, I'll show you how to examine the correlation between independent variablesthe x variablesusing Excel's correlation functions as well as the Analysis ToolPak.

To compute the correlation coefficient between variables `x1` and `x2`, you can use the cell formula `=CORREL(E7:E22,F7:F22)` or `=PEARSON(E7:E22,F7:F22)`. They both produce the same results, which is 0.991589. Clearly, these two variables are positively correlated and in a regression analysis, for example, you would probably exclude one of these variables from the curve fit. You can compute correlation coefficients between the other variables in a similar manner.

Figure 5-8. Multiple linear regression data

To save some time when dealing with numerous variables, you might consider using the Correlation tool available in the Analysis ToolPak. Select Tools images/U2192.jpg border=0> Data Analysis from the main menu bar to open the Data Analysis dialog box shown in Figure 5-9.

Figure 5-9. Data Analysis Correlation tool

Select the Correlation tool and press OK to open the Correlation tool dialog box shown in Figure 5-10.

Figure 5-10. Correlation tool dialog box

The Correlation tool allows you to select multiple datasets at once; it will compute the correlation coefficients between all the datasets. For example, in Figure 5-10 I entered cells E6 through J22 in the Input Range. This cell range includes all of the x variables shown in Figure 5-8. I included the header labels in the input range, so I checked the "Labels in first row" option. Further, I specified cell M7 (on the same worksheet) as the starting cell where I'd like the output displayed. Figure 5-11 shows the results.

Figure 5-11. Correlation results

This table presents the correlation coefficient between each given variable and every other variable. It also correlates each variable with itself, which of course results in a correlation coefficient of 1 (as you can see in Figure 5-11).

Sometimes it's useful to compute *coefficients of determination* instead of correlation coefficients alone. To compute a coefficient of determination, simply square the correlation coefficient. For example, if cell N9 contains the correlation coefficient between variables `x1` and `x2`, then the formula `=N9^2` would yield the coefficient of determination.

There are other types of correlation coefficients common in statistical analysis. For example, *Phi*, *rank biserial*, *point biserial*, and *Spearman rank* coefficients are common, in addition to the Pearson coefficient. Excel does not have built-in support for these other correlation coefficients; however, you can certainly set up a spreadsheet to calculate these coefficients.

For example, you can readily prepare a spreadsheet to compute the Spearman rank correlation coefficient between variables `x1` and `x2` from the previous example. The Spearman coefficient requires you to convert the raw data to ranks first. You can achieve this conversion using Excel's built-in `RANK` function (see Recipe 5.5). Figure 5-12 shows a simple spreadsheet I set up to compute the Spearman rank correlation coefficient between the variables `x1` and `x2` from the previous example.

The first two columns in the table contain the raw data. The third and fourth columns contain the rank of each data point. I used the `RANK` function for these calculations. For example, cell E41 contains the cell formula `=RANK(C41,$C$41:$C$56,1)`, which computes the rank for the first `x1` value, assuming the data are sorted in ascending order (see Recipe 5.5).

The formula for computing the Pearson rank correlation coefficient is:

Here, d is the difference in ranks between corresponding values for both `x1` and `x2`, and N is the number of data points. You can compute N using the `COUNT` function. Cell D58 contains the formula `=COUNT(D41:D56)` which returns a value of 16 for the number of points.

To compute d2, I set up another column containing formulas like `=(E41-F41)^2`. The results are shown in the last column of the table shown in Figure 5-12.

Cell G57 computes the sum of d2 values using the formula `=SUM(G41:G56)`. Cell G59 computes the N(N2 - 1) term using the formula `=(D58*(D58^2-1))`. And finally, the correlation coefficient is computed in cell G60 using the formula `=1-6*G57/G59`. The resulting correlation coefficient is 0.997.

Figure 5-12. Spearman rank correlation coefficient calculation

See Also

Take a look at Recipe 8.4 to see the resulting regression model for this example dataset. Chapter 8 also discusses coefficients of determination in the context of least-squares curve fitting.

Read Recipe 5.5 to learn more about `RANK` and other ranking functions.

Using Excel

- Introduction
- Navigating the Interface
- Entering Data
- Setting Cell Data Types
- Selecting More Than a Single Cell
- Entering Formulas
- Exploring the R1C1 Cell Reference Style
- Referring to More Than a Single Cell
- Understanding Operator Precedence
- Using Exponents in Formulas
- Exploring Functions
- Formatting Your Spreadsheets
- Defining Custom Format Styles
- Leveraging Copy, Cut, Paste, and Paste Special
- Using Cell Names (Like Programming Variables)
- Validating Data
- Taking Advantage of Macros
- Adding Comments and Equation Notes
- Getting Help

Getting Acquainted with Visual Basic for Applications

- Introduction
- Navigating the VBA Editor
- Writing Functions and Subroutines
- Working with Data Types
- Defining Variables
- Defining Constants
- Using Arrays
- Commenting Code
- Spanning Long Statements over Multiple Lines
- Using Conditional Statements
- Using Loops
- Debugging VBA Code
- Exploring VBAs Built-in Functions
- Exploring Excel Objects
- Creating Your Own Objects in VBA
- VBA Help

Collecting and Cleaning Up Data

- Introduction
- Importing Data from Text Files
- Importing Data from Delimited Text Files
- Importing Data Using Drag-and-Drop
- Importing Data from Access Databases
- Importing Data from Web Pages
- Parsing Data
- Removing Weird Characters from Imported Text
- Converting Units
- Sorting Data
- Filtering Data
- Looking Up Values in Tables
- Retrieving Data from XML Files

Charting

- Introduction
- Creating Simple Charts
- Exploring Chart Styles
- Formatting Charts
- Customizing Chart Axes
- Setting Log or Semilog Scales
- Using Multiple Axes
- Changing the Type of an Existing Chart
- Combining Chart Types
- Building 3D Surface Plots
- Preparing Contour Plots
- Annotating Charts
- Saving Custom Chart Types
- Copying Charts to Word
- Recipe 4-14. Displaying Error Bars

Statistical Analysis

- Introduction
- Computing Summary Statistics
- Plotting Frequency Distributions
- Calculating Confidence Intervals
- Correlating Data
- Ranking and Percentiles
- Performing Statistical Tests
- Conducting ANOVA
- Generating Random Numbers
- Sampling Data

Time Series Analysis

- Introduction
- Plotting Time Series Data
- Adding Trendlines
- Computing Moving Averages
- Smoothing Data Using Weighted Averages
- Centering Data
- Detrending a Time Series
- Estimating Seasonal Indices
- Deseasonalization of a Time Series
- Forecasting
- Applying Discrete Fourier Transforms

Mathematical Functions

- Introduction
- Using Summation Functions
- Delving into Division
- Mastering Multiplication
- Exploring Exponential and Logarithmic Functions
- Using Trigonometry Functions
- Seeing Signs
- Getting to the Root of Things
- Rounding and Truncating Numbers
- Converting Between Number Systems
- Manipulating Matrices
- Building Support for Vectors
- Using Spreadsheet Functions in VBA Code
- Dealing with Complex Numbers

Curve Fitting and Regression

- Introduction
- Performing Linear Curve Fitting Using Excel Charts
- Constructing Your Own Linear Fit Using Spreadsheet Functions
- Using a Single Spreadsheet Function for Linear Curve Fitting
- Performing Multiple Linear Regression
- Generating Nonlinear Curve Fits Using Excel Charts
- Fitting Nonlinear Curves Using Solver
- Assessing Goodness of Fit
- Computing Confidence Intervals

Solving Equations

- Introduction
- Finding Roots Graphically
- Solving Nonlinear Equations Iteratively
- Automating Tedious Problems with VBA
- Solving Linear Systems
- Tackling Nonlinear Systems of Equations
- Using Classical Methods for Solving Equations

Numerical Integration and Differentiation

- Introduction
- Integrating a Definite Integral
- Implementing the Trapezoidal Rule in VBA
- Computing the Center of an Area Using Numerical Integration
- Calculating the Second Moment of an Area
- Dealing with Double Integrals
- Numerical Differentiation

Solving Ordinary Differential Equations

- Introduction
- Solving First-Order Initial Value Problems
- Applying the Runge-Kutta Method to Second-Order Initial Value Problems
- Tackling Coupled Equations
- Shooting Boundary Value Problems

Solving Partial Differential Equations

- Introduction
- Leveraging Excel to Directly Solve Finite Difference Equations
- Recruiting Solver to Iteratively Solve Finite Difference Equations
- Solving Initial Value Problems
- Using Excel to Help Solve Problems Formulated Using the Finite Element Method

Performing Optimization Analyses in Excel

- Introduction
- Using Excel for Traditional Linear Programming
- Exploring Resource Allocation Optimization Problems
- Getting More Realistic Results with Integer Constraints
- Tackling Troublesome Problems
- Optimizing Engineering Design Problems
- Understanding Solver Reports
- Programming a Genetic Algorithm for Optimization

Introduction to Financial Calculations

- Introduction
- Computing Present Value
- Calculating Future Value
- Figuring Out Required Rate of Return
- Doubling Your Money
- Determining Monthly Payments
- Considering Cash Flow Alternatives
- Achieving a Certain Future Value
- Assessing Net Present Worth
- Estimating Rate of Return
- Solving Inverse Problems
- Figuring a Break-Even Point

Index

Excel Scientific and Engineering Cookbook (Cookbooks (OReilly))

ISBN: 0596008791

EAN: 2147483647

EAN: 2147483647

Year: N/A

Pages: 206

Pages: 206

Authors: David M Bourg

Similar book on Amazon

- Challenging the Unpredictable: Changeable Order Management Systems
- ERP System Acquisition: A Process Model and Results From an Austrian Survey
- Context Management of ERP Processes in Virtual Communities
- Data Mining for Business Process Reengineering
- Intrinsic and Contextual Data Quality: The Effect of Media and Personal Involvement

Flylib.com © 2008-2017.

If you may any questions please contact us: flylib@qtcs.net

If you may any questions please contact us: flylib@qtcs.net