COVARIANCE


All of the summary measures to this point involve a single variable. It is also useful to summarize the relationship between two variables. Specifically, we would like to summarize the type of behavior often observed in a scatterplot. Two such measures are covariance and correlation. We will discuss them briefly here and in more depth in later chapters. Each measures the strength (and direction) of a linear relationship between two numerical variables. Intuitively, the relationship is "strong" if the points in a scatterplot cluster tightly around some straight line. If this straight line rises from left to right, then the relationship is "positive" and the measures are positive numbers. If it falls from left to right, then the relationship is "negative" and the measures are negative numbers.

First, it is important to realize that if we want to measure the covariance or correlation between two variables X and Y (indeed, even if we just want to form a scatterplot of X vs. Y), then X and Y must be "paired" variables. That is, they must have the same number of observations, and the X and Y values for any observation should be naturally paired. For example, each observation could be the height and weight for a particular person, the time in a store and the amount of money spent by a particular customer, and so on.

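With this in mind, let $X_i$ and $Y_i$ be the paired values for observation i, and let n be the number of observations. Then the covariance between X and Y, denoted by Cov(X, Y), is given by the formula

$$\mathrm{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

where $\bar{X}$ and $\bar{Y}$ are the means of X and Y.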

You probably will never have to use this formula directly, because most software packages have a built-in function (such as COVAR in Excel) that does it for you, but the formula does indicate what covariance is all about. It is essentially an average of products of deviations from means. If X and Y vary in the same direction, then when X is above (or below) its mean, Y will also tend to be above (or below) its mean. In either case, the product of deviations will be positive (a positive times a positive or a negative times a negative), so the covariance is positive. The opposite is true when X and Y vary in opposite directions. Then, the covariance is negative.
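The "average of products of deviations from means" idea can be sketched in a few lines of Python. This is only an illustration: the function name `sample_covariance` and the data are hypothetical, and the sketch assumes the n - 1 (sample) denominator; some software divides by n instead.

```python
def sample_covariance(xs, ys):
    """Covariance of paired observations, dividing by n - 1 (an assumption)."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must be paired: same number of observations")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # One product of deviations per observation
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


# Hypothetical paired data, not taken from the text
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]    # tends to rise with x
y_down = [5, 4, 5, 4, 2]  # tends to fall as x rises

cov_up = sample_covariance(x, y_up)
cov_down = sample_covariance(x, y_down)
print(cov_up, cov_down)
```

With these made-up values, the first covariance comes out positive and the second negative, matching the sign rule described above.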

The limitation of covariance as a descriptive measure is that it is affected by the units in which X and Y are measured. For example, we can inflate the covariance by a factor of 1000 simply by measuring X in dollars rather than in thousands of dollars. To remedy this problem, the correlation is used; it does not depend on the units of measurement.
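The effect of units can be checked with a small Python sketch. The data and helper names here are hypothetical. The sketch assumes the n - 1 denominator for covariance and the usual definition of correlation as covariance divided by the product of the two standard deviations, which this section does not spell out.

```python
import math


def covariance(xs, ys):
    """Sample covariance with an n - 1 denominator (an assumption)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data: X recorded in thousands of dollars, then in dollars
x_thousands = [10, 20, 30, 40, 50]
x_dollars = [1000 * v for v in x_thousands]
y = [2, 4, 5, 4, 5]

cov_thousands = covariance(x_thousands, y)
cov_dollars = covariance(x_dollars, y)
corr_thousands = correlation(x_thousands, y)
corr_dollars = correlation(x_dollars, y)

print(cov_thousands, cov_dollars)    # covariance scales with the units of X
print(corr_thousands, corr_dollars)  # correlation is the same in both cases
```

Changing the units of X multiplies the covariance by 1000, while the correlation is unaffected, which is why correlation is usually the easier measure to interpret.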

When there are more than two variables in a data set, it is often useful to create a table of covariances and/or correlations. Each value in the table then corresponds to a particular pair of variables.
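Such a table might be built as in the following Python sketch. The variable names X, Y, Z, the data, and the helper functions are all hypothetical, and the n - 1 denominator is again an assumption.

```python
def covariance(xs, ys):
    """Sample covariance with an n - 1 denominator (an assumption)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def covariance_table(data):
    """Return a nested dict: table[a][b] is the covariance of variables a and b."""
    return {a: {b: covariance(data[a], data[b]) for b in data} for a in data}


# Hypothetical data set with three variables, five paired observations each
data = {
    "X": [1, 2, 3, 4, 5],
    "Y": [2, 4, 5, 4, 5],
    "Z": [5, 4, 5, 4, 2],
}

table = covariance_table(data)

# Print the table with one row and one column per variable
print(" " * 4 + "".join(f"{name:>8}" for name in data))
for a in data:
    print(f"{a:<4}" + "".join(f"{table[a][b]:>8.2f}" for b in data))
```

The table is symmetric, since the covariance of X and Y equals the covariance of Y and X, and each diagonal entry is the covariance of a variable with itself, that is, its variance.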




Six Sigma and Beyond. Statistics and Probability
Six Sigma and Beyond: Statistics and Probability, Volume III
ISBN: 1574443127
Year: 2003
Pages: 252
