Section 8.8. Calculating Correlation

   

8.8 Calculating Correlation

8.8.1 Problem

You want to calculate the correlation between two samples. For example, you want to calculate how similar the light-bulb sales patterns are for two different years .

8.8.2 Solution

The query in the following example uses the formula for calculating correlation coefficients shown earlier in this chapter. It does this for the years 1997 and 1998.

 SELECT     (COUNT(*)*SUM(x.Sales*y.Sales)-SUM(x.Sales)*SUM(y.Sales))/(    SQRT(COUNT(*)*SUM(SQUARE(x.Sales))-SQUARE(SUM(x.Sales)))*    SQRT(COUNT(*)*SUM(SQUARE(y.Sales))-SQUARE(SUM(y.Sales))))     correlation FROM BulbSales x JOIN BulbSales y ON x.month=y.month WHERE x.Year=1997 AND y.Year=1998  correlation                                            -----------------------------------------------------  0.79 

The correlation calculated is 0.79, which means that the sales patterns between the two years are highly correlated or, in other words, very similar.

8.8.3 Discussion

The solution query implements the formula shown earlier in the chapter for calculating a correlation coefficient. The solution query shown here is an example of how you implement a complex formula directly in SQL. To aid you in making such translations from mathematical formulas to Transact-SQL expressions, Table 8-4 shows a number of mathematical symbols together with their corresponding Transact-SQL functions.

Table 8-4. Mathematical symbols related to Transact-SQL functions

Symbol

Formula

Description

a

ABS(a)

Absolute value of a

e a

EXP(a)

Exponential value of a

a 2

SQUARE(a)

Square of a

a n

POWER(a,n)

n th power of a

n

COUNT(*)

Sample size

figs/sqrta.gif

SQRT(a)

Square root of a

figs/suma.gif

SUM(a)

Sum of all cases in a sample

figs/avga1.gif or figs/avga2.gif

AVG(a)

Average of all cases in a sample the mean

s

STDDEV(x)

Standard deviation of a sample

figs/stddevpx.gif

STDDEVP(x)

Standard deviation of a population

s 2

VAR(x)

Variance of a sample

figs/varpx.gif

VARP(X)

Variance of a population

To write the query shown in this recipe, take the formula for correlation coefficient shown earlier in this chapter, and use the Table 8-4 to translate the mathematical symbols in that formula into SQL functions for your query.

   


Transact-SQL Cookbook
Transact-SQL Cookbook
ISBN: 1565927567
EAN: 2147483647
Year: 2005
Pages: 152

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net