### 5.27. Finding a Correlation Coefficient

The correlation coefficient is one of the simplest and most universally useful statistical measures. It measures the "linearity" of a set of x-y pairs and ranges from -1.0 (complete negative correlation) to +1.0 (complete positive correlation); values near zero indicate little or no linear relationship.

We compute it using the `mean` and `sigma` (standard deviation) functions defined previously in sections 5.25 and 5.26. For the theory behind this measure, consult any statistics text.
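Because those two helpers are defined in other sections, here is a minimal sketch of what they might look like, so that the listings in this section can be run on their own. These bodies are assumptions, not the book's own code: in particular, `sigma` is taken to be the population standard deviation (dividing by n rather than n - 1), which is consistent with the results printed in this section. `Array#sum` requires Ruby 2.4 or later.

```ruby
# Assumed helper definitions; the book's own versions live in
# sections 5.25 and 5.26 and are not reproduced here.

# Arithmetic mean of an array of numbers.
def mean(x)
  x.sum.to_f / x.size
end

# Population standard deviation (divides by n, not n - 1).
def sigma(x)
  m = mean(x)
  Math.sqrt(x.sum { |v| (v - m)**2 } / x.size)
end

puts mean([3, 6, 9, 12, 15, 18, 21])    # 12.0
puts sigma([3, 6, 9, 12, 15, 18, 21])   # 6.0
```

Neither helper guards against an empty array; a production version would probably want to.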

The following version assumes two arrays of numbers (of the same size):

```ruby
def correlate(x,y)
  sum = 0.0
  x.each_index do |i|
    sum += x[i]*y[i]
  end
  xymean = sum/x.size.to_f
  xmean  = mean(x)
  ymean  = mean(y)
  sx = sigma(x)
  sy = sigma(y)
  (xymean-(xmean*ymean))/(sx*sy)
end

a = [3, 6, 9, 12, 15, 18, 21]
b = [1.1, 2.1, 3.4, 4.8, 5.6]
c = [1.9, 1.0, 3.9, 3.1, 6.9]

c1 = correlate(a,a)          # 1.0
c2 = correlate(a,a.reverse)  # -1.0
c3 = correlate(b,c)          # 0.8221970228
```
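In symbols, and assuming `mean` is the arithmetic mean and `sigma` the population standard deviation (an assumption, since both are defined in earlier sections), the two-array method evaluates the following expression, where n is the number of pairs:

$$
r = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sigma_x \, \sigma_y},
\qquad
\overline{xy} = \frac{1}{n} \sum_{i=1}^{n} x_i y_i
$$

The numerator corresponds to `xymean-(xmean*ymean)` and is the covariance of the two arrays; the denominator corresponds to `sx*sy`. If either array is constant, its standard deviation is zero and the quotient is undefined.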

The following version is similar, but it operates on a single array, each element of which is an array containing an x-y pair:

```ruby
def correlate2(v)
  sum = 0.0
  v.each do |a|
    sum += a[0]*a[1]
  end
  xymean = sum/v.size.to_f
  x = v.collect {|a| a[0]}
  y = v.collect {|a| a[1]}
  xmean  = mean(x)
  ymean  = mean(y)
  sx = sigma(x)
  sy = sigma(y)
  (xymean-(xmean*ymean))/(sx*sy)
end

d = [[1,6.1], [2.1,3.1], [3.9,5.0], [4.8,6.2]]
c4 = correlate2(d)               # 0.2277822492
```
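Since the pair-based version repeats most of the two-array logic, one alternative is to split the pairs into columns and delegate. The following is only a sketch: `correlate_pairs` is a hypothetical name, the `mean` and `sigma` bodies are assumed stand-ins for the helpers from sections 5.25 and 5.26 (population statistics), and `correlate` is rewritten compactly with `zip` rather than copied from the listing.

```ruby
# Assumed helpers (population statistics), standing in for the
# definitions from sections 5.25 and 5.26.
def mean(x)
  x.sum.to_f / x.size
end

def sigma(x)
  m = mean(x)
  Math.sqrt(x.sum { |v| (v - m)**2 } / x.size)
end

# Two-array version: same formula as the first listing, written with zip.
def correlate(x, y)
  xymean = x.zip(y).sum { |a, b| a * b }.to_f / x.size
  (xymean - mean(x) * mean(y)) / (sigma(x) * sigma(y))
end

# Hypothetical variant: turn [[x0, y0], [x1, y1], ...] into two
# columns with transpose, then reuse correlate.
def correlate_pairs(v)
  x, y = v.transpose
  correlate(x, y)
end

d = [[1, 6.1], [2.1, 3.1], [3.9, 5.0], [4.8, 6.2]]
puts correlate_pairs(d).round(4)   # 0.2278
```

`Array#transpose` raises an `IndexError` if the inner arrays differ in length, which doubles as a basic sanity check on the input. An empty array is not handled here.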

Finally, the following version assumes that the x-y pairs are stored in a hash. It simply builds on the previous example:

```ruby
def correlate_h(h)
  correlate2(h.to_a)
end

e = { 1 => 6.1, 2.1 => 3.1, 3.9 => 5.0, 4.8 => 6.2}
c5 = correlate_h(e)          # 0.2277822492
```
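One thing to keep in mind with the hash form is that hash keys are unique, so two observations sharing the same x value cannot both be stored. A small sketch with made-up sample data (not from the text) illustrates this:

```ruby
# Hypothetical data: the x value 1 appears in two different pairs.
pairs = [[1, 6.1], [1, 3.1], [3.9, 5.0], [4.8, 6.2]]

# Converting to a hash keeps only the last pair for each key.
h = pairs.to_h

puts pairs.size   # 4
puts h.size       # 3
puts h[1]         # 3.1
```

If the data can contain repeated x values, the array-of-pairs form is probably the safer representation.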

The Ruby Way, Second Edition: Solutions and Techniques in Ruby Programming (2nd Edition)
ISBN: 0672328844
Year: 2004
Pages: 269
Authors: Hal Fulton