5.27. Finding a Correlation Coefficient

The correlation coefficient is one of the simplest and most universally useful statistical measures. It is a measure of the "linearity" of a set of x-y pairs, ranging from -1.0 (complete negative correlation) to +1.0 (complete positive correlation).

We compute this using the mean and sigma (standard deviation) functions defined previously in sections 5.25 and 5.26. For an explanation of this tool, consult any statistics text.

The following version assumes two arrays of numbers (of the same size):

    def correlate(x,y)
      sum = 0.0
      x.each_index do |i|
        sum += x[i]*y[i]
      end
      xymean = sum/x.size.to_f
      xmean = mean(x)
      ymean = mean(y)
      sx = sigma(x)
      sy = sigma(y)
      (xymean-(xmean*ymean))/(sx*sy)
    end

    a = [3, 6, 9, 12, 15, 18, 21]
    b = [1.1, 2.1, 3.4, 4.8, 5.6]
    c = [1.9, 1.0, 3.9, 3.1, 6.9]

    c1 = correlate(a,a)          # 1.0
    c2 = correlate(a,a.reverse)  # -1.0
    c3 = correlate(b,c)          # 0.8221970228

The following version is similar, but it operates on a single array, each element of which is an array containing an x-y pair:

    def correlate2(v)
      sum = 0.0
      v.each do |a|
        sum += a[0]*a[1]
      end
      xymean = sum/v.size.to_f
      x = v.collect {|a| a[0]}
      y = v.collect {|a| a[1]}
      xmean = mean(x)
      ymean = mean(y)
      sx = sigma(x)
      sy = sigma(y)
      (xymean-(xmean*ymean))/(sx*sy)
    end

    d = [[1,6.1], [2.1,3.1], [3.9,5.0], [4.8,6.2]]

    c4 = correlate2(d)           # 0.2277822492

Finally, the following version assumes that the x-y pairs are stored in a hash. It simply builds on the previous example:

    def correlate_h(h)
      correlate2(h.to_a)
    end

    e = { 1 => 6.1, 2.1 => 3.1, 3.9 => 5.0, 4.8 => 6.2}

    c5 = correlate_h(e)          # 0.2277822492
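
The functions in this section depend on mean and sigma from the earlier sections, which are not repeated here. For readers who want to run the two-array version on its own, the following is a minimal self-contained sketch. The bodies of mean and sigma shown below are assumptions written for illustration, not the definitions from sections 5.25 and 5.26; in particular, sigma is assumed to be the population standard deviation (dividing by n rather than n - 1), which is the form that reproduces the value of c3 quoted in this section. The size check is also an addition and is not part of the original code.

```ruby
# Assumed stand-in for the mean function of section 5.25 (illustrative only).
def mean(x)
  x.sum.to_f / x.size
end

# Assumed stand-in for the sigma function of section 5.26 (illustrative only).
# Population standard deviation: divides by n, not n - 1.
def sigma(x)
  m = mean(x)
  Math.sqrt(x.sum { |v| (v - m)**2 } / x.size)
end

# Same formula as correlate in this section:
# (mean of the products - product of the means) / (sx * sy)
def correlate(x, y)
  # Added guard (not in the original): the formula needs matched pairs.
  raise ArgumentError, "arrays must be the same size" unless x.size == y.size

  xymean = x.zip(y).sum { |xi, yi| xi * yi } / x.size.to_f
  (xymean - mean(x) * mean(y)) / (sigma(x) * sigma(y))
end

b = [1.1, 2.1, 3.4, 4.8, 5.6]
c = [1.9, 1.0, 3.9, 3.1, 6.9]

r = correlate(b, c)
puts r.round(4)
```

Because the arithmetic is done in floating point, results such as correlate(a,a) may differ from exactly 1.0 in the last few digits, so it is safer to compare against an expected value with a small tolerance than to test for equality. Note also that if either array is constant, its standard deviation is zero and the result is not a meaningful number.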