16.6. Measuring Performance

I don't like to place too much emphasis on optimizing for speed. In general, the best way to write reasonably fast code is to select a sensible algorithm and use common sense.

Certainly speed matters. In some cases it matters greatly. But it is usually a mistake to worry about speed too soon in the development cycle. The saying goes, "Premature optimization is the root of all evil." (This originated with Hoare and was restated by Knuth.) Or as someone else put it: "Make it right, then make it fast." At the application level, this is generally a good heuristic (though at the level of large systems, it may be of less value).

I would add to this precept: Don't optimize until you measure.

That isn't so much to ask. Refrain from refactoring for speed until you actually know two things: First, is it really too slow? And second, exactly which parts are causing the speed problem?

The second question is more important than it appears. Programmers tend to think they have a pretty good idea of where the execution time is going, but in reality, studies have shown conclusively that on the average they are extraordinarily bad at making these determinations. "Seat of the pants" optimization is not a viable option for most of us.

We need objective measurements. We need a profiler.

Ruby comes with a profiler called profile. Invoking it is as easy as requiring the library:

ruby -rprofile myprog.rb
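Alternatively, you can load the profiler from inside the program itself; a require at the top of the file has the same effect as the command-line flag (a small sketch, with the rest of myprog.rb unchanged):

# At the top of myprog.rb -- equivalent to passing -rprofile
# on the command line; the report is printed when the program exits.
require 'profile'

# ... rest of the program as before ...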


Consider the program in Listing 16.6. Its purpose is to open the /usr/share/dict/words file and look for anagrams. It then determines which words have the most anagrams and prints them out.

Listing 16.6. Finding Anagrams in the Dictionary

words = File.readlines("/usr/share/dict/words")
words.map! {|x| x.chomp }

hash = {}
words.each do |word|
  key = word.split("").sort.join
  hash[key] ||= []
  hash[key] << word
end

sizes = hash.values.map {|v| v.size }
most = sizes.max

list = hash.find_all {|k,v| v.size == most }

puts "No word has more than #{most-1} anagrams."

list.each do |key,val|
  anagrams = val.sort
  first = anagrams.shift
  puts "The word #{first} has #{most-1} anagrams:"
  anagrams.each {|a| puts "   #{a}" }
end

num = 0
hash.keys.each do |key|
  n = hash[key].size
  num += n if n > 1
end

puts
puts "The dictionary has #{words.size} words,"
puts "of which #{num} have anagrams."

I know you are curious about the output. Here it is:

No word has more than 14 anagrams.
The word alerts has 14 anagrams:
   alters
   artels
   estral
   laster
   lastre
   rastle
   ratels
   relast
   resalt
   salter
   slater
   staler
   stelar
   talers

The dictionary has 483523 words,
of which 79537 have anagrams.


On my system, that file has more than 483,000 words, and the program runs in a little more than 18 seconds. Where do you think the time is going? Let's find out. The output from the profiler is more than 100 lines long, sorted by decreasing time. We'll look at the first 20 lines:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
42.78    190.93    190.93        15 12728.67 23647.33  Array#each
10.78    239.04     48.11   1404333     0.03     0.04  Hash#[]
 7.04    270.48     31.44         2 15720.00 25575.00  Hash#each
 5.66    295.73     25.25    483523     0.05     0.05  String#split
 5.55    320.51     24.78   1311730     0.02     0.02  Array#size
 3.64    336.76     16.25         1 16250.00 25710.00  Array#map
 3.24    351.23     14.47    483524     0.03     0.03  Array#sort
 3.12    365.14     13.91    437243     0.03     0.03  Fixnum#==
 3.04    378.72     13.58    483526     0.03     0.03  Array#join
 2.97    391.98     13.26    437244     0.03     0.03  Hash#default
 2.59    403.53     11.55    437626     0.03     0.03  Hash#[]=
 2.43    414.38     10.85    483568     0.02     0.02  Array#<<
 2.29    424.59     10.21         1 10210.00 13430.00  Array#map!
 1.94    433.23      8.64    437242     0.02     0.02  Fixnum#<=>
 1.86    441.54      8.31    437244     0.02     0.02  Fixnum#>
 0.72    444.76      3.22    483524     0.01     0.01  String#chomp!
 0.11    445.26      0.50         4   125.00   125.00  Hash#keys
 0.11    445.73      0.47         1   470.00   470.00  Hash#values
 0.06    446.00      0.27         1   270.00   270.00  IO#readlines
 0.05    446.22      0.22     33257     0.01     0.01  Fixnum#+


Here we see that the Array#each method is taking more time than anything else. That makes sense; we have a loop that runs for each word and does significant work with each iteration. The average is misleading because the first call of each takes almost all the time; the other 14 calls (see anagrams.each) take a short time.

We also notice that Hash#[] is an expensive operation (largely because we do it so often); 1.4 million of these calls take not quite 11 seconds.

Notice where the readlines call fell: almost off our list of the top 20. This program is not I/O bound at all; it is compute-bound. The read of the entire file took hardly more than a fourth of a second.

The real value of profiling is not seen in this example, however. This program doesn't have any methods or any class structure. In real life, you would see your own methods listed among the core methods. You would then be able to tell which of your methods was among the top 20 "time wasters."
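For example, if the heart of Listing 16.6 were wrapped in methods of your own, those methods would appear by name in the report. A minimal sketch (the method names here are hypothetical, not part of the original listing):

# Hypothetical restructuring of Listing 16.6 so that our own code
# shows up under its own names in the profiler report.
def build_anagram_table(words)
  hash = {}
  words.each do |word|
    key = word.split("").sort.join
    hash[key] ||= []
    hash[key] << word
  end
  hash
end

def largest_group_size(hash)
  hash.values.map {|v| v.size }.max
end

words = File.readlines("/usr/share/dict/words").map {|x| x.chomp }
hash  = build_anagram_table(words)
most  = largest_group_size(hash)
# The profiler report now lists Object#build_anagram_table and
# Object#largest_group_size as separate entries.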

One thing to realize about the Ruby profiler is that (perhaps ironically) the profiler itself is slow. It hooks into the program in many places and monitors execution at a low level (in pure Ruby). So don't be surprised if your program runs an order of magnitude slower. This program when profiled took 7 minutes 40 seconds (or 460 seconds), slowed down by a factor of 25 or more.

Besides the profiler, there is a lower-level tool you should be aware of: the benchmark standard library, which is also useful for measuring performance.

One way to use this little tool is to invoke the Benchmark.measure method and pass it a block:

require 'benchmark'

file = "/usr/share/dict/words"
result = Benchmark.measure { File.readlines(file) }
puts result

# Output:
#    0.350000   0.070000   0.420000 (  0.418825)


The output from this method is as follows:

  • User CPU time (in seconds)

  • System CPU time

  • Sum of user and system times

  • Elapsed real time
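These numbers are also available programmatically. Benchmark.measure returns a Benchmark::Tms object, so the individual times can be read off with its accessors (a small sketch; the timed block here is arbitrary):

require 'benchmark'

result = Benchmark.measure { 100_000.times { "hello".upcase } }

puts result.utime   # user CPU time in seconds
puts result.stime   # system CPU time
puts result.total   # user + system
puts result.real    # elapsed ("wall clock") time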

For comparing several different items, the Benchmark.bm method is convenient. Give it a block, and it will pass in a reporting object. Call that object's report method with a label and a block, and it will output the label and time the block. Look at the following code:

require 'benchmark'

n = 200_000
s1 = ""
s2 = ""
s3 = ""

Benchmark.bm do |rep|
  rep.report("str <<     ") { n.times { s1 << "x" } }
  rep.report("str.insert ") { n.times { s3.insert(-1,"x") } }
  rep.report("str +=     ") { n.times { s2 += "x" } }
end


Here we compare three ways of appending a character to the end of a string; the result is the same in each case. We perform each operation 200,000 times to make the differences easier to measure. Here is the output:

                 user     system      total        real
str <<       0.180000   0.000000   0.180000 (  0.174697)
str.insert   0.200000   0.000000   0.200000 (  0.200479)
str +=      15.250000  13.120000  28.370000 ( 28.375998)


Notice how time-consuming the third case is: a full two orders of magnitude slower. Why is this? What lesson can we learn?

You might conclude that the + operator is slow for some reason, but that isn't the case. Rather, += is the only one of the three methods that doesn't operate on the same object every time; each s2 += "x" builds a brand-new string.
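You can see the difference directly by watching object IDs (a small illustrative sketch, separate from the benchmark above):

s = "abc"
id_before = s.object_id
s << "x"
puts id_before == s.object_id    # true: << modified the same String in place

t = "abc"
id_before = t.object_id
t += "x"
puts id_before == t.object_id    # false: += created a new String object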

The lesson is that object creation is an expensive operation. There may be many such small lessons to be learned by using Benchmark, but I still recommend profiling at a higher level first.



