19.13. Profiling
In the previous guideline, the benchmarked comparison between repeatedly computing sqrt $pixel_value and repeatedly looking up $sqrt_of[$pixel_value] indicated that caching provided a 9% improvement:

                         Rate     recompute  look_up_array
        recompute      3951/s            --            -8%
        look_up_array  4291/s            9%             --

(A sketch of the kind of benchmark code that produces figures like these appears below.)

That sounds impressive, but it's important to keep those numbers in perspective. Each iteration of the test did 256 square root retrievals, so overall the test was achieving 1,011,456 (i.e., 3951 x 256) sqrt calls per second, compared to 1,098,496 @sqrt_of look-ups per second.

Suppose you were processing the 786,432 pixels of a typical 1024 x 768 image. Using the example performance figures, the repeated sqrt calls would require around 0.78 seconds to process that many pixels, whereas the look-up table would take only about 0.72 seconds. Adding a cache to this section of your code would save you a grand total of 0.06 seconds per image.

That's an all-too-common outcome when code is optimized: developers focus their efforts on the components that are easy to optimize, rather than on the components where improvements will produce the greatest benefit.

How do you find the places where optimization will do the most good? By understanding where your application spends most of its time. And the easiest way to do that is to profile your program using the standard Devel::DProf module, which can determine how long your application spends within each subroutine in your source code.

That is, instead of running your program in the usual way:

    > perl application.pl datafile

run it under the auspices of the profiler module:

    > perl -d:DProf application.pl datafile

The -d: debugging flag is a shorthand for -MDevel::, so specifying -d:DProf is the same as specifying -MDevel::DProf. That tells perl to load the profiling module before the start of the application.pl source. The module itself simply watches every subroutine call within your code, noting how much time elapses between invocation and return, and adding that duration to a record of the total time spent in each subroutine. At the end of the program, the module writes a file called tmon.out in the current directory.

It's possible to interpret the raw data in that file directly (see the module's documentation for details), but it's much easier to understand it by passing it through the standard dprofpp application. For example, you could use Devel::DProf to profile a call to your new autoformat application, and then summarize the results with dprofpp, like so:

    > perl -d:DProf ~/bin/autoformat < Ch_19.txt > Ch_19_formatted.txt
    > dprofpp

For each subroutine, the table produced by dprofpp shows (amongst other details) the percentage of the total running time spent in that subroutine, the number of times it was called, and the subroutine's full name.
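For reference, here is a minimal sketch of the kind of Benchmark-based comparison that produces figures like the ones quoted at the start of this section. It is not the exact benchmark from the previous guideline: the subroutine bodies, the way the @sqrt_of cache is built, and the 10-CPU-second run length are all assumptions, and the rates you see will depend on your own hardware and Perl build.

    #! /usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw( cmpthese );

    # One plausible way to build the cache: the square root of every
    # possible 8-bit pixel value...
    my @sqrt_of = map { sqrt $_ } 0..255;

    # Compare recomputing each square root against looking it up in the
    # cache. Each iteration retrieves all 256 values, as in the rates
    # quoted above. The -10 asks for at least 10 CPU seconds per case.
    cmpthese( -10, {
        recompute     => sub { my @roots = map { sqrt $_      } 0..255; },
        look_up_array => sub { my @roots = map { $sqrt_of[$_] } 0..255; },
    });

Run this way, cmpthese() prints a rate table in the same format as the one shown above, so you can reproduce the comparison on your own machine before deciding whether the cache is worth having.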
For the autoformat run profiled above, the dprofpp output shows that the program spends about 8% of its time in Text::Balanced::_failmsg(), making a total of 221 calls to that subroutine. In contrast, it spends only about half that amount of time in 70 calls to Text::Balanced::extract_codeblock(). So it probably makes sense to focus any optimization efforts on _failmsg() first.

If you need finer-grained profiling, the Devel::SmallProf CPAN module counts how many times each line of your program is executed, which makes it easier to determine precisely which statement is making a particular subroutine expensive:

    > perl -d:SmallProf application.pl datafile

The result is a file named smallprof.out, which is essentially a copy of your source code with each line prefixed by the number of times it was executed, the total amount of time spent executing it, and its line number. Although the module lacks a utility like dprofpp to summarize its output, it is an excellent investigative tool once you have identified the main suspects using Devel::DProf.
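Since there is no dprofpp-style summarizer, one quick and purely illustrative way to locate the hot spots is to sort the lines of smallprof.out by their time column. The short helper below is only a sketch: it assumes the layout just described (an execution count, then the time spent on the line, then the line number and source) and simply reports the twenty most expensive lines.

    #! /usr/bin/perl
    # Illustrative helper only: report the most expensive lines in
    # smallprof.out, assuming each data line starts with an execution
    # count followed by a time column.
    use strict;
    use warnings;
    use List::Util qw( min );

    open my $fh, '<', 'smallprof.out'
        or die "Can't open smallprof.out: $!\n";

    # Keep only the profiled source lines (count, then time)...
    my @profiled = grep { m/\A \s* \d+ \s+ [\d.]+ /xms } <$fh>;

    # ...sort them by the time column, most expensive first...
    my @hottest = sort { (split ' ', $b)[1] <=> (split ' ', $a)[1] } @profiled;

    # ...and report the twenty worst offenders
    print @hottest[ 0 .. min(19, $#hottest) ];

The exact column layout of smallprof.out may vary between versions of the module, so treat this as a starting point rather than a finished tool.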