No matter how fast CPUs get, it always seems we are trying to squeeze more performance out of our software. A good performance analysis tool will provide you with statistics to help you do so. A performance analysis tool works by gathering data during one or more sample executions of your code. Thus the more true to life your test set is, the more useful the output of your performance analysis tool will be.
One of the simplest yet most useful performance analysis tools is a run time histogram. Such a tool shows you graphically which functions use the most time in your program and how often each function was called. In many cases such a tool will allow you to concentrate your performance tuning on the areas that have the greatest potential payback. After all, achieving a 50% speedup in a segment of your code that uses only 1% of the execution time does little to speed up the entire program: taking "speedup" to mean a reduction in that segment's run time, it saves about 0.5% of the total. It would be ten times more effective to concentrate on getting a 10% speedup in a segment of the code that uses 50% of the execution time, which saves about 5% of the total.
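To make the payback arithmetic concrete, here is a minimal sketch in Python. The sample data, the function names in it, and the helpers `histogram` and `time_saved` are all invented for illustration; they are not the output or interface of any particular profiler. The sketch assumes a profiler that records which function was running at each sample, and it reads "speedup" as a reduction in a segment's own run time.

```python
from collections import Counter

# Hypothetical profiler samples: each entry names the function that was
# running when a sample was taken. Names and counts are invented.
samples = ["parse"] * 50 + ["render"] * 30 + ["load"] * 19 + ["log"] * 1


def histogram(samples, width=40):
    """Return text histogram lines, busiest function first."""
    counts = Counter(samples)
    total = sum(counts.values())
    lines = []
    for name, n in counts.most_common():
        share = n / total
        bar = "#" * round(share * width)
        lines.append(f"{name:<8} {share:6.1%} {bar}")
    return lines


def time_saved(share_of_runtime, local_reduction):
    """Fraction of total run time saved when a segment that accounts for
    share_of_runtime has its own run time cut by local_reduction."""
    return share_of_runtime * local_reduction


for line in histogram(samples):
    print(line)

small = time_saved(0.01, 0.50)  # 50% speedup on a 1% segment
large = time_saved(0.50, 0.10)  # 10% speedup on a 50% segment
print(f"small segment saves {small:.1%}, large segment saves {large:.1%}")
```

The two calls to `time_saved` reproduce the comparison in the text: the second saving is ten times the first, even though the local improvement is five times smaller.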
Another useful performance analysis tool is a loadmap generator. A loadmap generator examines a running program to look for ways in which code could be rearranged to improve locality of reference. It is not uncommon today to have program sizes that exceed the capacity of the four to eight megabyte caches found on modern CPUs. Often, a compiler can do little to determine run time locality of reference, because it cannot see which pieces of code actually execute close together in time. A loadmap generator examines locality of reference at run time and provides hints to the linker for arranging code more efficiently so as to minimize cache misses. Just like a run time histogram, however, a loadmap generator is only as good as the sample execution that was used to generate it.
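The idea of turning run time behavior into a layout hint can be illustrated with a toy sketch in Python. This is a deliberately simplified greedy heuristic, not the algorithm of any real loadmap generator: the trace, the function names, and the helpers `affinity` and `suggest_order` are all hypothetical. It assumes a recorded trace of which function ran after which, and it places functions that frequently run back to back next to each other.

```python
from collections import Counter


def affinity(trace):
    """Count how often each unordered pair of distinct functions
    appears back to back in the trace."""
    pairs = Counter()
    for a, b in zip(trace, trace[1:]):
        if a != b:
            pairs[frozenset((a, b))] += 1
    return pairs


def suggest_order(trace):
    """Greedy chain: start with the most frequently run function, then
    repeatedly append the unplaced function with the highest affinity
    to the one placed last. Ties are broken by frequency, then name."""
    pairs = affinity(trace)
    freq = Counter(trace)
    remaining = set(freq)
    current = max(remaining, key=lambda f: (freq[f], f))
    order = [current]
    remaining.remove(current)
    while remaining:
        nxt = max(
            remaining,
            key=lambda f: (pairs[frozenset((current, f))], freq[f], f),
        )
        order.append(nxt)
        remaining.remove(nxt)
        current = nxt
    return order


# Hypothetical trace: "a" and "b" alternate, as do "c" and "d".
trace = ["a", "b", "a", "b", "c", "d", "c", "d", "a", "b"]
order = suggest_order(trace)
print(order)
```

With this trace the suggested order keeps `a` next to `b` and `c` next to `d`. A different trace would give a different order, which is the point made above: the hint is only as good as the execution it was derived from.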