Using the gprof Utility | GNU/Linux Application Programming (Programming Series)

Upon execution of the profiler-instrumented image, the gmon.out file is generated. This file is used in conjunction with the original image for gprof to generate human-readable statistics information. Let s look at a simple example and the data products that result. First, we invoke the image and then generate the gprof summary:

 $ ./sort     $ gprof sort gmon.out > sort.gprof

The gprof utility writes its human-readable output to standard-out, so the user must redirect this to a file to save it. The first element of the gprof output is what s called the flat profile. This provides the basic timing summary of the executable, as shown in Listing 8.2. Note that this is not the complete output of gprof . We ll look at other data products shortly.

Listing 8.2: Sample Flat Profile Output from gprof

 $  gprof sort gmon.out  more  Flat profile: Each sample counts as 0.01 seconds.   %   cumulative   self              self     total  time   seconds   seconds    calls   s/call   s/call  name  71.66      3.11     3.11        1     3.11     3.11  bubbleSort  28.11      4.33     1.22        1     1.22     1.22  insertSort   0.23      4.34     0.01        2     0.01     0.01  init_list ...

As we saw in Listing 8.1, our application is made up of three functions. Each function is represented here with a variety of timing data. The first column represents the percentage of time spent in each function in relation to the whole. What s interesting to note from this column is that the bubble sort algorithm requires 2.5 times as much execution time to sort the identical list as the insert sort. The next column, cumulative seconds , is a running sum of the number of seconds, and the next, self seconds , is the number of seconds taken by this function alone. Note that the table itself is sorted by this column in descending order. The column entitled calls represents the total number of times that this function was called (if the function itself was profiled, otherwise the element will be blank. The self s/call represents the average number of seconds spent in this function (including functions that it calls), while the total s/call represents the total number of seconds spent in the function (including functions that it calls). Finally, the name of the function is provided.

The next element provided by gprof is the call graph. This summary (shown in Listing 8.3) shows each function and the calls that are made by it, including the time spent within each function. It illustrates the timing as a hierarchy, which is useful in understanding timing for individual functions down the chain and their effect on higher layers of the call chain.

Listing 8.3: Sample Call Graph Output from gprof

 index % time    self  children    called     name                                                      <spontaneous>     [1]    100.0    0.00    2.11                 main [1]                     1.44    0.00       1/1           bubbleSort [2]                     0.67    0.00       1/1           insertSort [3]                     0.00    0.00       2/2           init_list [4]     -                     1.44    0.00       1/1           main [1]     [2]     68.2    1.44    0.00       1         bubbleSort [2]     -                     0.67    0.00       1/1           main [1]     [3]     31.8    0.67    0.00       1         insertSort [3]     -                     0.00    0.00       2/2           main [1]     [4]      0.0    0.00    0.00       2         init_list [4]     -

The index column is a unique number given to each element. The column marked % time represents the percentage of the total amount of time that was spent in the given function and its children calls. The self column is the amount of time spent in the function, with children as the amount of time spent in the children s functions. Note that in the first row, children is 2.11 (a sum of the two sort functions 1.44 + 0.67). This illustrates that the children functions took all of the time, and no meaningful time was spent in the main function itself. The called field identifies how many times the function was called by the parent. The first number is the number of times the particular parent called the function, and the second number is the total number of times that the child function was called altogether. Finally, the name column represents the names of the functions. Note that in the first row (after the row headings), the name < spontaneous > is used. This simply means that the parent could not be determined (very likely the C-start initialization).

Now that we have a performance baseline of our application, let s rebuild our application to see how we can improve it. In this example, we ll build using -O2 optimization to see how well it improves this very simple application:

 $ gcc -o sort sort.c -pg -O2     $ ./sort     $ gprof sort gmon.out  more

From Listing 8.4, we see that the performance of our application improved significantly (a five times improvement for the insert-sort ) from Listing 8.2 (pre-optimization). The function bubbleSort saw only a modest four times improvement.

Listing 8.4: Sample Flat Profile Output from gprof for the Optimized Application

 Flat profile:     Each sample counts as 0.01 seconds.     %   cumulative   self              self     total     time   seconds   seconds    calls  ms/call  ms/call  name     76.47      0.78     0.78        1   780.00   780.00  bubbleSort     23.53      1.02     0.24        1   240.00   240.00  insertSort     0.00       1.02     0.00        2     0.00     0.00  init_list

This clearly illustrates the utility of the gprof utility (and the gcc optimizer). We can see how the -O2 optimization level improves the source.

Options Available for gprof

Now that we ve covered the basic uses of gprof , we ll look at the variety of options that are provided. While gprof provides a large number of options, we ll discuss some of the more useful ones here. For a complete list of options, see the gprof help.

Source Annotation

The gprof utility can be used to annotate the source with frequency of execution. The image must be built for this purpose. The -g and -a options must be specified along with -pg as:

 gcc -o sort sort.c -g -pg

This results in an image with not only the profile information (via the -pg option), but also debugging information (through the -g option). Upon executing the image and checking the resulting output, as

 gprof  -A -l  sort gmon.out

we get Listing 8.5. The -A option tells gprof to emit an annotated source listing. The -l option specifies to emit a function execution profile, as shown below:

Listing 8.5: Sample Source Annotation from gprof for the Sort Application (Incomplete)

 #include <stdio.h>                 #define MAX_ELEMENTS 10000                 void insertSort(int list[], int size)            1 -> {                   int i, j, temp;                   for (i = 1 ; i <= size-1 ; i++) {                     temp = list[i];                     for (j = i-1 ; j >= 0 && (list[j] > temp) ; j) {                       list[j+1] = list[j];                     }                     list[j+1] = temp;                   }                 }

The -x option can also be used with gprof to extend execution counts to all source lines.

Ignore Private Functions

For private functions that are statically declared, we can ignore the statistics for these through the use of the -a option (or -no-static ). This attributes time spent in the private function to the caller, with the private function never appearing in the flat profile or the call graph.

Recommending Function Ordering

The gprof utility can recommend a function ordering that can improve cache and translation lookaside buffer (TLB) performance on systems that require functions to be contiguous in memory. For systems that include multiway caches, this may not provide much improvement.

To recommend a function ordering (via ”function-ordering ), the following command sequences can be used:

 $ gcc -o sort sort.c -pg -g     $ ./sort     $ gprof sort gmon.out function-ordering

A list of functions is then suggested with specific ordering (using the source from Listing 8.1). The result is shown in Listing 8.6.

Listing 8.6: Sample Function Ordering from gprof

 insertSort     bubbleSort     init_list     _init     _start     __gmon_start__ . . .

Note the grouping of the sorting and init functions, which are all called in proximity of one another. In some cases, reordering functions can be done as simply as coexisting contiguous functions within a single source file. The linker can also provide this capability.

The file ordering option ( ”file-ordering ) provides a similar capability by recommending the order in which objects should be linked to the target image.

Minimizing gprof Summary

The -b option is useful to minimize the amount of superfluous description data that is emitted . Using -b removes the field discussion in the output, as shown in Listing 8.7.

Listing 8.7: Sample Brief Output from gprof

 Flat profile:     Each sample counts as 0.01 seconds.       %   cumulative   self              self     total                 time   seconds   seconds    calls   s/call   s/call  name          71.59      3.10     3.10        1     3.10     3.10  bubbleSort      28.41      4.33     1.23        1     1.23     1.23  insertSort       0.00      4.33     0.00        2     0.00     0.00  init_list                 Call graph     granularity: each sample hit covers 4 byte(s) for 0.23% of 4.33 seconds     index % time    self  children    called     name                                                      <spontaneous>         [1]    100.0    0.00    4.33                 main [1]                     3.10    0.00       1/1           bubbleSort [2]                     1.23    0.00       1/1           insertSort [3]                     0.00    0.00       2/2           init_list [4]     -                     3.10    0.00       1/1           main [1]     [2]     71.6    3.10    0.00       1         bubbleSort [2]     -                     1.23    0.00       1/1           main [1]     [3]     28.4    1.23    0.00       1         insertSort [3]     -                     0.00    0.00       2/2           main [1]     [4]      0.0    0.00    0.00       2         init_list [4]     -     Index by function name      [2] bubbleSort (sort.c)  [4] init_list (sort.c)  [3] insertSort (sort.c)

To further minimize the output, the ”no-flat-profile can be used if the flat profile is not needed (the default is ”flat-profile ). The call graph can be disabled using the ”no-graph option (default is ”graph ).

Finding Unused Functions

The gprof utility can identify functions that are not called in a given run. The ”display-unused-functions is used in conjunction with the ”static-call-graph option to list those functions, as:

 gprof sort gmon.out  display-unused-functions static-call-graph

Increasing gprof Accuracy

In some cases, a program may differ in timing or may represent such a small timing sample that its accuracy is left in question. To increase the accuracy of applications profiling, the application can be performed numerous times and then averaged. A sample bash script that provides this capability is shown in Listing 8.8 [GNU gprof ]:

Listing 8.8: Analyzing Multiple Invocations of an Application (on the CD-ROM at ./source/ch8/script )

 #!/bin/bash     for i in