The profile data stored in the gmon.out file is analyzed and displayed using the gprof utility. If called with no options or command-line arguments, gprof assumes that the executable it is working with is a.out and that the profile data for this executable is in the file gmon.out . Also, unless directed otherwise , gprof generates and displays a fully annotated flat profile and a call graph using the collected data. Like most utilities, gprof supports an extensive set of options. We will examine a small subset of its options.
Staying with our selection sort example, as shown previously, the command-line sequence
linux$ g++ ss.cxx -pg -o ss
compiles the source file ss.cxx into the executable file ss . The program is then run. In this example the program obtains its input, via command-line redirection, from a data file called data_reverse . The data file contains the numbers 1 through 10 in reverse order. The output from the program's execution is displayed on the screen.
linux$ ss < data_reverse <-- 1 Please enter 10 integers Initial list <-- 2 10 9 8 7 6 5 4 3 2 1 After pass 1 : 1 9 8 7 6 5 4 3 2 10 After pass 2 : 1 2 8 7 6 5 4 3 9 10 After pass 3 : 1 2 3 7 6 5 4 8 9 10 After pass 4 : 1 2 3 4 6 5 7 8 9 10 After pass 5 : 1 2 3 4 5 6 7 8 9 10 <-- 3 After pass 6 : 1 2 3 4 5 6 7 8 9 10 After pass 7 : 1 2 3 4 5 6 7 8 9 10 After pass 8 : 1 2 3 4 5 6 7 8 9 10 After pass 9 : 1 2 3 4 5 6 7 8 9 10 Sorted list 1 2 3 4 5 6 7 8 9 10
(1) Program is run normally; obtains input from a data file.
(2) Its output is displayed on the screen.
(3) Given the unique nature of the input data (it starts in descending order) and the algorithm's logic, the data is actually sorted before the program terminates!
Once the program is run, the gprof utility is used to analyze and view the profile data. If options are to be passed to gprof , they should be specified individually on the command line following the call to the utility. The name of the executable file comes after the options. If not specified, the default is a.out . The executable file name is optionally followed by the profile file name, which, if not given, defaults to gmon.out . In the example below the -b option (for brief) is specified.
linux$ gprof -b ss <-- 1
(1) Call gprof and pass it the -b option and the name of the executable file (ss). The profile data file, which is not specified, is assumed to be gmon.out .
The output generated by this invocation is divided into two sections, the first of which is a flat profile. This section shows how much time the program spent in each function and how many times the function was called. Sorting 10 items is a trivial task even with the somewhat inefficient selection sort. Therefore, to produce a more instructive data set, the number of values to be sorted was increased to 1,000 and the program recompiled and rerun. To reduce the time needed for display, the output of the executable was discarded by directing it to /dev/null .
linux$ ss < data_reverse > /dev/null
The gprof output below is based on sorting 1,000 values.
Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls us/call us/call name 100.00 0.04 0.04 999 40.04 40.04 Find_Least(int const *, int, int) 0.00 0.04 0.00 499500 0.00 0.00 Compare(int const &, int const &) 0.00 0.04 0.00 1001 0.00 0.00 Display(int *, int) 0.00 0.04 0.00 500 0.00 0.00 Swap(int &, int &) 0.00 0.04 0.00 1 0.00 40000.00 Do_S_Sort(int *, int) 0.00 0.04 0.00 1 0.00 0.00 Get_Data(int *, int)
In the flat profile section each function is listed in decreasing order based on its runtime (if the runtime is too small to be recorded, it is reported as 0). The meaning of each field in the flat profile is listed in the following table.
FIELD NAME |
MEANING |
---|---|
% time |
The percentage of total execution time. |
cumulative seconds |
Total seconds executing this function and all those listed above it in the table. |
self seconds |
Total seconds executing just this function (the primary sort key). |
calls |
The number of times this function was called (blank if not called). |
self Ts/call |
Average number of T seconds spent in this function ( m = milli, u = micro, n = nano, etc.). |
total Ts/call |
Per call, the average number of T seconds in this function and its descendants. |
name |
Function name (acts as the secondary sort key) |
As might be anticipated with a list of 1,000 elements, the Compare function in our example is called 499,500 times (the summation of 999 + 998 + 997 + 1, the series representing the number of compares ). Clearly, the Find_Least function is the most time intensive of the group .
The flat profile is followed by a call graph that details the amount of time spent in each function. The output in this section is divided into a series of entries, one per function. Each entry, which is one or more lines in length, is separated from the next by a series of dashes. Within each entry there is one primary line. The primary line indicates the function associated with the entry data. The primary line is easily identified, as it begins with an index number surrounded by set of square brackets. In each entry lines that precede the primary line are functions that call the function, while lines that follow the primary line are functions that are called by the function (in call graph-speak, its children). If the caller of a function cannot be determined, < spontaneous > is printed instead.
Here is the call graph for our selection sort program when it sorts a list of 1,000 values, which were initially in inverse order. For ease of reference, each primary line has been highlighted.
Call graph granularity: each sample hit covers 4 byte(s) for 25.00% of 0.04 seconds index % time self children called name 0.04 0.00 999/999 Do_S_Sort(int *, int) [2] [1] 100.0 0.04 0.00 999 Find_Least(int const *, int, int) [1] 0.00 0.00 499500/499500 Compare(int const &, int const &) [4] ----------------------------------------------- 0.00 0.04 1/1 main [3] [2] 100.0 0.00 0.04 1 Do_S_Sort(int *, int) [2] 0.04 0.00 999/999 Find_Least(int const *, int, int) [1] 0.00 0.00 999/1001 Display(int *, int) [5] 0.00 0.00 500/500 Swap(int &, int &) [6] ----------------------------------------------- [3] 100.0 0.00 0.04 main [3] 0.00 0.04 1/1 Do_S_Sort(int *, int) [2] 0.00 0.00 2/1001 Display(int *, int) [5] 0.00 0.00 1/1 Get_Data(int *, int) [7] ----------------------------------------------- 0.00 0.00 499500/499500 Find_Least(int const *, int, int) [1] [4] 0.0 0.00 0.00 499500 Compare(int const &, int const &) [4] ----------------------------------------------- 0.00 0.00 2/1001 main [3] 0.00 0.00 999/1001 Do_S_Sort(int *, int) [2] [5] 0.0 0.00 0.00 1001 Display(int *, int) [5] ----------------------------------------------- 0.00 0.00 500/500 Do_S_Sort(int *, int) [2] [6] 0.0 0.00 0.00 500 Swap(int &, int &) [6] ----------------------------------------------- 0.00 0.00 1/1 main [3] [7] 0.0 0.00 0.00 1 Get_Data(int *, int) [7] ----------------------------------------------- Index by function name [4] Compare(int const &, int const &) [2] Do_S_Sort(int *, int) [7] Get_Data(int *, int) [5] Display(int *, int) [1] Find_Least(int const *, int, int) [6] Swap(int &, int &)
The meaning of each field is based on its context (i.e., the line's designation): primary, function's callers (call this function), or called functions (called by this function).
FIELD |
PRIMARY |
FUNCTION'S CALLERS |
CALLED FUNCTIONS |
---|---|---|---|
index |
Index number of this function. |
||
% time |
Percent of total time spent in this function and its children. |
||
self |
Total amount of time just spent in this function (same as self seconds value in flat profile). |
Estimate of time spent in this function when invoked by the .caller function. |
Estimate of time spent in called function. |
children |
Total amount of time spent in the function calls made by this function. |
Estimate of time spent in calls to its children. |
Estimate of time spent in the children of the called function. |
called |
Number of times this function was called. A +is used to separate recursive calls. |
Number of times this function is called; the total number of nonrecursive calls. |
Number of times this function is called; the total number of nonrecursive calls. |
name |
The name and index number of the function. |
The name and index number of the function. |
The name and index number of the function. |
If we look at the first entry in the call graph output, we note that the primary line is flagged by [1] . The associated function, Find_Least , is called by the Do_S_Sort function 999 times. In turn , the Find_Least function calls the Compare function 499,500 times.
The gprof utility can be directed to display an annotated source code listing where it identifies the number of calls for the function. To produce this output, the source program must be compiled with the -g option, and gprof passed the -A option (indicating annotated source). Using our same source program, this sequence would be
linux$ g++ -g ss.cxx -pg -o ss linux$ ss < data_reverse > /dev/null linux$ gprof -A ss
As shown below, gprof 's output lists the program source code and indicates the number of times each function is called. At the end of the listing, it displays a top 10 list indicating the top 10 lines based on their execution activity. Following the top 10 list is an execution summary.
*** File /home/faculty/gray/revision/profile/ss.cxx: #include using namespace std; // Function prototypes void Get_Data ( int [], int ); void Display ( int [], int ); void Do_S_Sort ( int [], int ); int Find_Least( const int [], int, int ); int Compare ( const int &, const int & ); void Swap ( int &, int & ); int ##### -> main( ) { const int max = 1000; int List[max]; Get_Data ( List, max ); // Obtain data cout << "Initial list" << endl; Display ( List, max ); // Show it Do_S_Sort( List, max ); // Sort it cout << "Sorted list" << endl; Display ( List, max ); // Show it again return 0; } // Obtain data to sort from standard input void 1 -> Get_Data(int a[], int n) { cout << "Please enter " << n << " integers" << endl; for(int i=0; i < n; ++i) cin >> a[i]; } // Display the current contents of list void 1001 -> Display(int a[], int n) { for(int i=0; i < n; ++i) cout << " " << a[i]; cout << endl; } // Do the Selection Sort, Display after each pass void 1 -> Do_S_Sort( int a[], int n ){ int index; for (int i=0; i < n-1; ++i){ index=Find_Least( a, i, n ); if ( i != index ) Swap( a[i], a[index] ); cout << "After pass " << i+1 << " : "; Display( a, n ); } } // Find the index of the least element in list int 999 -> Find_Least( const int a[], int start, int stop ){ int Index_of_Least = start; for (int i=start+1; i < stop; ++i ) if ( Compare(a[i], a[Index_of_Least]) ) Index_of_Least = i; return Index_of_Least; } // Compare two data elements int 499500 -> Compare( const int &a, const int &b ){ return ( a < b ); } // Exchange two data elements void 500 -> Swap( int &a, int &b ){ int temp; temp = a; a = b; b = temp; ##### -> } Top 10 Lines: Line Count 58 499500 30 1001 49 999 63 500 23 1 37 1 Execution Summary: 8 Executable lines in this file 8 Lines executed 100.00 Percent of the file executed 502002 Total number of line executions 62750.25 Average executions per line
While the gprof utility is helpful in analyzing the execution of programs, the current Linux version of this utility does have a serious limitation. As implemented, the utility uses a signal handler to collect profile information. The signal handler is invoked by the periodic generation of a SIGPROF (27) signal. When a child process is generated via a fork or a new thread is created with a call to pthread_create , they do not, by default, receive these signals. As a result, no profile data is collected for the child process or new thread.
Programs and Processes
Processing Environment
Using Processes
Primitive Communications
Pipes
Message Queues
Semaphores
Shared Memory
Remote Procedure Calls
Sockets
Threads
Appendix A. Using Linux Manual Pages
Appendix B. UNIX Error Messages
Appendix C. RPC Syntax Diagrams
Appendix D. Profiling Programs