OProfile

team bbl


OProfile is a system-wide tracing tool that is available for the Intel, Power PC, AMD, sparc64, and PA-RISC platforms.

The OProfile tool works off of the system timer support or the performance counter support, if available. OProfile is a useful tool for obtaining a profile of a running system. This profile shows the amount of time being spent in an application and the number of times an application was running when OProfile made its measurement. It also gives information by application and library symbols.

The most unique function of OProfile is that it provides a report that shows the source code for an application with the number of measurements made for each line of code in the application. This function applies to C code and is very useful to determine where in an application to concentrate the tuning efforts.

OProfile is now shipped as part of some Red Hat distributions and is available from other Linux operating system providers. Note that the OProfile web site contains a disclaimer that they are not responsible for the version that ships with Red Hat. Some versions of Red Hat require that SMP support be enabled in the kernel configuration and that the OProfile driver must be built as a loadable module and not built into the kernel. If it's built into the kernel, duplicate function name errors occur.

One other item to note is that on the version that is downloaded from the OProfile development web site, you must run the autosetup.sh command as the first step. This is not always clear in the documentation.

It is best to use the OProfile patch on kernel source that is obtained from the www.kernel.org web site. This patch ensures that all the required support is included in the kernel source. If you use a specific vendor version of the kernel source, you might not have the latest patches. As vendors upgrade their kernel source to newer versions of the Linux kernel, the patches will be included.

Refer to the OProfile web site at http://oprofile.sourceforge.net/ for detailed information on the tool and to download a copy.

The next sections examine the various parts of OProfile, including opcontrol, oprof_start, oprofpp, op_time, op_to_source, and op_merge.

opcontrol

opcontrol initializes the OProfile tool and starts and stops the data collection. The specific command options available with opcontrol are as follows:

  • --init. Causes the OProfile driver to be loaded if it isn't already active and makes the driver interface available.

  • --setup. Defines the specific setup parameters to be used by OProfile. You can bypass this option by accepting the defaults or by using one of the setup options directly. Some of the more useful setup options are buffer size, performance counter mask to use, number of events between samples, and filters for looking at a specific PID or PID group.

  • --start-daemon. Starts the profiling daemon without starting the profiler. Use the --start option to start profiling.

  • --start. Starts the profile measurement.

  • --dump. Causes the collected profiling data to be dumped to the oprofile daemon.

  • --stop. Stops the profiling data collection.

  • --shutdown. Stops data collection and removes the daemon.

  • --reset. Clears current session data but not saved session data.

Refer to the OProfile documentation for further information.

The following examples show the initialization and starting of the OProfile functions for counter support, creating separate profiles for libraries and the kernel, and saving session data.

Using the Counter Support

The following command initializes OProfile to start counter 0 to use the CPU clock not-halted event and to make the measurement after every 400,000 counter events. Performance counter 1 is set to data memory references, and the measurement is made after every 10,000 events. The Linux kernel executable file is located at /usr/src/Linux and is called vmlinux. After the initialization, the measurement is started.

 opcontrol --ctr0-event=CPU_CLK_UNHALTED --ctr0-count=400000 opcontrol --ctr1-event=DATA_MEM_REFS --ctr1-count=10000 opcontrol --vmlinux=/usr/src/Linux/vmlinux opcontrol --start 

When using performance counters, ensure that the count values are large enough to prevent all the time from being spent in processing the performance counter handler. If the counter count value is too low, it is possible that the system will stop functioning and require a reboot. The performance counters are set up to interrupt when a counter overflow occurs. Therefore, if the count is too low, the time will be spent processing the counter overflow interrupt and nothing else.

Separate Profiles for Libraries and the Kernel
 opcontrol --separate=kernel --vmlinux=/usr/src/Linux/vmlinux opcontrol --start   my_test --run   pprofpp -kl -p /lib/modules/Linux/kernel /usr/local/bin/oprofiled 

These commands profile the OProfile daemon itself, including when the daemon was running inside the kernel driver and the libraries it is using.

Saving Session Data

The following command saves the profiling data that has been collected in a file called /var/lib/oprofile/samples/run1. This name can then be passed on to the reporting functions to be processed.

 opcontrol --save=run1 

oprof_start

oprof_start is a GUI interface for opcontrol. oprof_start contains window sections where you can select counter events and a sampling rate, configuration options, and the status of the profiling.

Data Profiling Tools

The following sections describe the tools that profile the sample data OProfile collects. These tools are oprofpp, op_time, op_to_source, and op_merge.

oprofpp

oprofpp produces the following types of reports:

  • List symbol mode. Provides a sorted histogram report of sample counts against functions, as shown in a system data walkthrough.

  • Detailed symbol mode. Provides individual sample counts for instructions inside a function.

  • grprof mode, which produces a flat grprof-style report.

Some of the more interesting options that can be used to control the report output are as follows:

  • Symbol processing: -l is a histogram of counts by symbol name; -s is a detailed listing of the position of the address and the number of samples by symbol; -L is a detailed listing of all symbols.

  • -g dumps the report to a file in gprof format.

  • -o reports the function and line number for all samples.

  • -k reports on shared libraries and kernel entries with respect to the application.

  • -d and -D are used to detangle the C++ symbols. -D is a more simplified style of reporting.

The following sample listing shows the counts and time by kernel symbols.

[View full width]

# oprofpp l Cpu type: PIII Cpu speed was (MHz estimation) : 863.886 Counter 0 counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 431500 vma samples % symbol name image name c0123290 24 0.0182187 do_timer /usr/src/linux/vmlinux c0138810 24 0.0182187 sys_fchdir /usr/src/linux/vmlinux . . . c0122950 100 0.0759111 ptrace_readdata /usr/src/linux/vmlinux c0122a10 103 0.0781885 ptrace_writedata /usr/src/linux/vmlinux c024b690 103 0.0781885 tcp_rfree /usr/src/linux/vmlinux c0107df0 107 0.0812249 copy_siginfo_to_user /usr/src/linux/vmlinux c011eb40 108 0.081984 sys_setitimer /usr/src/linux/vmlinux c0157240 108 0.081984 proc_file_lseek /usr/src/linux/vmlinux c0142e30 109 0.0827431 pipe_write /usr/src/linux/vmlinux c0123900 111 0.0842613 alloc_uid /usr/src/linux/vmlinux c0122af0 112 0.0850205 init_timervecs /usr/src/linux/vmlinux . . . c011f8c0 406 0.308199 do_softirq /usr/src/linux/vmlinux c010d464 411 0.311995 IRQ0x77_interrupt /usr/src/linux/vmlinux c0133ad0 416 0.31579 swap_out_pmd /usr/src/linux/vmlinux c01073f0 437 0.331732 release_segments /usr/src/linux/vmlinux c024b6b0 454 0.344636 tcp_poll /usr/src/linux/vmlinux c0131e50 463 0.351469 alloc_area_pmd /usr/src/linux/vmlinux c01483e0 463 0.351469 dcache_dir_close /usr/src/linux/vmlinux c014c6b0 465 0.352987 move_lock_status /usr/src/linux/vmlinux c014c440 479 0.363614 lock_get_status /usr/src/linux/vmlinux c010c850 522 0.396256 sys_ptrace /usr/src/linux/vmlinux c0134250 528 0.400811 rmqueue /usr/src/linux/vmlinux c024b480 541 0.410679 tcp_mem_schedule /usr/src/linux/vmlinux c0122c70 555 0.421307 second_overflow /usr/src/linux/vmlinux c01c9b70 564 0.428139 config_chipset_for_pio /usr/src/linux/vmlinux c011f2b0 571 0.433453 do_adjtimex /usr/src/linux/vmlinux c0148a00 572 0.434212 filldir64 /usr/src/linux/vmlinux c01b3150 603 0.457744 figure_loop_size /usr/src/linux/vmlinux c0114920 613 0.465335 mtrr_ioctl /usr/src/linux/vmlinux c01b30d0 630 0.47824 compute_loop_size /usr/src/linux/vmlinux c0208bb0 649 0.492663 qla1280_initialize_adapter /usr/src/linux/vmlinux c018aea0 676 0.513159 sys_semop /usr/src/linux/vmlinux c01b1e10 723 0.548837 floppy_release_irq_and_dma /usr/src/linux/vmlinux c0247ed0 725 0.550356 ip_build_and_send_pkt /usr/src/linux/vmlinux c0133bf0 726 0.551115 try_to_swap_out /usr/src/linux/vmlinux c0118a60 728 0.552633 sys_sched_getparam /usr/src/linux/vmlinux c01325e0 757 0.574647 kmem_cache_grow /usr/src/linux/vmlinuxc0129f80 959 0.727988 do_swap_page /usr/src/linux/vmlinux c02484b0 1011 0.767461 ip_build_xmit_slow /usr/src/linux/vmlinux c01c9180 1417 1.07566 hpt34x_dmaproc /usr/src/linux/vmlinux c0148400 1540 1.16903 dcache_dir_lseek /usr/src/linux/vmlinux c01069c0 1828 1.38766 gunzip /usr/src/linux/vmlinux c0208fd0 1890 1.43472 qla1280_pci_config /usr/src/linux/vmlinux c0108b60 1950 1.48027 handle_signal /usr/src/linux/vmlinux c0148150 2193 1.66473 sys_ioctl /usr/src/linux/vmlinux c010e5f0 2414 1.83249 __constant_c_and_count_memset /usr/src/linux/vmlinux c013a1a0 2482 1.88411 sys_writev /usr/src/linux/vmlinux c01174c0 2813 2.13538 do_page_fault /usr/src/linux/vmlinux c0118cb0 3646 2.76772 show_task /usr/src/linux/vmlinux c02489f0 3718 2.82238 ip_build_xmit /usr/src/linux/vmlinux c013a230 6183 4.69358 sys_pread /usr/src/linux/vmlinux c010e4e0 8802 6.6817 __constant_memcpy /usr/src/linux/vmlinux c013ae60 42015 31.8941 write_locked_buffers /usr/src/linux/vmlinux

op_time

op_time produces summary reports relative to the binaries that are running on the system. Use this report to find out where time is being spent when your test case is running. You can then use other reports to narrow your focus to specific functions and code segments within the application to see where the problem may exist.

Many of the same options that are available for oprofpp are also available for op_time.

The following sample report uses the -D option.

 # op_time -D 1          0.0043  0.0000 /lib/libhistory.so.4.3 1          0.0043  0.0000 /opt/kde3/lib/kwin.so 1          0.0043  0.0000 /opt/kde3/lib/libartsflow.so.1.0.0 1          0.0043  0.0000 /usr/X11R6/lib/X11/locale/lib/common/ximcp.so.2 1          0.0043  0.0000 /usr/lib/vslick/bin/vs 2          0.0087  0.0000 /bin/gawk 2          0.0087  0.0000 /lib/modules/2.4.19-4GB/kernel/drivers/usb/usb-uhci.o 2          0.0087  0.0000 /opt/kde3/lib/libartsmidi.so.0.0.0 3          0.0130  0.0000 /bin/ps 3          0.0130  0.0000 /lib/modules/2.4.19-4GB/oprofile/oprofile.o 4          0.0173  0.0000 /lib/libreadline.so.4.3 4          0.0173  0.0000 /opt/kde3/lib/libkdecore.so.4.0.0 5          0.0216  0.0000 /opt/kde3/lib/libmcop.so.1.0.0 5          0.0216  0.0000 /usr/local/bin/op_help 16         0.0692  0.0000 /usr/X11R6/lib/libX11.so.6.2 16         0.0692  0.0000 /usr/lib/gconv/ISO8859-1.so 19         0.0822  0.0000 /lib/libpthread.so.0 20         0.0865  0.0000 /opt/kde3/lib/libkio.so.4.0.0 25         0.1082  0.0000 /usr/lib/libstdc++.so.5.0.0 26         0.1125  0.0000 /usr/X11R6/bin/XFree86 33         0.1428  0.0000 /opt/kde3/lib/libkonsolepart.so 45         0.1947  0.0000 /bin/bash 59         0.2553  0.0000 /sbin/insmod 68         0.2942  0.0000 /lib/ld-2.2.5.so 163        0.7052  0.0000 /lib/modules/2.4.19-4GB/kernel/fs/reiserfs/reiserfs.o 297        1.2849  0.0000 /lib/libc.so.6 308        1.3325  0.0000 /usr/lib/qt-3.0.5/lib/libqt-mt.so.3.0.5 661        2.8597  0.0000 /usr/src/Linux/vmlinux 3950      17.0892  0.0000 /usr/local/bin/oprofiled 17373     75.1622  0.0000 /piperf/bin/pi_watch 

op_to_source

The op_to_source tool generates annotated source for assembly listings. If the profiled application is built with debug information, a report contains the actual source code with the profiled counts interleaved. If the application is built without the debug information, op_to_source can produce the assembler report. Use the --assembly (-a) option to produce this type of report.

The op_to_source level of reporting is one step above an application debugger that allows you to step through your application to check the execution logic. op_to_source provides you with the information necessary to determine where in the application you need to focus the debugging effort.

The following is sample output from op_to_source for a sample program.

[View full width]

# op_to_source :/* * IBM Performance Inspector * Copyright (c) International Business Machines Corp., 2003 * * This library is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published * by the Free Software Foundation; either version 2.1 of the License, or * (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See * the GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this library; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ : #include <stdio.h> #include <stdlib.h> #include <unistd.h> int One(unsigned long long cps); int Two(unsigned long long cps); int Four(unsigned long long cps); int Eight(unsigned long long cps); int Sixteen(unsigned long long cps); int ThirtyTwo(unsigned long long cps); int SixtyFour(unsigned long long cps); int val = 0; /************************************/ 10 0.003% 580 0.000% :void pc50() { /* pc50 total: 153902 52.31% 40667094 52.46% */ : int i; : 119226 40.52% 29583773 38.16% : for(i = 0; i < 5000; i++) { 34613 11.76% 11071427 14.28% : val = val + i; : } 53 0.018% 11314 0.014% :} : :/************************************/ 8 0.002% 290 0.000% :void pc25() { /* pc25 total: 70463 23.95% 18765427 24.20% */ : int i; : 60711 20.63% 16897936 21.79% : for(i = 0; i < 2500; i++) { 9693 3.295% 1851007 2.387% : val = val + i; : } 51 0.017% 16194 0.020% :} : :/************************************/ 13 0.004% 5455 0.007% :void pc15() { /* pc15 total: 43572 14.81% 11245713 14.50% */ : int i; : 39415 13.39% 11069245 14.27% : for(i = 0; i < 1500; i++) { 4046 1.375% 136253 0.175% : val = val + i; : } 98 0.033% 34760 0.044% :} : :/************************************/ 9 0.003% 116 0.000% :void pc10() { /* pc10 total: 26148 8.888% 6818423 8.795% */ : int i; : 20936 7.116% 5433262 7.008% : for(i = 0; i < 1000; i++) { 5154 1.752% 1365913 1.762% : val = val + i; : } 49 0.016% 19132 0.024% :} : :/************************************/ :int main(int argc, char *argv[]) { /* main total: 84 0.028% 22107 0.028% */ : int i; : int iter; : : if(argc == 2) { : iter = atoi(argv[1]); : } : else { : fprintf(stderr, " Usage: watch iter\n"); : fflush(stderr); : exit(0); : } : : val = 0; 47 0.015% 10143 0.013% : for(i = 0; i < iter; i++) { 5 0.001% 1011 0.001% : pc50(); 11 0.003% 6072 0.007% : pc25(); 9 0.003% 1313 0.001% : pc15(); 12 0.004% 3568 0.004% : pc10(); : } : : fprintf(stderr, " val = %d\n", val); : return(0); :} : : /* * Total samples for file : "/piperf/src/pi_watch.c" * * 294169 100.0% 77518764 100.0% */ /* * Command line: op_to_source --source-dir=/piperf/src --output-dir=/temp /piperf/bin/pi_watch * * Interpretation of command line: * Output annotated source file with samples * Output all files * * Cpu type: PIII * Cpu speed (MHz estimation) : 662.229 * * Counter 0 counted CPU_CLK_UNHALTED events (clock's processor is not halted) with a unit mask of 0x00 (No unit mask) count 400000 * Total samples : 294169 * Counter 1 counted DATA_MEM_REFS events (all memory references, cacheable and non) with a unit mask of 0x00 (No unit mask) count 1000 * Total samples : 77518764 */

op_merge

op_merge merges profiling samples created with the --separate option when applied to the same binary. op_merge can produce reports for separate shared libraries and then merge the reports for a complete system view. You can also produce reports on a combination of a subset of profiled applications.

    team bbl



    Performance Tuning for Linux Servers
    Performance Tuning for Linux Servers
    ISBN: 0137136285
    EAN: 2147483647
    Year: 2006
    Pages: 254

    flylib.com © 2008-2017.
    If you may any questions please contact us: flylib@qtcs.net