The Performance Inspector (PI) is a suite of tools that identifies performance problems and characteristics. PI is distributed as a kernel patch plus the source for the device driver and tools, along with procedures that automate the build and installation process. Support is provided for Intel 32- and 64-bit platforms, PowerPC 32- and 64-bit systems, S/390 32- and 64-bit systems, and the AMD Hammer processor. The tools included in all versions of the Performance Inspector include the following:
In addition to these common measurement functions, Intel 32-bit systems provide the following additional functionality:
The Performance Inspector installation requires you to apply a kernel patch and rebuild the kernel. After that step is complete, the various tools are built. Because some of the tools are sensitive to the version of Java being used, the tool build process must be redone when a new version of the Java support is used. This requirement does not affect the kernel, which needs to be patched and built only once.

An exception to this installation procedure is provided for the Intel 32-bit support. A special version of the Performance Inspector does not require the kernel to be patched, as long as the kernel being used is the default kernel shipped with supported versions of Red Hat and SUSE. The kernel source still needs to be installed and the tools built as before, but the kernel does not have to be rebuilt. The PI driver dynamically patches the kernel at the appropriate places, and the kernel patches are removed when the driver is uninstalled.

When none of the PI performance probes in the kernel are active, the impact on system performance is minimal: at most, a compare and a short branch instruction are added to the code path where each probe is located. For the dynamic version of the IA32 PI, no extra overhead is added when the driver is not loaded. When performing a trace, there is usually a 2% to 3% overhead on the system. Instruction tracing obviously adds more overhead, but it is intended for use in a debug environment; normal tracing can be done in a working production environment without any major impact on performance or throughput.

Most of the PI functions can be controlled from APIs issued from the user's program as well as from PI's own command files. When using the API interface, from either C code or Java, you can fine-tune when, where, and what information is collected. All of the source is provided and can be used as a coding sample.
Refer to http://www-124.ibm.com/developerworks/oss/pi/index.html for more information and to obtain a copy of the Performance Inspector. The remainder of this chapter examines the various features of the Performance Inspector in detail.

Above Idle

Above Idle works by hooking into the process timeslice logic and the interrupt handler within the kernel. It keeps track of the amount of time a processor spends busy, idle, or handling interrupts, and is useful for identifying processor overload, high interrupt activity, and poor distribution of work within a multiprocessor system. When active, Above Idle gathers information over a user-specified measurement period; the default is 1 second. When using the default, Above Idle determines the amount of time the system spent idle, active, and processing interrupts. The percentages of idle, active, and interrupt processing time are then calculated and displayed per processor when running on a multiprocessor system. Parameters define how many measurements to make and the time interval between measurements. To start the Above Idle measurement, enter swtrace ai, which uses the defaults to report system usage every second until the measurement is manually stopped. The following example shows sample Above Idle output.
# swtrace ai
Above Idle instrumentation enabled
CPU 0 IDLE= 48.83, BUSY= 49.74, INTR= 1.44
CPU 1 IDLE=  0.00, BUSY= 99.81, INTR= 0.19
CPU 2 IDLE= 96.10, BUSY=  3.70, INTR= 0.20
CPU 3 IDLE= 22.10, BUSY= 77.71, INTR= 0.19
CPU 4 IDLE= 16.50, BUSY= 83.30, INTR= 0.20
CPU 5 IDLE= 16.66, BUSY= 83.10, INTR= 0.25
CPU 6 IDLE= 16.83, BUSY= 82.96, INTR= 0.21
CPU 7 IDLE= 60.72, BUSY= 39.08, INTR= 0.20
CPU 0 IDLE= 98.54, BUSY=  0.00, INTR= 1.46
CPU 1 IDLE= 86.35, BUSY= 13.48, INTR= 0.17
CPU 2 IDLE= 99.78, BUSY=  0.00, INTR= 0.22
CPU 3 IDLE= 99.77, BUSY=  0.00, INTR= 0.23
CPU 4 IDLE= 18.91, BUSY= 80.87, INTR= 0.22
CPU 5 IDLE= 10.75, BUSY= 89.05, INTR= 0.20
CPU 6 IDLE= 99.77, BUSY=  0.00, INTR= 0.23
CPU 7 IDLE= 99.73, BUSY=  0.00, INTR= 0.27
CPU 0 IDLE= 98.59, BUSY=  0.00, INTR= 1.41
CPU 1 IDLE= 99.80, BUSY=  0.00, INTR= 0.20
CPU 2 IDLE= 99.81, BUSY=  0.00, INTR= 0.19
CPU 3 IDLE= 99.79, BUSY=  0.00, INTR= 0.21
CPU 4 IDLE=  1.19, BUSY= 98.65, INTR= 0.16
CPU 5 IDLE=  0.00, BUSY= 99.77, INTR= 0.23
CPU 6 IDLE= 99.78, BUSY=  0.00, INTR= 0.22
CPU 7 IDLE= 99.85, BUSY=  0.00, INTR= 0.15

The example was run on an eight-way SMP machine and shows data collected over 3 seconds.

Per-Thread Time

When active, per-thread time (PTT) accumulates the amount of time spent by each process within the system, as well as idle time and interrupt time. If you are running on an Intel IA32 system, you can use a performance event counter instead of the real-time clock. For example, a measurement can be made that identifies the number of instructions executed by each process, the idle process, and the interrupts. Totals for process time, IRQ time, and idle time are provided for each processor in the system, as well as a system total across all processors. PTT is activated via an API call, PTTInit, or an external program called ptt. In both cases, you specify the measurement medium as the system timer or a performance counter.
After it is activated, you can run a program called pttstats to display the current values for idle, process, and interrupt times. You can run the pttstats program any number of times. A terminate command stops the measurement; the last counts before the terminate command remain stored in the buffers until another init command is received. The following shows a sample PTT report:

Tools version 2040024
PTT_STATISTICS_SIZE = 263096
IBM_NR_CPUS = 16

Per Thread Time Summary
  Total Dispatches:         3763
  Total IRQs:                334
  Total Thread Time:   151469140
  Total IRQ Time:         721564
  Total Idle Time:    2158895609
  Total tool time:         81755
  Total Measurement:  2311086313
  Total Period:                0

Per Thread Time Detail

Details by CPU
      IRQ    Dispatch
CPU   Count  Count     Times
----------------------------------------------------
1     334    3763      Thread Time:  151469140
                       IRQ Time:        721564
                       Idle Time:   2158895609
                       Tool Time:        81755
                       Total Time:  5311168068

Details by Process
            Dispatch  IRQ
Process PID Count     Count  Time
--------------------------------------------------
0           708       303    2158895609
1             5         0         34407
2            83         0        139154
3             1         0          7162
7            15         0        168819
11            5         0          3466
324           1         0          6799
409          11         0         30641
671        1281        24     113156655
722           3         0         30128
723           2         0          8944
727          12         0         35688
728           2         0         11927
834          23         0         62077
869          11         0        170978
963          98         3      15182854
971         148         1        971766
992         137         1       1821399
995          48         0       2676763
999         393         0       3993366
11580         3         0        135687
11581        53         0       1156413
12281       544         1       6954629
12282        36         0       1006592
12297       140         1       3702826
Total Process Time: 2310364749
End of reports

APIs are provided that allow applications to start this measurement and to obtain the total amount of time spent in the process being measured. These APIs allow an application to be instrumented to measure the amount of time spent between specific operations or positions within the application. For Intel 32-bit applications, metrics such as instructions retired can be used as the performance counter measurement.

Trace Profiling

Trace profiling works by hooking the kernel timer interrupt.
When the timer interrupt is processed, the profiling code creates a trace record that contains the address where the processor was executing when the timer interrupt occurred. This trace record is written into a trace buffer. When the profiling is complete, these trace records can be dumped to a file, where a post-processing program produces various reports. You can run a trace profile to identify hot spots within the system: the applications that consume the most time and the functions within those applications that account for that time. This includes jitted methods in Java applications. Use the run.tprof command to perform the trace and produce the default reports. By looking at the run.tprof command file, you can see the specific steps you need to follow to take a profile trace and produce the reports. The following example shows a sample trace report for a small C application.

# run.tprof

ProcessorSpeed 398000000
TraceCycles 25382010763
TraceTime 63.774(sec)
TOTAL TICKS 25383
(Clipping Level : 0.1 % 25 Ticks)

================================
   TPROF Report Summary
)) Process
)) Process_Module
)) Process_Module_Symbol
)) Process_Thread
)) Process_Thread_Module
)) Process_Thread_Module_Symbol
)) Module
)) Module_Symbol
================================

================================
)) Process
================================
LAB   TKS    %%%   NAMES
PID  19094  75.22  SystemProcess_0000
PID   6284  24.76  /space/piperf/bin/WatchedPot_057c

================================
)) Process_Module
================================
LAB   TKS    %%%   NAMES
PID  19094  75.22  SystemProcess_0000
MOD  19094  75.22   vmlinux
PID   6284  24.76  /space/piperf/bin/WatchedPot_057c
MOD   6284  24.76   /space/piperf/bin/WatchedPot

================================
)) Process_Module_Symbol
================================
LAB   TKS    %%%   NAMES
PID  19094  75.22  SystemProcess_0000
MOD  19094  75.22   vmlinux
SYM  19094  75.22    default_idle
PID   6284  24.76  /space/piperf/bin/WatchedPot_057c
MOD   6284  24.76   /space/piperf/bin/WatchedPot
SYM   3154  12.43    ThirtyTwo
SYM   1615   6.36    Sixteen
SYM    808   3.18    Eight
SYM    404   1.59    Four
SYM    202   0.80    Two
SYM    101   0.40    One

================================
)) Process_Thread
================================
LAB   TKS    %%%   NAMES
PID  19094  75.22  SystemProcess_0000
TID  19094  75.22   tid_0000
PID   6284  24.76  /space/piperf/bin/WatchedPot_057c
TID   6284  24.76   tid_057c

================================
)) Process_Thread_Module
================================
LAB   TKS    %%%   NAMES
PID  19094  75.22  SystemProcess_0000
TID  19094  75.22   tid_0000
MOD  19094  75.22    vmlinux
PID   6284  24.76  /space/piperf/bin/WatchedPot_057c
TID   6284  24.76   tid_057c
MOD   6284  24.76    /space/piperf/bin/WatchedPot

================================
)) Process_Thread_Module_Symbol
================================
LAB   TKS    %%%   NAMES
PID  19094  75.22  SystemProcess_0000
TID  19094  75.22   tid_0000
MOD  19094  75.22    vmlinux
SYM  19094  75.22     default_idle
PID   6284  24.76  /space/piperf/bin/WatchedPot_057c
TID   6284  24.76   tid_057c
MOD   6284  24.76    /space/piperf/bin/WatchedPot
SYM   3154  12.43     ThirtyTwo
SYM   1615   6.36     Sixteen
SYM    808   3.18     Eight
SYM    404   1.59     Four
SYM    202   0.80     Two
SYM    101   0.40     One

================================
)) Module
================================
LAB   TKS    %%%   NAMES
MOD  19096  75.23  vmlinux
MOD   6284  24.76  /space/piperf/bin/WatchedPot

================================
)) Module_Symbol
================================
LAB   TKS    %%%   NAMES
MOD  19096  75.23  vmlinux
SYM  19094  75.22   default_idle
MOD   6284  24.76  /space/piperf/bin/WatchedPot
SYM   3154  12.43   ThirtyTwo
SYM   1615   6.36   Sixteen
SYM    808   3.18   Eight
SYM    404   1.59   Four
SYM    202   0.80   Two
SYM    101   0.40   One

When profiling applications, the level of optimization used when the application was compiled can have an effect on the trace output. With optimization on, functions that execute the same base code are all rolled up into the application instead of being reported separately by symbol name.
Therefore, it is a good idea to make the first few profiling runs against applications that have not been compiled with optimization turned on.

Instruction Tracing

A special version of trace profiling, available on Intel IA32 systems, sets up the system to use the hardware's instruction trace capabilities. Instruction tracing starts from the branch instruction signal and creates a trace record every time a code branch, jump, or call is performed. It provides a detailed trace of the execution path through the system without the need to have a debugger installed or code compiled with the debug option enabled. Use instruction tracing only for very short, controlled measurement periods, because it can produce a significant number of trace records. Instruction tracing is primarily useful for determining the exact execution path through the entire system for the function being executed. A small command file, run.itrace, is provided to simplify taking an instruction trace. The following example shows a small portion of run.itrace output.
# run.itrace
-----------------------------
ss: 0x8049a5c  1  push ebp
ss: 0x8049a5d  2  mov ebp, esp
ss: 0x8049a5f  3  sub esp, 0x8
ss: 0x8049a62  5  call 0x420628d8
0 1 1 4 @ 0 func1:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
ss: 0x8049a6a  1  push ebp
ss: 0x8049a6b  2  mov ebp, esp
ss: 0x8049a6d  3  sub esp, 0x8
ss: 0x8049a70  5  call 0x420628d8
0 1 1 4 @ 0 func2:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
ss: 0x8049a78  1  push ebp
ss: 0x8049a79  2  mov ebp, esp
ss: 0x8049a7b  3  sub esp, 0x8
ss: 0x8049a7e  5  call 0x420628d8
0 1 1 4 @ 0 func3:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
ss: 0x8049a86  1  push ebp
ss: 0x8049a87  2  mov ebp, esp
ss: 0x8049a89  3  sub esp, 0x8
ss: 0x8049a8c  5  call 0x420628d8
0 1 1 4 @ 0 func4:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
ss: 0x8049a94  1  push ebp
ss: 0x8049a95  2  mov ebp, esp
ss: 0x8049a97  3  sub esp, 0x8
ss: 0x8049a9a  5  call 0x420628d8
0 1 1 4 @ 0 func5:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
----------------------------------

This report shows the assembler instructions that were executed between the function entry and the branch instruction. The report provides a detailed view of what is going on in the system. To relate this back to the actual C code, you would have to compile the C code with the assembler option and calculate the offsets to the assembler instructions to find them in the listing. If you add the -c option to the post-processing report generator, the output looks similar to the following.
# run.itrace -c
cpu ring instruction_count offset symbolname_modulename pid_tid_pidname To_instruction
0 1 2 4 @ 32 printf:/lib/i686/libc-2.2.93.so 8a1_0_/piperf/bin/itracec_ RETURN
0 1 1 5 @ 54 main:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 4 @ 0 func1:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 4 @ 0 func2:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 4 @ 0 func3:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 4 @ 0 func4:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 4 @ 0 func5:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 6 @ 0 func6:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 3 1 @ 50 <plt>:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ JUMP
0 1 1 6 @ 0 printf:/lib/i686/libc-2.2.93.so 8a1_0_/piperf/bin/itracec_ CALL
0 1 2 2 @ 0 __i686.get_pc_thunk.bx:/lib/i686/libc-2.2.93.so 8a1_0_/piperf/bin/itracec_ RETURN
0 1 1 8 @ 11 printf:/lib/i686/libc-2.2.93.so 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 7 @ 0 _IO_vfprintf_internal:/lib/i686/libc-2.2.93.so 8a1_0_/piperf/bin/itracec_ CALL
0 1 2 2 @ 0 __i686.get_pc_thunk.bx:/lib/i686/libc-2.2.93.so 8a1_0_/piperf/bin/itracec_ RETURN
0 1 1 6 @ d _IO_vfprintf_internal:/lib/i686/libc-2.2.93.so 8a1_0_/piperf/bin/itracec_ CALL
0 1 3 1 @ 470 <plt>:/lib/i686/libc-2.2.93.so 8a1_0_/piperf/bin/itracec_ JUMP
0 1 1 4 @ 0 __errno_location:/lib/i686/libc-2.2.93.so 8a1_0_/piperf/bin/itracec_ CALL
0 1 2 2 @ 0 __i686.get_pc_thunk.bx:/lib/i686/libc-2.2.93.so 8a1_0_/piperf/bin/itracec_ RETURN
0 1 2 5 @ 9 __errno_location:/lib/i686/libc-2.2.93.so 8a1_0_/piperf/bin/itracec_ RETURN
----------------------------------

Java Profiler

JPROF is a Java profiling agent that dynamically responds to JVMPI events based on options passed at Java invocation. The profiler is generally referred to as JPROF but is built as a shared library called libjprof.so. JPROF provides JIT address-to-name resolution to support tprof and ITtrace data reduction.
Other functions include Java Lock Monitor and Java HeapDump. The profiler has been implemented for IBM JDK 1.2.2 and later and is based on the support in the JDK for the JVMPI interface. The following example shows a report generated by the trace profiler on a Java application.

Tprof Reports
ProcessorSpeed 688000000
TraceCycles 10949059046
TraceTime 15.914(sec)
TOTAL TICKS 125160
(Clipping Level : 0.1 % 125 Ticks)

================================
   TPROF Report Summary
)) Process
)) Process_Module
)) Process_Module_Symbol
)) Process_Thread
)) Process_Thread_Module
)) Process_Thread_Module_Symbol
)) Module
)) Module_Symbol
================================

================================
)) Process
================================
LAB   TKS    %%%   NAMES
PID  15645  12.50  SystemProcess_7
PID  15645  12.50  SystemProcess_1
PID  15645  12.50  SystemProcess_0
PID  14388  11.50  SystemProcess_2
PID  14074  11.24  SystemProcess_4
PID  12806  10.23  SystemProcess_3
PID  10935   8.74  SystemProcess_6
PID  10462   8.36  SystemProcess_5
PID   4710   3.76  java_6
PID   4302   3.44  sshd_5
PID   2823   2.26  sshd_3
PID   1566   1.25  java_4
PID   1257   1.00  java_2
PID    881   0.70  java_5

================================
)) Process_Module
================================
LAB   TKS    %%%   NAMES
PID  15645  12.50  SystemProcess_7
MOD  15645  12.50   vmlinux
PID  15645  12.50  SystemProcess_1
MOD  15645  12.50   vmlinux
PID  15645  12.50  SystemProcess_0
MOD  15645  12.50   vmlinux
PID  14388  11.50  SystemProcess_2
MOD  14388  11.50   vmlinux
PID  14074  11.24  SystemProcess_4
MOD  14074  11.24   vmlinux
PID  12806  10.23  SystemProcess_3
MOD  12806  10.23   vmlinux
PID  10935   8.74  SystemProcess_6
MOD  10935   8.74   vmlinux
PID  10462   8.36  SystemProcess_5
MOD  10462   8.36   vmlinux
PID   4710   3.76  java_6
MOD   2109   1.69   vmlinux
MOD   1767   1.41   JITCODE
MOD    268   0.21   /opt/IBMJava2-141/jre/bin/libhpi.so
MOD    243   0.19   /opt/IBMJava2-141/jre/bin/classic/libjvm.so
MOD    175   0.14   /lib/tls/libpthread-0.29.so
PID   4302   3.44  sshd_5
MOD   1966   1.57   vmlinux
MOD   1349   1.08   /lib/libcrypto.so.0.9.7a
MOD    480   0.38   /usr/sbin/sshd
MOD    462   0.37   /lib/tls/libc-2.3.2.so
PID   2823   2.26  sshd_3
MOD   1369   1.09   vmlinux
MOD    844   0.67   /lib/libcrypto.so.0.9.7a
MOD    302   0.24   /usr/sbin/sshd
MOD    280   0.22   /lib/tls/libc-2.3.2.so
PID   1566   1.25  java_4
MOD    767   0.61   vmlinux
MOD    532   0.43   JITCODE
PID   1257   1.00  java_2
MOD    601   0.48   vmlinux
MOD    434   0.35   JITCODE
PID    881   0.70  java_5
MOD    444   0.35   vmlinux
MOD    299   0.24   JITCODE

================================
)) Process_Module_Symbol
================================
LAB   TKS    %%%   NAMES
PID  15645  12.50  SystemProcess_7
MOD  15645  12.50   vmlinux
SYM  15605  12.47    default_idle
PID  15645  12.50  SystemProcess_1
MOD  15645  12.50   vmlinux
SYM  15606  12.47    default_idle
PID  15645  12.50  SystemProcess_0
MOD  15645  12.50   vmlinux
SYM  15608  12.47    default_idle
PID  14388  11.50  SystemProcess_2
MOD  14388  11.50   vmlinux
SYM  14341  11.46    default_idle
PID  14074  11.24  SystemProcess_4
MOD  14074  11.24   vmlinux
SYM  14043  11.22    default_idle
PID  12806  10.23  SystemProcess_3
MOD  12806  10.23   vmlinux
SYM  12675  10.13    default_idle
PID  10935   8.74  SystemProcess_6
MOD  10935   8.74   vmlinux
SYM  10913   8.72    default_idle
PID  10462   8.36  SystemProcess_5
MOD  10462   8.36   vmlinux
SYM  10387   8.30    default_idle
PID   4710   3.76  java_6
MOD   2109   1.69   vmlinux
SYM    404   0.32    n_tty_receive_buf
SYM    144   0.12    opost_block
SYM    141   0.11    sigprocmask
SYM    140   0.11    .text.lock.tty_io
SYM    136   0.11    tty_write
SYM    130   0.10    n_tty_receive_room
MOD   1767   1.41   JITCODE
SYM    621   0.50    hellop.main([Ljava/lang/String;)V
SYM    143   0.11    java/io/OutputStreamWriter.write([CII)V
MOD    268   0.21   /opt/IBMJava2-141/jre/bin/libhpi.so
MOD    243   0.19   /opt/IBMJava2-141/jre/bin/classic/libjvm.so
MOD    175   0.14   /lib/tls/libpthread-0.29.so
PID   4302   3.44  sshd_5
MOD   1966   1.57   vmlinux
SYM    236   0.19    read_chan
SYM    125   0.10    n_tty_chars_in_buffer
MOD   1349   1.08   /lib/libcrypto.so.0.9.7a
SYM   1345   1.07    NoSymbols
MOD    480   0.38   /usr/sbin/sshd
SYM    474   0.38    NoSymbols
MOD    462   0.37   /lib/tls/libc-2.3.2.so
PID   2823   2.26  sshd_3
MOD   1369   1.09   vmlinux
SYM    175   0.14    read_chan
MOD    844   0.67   /lib/libcrypto.so.0.9.7a
SYM    842   0.67    NoSymbols
MOD    302   0.24   /usr/sbin/sshd
SYM    298   0.24    NoSymbols
MOD    280   0.22   /lib/tls/libc-2.3.2.so
PID   1566   1.25  java_4
MOD    767   0.61   vmlinux
SYM    151   0.12    n_tty_receive_buf
MOD    532   0.43   JITCODE
SYM    161   0.13    hellop.main([Ljava/lang/String;)V
PID   1257   1.00  java_2
MOD    601   0.48   vmlinux
SYM    132   0.11    n_tty_receive_buf
MOD    434   0.35   JITCODE
SYM    126   0.10    hellop.main([Ljava/lang/String;)V
PID    881   0.70  java_5
MOD    444   0.35   vmlinux
MOD    299   0.24   JITCODE

================================
)) Process_Thread
================================
LAB   TKS    %%%   NAMES
PID  15645  12.50  SystemProcess_7
TID  15645  12.50   tid_0000
PID  15645  12.50  SystemProcess_1
TID  15645  12.50   tid_0000
PID  15645  12.50  SystemProcess_0
TID  15645  12.50   tid_0000
PID  14388  11.50  SystemProcess_2
TID  14388  11.50   tid_0000
PID  14074  11.24  SystemProcess_4
TID  14074  11.24   tid_0000
PID  12806  10.23  SystemProcess_3
TID  12806  10.23   tid_0000
PID  10935   8.74  SystemProcess_6
TID  10935   8.74   tid_0000
PID  10462   8.36  SystemProcess_5
TID  10462   8.36   tid_0000
PID   4710   3.76  java_6
TID   4668   3.73   tid_main
PID   4302   3.44  sshd_5
TID   4298   3.43   tid_02c5
PID   2823   2.26  sshd_3
TID   2823   2.26   tid_02c5
PID   1566   1.25  java_4
TID   1527   1.22   tid_main
PID   1257   1.00  java_2
TID   1222   0.98   tid_main
PID    881   0.70  java_5
TID    832   0.66   tid_main

================================
)) Process_Thread_Module
================================
LAB   TKS    %%%   NAMES
PID  15645  12.50  SystemProcess_7
TID  15645  12.50   tid_0000
MOD  15645  12.50    vmlinux
PID  15645  12.50  SystemProcess_1
TID  15645  12.50   tid_0000
MOD  15645  12.50    vmlinux
PID  15645  12.50  SystemProcess_0
TID  15645  12.50   tid_0000
MOD  15645  12.50    vmlinux
PID  14388  11.50  SystemProcess_2
TID  14388  11.50   tid_0000
MOD  14388  11.50    vmlinux
PID  14074  11.24  SystemProcess_4
TID  14074  11.24   tid_0000
MOD  14074  11.24    vmlinux
PID  12806  10.23  SystemProcess_3
TID  12806  10.23   tid_0000
MOD  12806  10.23    vmlinux
PID  10935   8.74  SystemProcess_6
TID  10935   8.74   tid_0000
MOD  10935   8.74    vmlinux
PID  10462   8.36  SystemProcess_5
TID  10462   8.36   tid_0000
MOD  10462   8.36    vmlinux
PID   4710   3.76  java_6
TID   4668   3.73   tid_main
MOD   2075   1.66    vmlinux
MOD   1767   1.41    JITCODE
MOD    268   0.21    /opt/IBMJava2-141/jre/bin/libhpi.so
MOD    236   0.19    /opt/IBMJava2-141/jre/bin/classic/libjvm.so
MOD    174   0.14    /lib/tls/libpthread-0.29.so
PID   4302   3.44  sshd_5
TID   4298   3.43   tid_02c5
MOD   1965   1.57    vmlinux
MOD   1349   1.08    /lib/libcrypto.so.0.9.7a
MOD    478   0.38    /usr/sbin/sshd
MOD    461   0.37    /lib/tls/libc-2.3.2.so
PID   2823   2.26  sshd_3
TID   2823   2.26   tid_02c5
MOD   1369   1.09    vmlinux
MOD    844   0.67    /lib/libcrypto.so.0.9.7a
MOD    302   0.24    /usr/sbin/sshd
MOD    280   0.22    /lib/tls/libc-2.3.2.so
PID   1566   1.25  java_4
TID   1527   1.22   tid_main
MOD    734   0.59    vmlinux
MOD    532   0.43    JITCODE
PID   1257   1.00  java_2
TID   1222   0.98   tid_main
MOD    580   0.46    vmlinux
MOD    434   0.35    JITCODE
PID    881   0.70  java_5
TID    832   0.66   tid_main
MOD    410   0.33    vmlinux
MOD    299   0.24    JITCODE

================================
)) Process_Thread_Module_Symbol
================================
LAB   TKS    %%%   NAMES
PID  15645  12.50  SystemProcess_7
TID  15645  12.50   tid_0000
MOD  15645  12.50    vmlinux
SYM  15605  12.47     default_idle
PID  15645  12.50  SystemProcess_1
TID  15645  12.50   tid_0000
MOD  15645  12.50    vmlinux
SYM  15606  12.47     default_idle
PID  15645  12.50  SystemProcess_0
TID  15645  12.50   tid_0000
MOD  15645  12.50    vmlinux
SYM  15608  12.47     default_idle
PID  14388  11.50  SystemProcess_2
TID  14388  11.50   tid_0000
MOD  14388  11.50    vmlinux
SYM  14341  11.46     default_idle
PID  14074  11.24  SystemProcess_4
TID  14074  11.24   tid_0000
MOD  14074  11.24    vmlinux
SYM  14043  11.22     default_idle
PID  12806  10.23  SystemProcess_3
TID  12806  10.23   tid_0000
MOD  12806  10.23    vmlinux
SYM  12675  10.13     default_idle
PID  10935   8.74  SystemProcess_6
TID  10935   8.74   tid_0000
MOD  10935   8.74    vmlinux
SYM  10913   8.72     default_idle
PID  10462   8.36  SystemProcess_5
TID  10462   8.36   tid_0000
MOD  10462   8.36    vmlinux
SYM  10387   8.30     default_idle
PID   4710   3.76  java_6
TID   4668   3.73   tid_main
MOD   2075   1.66    vmlinux
SYM    404   0.32     n_tty_receive_buf
SYM    144   0.12     opost_block
SYM    141   0.11     sigprocmask
SYM    140   0.11     .text.lock.tty_io
SYM    136   0.11     tty_write
SYM    130   0.10     n_tty_receive_room
MOD   1767   1.41    JITCODE
SYM    621   0.50     hellop.main([Ljava/lang/String;)V
SYM    143   0.11     java/io/OutputStreamWriter.write([CII)V
MOD    268   0.21    /opt/IBMJava2-141/jre/bin/libhpi.so
MOD    236   0.19    /opt/IBMJava2-141/jre/bin/classic/libjvm.so
MOD    174   0.14    /lib/tls/libpthread-0.29.so
PID   4302   3.44  sshd_5
TID   4298   3.43   tid_02c5
MOD   1965   1.57    vmlinux
SYM    236   0.19     read_chan
SYM    125   0.10     n_tty_chars_in_buffer
MOD   1349   1.08    /lib/libcrypto.so.0.9.7a
SYM   1345   1.07     NoSymbols
MOD    478   0.38    /usr/sbin/sshd
SYM    472   0.38     NoSymbols
MOD    461   0.37    /lib/tls/libc-2.3.2.so
PID   2823   2.26  sshd_3
TID   2823   2.26   tid_02c5
MOD   1369   1.09    vmlinux
SYM    175   0.14     read_chan
MOD    844   0.67    /lib/libcrypto.so.0.9.7a
SYM    842   0.67     NoSymbols
MOD    302   0.24    /usr/sbin/sshd
SYM    298   0.24     NoSymbols
MOD    280   0.22    /lib/tls/libc-2.3.2.so
PID   1566   1.25  java_4
TID   1527   1.22   tid_main
MOD    734   0.59    vmlinux
SYM    151   0.12     n_tty_receive_buf
MOD    532   0.43    JITCODE
SYM    161   0.13     hellop.main([Ljava/lang/String;)V
PID   1257   1.00  java_2
TID   1222   0.98   tid_main
MOD    580   0.46    vmlinux
SYM    132   0.11     n_tty_receive_buf
MOD    434   0.35    JITCODE
SYM    126   0.10     hellop.main([Ljava/lang/String;)V
PID    881   0.70  java_5
TID    832   0.66   tid_main
MOD    410   0.33    vmlinux
MOD    299   0.24    JITCODE

================================
)) Module
================================
LAB    TKS    %%%   NAMES
MOD  116871  93.38  vmlinux
MOD    3032   2.42  JITCODE
MOD    2193   1.75  /lib/libcrypto.so.0.9.7a
MOD     782   0.62  /usr/sbin/sshd
MOD     743   0.59  /lib/tls/libc-2.3.2.so
MOD     443   0.35  /opt/IBMJava2-141/jre/bin/libhpi.so
MOD     441   0.35  /opt/IBMJava2-141/jre/bin/classic/libjvm.so
MOD     311   0.25  /lib/tls/libpthread-0.29.so
MOD     208   0.17  NoModule
MOD     133   0.11  /opt/IBMJava2-141/jre/bin/libjava.so

================================
)) Module_Symbol
================================
LAB    TKS    %%%   NAMES
MOD  116871  93.38  vmlinux
SYM  109178  87.23   default_idle
SYM     783   0.63   n_tty_receive_buf
SYM     413   0.33   .text.lock.tty_io
SYM     411   0.33   read_chan
SYM     378   0.30   sigprocmask
SYM     331   0.26   __wake_up
SYM     310   0.25   sysenter_past_esp
SYM     299   0.24   schedule
SYM     271   0.22   tty_write
SYM     256   0.20   opost_block
SYM     246   0.20   add_wait_queue
SYM     229   0.18   n_tty_receive_room
SYM     229   0.18   n_tty_chars_in_buffer
SYM     197   0.16   __copy_from_user_ll
SYM     191   0.15   sys_rt_sigprocmask
SYM     177   0.14   do_select
SYM     174   0.14   __copy_to_user_ll
SYM     170   0.14   remove_wait_queue
SYM     154   0.12   write_chan
SYM     153   0.12   pty_write
SYM     139   0.11   sys_select
SYM     125   0.10   kill_fasync
MOD    3032   2.42  JITCODE
SYM    1011   0.81   hellop.main([Ljava/lang/String;)V
SYM     254   0.20   java/io/OutputStreamWriter.write([CII)V
SYM     198   0.16   sun/nio/cs/StreamEncoder.flushBuffer()V
SYM     168   0.13   java/lang/String.<init>(Ljava/lang/String;I)V
SYM     168   0.13   java/io/PrintStream.write([BII)V
SYM     132   0.11   java/io/PrintStream.write(Ljava/lang/String;)V
MOD    2193   1.75  /lib/libcrypto.so.0.9.7a
SYM    2187   1.75   NoSymbols
MOD     782   0.62  /usr/sbin/sshd
SYM     772   0.62   NoSymbols
MOD     743   0.59  /lib/tls/libc-2.3.2.so
SYM     152   0.12   memset
SYM     125   0.10   __memchr
MOD     443   0.35  /opt/IBMJava2-141/jre/bin/libhpi.so
MOD     441   0.35  /opt/IBMJava2-141/jre/bin/classic/libjvm.so
SYM     141   0.11   jitCacheAlloc
MOD     311   0.25  /lib/tls/libpthread-0.29.so
MOD     208   0.17  NoModule
SYM     208   0.17   NoSymbols
MOD     133   0.11  /opt/IBMJava2-141/jre/bin/libjava.so

Offset Report
================================
)) Module_Symbol_Offset
================================
LAB    TKS    %%%   NAMES
MOD  116871  93.38  vmlinux
SYM  109178  87.23   default_idle
OFF  109174  87.23    0x2d
SYM     783   0.63   n_tty_receive_buf
OFF     440   0.35    0x157
SYM     413   0.33   .text.lock.tty_io
OFF     204   0.16    0xb1
SYM     411   0.33   read_chan
SYM     378   0.30   sigprocmask
OFF     321   0.26    0x77
SYM     331   0.26   __wake_up
OFF     326   0.26    0x37
SYM     310   0.25   sysenter_past_esp
OFF     127   0.10    0x3
SYM     299   0.24   schedule
OFF     277   0.22    0x22d
SYM     271   0.22   tty_write
SYM     256   0.20   opost_block
SYM     246   0.20   add_wait_queue
OFF     212   0.17    0x37
SYM     229   0.18   n_tty_receive_room
SYM     229   0.18   n_tty_chars_in_buffer
OFF     212   0.17    0x33
SYM     197   0.16   __copy_from_user_ll
SYM     191   0.15   sys_rt_sigprocmask
SYM     177   0.14   do_select
SYM     174   0.14   __copy_to_user_ll
SYM     170   0.14   remove_wait_queue
OFF     147   0.12    0x1d
SYM     154   0.12   write_chan
SYM     153   0.12   pty_write
SYM     139   0.11   sys_select
SYM     125   0.10   kill_fasync
MOD    3032   2.42  JITCODE
SYM    1011   0.81   hellop.main([Ljava/lang/String;)V
OFF     405   0.32    0x1e0
OFF     257   0.21    0x1e3
OFF     192   0.15    0x1e4
SYM     254   0.20   java/io/OutputStreamWriter.write([CII)V
SYM     198   0.16   sun/nio/cs/StreamEncoder.flushBuffer()V
SYM     168   0.13   java/lang/String.<init>(Ljava/lang/String;I)V
SYM     168   0.13   java/io/PrintStream.write([BII)V
SYM     132   0.11   java/io/PrintStream.write(Ljava/lang/String;)V
MOD    2193   1.75  /lib/libcrypto.so.0.9.7a
SYM    2187   1.75   NoSymbols
OFF     141   0.11    0x2bcf3
MOD     782   0.62  /usr/sbin/sshd
SYM     772   0.62   NoSymbols
MOD     743   0.59  /lib/tls/libc-2.3.2.so
SYM     152   0.12   memset
SYM     125   0.10   __memchr
MOD     443   0.35  /opt/IBMJava2-141/jre/bin/libhpi.so
MOD     441   0.35  /opt/IBMJava2-141/jre/bin/classic/libjvm.so
SYM     141   0.11   jitCacheAlloc
MOD     311   0.25  /lib/tls/libpthread-0.29.so
MOD     208   0.17  NoModule
SYM     208   0.17   NoSymbols
OFF     134   0.11    0xffffe410
MOD     133   0.11  /opt/IBMJava2-141/jre/bin/libjava.so

Java Lock Monitor

Java Lock Monitor (JLM) support is provided with version 1.4.0 of the IBM JDK. JLM provides monitor hold time accounting and contention statistics on monitors used in Java applications and the JVM itself. JLM provides support for the following counters. Counters associated with contended locks:
The following statistics are also collected on platforms that support 3-Tier Spin Locking (x86 SMP):
Garbage collection (GC) time is removed from hold times for all monitors held across a GC cycle. A monitor can be acquired either recursively, when the requesting thread already owns it, or nonrecursively, when the requesting thread does not already own it. Nonrecursive acquires can be further divided into fast and slow. A fast acquire occurs when the requested monitor is not already owned and the requesting thread gains ownership immediately; on platforms that implement 3-Tier Spin Locking, any monitor acquired while spinning is considered a fast acquire, regardless of the number of iterations in each tier. A slow acquire occurs when the requested monitor is already owned by another thread and the requesting thread is blocked. The JLM is controlled via a sockets interface, which allows it to be controlled from a remote site. An rtdriver program is provided to issue the proper commands: jlmstart starts the measurement, jlmdump writes the current results to a file, and jlmstop stops the measurement. You can issue the jlmdump command multiple times; each time, a new file is produced, allowing information to be collected at critical points in the application's execution. The following example shows a sample Java Lock Monitor report.
Descriptions of the report's fields are as follows:
Performance Inspector Executable Tools

The following tools, which are shipped with the Performance Inspector, provide support for the PI functions.

swtrace

swtrace is a software tracing mechanism that runs on Linux. It is normally run from a command prompt by issuing the swtrace command with the appropriate arguments. swtrace uses software trace hooks to collect data; trace hooks are identified by both a major code and a minor code. Trace data is collected into a trace buffer that is allocated when swtrace is initialized or turned on, and the size of the trace buffer can be set at initialization. The swtrace command allows the user to select which major codes are traced, when tracing starts, when tracing stops, when data is transferred from the trace buffer to disk, and how the trace data is formatted. The major parameters supported by swtrace are as follows:
Other parameters display information about the Performance Inspector and control the rate of profiling. The command file run.tprof is generated when PI is installed on the system. This command file contains all the steps necessary to take a profiling trace and produce a report.

post

post produces various reports based on the trace profiling data. When a trace profiling report is produced, it is written to a file called tprof.out. When you install the Performance Inspector, you identify the directory where this file will be saved. If you want to keep the current tprof.out file, you must rename it before running another run.tprof command. One option supported by the post command is -show, which creates a file called post.show, a readable dump of all the trace records. When all the trace hooks are enabled, post.show gives a detailed look at the sequence of events that occurred in the system, from execs to dispatches to interrupts.

pipcntr

pipcntr controls and displays the performance counters when PI is running on Intel IA32 platforms. This program can start and stop counters and display the contents of the counter registers and the counter control register settings. You can use this utility to start a performance counter and then run the per-thread time utility using that counter; the same holds true for the trace profiling function.

ptt

ptt starts and stops the per-thread time measurement. When starting, you can also specify which metric to use for the measurement.

pttstats

pttstats displays the per-thread time of every process in the system. On Intel IA32 systems, the measurement metric can be either the real-time clock or a performance monitor counter.