Section 3.8. Java Processes

3.8. Java Processes

The following sections should shed some light on what your Java applications are doing. Topics such as profiling and tracing are discussed.

3.8.1. Process Stack on a Java Virtual Machine: `pstack`

You can use the C++ stack unmangler with Java virtual machine (JVM) targets to show the stacks for Java applications. The c++filt utility is provided with the Sun Workshop compiler tools.

$ pstack 27494 |c++filt 27494:  /usr/bin/java -client -verbose:gc -Xbatch -Xss256k -XX:+AggressiveHeap -----------------  lwp# 1 / thread# 1  --------------------  ff3409b4 pollsys  (0, 0, ffbfe858, 0)  ff2dcec8 poll     (0, 0, 1d4c0, 10624c00, 0, 0) + 7c  fed316d4 int os_sleep(long long,int) (0, 1d4c0, 1, ff3, 372c0, 0) + 148  fed2f6e4 int os::sleep(Thread*,long long,int) (372c0, 0, 1d4c0, 7, 4, ff14f934) + 284  fedc21e0 JVM_Sleep (2, ff14dd24, 0, 1d4c0, ff1470dc, 372c0) + 260  f8c0bc20 * java/lang/Thread.sleep(J)V+0  f8c0bbc4 * java/lang/Thread.sleep(J)V+0  f8c05764 * spec/jbb/JBButil.SecondsToSleep(J)V+11 (line 740)  f8c05764 * spec/jbb/Company.displayResultTotals(ZZ)V+235 (line 651)  f8c05764 * spec/jbb/JBBmain.DoARun(Lspec/jbb/Company;SSII)V+197 (line 277)  f8c05764 * spec/jbb/JBBmain.DOIT(Lspec/jbb/infra/Factory/Container;)V+186 (line 732)  f8c05764 * spec/jbb/JBBmain.main([Ljava/lang/String;)V+1220 (line 1019)  f8c00218 * StubRoutines (1)  fecd9f00 void JavaCalls::call_helper(JavaValue*,methodHandle*,JavaCallArgu- ments*,Thread*) (1, 372c0, ffbff018, ffbfef50, ffbff01c, 0) + 5b8  fedb8e84 jni_CallStaticVoidMethod (ff14dd24, ff1470dc, 3788c, 372c0, 0, 37488) + 514  000123b4 main     (ff14a040, 576d1a, fed2a6d0, 2, 2, 1d8) + 1314  00011088 _start   (0, 0, 0, 0, 0, 0) + 108

3.8.2. JVM Profiling

While the JVM has long included the -Xrunhprof profiling flag, the Java 2 Platform, Standard Edition (J2SE) 5.0 and later use the JVMTI for heap and CPU profiling. Usage information is obtained with the java -Xrunhprof command. This profiling flag includes a variety of options and returns a lot of data. As a result, using a large number of options can significantly impact application performance.

To observe locks, use the command in the following example. Note that setting monitor=y specifies that locks should be observed. Setting msa=y turns on Solaris microstate accounting (see Section 3.2.2, and Section 2.10.3 in Solaris^™ Internals), and depth=8 sets the depth of the stack displayed.

# java -Xrunhprof:cpu=times,monitor=y,msa=y,depth=8,file=path_to_result_file app_name MONITOR DUMP BEGIN\     THREAD 200000, trace 302389, status: CW\     THREAD 200001, trace 300000, status: R\     THREAD 201044, trace 302505, status: R\ .....     MONITOR Ljava/lang/StringBuffer;\         owner: thread 200058, entry count: 1\         waiting to enter:\         waiting to be notified:\ MONITOR DUMP END\ MONITOR TIME BEGIN (total = 2442 ms) Sat Nov  5 11:51:04 2005\ rank   self  accum   count trace monitor\    1 64.51% 64.51%     364 302089 java.lang.Class (Java)\    2 20.99% 85.50%     294 302094 java.lang.Class (Java)\    3  9.94% 95.44%     128 302027 sun.misc.Launcher$AppClassLoader (Java)\    4  4.17% 99.61%     164 302122 sun.misc.Launcher$AppClassLoader (Java)\    5  0.30% 99.90%      46 302158 sun.misc.Launcher$AppClassLoader (Java)\    6  0.05% 99.95%      14 302163 sun.misc.Launcher$AppClassLoader (Java)\    7  0.03% 99.98%      10 302202 sun.misc.Launcher$AppClassLoader (Java)\    8  0.02% 100.00%       4 302311 sun.misc.Launcher$AppClassLoader (Java)\ MONITOR TIME END\

This command returns verbose data, including all the call stacks in the Java process. Note two sections at the bottom of the output: the MONITOR DUMP and MONITOR TIME sections. The MONITOR DUMP section is a complete snapshot of all the monitors and threads in the system. MONITOR TIME is a profile of monitor contention obtained by measuring the time spent by a thread waiting to enter a monitor. Entries in this record are ranked by the percentage of total monitor contention time and a brief description of the monitor.

In previous versions of the JVM, one option is to dump all the stacks on the running VM by sending a SIGQUIT (signal number 3) to the Java process with the kill command. This dumps the stacks for all VM threads to the standard error as shown below.

# kill -3 <pid> Full thread dump Java HotSpot(TM) Client VM (1.4.1_06-b01 mixed mode): "Signal Dispatcher" daemon prio=10 tid=0xba6a8 nid=0x7 waiting on condition [0..0] "Finalizer" daemon prio=8 tid=0xb48b8 nid=0x4 in Object.wait() [f2b7f000..f2b7fc24]         at java.lang.Object.wait(Native Method)         - waiting on <f2c00490> (a java.lang.ref.ReferenceQueue$Lock)         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111)         - locked <f2c00490> (a java.lang.ref.ReferenceQueue$Lock)         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127)         at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) "Reference Handler" daemon prio=10 tid=0xb2f88 nid=0x3 in Object.wait() [facff000..facffc24]         at java.lang.Object.wait(Native Method)         - waiting on <f2c00380> (a java.lang.ref.Reference$Lock)         at java.lang.Object.wait(Object.java:426)         at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:113)         - locked <f2c00380> (a java.lang.ref.Reference$Lock) "main" prio=5 tid=0x2c240 nid=0x1 runnable [ffbfe000..ffbfe5fc]         at testMain.doit2(testMain.java:12)         at testMain.main(testMain.java:64) "VM Thread" prio=5 tid=0xb1b30 nid=0x2 runnable "VM Periodic Task Thread" prio=10 tid=0xb9408 nid=0x5 runnable "Suspend Checker Thread" prio=10 tid=0xb9d58 nid=0x6 runnable

If the top of the stack for a number of threads terminates in a monitor call, this is the place to drill down and determine what resource is being contended. Sometimes removing a lock that protects a hot structure can require many architectural changes that are not possible. The lock might even be in a third-party library over which you have no control. In such cases, multiple instances of the application are probably the best way to achieve scaling.

3.8.3. Tuning Java Garbage Collection

Tuning garbage collection (GC) is one of the most important performance tasks for Java applications. To achieve acceptable response times, you will often have to tune GC. Doing that requires you to know the following:

Frequency of garbage collection events
Whether Young Generation or Full GC is used
Duration of the garbage collection
Amount of garbage generated

To obtain this data, add the -verbosegc, -XX:+PrintGCTimeStamps, and -XX:+PrintGCDetails flags to the regular JVM command line.

1953.954: [GC [PSYoungGen: 1413632K->37248K(1776640K)] 2782033K->1440033K(3316736K), 0.3666410 secs] 2018.424: [GC [PSYoungGen: 1477376K->37584K(1760640K)] 2880161K->1473633K(3300736K), 0.3825016 secs] 2018.806: [Full GC [PSYoungGen: 37584K->0K(1760640K)] [ParOldGen: 1436049K- >449978K(1540096K)] 147363 3K->449978K(3300736K) [PSPermGen: 4634K->4631K(16384K)], 5.3205801 secs] 2085.554: [GC [PSYoungGen: 1440128K->39968K(1808384K)] 1890106K->489946K(3348480K), 0.2442195 secs]

The preceding example indicates that at 2018 seconds a Young Generation GC cleaned 3.3 Gbytes and took .38 seconds to complete. This was quickly followed by a Full GC that took 5.3 seconds to complete.

On systems with many CPUs (or hardware threads), the increased throughput often generates significantly more garbage in the VM, and previous GC tuning may no longer be valid. Sometimes Full GC is generated where previously only Young Generation existed. Dump the GC details to a log file to confirm.

Avoid full GC whenever you can because it severely affects response time. Full GC is usually an indication that the Java heap is too small. Increase the heap size by using the -Xmx and -Xms options until Full GCs are no longer triggered. It is best to preallocate the heap by setting -Xmx and -Xms to the same value. For example, to set the Java heap to 3.5 Gbytes, add the -Xmx3550m, -Xms3550m, -Xmn2g, and -Xss128k options. The J2SE 1.5.0_06 release also introduced parallelism into the old GCs. Add the -XX:+UseParallelOldGC option to the standard JVM flags to enable this feature.

For Young Generation the number of parallel GC threads is the number of CPUs presented by the Solaris OS. On UltraSPARC T1 processor-based systems this equates to the number of threads. It may be necessary to scale back the number of threads involved in Young Generation GC to achieve response time constraints. To reduce the number of threads, you can set XX:ParallelGCThreads=number_of_threads.

A good starting point is to set the GC threads to the number of cores on the system. Putting it all together yields the following flags.

-Xmx3550m -Xms3550m -Xmn2g -Xss128k -XX:+UseParallelOldGC -XX:+UseParallelGC -XX:Paral- lelGCThreads=8 -XX:+PrintGCDetails -XX:+PrintGCTimestamps

Older versions of the Java virtual machine, such as 1.3, do not have parallel GC. This can be an issue on CMT processors because GC can stall the entire VM. Parallel GC is available from 1.4.2 onward, so this is a good starting point for Java applications on multiprocessor-based systems.

3.8.4. Using DTrace on Java Applications

The J2SE 6 (code-named Mustang) release introduces DTrace support within the Java HotSpot virtual machine. The providers and probes included in the Mustang release make it possible for DTrace to collect performance data for applications written in the Java programming language.

The Mustang release contains two built-in DTrace providers: hotspot and hotspot_jni. All probes published by these providers are user-level statically defined tracing (USDT) probes, accessed by the PID of the Java HotSpot virtual machine process.

The hotspot provider contains probes related to the following Java HotSpot virtual machine subsystems:

VM life cycle probes. For VM initialization and shutdown
Thread life cycle probes. For thread start and stop events
Class-loading probes. For class loading and unloading activity
Garbage collection probes. For systemwide garbage and memory pool collection
Method compilation probes. For indication of which methods are being compiled by which compiler
Monitor probes. For all wait and notification events, plus contended monitor entry and exit events
Application probes. For fine-grained examination of thread execution, method entry/method returns, and object allocation

All hotspot probes originate in the VM library (libjvm.so), and as such, are also provided from programs that embed the VM. The hotspot_jni provider contains probes related to the Java Native Interface (JNI), located at the entry and return points of all JNI methods. In addition, the DTrace jstack() action prints mixed-mode stack traces including both Java method and native function names.

As an example, the following D script (usestack.d) uses the DTrace jstack() action to print the stack trace.

#!/usr/sbin/dtrace -s BEGIN { this->cnt = 0; } syscall::pollsys:entry /pid == $1 && tid == 1/ {    this->cnt++;    printf("\n\tTID: %d", tid);    jstack(50); } syscall:::entry /this->cnt == 1/ {    exit(0); }

And the stack trace itself appears as follows.

# ./usejstack.d 1344 | c++filt CPU     ID                    FUNCTION:NAME  0    316                    pollsys:entry    TID: 1    libc.so.1`__pollsys+0xa    libc.so.1`poll+0x52    libjvm.so`int os_sleep(long long,int)+0xb4    libjvm.so`int os::sleep(Thread*,long long,int)+0x1ce    libjvm.so`JVM_Sleep+0x1bc    java/lang/Thread.sleep    dtest.method3    dtest.method2    dtest.method1    dtest.main    StubRoutines (1)    libjvm.so`void JavaCalls::call_helper(JavaValue*,methodHandle*,JavaCallArgu- ments*,Thread*)+0x1b5    libjvm.so`void os::os_exception_wrapper(void(*)(JavaValue*,methodHandle*,JavaCallAr- guments*,Thread*),JavaValue*,methodHandle*,Ja vaCallArguments*,Thread*)+0x18    libjvm.so`void JavaCalls::call(JavaValue*,methodHandle,JavaCallArgu- ments*,Thread*)+0x2d    libjvm.so`void jni_invoke_static(JNIEnv_*,JavaValue*,_jobject*,JNICallType,_ jmethodID*,JNI_ArgumentPush er*,Thread*)+0x214    libjvm.so`jni_CallStaticVoidMethod+0x244    java`main+0x642    StubRoutines (1)

The command line shows that the output from this script was piped to the c++filt utility, which demangles C++ mangled names, making the output easier to read. The DTrace header output shows that the CPU number is 0, the probe number is 316, the thread ID (TID) is 1, and the probe name is pollsys:entry, where pollsys is the name of the system call. The stack trace frames appear from top to bottom in the following order: two system call frames, three VM frames, five Java method frames, and VMframes in the remainder.

For further information on using DTrace with Java applications, see Section 10.3.

3.8. Java Processes

3.8.1. Process Stack on a Java Virtual Machine: pstack

3.8.2. JVM Profiling

3.8.3. Tuning Java Garbage Collection

3.8.4. Using DTrace on Java Applications

3.8.1. Process Stack on a Java Virtual Machine: `pstack`