The best way to speed up a sluggish JVM program is to get a faster JVM implementation. This gives you a performance boost without risking the correctness of your program.
JVMs are available from a variety of sources and run on many different kinds of computers. Even on the same computer, one JVM implementation can run 20 times faster than another. These performance differences stem from a number of factors.
The most important performance improvement in JVM implementations is the just-in-time code generator, or JIT. A non-JIT JVM interprets each bytecode as the program runs. These JVMs are easy to write, but they are often slow, since the computer must decode each bytecode as well as perform the operation the bytecode calls for. A JIT works by translating JVM bytecodes into native machine code for the platform. The native code runs faster because no CPU cycles need to be spent on the JVM interpreter itself.
Most compilers for languages other than Java translate the program into native code during compilation. These system-dependent binary programs are distributed in native code format. This means that each binary program is limited to the target platform it was compiled for. A JIT translates programs into native code after the classes are loaded. Native code is not generated until it is needed (thus the name just-in-time). This means that the program can be optimized for precisely the conditions on the computer when the program is run. If the computer is modified (for example, more memory is added), the next time the program is run the JIT can automatically adjust the program to take advantage of the additional memory.
A JIT compiler can make many optimizations to the instructions as it is compiling them. It can take advantage of the processor it is running on by using instructions specific to that processor. It can order the instructions to take maximum advantage of the pipelining the processor provides. (Pipelining enables one processor to run several sequential instructions simultaneously, using different parts of the chip.)
A JIT can even make optimizations as the program is running. This is the key idea of the HotSpot™ JVM. HotSpot can identify which parts of the program are used most and spend special effort optimizing that code. This allows the system to make optimizations on the fly, something ordinary optimizers cannot do. HotSpot can even regenerate native code when new classes are loaded if these classes upset the assumptions that had been made during previous optimizations (for example, by introducing a subclass of a class that previously had no subclasses, which might make it impossible to inline certain methods). See section 14.3 for more about inlining.
As these and other optimizations are incorporated in JVM implementations, JVM programs run faster and faster. It is possible that in the future, Java programs will even run faster than the corresponding C++ programs, because the compiler is capable of seeing optimizations at runtime that a C++ programmer or compiler cannot at compile time.
14.1.1 Garbage Collection Performance
One big performance difference between JVM implementations comes from the garbage collector. The simplest garbage-collecting systems require all threads in the system to wait while all of the still-reachable objects are found. Smarter schemes allow better performance.
In the simplest garbage-collection schemes, the garbage collector starts with each reference in the local variable array and operand stack. It marks each object referenced in the local variable array or operand stack as live. (This applies to all the operand stacks and local variables on the Java stack, not just the currently active one.) Then the garbage collector checks each field within each live object. If this field points to another object, this object is also marked live.
This process is repeated until there are no more fields to check. Any objects that are not marked live are considered garbage. These objects are no longer accessible; if they were accessible, then there would have been some path that marked them as live. The memory for garbage objects can be reclaimed.
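The marking process just described can be sketched in a few lines. The `HeapObject` class and the root list below are hypothetical stand-ins for the JVM's internal structures, not a real JVM API; a real collector works on raw memory, not Java objects:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of a heap object: a live flag plus its reference fields.
class HeapObject {
    boolean live = false;
    final List<HeapObject> fields = new ArrayList<>(); // reference fields only
}

class MarkSweep {
    // The roots model every reference found in the local variable arrays
    // and operand stacks of all threads' Java stacks.
    static void mark(List<HeapObject> roots) {
        Deque<HeapObject> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            HeapObject obj = pending.pop();
            if (obj.live) continue;        // already marked; avoid cycles
            obj.live = true;               // mark the object live
            pending.addAll(obj.fields);    // then check each of its fields
        }
    }

    // Sweep: any object not marked live is garbage and is reclaimed.
    static List<HeapObject> sweep(List<HeapObject> heap) {
        List<HeapObject> survivors = new ArrayList<>();
        for (HeapObject obj : heap) {
            if (obj.live) {
                obj.live = false;          // reset the mark for the next collection
                survivors.add(obj);
            }
            // unmarked objects are simply dropped (their memory is reclaimed)
        }
        return survivors;
    }
}
```

Note that the worklist makes the repetition explicit: marking one object's fields may add more objects to check, and the loop ends only when no fields remain unchecked.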
If the threads were allowed to continue running during the marking process, one thread might move the last reference to a still-reachable object from a storage location that hasn't been checked yet to one that has, making a still-reachable object appear unreachable. That object would be reclaimed, since the garbage collector never saw a reference to it, and further references to the object would return incorrect results or even crash the system.
Figure 14.1 shows the results of a garbage collection operation. Any object that can be referenced from the operand stack or the local variable array is considered live. Objects that can be reached from other live objects, like object 3, are also live. Objects that aren't live are garbage, and may be collected; these are shown in gray. Garbage objects may still hold references to other objects, live or garbage, but they're still garbage.
Figure 14.1. Live and dead objects
This garbage collection technique is very effective and easy to implement, but it can drastically reduce performance: sometimes the system shuts down for long periods while garbage collection occurs. The system may try to do garbage collection when the CPU isn't otherwise occupied, but the longer garbage collection takes, the more likely it is that something will require the CPU's attention.
More sophisticated garbage collection techniques, such as generational garbage collection, work with subsets of the heap, allowing the garbage collection to run quickly. Generational garbage collection takes advantage of the assumption that an object that has been alive for a long time probably has a long lifespan and will continue to live for some time. Recently allocated objects are more likely to be temporary, which means they can be reclaimed soon.
Objects are grouped into generations based on when they were created. The newer generations are checked more frequently, since they are more likely to contain garbage, while older generations are checked less often.
Because the garbage collector looks at only a portion of the heap, it is easier to take advantage of short lulls in system load, and the system may be able to perform garbage collection without affecting performance at all.
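The generational policy can be sketched as follows. The two-generation split, the `PROMOTION_AGE` threshold, and the class names are illustrative assumptions for this sketch, not the internals of any real JVM:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical heap object: reachability as found by marking, plus an age.
class GenObject {
    boolean reachable;            // set by the marking phase (simplified here)
    int survivedCollections = 0;  // how many collections this object survived
    GenObject(boolean reachable) { this.reachable = reachable; }
}

class GenerationalHeap {
    static final int PROMOTION_AGE = 2;   // hypothetical tenuring threshold
    final List<GenObject> young = new ArrayList<>();
    final List<GenObject> old = new ArrayList<>();

    // A minor collection looks only at the young generation, so it is fast.
    // Dead young objects are reclaimed immediately; survivors that have
    // lived long enough are promoted to the old generation, which is
    // collected far less often.
    void minorCollection() {
        List<GenObject> survivors = new ArrayList<>();
        for (GenObject obj : young) {
            if (!obj.reachable) continue;              // reclaim dead young object
            if (++obj.survivedCollections >= PROMOTION_AGE) {
                old.add(obj);                          // promote long-lived object
            } else {
                survivors.add(obj);                    // keep in the young generation
            }
        }
        young.clear();
        young.addAll(survivors);
    }
}
```

The payoff is visible in the loop bounds: a minor collection touches only the young list, so its cost is proportional to the (small) young generation rather than to the whole heap.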
As time goes on, it may be necessary to move the objects in memory. There may be plenty of space left over from reclaiming small objects, but this space is fragmented. In Figure 14.1 it would be hard to find a place to put a large new object, since the live objects are scattered all over memory. If all the objects were packed together more tightly, there would be plenty of space. This packing is called compaction.
In compaction, objects are copied from one memory location to another. If the objects to be copied are large, this may take a lot of time. Different JVM implementations have different policies for when compaction occurs and which objects are to be moved. Better implementations try to pack long-lived objects together so that they don't have to be moved very often.
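A sliding compaction can be sketched like this. The heap is modeled as a list of slots where `null` means free space; real compaction must also track object sizes and fix up every affected pointer, bookkeeping this sketch only hints at with its forwarding map:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Compactor {
    // Slides every live object toward the front of the heap. Returns a
    // map from each object's old slot index to its new one, which a
    // collector would use to update the references to moved objects.
    static Map<Integer, Integer> compact(List<Object> heap) {
        Map<Integer, Integer> forwarding = new HashMap<>();
        int next = 0;                          // next free slot at the front
        for (int i = 0; i < heap.size(); i++) {
            Object obj = heap.get(i);
            if (obj == null) continue;         // free slot: skip it
            forwarding.put(i, next);           // record the object's new address
            heap.set(next, obj);               // copy the object downward
            if (next != i) heap.set(i, null);  // old slot becomes free
            next++;
        }
        // everything from 'next' onward is now one contiguous free region
        for (int i = next; i < heap.size(); i++) heap.set(i, null);
        return forwarding;
    }
}
```

The copying in `heap.set(next, obj)` is the expensive part for large objects, and the forwarding map is exactly the pointer-update problem discussed below for direct references.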
14.1.2 Implementing References
The Java virtual machine specification leaves the virtual machine implementor a lot of room for choosing techniques that perform quickly. Usually, these techniques involve a tradeoff. The implementor may be forced to choose between speed and memory. Many techniques make some programs slower and others faster.
For example, the JVM implementor may decide to represent references with a pointer to an object in memory. Another implementor may choose to use a pointer-to-a-pointer, also called a handle. Figure 14.2 shows the difference between the two techniques.
Figure 14.2. Two ways to implement references
If references are implemented as pointers, the objects can be hard to move in memory when compaction is done, since compaction entails updating all the pointers. This is done either by maintaining a list of all pointers to be updated or by sweeping through the whole memory and updating all pointers. Either way, it can be expensive, especially if there are many references.
If references are implemented as handles, moving an object is easy: the object can be moved, and only the handle needs to be updated. Since the handles are all the same size, it isn't necessary to compact the handle table. However, every time you want to use a reference, it must be dereferenced twice, once to find the location in the handle table and once to find the object itself.
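The handle technique can be sketched as a table of indirections. All of the names here are illustrative; a real JVM implements this in native memory, not with Java collections:

```java
import java.util.ArrayList;
import java.util.List;

class HandleTable {
    private final List<Object> objects = new ArrayList<>();  // simulated heap
    private final List<Integer> handles = new ArrayList<>(); // handle -> heap index

    // A reference is a handle: an index into the handle table. The
    // handle itself never changes for the lifetime of the object.
    int allocate(Object obj) {
        objects.add(obj);
        handles.add(objects.size() - 1);
        return handles.size() - 1;
    }

    // Using a reference costs two dereferences: one into the handle
    // table, and one to reach the object itself.
    Object get(int handle) {
        return objects.get(handles.get(handle));
    }

    // Moving an object (for example, during compaction) updates exactly
    // one handle table entry; no other reference in the program changes.
    void move(int handle) {
        int oldIndex = handles.get(handle);
        objects.add(objects.get(oldIndex));       // copy object to a new location
        objects.set(oldIndex, null);              // old slot is now free
        handles.set(handle, objects.size() - 1);  // only the handle is updated
    }
}
```

Contrast this with direct pointers, where `move` would have to find and rewrite every reference to the object.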
Performance can be improved by temporarily preventing an object from moving; then the pointer can be used directly, instead of the handle. This is known as pinning the handle. It saves time, since only one pointer link needs to be followed instead of two. However, pinning can cause problems as well. If a handle remains pinned for too long, memory gradually fills with immovable objects. Effective compaction becomes impossible, and memory becomes badly fragmented.