The Garbage Collector | Practical Java Game Programming (Charles River Media Game Development)

Many developers who are not used to automatic memory management tend to be very skeptical about the performance of the garbage collector (GC). Memory management is an issue that has to be dealt with for just about any application. Many games come with custom memory managers that perform compaction to minimize fragmentation of the heap. In addition to performing memory compaction, the GC is responsible for allocating and reclaiming objects during runtime. This task is especially important because unlike C/C++, all Java objects are allocated from the heap. In Java, all objects are allocated using the new operator, which returns a reference to the allocated object. The only data that is allocated on the stack (in a frame) is primitive data types as well as object references. Nevertheless, even the frames are allocated on the heap.

Over the years, many improvements have been made to the GC. JDK 1.4.2 comes with four collection algorithms, each of which is designed for applications with different requirements. The one-size-fits-all approach to garbage collection does not work well for the wide variety of applications written in Java. For example, a Java program that runs on a server may not be concerned about pauses. On the other hand, even the slightest pauses in a real-time program such as a game can be noticeable. Fortunately, different algorithms can meet the needs of different applications. In addition to different algorithms, the behavior of the GC is highly tunable through different parameters.

The Default Collector

A garbage collector must determine which objects are garbage so that it can collect them. A simple approach would be to traverse the entire object graph. Any object that cannot be reached can be safely flagged as garbage. Though possible, this approach is extremely inefficient. Instead, the memory is divided into three partitions. Each partition corresponds to objects of a specific generation. The first partition stores young objects. Young objects are those that have recently been created. The second partition stores tenured objects. The third partition stores objects that are considered permanent, such as instances of java.lang.Class. A newly created object is part of the young generation. Young objects are assumed to die soon; however, if a young object survives through enough collection passes and shows that it was not used only for a brief computation, it is promoted to the tenure partition.

By separating objects that are likely to die from those that are likely to stay around for a while, the overall cost of garbage collection can be reduced. By using separate partitions, the GC will not have to traverse the entire object graph when it wants to reclaim dead objects. In addition, it can perform housekeeping on each partition at different intervals. Generally, when a partition becomes full, housekeeping is performed on it to reclaim dead objects and compact the live objects. The garbage collection of the young generation is referred to as a minor collection, and the collection of the tenured generation is referred to as a major collection. A minor collection is generally much faster than a major collection, primarily because the size of the young generation is typically much smaller than the tenured generation. In addition, because it is likely that most of the objects in the young generation are dead, its compaction requires even less time. The fewer objects survive, the less copying and updating has to occur. Note that when a minor collection is performed, not all dead objects can be collected because some of them may be referred to by dead objects in the tenured generation that have not been collected yet.

When dead objects are collected, the corresponding partition becomes fragmented. To defragment or compact the live objects in a partition, the objects in the partition have to be moved. Even though Java objects are moved during garbage collection, references to Java objects remain valid. That is, as far as a code segment is concerned, it should still be able to access the objects that were moved. To deal with this problem, the VM can represent references as handles. A handle is an identifier that can be used to retrieve a direct pointer to an object in the heap. By using a handle, when the location of an object is changed, the identifier that is used to retrieve the direct pointer can still be used and does not have to change. This approach was taken by older VMs, but newer VMs use handleless objects because accessing an object through a handle requires an extra level of indirection, which is more expensive than using direct pointers in a typical C/C++ program. Instead of referring to objects indirectly, newer VMs allow Java code to use the actual memory address of the objects. This approach significantly increases the time required to access an object. On the other hand, using actual pointers means that if an object has multiple references to it, when it is relocated by the GC, every single reference must be updated so that it stores the new memory address. Note that even though the VMs that use handles pay a greater cost to access an object, when they perform housekeeping, they have to update only a single item. Despite this fact, handleless objects have proven to allow for better overall performance.

The behavior of a collection algorithm can be described in terms of footprint, pauses, promptness, and throughput. Footprint and pauses are rather obvious from their names. Promptness refers to how promptly dead objects are reclaimed. Throughput is the amount of time that a running program does not spend on garbage collection. There are four collection algorithms available. The default and incremental collectors are beneficial for most applications that run on single CPU machines. The throughput and concurrent collectors are intended for multiprocessor machines. The default collector is often the best choice for games. It is highly tunable and worthy of fully understanding.

As mentioned earlier, a minor collection occurs when the young generation is full, and a major collection occurs when the tenured generation is full. The size of the heap and each of its partitions directly affects the behavior of the garbage collector. The size of the heap can be set using the -Xmx option. Even though a larger heap reduces the major collection intervals, you should note that the collection cost of a larger heap is higher. This means that even though the GC will not run as often, when it does perform housekeeping it has more work to do. The -Xms option can be used to set the initial size of the heap so that even with a large maximum heap size, the collection intervals can occur more often and, therefore, have smaller pauses. When the VM launches, the entire space is reserved, but if the -Xms is smaller than the -Xmx, only a portion of each partition is actually used. Setting both the initial size and the maximum size to the same value indicates that the heap should always be used in its entirety. If the initial size is smaller than the maximum size, at each collection, the size of the heap is changed so that a certain ratio of free space to used space is maintained. When the size of the heap is changed, it will not exceed the boundaries set by the -Xmx and -Xms options. The parameters -XX:MinHeapFreeRatio and -XX:MaxHeapFreeRatio are used by the VM to compute the appropriate size of the heap. The -XX:MinHeapFreeRatio parameter is used to grow the heap size to make sure that enough free space is available. Similarly, -XX:MaxHeapFreeRatio is used to shrink the heap size.

Note that changing the size of the heap means that the size of each partition can be affected. Because the size of the young generation is an important factor for just about every application, additional parameters are available for tweaking its size. The -XX:NewRatio allows you to declare the ratio of the heap that should be used for the young (or new) generation. So setting -XX:NewRatio=2 means there should be one unit allotted for the young generation for every two units allotted to the tenured generation. This parameter is used to compute the maximum size of the young generation. Because of the importance of the size of the young generation, the parameters -XX:NewSize and -XX:MaxNewSize can be used to bound its size. Similarly, -XX:PermSize and -XX:MaxPermSize can be used to bound the size of the permanent generation.

Other Collectors

The incremental garbage collector, also known as the train collector, is another collector that can be beneficial for a game because it is not necessarily intended for multiprocessor machines. The tenured generation is typically larger than the young generation and does not have many dead objects, so its housekeeping is far more expensive than the housekeeping of the young generation. The incremental collector tries to reduce pauses by breaking up the housekeeping of the tenured generation over multiple, smaller steps. At every minor collection, part of the tenured generation is collected as well. Because this algorithm only cleans part of the tenured generation, some complications arise. For example, the collector has to keep track of how much work is left. This means that the throughput is lower. In addition, because collecting only part of the tenured generation at a time can cause the overall collection of the tenured generation to take longer, the promptness of this collector cannot be as good as the promptness of the default collector. In other words, dead objects can stay around longer. Having a smaller young generation can help the collection of the tenured generation to occur more often for this specific collection algorithm. Using the incremental collector can also mean that the tenured generation is likely to become fragmented. To work around this issue, using a larger tenured generation can help reduce the fragmentation related concerns. In essence, the incremental collector reduces pauses at the cost of additional bookkeeping, as well as longer and more frequent minor collection. You should generally avoid using the incremental collector. If the default GC results in noticeable pauses and tweaking it does not rectify the problem, you can then consider the incremental collector. The command-line option -Xincgc can be specified to use the incremental collector.

The remaining two collectors are the throughput and concurrent collectors. They can pay off substantially on multiprocessor machines. The throughput collector is similar to the default collector but uses multiple threads to collect the young generation. This approach causes the minor collections to be very smooth. The overhead of multiple threads will be minuscule if the machine has more than one processor. On a single-processor machine, this collector is likely not to perform as well as the default collector.

The concurrent collector is also similar to the default collector but instead uses a separate thread that runs concurrent with the application threads and collects the tenured generation. The thread should be able to collect the tenured generation before it becomes full. If it does not, a full collection is forced. A full collection means that all the application threads will be paused and then resumed after the tenured generation is fully collected. Even though it is possible to have some improvement on a single-processor machine, the concurrent collector often results in poor performance on a single-processor machine. The command-line option -XX:+UseConcMarkSweepGC can be specified to use the concurrent collector.