Performance Tips

This section offers performance tips in two parts: general tips and Java-specific tips. You should keep them in mind while designing, implementing, and tuning your game. Some may seem more intuitive than others. Nevertheless, they have been overlooked in many professional games and applications.

It is also crucial to know that a good design is generally more important than optimized code in the long run. Keep in mind that optimizations require more development and testing time. They can also make the code harder to maintain. You should not always give up good design for better performance, or vice versa. You should consider each design and optimization decision on a case-by-case basis.

General Tips

Focus on the Right Problem

Every computation requires some amount of CPU time to complete. Some take a large portion of the overall execution time of a game, whereas others take a negligible amount. Just because a single instance of a computation requires many CPU cycles does not mean it is a performance bottleneck. Similarly, just because a single instance of a computation requires very few CPU cycles does not mean it cannot be a performance bottleneck. To find out whether a method or computation is a bottleneck, it is extremely helpful to know what percentage of the game’s execution time is spent performing it. Profile your game to find the bottlenecks so you can spend your time fixing the right problem and not waste your time on problems that are simply not important.

Some computations have a lot more room for optimization than others. For example, changing a computation so that it uses optimal data structures and algorithms that are more efficient can pay off far better than trying to optimize, say, the square root method. Once the algorithms and data structure are optimal, only if a computation is still a bottleneck should you consider desperate optimizations that can significantly reduce readability and increase maintenance cost.

Perceived Performance Is More Important Than Actual Performance

Keep in mind that in the end, your game is as fast as the user perceives it to be. The user judges only what he can detect. The user does not count CPU cycles to judge how fast a game runs. If some computations in the game execute much faster than can be detected, but others execute slow enough to be detected, most users will conclude that the game is slow. When putting the game together, you should be more concerned about the overall performance of the game, as opposed to the performance of, say, the collision-detection algorithms.

Take Advantage of the Strengths and Avoid the Weaknesses

Every design and framework has strengths and weaknesses. This is true of Java, your code, and existing APIs written for any language or platform. By understanding the strengths and weaknesses of a framework, you can take advantage of its strengths and, more important, avoid its weaknesses. If a task that must be performed often exploits a weakness of a framework, you should try to find a workaround. Even if there is no obvious workaround, it is much better to know that a task is exploiting a weakness of the framework than to do so blindly.

Don’t Assume All Optimizations Improve Performance

Optimizations are like investments. The same way an investment may not be worth making, an optimization may not be worth performing. Just like an investment, every optimization has a cost associated with it that must be paid. There is generally a crossing point where an investment starts to pay off. If that point is not reached, the investment will be a loss. Similarly, an optimization can in fact slow down your game. This is especially true for optimizations that are performed during runtime because CPU cycles are much more valuable during runtime than when the game code and content are being compiled.

The reason why the HotSpot VMs have two configurations is that they want to avoid optimizations that are likely to be losses. The client configuration does not perform optimizations that are simply not worth making because their overhead is too much to justify. On the other hand, the server configuration assumes that an application will run for a while. Therefore, it performs additional sophisticated optimizations that require more CPU cycles to perform but are still likely to pay off.

As another example, consider the following question: is it better to have an unsorted list and perform an exhaustive search when an element needs to be removed, or is it better to pay the cost of keeping the list sorted so that remove operations can be performed efficiently? The answer is not that simple. If there are many insertions, only a few removes, and the list is small, it is actually better to have an unsorted list and perform an exhaustive search for every remove operation.
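Under those assumptions, the unsorted approach can be sketched as follows. This is an illustrative fragment, not code from a particular engine; removal is a linear search, but because order does not matter, the hole can be filled with the last element in constant time:

```java
import java.util.List;

// Sketch: removing from an unsorted list by exhaustive search.
// For small lists with many insertions and few removals, this
// often beats the bookkeeping cost of keeping the list sorted.
public class UnsortedRemove {
    public static boolean remove(List<Integer> list, int value) {
        for (int i = 0; i < list.size(); i++) {      // exhaustive search
            if (list.get(i) == value) {
                // Swap in the last element; order is irrelevant here,
                // so no elements need to be shifted.
                list.set(i, list.get(list.size() - 1));
                list.remove(list.size() - 1);
                return true;
            }
        }
        return false;
    }
}
```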

Another example is visibility culling. A form of visibility testing known as occlusion culling is performed to remove geometries that are not visible because they are behind other geometries. Detecting geometries that fall behind others is an expensive task, but if on average it culls a significant chunk of the scene, it can be an investment well worth making.

To make sure optimizations pay off, you need to benchmark your code. It is also important to test your code with data that represents scenarios common in the game. This is one of the reasons why it is good to use macro-benchmarks in addition to micro-benchmarks.

Avoid Redundant and Unnecessary Computations

You can greatly improve the performance of your game by finding ways to perform a computation fewer times. Halving the number of times a computation is performed is even better than optimizing the computation by 100 percent or doubling its speed. You should be especially cautious about redundant computations in critical loops.

Sometimes, you can guarantee the result of a computation has not changed. In such situations, it is often beneficial to save the result of the computation and reuse it. For example, when a vehicle moves around on a terrain, its orientation has to be updated so that it is aligned to the terrain. If the vehicle has not moved since the last time it was rendered, there is no need to recompute its orientation. Even simpler computations such as recomputing the length of a vector multiple times can cause performance problems.
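The vehicle example can be sketched with a dirty flag. The names here (Vehicle, alignToTerrain) are illustrative placeholders, and the static counter exists only to make the caching visible:

```java
// Sketch of a dirty-flag cache: the vehicle's terrain alignment is
// recomputed only when its position has changed since the last query.
public class Vehicle {
    private float x, z;
    private float cachedOrientation;
    private boolean dirty = true;
    static int recomputeCount = 0;   // for demonstration only

    public void move(float dx, float dz) {
        x += dx; z += dz;
        dirty = true;                // position changed; cache is stale
    }

    public float getOrientation() {
        if (dirty) {                 // recompute only when necessary
            cachedOrientation = alignToTerrain(x, z);
            dirty = false;
        }
        return cachedOrientation;
    }

    private float alignToTerrain(float x, float z) {
        recomputeCount++;
        return x * 0.1f + z * 0.2f;  // placeholder for the real computation
    }
}
```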

Other times, even if there is no guarantee that the result of a computation has not changed, you may be able to use slightly outdated data without any observable impact. For example, a non-player character (NPC) may be able to use some perceptual information that is tens of frames old without the player detecting it.

There are also scenarios where not ignoring redundant data can cause significant performance problems. Such problems are common in event-based applications that queue up events to make sure every single one is handled. For example, if a scrollbar forwards 15 scroll messages to a panel that has an expensive paint method, if the panel tries to process each event one at a time, a substantial delay will result from the time the user drags the scrollbar until the panel catches up. Similar problems can arise if an NPC receives multiple events about seeing the same enemy, or a client game that receives multiple network packets notifying about similar events. In such circumstances, it can be advantageous to ignore some of the older events when newer ones are received.
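Coalescing can be sketched as follows, assuming only the most recent scroll position matters; posting a new event discards any stale ones still queued:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of event coalescing: when a newer scroll event arrives, older
// unprocessed ones are dropped, so an expensive repaint runs once
// instead of once per stale event.
public class CoalescingQueue {
    private final Deque<Integer> scrollEvents = new ArrayDeque<>();

    public void post(int scrollPosition) {
        scrollEvents.clear();            // discard stale positions
        scrollEvents.add(scrollPosition);
    }

    public Integer poll() {
        return scrollEvents.poll();      // null if nothing pending
    }
}
```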

Precompute Expensive or Common Calculations

Precomputation is a technique that is heavily used in numerous games. Many computations can be performed before a game is even shipped. The results can be stored in a file, which can be loaded and used during runtime. Precomputation is essentially a tradeoff between memory consumption and CPU utilization during runtime.

For example, many games precompute the best path between every interesting location in a level and store them in a table. Some games precompute the best path between every pair of triangles that represents a walkable region in an entire level. By doing so, when the NPCs want to move around the level, they can simply look up a precomputed path based on the triangle they are on and the triangle they want to reach. As you can probably imagine, a table that stores the best path between every two triangles in a level can require a significant amount of memory during runtime. On the other hand, if multiple NPCs want to move around in a level, they do not have to perform individual searches every time they pick a new destination. Even the faster path-planning algorithms such as A* (A-Star) can be resource intensive when performed during runtime.
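Such a lookup can be sketched with a "next hop" table, assuming the table itself was filled offline (for example, by the Floyd-Warshall algorithm):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: an all-pairs "next hop" table precomputed before shipping.
// At runtime an NPC reconstructs a best path with simple table lookups
// instead of running a search per request.
public class PathTable {
    // next[i][j] = the node to step to from i when heading for j.
    private final int[][] next;

    public PathTable(int[][] next) { this.next = next; }

    public List<Integer> path(int from, int to) {
        List<Integer> p = new ArrayList<>();
        p.add(from);
        while (from != to) {
            from = next[from][to];   // one lookup per step, no search
            p.add(from);
        }
        return p;
    }
}
```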

As another example, many games use binary space-partitioning (BSP) trees to divide a scene into volumes. When the level is being exported, an algorithm determines which other volumes can be visible from any point within a given volume. The visibility results are then stored in each volume. By doing so, the renderer can use the precomputed visibility list to determine which other volumes do not have to be rendered when the camera is in a specific volume.

Compress and Decompress Data During Runtime

Almost every game developed for a console desperately needs more memory than it has available to it. Because even modern-day consoles have as little as 32MB of RAM to store all code, data, and intermediate data, memory is very valuable. Because of this, many console games compress and decompress data in memory to make more room.

This technique can be used on PCs as well. Maintaining compressed data is particularly popular when a significant amount of precomputed data must be managed during runtime. To free up some memory, you can also write out large chunks of temporary data to the hard drive. To keep data more accessible, you may choose to use memory-mapped files. Memory-mapped files can take advantage of the underlying operating system’s virtual memory manager. Because the OS is managing the data, you can typically get better performance than if you were to directly read and write from a file. Memory-mapped files have been added to Java as part of the new IO (NIO) package and are discussed in Chapter 5, “Java IO and NIO.” Alternatively, you can write a few native methods to gain direct access to the virtual memory manager functions. Please refer to Chapter 11, “Java Native Interface,” for more information on writing native methods.

Cull Expensive Computations and Try to Use Multiple Levels of Detail

Culling and multiple levels of detail (LODs) are typically thought of as techniques specific to geometry rendering. Nevertheless, the same concepts can be used for any expensive computation. Geometry culling is the process of eliminating geometry that does not have to be rendered. For example, geometry that is not in front of the camera should not be rendered. Culling can also be performed on sound sources or NPCs that are not close enough to the player, so that their corresponding computations can be eliminated.

Geometry can have multiple LODs, so that when the camera is far away, a version of the geometry that has less detail can be rendered. This is because objects that are in the distance are less noticeable because they take up less space on the screen. Hence, rendering a simpler version can be undetectable by the user. LODs can be used for other computations, such as the physics of a flying airplane. If an opponent plane is far away, simpler flight models can be used.
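Distance-based LOD selection can be sketched as follows; the thresholds are made-up values, and squared distances are compared so that no square root is needed:

```java
// Sketch of distance-based LOD selection: the index of the model (or
// flight model) to use is chosen from the squared distance to the
// camera. Threshold values here are illustrative only.
public class LodSelector {
    private static final float[] THRESHOLDS_SQ = { 100f, 2500f };

    public static int selectLod(float dx, float dy, float dz) {
        float distSq = dx * dx + dy * dy + dz * dz;   // no sqrt needed
        for (int i = 0; i < THRESHOLDS_SQ.length; i++) {
            if (distSq < THRESHOLDS_SQ[i]) return i;  // 0 = most detailed
        }
        return THRESHOLDS_SQ.length;                  // coarsest version
    }
}
```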

Anticipate Upcoming Computations

When the likelihood of an action is relatively high, some of the computations corresponding to the action can be performed in the background. By anticipating upcoming computations, some computations can be performed ahead of time to lessen the sudden need to perform expensive computations in a short period of time.

For example, when the player is reaching the end of a level, the next level can start to be processed because it is likely that the player will leave the current level soon. Anticipation is fundamental for continuous worlds where there are no clear breaks between levels. Even when there are clear separations between levels, many games try to lessen the transition by preloading and preinitializing necessary data. To buy some extra time, many games have levels that are linked by rather empty hallways or even additional cut scenes.

When the game is on the main menu, you can start performing some of the inevitable computations. Similarly, when a game has been launched and an active campaign is in progress, the player is far more likely to continue the campaign than to exit the game.

Approximate Instead of Computing Exact Values

Approximation is a technique that is used by many computations. Many times, by simply approximating some data, a reasonable amount of CPU cycles can be saved. In fact, many tasks performed in games would be computationally impractical to perform accurately.

Consider the task of determining which opponents are visible to an NPC. How expensive do you think it is to precisely determine which opponents are visible? Well, to precisely conclude whether an NPC can see the player or another NPC, the entire scene must be rendered from the perspective of the NPC, and then, if any of the pixels of the opponent end up in the final image, the NPC has the potential of seeing that opponent. This is, of course, not done in actual games. Instead, a single ray is cast from the eye of the NPC to, say, the head of another NPC. If nothing intersects with the ray, the NPC can potentially see the opponent. If only a single ray is tested and a single branch of a tree happens to intersect it, the NPC is considered unable to see the opponent, even if 99 percent of the opponent’s body is visible. To be more realistic, some games have a target point on each limb, which is tested in the same fashion.

Let’s say that you want to simulate an airplane or vehicle behavior. Do you need to know the exact surface area under the plane to compute the lift? Do you have to dynamically compute the exact area of a tire’s contact patches to compute the necessary forces? Even if you have the most precise data available to you, if the computations you perform on the precise data are not exact, you will not get precise results. Furthermore, performing precise computations can prove to be far too expensive to justify.

Many games have used square root and sine tables that store precomputed results for angles at certain intervals. When the sine of an angle needs to be computed, the angle is clamped to a close enough value for which the result has been precomputed. Such techniques approximate the output value. There are also techniques to approximate distance. For example, in some scenarios, it may be sufficient to use the average of delta x and delta y.
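A sine table of this kind can be sketched as follows. The table size of 1024 is an arbitrary tradeoff between memory and accuracy:

```java
// Sketch of a sine lookup table: sines are precomputed at fixed
// intervals, and an angle is clamped to the nearest precomputed entry.
public class SineTable {
    private static final int SIZE = 1024;            // power of two
    private static final float[] TABLE = new float[SIZE];
    static {
        for (int i = 0; i < SIZE; i++) {
            TABLE[i] = (float) Math.sin(2.0 * Math.PI * i / SIZE);
        }
    }

    // angle in radians; wrapped into [0, 2*pi) and mapped to an index
    public static float sin(float angle) {
        double t = angle / (2.0 * Math.PI);
        t -= Math.floor(t);                          // wrap into [0, 1)
        return TABLE[(int) (t * SIZE) & (SIZE - 1)];
    }
}
```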

Do Not Make Your Data More Detailed Than It Needs to Be

The quality of data on which some computation occurs is directly related to both CPU and memory consumption. The more detailed the data, the more disk space is required and the more RAM is needed to store the data in memory. In addition, higher resolution data means that more processing is required to use the data. This is true for data such as geometry, sound, and textures, or even data that is used to represent the world for the NPCs.

There are tools that can sample textures and generate equivalent textures that have substantially lower color depth without any detectable impact. In fact, it is possible to have 4-bit textures (only 16 colors) that are as vibrant as a 32-bit texture. A good algorithm can look at a source image and select the 16 most important colors that represent the image. If you want to have a rich world that uses only 4-bit textures, you must separate textures into distinct categories such as sky, trees, and ground.

Even when detailed data exists, it can be used for one computation while a lower-resolution copy is used for another. For example, the world used by the NPCs to perform visibility tests can have much lower detail than the representation used for rendering the level. Most games do not use the actual triangles and polygons of the level for collision purposes. Some games use simpler representations such as spheres and boxes. Many games use very low-resolution alternate geometry for collision detection. The alternate representation of the world is created to roughly estimate the actual geometry of the world.

Abstract Data into a Hierarchy

If you must perform computations on a substantial amount of data, you can group small segments together to generate an abstract, lower-resolution representation. If data is represented as a hierarchy, the low-resolution representation can be used to determine which part, if any, of the high-resolution representation should be the focus of the computation.

For example, eight spheres can be used to approximate the limbs of a humanoid character. Each of the eight spheres can point to the actual geometry of the corresponding limb. Another sphere that is large enough to include all the eight spheres can be used to represent an abstract representation of the entire character. This hierarchy, which has three levels, can be used to efficiently compute the point of collision of a bullet. A quick check against the large sphere can determine whether the character has been hit. If the character has been hit, additional checks against the spheres that represent the limbs can determine which limb was hit. If a game needs to know the exact point of collision, the geometry pointed to by the sphere that was hit can be tested to find the exact point of collision. By abstracting the data and generating lower-resolution representations, we avoid having to always perform collision detection against the high-resolution data (that is, the actual triangles of the character).
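The three-level test can be sketched as follows, with the final triangle-level check omitted for brevity:

```java
// Sketch of hierarchical collision testing: a cheap check against a
// bounding sphere rejects most bullets before any per-limb spheres
// (or the actual triangles) are examined.
public class SphereHierarchy {
    public static class Sphere {
        final float x, y, z, r;
        public Sphere(float x, float y, float z, float r) {
            this.x = x; this.y = y; this.z = z; this.r = r;
        }
        public boolean contains(float px, float py, float pz) {
            float dx = px - x, dy = py - y, dz = pz - z;
            return dx * dx + dy * dy + dz * dz <= r * r;  // no sqrt needed
        }
    }

    // Returns the index of the limb sphere hit, or -1 for no hit.
    public static int hitLimb(Sphere bound, Sphere[] limbs,
                              float px, float py, float pz) {
        if (!bound.contains(px, py, pz)) return -1;   // quick rejection
        for (int i = 0; i < limbs.length; i++) {
            if (limbs[i].contains(px, py, pz)) return i;
        }
        return -1;   // inside the bound but between the limbs
    }
}
```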

This concept is applicable to other types of computation as well. For example, if an NPC has to plan its way through an excessively large world, he can use an abstracted or low-resolution version of the level to put together a rough plan and then try to work out the details of his plan using the high-resolution version. This is known as hierarchical path planning. Hierarchical representations are actually much more humanlike than always using fully detailed data.

Be Aware of the Upper Bound of Your Algorithms

Know how much a computation can cost you in the worst-case scenario. If there is a substantial difference between the best-case and worst-case scenarios, it is crucial to know how often the worst-case scenarios can occur. It is not a tragedy for a computation to run faster than necessary, but running too slowly can be a tragedy.

For example, heuristic searches such as A* (A-Star) typically run in a very reasonable amount of time. In the worst case, however, they exhaust the entire search space, making them as bad as brute-force algorithms such as breadth-first search and Dijkstra’s algorithm. This means that if worst-case scenarios ever occur, the amount of memory and CPU consumed by the algorithm is extremely high. Therefore, every attempt has to be made either to make sure that searches that exhaust the search space never occur, or to detect such scenarios and terminate the search early.
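One way to guard against the worst case is an explicit node budget. The sketch below uses breadth-first search for brevity, but the same cap applies to heuristic searches:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of bounding a search's worst case: a breadth-first search
// that gives up after a fixed node budget rather than exhausting the
// entire space. The caller can then fall back or replan.
public class BoundedSearch {
    // adjacency[i] lists the neighbors of node i.
    public static boolean reachable(int[][] adjacency, int start, int goal,
                                    int maxExpansions) {
        boolean[] seen = new boolean[adjacency.length];
        Deque<Integer> open = new ArrayDeque<>();
        open.add(start);
        seen[start] = true;
        int expanded = 0;
        while (!open.isEmpty()) {
            if (++expanded > maxExpansions) return false; // budget hit: bail out
            int n = open.poll();
            if (n == goal) return true;
            for (int m : adjacency[n]) {
                if (!seen[m]) { seen[m] = true; open.add(m); }
            }
        }
        return false;   // space exhausted without reaching the goal
    }
}
```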

As another example, consider a rendering system that uses a form of hierarchy such as a quad tree to determine which parts of the scene do not have to be rendered. Just because you can get good performance most of the time does not mean that the game will run well. You have to be aware of how bad the performance can get if the player goes anywhere in the level and looks in any direction. As with the last example, you can try to make sure that extremely bad scenarios can never occur by tweaking the level. Unlike the last example, it is not practical to terminate the rendering of the remaining geometry because the result would be immediately detectable by the player.

Specialized Code Performs Better Than General-Purpose Code

An implementation that has been tailored for a specific task is generally more efficient than its equivalent generic implementation. This advantage comes with higher development and maintenance costs, however. In addition, specialized implementations are usable for fewer problems. This is because specialized code can use problem-specific knowledge to take advantage of every fact and make specific assumptions.

For example, a collision-detection algorithm designed to detect collisions between axis-aligned boxes will perform significantly better than a system that must handle collisions between arbitrary geometry. As another example, a handcrafted linked list can be far more efficient than the generic implementation available in the Java collections framework or the C++ standard template library. Specific comments are made about the Java collections framework in a dedicated performance tip.

Laziness Can Be Good When It Comes to Computation

When appropriate, you may want to put off expensive computations as long as possible to alleviate sudden computation spikes. Lazy loading, initialization, and computation can help improve the performance of your game by making it appear smoother.
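Lazy initialization can be sketched as follows; decode is a stand-in for whatever expensive work is being deferred, and the static counter exists only to make the laziness visible:

```java
// Sketch of lazy initialization: an expensive resource (say, a decoded
// sound effect) is created on first use rather than at startup.
public class LazySound {
    private byte[] decoded;          // null until first needed
    static int decodeCount = 0;      // for demonstration only

    public byte[] getSamples() {
        if (decoded == null) {       // pay the cost only when required
            decoded = decode();
        }
        return decoded;
    }

    private byte[] decode() {
        decodeCount++;
        return new byte[16];         // placeholder for real decoding
    }
}
```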

Use Threads with Care

Consider the following question: is it better for a thread to sit and wait or always run as fast as it possibly can? As you might have guessed, it entirely depends on the situation. Some threads should not be blocked so they can constantly perform critical computations. On the other hand, some threads must be blocked often so they do not consume CPU cycles that can be used by other threads with higher-priority tasks.

For example, slight pauses in the thread responsible for rendering can be very undesirable. If the render thread is used to load an opponent model when a new player joins the game, an unwelcome pause will occur. On the other hand, a thread that is supposed to collect network packets or deliver game events should not constantly consume CPU cycles if it doesn’t have anything to process. If no events or packets are available, the thread should be blocked until new data is available.
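The blocking behavior can be sketched with wait and notify; the consumer thread is suspended by the VM while the queue is empty, so it consumes no CPU. The "quit" sentinel is an illustrative convention, not part of any real API:

```java
import java.util.LinkedList;

// Sketch of a packet-collecting thread that blocks until data is
// available instead of spinning in a tight loop.
public class PacketConsumer implements Runnable {
    private final LinkedList<String> queue = new LinkedList<>();
    volatile String lastPacket;

    public synchronized void post(String packet) {
        queue.add(packet);
        notify();                        // wake the consumer thread
    }

    public void run() {
        try {
            while (true) {
                String packet;
                synchronized (this) {
                    while (queue.isEmpty()) {
                        wait();          // blocks; no busy-waiting
                    }
                    packet = queue.removeFirst();
                }
                lastPacket = packet;     // hand off to the game
                if (packet.equals("quit")) return;
            }
        } catch (InterruptedException e) {
            // interrupted; exit the thread
        }
    }
}
```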

As another example, consider a controller device in streaming mode. If a separate thread is responsible for retrieving the data, the thread may end up wasting a lot of CPU time. If the thread is simply in a loop that reads any available packets and copies them into a structure that can be accessed by the game thread, the thread may end up going back and forth to retrieve the data at a much higher rate than necessary. In this scenario, many CPU cycles are wasted that could have been put to better use by other threads.

As a side note, many console games do not use multiple operating-system-level threads. Instead, they emulate threads at the game level. Tasks are added to a scheduler’s list, which then gives each task a chance to run. This approach is sometimes even preferred on PCs because it has low overhead and tasks can be scheduled to start at a specific time in the future. However, if such a scheduler is implemented with a single OS-level thread, the scheduler must have faith in each task and assume that each of them will either complete immediately or return without completing and indicate that it needs to be rescheduled. If any of the scheduled tasks forget to return promptly, the scheduler will not have a clue until the task returns. In fact, if a task never returns, the entire scheduler will halt.
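Such a game-level scheduler can be sketched in a few lines; each task must return promptly and report whether it wants to run again:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a cooperative task scheduler on a single thread: each task
// runs a slice of work per tick and reports whether it should be
// rescheduled. A task that does not return promptly stalls all others.
public class Scheduler {
    public interface Task {
        boolean run();                    // true = reschedule me
    }

    private final Deque<Task> tasks = new ArrayDeque<>();

    public void add(Task t) { tasks.add(t); }

    public void tick() {                  // called once per frame
        int n = tasks.size();
        for (int i = 0; i < n; i++) {
            Task t = tasks.poll();
            if (t.run()) tasks.add(t);    // cooperative: must return fast
        }
    }
}
```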

Java-Specific Tips

Better Bytecodes Mean Better Execution Performance

In general, more optimal bytecodes can make your application run faster. This is true for both interpreted bytecodes and bytecodes that have been compiled to native CPU instructions. If you use a VM implementation that is simple and does not have a compiler that can compile the bytecodes to native code, you should use a third-party optimizing compiler to get better performance. Sun’s static compiler does not perform any real optimizations. It performs only the basic optimizations defined by the Java language specification. The rest of the optimizations are left for the HotSpot compiler.

Methods that have four or fewer parameters have less overhead because the VM has special instructions for them. For example, iload_0, iload_1, iload_2, and iload_3 are special opcodes that load an integer from a local variable onto the operand stack. The first local variable of an instance method stores the this pointer. The rest are the method parameters and other local variables defined in the method. In addition, the VM has opcodes that refer to constants –1, 0, 1, 2, 3, 4, and 5. For example, the opcode iconst_1 pushes the integer 1 onto the stack without having to extract the operand from a bytecode.

Because the VM is limited to 256 opcodes, the instructions are not orthogonal. There is not a one-to-one mapping between integer opcodes and byte opcodes. The compiler treats bytes, chars, and shorts as integers and then narrows the result using appropriate cast opcodes. In fact, int is the VM’s favorite type.

Statically Bound Methods Are Faster than Virtual Methods

When possible, use private or static methods because, being statically bound, they are invoked faster. As explained earlier in the chapter, methods that are statically bound can be invoked directly and do not need to be resolved during runtime. Dynamically bound methods are those that are invoked using the invokevirtual opcode. It is important to know that using the keyword final does not make a method statically bound. In fact, final methods are dynamically bound and are still invoked with the invokevirtual opcode. You should decide whether a method should be final based only on its purpose in the application.

Promote Inlining

Even though it is hard to absolutely control which methods are inlined by HotSpot, some guidelines can increase the likelihood that a method is inlined. Generally, it makes more sense to inline smaller methods because the overhead of calling them can be significant when compared to the actual work they do. In addition, inlining larger methods means that the code can become too bloated. Keep in mind that when a method is inlined, the body of the method is inserted at the call site. If multiple invocations of a single method are inlined, the body of the method is inserted in multiple locations.

Short and simple methods that have no dependencies are ideal candidates for inlining. Note that if a method does not call any other methods and strictly relies on local variables, final members, or static members, it can be copied and pasted by the compiler without any worries. Local variables can be declared and used where the method is inlined. The value of a final variable can simply be copied because it does not change. Finally, static member variables can be resolved without ambiguity because they are statically bound.

In addition, only methods that are statically bound can be inlined. Static and private methods are bound at compile time. Other methods must be devirtualized during runtime before they can be inlined.

Be Careful—Native Methods Are Everywhere

Invoking a native method is more expensive than invoking a Java method. In addition, native methods cannot be inlined, which can lead to additional performance costs. Try to have an idea of which methods are native or call native methods indirectly. It is also good to have an idea of what the underlying native method may be doing. You can always look in the corresponding source files if they are available. Note that the JDK does come with most of the Java source. The VM and all its native files are also available for download. In addition, using options such as -verbose:jni can help you monitor some JNI activities.

Some native methods use static buffers, and others dynamically allocate memory when they are invoked. If you are calling a native method that dynamically allocates memory, you should make every attempt to reduce the number of times it is invoked. For example, if you write an array greater than 8k to a file, the current implementation of the underlying native function allocates a large enough buffer, copies the data to the buffer, writes the content of the buffer to a file, and then frees the buffer. If you call the method many times, a substantial amount of extra work will be done by the OS, which can cause noticeable performance problems.

Note that some objects use native structures that can be memory intensive and expensive to initialize. You should use such objects with a lot of care. Objects that use native structures tend to have a finalizer to release their native resources when they are collected. Finalizers delay the collection of objects, and there is no guarantee as to how long an object must wait before its finalizer is invoked. Also, keep in mind that it is hard to estimate the amount of native resources used by such objects because the VM does not have the slightest clue about them.

Using native methods, of course, has advantages and is fundamental to Java. However, when a native method is called, the application temporarily gives up many of the guarantees made by Java and the VM. For example, a native library can leak memory or corrupt the memory that is used by the VM and cause it to crash. Chapter 11, “Java Native Interface,” discusses how to add native code to your game and covers the implications of using native code in an application.

String Manipulation Is More Expensive than You May Expect

Use strings carefully. Strings are immutable objects, meaning that once they are created they cannot be changed. If you need to manipulate strings, you should use StringBuffer objects. In fact, when you use the + or += operators to concatenate strings, the compiler inserts appropriate code to create StringBuffer objects so that it can append the strings together. Consider the following methods that concatenate a few strings:

String tokens[] = {"s1", "s2", "s3", "s4"};

public String test1(){
    String str = "";
    for (int i=0; i < tokens.length; i++){
        str += tokens[i];
    }
    return str;
}

public String test2(){
    String str = "";
    for (int i=0; i < tokens.length; i++){
        str = new StringBuffer().append(str).append(tokens[i])
              .toString();
    }
    return str;
}

If you were to look at the bytecodes generated for the methods, you would see that they are identical. Essentially, the compiler translates:

str += tokens[i];

to

str = new StringBuffer().append(str).append(tokens[i]).toString();

As you can see, a simple concatenation translates to creating a new StringBuffer, calling append twice, and finally calling toString. If this wasn’t bad enough, note that toString, like most other String methods, creates and returns yet another String object. The following method accomplishes the same task far more efficiently:

public String test3(){
    StringBuffer buffer = new StringBuffer();
    for (int i=0; i < tokens.length; i++){
        buffer.append(tokens[i]);
    }
    return buffer.toString();
}

Direct Byte Buffers Are Fundamental to Games

Use direct byte buffers to share data between Java and native code. Direct byte buffers were introduced in JDK 1.4, and games are one of their biggest beneficiaries. Because the memory allocated for a direct byte buffer does not reside in the Java heap, its content can be readily passed to native APIs, such as those of the operating system. Chapter 5, “Java IO and NIO,” discusses buffers and direct byte buffers in detail and presents performance comparisons.

The Java Collections Framework Is Good and Bad

The arguments about using generic data structures in games are not new and are not specific to Java. In fact, developers have argued this topic for years, even in C++. The C++ Standard Template Library is essentially the equivalent of the Java Collections Framework (JCF). It contains equivalents of many of the data structures available in the collections API. It is extremely important that you know the difference between an ArrayList and a LinkedList (or the equivalent STL vector and list classes).

You can use the Java Collections Framework in your game, as long as you are careful about a few things. LinkedLists are inefficient for most tasks. Every time you insert an element in a linked list, a new internal object must be allocated to store the element. This internal element is fundamental because it is not practical to require all objects that want to be in a linked list to already have a next reference (or next pointer). Unlike a linked list, an ArrayList uses an actual array internally so that it can provide fast random access to its elements. Because of the internal array, there is no need to store next pointers.

Do not use the Vector class. Vector was introduced in JDK 1.0, before the Java Collections Framework existed. It was later retrofitted into the framework, mainly for backward compatibility. The main difference between a Vector and an ArrayList is that the former is synchronized. However, as of JDK 1.2, it has been preferable to use the following:

List list = Collections.synchronizedList(new ArrayList());

Even though a Vector is slightly faster than a wrapped ArrayList, it is a better practice to use a wrapped ArrayList if you need a synchronized list.

Do not use an ArrayList when it is more appropriate to use a LinkedList. Even though an ArrayList is typically more efficient, there are times when you should stick with a linked list. Keep in mind that every time you insert an object into, or remove one from, the middle of an ArrayList, all the elements to the right of that index in the internal array must be shifted left or right accordingly. For a large enough list, the overhead of shifting (copying) the references can outweigh the cost of LinkedList's internal node allocation and deallocation.
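The shifting cost is easiest to see with head insertions, the worst case for an ArrayList. The following sketch times repeated inserts at index 0 for both list types; the exact numbers vary by machine and JVM, so treat this as an illustration rather than a rigorous benchmark:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class HeadInsertDemo {
    // Inserts n elements at the head of the given list and returns elapsed nanoseconds.
    public static long timeHeadInserts(List<Integer> list, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            list.add(0, i);   // ArrayList must shift every existing element;
                              // LinkedList just links a new node at the front
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 20000;
        System.out.println("ArrayList:  " + timeHeadInserts(new ArrayList<Integer>(), n) + " ns");
        System.out.println("LinkedList: " + timeHeadInserts(new LinkedList<Integer>(), n) + " ns");
    }
}
```

For head insertion the ArrayList time grows roughly quadratically with n, while the LinkedList time grows linearly, which is exactly the trade-off described above.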

Because the JCF data structures are written to be as generic as possible, they are not efficient when you want to deal strictly with primitive data types. For example, if you want to have a linked list of ints, you must create a corresponding java.lang.Integer object. Even data structures such as hash tables expect an object as the key value of an entry. If a critical module of your game needs to use a hash table, you should typically use a custom-made hash table that uses primitive data types as the key values. Using an int instead of an Integer object can have substantial advantages in terms of both memory and execution.
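A custom primitive-keyed table of the kind suggested above could be sketched as follows. This is a minimal open-addressing table with linear probing, assumed names throughout, and deliberate simplifications (no removal, and Integer.MIN_VALUE is reserved as an empty-slot sentinel, so it cannot be used as a key):

```java
public class IntHashMap {
    private static final int EMPTY = Integer.MIN_VALUE; // sentinel: not a usable key
    private int[] keys;
    private Object[] values;
    private int size;

    public IntHashMap(int capacity) {
        keys = new int[capacity];
        values = new Object[capacity];
        java.util.Arrays.fill(keys, EMPTY);
    }

    // Finds the slot for key: either its current slot or the first empty one.
    private int indexOf(int key) {
        int i = (key & 0x7fffffff) % keys.length;
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) % keys.length;   // linear probing
        }
        return i;
    }

    public void put(int key, Object value) {
        if (size * 2 >= keys.length) resize();   // keep load factor below 0.5
        int i = indexOf(key);
        if (keys[i] == EMPTY) size++;
        keys[i] = key;
        values[i] = value;
    }

    public Object get(int key) {
        int i = indexOf(key);
        return keys[i] == key ? values[i] : null;
    }

    private void resize() {
        int[] oldKeys = keys;
        Object[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        values = new Object[keys.length];
        java.util.Arrays.fill(keys, EMPTY);
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldKeys[i] != EMPTY) put(oldKeys[i], oldValues[i]);
        }
    }
}
```

The point is that lookups and insertions never box the int key into an Integer, so a hot loop that hits the table every frame generates no garbage for keys.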

More Objects Can Mean Less Available Memory and CPU

Avoid creating unnecessary objects when possible, especially in the hot spots of the application. Even though the GC has been designed to deal with many short-lived objects, it is still good practice to minimize the number of objects created. The more objects created, the more will have to be reclaimed, and the more work collection and compaction are likely to require. Many times it is possible to reduce object creation by reusing an object. This approach can be problematic when dealing with multiple threads, but it is worth considering.
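A common form of reuse is a preallocated scratch object that a per-frame method fills in instead of allocating a temporary. The Vec3 class and update method below are illustrative names, not from the book; note the comment about the thread-safety caveat mentioned above:

```java
public class ReuseDemo {
    static class Vec3 {
        float x, y, z;
        Vec3 set(float x, float y, float z) {
            this.x = x; this.y = y; this.z = z;
            return this;
        }
    }

    // One preallocated temporary, reused on every call. Not safe if
    // multiple threads call update() concurrently.
    private final Vec3 scratch = new Vec3();

    // Returns the squared length of (dx, dy, dz) without allocating.
    public float update(float dx, float dy, float dz) {
        Vec3 v = scratch.set(dx, dy, dz);   // no 'new' in the hot path
        return v.x * v.x + v.y * v.y + v.z * v.z;
    }
}
```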

When using existing APIs, be aware of methods that create new objects, and try to minimize the number of calls to them. When designing an API, minimize object creation by accepting a reference from the caller and populating it with the result. This approach is much better than creating and returning a new object on every call.
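The two API styles can be contrasted in a short sketch; the method names and the position computation are illustrative:

```java
public class OutParamDemo {
    // Allocating style: a new array is garbage after every call.
    public static float[] getPositionAlloc(float t) {
        return new float[]{t, t * 2f, t * 3f};
    }

    // Out-parameter style: fills an array the caller owns, so a loop
    // that calls it every frame creates no garbage.
    public static void getPosition(float t, float[] out) {
        out[0] = t;
        out[1] = t * 2f;
        out[2] = t * 3f;
    }

    public static void main(String[] args) {
        float[] pos = new float[3];          // allocated once
        for (int frame = 0; frame < 3; frame++) {
            getPosition(frame, pos);         // reused every frame
            System.out.println(pos[0] + ", " + pos[1] + ", " + pos[2]);
        }
    }
}
```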

Do Not Leak Objects

Do not forget that objects will not be considered garbage as long as they are reachable. It is easy to inadvertently hold onto objects that are no longer needed. This is especially true during the loading of a game. It does not hurt to explicitly set references to null if they are class members and you are certain that you do not need their corresponding objects. It is generally a good practice to use local references because they help reduce the possibility of such memory leaks in Java.
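The loading case mentioned above can be sketched like this, with illustrative names: a loader keeps a large temporary as a class member, and nulling the field once loading is done makes the buffer eligible for collection even though the loader itself stays alive:

```java
public class LevelLoader {
    private byte[] rawData;   // large temporary, only needed during load

    // Loads level data and returns a checksum of the raw bytes.
    public int load() {
        rawData = new byte[1 << 20];   // stand-in for data read from disk
        int checksum = 0;
        for (byte b : rawData) checksum += b;
        rawData = null;   // done with it: without this, the buffer stays
                          // reachable for as long as the loader does
        return checksum;
    }
}
```

Had rawData been a local variable of load instead of a field, the explicit null would be unnecessary, which is exactly why local references are the safer default.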

Minimize the Number of Classes

Avoid making unnecessary classes. More classes mean more memory consumption and longer startup time. For example, if you need a basic vector structure to store three floats, it may be better to use an array of length three as opposed to making a class that has x, y, and z fields. Also, be cautious about using anonymous inner classes. They are common and convenient, especially when implementing Listener classes for GUI components.

listener = new Listener() {
    public void actionPerformed(){
    }
};

Note that the code segment results in the creation of an extra class file at compile time. It is easy to overlook the number of classes created in this manner.
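The earlier suggestion of using a plain array instead of a tiny x/y/z class can be sketched as follows (the class and method names are illustrative):

```java
public class TinyVectorDemo {
    // v[0], v[1], v[2] play the roles of x, y, and z fields.
    public static float lengthSquared(float[] v) {
        return v[0] * v[0] + v[1] * v[1] + v[2] * v[2];
    }

    public static void main(String[] args) {
        float[] position = {1f, 2f, 2f};   // no extra class file needed
        System.out.println(lengthSquared(position));
    }
}
```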

Take Advantage of Conditional Compilation

The JDK 1.4 compiler (javac.exe) allows you to do a form of conditional compilation, similar to using C/C++ preprocessor directives such as #define _DEBUG. In C/C++, defining the _DEBUG symbol causes the code segments that follow an #ifdef _DEBUG directive to be included in the compiled representation; leaving it undefined excludes the debugging code. In Java, the same effect is achieved by testing a static final boolean constant: if you run javap.exe on the class file generated from the following code, you will not find any bytecodes that correspond to the code inside the conditional check:

static final boolean DEBUG = false;

public void test(){
    int a = 10;
    if (DEBUG){
        System.out.println("DEBUG flag is true");
    }
}

The resulting bytecodes for test:

0:   bipush  10
2:   istore_1
3:   return

Smaller compiled code means fewer bytecodes, which reduces memory consumption and load-time bytecode verification, and can lead to faster execution. Note that this not only eliminates unnecessary bytecodes from release builds but also prevents the debug strings from being added to the constant pool table.



Practical Java Game Programming (Charles River Media Game Development)
ISBN: 1584503262
Year: 2003