Improving Math Function Usage

Most math operations cannot be optimized directly; the developer cannot make the add or subtract instructions any faster. These operations are atomic (or nearly atomic), meaning that an add in the Java language maps directly to a processor instruction and therefore runs as fast as the processor can execute it. For most math operations, then, the areas open to optimization are how the operations are used and the algorithms built around them. For those cases, design is the key to performance.

However, some math functions are not atomic. Often these functions must iterate to compute a complete and accurate solution and, compared to simple add and multiply operations, are quite costly computationally. The classic examples are sine and cosine (and their trigonometric relatives) and square root. These functions show up often in game programming because they are directly related to geometry, and much of game programming deals with geometry. We will take a close look at what can be done to improve performance for these functions. There are many more, such as log(), that do not show up in game programming as often, but they can be dealt with in a similar fashion.

Minimizing Costly Methods

The first step in minimizing the impact of math functions on an application is to examine whether the calls to those functions can be reduced. This may not readily yield any gain for well-written code, because redundant calls should not be happening anyway. However, redundancy often creeps in when using APIs that encapsulate these math calls away from the developer. For example, if application code repeatedly calls a vector object's getMagnitude() method to test a vector's magnitude and does not store the value in a local variable, the application is probably paying the cost of recalculating the same magnitude each time getMagnitude() is called. Because calculating magnitude involves computing a square root (one of the more expensive functions), this repeated access is very costly. Most likely, the method documentation will not warn the developer with a note such as "calling getMagnitude() is costly due to square root," so the developer must simply know this is the case. Careful examination of direct as well as indirect calls to expensive math functions must be part of any attempt to improve math performance.
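As a sketch of the idea (the Vector3f class, getMagnitude() method, and clampSpeed() routine here are illustrative, not from a specific API), caching the result in a local variable pays the square root cost only once:

// Hypothetical vector class: getMagnitude() recomputes a square root on every call.
class Vector3f {
    float x, y, z;
    Vector3f(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
    float getMagnitude() { return (float) Math.sqrt(x * x + y * y + z * z); }
}

class SpeedClamp {
    // Calling velocity.getMagnitude() in both the test and the scale calculation
    // would compute the same square root twice; computing it once and reusing
    // the local copy pays the cost a single time.
    static void clampSpeed(Vector3f velocity, float maxSpeed) {
        float magnitude = velocity.getMagnitude();
        if (magnitude > maxSpeed) {
            float scale = maxSpeed / magnitude;
            velocity.x *= scale;
            velocity.y *= scale;
            velocity.z *= scale;
        }
    }
}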

Redesign Around Costly Methods

When the application code is making the minimum number of math calls possible, two more things can be done. One is to examine the technique being used to solve the problem at hand. Knowing that particular math operations are expensive may encourage the redesign of a particular routine, and perhaps even of the foundation data structures. One example is storing triangle edge and triangle surface normals for computing object collision. 2D and 3D collision routines often need perpendicular surface vectors that are normalized, or unit length, for accurate solutions. Normalizing involves computing the vector magnitude, which, as stated earlier, requires a square root. If many normals are needed in a particular game's collision tests, computing the normals before runtime and storing them within the collision geometry data structure may be the better solution, at a cost of additional memory.
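As a hedged sketch of that idea (the class and field names here are illustrative), the unit normal can be computed once when the collision geometry is built, so no square root is paid during the collision tests themselves:

// Illustrative collision triangle that stores its unit surface normal,
// computed once at load time instead of during every collision test.
class CollisionTriangle {
    final float[] v0, v1, v2;   // vertices
    final float[] normal;       // precomputed unit surface normal

    CollisionTriangle(float[] v0, float[] v1, float[] v2) {
        this.v0 = v0;
        this.v1 = v1;
        this.v2 = v2;
        this.normal = computeUnitNormal(v0, v1, v2);   // square root paid here, once
    }

    private static float[] computeUnitNormal(float[] a, float[] b, float[] c) {
        float ux = b[0] - a[0], uy = b[1] - a[1], uz = b[2] - a[2];
        float vx = c[0] - a[0], vy = c[1] - a[1], vz = c[2] - a[2];
        // The cross product gives a perpendicular vector; normalizing it costs a square root.
        float nx = uy * vz - uz * vy;
        float ny = uz * vx - ux * vz;
        float nz = ux * vy - uy * vx;
        float length = (float) Math.sqrt(nx * nx + ny * ny + nz * nz);
        return new float[] { nx / length, ny / length, nz / length };
    }
}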

Deciding when and where to make modifications to basic algorithms and data structures can be challenging, and tradeoffs between speed, efficiency, and memory usage will have to be made.

Rolling Your Own

The last thing that can be done is to "roll your own," or write your own alternative math functions. Writing math functions that are faster than the standard APIs, along with specialized math functions, is a classic game-programming craft. Although the once-large performance gains over standard math implementations are shrinking, this is still an area in which some performance edge can be squeezed out. Simply implementing the alternative math methods in Java is a straightforward process. However, the classic techniques may hide complications because of how the Java language and VM can affect those techniques, and also how the VM handles the standard Java math functions.

Troubles with Doubles

At first glance, when attempting to improve on the java.lang.Math methods, one obvious issue sticks out: almost all of the complex functions accept only double-precision (64-bit) numbers; that is, there is no sin(float angle), only sin(double angle). A double is twice as big as a float and takes more processing to work with. One of the easiest mathematical optimizations is to use only the lowest-resolution data types needed. In fact, on consoles, 16-bit floating-point numbers are typically used versus the 32-bit floating-point numbers on PCs. Java has no direct 16-bit float support, but that is not a problem here because, on PCs, the native 32-bit data size turns out to be optimal rather than something smaller. (See Chapter 6, "Performance and the Java Virtual Machine.")

Now, if the application sticks to using 32-bit primitives while its routines call the 64-bit, double-based math methods, casts are required on the way into and out of each math function: the float argument is implicitly widened to double, and the double result must be narrowed back to float. Because of the overhead of casting from 32-bit to 64-bit numbers and back again, as well as the extra precision computed only to be thrown away in the conversion back, sin(double) seems like a great candidate for replacement with a new sin(float).
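A minimal illustration of the conversions involved: a caller working in floats must widen the argument and narrow the result around every call to Math.sin(double).

float angle = 0.5f;
// The float argument is implicitly widened to double, the sine is computed in
// double precision, and the result must be explicitly narrowed back to float.
float sine = (float) Math.sin(angle);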

Here's where the VM starts to complicate things again. The float-to-double cast on the way in, and the double-to-float cast on the way out, can be avoided by a very smart VM. What's more, sin(double) may effectively become sin(float) when an application passes in a 32-bit float instead of the declared 64-bit double argument. The latest JIT compilers can be made smart enough to notice the float passed in and the cast back to float on the result, and actually call a float version of the math function instead of the double version stated in the API. This is done purely as an automatic optimization to improve performance when the calling method is working with floats.

This automatic floating of the complex functions is a great improvement in current JVMs over previous versions. However, in some cases alternative methods can still squeeze out more performance, and in cases where the VM does not automatically switch between the double and float math functions, these methods will yield even greater speed increases.

Fixed-Point Math

Fixed-point math has a long history of use in games. In the past it was an important technique for performing decimal math on PCs and game systems that were not equipped with floating-point processing units. Today, because PCs have greatly improved floating-point capability and performance, it is a rare case in which fixed-point math will give a significant performance increase. For multiplication and division operations in particular, fixed-point math has a serious performance disadvantage on modern CPUs. Unless you are developing for systems that lack hardware floating-point support, such as smart phones, some PDAs, or other handheld devices, you are unlikely to see a performance benefit from fixed-point math. However, systems lacking floating-point support can benefit greatly from it.

Fixed-point math is a technique in which decimal numbers are encoded into standard integers and processed in that form instead of in the standard floating-point number format. They are called fixed point because the numeral precision on either side of the decimal point is fixed throughout a series of operations. For example, a typical fixed-point representation might use a 32-bit integer where the first 16 bits hold the integer value and the remaining 16 bits hold the decimal value of the number we want to process. To convert a regular integer to this fixed-point number, the value is simply bit-shifted over the number of desired decimal bits, in this case 16, because integers have no decimal component to deal with.

fixedPointNumber =  intNumber << 16;

Now the int fixedPointNumber has the integer value of intNumber in its upper 16 bits. One thing that should be obvious is that fixed-point math limits the range of possible integer values because we have just shifted a 32-bit integer into 16 bits. If this is a problem for a given data set, the choice of how many bits to assign to the integer part and the decimal part of the number—in this case, 16 bits each—may need to be reevaluated.
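For example, with a signed 32-bit int split into 16.16 format, only 16 bits (one of them the sign) remain for the integer part, so integer values outside roughly -32768 to 32767 no longer fit:

int three  = 3 << 16;        // 3.0 in 16.16 format (raw value 196608)
int tooBig = 40000 << 16;    // 40000 does not fit in 16 bits: the shift overflows
                             // the 32-bit int and wraps to a negative value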

Converting floats is a bit trickier because floats are in IEEE format, so simply shifting them will not work. The correct way to convert these numbers is to multiply the float value by 2 raised to the power of the bit count we want to shift, and then cast to an int. In this example, that is 2 to the 16th power (1 shifted to the left 16 times, or 65,536, or 0x10000 in hex).

fixedPointNumber = (int) (floatNumber * (1 << 16));

To convert back to an integer from the fixed-point representation, simply shift to the right.

intValue = fixedPointNumber >> 16;

Unfortunately, this method loses all data in the decimal part of the fixed-point number, which is often data that we want. However, if we want the final number in the form of a floating-point value, the conversion is simply a division by the same number we shift by for the original conversion from floating point to fixed point.

floatNumber = (float) fixedPointNumber / (1 << 16);

This can also be done by multiplying fixedPointNumber by the inverse of (1 << 16), which can be precomputed because it does not change for a given fixed-point format, thereby avoiding the slower division operation.
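For example, the reciprocal can be stored once in a constant (the variable name here is illustrative) and the conversion done with a multiply:

// Precomputed reciprocal of the 16.16 scale factor; multiplying by it replaces
// the division when converting a fixed-point value back to float.
final float fixedToFloat = 1.0f / (1 << 16);

floatNumber = fixedPointNumber * fixedToFloat;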

Addition and subtraction of fixed-point numbers work the same as for integers, but multiplication and division do not. The problem is that a fixed-point number is effectively a scaled-up representation, where the number of shifts determines the amount of scaling. The CPU does not recognize this fact because we are using regular integers as the native storage. Therefore, when the CPU multiplies two of these scaled numbers, the result is scaled up further than a fixed-point result should be. For example, if two fixed-point numbers each hold the value 1.0 in the same 16-bit shift format used earlier, each actual integer value is 65536. Multiplying 65536 * 65536 (2^16 * 2^16) yields 2^32. After shifting back to recover the regular integer, we get 65536, or 2^16, as the final value, when the answer should be 1. The value has been scaled up by the multiply operation. To fix this, the result must also be shifted back to the right as part of the multiply operation.

fixedPointNumber = (int) (((long) fixedPointNumberA * fixedPointNumberB) >> 16);   // widen to long so the scaled-up intermediate does not overflow 32 bits

Division is handled similarly, but the shift is performed before the division and only to the dividend.

fixedPointNumber = (int) (((long) fixedPointNumberA << 16) / fixedPointNumberB);   // widen to long so the shifted dividend does not overflow 32 bits

Because we are essentially implementing our own custom math operations, and the CPU does not know the representation natively, problems such as integer overflow and precision loss arise that must be accounted for and that further complicate a complete set of fixed-point operations.
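Pulling these pieces together, a minimal sketch of a 16.16 fixed-point helper might look like the following (the class and method names are illustrative; the multiply and divide widen to long so their intermediate results do not overflow a 32-bit int):

// Minimal 16.16 fixed-point helper (illustrative sketch, not a complete library).
final class Fixed16 {
    static final int SHIFT = 16;
    static final int ONE = 1 << SHIFT;            // 1.0 in 16.16 format

    static int fromInt(int i)     { return i << SHIFT; }
    static int fromFloat(float f) { return (int) (f * ONE); }
    static int toInt(int fx)      { return fx >> SHIFT; }
    static float toFloat(int fx)  { return (float) fx / ONE; }

    // Addition and subtraction are plain integer operations.
    static int add(int a, int b)  { return a + b; }
    static int sub(int a, int b)  { return a - b; }

    // Multiply and divide widen to long so the intermediate result does not overflow.
    static int mul(int a, int b)  { return (int) (((long) a * b) >> SHIFT); }
    static int div(int a, int b)  { return (int) (((long) a << SHIFT) / b); }
}

For example, Fixed16.mul(Fixed16.fromFloat(1.5f), Fixed16.fromInt(2)) yields the 16.16 representation of 3.0.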

Fixed-point representations are also good for packing data where precision can be traded for space, particularly for animation data, I/O, and networking. A 16-bit fixed-point value requires half the storage of a single-precision floating-point value, and although Java arithmetic promotes 16-bit values to 32-bit int, two of them can be packed into a single 32-bit integer, doubling the data throughput compared to floating-point data streams. In fact, fixed-point packing can be used to pack even more data, as long as numeric precision can be sacrificed or the numeric range of the data is already limited. Using fixed-point representations to pack, stream, and store data is an excellent alternative to standard floats or doubles, even on systems that support floating-point math natively, whenever conserving memory usage and data bandwidth becomes a priority.
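As an illustrative sketch of that packing (the 8.8 split chosen here is just one possible format), two 16-bit fixed-point values can share a single 32-bit int:

// Two 16-bit fixed-point values in 8.8 format, packed into one 32-bit int.
short a = (short) (3 << 8);     // 3.0 in 8.8 fixed point
short b = (short) (-2 << 8);    // -2.0 in 8.8 fixed point

int packed = ((a & 0xFFFF) << 16) | (b & 0xFFFF);   // a in the high half, b in the low half

short aOut = (short) (packed >>> 16);   // recover a from the high 16 bits
short bOut = (short) packed;            // recover b from the low 16 bits (the cast restores the sign)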

For more on fixed-point representations and operations, see the resources section.


