Application | Core Techniques and Algorithms in Game Programming (2003)

Now we enter the arena of tuning specific subsystems, which is where most performance gains are hidden. We will begin with the application (game logic) phase. Of all the subsystems, this is the one that requires the deepest knowledge of what the application is doing at each step. Let's face it: performance gains in the application stage rarely come from recoding a routine in assembly. Algorithmic optimization usually yields much better results. Rearranging data, simplifying some AI processes, and so on can produce significant speed increases.

My first suggestion is to use monitoring and profiling techniques to create a statistical chart of where time is being spent. Identify major subsystems and assign a percentage of CPU time to them. These will include collision detection, AI, network (for multiplayer titles), sound, input processing, and so on. Now focus on those subsystems that show the greatest potential. Optimizing everything is most likely unfeasible because of time constraints.
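As a minimal sketch of such a statistical chart, the following C++ fragment accumulates time per subsystem and reports each subsystem's share of the total. The names `FrameProfiler` and `ScopedTimer` are hypothetical helpers invented for this illustration, not part of any particular engine, and a dedicated profiling tool will normally give far more detail.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: accumulates measured time per named subsystem.
class FrameProfiler {
public:
    using Clock = std::chrono::steady_clock;

    // Add an already measured duration (in seconds) to a subsystem.
    void add(const std::string& subsystem, double seconds) {
        totals_[subsystem] += seconds;
        grandTotal_ += seconds;
    }

    // Share of all measured time spent in a subsystem, from 0 to 100.
    double percent(const std::string& subsystem) const {
        std::map<std::string, double>::const_iterator it = totals_.find(subsystem);
        if (it == totals_.end() || grandTotal_ <= 0.0) return 0.0;
        return 100.0 * it->second / grandTotal_;
    }

    // Name of the subsystem with the largest accumulated time ("" if none).
    std::string hottest() const {
        std::string best;
        double bestTime = -1.0;
        for (const auto& entry : totals_) {
            if (entry.second > bestTime) {
                bestTime = entry.second;
                best = entry.first;
            }
        }
        return best;
    }

private:
    std::map<std::string, double> totals_;
    double grandTotal_ = 0.0;
};

// RAII timer: measures the enclosing scope and reports it to the profiler.
class ScopedTimer {
public:
    ScopedTimer(FrameProfiler& profiler, std::string name)
        : profiler_(profiler), name_(std::move(name)),
          start_(FrameProfiler::Clock::now()) {}

    ~ScopedTimer() {
        std::chrono::duration<double> elapsed =
            FrameProfiler::Clock::now() - start_;
        profiler_.add(name_, elapsed.count());
    }

private:
    FrameProfiler& profiler_;
    std::string name_;
    FrameProfiler::Clock::time_point start_;
};
```

In use, each subsystem call in the game loop would be wrapped in a scope containing something like `ScopedTimer t(profiler, "AI");`, and after a representative play session `hottest()` and `percent()` indicate where to focus first.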

If your problem lies within the AI code, start by realizing that AI time is usually computed as:

 Time = (time per idle AI × number of idle AIs) + (time per active AI × number of active AIs)

Typically, not all AI entities are active simultaneously. Those far from the viewer are in an idle state, which makes them consume fewer resources (think of an enemy located two miles away from the viewer). These AIs only need a very quick routine that discards them. Reducing the global number of AIs will definitely improve performance, but at the cost of sacrificing gameplay. Thus, my advice is to focus on three areas:

Discarding idle AIs faster
Discarding more AIs as idle
Accelerating the active AI code
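To see how each of the three areas acts on the cost formula, here is a small worked example in C++. All the figures (unit counts and per-unit costs in microseconds) are made up for illustration and do not come from any measurement.

```cpp
// Per-frame AI cost following the formula in the text.
// Costs are in microseconds per AI; the result is in microseconds per frame.
double aiFrameCost(double idleCost, int idleCount,
                   double activeCost, int activeCount) {
    return idleCost * idleCount + activeCost * activeCount;
}

// Hypothetical baseline: 900 idle AIs at 2 us each, 100 active at 50 us each.
//   aiFrameCost(2, 900, 50, 100)  -> 1800 + 5000 = 6800 us
// Discarding idle AIs faster (1 us per idle test):
//   aiFrameCost(1, 900, 50, 100)  ->  900 + 5000 = 5900 us
// Discarding more AIs as idle (950 idle, 50 active):
//   aiFrameCost(2, 950, 50, 50)   -> 1900 + 2500 = 4400 us
// Accelerating the active AI code (40 us per active AI):
//   aiFrameCost(2, 900, 40, 100)  -> 1800 + 4000 = 5800 us
```

With these invented numbers, moving AIs from the active set to the idle set pays off most, simply because an active AI is assumed to be 25 times as expensive as an idle one. Real ratios have to come from profiling.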

To discard idle AIs faster, you can implement better spatial subdivision and indexing so that the test that determines whether an AI is active or idle becomes cheaper. Dividing your world into chunks or zones and storing the AIs of each zone in the zone's own structure, for example, makes the test trivial. As you compute the active zones for graphics processing (by using portal rendering, BSPs, and so on), you are simultaneously determining which AIs will be activated. Any approach that manually scans the full AI list to decide which entities to switch on and off is inherently evil, because its cost grows with the total number of AIs, not with the number that are actually active.
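The zone idea can be sketched as follows. This is an illustrative layout only: `Zone`, `AIEntity`, and `updateVisibleAIs` are hypothetical names, and the list of visible zone indices is assumed to be produced by whatever visibility pass the renderer already runs.

```cpp
#include <cstddef>
#include <vector>

struct AIEntity {
    int id = 0;
    int fullUpdates = 0;             // how many times the expensive logic ran
    void think() { ++fullUpdates; }  // stand-in for the real active AI routine
};

// Hypothetical world layout: each zone owns the AIs currently inside it.
struct Zone {
    std::vector<AIEntity> ais;
};

// Runs the active AI routine only for AIs stored in visible zones.
// AIs in all other zones are never touched, so the cost depends on the
// population of the visible zones, not on the total number of AIs.
// Returns the number of AIs updated this frame.
std::size_t updateVisibleAIs(std::vector<Zone>& world,
                             const std::vector<std::size_t>& visibleZones) {
    std::size_t updated = 0;
    for (std::size_t zoneIndex : visibleZones) {
        if (zoneIndex >= world.size()) continue;  // skip invalid indices
        for (AIEntity& ai : world[zoneIndex].ais) {
            ai.think();
            ++updated;
        }
    }
    return updated;
}
```

The sketch leaves out the bookkeeping needed when an AI walks from one zone into another, which a real game would have to handle by moving the entity between the zones' containers.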

Another option is to lower the distance threshold beyond which AIs are deactivated. This reduces the number of active (and hence CPU-hungry) AIs and improves performance as well. You can either treat AIs as fully idle or fully active, or implement an AI level-of-detail (LOD) policy. Using AI LODs involves creating several versions of the AI routine, which carry out the same actions at different levels of detail. In a game involving lots of AIs, such as a 3D RTS, a good example would be a crossings processor. In our example, a fully active AI has a path finding module, which computes paths through the level, and a low-level crossing processor, which resolves those situations in which two units must cross each other. For a semiactive unit, we still need the path finding code (gameplay would be broken if these units could walk through obstacles). But most likely, we will be able to shut down the crossing processor. We won't be watching distant units as closely, and after all, skipping a crossing (and simply letting the units pass through each other) will probably not affect gameplay that much. This way we can reduce CPU cost.
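A minimal sketch of such an AI LOD policy for the RTS example might look like this. The three levels, the distance thresholds, and the `Unit` methods are all assumptions made for illustration; the counters merely stand in for the real path finding and crossing code.

```cpp
enum class AILod { Idle, SemiActive, Active };

// Hypothetical distance thresholds in world units; tune them per game.
struct LodThresholds {
    float activeRange = 50.0f;       // closer than this: full detail
    float semiActiveRange = 200.0f;  // closer than this: reduced detail
};

// Picks the AI level of detail from the distance to the viewer.
AILod selectLod(float distanceToViewer, const LodThresholds& t) {
    if (distanceToViewer <= t.activeRange) return AILod::Active;
    if (distanceToViewer <= t.semiActiveRange) return AILod::SemiActive;
    return AILod::Idle;
}

struct Unit {
    int pathSteps = 0;          // stand-in for path finding work
    int crossingsResolved = 0;  // stand-in for the crossing processor
    void followPath() { ++pathSteps; }
    void resolveCrossings() { ++crossingsResolved; }
};

// Runs only the modules that the unit's LOD calls for.
void updateUnit(Unit& unit, AILod lod) {
    switch (lod) {
    case AILod::Active:
        unit.followPath();
        unit.resolveCrossings();
        break;
    case AILod::SemiActive:
        unit.followPath();  // paths still matter; crossings are skipped
        break;
    case AILod::Idle:
        break;              // nothing to do for idle units
    }
}
```

Lowering `activeRange` or `semiActiveRange` is the code-level equivalent of lowering the deactivation threshold: more units fall into the cheaper branches.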

The remaining option is to actually reduce the cost of the high-detail, active AIs. This requires careful analysis and must be tackled on a case-by-case basis. Maybe your AI collision detection is too precise and slow, and can be sped up. Maybe the rule system has too many rules. Maybe the script interpreter is simply too slow. Again, know your code and try to make the most of it.

Another area within the application stage to watch out for is audio processing. The cost of this stage can skyrocket if you stack too many audio tracks, or if the compression standard uses too many CPU resources. Formats like MP3 require complex codecs, which take up significant system resources. Some platforms come with hardware-assisted decoders, but watch out for performance if you begin stacking sound layers. When decoding falls back to software, the CPU cost grows linearly with the number of channels.