Speed-Up Techniques | Core Techniques and Algorithms in Game Programming2003

Any hardware is condemned to be performance limited, so our systems' goals must be reasonably limited.

But sometimes we can reach higher and faster by careful planning and analysis of the system we are to model. Many techniques can be used in order to speed up our calculations, and thus allow us to be able to create unique, complex particle systems. In this section, I will detail a variety of tips and tricks that can multiply the richness and quality of your results. Note, however, that there is no silver bullet in this advice. Only a deep knowledge of what you are modeling can help you determine which of these techniques might work best for you.

Avoid Malloc and Free

The first optimization for particle systems is based on the observation that allocating and freeing memory is a slow process. Thus, calls to memory management routines inside the system's loops should be avoided as much as possible. Although very efficient memory managers have a significant overhead for each allocation or deallocation of memory. They must scan the free blocks of memory, select the best-suited one (according to different policies), and so on. So, taking some time to plan your code is the best way to achieve top performance.

For example, freeing memory each time a particle dies (or allocating more memory for newborn particles) can have a significant performance hit, especially in those systems with large particle numbers or short life spans.

An alternative to malloc and free (or new and delete, if you lie in C++ land) must be found in order to avoid their cost. The easiest approach is to map our system to a static array of particles. Then, each time a particle gets killed, we will mark it (using one Boolean flag). This way the deletion routine can be skipped. We do not need to be allocating memory for new particles all the time. They are placed in the same array positions as dead particles instead. Using this technique, the particle system can contain a constant number of particles, avoiding calls to malloc and free.

From the previous discussion, you might think we need to loop the array to determine where to put a newborn particle. But it needn't be the case. One loop can recalculate, kill, and respawn in an efficient manner. Here is the code idea:

 for each particle in the system    if particle is dead          respawn the particle[i]       else          recalc particle[i]       end if end for

Spatial Indexing

Spatial indexing (explained in detail in Chapter 4, "Design Patterns") allows us to quickly perform queries that have to do with spatial locations. Tests like calculating the distance from one point to a set of objects, selecting closest pairs in a spatial set, and so on can be accelerated considerably by using them. Thus, using a spatial index should allow us to speed up global particle systems significantly.

Remember that, essentially, a particle system with global effects has a worst-case cost of

 O(# of particles^2)

This comes from the fact that each particle must be tested against the others to account for interdependencies. So, a spatial index allows us to keep track of neighboring relationships, reducing the cost to

 O(# of particles*k)

where k is a constant that depends on the quality of the spatial index. For example, for a gridded, 2D system with a homogeneous distribution of the particles, we have

 K= # of particles / (Xs*Zs)

where Xs and Zs are the number of cells in the grid in X and Z respectively. Clearly, a fine-grained grid can help a lot in reducing a particle system's recalculation cost.

LOD Particle Systems

Particle systems often consist of several hundreds or thousands of elements, each one requiring dynamic updates. Moreover, the methods used to render them are often costly (alpha blending, multipass techniques). But we can achieve significant improvements in performance by using the level-of-detail (LOD) paradigm.

Imagine a smoke plume from a campfire modeled using particle techniques. Viewed from a distance, we might be able to skip some calculations and still get a realistic result. Then, as we move closer, more and more particles can be spawned, and a better interaction model can be applied for extra realism.

Clearly, the system must be carefully tested to ensure a visual consistency along the approach. But the added performance will likely compensate for any small artifacts that might appear.

Now, let's take a look at LOD particle systems in a general manner. Clearly, there are two areas we can work on in terms of LODs. First, we could try to optimize the rendering code. Second, we could try to optimize the recalc/respawn loop.

Optimizing the rendering code for a particle system can be tricky. Additionally, given the speed of the current accelerators on the market, rendering is usually not the bottleneck. Updating the particles and creating new ones as the old ones die is significantly more expensive. Thus, I'll concentrate on the second technique.

Optimizing recalc() by means of LODs is relatively simple. We only need to define what will vary depending on the distance/resolution and how it will vary. The easiest technique is to vary the number of particles, interpolating from a basic particle system with only a few particles when we are far away, to a full-blown system when we are up close. The sequence we will follow will be

Calculate the ideal number of particles depending on distance.
Recalculate the existing particles.
If a particle dies, respawn it only if we are below our ideal particle number.
If we are still below our ideal particle number, scan the empty particles in the array to respawn accordingly.

So, we have an ideal system size that we converge on by means of not respawning particles (to decrease the size) or by creating new ones (to increase detail). A caveat on this technique is that it requires the viewer's speed to be slow compared to the respawn frequency of the particles.

As a counterexample, imagine a jet fighter simulator where we use this technique to simulate smoke coming from ground explosions. The plane "sees" the particle system five miles ahead, where it consists of about 10 particles only. Because the jet fighter is very fast, and the smoke plume is comparatively small, the plane can approach too fast for the system to create new particles, resulting in a poor-looking system.

Another way of optimizing recalculations involves simplifying the inner calculations performed for each particle. Some systems calculate complex equations to ensure a physically and visually correct look. But for distant systems, some calculations can be switched off, and thus save precious clock cycles. I will provide two examples here.

Imagine a physically based particle system, such as smoke plumes interacting with wind. Essentially, the plume rises using a sine wave to displace in X and Z, and some Perlin noise is added to the mix to provide a nice, rich swirling. Noise influence is increased with height, so the top of the plume is more chaotic.

But remember that Perlin noise is a costly function: Each evaluation requires nontrivial math such as trilinear interpolations and Spherical Linear Interpolators (SLERPs). Clearly, not the kind of math you want to do standing five miles away from the system. So, our decision could be to switch off the Perlin noise with distance.

A different example would be global or interactive particle systems, where each particle interacts with the others or with the environment to create a complex behavior. Many times, these interparticle interactions provide subtle detail, such as particle collisions, attractions, and repulsion: leaves falling and colliding with each other, raindrops colliding with the scenario, and so on. On the other hand, they are very costly. Remember that the classic particle recalc loop has a cost of

 O(number of particles)

But an interactive system raises that to

 O(# of particles * # of interactors)

We have seen how spatial indexing techniques can be used to reduce the second member of the cost equation. But many global or interactive calculations can be scaled down (or even completely eliminated) with distance. It is unlikely that the viewer will notice if the leaves are not colliding when standing sufficiently far away from the system.

Shader-Based Particle Systems

One of the best ways to speed up a particle system is to implement the expensive physics simulator on a shader (shaders are covered in Chapter 21, "Procedural Techniques"), so it can be executed in parallel to the CPU. This imposes some restrictions on what can actually be computed, but most effects can be achieved this way.

Shaders are an advanced subject, and covering them here would interfere with the structure of the book. But a complete section on particle systems and shaders can be found in Chapter 21. I suggest you browse that section to learn how to speed up the computation of a particle system using shaders.