Texture Optimization

Using many quality textures is essential to creating believable, good-looking levels. But textures are costly. A single 256x256 RGB texture map takes 192KB (256x256 pixels at 3 bytes each). Add mipmapping on top of that, and the footprint grows by roughly one-third, because each successive mip level is one-quarter the size of the previous one. And, let's face it, you will need dozens of different maps to create each level. Stop for a second and try to estimate how much memory you are going to need to store all your texture maps. A fellow developer once told me he was using between 40 and 100 maps per level, and that was an indoor game! Sooner or later (better sooner), you will discover that your target platform has less memory than you actually need. This is especially true on consoles, where memory is a precious resource, but it holds for PC game development as well. PCs have larger memory pools on the graphics subsystem, but buses are slower, so reducing memory use is of primary importance there too. You could try to implement a caching policy, which is a good move, but as you will discover later in this chapter, that does not solve all your problems: moving textures up and down the bus can quickly become a performance killer.

Clearly, you need to think very carefully about how much memory you will dedicate to texture maps. In this section, we will first explore several simple ideas that can greatly reduce your textures' footprint. Most of the time, we waste precious bits by storing textures inefficiently. Once memory use is down to the bare minimum, we will go on to explore real-time texture compression techniques.

The first option you can explore is reducing the color depth of your maps. Most APIs support this feature internally, so you only have to specify it at load time, and the driver will take care of the rest. The advantage is clear: smaller maps. The downside is a perceived quality loss, which in most cases will be unnoticeable anyway. One of the most popular reduced-color formats is RGB4 (RGB, four bits per component), which needs just 12 bits per pixel (16 bits per pixel for RGBA4). This mode works out very well for textures that do not have subtle color gradients, which luckily are the vast majority. Figure 18.4 shows the failure case: the uniform gradient in the left (RGBA8) picture breaks into visible bands in the right (RGBA4) picture, so this is a very good example of when the technique should not be used. Subtle color changes do not reduce well. But most textures will yield much better results, often nearly indistinguishable from the originals. And remember that graphics cards perform filtering in RGB space anyway, so they will help even more to make the color reduction unnoticeable.

Figure 18.4. A gradient seen in RGBA8 (left) and RGBA4 (right) formats.

graphics/18fig04.jpg

Several reduced-color formats exist; RGB4 just happens to be very popular. You can, however, use other formats in specific situations. For example, you can store illumination-only textures using the ALPHA4 format (alpha only, four bits per pixel). In the end, the rule of thumb is to examine your API and select the format that gives the right visual appearance in the least space. DirectX is the winner here because it provides many exotic formats that are very useful in specific situations.
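In OpenGL, for example, reduced color depth is requested through the internal format parameter at upload time. The following is a minimal sketch; pixels, width, and height are assumed to come from your own image loader, and the sized formats are hints the driver may round to the closest representation it supports:

 // Upload the same 24-bit source data twice; only the requested
 // internal format changes. (Assumes the standard GL headers.)
 glBindTexture(GL_TEXTURE_2D, full_color_map);
 glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB8, width, height, 0,
              GL_RGB, GL_UNSIGNED_BYTE, pixels);   // 24 bits per pixel

 glBindTexture(GL_TEXTURE_2D, reduced_color_map);
 glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB4, width, height, 0,
              GL_RGB, GL_UNSIGNED_BYTE, pixels);   // 12 bits per pixel

 // Illumination-only maps can use GL_ALPHA4 the same way.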

Taking this color-reduction scheme to the extreme, we can directly use palettized texture maps. These are 8-bit bitmaps where each pixel stores not an RGB value, but an index into a 256-entry color table, much like in classic quantization schemes. Palettized maps thus occupy approximately one-third the size of an RGB texture map (plus the space for the palette itself: 256 entries at 3 bytes each, or 768 bytes). Again, deciding when to use palettes is a matter of knowing your textures. Gradients and color-rich maps might degrade significantly, but many maps will look great in 8 bits. Also, 8-bit terrain textures usually look great when combined with detail textures and lighting. As usual, having both options in your game engine is surely the best policy.
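In OpenGL, palettized maps were exposed through the EXT_paletted_texture extension, common on hardware of this era but something you must still check for at runtime. A minimal sketch, assuming palette (a 256-entry RGB table) and indices (the 8-bit pixel data) come from your loader:

 // Check for EXT_paletted_texture in glGetString(GL_EXTENSIONS)
 // before using any of this.
 glBindTexture(GL_TEXTURE_2D, palettized_map);

 // Upload the 256-entry RGB palette...
 glColorTableEXT(GL_TEXTURE_2D, GL_RGB8, 256,
                 GL_RGB, GL_UNSIGNED_BYTE, palette);

 // ...then the indices: 8 bits per pixel instead of 24.
 glTexImage2D(GL_TEXTURE_2D, 0, GL_COLOR_INDEX8_EXT, width, height,
              0, GL_COLOR_INDEX, GL_UNSIGNED_BYTE, indices);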

Texture Compression

Once all these options have been considered and tried, you might still need to reduce texture sizes even further. If so, it is time to implement texture compression. Most likely, your textures will already have shrunk through color reduction, so compression will be the icing on the cake. Applying compression while most of your texture space is wasted by inefficient storage is absurd.

Texture compression keeps maps in a compressed format and only decompresses them internally on the GPU for display. Thus, both the file size and the memory footprint are smaller than with uncompressed maps. The cost is modest: decoding takes a few extra clock cycles as texels are fetched, although on most hardware this logic is wired into the texture units and the bandwidth savings tend to offset it. Several compression schemes exist. Some of them are proprietary, and some are public. Similarly, some provide lossless compression, whereas others degrade the texture quality in the process.

DirectX provides very good support for texture compression, elegantly built into the core API. For example, the following call loads a texture map with all of its properties specified:

 D3DXCreateTextureFromFileEx(
     pd3dDevice,                    // device to create the texture on
     strPath,                       // source file (can be a .dds)
     D3DX_DEFAULT, D3DX_DEFAULT,    // width, height: take them from the file
     D3DX_DEFAULT,                  // generate a full mipmap chain
     0,                             // usage flags
     D3DFMT_UNKNOWN,                // format: take it from the file
     D3DPOOL_MANAGED,               // let DirectX manage the memory
     D3DX_FILTER_TRIANGLE | D3DX_FILTER_MIRROR,   // image filter
     D3DX_FILTER_TRIANGLE | D3DX_FILTER_MIRROR,   // mip filter
     0,                             // no color key
     NULL, NULL,                    // no source info, no palette
     ppTexture);                    // resulting texture object

The beauty of this approach is that in a single call we set the texture object we want to load, the source data, the filtering and mipmapping options, and the texture compression, which is autodetected from the file we pass as a parameter. Thus, the programmer does not need to worry about such details. OpenGL developers have a slightly more involved time because they need to handle compression manually. This used to be done through the ARB_texture_compression extension, but it is now part of the OpenGL core (since version 1.3). The main call sequence is

 ddsimage = ReadDDSFile("flowers.dds", &ddsbufsize, &numMipmaps);

 glBindTexture(GL_TEXTURE_2D, already_compressed_decal_map);
 glCompressedTexImage2D(GL_TEXTURE_2D,       // target
                        0,                   // base mip level
                        ddsimage->format,    // compressed format from the DDS header
                        ddsimage->width, ddsimage->height,
                        border,              // must be 0
                        image_size,          // size of the compressed data in bytes
                        ddsimage->pixels);   // the compressed blocks themselves

which directly downloads a compressed texture in DirectDraw Surface (DDS) format to OpenGL. Remember that DDS is a DirectX file format, so OpenGL provides no routine to parse it; the ReadDDSFile call above stands for a small loader you must write (or borrow) yourself. DDS files for both DirectX and OpenGL can be created using DxTex, a tool included in the DirectX SDK.
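As a rough idea of what such a loader involves, here is a sketch that reads the base mip level only. The header offsets follow the published DDS layout, the GL tokens come from the EXT_texture_compression_s3tc extension, and the DDSImage structure is invented for this example (assumes <stdio.h>, <stdlib.h>, <string.h>, the GL headers, and a little-endian host):

 typedef struct
    {
    unsigned int  format;       // GL compressed-format token
    int           width, height;
    unsigned char *pixels;      // the compressed blocks
    } DDSImage;

 DDSImage *ReadDDSFile(const char *path, int *bufsize, int *numMipmaps)
 {
    unsigned char header[128];  // 4-byte magic + 124-byte header
    FILE *f = fopen(path, "rb");
    if (!f || fread(header, 1, 128, f) != 128) return NULL;
    if (memcmp(header, "DDS ", 4) != 0) return NULL;     // magic check

    DDSImage *img = (DDSImage *)malloc(sizeof(DDSImage));
    img->height = *(int *)(header + 12);
    img->width  = *(int *)(header + 16);
    *numMipmaps = *(int *)(header + 28);

    // The FourCC at offset 84 selects the block format: 8 bytes per
    // 4x4 block for DXT1, 16 bytes for DXT3/DXT5.
    int blocksize;
    if      (memcmp(header + 84, "DXT1", 4) == 0)
       { img->format = GL_COMPRESSED_RGBA_S3TC_DXT1_EXT; blocksize = 8; }
    else if (memcmp(header + 84, "DXT3", 4) == 0)
       { img->format = GL_COMPRESSED_RGBA_S3TC_DXT3_EXT; blocksize = 16; }
    else if (memcmp(header + 84, "DXT5", 4) == 0)
       { img->format = GL_COMPRESSED_RGBA_S3TC_DXT5_EXT; blocksize = 16; }
    else { free(img); return NULL; }

    *bufsize = ((img->width + 3) / 4) * ((img->height + 3) / 4) * blocksize;
    img->pixels = (unsigned char *)malloc(*bufsize);
    fread(img->pixels, 1, *bufsize, f);   // base level; mip data follows it
    fclose(f);
    return img;
 }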

Texture Caching and Paging

Despite all color reduction and compression efforts, in many cases you simply will not be able to keep all the textures in video memory. Remember that your video memory is a limited resource, and it must also accommodate the different buffers (two for double buffering, plus Z-buffer, plus stencils, and so on) as well as resident geometry. As a result, most games use more maps than they can fit in the GPU. So clearly, some caching policy must be implemented.
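To make the budget concrete, consider an illustrative configuration: at 1024x768 in 32-bit color, each buffer takes 1024 x 768 x 4 bytes = 3MB, so two color buffers plus a combined 24/8-bit Z/stencil buffer already consume about 9MB. On a 32MB card, that leaves roughly 23MB for every texture and every piece of resident geometry your frame needs.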

A commonsense rule of thumb is that the set of textures required to render any given frame (sometimes called the working set) must fit in GPU memory completely. Swapping textures should be kept to a minimum (say, your character traveling through the level and discovering a new area with different maps). Swapping on a frame-by-frame basis is a performance nightmare and should be avoided at all costs.

If this rule is followed, textures will spend a number of frames in video memory and, as they fall out of use, be swapped out to make room for newer ones. With this in mind, we can build a caching engine.

The engine will have a memory pool consisting of N textures (the value of N depends on the available physical memory). Each cached texture must consist of a structure like this:

 typedef struct
    {
    texturedata   *texdata;     // the map itself, in whatever format you use
    unsigned long timestamp;    // frame counter of the last access
    } cachedtexture;

We need the timestamp to control when textures were last accessed, so we can implement some kind of swapping policy. So, for each rendered frame, we must update the timestamp attribute for each texture map that's actually used. This has a very low CPU cost and allows us to keep track of which textures are "fresh" (are being queried all the time) and which are "getting old."

Whenever we get a texture request, we need to do the following:

 look if the texture is in the cache
 if it is, use it and update the timestamp
 else                               // we have a miss
    scan the cache, looking for the oldest texture
    swap out the selected texture
    swap in the new one
 end if

Notice that we are implementing a least recently used (LRU) policy. This means that when we need room, we throw away the map that has gone unused for the longest time and allocate the new map in its place. An alternative policy, first in, first out (FIFO), discards the oldest map in the cache, no matter how recently it was used. Because the cache bookkeeping itself has a very low performance cost (the expensive part is the actual swapping, which depends on your bus speed), we can afford LRU, which in practice is the better-performing algorithm because it tends to generate the fewest misses.
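Putting the pieces together, a minimal sketch of the request routine could look as follows. TextureHasName, SwapOut, and SwapIn stand in for your engine's own texture naming and upload/download routines; they are assumptions of this example, not a fixed API:

 #define CACHE_SIZE 64                  // N: tune to available video memory

 cachedtexture cache[CACHE_SIZE];       // zero-initialized: all slots empty
 unsigned long current_frame = 0;       // incremented once per rendered frame

 // Return the requested map, swapping it in on a miss.
 // LRU policy: on a miss, evict the entry with the oldest timestamp.
 texturedata *RequestTexture(const char *name)
 {
    int i, oldest = 0;
    for (i = 0; i < CACHE_SIZE; i++)
    {
       // Hit: refresh the timestamp and return the resident map.
       if (cache[i].texdata && TextureHasName(cache[i].texdata, name))
       {
          cache[i].timestamp = current_frame;
          return cache[i].texdata;
       }
       // Track the least recently used slot while we scan.
       if (cache[i].timestamp < cache[oldest].timestamp)
          oldest = i;
    }
    // Miss: evict the LRU entry and load the new map in its place.
    SwapOut(cache[oldest].texdata);          // must tolerate NULL (empty slot)
    cache[oldest].texdata   = SwapIn(name);  // upload from disk/system memory
    cache[oldest].timestamp = current_frame;
    return cache[oldest].texdata;
 }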

In addition, notice that the cache must at least be able to hold the textures required to render a single frame (plus some extra space as a margin); cache engines cannot do their job on a per-frame basis. Ideally, misses should occur at most once every N frames, with larger values of N yielding better performance.


