Art and Sound Formats


Left to their own devices, artists would hand you every sprite and texture they create as a TIF or TGA file. The uncompressed 32-bit art would look exactly as the artist envisioned it. But when you consider that a raw 32-bit 1024x768 bitmap tips the scales at just over 3MB, you'll quickly decide to use a more efficient format. The same is true for sound files and cinematics.

As always, you'll generally need to trade quality for size, and sometimes load time will need to be considered. The best games choose the right format and size for each asset. You'll be better at doing this if you understand how bitmaps, textures, and audio files are stored and processed, and what happens to them under different compression scenarios.

Bitmaps and Textures

Different bitmap formats allocate a certain number of bits for red, green, blue, and alpha channels. Some formats are indexed, meaning that the pixel data is actually an index into a color table that stores the actual RGBA values. Here's a list of the most common formats:

  • 32-bit (8888 RGBA): The least compact way to store bitmaps, but retains the most information.

  • 24-bit (888 RGB): This format is common for storing backgrounds that have too much color data to be represented in either 8-bit indexed or 16-bit formats, and no need for an alpha channel.

  • 24-bit (565 RGB, 8 A): This format pairs compact 16-bit color with a full 8-bit alpha channel, great for nice-looking bitmaps that need smooth translucency.

  • 16-bit (565 RGB): This compact format is used for storing bitmaps with more varieties of color and no alpha channel.

  • 16-bit (555 RGB, 1 A): This compact format leaves one bit for translucency, which is essentially a chroma-key: each pixel is either fully opaque or fully transparent.

  • 8-bit indexed: A compact way to store bitmaps that have large areas of subtly shaded colors; some of the indexes can be reserved for different levels of translucency.

DirectX supports virtually any combination of pixel depth in each red, green, blue, and alpha channel. While it would be possible to store some oddball 24-bit format like 954 RGB and 6 bits for alpha, there's no art tool in the world that would let an artist look at that art and edit it.
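To make these bit layouts concrete, here's a minimal sketch of packing 8-bit channels into a 16-bit 565 value and expanding them back. The function names are my own invention, not from DirectX or any particular API; the dropped low bits are exactly the information a 16-bit format loses.

```cpp
#include <cstdint>

// Pack 8-bit R, G, B into a 565 value by dropping the low bits:
// 5 bits of red, 6 of green (the eye is most sensitive to green), 5 of blue.
uint16_t PackRGB565(uint8_t r, uint8_t g, uint8_t b) {
    return static_cast<uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Expand back to 8 bits per channel. The high bits are replicated into
// the low bits so that full white (31, 63, 31) maps back to (255, 255, 255).
void UnpackRGB565(uint16_t c, uint8_t& r, uint8_t& g, uint8_t& b) {
    r = static_cast<uint8_t>(((c >> 11) & 0x1F) << 3);
    r |= r >> 5;
    g = static_cast<uint8_t>(((c >> 5) & 0x3F) << 2);
    g |= g >> 6;
    b = static_cast<uint8_t>((c & 0x1F) << 3);
    b |= b >> 5;
}
```

Round-tripping any value through these two functions snaps it to the nearest representable 565 color, which is where the banding discussed below comes from.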

Best Practice

If you write your game in a way that makes it difficult for artists to do their job, your game art will suck. Instead, spend some time making your game use the same art formats used by popular art tools like Photoshop. Your game will look exactly the way the artists intended it to look. You'll also be able to find artists who can work on your game if you stick to the standard formats and tools.

Which Is Better: 24-, 16-, or 8-Bit Art?

It's virtually impossible to choose a single format to store every bitmap in your game and have all your bitmaps come through looking great. In fact, I can assure you that some of your bitmaps will end up looking like they should be in your laundry pile.

Figure 8.1 shows three different bitmaps that were created by drawing a grayscale image in Photoshop. The bitmap on the far left uses 256 colors. The center bitmap is stored using 32 different colors, while the one on the right is stored using 16 colors. If you attempt to store a subtly shaded image using too few colors you'll see results closer to the right bitmap, which looks crummy.

Figure 8.1: Grayscale Banding Patterns for Different Bit Depths.

The image in the middle is analogous to a 16-bit image, in that those images have five or six bits to store each color channel, and you'll likely see some banding. On the other hand, if you can use an 8-bit range for each color channel, you'll see the best results, but you'll trade that quality for bigger art assets. Needless to say, if your artist storms into your office and wonders why her beautiful bitmaps are banded all to hell, you've likely forced her into a bad color space.
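You can reproduce this banding numerically. The sketch below (the helper names are mine, purely for illustration) drops the low bits of each 8-bit gray value, the way a 5-bit channel effectively does, and counts how many distinct shades survive in a smooth 0-255 ramp:

```cpp
#include <cstdint>
#include <set>

// Keep only the top `bitsKept` bits of an 8-bit gray value,
// simulating what a lower-depth format does to subtle shading.
uint8_t QuantizeGray(uint8_t gray, int bitsKept) {
    int shift = 8 - bitsKept;
    return static_cast<uint8_t>((gray >> shift) << shift);
}

// Count the distinct shades left in a smooth 0-255 ramp after
// quantization; fewer shades means wider, more visible bands.
int DistinctShades(int bitsKept) {
    std::set<uint8_t> shades;
    for (int g = 0; g < 256; ++g)
        shades.insert(QuantizeGray(static_cast<uint8_t>(g), bitsKept));
    return static_cast<int>(shades.size());
}
```

A 5-bit channel leaves only 32 shades where an 8-bit channel had 256, so each band in the gradient becomes eight pixels' worth of values wide.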

Using Lossy Compression

A discussion of art storage wouldn't be complete without a look at the effects of a lossy compression scheme, such as JPG. The compression algorithm tweaks some values in the original art to achieve a higher compression ratio, hence the term "lossy." It's no mistake that if you spell-check the word "lossy" you get "lousy" as one of your choices. Beyond a certain threshold, the art degrades too much to get past your QA department, and it certainly won't get past the artist who spent so much time creating it.

Perhaps the best approach is to let artists decide how to save their own bitmaps, using the most aggressive compression they can stand. It still won't be enough, I guarantee you, but it's a start.

Sound and Music

Digital audio is commonly stored in either mono or stereo, sampled at different frequencies, and quantized to either 8 or 16 bits per sample. The effect of mono versus stereo on playback and storage size is obvious: stereo sound takes twice as much space to store but provides separate left and right channel waveforms. The different frequencies and bit depths have a more interesting, and quite drastic, effect on the sound.

Digital audio is created by sampling a waveform and converting it into discrete 8- or 16-bit values that approximate the original waveform. This works because the human ear has a relatively narrow range of sensitivity: 20Hz to 20,000Hz. A waveform must be sampled at twice the highest frequency you want to reproduce, so it's no surprise that the common frequencies for storing WAV files are 44.1KHz, 22.05KHz, and 11.025KHz; sampling at 44.1KHz captures everything up to about 22KHz, the upper edge of human hearing.

It turns out that telephone conversations are 8-bit values sampled at 8KHz, after the original waveform has been filtered to remove frequencies higher than 3.4KHz. Music on CDs is first filtered to remove sounds higher than 22KHz and then sampled at 16 bits, 44.1KHz. Just to summarize, Table 8.1 shows how you would use the different frequencies in digital audio.

Table 8.1: Using Different Audio Frequencies with Digital Formats.

  Format                        Quality           Size per Second   Size per Minute
  44.1KHz 16-bit stereo WAV     CD quality        172KB             10MB
  128Kbps stereo MP3            Near CD quality   16KB              960KB
  22.05KHz 16-bit stereo WAV    FM radio          86KB              5MB
  64Kbps stereo MP3             FM radio          8KB               480KB
  11.025KHz 16-bit mono WAV     AM radio          21.5KB            1.3MB
  11.025KHz 8-bit mono WAV      Telephone         10.8KB            645KB
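The WAV rows in Table 8.1 all follow from one formula: sample rate times channels times bytes per sample. This sketch (the function name is mine) shows the arithmetic:

```cpp
// Uncompressed PCM data rate in bytes per second:
// sample rate x channel count x bytes per sample.
long BytesPerSecond(long sampleRate, int channels, int bitsPerSample) {
    return sampleRate * channels * (bitsPerSample / 8);
}
```

CD quality works out to 44100 x 2 x 2 = 176,400 bytes per second, or roughly 172KB per second; multiply by 60 for the per-minute column.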

Best Practice

Use lower sampling rates for digital audio in your game to simulate telephone conversations or talking over shortwave radio.

Video and Cinematics

Animated sequences in games go as far back as Pac-Man, when after every few levels you'd see a little cartoon featuring the little yellow guy and his friends. The cartoons had little or nothing to do with the game, but they were fun to watch and gave players a reward. One of the first companies to use large amounts of video footage was Origin Systems, in the Wing Commander series; more than just rewarding players, those sequences actually told a story. Introductions, midgame, and endgame cinematics are not only common in today's games, they are expected.

There are two techniques worth considering for incorporating cinematic sequences. Some games like Wing Commander III will shoot live video segments and mix them into 3D rendered backgrounds using digital editing software like Adobe Premiere. The result is usually an enormous .AVI file that would fill up a good portion of your optical media. That file is usually compressed into something more usable by the game.

The second approach uses the game engine itself. For example, Grand Theft Auto: Vice City used the Renderware engine from Criterion Studios; the team created their sequences in 3D Studio Max or Maya and exported the animations. The animations can be played back by loading a tiny animation file and pumping it through the rendering engine. The only media you have to store beyond that is the sound. If you have tons of cinematic sequences, doing them in-game like this is the way to go.

The biggest difference your players will notice is the look of the cinematic. If an animation uses the engine, your players won't be mentally pulled out of the game world. In-game cut-scenes also flow seamlessly between the action and the narrative, whereas pre-rendered cut-scenes usually force a slight delay and interruption as the engine switches between in-game action and retrieving the cut-scene from the disc or hard drive. As a technologist, the biggest difference you'll notice is the smaller resulting data files: animation data is tiny compared to digital video.

Sometimes you'll want to show a cinematic that simply can't be rendered in real time by your graphics engine—perhaps something you need Maya to chew on for a few hours in a huge render farm. In that case you'll need to understand a little about streaming video and compression.

Streaming Video and Compression

Each video frame in your cinematic should pass through compression only once. Every compression pass will degrade the art quality. Prove this to yourself by compressing a piece of video two or three times and you'll see how bad it gets even with the second pass. If you can store your video completely uncompressed on a big RAID array on your network you'll also get a side benefit. Unless you have some hot video editing rig, it's really the only way you can manipulate your video frame-by-frame, or in groups of frames. Here are some settings I recommend:

  • Source Art and Sound: Leave it uncompressed, at 30fps; store art in TGA or TIF files, and all audio tracks as 44.1KHz WAV.

  • Compression Settings: Balance the tradeoff between data size and accuracy.

One drawback to storing video uncompressed is the size. A two-minute, 30fps video sequence at 800x600 resolution and 24-bit color weighs in at just under 5GB. You'll quickly realize that simply moving the frames from one computer to another is a royal pain in the ass.
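That 5GB figure is straightforward to verify. Here's the arithmetic as a sketch (the function name is mine):

```cpp
// Uncompressed video size in bytes:
// width x height x bytes per pixel x frames per second x seconds.
long long UncompressedVideoBytes(long long w, long long h,
                                 long long bytesPerPixel,
                                 long long fps, long long seconds) {
    return w * h * bytesPerPixel * fps * seconds;
}
```

For a two-minute clip: 800 x 600 x 3 x 30 x 120 = 5,184,000,000 bytes, or about 4.8GB.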

Best Practice

If you need to move a large dataset like uncompressed video from one network to another, use a portable computer. It might make security conscious IT guys freak out, but it's a useful alternative if you don't already have a DAT drive or DVD burner around. This is modern day "Sneakernet."

Don't waste your time backing up uncompressed video files. Instead, make sure you have everything you need to recreate them, such as a 3D Studio Max scene file or even the raw video tape. Back up that source along with the final compressed files.

Compression settings for streaming video can get complicated, and predicting how a setting will change the output is tricky. Getting a grasp of how compression works will help you understand which settings will work best for your footage. Video compression uses two main strategies to take a 5GB two-minute movie and boil it down into a file of 10MB or so. And just because the resolution drops doesn't mean you have to watch a postage-stamp-sized piece of video: most playback APIs will accept stretching parameters for the height, the width, or both.

The first strategy for compressing video is simply to remove unneeded information by reducing the resolution or interlacing the video. Reducing the resolution from 800x600 to 400x300 cuts the pixel count to one-quarter, shaving 3GB from a 4GB movie. An interlaced video alternates drawing the even and odd scanlines every other frame. This is exactly how television works: the electron gun completes a round trip from the top of the screen to the bottom and back at 60Hz, but it only draws every other scanline. The activated phosphors on the inside of the picture tube persist longer than 1/30th of a second after they've been hit with the electron gun, so the picture can be refreshed at that rate without noticeable degradation. Interlacing drops the dataset to one-half of its original size. Together, interlacing and resolution reduction can make a huge difference in your video size before the compression system even kicks in.
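The two reductions multiply, which a tiny sketch (hypothetical helper, written to mirror the reasoning above) makes obvious:

```cpp
// Fraction of the original data kept after pre-compression reductions:
// halving each dimension keeps 1/4 of the pixels, and interlacing keeps
// 1/2 of the scanlines. The effects multiply.
double FractionKept(bool halfResolution, bool interlaced) {
    double fraction = 1.0;
    if (halfResolution) fraction *= 0.25;
    if (interlaced)     fraction *= 0.5;
    return fraction;
}
```

Apply both and only one-eighth of the original data survives, before the codec has compressed a single frame.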

Video compression can be lossless, but in practice you should always take advantage of the compression ratios even a small amount of lossiness can give you. If you're planning on streaming the video from optical media you'll probably be forced to accept some lossiness simply to get your peak and average data rates down low enough for your minimum specification CD-ROMs. In any case, you'll want to check the maximum bitrate you can live with if the video stream is going to live on optical media. Most compression utilities give you the option of entering your maximum bit-rate. The resulting compression will attempt to satisfy your bit-rate limitations while keeping the resulting video as accurate to the original as possible. Table 8.2 shows the ideal bit rate that should be used for different CD-ROM and DVD speeds.

Table 8.2: Matching Bit Rates with CD-ROM/DVD Speeds.

  Technology   Maximum Data Rate
  1x CD        150KB / second
  1x DVD       1,385KB / second
  32x CD       4,800KB / second
  16x DVD      22,160KB / second
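The rates in Table 8.2 scale linearly with drive speed from the 1x baselines: 150KB per second for CD-ROM and 1,385KB per second for DVD. A quick sketch (the function names are mine):

```cpp
// Peak transfer rates for optical drives, in KB per second.
// The multiplier scales linearly from the 1x baseline rate.
int CDSpeedKBps(int multiplier)  { return 150 * multiplier; }
int DVDSpeedKBps(int multiplier) { return 1385 * multiplier; }
```

These are peak rates; when you enter a maximum bit rate into your compression utility, leave headroom below the drive's peak for seeks and for any other data the game streams at the same time.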




Game Coding Complete (ISBN 1932111751, 2003)