2D Drawing and DirectX


I mentioned DirectX 8 a moment ago, and you may have wondered why. 2D drawing and DirectX have had a shaky relationship for the past few years. Since DirectX 8 was introduced, the entire drawing pipeline has been 3D. The DirectX team at Microsoft was motivated to simplify the DirectDraw and Direct3D APIs, specifically concerning textures and DirectDraw surfaces, both essentially 2D arrays of pixels. DirectX 8 eliminated the DirectDraw interface entirely, leaving 2D games to stay with DirectX 7 or port their code to a 3D engine. DirectX 9 corrected this problem, but only somewhat: the DirectDraw API is part of DirectShow and can't be readily used with Direct3D.

When I wrote this chapter, I considered using DirectX 9, since it is the latest and greatest API from Microsoft. After about 50ms of deliberation, I decided to use the DirectX 7 interface for the examples. The DirectX 7 code in this chapter is well tested and shows important concepts that will be useful for any game that needs to manipulate pixel surfaces. Just so you know, DirectX 7 is old, but these calls will still work fine on any machine with DirectX 7, 8, 9, and so on.

Pixels and Video Hardware

Here's a bit of inane trivia: "pixel" is shorthand for "picture element." The term was coined when video displays were something more than oscilloscopes but something less than flat plasma displays. It's not surprising that video displays, being a human invention, are uniquely tuned to human physiology, while still governed by physics. What humans perceive as color is light, or more specifically, electromagnetic radiation in a narrow band of wavelengths. Humans can perceive color from deep reds at longer wavelengths to violet at shorter wavelengths. Not surprisingly, the color we call yellow is almost in the exact center of these two extremes, and it is almost exactly the color of our sun.

Two or more different colors of light can blend to form a new color. To represent the widest possible color range, you choose three primary colors: red, green, and blue. Mix these in varying proportions and you'll cover most, but not all, of the colors perceivable by the human eye.
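To make mixing concrete, here's a small sketch, in plain portable C++ rather than DirectX, of a 50/50 blend of two 32-bit colors. The helper name and the masking trick are mine, not from the book: halving each pixel with the low bit of every channel masked off lets one addition average red, green, and blue independently.

```cpp
#include <cstdint>

// Average two 32-bit R-8 G-8 B-8 colors channel by channel.
// (a >> 1) & 0x007F7F7F halves every channel without letting a low
// bit from one channel bleed into its neighbor; adding the halves
// then averages R, G, and B in a single operation.
inline uint32_t Average(uint32_t a, uint32_t b)
{
    return ((a >> 1) & 0x007F7F7Fu) + ((b >> 1) & 0x007F7F7Fu);
}
```

Blending pure red with pure blue this way yields a half-intensity magenta, just as mixing those two lights would.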

Color depth is the term that describes the number of different values that can be assigned to a single pixel. Table 6.1 lists the three common formats for color depth.

Table 6.1: The Three Common Formats for Color Depth.

Channel Depth    Channel Masks                         Colors         Bit Depth
R-8 G-8 B-8      0x00FF0000, 0x0000FF00, 0x000000FF    16.7 million   32-bit
R-5 G-5 B-5      0x7C00, 0x03E0, 0x001F                32,768         16-bit
R-5 G-6 B-5      0xF800, 0x07E0, 0x001F                65,536         16-bit

The sequence of bits represents the intensity of each color component. For example, a 32-bit pixel contains 8 bits each for red, green, and blue. If all of the bits for red are set to zero, the resulting color will have no red component at all. If the red bits were set to the binary value 10000000b, the red component would be at approximately 50% intensity.
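Here's a small sketch, in plain portable C++ rather than DirectX, of how the 32-bit masks from Table 6.1 pull a pixel apart and put it back together. The helper names are mine, invented for illustration:

```cpp
#include <cstdint>

// Extract the 8-bit channels of a 32-bit pixel using the masks
// from Table 6.1 (0x00FF0000 red, 0x0000FF00 green, 0x000000FF blue).
inline uint8_t RedOf(uint32_t pixel)   { return uint8_t((pixel & 0x00FF0000u) >> 16); }
inline uint8_t GreenOf(uint32_t pixel) { return uint8_t((pixel & 0x0000FF00u) >> 8); }
inline uint8_t BlueOf(uint32_t pixel)  { return uint8_t( pixel & 0x000000FFu); }

// Pack three 8-bit intensities back into one 32-bit pixel.
inline uint32_t MakePixel(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint32_t(r) << 16) | (uint32_t(g) << 8) | b;
}
```

Setting the red byte to 0x80 (10000000b) gives the roughly 50% red intensity described above.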

The number of bits for each component affects the number of discrete intensity values from 0% to 100%. If the red component has 8 bits, red can have 256 intensity settings; if it had only 5 bits, it could have just 32 different values. Another way to look at it: pure black and pure white can always be represented, since all the color bits can be either zero or one, but getting an exact match for Barbie pink might be troublesome if you only have 5 bits for red.
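You can see the loss of precision by quantizing a 24-bit color down to the R-5 G-6 B-5 format from Table 6.1. This sketch (the function name is mine) simply keeps each channel's most significant bits, so 8-bit red collapses from 256 possible intensities to 32:

```cpp
#include <cstdint>

// Quantize 8-bit R, G, B values into the 16-bit R-5 G-6 B-5 layout
// (masks 0xF800, 0x07E0, 0x001F). Red and blue keep their top 5 bits,
// green keeps its top 6.
inline uint16_t To565(uint8_t r, uint8_t g, uint8_t b)
{
    return uint16_t(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

Note that black (all zeros) and white (all ones) survive exactly, as the text promises; it's the in-between shades, like that Barbie pink, that get rounded.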

Believe it or not, some video cards support a 48-bit mode, with 16 bits per color channel. That gives you more than 281 trillion colors, and an uncompressed 800x600 screen takes 2.74MB! I'm sure an artist had something to do with the design of that video card.

Gotcha

We also still have palettized, or indexed, formats to deal with. Here programmers and artists are forced to choose 256 colors, each one 24-bit, with which to draw their screen. Palettized modes are a pain to work with and generally don't look very good applied to an entire screen; when we moved to 16- and 32-bit displays, all you heard was cheering from every programmer and artist I knew. Still, it remains common practice to store individual pieces of art for backgrounds, textures, or sprites in an 8-bit palettized format, since it keeps a high degree of accuracy in the art and saves space at the same time.
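The mechanics of a palettized format boil down to one table lookup per pixel. Here's a minimal sketch (the function name is mine, not an API from the book) that expands an 8-bit indexed image into 32-bit pixels:

```cpp
#include <cstddef>
#include <cstdint>

// Expand an 8-bit palettized image into 32-bit pixels: each source
// byte indexes a 256-entry table of colors (24-bit values stored in
// 32 bits for alignment). One lookup per pixel.
void ExpandPalettized(const uint8_t* src, uint32_t* dst, size_t count,
                      const uint32_t palette[256])
{
    for (size_t i = 0; i < count; ++i)
        dst[i] = palette[src[i]];
}
```

The space savings are clear: the image itself costs one byte per pixel instead of four, plus a fixed 1KB for the table.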

Now that you know how each pixel is stored, let's move on and learn a little about video hardware.

Video Hardware and Buffers

A video card is essentially a tool to manipulate and examine two-dimensional arrays of pixels. At their simplest, video cards scan the contents of their memory and convert each pixel to an electrical signal sent to a CRT. The scan rate is usually 60Hz or faster, or 50Hz if you happen to live in Europe, where the PAL standard is used. The scan rate is timed to coincide exactly with the scan rate of the electron guns in the CRT. When the contents of video memory change, the results are displayed the next time the pixel is scanned.

You might have heard the term "vertical blank" and wondered what it meant. The vertical blank is a signal sent from the monitor to the video card each time the electron guns skip back to the top of the display. It's important to change the screen pixels only when the electron guns aren't drawing, and the vertical blank is your signal to make any screen changes. If you don't, you'll witness an effect called "tearing." This is what happens when you change pixels in video memory while the electron guns are scanning the image. You'll notice a small discontinuity in your shapes, as shown in Figure 6.1.

Figure 6.1: How Tearing Can Impact a Display.

This happens because you can't change all the pixels fast enough. Since you can't change every pixel at once, the electron guns will inevitably draw some pixels of the old frame and some pixels of the new frame. This drawing technique is called single buffering, since one and only one buffer holds your pixels. In a single-buffered solution, your only hope of avoiding tearing is to change all of the pixels in the video buffer in one lightning-quick move while the electron guns are moving back to the top of the screen. Even in the dark ages (circa 1990), an off-screen buffer held the contents of the next frame. This buffer was stored in regular system memory; all the draw code changed the off-screen buffer while the video card drew the contents of video memory over and over again.

When the off-screen buffer, sometimes called a backbuffer, was completely rendered, it was copied to video memory, the frontbuffer. Copying the bits from the backbuffer to the frontbuffer is fast enough to happen completely during the vertical blank, so all the programmer had to do was wait for the vertical blank signal and copy the bits. Old Origin lingo called this process "slamming"; today's vernacular calls it "blitting."

DirectX's WaitForVerticalBlank() method is used to tap into the monitor signal:

 m_pDD->WaitForVerticalBlank(DDWAITVB_BLOCKBEGIN, NULL); 

The problem with this method is that copying the bits from the backbuffer to the frontbuffer depended on waiting for the vertical blank signal. On average, the system would wait 1/120th of a second every frame if the monitor scanned at 60Hz. A few hardware solutions attempted to minimize this problem by providing faster bit transfer rates over better bus architectures, or even direct memory access, where video cards could "pull" the bits out of main memory without the CPU's involvement.

These solutions were OK, but still fell short. Video cards soon shipped with more memory on board, enough to hold more than two full screens of pixels. The pixel scanning could be programmed via the video driver to begin at any point in video memory. Thus it became possible to construct the backbuffer on the video card and, when it was done, change the scanning start address to the backbuffer. The area of memory that was previously displayed was then free to be the scratchpad for the next frame.

Reassigning the front and backbuffers each frame is known as flipping. The best part is how fast it is; the huge bit transfers from the backbuffer to the frontbuffer are a thing of the past. The CPU still has to wait for the go-ahead to begin modifying the backbuffer, though. The vertical blank signal is still the green light to change pixels; until the flip actually occurs, you might be changing pixels the player can see.

Gotcha

One quick aside for beleaguered DirectX programmers: The flipping chain is smart enough to reassign internal data so you don't have to. The member variable that points to your front buffer always points to your front buffer; it doesn't matter how many times you call Flip(). I think this confused me when I first saw it. I had an incredible urge to do something like this:

 hr = m_pddsFrontBuffer->Flip( NULL, 0 );
 LPDIRECTDRAWSURFACE7 temp = m_pddsFrontBuffer;
 m_pddsFrontBuffer = m_pddsBackBuffer;
 m_pddsBackBuffer = temp;

Don't go down that road because it is completely incorrect. DirectX flipping chains take care of this for you—your back buffer is always your back buffer, and your front buffer is always your front buffer, no matter how many times you flip.

Now you know about single buffering, double buffering, and double-buffered flipping, but there's still one more technique: triple buffering. Triple buffering adds one more surface to the flipping chain, so that after three flips the original buffer is the front buffer again. The advantage of triple buffering is that after a flip, the CPU can move directly to constructing the pixels for the next frame; you don't have to wait for the vertical blank. The newly constructed frame and the previously constructed frame exist on the other two buffers, and the scratch buffer hasn't seen the light of day for two flips.
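One way to model a triple-buffered flip chain is as a ring of three buffer indices. This sketch is my own simplification, not DirectX code; it just demonstrates the property described above, that after three flips the original buffer is back at the front:

```cpp
// A minimal model of a triple-buffer flip chain: three buffers in a
// ring. The front buffer is being scanned out; the CPU draws into the
// back buffer; the third holds the frame queued for display.
struct FlipChain
{
    int front = 0;                                  // index being scanned out
    int Back() const { return (front + 2) % 3; }    // index the CPU draws into
    void Flip()      { front = (front + 1) % 3; }   // advance the ring
};
```

Contrast this with the DirectDraw flipping chain in the Gotcha above, which hides the rotation entirely: your front buffer pointer always points at the front.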

What's the downside? Double and triple buffered surfaces require more memory, that's pretty clear. Since they take more video memory, there's less left over for other things such as commonly used sprites or textures that need to exist in video memory for ultra fast drawing.

Best Practice

One more note about flipping surfaces: the Windows GDI only draws to one surface. If you ever call a GDI function, such as MessageBox(), you must call the DirectX method FlipToGDISurface() beforehand. If you don't, you take your chances that GDI will draw to the wrong surface. One more thing: the surface you are about to flip to, the one GDI draws on, had better look right. If it holds an old frame from two flips ago, your player will notice.

Video and Memory Surfaces

Video and memory surfaces are two-dimensional arrays of pixels. They differ only in the details of their internal structure, which is usually hidden from the programmer by a class. While the structure may be, and should be, hidden, it's important to understand these differences so you'll recognize the source of problems when they occur.

Video surfaces don't organize pixels for memory efficiency; they organize them for speed. This affects two things: the effective space taken up by a pixel, and the effective space used by a scan line. 24-bit color isn't really 24-bit when it gets to the video card; video hardware stores 24-bit pixels in 32 bits, or four bytes. It may seem like a waste of memory, but the added speed is well worth it: multiplying by three is a lot slower than shifting left by two, which is what the hardware would otherwise have to do to find pixel addresses in a 24-bit system.

Scan line width is not necessarily the same as the number of bytes per pixel multiplied by the number of pixels on the scan line. Scan lines are aligned on convenient boundaries defined by the video card engineers; this boundary might be four bytes, eight bytes, or more. The number of bytes actually used by a scan line, which includes the pixel data plus zero or more padding bytes at the end, is called the pitch. Figure 6.2 shows the difference.
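The practical consequence of pitch is that pixel addressing must step between rows by the pitch, never by width times bytes per pixel. Here's a small sketch (the helper name is mine) for a 32-bit surface:

```cpp
#include <cstdint>

// Address of pixel (x, y) in a 32-bit surface whose scan lines are
// `pitch` bytes apart. The row step uses the pitch, not width * 4;
// only the step within a row uses the pixel size.
inline uint32_t* PixelAt(void* base, long pitch, int x, int y)
{
    return reinterpret_cast<uint32_t*>(
        static_cast<uint8_t*>(base) + y * pitch) + x;
}
```

With DirectDraw you'd get `base` and `pitch` from the surface (for example, the lPitch field filled in by a Lock call), but the arithmetic is the same anywhere a surface has padded rows.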

Figure 6.2: The Difference between Width and Pitch.

Our friend from Diablo II is sitting on a DirectDraw surface with a width of 78 bytes. When the surface was created, the DirectDraw system saw fit to allocate it with a pitch of 128 bytes. If you think this is a colossal waste of memory, you're right. Depending on your display driver, surface memory might be assigned only in byte widths that are powers of two. In any case, always be aware that your pitch might be vastly different from your width.

Gotcha

If you ever see a piece of art that looks like an old TV set with a scrambled, slanted picture, you've copied the pixels without taking the pitch into account. The slant will go from top-left to bottom-right if the destination pitch is bigger than the source, and from top-right to bottom-left if the destination pitch is smaller than the source.
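The cure for the slanted-TV artifact is to copy row by row, advancing each side by its own pitch. A minimal sketch (the function name is mine):

```cpp
#include <cstdint>
#include <cstring>

// Copy an image between two buffers whose pitches may differ.
// Each row copies only widthBytes of real pixel data; the source and
// destination pointers then advance by their own pitches, so padding
// bytes never shift the image.
void CopyRows(const uint8_t* src, long srcPitch,
              uint8_t* dst, long dstPitch,
              long widthBytes, long rows)
{
    for (long y = 0; y < rows; ++y)
        std::memcpy(dst + y * dstPitch, src + y * srcPitch, widthBytes);
}
```

A single memcpy of pitch * rows bytes would be faster, but it is only correct when both pitches happen to match; the loop above is safe everywhere.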

The Windows BMP format also does this: each line of a BMP aligns on a four-byte boundary. If you can't seem to parse bitmap files and all you get is funny slanted art, check your code; I'll bet it assumes that the pitch is equal to the width, which it's not.
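The BMP row size follows directly from that four-byte alignment rule. This sketch (the function name is mine) computes the bytes per scan line for a given width and color depth:

```cpp
// Bytes per scan line in a Windows BMP: the row's pixel bits are
// rounded up to the next multiple of 32 bits (four bytes).
inline long BmpStride(long widthPixels, long bitsPerPixel)
{
    return ((widthPixels * bitsPerPixel + 31) / 32) * 4;
}
```

For example, a 3-pixel-wide 24-bit image needs 9 bytes of pixel data per row but occupies 12 bytes per row in the file.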

Memory surfaces are organized any way the programmer chooses. They might mirror the bigger but speedier arrangement of video memory, or they might compress the image into as small a memory space as possible. So how do you go about creating a surface? You can look at DirectX samples all day if you just need to know how to fill in a DDSURFACEDESC2, but there are a few tricks you should know. The following code puts these tricks to work:

 LPDIRECTDRAWSURFACE7 CreateSurface( DWORD dwWidth, DWORD dwHeight,
                                     DWORD dwCaps, BOOL favorVRAM )
 {
    if( NULL == g_pDD )          // pointer to the DirectDraw object
       return NULL;

    HRESULT hr;
    LPDIRECTDRAWSURFACE7 pSurface = NULL;
    DDSURFACEDESC2 ddsd;

    ZeroMemory( &ddsd, sizeof( ddsd ) );
    ddsd.dwSize  = sizeof( ddsd );
    ddsd.dwFlags = DDSD_CAPS | DDSD_WIDTH | DDSD_HEIGHT;

    if( favorVRAM && dwHeight <= g_ScreenHeight && dwWidth <= g_ScreenWidth )
    {
       ddsd.ddsCaps.dwCaps = DDSCAPS_VIDEOMEMORY;
    }
    else
    {
       ddsd.ddsCaps.dwCaps = DDSCAPS_SYSTEMMEMORY | DDSCAPS_OFFSCREENPLAIN;
    }
    ddsd.ddsCaps.dwCaps |= dwCaps;
    ddsd.dwWidth  = dwWidth;
    ddsd.dwHeight = dwHeight;

    if( FAILED( hr = g_pDD->CreateSurface( &ddsd, &pSurface, NULL ) ) )
    {
       // Video memory failed; fall back to a system memory surface.
       ddsd.ddsCaps.dwCaps =
          dwCaps | DDSCAPS_OFFSCREENPLAIN | DDSCAPS_SYSTEMMEMORY;
       if( FAILED( hr = g_pDD->CreateSurface( &ddsd, &pSurface, NULL ) ) )
       {
          return NULL;
       }
    }
    return pSurface;
 }

There are a few critical things in this function. First, it is smart enough to check the validity of the width and height of a surface that wants to live in video memory; there are a few video cards out there that freak out when you allocate a surface wider or higher than the current screen resolution. Second, if an attempt to create a surface in video memory fails, the function will attempt to create the surface in system memory. The only problem with system memory surfaces is that they'll run slower on some machines, sometimes much slower. Your game might follow the above code sample and fail over to the system memory surface, but another valid solution is to fail entirely: you might have a video memory leak, be using too much video memory, or be running on a machine with a lame video card.

Now that you've got a surface to draw to, you'll want some nice drawing functions.




Game Coding Complete
ISBN: 1932111751
Year: 2003