Overview of the T&L Pipeline


The process of transformation and lighting is often thought of as a pipeline (though perhaps a factory assembly line would be a better metaphor), in which untransformed and unlit vertices enter one end, several sequential operations are performed on the vertices inside, and then the transformed and lit vertices exit from the other end. Your application sets up the T&L pipeline by specifying several matrices, the viewport, and any lights you want to use. Then the application feeds vertices into the pipeline, which transforms the vertices, lights them, clips them, projects them into screen space, and scales them as specified by the viewport. Vertices exiting the pipeline are considered to be "processed" and are ready to be handed to the rasterizer. The T&L pipeline is depicted in Figure 5-2.


Figure 5-2 The T&L pipeline

It is possible to configure the T&L pipeline to skip some or all of the steps shown here. Programmers who have implemented their own transformation and lighting algorithms can disable parts of the pipeline and send vertices that are already transformed and lit to Direct3D. In most cases, however, it's best to use Direct3D's full T&L pipeline. This code has been optimized to make use of all the latest CPU extensions, and on some 3D graphics cards, T&L can be performed extremely quickly by dedicated hardware. For this discussion, let's assume that the programmer wants Direct3D to perform the full T&L process, and let's watch what happens to a vertex as it passes through the various stages of the pipeline.

World Transformation

During the T&L process, all the coordinates of the various objects to be rendered need to be converted into a common coordinate system, known as world space. But it is usually convenient for your program to express the coordinates of each object in its own, local coordinate system, known as model space (or local space). The matrix defining the transformation between model space and world space is called the world transformation matrix. Using model space makes life easier in several ways. For example, it's easier and faster to move an object by simply redefining the transformation from model space to world space than it would be to manually change all the coordinates of the object in world space. Model space also allows instancing, in which you can draw an object, such as a sphere, using one model-to-world transformation, and then draw it again somewhere else using a different model-to-world transformation. The model space also allows more natural local transformations. For example, rotating a sphere around its center is easiest when the origin is at the center of the sphere, regardless of where in world space the sphere is positioned.

The first stage of the T&L pipeline uses the world transformation matrix that you specify to transform the location coordinates in the object's vertices from local space into world space. The world transformation matrix can use any combination of rotations (rotating the object about the x-axis, y-axis, or z-axis), translations (moving the object along the x-axis, y-axis, or z-axis), and scaling (enlarging or shrinking the object). Figure 5-3 shows the relationship between the world coordinate system and a model's local coordinate system.

The most important aspect of the world coordinate system is that it provides a coordinate space that all the 3D objects share instead of requiring a unique coordinate system for each 3D object. Once the vertices are specified in world coordinates, Direct3D doesn't have to remember any local coordinates or deal with any model-to-world transformations. The conversion from the local coordinate system to the world coordinate system is analogous to converting various objects expressed in pounds, kilograms, and tons into grams, providing a common denominator for all the world's objects. If you don't need to use a separate model space and want to specify an object's vertices directly in world coordinates, you can just make the world transformation matrix an identity matrix. This indicates that the object's model space is equivalent to the common world space.


Figure 5-3 World transformation (world and local coordinate systems)
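In code, the world transformation amounts to multiplying each model-space position (with an implicit w of 1.0) by a 4 x 4 matrix. The sketch below is illustrative, not Direct3D API code: it uses Direct3D's row-vector convention (v' = v * M) with hypothetical Mat4/Vec3 helpers to build a world matrix from a y-axis rotation followed by a translation.

```cpp
#include <cassert>
#include <cmath>

// Illustrative 4x4 matrix using Direct3D's row-vector convention:
// a point is transformed as v' = v * M, so the product R * T applies
// the rotation first and the translation second.
struct Mat4 { float m[4][4]; };
struct Vec3 { float x, y, z; };

Mat4 identity() {
    Mat4 r = {};
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}

Mat4 translation(float x, float y, float z) {
    Mat4 r = identity();
    r.m[3][0] = x; r.m[3][1] = y; r.m[3][2] = z;  // bottom row holds the offset
    return r;
}

Mat4 rotationY(float angle) {
    Mat4 r = identity();
    float c = std::cos(angle), s = std::sin(angle);
    r.m[0][0] = c;  r.m[0][2] = -s;
    r.m[2][0] = s;  r.m[2][2] = c;
    return r;
}

Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r = {};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Transform a point with an implicit w of 1.0 (row vector times matrix).
Vec3 transformPoint(const Vec3& v, const Mat4& M) {
    return { v.x * M.m[0][0] + v.y * M.m[1][0] + v.z * M.m[2][0] + M.m[3][0],
             v.x * M.m[0][1] + v.y * M.m[1][1] + v.z * M.m[2][1] + M.m[3][1],
             v.x * M.m[0][2] + v.y * M.m[1][2] + v.z * M.m[2][2] + M.m[3][2] };
}
```

With this convention, mul(rotationY(a), translation(tx, ty, tz)) spins a model about its own origin and then places it in the world, which is exactly the instancing pattern described above: the same model drawn twice with two different world matrices.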

View Transformation

World space is still fairly abstract in the sense that the location of the origin and the axes of the space are completely up to the programmer and have no significance to Direct3D. The second stage of the T&L pipeline transforms the vertices from world space into camera space, in which the virtual camera is at the origin and is pointing directly down the positive z-axis. Lights (which are specified in world space) are also transformed into camera space at this stage. Figure 5-4 illustrates this concept.


Figure 5-4 View transformation
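The view matrix is typically built from a camera position, a point to look at, and an "up" direction. The sketch below shows the standard left-handed "look-at" construction (the same one D3DX performs for a left-handed camera); the helper names are illustrative. After transforming by this matrix, the camera sits at the origin looking down the positive z-axis, as the text describes.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };

static Vec3 sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
static Vec3 normalize(Vec3 v) {
    float len = std::sqrt(dot(v, v));
    return { v.x / len, v.y / len, v.z / len };
}

// Left-handed look-at view matrix (row-vector convention, v' = v * M).
Mat4 lookAtLH(Vec3 eye, Vec3 at, Vec3 up) {
    Vec3 zaxis = normalize(sub(at, eye));      // camera's forward direction
    Vec3 xaxis = normalize(cross(up, zaxis));  // camera's right direction
    Vec3 yaxis = cross(zaxis, xaxis);          // camera's up direction
    Mat4 v = {};
    v.m[0][0] = xaxis.x; v.m[0][1] = yaxis.x; v.m[0][2] = zaxis.x;
    v.m[1][0] = xaxis.y; v.m[1][1] = yaxis.y; v.m[1][2] = zaxis.y;
    v.m[2][0] = xaxis.z; v.m[2][1] = yaxis.z; v.m[2][2] = zaxis.z;
    v.m[3][0] = -dot(xaxis, eye);              // translation moves the eye
    v.m[3][1] = -dot(yaxis, eye);              // to the origin of camera space
    v.m[3][2] = -dot(zaxis, eye);
    v.m[3][3] = 1.0f;
    return v;
}

// Row-vector transform with implicit w = 1.0, as in the world stage.
Vec3 transformPoint(Vec3 p, const Mat4& M) {
    return { p.x * M.m[0][0] + p.y * M.m[1][0] + p.z * M.m[2][0] + M.m[3][0],
             p.x * M.m[0][1] + p.y * M.m[1][1] + p.z * M.m[2][1] + M.m[3][1],
             p.x * M.m[0][2] + p.y * M.m[1][2] + p.z * M.m[2][2] + M.m[3][2] };
}
```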

Lighting

At this point, the effect of any current lights is calculated and applied to the vertices. The lighting code looks at the position of each vertex, its normal vector (the vector pointing away from the polygon containing the vertex), its color, and the current material properties. It calculates the effect of each light on the vertex based on all these factors and on the properties of the light, and stores the resulting color of the vertex back into the vertex structure. From this point forward, Direct3D doesn't need to deal with lights or materials.
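As a rough sketch of the per-vertex calculation this stage performs, the function below computes just the diffuse term for a single directional light using Lambert's law. It is a simplification, not the full fixed-function model: the real pipeline also adds ambient, specular, and emissive terms, supports point lights and spotlights with attenuation, and the struct names here are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3  { float x, y, z; };
struct Color { float r, g, b; };

static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Diffuse contribution of one directional light on one vertex.
// 'normal' is the vertex normal (assumed unit length) and 'lightDir' is the
// direction the light travels, so the vector toward the light is its negation.
Color diffuseLight(Vec3 normal, Vec3 lightDir,
                   Color materialDiffuse, Color lightDiffuse) {
    Vec3 toLight = { -lightDir.x, -lightDir.y, -lightDir.z };
    float intensity = std::max(0.0f, dot(normal, toLight)); // Lambert's law
    return { materialDiffuse.r * lightDiffuse.r * intensity,
             materialDiffuse.g * lightDiffuse.g * intensity,
             materialDiffuse.b * lightDiffuse.b * intensity };
}
```

The max with zero is what makes surfaces facing away from a light receive no diffuse contribution from it; the result is what gets stored back into the vertex as its color.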

Projection Transformation and the Viewing Frustum

The next stage of the pipeline scales the objects in the scene based on their distance from the viewpoint specified. This scaling, called projection transformation, produces the appearance of depth in the rendered scene by making objects in the distance appear smaller than those closer to the viewpoint. After the projection transformation is applied, the vertices are considered to be in projection space.

To understand how the projection transformation works, it is helpful to think about the viewing frustum. A frustum is the geometric term for a pyramid with its pointed tip removed. In computer graphics, the viewing frustum is formed by conceptually placing a pyramid so that its tip is at the camera position, with the camera pointing down the middle of the pyramid; projecting the pyramid's four "walls" through the four sides of the screen; and chopping off the front and back of the pyramid at the near and far clipping planes (explained below). The resulting viewing frustum represents the volume of camera space that will be visible in the rendered scene.

Although you can use many types of projections, which affect how the 3D models are projected from camera space onto the screen, this book describes the most common type: perspective projection, in which objects farther from the camera are made smaller than those closer to it. Another type of projection, which doesn't scale the size of the objects in the scene, is orthogonal projection. Although orthogonal projection is useful for some applications, you'll probably want the standard approach of scaling objects by distance for most first-person-perspective games.

To perform perspective projection, the projection transformation converts the viewing frustum into a cuboid, a cubelike shape in which the dimensions are not all equal. Because the near end of a viewing frustum is smaller than the far end, closer objects appear larger than farther ones, generating the perspective effect in the scene.

Figure 5-5 illustrates the viewing frustum's components.

Figure 5-5 The viewing frustum

Figure 5-5 also shows the front and back clipping planes, which together define what is visible to the viewer when the scene is rendered. The front clipping plane defines the closest distance at which an object can be included in the rendered scene, and the back clipping plane defines the farthest. Any objects outside the frustum are not rendered. The near and far clipping planes are necessary because they set the minimum and maximum values in the z-buffer; without a far plane, the renderer wouldn't know what distance to map to the maximum z value. Keep in mind that where you place the far plane affects both the speed and the visual quality of the scene. If you set the plane too close, you'll get popping: the perception that objects jump into view suddenly rather than easing in gradually from far away, where they are small and thus less noticeable. If you set the plane too far away, you make clipping less effective: more objects are rendered, so rendering takes longer.

The viewing frustum is described by the field of view (fov), the angle the frustum's planes form at the camera, and by the distances from the viewpoint to the front and back clipping planes. These distances are defined as the z coordinates of the front and back planes. The variable D in the figure is the distance from the camera (the origin of the space defined by the viewing transformation) to the front clipping plane.

Figure 5-6 illustrates how the projection transformation converts the viewing frustum into a new coordinate space. Because we're using a perspective projection, the frustum becomes a cuboid. After projection is complete, the limits of the x dimension are -1 for the left plane and 1 for the right plane, the limits of the y dimension are -1 for the bottom plane and 1 for the top plane, and the limits of the z dimension are 0 for the front plane and 1 for the back plane.
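A perspective matrix that produces exactly these ranges can be sketched as follows. This mirrors the fov-based left-handed construction D3DX uses, but with illustrative standalone types; note that the matrix leaves the result homogeneous, and the [-1, 1] and [0, 1] limits appear only after the divide by w described later.

```cpp
#include <cassert>
#include <cmath>

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };

// Left-handed perspective projection: after the divide by w, x and y land
// in [-1, 1] and z lands in [0, 1]. fovY is the vertical field of view in
// radians; zn and zf are the near and far clipping plane distances.
Mat4 perspectiveFovLH(float fovY, float aspect, float zn, float zf) {
    float ys = 1.0f / std::tan(fovY * 0.5f);   // cot(fov/2) scales y ...
    float xs = ys / aspect;                    // ... and x, corrected for aspect
    Mat4 p = {};
    p.m[0][0] = xs;
    p.m[1][1] = ys;
    p.m[2][2] = zf / (zf - zn);                // maps [zn, zf] to [0, 1]
    p.m[2][3] = 1.0f;                          // w receives camera-space z
    p.m[3][2] = -zn * zf / (zf - zn);
    return p;
}

// Full row-vector transform of a camera-space point (w starts at 1.0).
Vec4 transform(Vec4 v, const Mat4& M) {
    Vec4 r;
    r.x = v.x * M.m[0][0] + v.y * M.m[1][0] + v.z * M.m[2][0] + v.w * M.m[3][0];
    r.y = v.x * M.m[0][1] + v.y * M.m[1][1] + v.z * M.m[2][1] + v.w * M.m[3][1];
    r.z = v.x * M.m[0][2] + v.y * M.m[1][2] + v.z * M.m[2][2] + v.w * M.m[3][2];
    r.w = v.x * M.m[0][3] + v.y * M.m[1][3] + v.z * M.m[2][3] + v.w * M.m[3][3];
    return r;
}
```

A quick sanity check: a point on the near plane comes out with z/w = 0, a point on the far plane with z/w = 1, and a point on the top edge of a 90-degree fov with y/w = 1, matching the cuboid limits described above.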


Figure 5-6 Perspective projection

Clipping

At this point, clipping takes place. Clipping is the process of ensuring that objects that are completely outside of the viewing frustum don't get rendered, and those objects that intersect the viewing frustum are drawn in such a way that no pixels are drawn outside the rectangle specified by the viewport. Clipping is the only part of the T&L pipeline that needs information about the primitives that connect the vertices. For example, if one vertex of a triangle is outside the view frustum, the clipping code needs to determine the two points at which the edges of the triangle intersect the frustum, and it needs to break the triangle into two triangles because the outline of the clipped triangle now has four sides. So if primitive information is available to the T&L pipeline (as with a DrawPrimitive call), full clipping takes place. If the T&L pipeline is just transforming and lighting a vertex buffer (which has no associated primitive information), as with a ProcessVertices call, the pipeline just determines and records which vertices are outside each plane of the viewing frustum. This information is used in a later DrawPrimitive call to clip the primitives using that vertex buffer.
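One common way to organize the per-vertex test described above is an "outcode" bitmask computed in homogeneous clip space, where a vertex is inside the frustum when -w <= x <= w, -w <= y <= w, and 0 <= z <= w. If the bitwise AND of a triangle's three outcodes is nonzero, all three vertices lie outside the same plane and the triangle can be rejected outright; if all three outcodes are zero, no clipping is needed at all. The bit names and layout here are illustrative, not Direct3D's internal representation.

```cpp
#include <cassert>

// Frustum-plane outcode for a vertex in homogeneous clip space.
// Inside the frustum means -w <= x <= w, -w <= y <= w, 0 <= z <= w.
enum {
    OUT_LEFT   = 1 << 0,
    OUT_RIGHT  = 1 << 1,
    OUT_BOTTOM = 1 << 2,
    OUT_TOP    = 1 << 3,
    OUT_NEAR   = 1 << 4,
    OUT_FAR    = 1 << 5,
};

unsigned outcode(float x, float y, float z, float w) {
    unsigned code = 0;
    if (x < -w)   code |= OUT_LEFT;
    if (x >  w)   code |= OUT_RIGHT;
    if (y < -w)   code |= OUT_BOTTOM;
    if (y >  w)   code |= OUT_TOP;
    if (z < 0.0f) code |= OUT_NEAR;
    if (z >  w)   code |= OUT_FAR;
    return code;
}

// A triangle is trivially rejected when all three vertices are outside
// the same frustum plane; only triangles with mixed codes need real clipping.
bool triviallyRejected(unsigned c0, unsigned c1, unsigned c2) {
    return (c0 & c1 & c2) != 0;
}
```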

Dividing by w, or Nonhomogenization

At this point, Direct3D needs to convert the vertices from homogeneous to nonhomogeneous form. When you specify Direct3D vertices, you provide x, y, and z coordinates for each one, but in order to perform transformation and lighting, an additional coordinate, called w, is added and given an initial value of 1.0. As the vertex passes through the various matrices in the T&L pipeline, the w coordinate changes value; a vertex whose w is no longer 1.0 is said to be homogeneous. Once transformation is finished, the vertex is restored to nonhomogeneous form by dividing the x, y, and z coordinates by w; the reciprocal of the w coordinate is stored as well. This process is also known as "dividing by w." The conversion is necessary because the rasterizer expects to receive vertices in terms of their nonhomogeneous x, y, and z locations along with the reciprocal of homogeneous w (RHW).
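The divide itself is a one-liner per vertex. In this sketch (the struct name is illustrative), the reciprocal is computed once and reused, which is both how RHW is obtained and why the operation is cheap:

```cpp
#include <cassert>

// The homogeneous divide: recover nonhomogeneous x, y, z and keep the
// reciprocal of w (RHW), which the rasterizer later uses for
// perspective-correct interpolation.
struct ProcessedVertex { float x, y, z, rhw; };

ProcessedVertex divideByW(float x, float y, float z, float w) {
    float rhw = 1.0f / w;                 // reciprocal of homogeneous w
    return { x * rhw, y * rhw, z * rhw, rhw };
}
```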

Viewport Scaling

The final step of the T&L pipeline is to adjust the vertices to fit the viewport. The viewport lets you specify how to map the rendered image onto the render target surface. You can specify both a translation and a scaling operation. Generally, you want to fill the entire render target, so you specify no translation, and scale the vertex coordinates so that an x coordinate of -1 maps to the left edge, an x of 1 maps to the right edge, a y of -1 maps to the bottom edge, and a y of 1 maps to the top. You can also specify a scaling of the z coordinates if you want to render into a particular depth range. To use the full range, you set the viewport to use a minimum z of 0.0 and a maximum z of 1.0.
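The mapping can be sketched as below. The Viewport fields mirror the ones Direct3D's viewport exposes (a position, a size, and a depth range), but the types here are standalone illustrations. Note the y flip: projection space has y increasing upward, while screen coordinates grow downward, so y = 1 lands on the top edge.

```cpp
#include <cassert>

// A viewport: where on the render target to draw, and into what depth range.
struct Viewport {
    float x, y;            // top-left corner on the render target
    float width, height;   // size in pixels
    float minZ, maxZ;      // depth range, usually 0.0 to 1.0
};

struct ScreenVertex { float sx, sy, sz; };

// Map a vertex from projection space (x, y in [-1, 1], z in [0, 1])
// onto the viewport.
ScreenVertex viewportScale(float px, float py, float pz, const Viewport& vp) {
    ScreenVertex s;
    s.sx = vp.x + (px + 1.0f) * 0.5f * vp.width;
    s.sy = vp.y + (1.0f - py) * 0.5f * vp.height;  // flip y for screen space
    s.sz = vp.minZ + pz * (vp.maxZ - vp.minZ);
    return s;
}
```

For a full-surface 640 x 480 viewport with the default depth range, (-1, 1) maps to the top-left pixel corner and (1, -1) to the bottom-right, so the rendered image fills the whole render target.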



Inside Direct3D
Inside Direct3D (Dv-Mps Inside)
ISBN: 0735606137
Year: 1999
Pages: 131
