Although the rendering hardware is built to primarily draw 3D triangles, how you set them up and manipulate them has evolved into a rich system of algorithms using 3D mathematics and other techniques. This section is meant to be a broad overview of these 3D concepts. This material is considered the foundation for many of the techniques in later chapters but is in no way all-inclusive of the world of 3D graphics. See the resources for more information.
Everything in 3D graphics takes place in mathematical or virtual space.
A coordinate is a series of numbers that describes a location in that space.
The space used in most 3D graphics is called the 3D Cartesian coordinate system, after René Descartes (1596–1650). It uses a set of intersecting lines, called axes, to describe a location relative to the origin. The origin is the point in space where all coordinates in a coordinate system are 0. The intersecting lines are orthogonal, or perpendicular, to each other.
By convention, the intersecting lines are named the x-, y-, and z-axis, and the standard order is to use a right-handed orientation (see Figure 12.4).
  
 
 Figure 12.4:  Coordinate triad and origin. 
Cartesian coordinates are written as an ordered set, often called (x, y, z), corresponding to the coordinate axes, for example, (5.0, 4.0, –1.5). These coordinates can then be graphed or rendered.
A 3D model is composed of relational information and geometric information.
This information is usually stored in the form of polygons or triangles and vertices.
A polygon is a multisided, closed surface composed of vertices connected by closed, chained lines (see Figure 12.5).
  
 
 Figure 12.5:  Polygon. 
A vertex is where the coordinates of a polygon are actually stored. For mathematics this may be all a vertex needs, but for 3D graphics a vertex still has much more data associated with it, for example, color.
A triangle is simply the most basic form of a polygon with exactly three vertices.
A triangle is useful because it is always planar and convex, which can be important for lighting as well as for collision detection, as we will see in later chapters.
3D objects don’t have to be made of only triangles, but they are often triangulated, or converted into triangles.
Another important thing to note about polygons and triangles is the winding order. Winding order determines which side is the front and which is the back of a polygon. OpenGL’s default winding order is counterclockwise; that is, when looking at a polygon, the front is facing the viewer when the vertices appear in counterclockwise order. The winding order can be changed when rendering, but typically it is left as counterclockwise. Counterclockwise winding order will be assumed from now on unless otherwise specified (see Figure 12.6).
  
 
 Figure 12.6:  Polygon winding order. 
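As a minimal sketch of how winding order can be tested, the following code classifies a triangle that has already been projected to 2D, assuming the y-axis points up. The class and method names are hypothetical and are not part of OpenGL, which performs this test internally.

```java
/** Minimal sketch: classify the winding of a triangle in 2D (y-axis pointing up). */
final class WindingOrder {
    /** Twice the signed area of triangle a-b-c; positive means counterclockwise. */
    static double signedArea2(double ax, double ay, double bx, double by,
                              double cx, double cy) {
        return (bx - ax) * (cy - ay) - (cx - ax) * (by - ay);
    }

    /** True when the vertices a, b, c appear in counterclockwise order. */
    static boolean isCounterClockwise(double ax, double ay, double bx, double by,
                                      double cx, double cy) {
        return signedArea2(ax, ay, bx, by, cx, cy) > 0.0;
    }
}
```

Swapping any two vertices flips the sign of the area, which is why reordering a triangle's vertices turns a front face into a back face.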
3D objects are rendered from meshes of triangles. Often a geometric object can be thought of as a set of coordinates or points that have a common origin and a set of triangles made with those coordinates. Because they share a common origin they can also be thought of as translation vectors from that common origin.
Current 3D hardware and OpenGL can process this geometric data for rendering if it is properly formatted.
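One common way to format this data, shown here only as an illustrative sketch, is a flat array of vertex coordinates plus an array of indices that groups those vertices into triangles. The class name and layout below are assumptions for the example, not a format required by OpenGL.

```java
/** Minimal sketch: a unit square stored as shared vertices plus triangle indices. */
final class QuadMesh {
    // Four vertices, x y z per vertex, all relative to the object's own origin.
    static final float[] POSITIONS = {
        0f, 0f, 0f,   // vertex 0
        1f, 0f, 0f,   // vertex 1
        1f, 1f, 0f,   // vertex 2
        0f, 1f, 0f    // vertex 3
    };

    // Two triangles that index into POSITIONS; both are counterclockwise
    // when viewed from the +z side.
    static final int[] TRIANGLES = { 0, 1, 2,   0, 2, 3 };

    static int vertexCount() { return POSITIONS.length / 3; }

    static int triangleCount() { return TRIANGLES.length / 3; }
}
```

Sharing vertices through indices means the square needs four stored coordinates rather than six, which is the relational information kept apart from the geometric information.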
A note on 3D object or content creation: creating 3D objects in code can be very challenging and sometimes disappointing. Some objects, such as terrain, lend themselves to programmatic generation. Others, such as fantasy characters, do not. In addition, just as artists don’t tend to make great programmers, programmers don’t tend to make great artists. In any case, sophisticated software applications have been created over the years specifically for assisting in 3D content creation. Maya from Alias and 3d studio max from discreet are two of the most popular commercial packages, each costing thousands of dollars. Shareware and freeware 3D-modeling software is also available, such as Milkshape and Blender. Any professional-level 3D game development will be using some form of 3D content-creation software with skilled artists creating the bulk of the content. The programmers’ job at a minimum will be to give that content life, by first loading the content into the game engine, and second, giving it motion appropriate for the games. Building content loaders and animating the content will be examined in later chapters.
To create a change or “motion” in a 3D object or a coordinate system, we use a transformation. A simple definition of a transformation is any operation that uniformly changes the coordinates of a piece of geometry. By uniformly, we mean that the same operation is done to each coordinate and, thus, the overall shape is preserved.
In 3D graphics, the transformations are typically stored as 3D matrices, but often, due to wrappers and tool APIs, the matrices are not processed directly by the code but are abstracted by some form of a transformation class. Java3D does this, as do many other 3D APIs and tools.
There are three major types of transformations in a 3D graphics system as shown in Figure 12.7, but many others can be applied:
  
 
 Figure 12.7:  Example object transformations.  
Translate: Moving in the x-, y-, and z-direction
Rotate: Rotating around the x-, y-, and z-axes
Scale: Scaling in the x-, y-, and z-direction
Multiplying a vector or coordinate by a 3D matrix applies that matrix’s transformation to it. For example, if the matrix represents a 45-degree y-axis rotation, multiplying it by the vector returns the vector rotated 45 degrees around the y-axis.
Transformation matrices can be associated with 3D objects, and then the renderer can apply the transformation to the 3D object for rendering when processing the graphics pipeline.
Following the rules of linear algebra, performing the matrix multiply on a single vertex with a 4×4 matrix involves 16 multiply and 12 add operations. That is quite a few operations for a single mathematical step, and when it is multiplied by the thousands of vertices that may need to be transformed in a single frame, it really starts to add up. Luckily, this is yet another one of the operations that the 3D hardware has been designed to handle.
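To make the operation count concrete, here is a minimal sketch of the multiply written out in Java. It assumes row-major 4×4 arrays and a column vector (x, y, z, 1); the class and method names are hypothetical, and OpenGL itself stores matrices in a different (column-major) layout.

```java
/** Minimal sketch of applying 4x4 transformation matrices to a coordinate. */
final class Transform4 {
    /** Matrix times column vector (x, y, z, 1): 16 multiplies and 12 adds. */
    static double[] apply(double[][] m, double[] v) {
        double[] out = new double[4];
        for (int row = 0; row < 4; row++) {
            out[row] = m[row][0] * v[0] + m[row][1] * v[1]
                     + m[row][2] * v[2] + m[row][3] * v[3];
        }
        return out;
    }

    /** Rotation around the y-axis (right-handed), angle in degrees. */
    static double[][] rotationY(double degrees) {
        double c = Math.cos(Math.toRadians(degrees));
        double s = Math.sin(Math.toRadians(degrees));
        return new double[][] {
            {  c, 0, s, 0 },
            {  0, 1, 0, 0 },
            { -s, 0, c, 0 },
            {  0, 0, 0, 1 }
        };
    }

    /** Translation by (tx, ty, tz). */
    static double[][] translation(double tx, double ty, double tz) {
        return new double[][] {
            { 1, 0, 0, tx }, { 0, 1, 0, ty }, { 0, 0, 1, tz }, { 0, 0, 0, 1 }
        };
    }

    /** Scale by (sx, sy, sz). */
    static double[][] scale(double sx, double sy, double sz) {
        return new double[][] {
            { sx, 0, 0, 0 }, { 0, sy, 0, 0 }, { 0, 0, sz, 0 }, { 0, 0, 0, 1 }
        };
    }
}
```

Calling `Transform4.apply(Transform4.rotationY(45), new double[] {1, 0, 0, 1})` rotates the point on the x-axis 45 degrees around the y-axis. The fourth component of 1 is what lets a translation be expressed as a matrix multiply.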
Associating a 3D object with a single transform matrix for transforming its vertices works well for simple objects. However, more complex objects such as characters and vehicles that are made up of more than one object often need transform hierarchies, such as in Figure 12.8.
  
 
 Figure 12.8:  Transformation hierarchies. 
This situation leads to much more complex scene representations and rendering pipelines, but often this hierarchical representation actually makes managing the scene simpler. Hierarchical and graph-based scene representations are often called scene graphs and will be discussed further in Chapter 15, “3D Render Using JOGL.”
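The core idea of a transform hierarchy can be sketched as a matrix product: a child's world transform is its parent's world transform multiplied by the child's own local transform. The code below is an illustrative sketch with hypothetical names and row-major 4×4 arrays, not the scene graph design used later in the book.

```java
/** Minimal sketch: combining parent and child transforms in a hierarchy. */
final class TransformHierarchy {
    /** Standard 4x4 matrix product a * b. */
    static double[][] multiply(double[][] a, double[][] b) {
        double[][] out = new double[4][4];
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) {
                for (int k = 0; k < 4; k++) {
                    out[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return out;
    }

    /** Translation by (tx, ty, tz). */
    static double[][] translation(double tx, double ty, double tz) {
        return new double[][] {
            { 1, 0, 0, tx }, { 0, 1, 0, ty }, { 0, 0, 1, tz }, { 0, 0, 0, 1 }
        };
    }

    /** Where the local origin of a transformed object ends up. */
    static double[] originOf(double[][] m) {
        return new double[] { m[0][3], m[1][3], m[2][3] };
    }
}
```

For example, if a vehicle body is placed with `translation(5, 0, 0)` and a wheel sits at `translation(1, -1, 0)` relative to the body, the product of the two places the wheel at (6, –1, 0) in the world. Moving the body then moves the wheel automatically, which is what makes the hierarchy easier to manage.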
The final render of our 3D object is a 2D image, but so far we have been learning about 3D data. To get from the 3D virtual space to the 2D screen display involves two more important concepts. One is the camera, and the other is projection.
The idea of a camera is that the developer can place a camera virtually in the 3D space and have the system render from that point of view. However, a camera is really an illusion in which the world’s objects are really inversely transformed by a camera transform. In OpenGL, there is not a real camera but a ModelView matrix that is used to create the effect of a camera in the world. To make the controls easier to use, setting up a camera object is one of the first things a 3D game engine provides on top of OpenGL. Manipulating the camera is a critical part of 3D games, and each game has its own needs for what the camera should do. Often this is one of the most difficult parts of a final game to get right. Anyone who has played 3D games knows camera motion can make or break a game.
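The inverse-transform idea can be sketched in a few lines. The example below assumes a simplified camera that has only a position and a yaw rotation around the y-axis, and that looks down the negative z-axis as in OpenGL; the class and method names are hypothetical.

```java
/** Minimal sketch: a camera is the inverse of its own transform applied to the world. */
final class CameraView {
    /**
     * Moves a world-space point into the space of a camera located at camPos
     * and turned yawDegrees around the y-axis. The camera itself never moves;
     * the point receives the inverse of the camera's transform.
     */
    static double[] toViewSpace(double[] p, double[] camPos, double yawDegrees) {
        // Inverse translation: subtract the camera position.
        double x = p[0] - camPos[0];
        double y = p[1] - camPos[1];
        double z = p[2] - camPos[2];
        // Inverse rotation: rotate by the negative of the camera's yaw.
        double c = Math.cos(Math.toRadians(yawDegrees));
        double s = Math.sin(Math.toRadians(yawDegrees));
        return new double[] { c * x - s * z, y, s * x + c * z };
    }
}
```

With the camera at (0, 0, 5) and no rotation, the world origin ends up at (0, 0, –5) in view space, that is, five units in front of the camera.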
A projection is performed on the 3D polygons to make 2D polygons. This process is done using a projection matrix that the rendering system creates based on the state of the camera and other factors, such as the field of view. In general, projections transform points in a coordinate system of dimension n into points in a coordinate system of dimension less than n. The good news is that the 3D hardware handles this fairly automatically, so rarely is anything more than the initial setup needed.
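As an illustrative sketch of a perspective projection, the following code projects a single view-space point onto a 2D image plane by dividing by its depth. It assumes a camera at the origin looking down the negative z-axis and a square viewport; the names are hypothetical, and real projection matrices also handle aspect ratio and near and far clipping planes.

```java
/** Minimal sketch: perspective projection of one point from 3D to 2D. */
final class PerspectiveProjection {
    /**
     * Projects a view-space point onto the image plane. Returns {x, y}, where
     * the visible range is -1..1 for the given field of view in degrees.
     */
    static double[] project(double[] p, double fovDegrees) {
        if (p[2] >= 0.0) {
            throw new IllegalArgumentException("point must be in front of the camera (z < 0)");
        }
        // Distance from the camera to the image plane for this field of view.
        double d = 1.0 / Math.tan(Math.toRadians(fovDegrees) / 2.0);
        double depth = -p[2];
        return new double[] { d * p[0] / depth, d * p[1] / depth };
    }
}
```

Because of the divide by depth, a point twice as far away lands half as far from the center of the image, which produces the familiar effect of distant objects appearing smaller.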
Lighting is one of the most important elements of the visual appeal of a 3D game and is currently one of the hottest topics in real-time 3D graphics. Simulating real light accurately is a difficult problem. Real light is made up of photons, the fundamental particles of light, and trillions of photons interact with even a simple lit surface. Light also has special behavior characteristics such as refraction, reflection, and obstruction that can further complicate the render. Because accurately simulating light is far too computationally complex, lighting models have been developed to create the illusion of illuminating the surfaces of 3D models. These models are approximations of the real effect of light but are often good enough to fool the human eye.
One model, ray tracing, is often used in computer animation. Ray tracing uses the computer to create a realistic graphic image by calculating the paths taken by rays of light hitting objects from various angles, creating shading, reflections, and shadows, which give the image a convincing look. Figure 12.9 shows an image produced by ray tracing.
  
 
 Figure 12.9:  Ray tracing producing accurate shading, reflections, and shadows. 
Unfortunately, ray tracing is computationally intensive, currently too much so for real-time rendering. Therefore, faster but less accurate models have been developed for real-time use.
Real-time lighting can be split into two categories: static and dynamic.
Static shading is when the shading effect is meant to be permanently colored into the 3D object itself and does not change at runtime. This technique is often used on world models and typically relies on the 3D content-creation software to create the effect.
Dynamic shading is computed during the rendering of each frame and is used on moving 3D objects and lights. Dynamic shading uses 3D vector math and incorporates ambient illumination as well as diffuse and specular reflection of directional lighting.
The basic shading function for static or dynamic is a modulation of the polygon surface color based on its angle to a light direction. These parameters are expressed as vectors and vector operations. After a short vector refresher, the shading function will be examined in greater detail.
Vector: A quantity that has magnitude and direction is known as a vector. For example, force, displacement, velocity, and acceleration are vectors. A vector can be represented by a directed line segment in space. Usually vectors are denoted by a letter with an arrow over it, such as A, a, x. In the component form it is represented by a series of numbers, just like a coordinate, x, y, z. See Figure 12.10.
  
 
 Figure 12.10:  Vector types. 
Unit Vector: A vector with a magnitude or length of 1.0. A unit vector is said to be normalized and can be created by the normalization vector operation, which is found by dividing each of the vector components by the original vector’s magnitude.
Ray: A translated half-line. We call x the root, and we say that the ray is rooted at x. Another way to think of this is to imagine a ray as a vector with an infinite length. Because we often mean to use a ray as a representation of direction, a unit vector in the same direction as the ray is substituted in computer algorithms.
Normal: A vector that represents the orientation of a plane or surface. The normal of a surface is perpendicular to that surface. Often a normal is also normalized.
The Scalar Multiply: The following equation scales a vector A by a scalar s:
Ax' = Ax * s, Ay' = Ay * s, Az' = Az * s.
The Dot Product: The following equation computes the dot product of two vectors A and B, which is a single scalar value:
A · B = Ax * Bx + Ay * By + Az * Bz.
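The vector operations in this refresher can be sketched in Java as follows. Vectors are represented as three-element arrays purely for illustration, and the class name is hypothetical; a real engine would typically use a dedicated vector class.

```java
/** Minimal sketch of the basic vector operations used by the shading function. */
final class Vec3Ops {
    /** Scalar multiply: each component of a is multiplied by s. */
    static double[] scale(double[] a, double s) {
        return new double[] { a[0] * s, a[1] * s, a[2] * s };
    }

    /** Dot product: the sum of the component-wise products, a single scalar. */
    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    /** Magnitude (length) of the vector. */
    static double length(double[] a) {
        return Math.sqrt(dot(a, a));
    }

    /** Normalization: divide each component by the magnitude to get a unit vector. */
    static double[] normalize(double[] a) {
        double len = length(a);
        if (len == 0.0) {
            throw new IllegalArgumentException("cannot normalize a zero-length vector");
        }
        return scale(a, 1.0 / len);
    }
}
```

For two unit vectors, the dot product equals the cosine of the angle between them, which is why it is so useful for measuring how directly a surface faces a light.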
Computing the shading function for real-time 3D graphics is a feature built in to 3D hardware and would not need to be reimplemented by a game developer, but it is still useful to understand the process because it directly affects how 3D models are made.
First, the dot product of the direction to the light in the form of a unit vector and the surface normal is computed. The resulting scalar value will be between –1.0 and 1.0.
Next, the value is clamped to the range 0.0 to 1.0 and now represents the intensity of the light for that surface. The surface color is multiplied or scaled with that intensity. Because the intensity is between 0.0 and 1.0, the color will become darker accordingly. See Figure 12.11.
  
 
 Figure 12.11:  Simple shading function.  
Finally, the given triangle is then rendered with the resulting color for each surface.
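The steps above can be sketched in software as follows. This is only an illustration of the calculation, since the hardware performs it for us; the class and method names are hypothetical, and both vectors are assumed to be unit length already.

```java
/** Minimal sketch of the simple shading function for one surface. */
final class SimpleShading {
    /** Dot product of the unit normal and unit direction to the light, clamped to 0..1. */
    static double intensity(double[] normal, double[] toLight) {
        double d = normal[0] * toLight[0] + normal[1] * toLight[1] + normal[2] * toLight[2];
        return Math.max(0.0, Math.min(1.0, d));
    }

    /** Scales an RGB surface color (components in 0..1) by the light intensity. */
    static double[] shade(double[] rgb, double[] normal, double[] toLight) {
        double i = intensity(normal, toLight);
        return new double[] { rgb[0] * i, rgb[1] * i, rgb[2] * i };
    }
}
```

A surface facing the light directly keeps its full color, a surface at an angle is darkened, and a surface facing away from the light is clamped to black.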
The shading function is one of the simplest vector operations done in 3D graphics, but because it is done so many times in a typical scene, the hardware needed to be built to accelerate its computation.
There are two main types of hardware accelerated shading. Both compute shading using the shading function, with the difference being what normals they use and how the shading is applied to the surface.
The simplest shading is called Lambert or flat shading. The shading function is calculated and applied for each polygon surface of a given object. Each polygon is uniformly colored based on its single surface normal and the direction to the light.
Flat shading is simple and quick but gives a faceted appearance. See Figure 12.12.
  
 
 Figure 12.12:  Shading types comparison. 
Gouraud developed the technique to smoothly interpolate the illumination across polygons to create shading that is smooth and continuous, even for triangle-based objects that are not actually smooth and continuous.
A vertex normal must be added to each vertex in a 3D object. This new normal at the vertices can be calculated by averaging the adjoining face normals. The vertex intensity is then calculated with that new normal and the shading function. The intensity is interpolated across the polygon based on the vertices just like vertex color.
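The normal calculations described here can be sketched as follows, assuming counterclockwise triangles and three-element arrays for points and vectors. The names are hypothetical, and degenerate (zero-area) triangles are not handled in this sketch.

```java
/** Minimal sketch: face normals and averaged vertex normals for smooth shading. */
final class VertexNormals {
    /** Unit normal of the counterclockwise triangle a, b, c (cross product of two edges). */
    static double[] faceNormal(double[] a, double[] b, double[] c) {
        double ux = b[0] - a[0], uy = b[1] - a[1], uz = b[2] - a[2];
        double vx = c[0] - a[0], vy = c[1] - a[1], vz = c[2] - a[2];
        return normalize(new double[] {
            uy * vz - uz * vy,
            uz * vx - ux * vz,
            ux * vy - uy * vx
        });
    }

    /** Vertex normal: the normalized sum (average direction) of the adjoining face normals. */
    static double[] vertexNormal(double[][] adjoiningFaceNormals) {
        double[] sum = new double[3];
        for (double[] n : adjoiningFaceNormals) {
            sum[0] += n[0];
            sum[1] += n[1];
            sum[2] += n[2];
        }
        return normalize(sum);
    }

    // Note: a zero-length input would produce NaN components.
    private static double[] normalize(double[] v) {
        double len = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
        return new double[] { v[0] / len, v[1] / len, v[2] / len };
    }
}
```

A vertex shared by one face pointing along +z and another pointing along +x receives a normal halfway between the two, which is what smooths the shading across the edge.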
OpenGL and modern 3D hardware support both shading models. Either may be chosen, but smooth shading is usually more pleasing and can be more realistic. Smooth shading is usually used unless flat shading is needed to represent a 3D object more accurately. Remember, smooth shading will not work correctly unless the appropriate vertex normals for the 3D object are specified. It is a common mistake to have incorrect normals for a code-produced model. Quality 3D modeling software will usually create the proper normals for shading.
There are three types of hardware-supported lights (the effects are shown in Figure 12.13), with many names:
  
 
 Figure 12.13:  Light types comparison. 
Ambient light comes from many directions, as a result of multiple reflections and emission from many sources. The resulting surface illumination is uniform.
Directional light comes from a source that is located an infinite distance away. Thus, directional light consists of parallel rays coming from the same direction. The intensity of directional light doesn’t diminish with distance, so identically oriented objects of the same type will be illuminated in exactly the same way.
Position lights originate from some specific location. The light rays that emanate from the source are not parallel, and they typically can diminish in intensity with distance from the source. Identically oriented objects of the same type will be illuminated differently, depending on their position relative to the light source.
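The difference between the three light types can be sketched as three intensity functions. This is an illustration only: the names are hypothetical, normals are assumed to be unit length, and the falloff uses constant, linear, and quadratic coefficients (kc, kl, kq) in the style of OpenGL's fixed-function attenuation.

```java
/** Minimal sketch: intensity contributed by ambient, directional, and position lights. */
final class LightTypes {
    /** Ambient light is the same everywhere, regardless of position or orientation. */
    static double ambient(double level) {
        return level;
    }

    /** Directional light depends only on surface orientation, never on position. */
    static double directional(double[] normal, double[] toLight) {
        return Math.max(0.0, dot(normal, toLight));
    }

    /** Position light depends on both orientation and distance from the source. */
    static double positional(double[] normal, double[] surfacePoint, double[] lightPos,
                             double kc, double kl, double kq) {
        double[] d = {
            lightPos[0] - surfacePoint[0],
            lightPos[1] - surfacePoint[1],
            lightPos[2] - surfacePoint[2]
        };
        double dist = Math.sqrt(dot(d, d));
        double[] toLight = { d[0] / dist, d[1] / dist, d[2] / dist };
        double attenuation = 1.0 / (kc + kl * dist + kq * dist * dist);
        return Math.max(0.0, dot(normal, toLight)) * attenuation;
    }

    private static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }
}
```

Moving a surface farther from a position light lowers the result, while the directional and ambient results stay the same for identically oriented surfaces.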
When first working with real-time 3D graphics, one disappointing realization happens. Because the lighting and shading models are local, shadows are not automatically rendered as with ray tracing.
Hardware-based real-time lights have no complete shadowing or light-occluding features built in. The 3D objects are lit regardless of other objects blocking the light, because each triangle’s shading is computed independently of any other triangles in the scene. Taking into account all of a scene’s geometry for each shaded triangle would be prohibitively expensive on current graphics hardware, besides the fact that the graphics hardware doesn’t necessarily have access to all of the scene’s geometry. However, shadow effects do exist in many 3D games, but they are additional effects added to the scene and not completely supported in 3D hardware. There are several ways to create the effects of shadows, which is usually the combination of a 3D hardware feature and a clever 3D engine. Shadow effects are currently a developing topic in real-time 3D graphics, but classic techniques include simple textured polygons, projected geometry, and more advanced techniques, such as using a hardware feature like the stencil buffer and the depth buffer.
Alpha blending is the technique for creating transparent objects. It works by blending an object as it is rasterized with whatever has already been rendered to the frame. That way the foreground object appears see-through and the background object appears behind the foreground object (see Figure 12.14). Handling transparency properly is an important, but often difficult to get right, part of 3D engines. Transparency will be examined in more detail in Chapter 15, “3D Render Using JOGL.”
  
 
 Figure 12.14:  Opaque and transparent objects. 
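The most common form of this blend can be sketched as a weighted average of the incoming (source) color and the color already in the frame (destination). The names below are hypothetical, and other blend equations are possible; this sketch assumes an alpha of 1.0 means fully opaque.

```java
/** Minimal sketch: blending a transparent source color over what is already in the frame. */
final class AlphaBlend {
    /** Returns src * alpha + dst * (1 - alpha) for each RGB component. */
    static double[] over(double[] src, double alpha, double[] dst) {
        double[] out = new double[3];
        for (int i = 0; i < 3; i++) {
            out[i] = src[i] * alpha + dst[i] * (1.0 - alpha);
        }
        return out;
    }
}
```

Because the result depends on what has already been drawn, the order in which objects are rendered matters, which is one reason transparency is difficult to get right.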
A texture or texture map is an image (such as a bitmap or a GIF) used to add complex patterns to the surfaces of objects in a 3D scene without adding any geometry. Textures enable developers to add much greater realism to their models. Applying a texture to the surface of 3D models can better create the look of walls, sky, skin, facial features, and so on (see Figure 12.15).
  
 
 Figure 12.15:  Girl geometry with texture. 
Another way to consider how textures can improve a 3D object is in the way that textures can lower a model’s polygon count. Often a model’s shape can be simplified to contain fewer polygons when an appropriate texture is applied to that model. The model will still look as good as (and probably better than) the original, untextured, higher-polygon model.
Because the texture adds so much more detail to the surfaces without costing more polygons, texturing is heavily used in 3D games, to the point where nearly every triangle that makes up a typical game scene has one or more textures rendered on it.
Texture mapping is the process of applying a texture to the surface of 3D models. Because real-time 3D models are made of vertices, the texture will be mapped to that part of the model. Therefore, to map a texture to the vertices of the 3D object, each one must have extra data added that contains the mapping at that vertex.
In a 3D object, each vertex already has a set of coordinates that represents the vertex’s location in 3D Cartesian coordinate space. To support texture mapping, each vertex can also have texture coordinates. Texture coordinates are 2D image coordinates in texture space stored in the associated vertex that correspond to the part of the texture meant to be rendered at that vertex in the final 3D render. These coordinates are often called U-Vs, and sometimes S-Ts.
Additionally, the term texture mapping can mean the ability of a particular piece of graphics hardware to support accelerated rendering of texture objects. We would say that XYZ’s latest graphics card has texture mapping.
One way to apply a texture is to program or compute a 3D object’s U-Vs. However, as was discussed earlier in this chapter, 3D modeling software is the preferred process for most objects. 3D content-creation software allows artists to apply a texture to a face and then set corresponding U-V coordinates in each vertex on the face using many different tools. Figure 12.16 shows an example of mapping between vertices and UV coordinates.
  
 
 Figure 12.16:  Texture map and 3D object showing corresponding image U-Vs and object vertices.  
Textures can also have alpha, just like untextured geometry. However, alpha textures contain the transparency information per texel. This arrangement increases the detail possible for certain types of models that can use this effect, such as fences, gratings, trees, and plants, as shown in Figure 12.17.
  
 
 Figure 12.17:  Textured plant with regular and alpha texture maps (outlined). 
As mentioned earlier, images are made up of pixels, whereas textures are made of texels. The names are just shortened forms for their full names—a texel is a texture element, and a pixel is a picture element.
They are essentially the same thing in computer memory. Both represent one unit of an image, and both have red, green, and blue components. However, in the final render they aren’t always the same size on screen, which creates some render problems. We can see why this happens by further examining the process of rendering textures.
The simplest method of texture map rendering is known as point sampling. For each final pixel on screen from a polygon, a simple and fast lookup is performed for the single corresponding texel in the texture image. The lookup uses U-V coordinates interpolated from the polygon’s vertices and treats them as a point in texture space. Whichever texel the U-V point falls in supplies the returned color.
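A point-sampling lookup can be sketched as follows. The texture is assumed to be a row-major array of packed colors with U and V running from 0.0 to 1.0 and row 0 at V = 0; these layout choices and the names are assumptions made for the example.

```java
/** Minimal sketch: point sampling picks the single texel that the U-V point falls in. */
final class PointSample {
    /** texels holds width * height entries in row-major order; u and v are in 0..1. */
    static int sample(int[] texels, int width, int height, double u, double v) {
        // Scale to texel space, truncate to an index, and clamp to the texture edges.
        int x = Math.min(width - 1, Math.max(0, (int) (u * width)));
        int y = Math.min(height - 1, Math.max(0, (int) (v * height)));
        return texels[y * width + x];
    }
}
```

No neighboring texels are consulted, which is what makes the lookup fast and also what causes the artifacts described next.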
Unfortunately, this simple method can lead to problems when the screen pixels and the final rendered texture texels don’t match up closely. When zooming in or out on a texture, the number of pixels on screen representing each texel of a texture changes, depending on how the final object is scaled in screen space. However, the number of texels in the texture does not change. Because of this, a simple lookup does not yield the best results when the texels don’t match closely to the pixels on screen and unwanted texturing artifacts can appear in the final render.
Texturing artifacts—or any kind of graphics artifacts, for that matter—are simply rendering errors that distort or otherwise affect the final image in an unwanted way. For texturing, there are two main types of artifacts: pixelization and texel swimming.
Pixelization or blockiness happens when the texture on screen is rendered with more pixels in the final display area than texels in the original texture. Texel swimming and Moiré patterns happen in the opposite case, that is, when the texture is rendered in fewer pixels in the final display than texels in the original texture. Pixelization is fairly obvious in a single frame, as shown in Figure 12.18, but texel swimming, shown in Figure 12.19, is largely a dynamic effect, because the change from frame to frame creates a distracting moving pattern in the textured triangle.
  
 
 Figure 12.18:  Texel pixelization. 
  
 
 Figure 12.19:  Texel swimming. 
Again, hardware solutions in the form of texture filters come to the rescue and help reduce these artifacts. Texture filters are added processing done to the texture data at the time of rendering to alter the final render, usually in the effort to improve it but sometimes for special effects as well.
For these two texture artifacts, there are two texture filter techniques.
The first texture filter is called magmapping, and it is used to remove pixelization. Magmapping blends pixels together by interpolating between texel colors when one texel maps to many pixels, as seen in Figure 12.20. The effect is usually an improvement, but sometimes, if the texture is too low resolution or if the camera view can move too close to the textured triangle, the blurring effect of magmapping is just as undesirable. In these cases, higher-resolution textures, more frequent texture repeating, or camera-motion limits will be needed as well.
  
 
 Figure 12.20:  Magmapping texture filtering. 
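One common way to interpolate between texel colors is bilinear interpolation of the four nearest texels, sketched below for a single color channel. We assume that texel centers sit at half-texel offsets and that lookups are clamped at the texture edges; the names are hypothetical, and the hardware's exact filter may differ.

```java
/** Minimal sketch: bilinear interpolation between the four texels nearest a U-V point. */
final class BilinearFilter {
    /** tex[row][col] holds one color channel; u and v are in 0..1. */
    static double sample(double[][] tex, double u, double v) {
        int h = tex.length;
        int w = tex[0].length;
        // Texel centers sit at half-texel positions, so shift by 0.5 and clamp to the edges.
        double x = Math.max(0.0, Math.min(w - 1, u * w - 0.5));
        double y = Math.max(0.0, Math.min(h - 1, v * h - 0.5));
        int x0 = (int) x;
        int y0 = (int) y;
        int x1 = Math.min(x0 + 1, w - 1);
        int y1 = Math.min(y0 + 1, h - 1);
        double fx = x - x0;
        double fy = y - y0;
        // Blend horizontally along both rows, then blend the two rows together.
        double row0 = tex[y0][x0] * (1 - fx) + tex[y0][x1] * fx;
        double row1 = tex[y1][x0] * (1 - fx) + tex[y1][x1] * fx;
        return row0 * (1 - fy) + row1 * fy;
    }
}
```

Sampling halfway between a black texel and a white texel returns gray rather than snapping to one or the other, which replaces hard texel edges with a smooth gradient.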
The texel swimming effects can be greatly reduced with the texture filter technique called MIPmapping. The MIP in MIPmapping comes from the Latin multum in parvo, meaning a multitude in a small space. MIPmapping attempts to reduce texel swimming by switching between preblended, lower-resolution versions of the texture for cases in which many texels map to one pixel.
The problem is that to accurately render each pixel when many texels map to one pixel, the renderer must combine the contribution from all the texels that fit in the pixel. Using only one texel lookup is what produces the texel swimming effect. However, accurately finding all the texels that fall in a single pixel’s space for each and every final image pixel is too much work to do on current graphics hardware.
MIPmapping reduces the task of correcting texel swimming by pre-generating and storing multiple versions of the original texture image, each with lower and lower resolution (thus effectively larger texels for the same U-V set). When a pixel is to be textured, the version of the texture image or layer with the right-size texels relative to the final onscreen pixel size is selected. Figure 12.21 shows a render with and without MIPmapping.
  
 
 Figure 12.21:  MIPmapping texture technique removes texel swimming. 
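Pre-generating the lower-resolution layers can be sketched as repeated averaging of 2×2 blocks of texels. The example below assumes a square, single-channel texture whose size is a power of 2; the names are hypothetical, and real tools may use better downsampling filters than a simple average.

```java
/** Minimal sketch: building MIPmap layers by averaging 2x2 blocks of texels. */
final class MipChain {
    /** Returns a layer with half the width and height of src. */
    static float[][] nextLevel(float[][] src) {
        int n = src.length / 2;
        float[][] dst = new float[n][n];
        for (int y = 0; y < n; y++) {
            for (int x = 0; x < n; x++) {
                dst[y][x] = (src[2 * y][2 * x] + src[2 * y][2 * x + 1]
                           + src[2 * y + 1][2 * x] + src[2 * y + 1][2 * x + 1]) / 4f;
            }
        }
        return dst;
    }

    /** Builds every layer from the base texture down to a single texel. */
    static float[][][] build(float[][] base) {
        // For a power-of-2 size, log2(size) + 1 layers are needed.
        int levels = Integer.numberOfTrailingZeros(base.length) + 1;
        float[][][] chain = new float[levels][][];
        chain[0] = base;
        for (int i = 1; i < levels; i++) {
            chain[i] = nextLevel(chain[i - 1]);
        }
        return chain;
    }
}
```

Each texel in a smaller layer already contains the combined contribution of four texels from the layer above, so the expensive averaging is paid once when the texture is prepared rather than for every pixel of every frame.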
To improve this process even further, several texels can be averaged together, similar to magmapping. The three most frequently accelerated options follow:
Linear MIPmapping: Averages the texels from the nearest two texture layers; one from the larger, and one from the smaller
Bilinear MIPmapping: Averages the four texels on the nearest texture layer
Trilinear MIPmapping: Averages all eight texels from the nearest two texture layers each; four from the larger, and four from the smaller
The averaging that’s done usually uses the same interpolation used in magmapping.
MIPmapping can greatly improve the final image by preventing texel swimming, but it comes at a cost of one-third more texture memory usage. Figure 12.22 shows how this can be visualized.
  
 
 Figure 12.22:  MIPmapping requires one-third more texture memory. 
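The one-third figure can be checked with a little arithmetic: each layer has one-quarter the texels of the layer before it, so the extra layers add 1/4 + 1/16 + 1/64 and so on, which approaches 1/3. The following sketch, with hypothetical names, simply adds up the layers for a square texture.

```java
/** Minimal sketch: counting the texels in a full MIPmap chain for a square texture. */
final class MipMemory {
    /** Total texels in all layers, from size x size down to 1 x 1. */
    static long totalTexels(int size) {
        long total = 0;
        for (long s = size; s >= 1; s /= 2) {
            total += s * s;
        }
        return total;
    }
}
```

For a 256×256 texture, the base layer has 65,536 texels and the full chain has 87,381, an increase of almost exactly one-third.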
New texturing techniques for image correction continue to appear, for example, anisotropic MIPmapping and even new pixel shaders, which further improve upon these basic texture filter techniques.
Because game textures use hardware acceleration and not software rendering, the texel color formats and texture sizes are limited to those specifically supported by the hardware. The texel color formats cover a fairly large range, and using them doesn’t affect texture creation adversely. However, texture size requirements are quite limiting and must be followed for the best results. Specifically, textures made for real-time hardware must have dimensions that are a power of 2: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, and sometimes larger. Note that this is not multiples of 2. (That would allow many more options than are possible with the power-of-2 sizes.) For the most part, this is an industrywide standard, and violations can cause all sorts of problems, from MIPmapping problems to rendering slowdown to even crashes. Some specialized extensions for non-power-of-2 sizes do exist on the latest graphics cards, but they are still fairly rare.
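A texture loader can check sizes before handing an image to the hardware. The following is a minimal sketch using standard bit operations; the class and method names are hypothetical.

```java
/** Minimal sketch: checking and rounding texture dimensions to powers of 2. */
final class PowerOfTwo {
    /** True for 1, 2, 4, 8, ...; a power of 2 has exactly one bit set. */
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    /** Largest power of 2 that is less than or equal to n, for n >= 1. */
    static int floorPowerOfTwo(int n) {
        return Integer.highestOneBit(n);
    }

    /** Smallest power of 2 that is greater than or equal to n, for n between 1 and 2^30. */
    static int ceilPowerOfTwo(int n) {
        int floor = Integer.highestOneBit(n);
        return floor == n ? n : floor << 1;
    }
}
```

An image that is 100 texels wide, for instance, fails the check and would have to be resized to 64 or 128 before use.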
Because graphics memory is limited and textures tend to use large amounts of memory, techniques for conserving texture memory can be important for getting the most out of 3D hardware. Briefly, a few of the most popular and useful techniques follow:
Reusing textures: The most obvious and used technique because it is the easiest to implement. Unfortunately, it can lead to monotonous-looking models, so it must be used wisely.
Shrinking textures: A simple process of reducing a texture to the next smaller power of 2 size. The tradeoff is less-detailed textures.
Lowering color bit depth: You can often get back lots of memory with almost no visual cost. Use 8-bit or 16-bit instead of full 24-bit color images, where appropriate.
Crop textures closely: This process can help you avoid wasting texels. When making a texture and applying it to a model, care must be taken to be sure that as much of the texture is used as possible, with as few unseen texels as possible.
Collaging: The technique of combining many different textures onto a single image (see Figure 12.23). This texturing technique is used mostly in characters and vehicles but can be applied to just about any kind of object.
  
 
 Figure 12.23:  Example of texture collaging. 
Proper texture application: Further extends reuse. Applying nature textures such as rock patterns, slightly tilted (see Figure 12.24), as opposed to horizontally aligned textures tricks the eye somewhat and hides the fact that the texture is repeating more often than it appears.
  
 
 Figure 12.24:  Aligned versus tilted texture. 
Texture compression: Textures are stored in a compressed format in memory to conserve memory. This is now directly supported in 3D hardware. Texture compression allows games to have greater effective texture memory by making more efficient use of the available texture storage. In addition, texture compression maximizes AGP transfer performance because each texture can be smaller in size, minimizing the bandwidth impact, which can make games run faster as well as allow more textures to be used in a scene.
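To get a feel for how much shrinking textures and lowering color bit depth can save, the uncompressed size of a texture can be estimated with simple arithmetic. The sketch below uses hypothetical names and ignores MIPmap layers, compression, and any padding the hardware may add.

```java
/** Minimal sketch: estimating the uncompressed memory used by one texture. */
final class TextureMemory {
    /** Bytes needed for width * height texels at the given color depth. */
    static long bytes(int width, int height, int bitsPerTexel) {
        return (long) width * height * bitsPerTexel / 8;
    }
}
```

By this estimate, a 512×512 texture at 24-bit color needs 786,432 bytes. Dropping to 16-bit color brings that down to 524,288 bytes, and shrinking to 256×256 at 24-bit color needs only 196,608 bytes, one-quarter of the original.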
Texturing is another area of great activity in real-time 3D graphics, and many amazing effects can be created by manipulating textures. Some involve animating an object’s U-Vs to create motion, simulated reflection effects, and shadows, as well as other shading effects such as cel-shading, bump mapping, and many, many others, with new ones being developed all the time (see Figure 12.25).
  
 
 Figure 12.25:  Monk character rendered three ways: cel-shaded, environment mapped, and bump mapped. 
