Composition is the blending of registered images to produce a final, augmented image. The television industry refers to this process as keying. A variety of methods exist for image composition, depending on the source of the real and virtual content. This section describes common composition methods.
Composing two image sources can be thought of as an image "mixing" process, wherein each pixel of the result image is a linear combination of the corresponding source pixels, typically with an "alpha" weighting:
$$o(x, y) = \alpha(x, y)\, i_1(x, y) + (1 - \alpha(x, y))\, i_2(x, y)$$
In this equation, i1 and i2 represent the input images, o is the composited output image, and α is an alpha map or matte image: a monochrome image with each pixel defined in the range [0,1]. Alpha maps are common in computer graphics applications. Alpha maps, called mattes in the entertainment industry, can be traced back to early photographic methods that combined images using black and white matte negatives. This process is equivalent to determining which portion of an overlay image is transparent and which portion is opaque. For more detail on composition, see Blinn [13, 14], Brinkmann and Kelly.
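As a minimal sketch of this mixing process, the following Python fragment blends two small grayscale images pixel by pixel, assuming the conventional form o = α·i1 + (1 − α)·i2. The function name `composite` and the nested-list image representation are illustrative assumptions; a production system would operate on full video frames with an array library.

```python
def composite(i1, i2, alpha):
    """Blend two equally sized grayscale images with a per-pixel alpha map.

    Assumed form: o = alpha * i1 + (1 - alpha) * i2, where alpha = 1 keeps
    the i1 pixel and alpha = 0 keeps the i2 pixel.
    """
    return [
        [a * p1 + (1.0 - a) * p2 for p1, p2, a in zip(row1, row2, row_a)]
        for row1, row2, row_a in zip(i1, i2, alpha)
    ]


# Hypothetical 2x2 test images (pixel values on a 0-255 scale).
foreground = [[200.0, 200.0], [200.0, 200.0]]
background = [[100.0, 100.0], [100.0, 100.0]]
matte = [[1.0, 0.5], [0.25, 0.0]]

result = composite(foreground, background, matte)
print(result)
```

Intermediate alpha values give a proportional mix of the two sources, which is what allows soft edges between the overlay and the underlying image.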
If the image being overlaid has a fixed position and size, a static matte can be defined. A static matte is simply a fixed alpha map. Computer graphics systems such as OpenGL can directly generate alpha maps as an additional color plane for explicit determination of transparency. Such systems can also generate depth maps, wherein each pixel indicates the distance from the viewpoint to the surface rendered at that pixel. Depth maps can be rescaled into alpha maps to allow blending based on depth.
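The rescaling of a depth map into an alpha map can be sketched as follows. The text does not specify a mapping, so this example assumes a simple linear ramp in which surfaces at or nearer than a hypothetical `near` distance become fully opaque and surfaces at or beyond `far` become fully transparent. The function name and both parameters are illustrative assumptions.

```python
def depth_to_alpha(depth, near, far):
    """Linearly rescale a depth map into an alpha map clamped to [0, 1].

    Assumption: depth <= near maps to alpha 1.0 (opaque) and
    depth >= far maps to alpha 0.0 (transparent).
    """
    span = far - near
    return [
        [min(1.0, max(0.0, (far - d) / span)) for d in row]
        for row in depth
    ]


# Hypothetical 2x2 depth map in arbitrary distance units.
depth = [[1.0, 2.0], [3.0, 5.0]]
alpha = depth_to_alpha(depth, near=1.0, far=3.0)
print(alpha)
```

Other mappings, such as a hard threshold at a single depth, fit the same pattern with a different per-pixel function.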
Beyond applications of the simple static matte, the goal of compositing systems is the automatic generation of an alpha map. The most basic approach for alpha map generation is luma-keying, wherein the alpha map is linearly determined by the luminance of the source image. This is the basic equation for luma-keying:
$$\alpha(x, y) = a_1\, i_1(x, y)$$
In this equation, α is the computed map and i1 is the source image used to determine the alpha map, typically the foreground image. a1 is a parameter that controls the intensity of the alpha map. a1 works similarly to a threshold, though the range allows for smooth transitions between the foreground and background regions. It is a convention in alpha map generation that the map is clamped to the range [0,1].
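A luma-key can be sketched in a few lines, assuming the linear form α = a1·i1 with luminance normalized to [0,1] and the result clamped to [0,1]. The function name `luma_key` and the sample values are illustrative assumptions.

```python
def luma_key(luminance, a1):
    """Compute a luma-key alpha map: alpha = a1 * luminance, clamped to [0, 1].

    Assumption: luminance values are already normalized to [0, 1].
    """
    return [
        [min(1.0, max(0.0, a1 * lum)) for lum in row]
        for row in luminance
    ]


# Hypothetical 2x2 luminance image.
luminance = [[0.0, 0.25], [0.5, 1.0]]
alpha = luma_key(luminance, a1=2.0)
print(alpha)
```

With a1 = 2.0, every pixel at or above half brightness saturates to fully opaque, while darker pixels receive a proportional alpha. This is the smooth, threshold-like behavior described above.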
It should be noted that most source-dependent compositing systems depend on human operators to adjust parameters such as the a1 value in this example. Smith and Blinn point out that automatic generation of alpha maps, particularly color-based matte generation, is a hard problem, and that human determination of parameters for best appearance is nearly always required. These parameters can easily be stored with the source video.
Given an image i1 and an image of just the elements of i1 to be removed (i0), a difference matte can be generated. As an example, i0 might be a fixed image of the background of a scene, while i1 is the same scene with a foreground that needs to be extracted. This is a common problem in augmented imagery, where existing content needs to be replaced but may be subject to occlusion. The generated alpha map should have values of 1.0 for pixels in the occluded regions and 0.0 where the source image matches the background image. In the generated image, the source content in the occluded area (someone standing in front of a sign, for example) should be used intact, while the content matching the background is replaced. Section 6.3 describes an example of this process in virtual advertising. The equation for a difference matte is:
$$\alpha(x, y) = \left| a_1 \left( i_1(x, y) - i_0(x, y) \right) \right| - a_2$$
The result is, of course, clamped to the range [0,1]. a1 and a2 are parameterizations of the matte generation process.
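The following sketch generates a difference matte, assuming the form α = |a1·(i1 − i0)| − a2 clamped to [0,1]. Under this assumption, a1 acts as a gain on the difference and a2 as an offset that suppresses small differences such as noise. The function name and the sample images are illustrative assumptions.

```python
def difference_matte(i1, i0, a1, a2):
    """Compute a difference matte between a frame i1 and a clean background i0.

    Assumed form: alpha = |a1 * (i1 - i0)| - a2, clamped to [0, 1].
    """
    return [
        [min(1.0, max(0.0, abs(a1 * (p1 - p0)) - a2)) for p1, p0 in zip(r1, r0)]
        for r1, r0 in zip(i1, i0)
    ]


# Hypothetical 2x2 images with intensities normalized to [0, 1].
background = [[0.5, 0.5], [0.5, 0.5]]
frame = [[0.5, 0.625], [1.0, 0.0]]

alpha = difference_matte(frame, background, a1=4.0, a2=0.25)
print(alpha)
```

Pixels that match the background yield 0.0, strongly differing pixels saturate at 1.0 whether they are brighter or darker than the background, and slightly differing pixels receive intermediate values.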
Difference mattes are difficult to use. The alignment between the source image and the background must be very exact. Dynamic differences in lighting can confuse the generation process. Most applications for difference mattes assume automatic registration.
One of the classic methods for combining two video images into one is the constant color matting method (also known as "blue-screen compositing" or "chroma-keying"). This technique is widely used in the film and television industry to generate special effects: a foreground layer of video is filmed against a special constant color background (usually chroma-key blue or chroma-key green) and then composited with the new background video. On the upper layer, every pixel whose brightness in one color channel falls within a predefined range is treated as transparent. Vlahos developed the original methods for constant color keying and holds a wide variety of patents in the area [5–12]. Smith and Blinn analyze the Vlahos methods, describe problems with color-based matting, and propose some solutions.
The basic Vlahos equation is:
$$\alpha(x, y) = 1 - a_1 \left( B(x, y) - a_2\, G(x, y) \right)$$
B and G are the blue and green planes of the source image and a1 and a2 are parameterizations of the matte generation. Background pixels are indicated by high blue intensity relative to green. In most Vlahos-derived systems, parameters are assigned to physical knobs that are adjusted to achieve the best result.
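A blue-screen matte of this kind can be sketched as follows, assuming the form α = 1 − a1·(B − a2·G) given by Smith and Blinn, clamped to [0,1], with α = 0 marking backing pixels. The function name and the sample color planes are illustrative assumptions.

```python
def vlahos_matte(blue, green, a1, a2):
    """Compute a constant color (blue-screen) matte from blue and green planes.

    Assumed form: alpha = 1 - a1 * (B - a2 * G), clamped to [0, 1].
    High blue relative to green drives alpha toward 0 (background).
    """
    return [
        [min(1.0, max(0.0, 1.0 - a1 * (b - a2 * g))) for b, g in zip(rb, rg)]
        for rb, rg in zip(blue, green)
    ]


# Hypothetical 2x2 color planes normalized to [0, 1].
blue = [[1.0, 0.5], [0.25, 0.5]]
green = [[0.0, 0.5], [0.5, 0.25]]

alpha = vlahos_matte(blue, green, a1=2.0, a2=1.0)
print(alpha)
```

The pure blue pixel becomes fully transparent, pixels whose blue does not exceed their green stay fully opaque, and the pixel with a moderate blue excess receives a partial alpha. In this sketch a1 and a2 play the role of the knobs an operator would adjust.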
A common problem in constant color matting is pixels in the foreground layer that match the color range being assigned to the background. This is sometimes evidenced by transparent regions in newscasters who have worn the wrong tie, for example. The Vlahos equation can be swapped to key on green rather than blue, though neither is a complete solution and color problems often exist. Objects on the upper layer need to be carefully selected so that their color does not conflict with the backing color.
Photographing foreground content in front of a blue screen can also result in blue tinges on the edges of the foreground image or blue shading of the image. This phenomenon is referred to as blue spill and is quite common and very annoying. A simple solution for blue spill is to replace the blue component of the foreground pixels with min(B(x, y), a2G(x, y)). Often the a2 parameter of this function is set independently of the a2 used for matte generation.
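The spill suppression rule can be sketched directly from the min(B, a2·G) expression. Pixels are represented here as (R, G, B) tuples normalized to [0,1]. The function name and the sample pixels are illustrative assumptions.

```python
def suppress_blue_spill(pixels, a2):
    """Limit the blue component of each pixel to at most a2 * green.

    Each pixel is an (r, g, b) tuple. Red and green are left unchanged.
    """
    return [
        [(r, g, min(b, a2 * g)) for r, g, b in row]
        for row in pixels
    ]


# Hypothetical foreground row: one pixel with a blue tinge, one without.
foreground = [[(0.5, 0.5, 0.75), (0.25, 0.5, 0.25)]]
cleaned = suppress_blue_spill(foreground, a2=1.0)
print(cleaned)
```

Only the pixel whose blue exceeds a2 times its green is altered, so foreground colors without excess blue pass through untouched.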
Most augmented imagery applications can be roughly classified as either adding foreground or background elements to an image. Applications that add foreground elements are often uncomplicated because the alpha map can be directly generated by the graphics system. Adding or replacing background content, or any content subject to potential occlusion by real objects, requires segmentation of the occluding content in an alpha map. Often an alpha map is built from multiple alpha maps generated by different stages of the augmentation process. Section 6 describes case studies of augmented reality applications, two of which have complex alpha map generation processes.
As an example, a virtual first-down line in a football game is composited with the real camera content using an alpha map generated by combining three computed alpha maps: a map generated by the graphics system, an inclusion map of color regions where the line is valid, and an exclusion map of color regions where the line is not valid. Inclusion regions are determined by the colors of the grass field. Exclusion regions are determined by player uniform colors and other field colors that are close to the inclusion colors; these regions must be subtracted from the alpha map to ensure the line does not overwrite a player.
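One plausible way to combine the three maps is sketched below. The text does not give the combination formula, so this example assumes that the exclusion map is subtracted from the inclusion map, the difference is clamped to [0,1], and the result scales the graphics alpha. The function name and the sample maps are illustrative assumptions.

```python
def combine_alpha_maps(graphics, inclusion, exclusion):
    """Combine graphics, inclusion, and exclusion maps into one alpha map.

    Assumed rule: alpha = graphics * clamp(inclusion - exclusion, 0, 1).
    """
    return [
        [g * min(1.0, max(0.0, inc - exc)) for g, inc, exc in zip(rg, ri, re)]
        for rg, ri, re in zip(graphics, inclusion, exclusion)
    ]


# Hypothetical one-row maps covering four cases:
# line on grass, line over a player, no line, soft line edge on marginal grass.
graphics = [[1.0, 1.0, 0.0, 0.5]]
inclusion = [[1.0, 1.0, 1.0, 0.5]]
exclusion = [[0.0, 1.0, 0.0, 0.0]]

alpha = combine_alpha_maps(graphics, inclusion, exclusion)
print(alpha)
```

The line is drawn at full strength only where the graphics system rendered it, the pixel color looks like grass, and no exclusion color is present. A pixel matching a player uniform forces the alpha to zero so that the player remains in front of the line.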