Chapter 13: Augmented Imagery for Digital Video Applications | Handbook of Video Databases: Design and Applications (Internet and Communications)

Charles B. Owen, Ji Zhou, Kwok Hung Tang, and Fan Xiao
Media and Entertainment Technologies Laboratory
Computer Science and Engineering Department,
Michigan State University
East Lansing, MI 48824
<cbowen@cse.msu.edu>

1. Introduction

Augmented Reality (AR) is the blending of computer-generated content with perceived reality. Traditionally, virtual reality (VR) has existed to provide an alternative to reality. Virtual presentations provide users with a view of a world that may not exist for them elsewhere. However, a criticism of VR has been that it cannot hope to completely supplant reality, given the incredible power of the human senses. Moreover, disconnecting the user from reality is not appropriate for a wide variety of applications. For this reason, researchers have begun to explore the vast range of possibilities between actual reality and virtual environments. Milgram and Kishino proposed a virtuality continuum, illustrated in Figure 13.1, to describe concepts beyond these simple ideas of reality and virtual reality [1, 2]. At one end of the scale lie real environments, the world with no computer augmentation. At the other end lies conventional virtual reality, the world replaced by the synthetic. Augmented reality dwells between the two extremes, typically closer to the real than the virtual, modifying the real world by adding virtual elements.

Figure 13.1: The Milgram virtuality continuum (VC)

Augmented reality seeks to blend computer-generated content with reality so as to create a seamless whole, in which the virtual elements have spatial and visual attributes as if they were natural elements of the source. A variety of augmented reality technologies exist. Head mounted displays can augment the visual field and enhance vision. Processing methods can enhance the perception of sound. Haptics can be used to amplify the sense of touch. This chapter discusses augmented imagery, the enhancement of digital imagery, with specific application to digital video.

Augmented imagery is the modification of images, either still images or video, through the addition of virtual computer-generated elements. Variations on augmented imagery are common in films, video games, TV shows and commercials. Televised coverage of football games now includes on-screen yellow lines indicating the first-down line. Clearly, the referees are not rushing out onto the field with cans of yellow spray paint each time a team achieves a first down and, in fact, the line is not picked up by the camera at all. The video captured by the camera has been augmented with this new visual feature in real time. Augmented imagery is a powerful tool for film makers, or so-called "movie magicians." From Star Wars (1977) to the Lord of the Rings series (2001), films have relied on computer-generated imagery (CGI) technologies to create fantastic effects, imaginative figures and non-existent scenes that appear realistic and completely integrated.

Augmenting an image sequence or digital video is, in general, much more complex than simply summing the source images arithmetically. This chapter introduces the concepts, components, and applications of augmented imagery, describing how systems process the real and virtual elements to build a desired result.

Augmented imagery systems create imagery by combining 2D or 3D computer-generated elements with original images as illustrated in Figure 13.2. Given this basic system design, an augmented imagery system can be subdivided into three major components: modeling, registration and composition. Modeling is the mathematical description of content, both real and virtual. It is crucial that each environment can be described in a fixed frame of reference. A system adding first-down lines to a football field needs to know the dimensions of the field, including any surface curvature. Modeling can include 2D and 3D representations. Registration is the determination of the relationship between the real and virtual models and the computation of appropriate transformations that align the two so that they coincide in the same frame of reference. For many 3D applications, registration consists of determining camera parameters for the virtual model that duplicate those physically realized in the real model. Some applications require only a 2D to 2D transformation that places content at the location tracked in the real image.
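The 2D to 2D case can be illustrated with a minimal sketch. It assumes that registration has already produced a 3x3 planar homography mapping coordinates in a flat field model (in yards) to image pixels. The matrix values, the function names, and the flat-field simplification are all hypothetical choices made for illustration; a system that accounts for surface curvature would need a richer model than a single homography.

```python
import numpy as np

def apply_homography(H, points):
    """Map Nx2 model-plane points to image pixels through a 3x3 homography H."""
    pts = np.asarray(points, dtype=float)
    # Lift to homogeneous coordinates: (x, y) -> (x, y, 1).
    homogeneous = np.hstack([pts, np.ones((pts.shape[0], 1))])
    mapped = homogeneous @ np.asarray(H, dtype=float).T
    # Divide by the third coordinate to return to pixel coordinates.
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical registration result (field yards -> image pixels).
# In a real system this matrix would be re-estimated for every frame.
H_FIELD_TO_IMAGE = np.array([
    [6.0, 0.0, 40.0],
    [0.0, 4.0, 100.0],
    [0.0, 0.0, 1.0],
])

def first_down_line_endpoints(yard_line, H=H_FIELD_TO_IMAGE, field_width=53.33):
    """Return the image-space endpoints of a line drawn across the field.

    The line is modeled as a segment at x = yard_line spanning the assumed
    field width, then transformed into the image by the homography.
    """
    model_segment = [[yard_line, 0.0], [yard_line, field_width]]
    return apply_homography(H, model_segment)

if __name__ == "__main__":
    print(first_down_line_endpoints(35.0))
```

The modeling step supplies the segment in field coordinates, and the registration step supplies the matrix; drawing the transformed segment into the frame is left to composition.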

Figure 13.2: Architecture of an augmented imagery system

Composition is the blending of the real and virtual elements into a single output image. Composition can be as simple as overlaying the virtual elements or as complex as blue-screen matting or segmentation of the football field from the players.
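As a rough sketch of the simple end of this range, the following blends a virtual layer over a real frame using a per-pixel matte, and derives a crude binary matte from a key color. It assumes images are floating-point arrays of shape (height, width, 3) with values in [0, 1]. The function names and the distance-threshold key are illustrative assumptions, not the matting methods used in production systems, which produce soft mattes and handle color spill.

```python
import numpy as np

def composite_over(background, foreground, alpha):
    """Blend foreground over background with a per-pixel matte.

    alpha has shape (height, width); 1 shows the foreground, 0 the background.
    """
    bg = np.asarray(background, dtype=float)
    fg = np.asarray(foreground, dtype=float)
    # Add a trailing axis so the matte broadcasts across the color channels.
    a = np.asarray(alpha, dtype=float)[..., np.newaxis]
    return a * fg + (1.0 - a) * bg

def binary_key_matte(image, key_color, threshold):
    """Crude key: matte is 0 where a pixel is close to key_color, 1 elsewhere."""
    img = np.asarray(image, dtype=float)
    distance = np.linalg.norm(img - np.asarray(key_color, dtype=float), axis=-1)
    return (distance > threshold).astype(float)

if __name__ == "__main__":
    # A 1x2 "studio" frame: one blue-screen pixel and one actor pixel (red).
    studio = np.array([[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]])
    # A 1x2 virtual set, uniformly gray.
    virtual_set = np.full((1, 2, 3), 0.5)
    matte = binary_key_matte(studio, key_color=(0.0, 0.0, 1.0), threshold=0.5)
    # The actor pixel is kept; the blue pixel is replaced by the virtual set.
    print(composite_over(virtual_set, studio, matte))
```

The same blend serves both directions of augmentation: a virtual element can be laid over real video, or, as in the usage above, a real actor can be laid over a virtual set.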

The terms augmented imagery and augmented reality are often used interchangeably. The authors feel that augmented imagery is a sub-category of the more general augmented reality, though the term augmented reality is most often used to refer to real-time applications involving head mounted displays or projection systems [2, 3]. Indeed, real-time augmented imagery using a single camera source is effectively identical to monitor-based augmented reality [4].

Augmented imagery is a powerful tool for enhancing or repurposing content in video databases. Virtual sets can be blended with real actors to localize presentations, advertising can be added to existing content to enhance revenue, and objectionable elements can be selectively masked or hidden for specific audiences. Advances in processing power and graphics subsystems make it possible for systems to perform many augmented imagery operations in real time on video feeds or while streaming content from video databases.

Research in the field of augmented imagery has been active for a considerable time, with early work primarily focusing on analog and digital compositing technologies. Petro Vlahos was one of the early researchers on matting and compositing techniques, and is recognized for his many patents on "Electronic Compositing Systems" [5–12]. Survey papers from Blinn and Smith provide additional reading in this area [13, 14]. Kutulakos and Vallino describe mixing live video with computer-generated graphics objects using affine representations [15]. A number of books, such as The Art and Science of Digital Compositing by Ron Brinkmann [16] and Digital Compositing in Depth by Doug Kelly [17], give comprehensive descriptions and technical details of digital compositing for augmented imagery.

This chapter is structured as follows: Sections 2, 3 and 4 introduce the three major elements of an augmented imagery system: modeling, registration and composition. Section 5 explores techniques for visual realism, such as antialiasing and lighting consistency. Some example applications are described in detail in Section 6, and Section 7 contains conclusions.