Chapter 6: A Generic Event Model and Sports Video Processing for Summarization and Model-Based Search | Handbook of Video Databases: Design and Applications (Internet and Communications)

Ahmet Ekin and A. Murat Tekalp
Department of Electrical and Computer Engineering
University of Rochester, Rochester, NY, 14627 USA
{ekin,tekalp}@ece.rochester.edu

1. Introduction

In the last decade, several technological developments have enabled the creation, compression, and storage of large amounts of digital information, such as video, images, and audio. With the increasing use of the Internet and wireless communications, ubiquitous consumption of such information poses several problems. We address two of them: efficient content-based indexing (description) of video information for fast search, and summarization of video to deliver the essence of the content over low-bitrate channels. The description and summarization problems have also been the focus of the recently completed ISO MPEG-7 standard [1], formally the Multimedia Content Description Interface, which standardizes a set of descriptors and description schemes. However, MPEG-7 does not normatively specify the media processing techniques needed to extract these descriptors or to summarize the content; developing such techniques is the main goal of this chapter. We also propose a generic integrated semantic-syntactic event model for search applications, which we believe is more powerful than the MPEG-7 Semantic Description Scheme.

Although the proposed semantic-syntactic event model is generic, neither its automatic instantiation nor the automatic generation of summaries that capture the essence of the content is simple for generic video. Hence, for video analysis, we focus on sports video, and in particular on soccer video, since sports video appeals to large audiences and its distribution over various networks should contribute to quick adoption and widespread usage of multimedia services worldwide. Two requirements follow. First, the model should provide a mapping between low-level video features and high-level events and objects, a requirement that can be met for certain objects and events in the domain of sports video. Second, the processing for summary generation should be automatic and should run in real time or near real time.

Sports video is consumed in various scenarios, where network bandwidth and processing time determine the depth of the analysis and description. Accordingly, we distinguish four types of users:

TV user: Bandwidth is not a concern for the TV user. However, the user may not be able to watch a complete game, either because he/she is away from home or because another game is on TV at the same time. A user with a software-enhanced set-top box and a personal digital recorder (PDR) can record a customized summary of a game. Furthermore, TV users with web access, one of the applications addressed by TV Anytime [2], may initiate searches on remote databases and retrieve customized summaries of past games through their TVs.
Mobile user: The primary concern for the mobile user is insufficient bandwidth. With the advent of the new 3G wireless standards, mobile users will have faster access. However, live streaming of a complete game may still be impractical or unaffordable. Instead, the user may prefer receiving summaries of essential events in the game in near real-time. The value of sports video drops significantly after a relatively short period of time [3]; hence summarization of sports video in near real-time is important. In this case, the service provider should perform the video summarization.
Web user: The web user may or may not share the bandwidth concerns of the mobile user. A web user with a low-bitrate connection is similar to the mobile user above: he/she may receive summaries of essential events that are computed on the server side. A web user with a high-bitrate connection, on the other hand, is similar to the TV user: he/she may initiate searches on remote databases and retrieve either complete games or summaries of past games.
Professional user: Professional users include managers, players, and sports analysts. Their motive is to extract team and player statistics in order to develop game plans, assess player performance, or scout players. These users are interested in searching past games based either on low-level motion features, such as object trajectories, or on semantic labels. The time constraint for processing is not very critical, but accuracy in descriptor extraction is important; hence, semi-automatic video analysis algorithms are more applicable.
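
The four user types above can be read as a small decision table that relates a user's situation to where summaries are computed and how much automation is acceptable. The following Python fragment is only an illustrative sketch of that reading, not part of the proposed system: the names UserType, ProcessingPlan, and plan_for are hypothetical, and the field values for cases the text does not spell out (for example, that the mobile user issues no database searches, or that no summary site is assigned to the professional user) are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class UserType(Enum):
    TV = "tv"
    MOBILE = "mobile"
    WEB = "web"
    PROFESSIONAL = "professional"


@dataclass(frozen=True)
class ProcessingPlan:
    # Where the summary is computed: "client", "server", or None
    # (None is an assumption for users whose main task is search).
    summary_site: Optional[str]
    # "automatic" or "semi-automatic" video analysis.
    analysis: str
    # Whether the user is expected to search remote databases.
    search_enabled: bool


def plan_for(user: UserType, high_bitrate: bool = False) -> ProcessingPlan:
    """Map a user type to a processing plan (hypothetical helper)."""
    if user is UserType.TV:
        # Customized summary recorded at the user side (set-top box / PDR).
        return ProcessingPlan("client", "automatic", True)
    if user is UserType.MOBILE:
        # Limited bandwidth: the service provider computes the summary.
        # search_enabled=False is an assumption; the text does not say.
        return ProcessingPlan("server", "automatic", False)
    if user is UserType.WEB:
        # A web user behaves like a TV user or a mobile user,
        # depending on the bitrate of the connection.
        return plan_for(UserType.TV if high_bitrate else UserType.MOBILE)
    # Professional user: accuracy matters more than processing time.
    return ProcessingPlan(None, "semi-automatic", True)
```

For instance, plan_for(UserType.WEB, high_bitrate=False) yields the same plan as plan_for(UserType.MOBILE), which mirrors the statement that a web user with a low-bitrate connection is similar to the mobile user.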

The next section briefly reviews related work in video modeling, description, and analysis. In Section 3, we present a generic semantic-syntactic model to describe sports video for search applications. In Section 4, we introduce a sports video analysis framework for video summarization and for instantiation of the proposed model. The summaries can be low-level summaries computed on the server (mobile user) or customized summaries computed at the user side (TV user). The model instantiation enables search applications (web user or professional user). Finally, in Section 5, we describe a graph-based querying framework for video search and browsing that builds on the proposed model.