2. The Application Context

The actual architecture of a system supporting video annotation and retrieval depends on the application context, and in particular on the end users and their tasks. While all application contexts demand reliable annotation of the video stream to support effective selection of relevant video segments, it is evident that, for instance, service providers (e.g., broadcasters, editors) and consumers accessing a Video-On-Demand service have different needs [16].

In the field of supporting technologies for the editorial process, for both old and new media, automatic annotation of video material opens the way to the economic exploitation of valuable assets. In the specific context of sports videos, two scenarios can be devised for the reuse of archived material within broadcasting companies: i) posterity logging, which is known as a key method of improving production quality by bringing added depth and historical context to recent events. Posterity logging is typically performed by librarians, who make a detailed annotation of the video tapes according to a standard format; ii) production logging, where broadcasters annotate footage recorded a few hours earlier, which may even have been provided by a different broadcaster and is therefore not indexed, in order to edit and produce a sports news program. Production logging is typically carried out live (or shortly after the event) by an assistant producer, who selects relevant shots to be edited into a magazine or news program that reports on the sports highlights of the day (e.g., BBC's "Match of the Day" or Eurosport's "Eurogoal"). An example of posterity logging is the reuse of shots that show the best actions of a famous athlete: they can be reused later to provide a historical context. An example of production logging is the reuse of highlights, such as soccer goals or tennis match points, to produce programs that contain the best sports actions of the day.

In both scenarios, video material, which typically originates "live," should be annotated automatically, as detailed manual annotation is mostly impractical. The level of annotation should be sufficient to enable simple text-based queries. The annotation process includes such activities as segmentation of the material into shots, grouping and classification of the shots into semantic categories (e.g., type of sport), and support for query formulation and retrieval of events that are significant to the particular sport.
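To make the target level of annotation more concrete, the following is a minimal sketch, in Python, of what a per-shot annotation record and a simple text-based query over it might look like. The names `ShotAnnotation` and `search`, the field layout, and the sample data are all illustrative assumptions and are not taken from the system described in this chapter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ShotAnnotation:
    """One annotated shot; every field name here is a hypothetical choice."""
    start_s: float   # shot start time, in seconds
    end_s: float     # shot end time, in seconds
    category: str    # semantic category, e.g. the type of sport
    events: List[str] = field(default_factory=list)  # significant events


def search(shots, query):
    """Return the shots whose category and events contain every query term."""
    terms = query.lower().split()
    hits = []
    for shot in shots:
        text = " ".join([shot.category, *shot.events]).lower()
        if all(term in text for term in terms):
            hits.append(shot)
    return hits


# Made-up sample archive, for illustration only.
archive = [
    ShotAnnotation(0.0, 12.5, "soccer", ["kick-off"]),
    ShotAnnotation(12.5, 40.0, "soccer", ["goal"]),
    ShotAnnotation(40.0, 65.0, "tennis", ["match point"]),
]

goal_shots = search(archive, "soccer goal")
```

In this sketch a query such as "soccer goal" is matched by plain substring search over the annotation text, which is the simplest form a text-based query could take; a production archive would more likely rely on a proper index.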

To achieve effective annotation, it is important to have a clear insight into the current practice and established standards in the domain of professional sports videos, particularly concerning the nature and structure of their content. The videos in the data set used in the experiments reported below cover a wide variety of types. The sports library department of a broadcaster may collect videos from other departments as well as from other broadcasters, e.g., a broadcaster of the country that hosts the Olympic Games.

Videos differ from each other in terms of sport type (outdoor and indoor sports) and number of athletes (individuals or teams). Videos also differ in terms of editing: some are so-called live feeds from a single camera covering a complete event, some combine different feeds of a specific event edited into a single stream, and others only feature highlights of minor sports assembled into a summary. Very few assumptions can be made about the presence of spoken commentary or superimposed text, as their availability depends on a number of factors, including the technical facilities available on location and the agreements between the hosting broadcaster and the other broadcasters. As shown in Figure 5.1, the typical structure of a sports video includes sport sequences interleaved with studio scenes, possibly complemented with superimposed graphics (captions, logos, etc.).

Figure 5.1: Typical sequence of shots in a sports video.
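
As a rough illustration of this interleaved structure, the sketch below groups consecutive shots that share a label into sequences. It assumes that each shot has already been classified as either "studio" or "sport"; the labels, the function name `group_into_sequences`, and the sample input are hypothetical and serve only to illustrate the idea.

```python
from itertools import groupby


def group_into_sequences(labels):
    """Merge runs of identical shot labels into (label, first, last) tuples."""
    sequences = []
    index = 0
    for label, run in groupby(labels):
        length = len(list(run))
        # first and last are zero-based shot indices, both inclusive
        sequences.append((label, index, index + length - 1))
        index += length
    return sequences


# Made-up per-shot labels: studio scenes interleaved with sport sequences.
shot_labels = ["studio", "sport", "sport", "sport", "studio", "sport", "sport"]
sequences = group_into_sequences(shot_labels)
```

Each resulting tuple describes one sequence by its label and the indices of its first and last shot, so a studio scene and the sport sequence that follows it can be retrieved as separate units.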