Among the many sports, soccer is certainly one of the most popular and most widely followed worldwide. In the following sections we report on our experience in classifying soccer highlights, using an approach based on temporal logic models. The method has been tested on several soccer videos covering a wide range of video editing and camera motion styles, as produced by several international broadcasters. Considering a variety of styles is of paramount importance in this field, since a system evaluated on a single style would lack robustness: videos produced by different directors differ in the length of the shots, in the number of cameras, and in the editing effects.
We review hereafter previous work related to soccer videos. The work presented in  is limited to detection and tracking of the ball and the players; the authors do not attempt to identify highlights. In , the authors extract the playing field by relying on the fact that it is always green. Subsequent detection of the ball and the players is restricted to the field, which is described by a binary mask. To determine the position of moving objects (ball and players) within the field, the central circle is first located, and a four-point homographic planar transformation is then performed to map image points to the model of the playing field. Whenever the central circle is not present in the current frame, a mosaic image is used to extend the search context; in this case, the mosaicing transformation is combined with the homographic transformation. This appears to be a fairly expensive approach. In , a hierarchical E-R model that captures domain knowledge of soccer is proposed. This scheme organizes basic actions as well as complex events (both observed and interpreted), and uses a set of (nested) rules to decide whether a certain event takes place. The system relies on 3D position data for the players and the ball, obtained from either microwave sensors or multiple video cameras. Although the authors claim that, unlike other systems, theirs works on an exhaustive set of events, little evidence of this is provided: only one basic action (deflection) and one complex event (save) are discussed. In , the use of panoramic (mosaic) images to present soccer highlights is proposed: moving objects and the ball are superimposed on a background image featuring the playing field. Ball, players and goal posts are detected. However, despite the title, only the presentation of highlights is addressed, and no semantic analysis of relevant events is carried out.
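To make the four-point planar mapping mentioned above more concrete, the following is a minimal sketch in Python. It is not the implementation of the cited work: the function names (`solve_linear`, `homography_from_four_points`, `apply_homography`, `compose`) are hypothetical, and the sketch assumes that four image points and their corresponding positions in the field model are already known, for instance from the located central circle. Composition with a mosaicing transformation is modelled here simply as a product of 3x3 matrices, which is also an assumption.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b].
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_four_points(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Return the 3x3 homography (with h33 fixed to 1) mapping src[i] to dst[i]."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("exactly four point correspondences are required")
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and similarly for v.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h: Matrix, p: Point) -> Point:
    """Map an image point to field-model coordinates."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def compose(h2: Matrix, h1: Matrix) -> Matrix:
    """Return the product h2 * h1, i.e. apply h1 first, then h2."""
    return [[sum(h2[i][k] * h1[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]
```

Under these assumptions, a frame in which the central circle is not visible would first be mapped into the mosaic by a frame-to-mosaic matrix `t`, and then into the field model by `h`, so that `apply_homography(compose(h, t), p)` gives the model position of an image point `p`.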