The specification and detection mechanism introduced in the previous section constitutes the syntactic portion of a video organisation system. The semantic part depends on the particular context in which the summarisation is required. The crucial difference between the two is that, in the syntactic specification, one is only interested in isolating and categorising occurrences within a particular medium (in the present case, video), whereas in the semantic specification one is interested in describing as precisely as possible a whole domain and the role played by events in relation to the other elements of the domain.
Figure 22.6 shows a diagram of a possible domain specification. Events are created through the satisfaction of a generating condition, which is syntactic in nature. A number of necessary conditions can be specified, together with their time relations to the generating condition, so that an event is actually instantiated only when the generating condition and all the necessary conditions are satisfied. Notice that the necessary conditions are not necessarily drawn from the same medium as the generating condition, nor are they required to be events. To give a simple example, "people near the Picasso painting" is a syntactic event that will generate the event "possible theft threat" only if the condition "night-time" holds. The condition "night-time" is obviously not, per se, an event.
Figure 22.6: Diagram of domain specification.
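The instantiation rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `EventRule`, the context dictionary, and the choice of 22:00 to 06:00 as "night-time" are all hypothetical, and the time relations between necessary and generating conditions are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical context: whatever the system knows at evaluation time.
Context = Dict[str, object]
Condition = Callable[[Context], bool]


@dataclass
class EventRule:
    """A semantic event with one generating condition and any number
    of necessary conditions (which need not be events themselves)."""
    name: str
    generating: Condition
    necessary: List[Condition] = field(default_factory=list)

    def instantiate(self, ctx: Context) -> Optional[str]:
        # The generating condition triggers the evaluation...
        if not self.generating(ctx):
            return None
        # ...but the event exists only if every necessary condition holds.
        if all(cond(ctx) for cond in self.necessary):
            return self.name
        return None


def people_near_painting(ctx: Context) -> bool:
    # Syntactic event, assumed to be reported by the video layer.
    return "people-near-picasso" in ctx.get("syntactic_events", ())


def night_time(ctx: Context) -> bool:
    # A plain condition, not an event; the hours are an assumption.
    hour = ctx["hour"]
    return hour >= 22 or hour < 6


theft = EventRule("possible-theft-threat", people_near_painting, [night_time])

at_night = {"syntactic_events": {"people-near-picasso"}, "hour": 23}
at_noon = {"syntactic_events": {"people-near-picasso"}, "hour": 12}

print(theft.instantiate(at_night))  # possible-theft-threat
print(theft.instantiate(at_noon))   # None
```

Keeping the necessary conditions as plain predicates over a shared context reflects the remark that they may come from any medium, or from no medium at all.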
Users of the system are called actors in the diagram, and they participate in certain roles. The domain specification will also contain further information about actors and their roles, such as the relations between different roles, but this is irrelevant in the present context. Events become such only through association with a role (role events), which incorporates them into the activities and requirements of that role. Role events can specify additional conditions that an event must satisfy in order to be recognised, as well as additional information that needs to be aggregated around that event.
The general concept of role event goes, obviously, beyond the organisation of video: in this scenario role events are focal points for the aggregation of information that can be contained in a variety of media and derived from a variety of sources.
The conditions that lead to the instantiation of an event are simpler at the semantic level, since in general it is convenient to implement the more complex event composition rules in the syntactic layer.
The semantic level is, however, the ideal location for more "fuzzy" specifications of the time relations between events. For instance, the specification that event E2 should take place "shortly after" event E1 should be made in the semantic layer and translated into a suitable time bound for the syntactic layer.
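As a minimal sketch of this translation, one can imagine a lookup table that maps each fuzzy relation to a numeric interval on the delay between E1 and E2. The table `FUZZY_BOUNDS`, the function names, and the bounds themselves (in seconds) are assumptions made for illustration; in practice they would depend on the domain.

```python
from typing import Dict, Tuple

# Hypothetical mapping from fuzzy relations to (min, max) delays in seconds.
FUZZY_BOUNDS: Dict[str, Tuple[float, float]] = {
    "shortly after": (0.0, 120.0),
    "long after": (3600.0, float("inf")),
}


def to_time_bound(relation: str) -> Tuple[float, float]:
    """Translate a semantic-level relation into a bound that the
    syntactic layer can check."""
    return FUZZY_BOUNDS[relation]


def satisfies(relation: str, t1: float, t2: float) -> bool:
    """True if E2 (at time t2) stands in the given relation to E1 (at t1)."""
    low, high = to_time_bound(relation)
    delay = t2 - t1
    return low < delay <= high


print(satisfies("shortly after", 100.0, 160.0))  # True: 60 s later
print(satisfies("shortly after", 100.0, 400.0))  # False: 300 s later
print(satisfies("shortly after", 100.0, 90.0))   # False: E2 precedes E1
```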
In order to exemplify the matter, I will assume here that all the syntactic events, once detected, are stored in a database that the semantic layer can query. In the example above, then, part of the role of the gift shop manager could be specified as follows:
<role name="gift-shop-manager">
  <event_track>
    <event type="purchase" name="p"/>
    <event type="cash-register-transaction" name="r"/>
    <event type="painting-visit" name="v"/>
  </event_track>
  <event name="big-buy"/>
  <detection>
    r.amount > $100 and coincide(p.time, r.time)
  </detection>
  <associations>
    <association name="painting-visits" type="video">
      <detection>
        select t.video_link
        from VideoEvents t
        where t.type = "visit" and t.id = p.id
      </detection>
    </association>
  </associations>
</role>
The detection rule looks for purchase videos that coincide with high-value transactions (the predicate "coincide" holds when two times agree to within a given tolerance). One association of this event is shown: it retrieves all the video fragments of the visit that the person who bought the souvenirs made to the gallery.
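To make the behaviour of the detection rule and of the association concrete, the following sketch evaluates both against an in-memory SQLite database. Everything beyond the names taken from the specification is an assumption: the schema of `VideoEvents`, the sample rows, the reading of `id` as an identifier of the person, the 5-second tolerance of `coincide`, and the representation of events as dictionaries.

```python
import sqlite3
from typing import Dict, List


def coincide(t1: float, t2: float, tolerance: float = 5.0) -> bool:
    """True when two times (in seconds) agree to within the tolerance."""
    return abs(t1 - t2) <= tolerance


# Hypothetical store of detected syntactic events.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE VideoEvents (id INTEGER, type TEXT, time REAL, video_link TEXT)"
)
conn.executemany(
    "INSERT INTO VideoEvents VALUES (?, ?, ?, ?)",
    [
        (7, "purchase", 1000.0, "shop/cam1/0007.mp4"),
        (7, "visit", 400.0, "gallery/cam3/0101.mp4"),
        (7, "visit", 650.0, "gallery/cam5/0102.mp4"),
        (9, "visit", 500.0, "gallery/cam3/0200.mp4"),
    ],
)

# Sample purchase event p and cash-register transaction r.
purchase: Dict[str, float] = {"id": 7, "time": 1000.0}
transaction: Dict[str, float] = {"amount": 140.0, "time": 1002.0}


def big_buy(p: Dict[str, float], r: Dict[str, float],
            threshold: float = 100.0) -> bool:
    """Detection rule: a high-value transaction at the time of a purchase."""
    return r["amount"] > threshold and coincide(p["time"], r["time"])


def painting_visits(db: sqlite3.Connection, p: Dict[str, float]) -> List[str]:
    """Association: video fragments of the gallery visits of the same person."""
    rows = db.execute(
        "SELECT t.video_link FROM VideoEvents t "
        "WHERE t.type = 'visit' AND t.id = ? ORDER BY t.time",
        (p["id"],),
    )
    return [row[0] for row in rows]


if big_buy(purchase, transaction):
    print(painting_visits(conn, purchase))
    # ['gallery/cam3/0101.mp4', 'gallery/cam5/0102.mp4']
```

The query is parameterised rather than built by string concatenation, which is the usual way to pass a value such as `p.id` into SQL from a host language.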
I am using a notation based on XML and interspersed with SQL queries only because these two formalisms are very well known and, I believe, self-explanatory. They are by no means the only or even the best choice.