For our video processing service, we implemented the video cut detection, key-frame extraction and text-caption detection subsystems as a single service in the distributed environment. The service was developed and runs on a Pentium II personal computer with a 266 MHz CPU and 128 MB of RAM under the Windows NT 4.0 operating system. We integrated these subsystems into our agent-based testbed (please refer to  for more details about our agent-based infrastructure, components, security, policies and negotiation protocols). The service is offered as a video web service to authorized mobile users of an enterprise over the WWW.
Generally, the user can select the portion of a video file that interests him or her. The service handles different video formats, such as AVI, MOV and MPEG, seamlessly. It is structured as modular components so that new video analysis algorithms or new video formats can be adopted with minimal changes. As an example of an overall scenario, Figure 14.11 presents the output user interface shown to an end-user. The figures present results adapted to different circumstances, as follows:
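The modular structure described above can be sketched as a small plug-in registry: each container format registers its own decoder, so supporting a new format means adding one entry rather than changing the service core. All names here (`register_decoder`, `open_video`, the decoder functions) are illustrative assumptions, not the chapter's actual implementation.

```python
from typing import Callable, Dict

# Registry mapping file extensions to (hypothetical) decoder callables.
DECODERS: Dict[str, Callable[[str], str]] = {}

def register_decoder(extension: str):
    """Decorator that plugs a decoder for one container format into the service."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        DECODERS[extension.lower()] = fn
        return fn
    return wrap

@register_decoder(".avi")
def decode_avi(path: str) -> str:
    # Stand-in for real AVI demultiplexing and decoding.
    return f"decoded AVI stream from {path}"

@register_decoder(".mpg")
def decode_mpeg(path: str) -> str:
    # Stand-in for real MPEG decoding.
    return f"decoded MPEG stream from {path}"

def open_video(path: str) -> str:
    """Dispatch to the handler registered for the file's extension."""
    ext = path[path.rfind("."):].lower()
    if ext not in DECODERS:
        raise ValueError(f"unsupported format: {ext}")
    return DECODERS[ext](path)
```

With this shape, the downstream analysis subsystems call only `open_video` and remain format-agnostic.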
The service renders a normal streaming video result to the user when sufficient communication resources, such as link bandwidth and CPU power, are available. The system provides the complete stream only if the user's device is capable of browsing video content.
Otherwise, when resources are scarce or device capabilities are limited, the distributed architecture can decide, after a negotiation process among the resource management, available service agents and device handling modules, to automatically furnish only a few key frames of the selected video segment. The system supplies this summary at two defined quality levels.
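The outcome of the negotiation just described is a three-way choice: full stream, color key frames, or gray-scale key frames. A minimal sketch of that decision follows; the bandwidth thresholds and function name are assumptions for illustration only, not values from the chapter.

```python
def choose_delivery(bandwidth_kbps: float, device_can_play_video: bool) -> str:
    """Pick a delivery mode based on negotiated resource and device information.

    Thresholds are hypothetical; in the real system they would be the result
    of negotiation among resource-management, service and device agents.
    """
    if device_can_play_video and bandwidth_kbps >= 512:
        return "full-stream"          # enough resources: send the whole video
    if bandwidth_kbps >= 64:
        return "keyframes-color"      # medium level: color, high-quality JPEGs
    return "keyframes-grayscale"      # lowest level: gray-scale, lower quality
```

Note that device capability alone can force summarization even on a fast link, matching the behavior described above.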
The first option uses color, high-quality JPEG image files. The second provides only a gray-scale version of these key frames at a lower JPEG quality. We select as the key frame the frame at the 3/4 position of each recognized shot, as it is likely to represent the focus of the corresponding shot. We leave out the last quarter of the shot because a gradual editing transition may span the boundary between two consecutive shots. The result report delivered to the user shows the processing time needed to select and extract these key frames, along with the size of the original video segment and the total key-frame size, for possible corresponding billing procedures.
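The 3/4 selection rule above reduces to simple frame arithmetic over the shot boundaries produced by cut detection. The following sketch shows the rule; the example shot ranges are hypothetical.

```python
def key_frame_index(shot_start: int, shot_end: int) -> int:
    """Return the frame index three quarters of the way through a shot,
    leaving the final quarter free of any gradual transition into the
    next shot."""
    return shot_start + (3 * (shot_end - shot_start)) // 4

# Example: shots given as (start, end) frame ranges from cut detection.
shots = [(0, 120), (120, 200), (200, 260)]
key_frames = [key_frame_index(s, e) for s, e in shots]
```

Each selected frame would then be encoded either as a color, high-quality JPEG or as a gray-scale, lower-quality JPEG, depending on the negotiated quality level.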
Various examples of this service's output interface are shown in Figure 14.11. The test material is a video documentary about different tourist destinations in Egypt that has no text captions or audio content, so only the scene-change feature, using the binary penetration algorithm, is exercised. For the same request under different environments (resources and device capabilities): in Figure 14.11-a, the system delivers the whole stream of about 2.3 MB of video content; at the medium quality level in Figure 14.11-b, only 18 KB in total of color, good-quality key frames is transferred over the network; and at the lowest quality level adopted by the system, Figure 14.11-c, just 6.3 KB of gray-scale, lower-quality key information is delivered for the same request.