HotStreams™ was originally designed for a desktop environment and was therefore built to take advantage of high-resolution displays. The system also targeted customers of broadband networks who expected high-quality content, so the video content was encoded at high data rates (typically 2–5 Mbps). Wireless computing and network technology changed significantly while the project was under way: powerful PDAs came to market, wireless LANs and other "high"-bandwidth wireless networks were deployed extensively, and streaming media players became available on PDAs. As a result, wireless devices became potential end-user devices for the system. The system, however, had to be modified to cope with the differences in network and device capabilities.
Any system serving a variety of devices connected to heterogeneous networks must find a way to deliver content in a comprehensible form that satisfies the end-user. This challenge can be approached in different ways, as illustrated in Figure 16.6. (The term document is used generically in the figure and might, for instance, represent a complete video or multimedia production.)
Figure 16.6: Alternative Ways to Deliver Content in the Right Form
The simplest solution is to prepare different versions of the same document during the authoring phase, each version targeting a given network and device configuration. Creating the right form of the content is then reduced to selecting the pre-authored version that best fits the given network and device configuration. This solution, however, puts an additional burden on the authoring process and does not give a recipe for handling combinations of network and device configurations that do not exist today but might be used in the future.
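A minimal sketch of the pre-authored approach, assuming a hypothetical catalog keyed by (network, device) pairs; the configuration labels and file names are invented for illustration. Selection is a plain lookup, which also makes the drawback visible: a configuration nobody authored for has no answer at all.

```python
from typing import Dict, Optional, Tuple

# Hypothetical catalog: one pre-authored version per anticipated
# (network, device) configuration. Names are illustrative only.
PRE_AUTHORED: Dict[Tuple[str, str], str] = {
    ("broadband", "desktop"): "training_desktop_mpeg1.smil",
    ("wlan", "pda"): "training_pda_wmv.asx",
}


def select_version(network: str, device: str) -> Optional[str]:
    """Return the version authored for this configuration, or None if
    no version was prepared for it during authoring."""
    return PRE_AUTHORED.get((network, device))
```

For example, `select_version("gprs", "phone")` returns `None`, because no author anticipated that combination.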
The second alternative, document adaptation, involves converting and possibly transcoding the contents of the original document to adapt it to the end-user's current environment. Ma et al. [10] propose a framework for adaptive content delivery designed to handle different media types and a variety of user preferences and device capabilities. The framework combines several techniques for content adaptation, such as information abstraction, modality transform, and data transcoding. Mohan et al. [11] propose a multi-modal, multi-resolution scheme called the InfoPyramid that uses transcoding to accomplish multimedia content adaptation. The adaptation may happen at the server or, as Fox et al. [12] argue, in adaptation proxies in the network.
One advantage of the adaptation approach is that only one version of the content is stored, which keeps storage requirements to a minimum. The other main benefit is that it provides an easy path for supporting new clients. A major disadvantage is the computational resources required to implement a large-scale system that can convert many parallel streams on the fly. HotStreams™ and TEMA take the third approach shown in the figure, document assembly, where a unique document is put together on the fly in response to each end-user request. At assembly time, the server selects the best-fit version of each multimedia object to be used in the document. The rest of this section discusses how network and device differences and mobility can be addressed with this approach.
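Document assembly can be sketched as a per-request pass over the document's media objects, choosing one stored variant of each. The class names and the best-fit rule (highest data rate the client can both decode and receive) are assumptions made for illustration. The text does not specify how HotStreams™ ranks candidates.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Variant:
    uri: str
    fmt: str   # encoding/format tag, e.g. "mpeg1" or "wmv" (illustrative)
    kbps: int  # data rate needed to deliver this variant


@dataclass
class MediaObject:
    name: str
    variants: List[Variant]


@dataclass
class ClientProfile:
    formats: Set[str]    # formats the client can render
    bandwidth_kbps: int  # bandwidth currently available to the client


def best_fit(obj: MediaObject, client: ClientProfile) -> Optional[Variant]:
    """Assumed rule: highest-rate variant the client can play and receive."""
    playable = [v for v in obj.variants
                if v.fmt in client.formats and v.kbps <= client.bandwidth_kbps]
    return max(playable, key=lambda v: v.kbps, default=None)


def assemble(objects: List[MediaObject], client: ClientProfile) -> Dict[str, Optional[str]]:
    """Build one request-specific document: object name -> chosen URI."""
    document: Dict[str, Optional[str]] = {}
    for obj in objects:
        chosen = best_fit(obj, client)
        document[obj.name] = chosen.uri if chosen else None
    return document
```

Nothing is transcoded here. The stored variants are only selected, which is what separates assembly from the adaptation approach.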
Displays come in great variety, ranging from small, black-and-white, text-only mobile phone displays to true-color, high-resolution computer displays. Screen size is one of the most challenging factors when adapting media-rich presentations. A large-screen interface usually cannot be mapped onto a small-screen device satisfactorily without major changes to the layout and without resizing the rendered objects. Most existing research has focused on solutions for single-frame interfaces. HotStreams™ and TEMA, however, offer multi-frame interfaces, as seen in Figure 16.1. Multi-frame interfaces such as these cannot be displayed well on the small screens of PDAs; the interface needs to be divided over multiple small-screen windows that are displayed one at a time. Figure 16.7 illustrates how a document assembly process uses the meta-data (the model) to generate different application interfaces (views).
Figure 16.7: Device-dependent View of Interactive Content
The figure shows the model of a video consisting of sections that in turn consist of clips. Some clips contain hyperlinks that, when activated, retrieve additional information related to the given clip. The desktop view shows the original HotStreams™ interface, which defines three frames: the video frame in the upper left corner, the electronic table of contents in the lower left corner, and the more-information frame on the right-hand side. The video frame contains the RealPlayer playing back an MPEG-1 version of the video.
The PDA view, shown on the right-hand side of the figure, illustrates how the interface has been changed from a single-page, multi-frame view to a multi-page, single-frame view, because the PDA screen is too small to display the table of contents and the video simultaneously. The figure clearly illustrates that adapting multi-frame interfaces for interactive content is not simply a page-by-page conversion. In some cases the workflow may need to be modified during adaptation as well. The desktop workflow, for instance, generates and presents the table of contents and the video content as a single task. In the PDA workflow, the user goes back and forth between the generation and display of the table of contents and the video playback.
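The model/view split can be sketched as two view generators over the same meta-data. The dictionary layout, page names, and the "back" link are hypothetical; they only mirror the structure described for Figure 16.7. The desktop view produces a single page with three frames. The PDA view produces one table-of-contents page plus one playback page per section, and each playback page links back to the table of contents.

```python
from typing import Dict, List

# Hypothetical model: a video made of sections, each made of clips.
MODEL: Dict = {
    "title": "Training class",
    "sections": [
        {"name": "Intro", "clips": ["Welcome", "Agenda"]},
        {"name": "Safety", "clips": ["Gear", "Procedures"]},
    ],
}


def desktop_view(model: Dict) -> List[Dict]:
    """One page with three frames shown at the same time."""
    toc = [s["name"] for s in model["sections"]]
    return [{
        "page": "main",
        "frames": {"video": model["title"], "toc": toc, "moreinfo": None},
    }]


def pda_view(model: Dict) -> List[Dict]:
    """Several single-frame pages: a table-of-contents page plus one
    playback page per section, each linking back to the TOC."""
    pages = [{"page": "toc",
              "frames": {"toc": [s["name"] for s in model["sections"]]}}]
    for s in model["sections"]:
        pages.append({"page": "play:" + s["name"],
                      "frames": {"video": s["clips"]},
                      "back": "toc"})
    return pages
```

The "back" links capture the workflow change: on the PDA, the user alternates between the table-of-contents page and playback instead of seeing both together.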
High-quality video streams such as MPEG-1 and MPEG-2 require network and data transport bandwidths on the order of megabits per second, as well as specialized hardware or processing resources that are normally not available on PDAs and mobile devices. Some video file formats, such as the RealNetworks SureStream format [13], can store data for multiple bandwidths in one file. A video may, for example, be encoded in the SureStream format at 56 kbps, 64 kbps, 128 kbps, 256 kbps, and 750 kbps to accommodate different devices and network bandwidths. Such content cannot, however, be delivered to a PDA or a desktop that is not capable of playing back SureStream-encoded videos.
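In practice, SureStream rate switching is negotiated between the RealNetworks server and player. The sketch below only illustrates the selection idea using the rates listed above. It picks the highest encoded rate that fits the available bandwidth, and it returns nothing for a client that cannot decode the format at all. The "surestream" capability tag is a hypothetical label.

```python
from typing import Optional, Sequence, Set

# Rates from the SureStream example in the text (kbps).
SURESTREAM_RATES_KBPS = (56, 64, 128, 256, 750)


def pick_rate(available_kbps: int,
              rates: Sequence[int] = SURESTREAM_RATES_KBPS) -> Optional[int]:
    """Highest encoded rate that fits the available bandwidth, or None."""
    fitting = [r for r in rates if r <= available_kbps]
    return max(fitting) if fitting else None


def select_stream(client_formats: Set[str], available_kbps: int) -> Optional[int]:
    """A multi-rate file only helps clients that can decode its format."""
    if "surestream" not in client_formats:  # hypothetical format tag
        return None
    return pick_rate(available_kbps)
```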
The systems developed at SCR followed a document assembly approach in which a single video is represented by a set of files, each targeting a given network and device configuration. Figure 16.8 illustrates how the relationships between elements within the interactive video and the corresponding media files are represented in the system. The figure also shows that the systems support multiple streaming servers.
Figure 16.8: Multi-format Video Content Adaptation
Multiple client devices are supported in the meta-database through footage objects. A footage object is an abstract representation of one specific video that has one or more physical instantiations in the form of media files. Footage 1 in the figure, for instance, exists as four different files in four different encoding formats and can be played back at seven different data rates. Clips of an interactive presentation are bound to the abstract footage object rather than directly to the media files. New media files can therefore easily be added to the system whenever additional bit rates or encoding formats need to be supported, without changing any of the productions.
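The footage indirection can be sketched with three record types. The field names are illustrative assumptions, not the actual meta-database schema. A clip references a Footage object rather than a file, so registering a new media file changes only the footage's file list and leaves every clip, and therefore every production, untouched. A multi-rate file such as a SureStream file simply lists several rates.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MediaFile:
    path: str
    encoding: str           # e.g. "realmedia", "mpeg1", "wmv" (illustrative)
    rates_kbps: List[int]   # several entries for a multi-rate file
    server: str             # streaming server holding the file


@dataclass
class Footage:
    footage_id: str
    files: List[MediaFile] = field(default_factory=list)

    def add_file(self, f: MediaFile) -> None:
        """Register another physical instantiation of this footage."""
        self.files.append(f)

    def data_rates(self) -> List[int]:
        """All distinct data rates this footage can be played back at."""
        return sorted({r for f in self.files for r in f.rates_kbps})


@dataclass
class Clip:
    title: str
    footage: Footage  # bound to the abstract footage, not to a file
    start_s: float
    end_s: float
```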
Differences in the devices' system and software capabilities are also a challenge when adapting rich, interactive content delivery. The HotStreams™ system, for instance, generates personalized presentations in the form of SMIL and uses the RealPlayer API to integrate the video player tightly into the web browser. No SMIL-enabled player was available for Microsoft Windows CE-based PDAs when the system was developed. The Microsoft Windows Media Player, on the other hand, was available for several Windows CE devices. The Windows Media Player defines an API similar to the one offered by the RealPlayer and interprets a scripting language similar to SMIL.
Figure 16.9 illustrates the adaptive HotStreams™ content personalization workflow. The first task, Content Personalization, retrieves content meta-data from the database and matches it against the end-user's selection and user profile. The personalized production is represented as a composite Java object and passed to the next step, Advertisement Insertion. In this step, the system retrieves video commercials according to the end-user's location and the content type and matches them against the end-user's interest profile; the corresponding content clips and hyperlinks are then inserted into the production. The production is then passed on to the final step, Presentation Script Generation. This step binds the footage objects to media files according to the device profile. Similarly, it selects the best-suited image, text, and audio files if the presentation includes such content. Lastly, this step determines how hyperlinks are to be added to the presentation depending on the device capabilities. A SMIL-enabled device may, for instance, have multiple hotspots located in different areas of the presentation simultaneously, while the Windows Media Player can only handle one MOREINFO hyperlink bound to the BANNER.
Figure 16.9: Adaptive Personalization Workflow
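The three-step workflow can be sketched as a chain of functions that each take and return the production. HotStreams™ passes a composite Java object between steps; plain dictionaries stand in for it here. The field names ("section", "topic", "files", "encodings") and the matching rules are simplified assumptions. The device profile lists encodings in order of preference, and the final step binds each item to the first encoding the device supports.

```python
from typing import Dict, List, Set


def content_personalization(catalog: List[Dict], selection: Set[str],
                            interests: Set[str]) -> Dict:
    """Step 1: keep clips the user selected or that match the profile."""
    clips = [c for c in catalog
             if c["section"] in selection or c["topic"] in interests]
    return {"clips": clips, "ads": []}


def advertisement_insertion(production: Dict, ads: List[Dict],
                            location: str, interests: Set[str]) -> Dict:
    """Step 2: insert commercials matching location and interests."""
    production["ads"] = [a for a in ads
                         if a["location"] == location and a["topic"] in interests]
    return production


def presentation_script_generation(production: Dict, device: Dict) -> List[str]:
    """Step 3: bind each item to a media file the device can play."""
    bound = []
    for item in production["clips"] + production["ads"]:
        files = item["files"]  # maps encoding -> file path
        enc = next((e for e in device["encodings"] if e in files), None)
        bound.append(files[enc] if enc else "unavailable")
    return bound


def personalize(catalog, ads, selection, interests, location, device) -> List[str]:
    production = content_personalization(catalog, selection, interests)
    production = advertisement_insertion(production, ads, location, interests)
    return presentation_script_generation(production, device)
```

Only the last step looks at the device profile. The first two steps stay device-independent, which is why the same production can later be re-targeted.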
The Presentation Script Generation step also determines the scripting language in which the personalized production should be generated (SMIL or ASX). A SMIL script is generated if the end-user device can handle SMIL, and an ASX script if the end-user device has the Windows Media Player installed. Figure 16.10 shows typical SMIL and ASX scripts generated by HotStreams™.
Figure 16.10: SMIL and ASX Scripts for Personalized Video Content
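Below is a deliberately minimal sketch of the script choice. It is not a reproduction of the scripts in Figure 16.10. The SMIL output just plays the videos in a `seq`. The ASX output gives each `ENTRY` a `REF` and a single `MOREINFO` inside its `BANNER`, reflecting the one-hyperlink limitation described above. The capability tags "smil" and "wmp" are hypothetical. Layout regions and hotspots are omitted.

```python
from typing import List, Set
from xml.sax.saxutils import quoteattr


def to_smil(video_urls: List[str]) -> str:
    """Minimal SMIL: play the selected videos in sequence."""
    items = "\n".join(f"      <video src={quoteattr(u)}/>" for u in video_urls)
    return "<smil>\n  <body>\n    <seq>\n" + items + "\n    </seq>\n  </body>\n</smil>\n"


def to_asx(video_urls: List[str], banner_img: str, moreinfo_url: str) -> str:
    """Minimal ASX: one ENTRY per video, each with a single MOREINFO
    link bound to its BANNER."""
    entries = "\n".join(
        "  <ENTRY>\n"
        f"    <REF HREF={quoteattr(u)}/>\n"
        f"    <BANNER HREF={quoteattr(banner_img)}>\n"
        f"      <MOREINFO HREF={quoteattr(moreinfo_url)}/>\n"
        "    </BANNER>\n"
        "  </ENTRY>"
        for u in video_urls
    )
    return '<ASX VERSION="3.0">\n' + entries + "\n</ASX>\n"


def generate_script(device_caps: Set[str], video_urls: List[str],
                    banner_img: str, moreinfo_url: str) -> str:
    """Prefer SMIL; fall back to ASX for Windows Media Player clients."""
    if "smil" in device_caps:
        return to_smil(video_urls)
    if "wmp" in device_caps:
        return to_asx(video_urls, banner_img, moreinfo_url)
    raise ValueError("device supports neither SMIL nor ASX")
```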
Knowing the client's capabilities is a prerequisite for content adaptation. The systems discussed in this chapter run on standard World Wide Web components and are consequently limited to basing content adaptation on the device capability data exchanged through standard web protocols. CC/PP [16] was in its initial stage at the time of these projects. Hence, content adaptation was based on knowledge of the client's operating system: a client running Windows CE received a one-frame/ASX version with videos encoded in Windows Media format, while other clients received a multi-frame/SMIL version with videos encoded in MPEG-1 or RealMedia format.
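The operating-system rule can be sketched as a check on the HTTP User-Agent header. Inspecting the User-Agent for the substring "Windows CE" is an assumption about how the operating system was detected, since the text does not say which request data was used. The profile field names are also illustrative.

```python
from typing import Dict, List, Union


def adaptation_profile(user_agent: str) -> Dict[str, Union[str, List[str]]]:
    """Map the client's OS (taken from the User-Agent) to layout,
    script language, and acceptable video encodings."""
    if "Windows CE" in user_agent:
        return {"layout": "one-frame", "script": "asx", "encodings": ["wmv"]}
    return {"layout": "multi-frame", "script": "smil",
            "encodings": ["mpeg1", "realmedia"]}
```

Rules like this are coarse. Two Windows CE devices with very different screens and bandwidths get the same profile, and that gap is what richer capability descriptions such as CC/PP were meant to close.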
Section 5.1 discussed the challenges related to variations in device displays. Devices also differ in the kinds of input devices they offer, which further affects how the user may interact with the system. The keyboard and mouse, for instance, encourage a desktop user to be highly interactive, while the remote control provided with a set-top box better suits a laid-back interaction style in which the user interacts less with the content. The stylus discourages the PDA user from typing in text. An adaptive system needs to take these differences in interaction style into account.
The original version of HotStreams™ could not accommodate nomadic users. A user might, for instance, follow an interactive training class in the office on her high-resolution/high-bandwidth desktop. She might want to continue watching the interactive presentation on her small-screen/low-bandwidth PDA on the bus home. Finally, she might want to complete the class at home in front of her TV connected to a medium-resolution/high-bandwidth set-top box.
Figure 16.11 shows the workflow for the scenario described above, in which the end-user carries her personalized presentation from one location and device to another. The end-user orders her interactive training class on her desktop, which is connected to a high-bandwidth network. The desktop has a powerful processor that can decode MPEG-1 data in real time. Hence, the user can view the multi-frame/SMIL-based HotStreams™ interface in the office. She may stop the interactive video production when she is ready to leave the office. On the bus on her way home, she can connect to the system again, this time from her PDA. The PDA has less bandwidth available and has the Windows Media Player installed, so the system generates a different presentation script. The user can now continue viewing her presentation, but this time the content is delivered in the single-frame/ASX view. At home she can pick up the presentation on her TV/set-top box. Depending on the capabilities of the set-top box, the content could this time be delivered in a SMIL-based view optimized for TV, with video in high-quality MPEG-2.
Figure 16.11: Carry-along Workflow
The first part of this section discussed techniques for handling differences in network and device capabilities and for supporting ubiquity. The remaining part discusses tools needed for offering such functionality in interactive video database applications.
Within the framework discussed in Section 4, content adaptation is the responsibility of the Presentation Script Generation step shown in Figure 16.4. At this point, decisions are made regarding the layout (e.g., one-frame vs. multi-frame), media format and encoding, and script file format. Ubiquity, as discussed in Section 5.5, is supported by storing in the database the profiling and selection parameters used as input to the Content Selection step shown in Figure 16.4, so that the production can later be "picked up." A new script adapted to the current client is generated on the fly every time the end-user picks up the stored production.
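Carry-along can be sketched by storing only the device-independent parameters and regenerating the script at pick-up time. An in-memory dictionary stands in for the database, the parameter names are hypothetical, and the script rule repeats the Windows CE check assumed earlier. A real system would also rerun the full workflow and would likely record where playback stopped, which this sketch omits.

```python
from typing import Dict, Tuple

# Stand-in for the production database (hypothetical): stores the
# profiling and selection parameters, not a device-specific script.
_saved_productions: Dict[Tuple[str, str], Dict] = {}


def save_production(user: str, production_id: str, params: Dict) -> None:
    """Persist the inputs to content selection for later pick-up."""
    _saved_productions[(user, production_id)] = dict(params)


def pick_up(user: str, production_id: str, user_agent: str) -> Dict:
    """Regenerate a script for whatever device the user is on now."""
    params = _saved_productions[(user, production_id)]
    script = "asx" if "Windows CE" in user_agent else "smil"
    return {"params": params, "script": script}
```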
A content assembly tool similar to the one described in Section 4.2.2, which lets the content manager create multimedia presentations without worrying about network and device capabilities, is of great value in serving heterogeneous users. In addition, tools are needed to manage the content stored on the various media servers and to extract and manage meta-data describing the characteristics of individual media files. The management tool used in HotStreams™ contains a Media File panel that the content manager can use to view the content on the various media servers. Daemon processes also run on each media server to monitor changes in the availability of media files and report changes (such as the addition or deletion of media files) to the application server.
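One polling cycle of such a daemon can be sketched as a snapshot-and-diff over a media directory. Polling a single directory and reporting through a callback are assumptions: the text does not say how the HotStreams™ daemons detect changes or talk to the application server. A real daemon would call `poll_once` in a loop with a sleep interval and send each event over the network.

```python
import os
from typing import Callable, Set, Tuple


def snapshot(media_dir: str) -> Set[str]:
    """Names of the regular files currently in the media directory."""
    return {name for name in os.listdir(media_dir)
            if os.path.isfile(os.path.join(media_dir, name))}


def diff(old: Set[str], new: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (added, deleted) file names between two snapshots."""
    return new - old, old - new


def poll_once(media_dir: str, previous: Set[str],
              report: Callable[[str, str], None]) -> Set[str]:
    """Report additions/deletions since the previous snapshot and
    return the new snapshot for the next cycle."""
    current = snapshot(media_dir)
    added, deleted = diff(previous, current)
    for name in sorted(added):
        report("added", name)
    for name in sorted(deleted):
        report("deleted", name)
    return current
```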