This section details the effectiveness of the XForms components in realizing multimodal interaction conforming to the user interaction principles described in Section 10.2. The XForms model is ideally suited to holding the interaction state that is shared among the different interaction modalities (see Section 10.3.1). Abstract XForms user interface controls are well suited to creating user interfaces that are later mapped to different devices (see Section 10.3.2). The XForms user interface constructs encourage intent-based authoring, and the resulting interfaces can easily be refactored for delivery to clients with a variety of display form factors (see Section 10.3.3). Finally, XForms' use of XML Events makes it ideally suited for attaching event handlers that respond to user input received from different modalities such as speech (see Section 10.3.4).

10.3.1 One Model to Bind Them All

The XForms model holds instance data and associated constraints. The XForms binding mechanism enables multiple views to bind to this model. A key consequence of this architecture is that multiple views bound to the single model are automatically synchronized. Thus, in a multimodal interface that integrates visual and spoken interaction, user input that arrives via either interaction modality is placed in the model and becomes immediately available to all modalities. Since the effect of user interaction in the different modalities is ultimately reflected in the XForms model, higher-level software components such as interaction managers, which integrate user input received from various modalities to infer user intent, can use the XForms model as the central repository that tracks the needed information. As an example, the (x, y) coordinates specified via a pointing device and an accompanying spoken utterance
by themselves do not fully express the user's intent. However, when the results of both of these interaction gestures are stored in a common model along with the necessary time-stamping information, these gestures can be integrated to derive the expressed user intent.

10.3.2 Abstract Controls Enable Flexible Access

Abstract XForms user interface controls, when combined with the type information available from the XForms model, can be turned into rich user interface controls that are appropriate for the connecting device and interaction modality. XForms user interface controls encapsulate all relevant metadata needed for interaction with a control. A key consequence is that these controls can be turned into appropriate spoken dialogs.
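As a minimal sketch of this idea (the instance structure, values, and control names here are hypothetical, and the conventional xforms: namespace prefix is assumed), a single select1 control carries enough metadata to drive either a visual or a spoken rendering:

```xml
<xforms:select1 ref="/trip/destination">
  <xforms:label>Destination</xforms:label>
  <xforms:hint>Pick the city you are flying to</xforms:hint>
  <xforms:item>
    <xforms:label>New York</xforms:label>
    <xforms:value>nyc</xforms:value>
  </xforms:item>
  <xforms:item>
    <xforms:label>London</xforms:label>
    <xforms:value>lon</xforms:value>
  </xforms:item>
</xforms:select1>
```

A visual client might render this as a drop-down list, while a speech client can speak the label as a prompt and build a recognition grammar from the item labels. Either way, the selected value is stored in the same instance node, so all views bound to the model stay synchronized.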
10.3.3 XForms UI Creates Synchronized Views

XForms user interface constructs are designed to capture the underlying intent of the interaction, rather than a specific manifestation of that interface. Aggregation constructs such as switch allow the author to express conditional interfaces declaratively. We demonstrated an insurance application form in Figure 5.4 and Figure 5.5 that used model-based switching to enable portions of the interface conditionally. This application can be extended to support rich multimodal interaction by allowing the user to answer a set of "yes" or "no" questions that populate fields corresponding to the user's habits. As the user answers these questions using speech input, the XForms model gets updated, and this automatically updates the visual interface to hide portions of the insurance form that are not relevant.

Finally, notice that just as the XForms user interface can adapt itself based on previous user input, it can also adapt itself based on changes in the user environment. This flexibility comes from the use of XML Events and the ability to dispatch such events into the XForms model or user interface as appropriate. As an example, a smart multimodal device might represent the current user environment in an XForms model and update it as it detects changes in the environment, for example, an increased noise level that makes speech interaction impossible. The device can arrange for appropriate event handlers that dispatch information about the updated device state to the various XForms applications running on the device. These applications can, in turn, handle these events by appropriately updating the user interface, for example, by displaying a visual prompt indicating that spoken interaction is unavailable or by turning off the microphone to avoid spurious speech recognition results.

10.3.4 XML Events Enable Rich Behavior

The use of XML Events enables XForms constructs to be extended with rich behavior.
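The model-based switching described in Section 10.3.3 can be sketched as follows (the instance structure and element names are hypothetical illustrations, not taken from the book's insurance example, and conventional xforms: namespace prefixes are assumed):

```xml
<xforms:model>
  <xforms:instance>
    <policy xmlns="">
      <smoker/>
      <years-smoked/>
    </policy>
  </xforms:instance>
  <!-- years-smoked is relevant only once the user has answered "yes";
       irrelevant nodes are hidden by the visual view and can be skipped
       by a spoken dialog, whichever modality supplied the answer. -->
  <xforms:bind nodeset="/policy/years-smoked"
               relevant="/policy/smoker = 'yes'"/>
</xforms:model>

<xforms:select1 ref="/policy/smoker">
  <xforms:label>Do you smoke?</xforms:label>
  <xforms:item><xforms:label>Yes</xforms:label><xforms:value>yes</xforms:value></xforms:item>
  <xforms:item><xforms:label>No</xforms:label><xforms:value>no</xforms:value></xforms:item>
</xforms:select1>
<xforms:input ref="/policy/years-smoked">
  <xforms:label>Years smoked</xforms:label>
</xforms:input>
```

Because relevance is computed in the model, a "yes" spoken into a microphone and a "yes" clicked in a browser have the identical effect: the dependent question appears in every view bound to the model.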
One such example is given by the ability to attach voice dialogs as event handlers at various points in the XForms user interface. Voice dialogs are typically authored in Voice Extensible Markup Language (VoiceXML [1]). Such dialogs can be attached to the various XForms user interface controls and aggregation constructs to create Web interfaces that speak and listen. This form of multimodal interaction is presently an area of intense activity in the W3C, and these standards are still evolving. For one example of how declarative voice handlers can be integrated using XML Events to produce multimodal interfaces, see XHTML+Voice (X+V [2]).
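As a sketch loosely following the X+V pattern (the ids, field names, and grammar URI are hypothetical; conventional vxml: and ev: namespace prefixes are assumed), a VoiceXML form declared in the document head can be attached to a visual field as an XML Events handler:

```xml
<!-- In the document head: a VoiceXML dialog -->
<vxml:form id="voice-date">
  <vxml:field name="date">
    <vxml:prompt>What day are you leaving?</vxml:prompt>
    <vxml:grammar src="date.grxml" type="application/srgs+xml"/>
  </vxml:field>
</vxml:form>

<!-- In the body: the dialog runs when the field gains focus -->
<input type="text" name="date"
       ev:event="focus" ev:handler="#voice-date"/>
```

X+V additionally defines a synchronization mechanism for copying the recognized value into the corresponding visual field; see the X+V specification [2] for the details.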
The use of XML Events also enables the creation of consistent, predictable user interfaces. XForms applications can attach default event handlers for the most common cases; the XForms processing model exposes a set of standard events that can be hooked to attach specific behaviors at various points during XForms processing, as described in Section 8.3 and Section 8.4. This design encourages consistent user interaction when authoring XForms applications. The definition of a set of abstract XForms event types also enables a given platform or device to map platform-specific events to these generic XForms events. As an example, a desktop interface might map the enter key to event DOMActivate; a mobile phone might map the * key to this same event. XForms applications that use this event consistently for attaching specific behaviors will exhibit a predictable interface when deployed to both a desktop client and a mobile phone. This architecture enables the creation of Web applications that react as users expect on a given device and can be a key determining factor in the overall usability of an application.
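As a sketch of this device-independent idiom (the submission id is hypothetical; conventional xforms: and ev: namespace prefixes are assumed), a handler attached for DOMActivate runs identically whichever platform key is mapped to that event:

```xml
<xforms:trigger>
  <xforms:label>Submit claim</xforms:label>
  <!-- Fires whether the desktop user presses Enter or the phone
       user presses the * key: the platform maps both to DOMActivate. -->
  <xforms:action ev:event="DOMActivate">
    <xforms:send submission="submit-claim"/>
  </xforms:action>
</xforms:trigger>
```

Because the behavior is keyed to the abstract event rather than to a physical key, the same markup yields a predictable interface on every client.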