This section details how the XForms components enable effective multimodal interaction conforming to the user interaction principles described in Section 10.2. The XForms model is ideally suited to holding the interaction state that is shared among the different interaction modalities (see Section 10.3.1). Abstract XForms user interface controls are well suited to creating user interfaces that are later mapped to different devices (see Section 10.3.2). The XForms user interface constructs encourage intent-based authoring, and the resulting interfaces can easily be refactored for delivery to clients with a variety of display form factors (see Section 10.3.3). Finally, XForms' use of XML Events makes it ideally suited for attaching event handlers that respond to user input received from different modalities such as speech (see Section 10.3.4).
10.3.1 One Model to Bind Them All
The XForms model holds instance data and associated constraints. The XForms binding mechanism enables multiple views to bind to this model. A key consequence of this architecture is that multiple views that bind to the single model are automatically synchronized. Thus, in a multimodal interface that integrates visual and spoken interaction, user input that arrives via either interaction modality is placed in the model and becomes immediately available to all modalities.
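As a minimal sketch of this architecture (namespace declarations elided and element names hypothetical), two controls bound to the same instance node remain synchronized automatically; a value that arrives through either modality is immediately reflected in the other view:

```xml
<xforms:model id="shared">
  <xforms:instance>
    <city xmlns=""/>
  </xforms:instance>
</xforms:model>

<!-- A visual view of the city field -->
<xforms:input ref="/city">
  <xforms:label>City</xforms:label>
</xforms:input>

<!-- A second view bound to the same node; a speech modality that
     fills in /city updates this rendering as well -->
<xforms:output ref="/city">
  <xforms:label>You said</xforms:label>
</xforms:output>
```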
The ability to hold more than one instance in the model allows XForms applications to store both application state and interaction state within the model. The application state holds user input that is to be transmitted to the application; the interaction state can hold intermediate results that reflect user interaction. We demonstrated this form of interaction state in Figure 4.7 and Figure 8.1. Using an instance within the model to hold interaction state makes it possible to create smart user interfaces that take the history of user interaction into account. A multimodal interface might use this XForms feature to track the progress of the man-machine conversation via an appropriately designed interaction state and use this in guiding the user toward rapid task completion. Thus, the XForms model is ideally suited to serve as a building block in implementing various dialog managers.
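A sketch of such a model, with hypothetical element names and submission URL, keeps the two kinds of state in separate instances so that only the application state is transmitted:

```xml
<xforms:model>
  <!-- Application state: the data transmitted on submit -->
  <xforms:instance id="app">
    <order xmlns=""><item/><quantity/></order>
  </xforms:instance>
  <!-- Interaction state: conversation history, never submitted -->
  <xforms:instance id="ui">
    <state xmlns=""><turns>0</turns><lastModality/></state>
  </xforms:instance>
  <xforms:submission id="submit-order" method="post"
                     action="http://example.com/orders"
                     ref="instance('app')"/>
</xforms:model>
```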
Since the effect of user interaction in the different modalities is ultimately reflected in the XForms model, higher-level software components, such as interaction managers that integrate user input received from various modalities to infer user intent, can use the XForms model as the central repository for the needed information. As an example, the ( x , y ) coordinates specified via a pointing device and an accompanying spoken utterance by themselves do not fully express the user's intent. However, when the results of both of these interaction gestures are stored in a common model along with the necessary time-stamping information, these utterances can be integrated to derive the expressed user intent.
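Such an interaction-state instance might look like the following sketch (element names, values, and timestamps hypothetical); an interaction manager can correlate the two gestures by comparing their timestamps:

```xml
<xforms:instance id="gestures">
  <gestures xmlns="">
    <!-- Result of the pointing gesture, with its timestamp -->
    <pointer x="120" y="45" time="2003-06-01T10:15:02.1Z"/>
    <!-- Result of speech recognition, close in time to the pointer event -->
    <speech time="2003-06-01T10:15:02.4Z">move that</speech>
  </gestures>
</xforms:instance>
```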
10.3.2 Abstract Controls Enable Flexible Access
Abstract XForms user interface controls, when combined with the type information available from the XForms model, can be turned into rich user interface controls appropriate for the connecting device and interaction modality. We demonstrated this with an example of control input bound to a field of type date that gets rendered as a date picker. This form of late binding of user interface widgets is extremely useful when authoring Web applications that are to be accessed from a variety of interaction modalities and end-user devices. Notice that the user interface we created by binding control input to a field of type date degrades gracefully; that is, it uses a date picker widget when delivered to an environment that supports such a widget but can be presented as a simple text field in environments where no such widget is available.
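This late binding might be authored as in the following sketch (instance structure hypothetical); the client chooses the widget based on the declared type rather than any presentation markup:

```xml
<xforms:model>
  <xforms:instance>
    <travel xmlns=""><departure/></travel>
  </xforms:instance>
  <!-- Declaring the type lets the client pick an appropriate widget -->
  <xforms:bind nodeset="/travel/departure" type="xsd:date"/>
</xforms:model>

<!-- Rendered as a date picker where available, a text field otherwise -->
<xforms:input ref="/travel/departure">
  <xforms:label>Departure date</xforms:label>
</xforms:input>
```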
XForms user interface controls encapsulate all the metadata needed for interacting with a control. A key consequence is that these controls can be turned into appropriate spoken dialogs. As an example, the design of XForms control select1 enables the generation of smart spoken prompts; consider the example shown in Figure 3.9. The markup contains sufficient information to generate a spoken prompt that names the field and enumerates the available choices.
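A hypothetical select1 illustrates the point: the label and item labels give a speech renderer everything it needs to produce a prompt such as "Size: small, medium, or large?"

```xml
<xforms:select1 ref="/order/size">
  <xforms:label>Size</xforms:label>
  <xforms:item>
    <xforms:label>Small</xforms:label><xforms:value>s</xforms:value>
  </xforms:item>
  <xforms:item>
    <xforms:label>Medium</xforms:label><xforms:value>m</xforms:value>
  </xforms:item>
  <xforms:item>
    <xforms:label>Large</xforms:label><xforms:value>l</xforms:value>
  </xforms:item>
</xforms:select1>
```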
10.3.3 XForms UI Creates Synchronized Views
XForms user interface constructs are designed to capture the underlying intent of the interaction, rather than a specific manifestation of that interface. Aggregation constructs like group enable the XForms author to provide sufficient information to refactor the resulting interface as needed. Dynamic user interface constructs like switch and repeat, in conjunction with model-based switching enabled via model property relevant, allow the creation of user interfaces that adjust to the current interaction state. These features can be used to advantage when creating rich multimodal interaction.
We demonstrated an insurance application form in Figure 5.4 and Figure 5.5 that used model-based switching to enable portions of the interface conditionally. This application can be extended to support rich multimodal interaction by allowing the user to answer a set of "yes" or "no" questions that populate fields corresponding to the user's habits. As the user answers these questions using speech input, the XForms model gets updated, and this automatically updates the visual interface to hide portions of the insurance form that are not relevant. Finally, user commands that are available as trigger controls and use one of the XForms declarative action handlers are ideal candidates for speech-enabling the interface. Because the effect of these declarative action handlers is predictable, they can be activated via speech input to create simple command-and-control speech interaction.
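A sketch of the model-based switching involved (element names hypothetical): a spoken "yes" that sets the smoker flag makes the follow-up questions relevant, and any visual view bound to them updates automatically:

```xml
<xforms:model>
  <xforms:instance>
    <insurance xmlns="">
      <smoker>false</smoker>
      <smokingDetails><packsPerDay/></smokingDetails>
    </insurance>
  </xforms:instance>
  <!-- The follow-up section is relevant only when the answer is "yes" -->
  <xforms:bind nodeset="/insurance/smokingDetails"
               relevant="../smoker = 'true'"/>
</xforms:model>
```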
Finally, notice that just as the XForms user interface can adapt itself based on previous user input, it can also adapt itself based on changes in the user environment. This flexibility comes from the use of XML Events and the ability to dispatch such events into the XForms model or user interface as appropriate. As an example, a smart multimodal device might represent the current user environment in an XForms model and update it accordingly as it detects changes in the environment, for example, increased noise level that makes speech interaction impossible. The device can arrange for appropriate event handlers that dispatch information about the updated device state to various XForms applications running on the device. These applications can, in turn, handle these events by appropriately updating the user interface, for example, by displaying a visual prompt indicating that spoken interaction is unavailable or by turning off the microphone to avoid spurious speech recognition results.
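As a sketch, assuming a hypothetical device-specific event named noise-level-changed and the usual XML Events (ev:) namespace declarations, an application might respond to such a notification like this:

```xml
<!-- Hypothetical handler for a device-dispatched environment event -->
<xforms:group>
  <xforms:action ev:event="noise-level-changed">
    <!-- Record in the interaction state that speech is unavailable -->
    <xforms:setvalue ref="instance('ui')/speechAvailable">false</xforms:setvalue>
    <!-- Give the user a visual cue in place of spoken interaction -->
    <xforms:message level="ephemeral">Spoken interaction is unavailable.</xforms:message>
  </xforms:action>
</xforms:group>
```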
10.3.4 XML Events Enable Rich Behavior
The use of XML Events enables XForms constructs to be extended with rich behavior. One such example is the ability to attach voice dialogs as event handlers at various points in the XForms user interface. Voice dialogs are typically authored in Voice Extensible Markup Language (VoiceXML). Such dialogs can be attached to the various XForms user interface controls and aggregation constructs to create Web interfaces that speak and listen. This form of multimodal interaction is presently an area of intense activity in the W3C, and these standards are still evolving. For one example of how declarative voice handlers can be integrated using XML Events to produce multimodal interfaces, see XHTML+Voice (X+V).
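A sketch roughly in the style of X+V (attribute and event names vary across drafts of the profile, and the identifiers here are hypothetical): a VoiceXML form declared elsewhere in the document is attached via XML Events attributes as the handler for a focus event on an XForms control:

```xml
<!-- A VoiceXML dialog that prompts for and collects a city name -->
<vxml:form id="say-city">
  <vxml:field name="city">
    <vxml:prompt>Which city?</vxml:prompt>
  </vxml:field>
</vxml:form>

<!-- XML Events attributes attach the voice dialog to the control -->
<xforms:input ref="/travel/city"
              ev:event="DOMFocusIn" ev:handler="#say-city">
  <xforms:label>City</xforms:label>
</xforms:input>
```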
The use of XML Events enables the creation of consistent, predictable user interfaces. XForms applications can attach default event handlers for most common cases; the XForms processing model exposes a set of standard events that can be hooked to attach specific behaviors at various points during XForms processing as described in Section 8.3 and Section 8.4.
This design encourages the authoring of consistent user interaction when creating XForms applications. The definition of a set of abstract XForms event types also enables a given platform or device to map platform-specific events to these generic XForms events. As an example, a desktop interface might map the Enter key to event DOMActivate; a mobile phone might map the * key to this same event. XForms applications that consistently use this event for attaching specific behaviors will exhibit a predictable interface when deployed to both a desktop client and a mobile phone. This architecture enables the creation of Web applications that react as users expect on a given device and can be a key determining factor in the overall usability of an application.
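The same handler then serves both devices, as in this sketch (the submission id is hypothetical):

```xml
<!-- DOMActivate fires whether the user presses Enter on a desktop
     or the * key on a phone, per each platform's mapping -->
<xforms:trigger>
  <xforms:label>Submit</xforms:label>
  <xforms:send ev:event="DOMActivate" submission="submit-order"/>
</xforms:trigger>
```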