Why Create Multimodal Applications? | Building Intelligent .NET Applications: Agents, Data Mining, Rule-Based Systems, and Speech Processing


Telephony applications are the only choice for companies that wish to offer voice-only access to information through a telephone. There is, however, a growing trend toward multimodal applications, in which the user can enter information using speech, typed text, or point-and-click methods. Output can be multimodal as well: in addition to displaying information on screen, a multimodal application can speak text back to the user.

Note

In addition to speech and graphical user interfaces, multimodal applications can also involve interaction through other methods, such as pens, vision, and gestures.

The Multimodal Interaction Working Group is a subgroup of the W3C that is responsible for establishing standards for the development of multimodal applications.

In September 2004, the group released a working draft on the use of a markup language for ink known as InkML. Readers interested in learning more about this input method can refer to the draft at http://www.w3.org/TR/2004/WD-InkML-20040928.

Unlike telephony applications, which have been around for several years in the form of interactive voice response (IVR) systems, multimodal applications have emerged only recently. Their development was made possible primarily by the introduction of a wide variety of mobile computing devices, such as PDAs, Tablet PCs, and smartphones. Multimodal applications are well suited to these devices because speech is an easier and more efficient method of input than a small keypad or a stylus pen.

Multimodal applications are useful for Web-based applications that need to integrate speech with traditional point-and-click methods, and they are a good way to introduce speech unobtrusively. Because there is a choice of input mechanisms, users are not forced to rely on a single modality, and the mechanism a user chooses may change with the environment. For instance, in a noisy environment speech may not be a practical choice, whereas a user juggling several items may find a hands-free method such as speech the obvious and best choice.

The Microsoft Speech Server SDK is ideal for companies that have already adopted Visual Studio .NET as a development tool. Rather than learning a new language or using a proprietary development tool, such a company may be able to reuse code it has already developed when creating speech-based applications. This holds for both telephony and multimodal applications. Multimodal applications, however, offer the added benefit of letting developers add speech capabilities to an existing interface.

Companies that have invested valuable time in creating interfaces that work well for their customers and employees do not want to abandon them. Multimodal applications allow developers to add speech processing to the areas of their Web applications that are particularly well suited to it, enhancing and extending the capabilities of their current applications.
