The Microsoft Speech Application SDK (SASDK), version 1.0, enables developers to create two basic types of applications: telephony (voice-only) and multimodal (text, voice, and visual). This is not the first speech-based SDK Microsoft has developed. However, it is fundamentally different from the earlier ones because it is the first to comply with an emerging standard known as Speech Application Language Tags, or SALT (refer to the "SALT Forum" profile box). Run from within the Visual Studio .NET environment, the SASDK is used to create Web-based applications only.

Speech-based applications offer more than just touch-tone access to account information or call center telephone routing. They offer the user a natural interface to a vast amount of information. Interactions with the user involve both the recognition of speech and the reciting of static and dynamic text. Current applications can be enhanced by offering the user a choice between traditional input methods and a speech-based one.

Development time is significantly reduced by the familiar interface inside Visual Studio .NET. Streamlined wizards allow developers to build grammars and prompts quickly. In addition, applications developed for telephony access can use the same code base as those accessed with a Web browser.

The SASDK makes it easy for developers to use speech technology. Graphical interfaces and drag-and-drop capabilities hide the underlying complexity. All the .NET developer needs to know about speech recognition is how to interpret the resulting confidence score.
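To make the idea of interpreting a confidence score concrete, here is a minimal sketch in Python rather than in SASDK code. It assumes the score is reported on a 0.0 to 1.0 scale and uses two illustrative thresholds; the `RecognitionResult` type, the `decide` function, and the threshold values are all hypothetical and are not taken from the SASDK.

```python
from dataclasses import dataclass


@dataclass
class RecognitionResult:
    """Hypothetical container for one recognition attempt."""
    text: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


def decide(result: RecognitionResult,
           accept_at: float = 0.8,
           confirm_at: float = 0.5) -> str:
    """Map a confidence score to an action.

    The thresholds are illustrative assumptions, not SASDK defaults.
    """
    if result.confidence >= accept_at:
        return "accept"    # use the recognized text as-is
    if result.confidence >= confirm_at:
        return "confirm"   # ask the user to confirm ("Did you say ...?")
    return "reject"        # ask the user to repeat the input


if __name__ == "__main__":
    for score in (0.92, 0.65, 0.30):
        print(score, decide(RecognitionResult("checking account", score)))
```

The three-way split (accept, confirm, reject) is one common way to use such a score; an application could equally use a single threshold.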
Note: VoiceXML 2.x is a simple markup language introduced by the World Wide Web Consortium (W3C) (http://www.w3.org). Like SALT, it is used to create dialogs with a user using computer-generated speech. Both are based on W3C standards. The VoiceXML specification was created before SALT and was designed to support telephony applications. SALT was designed to run on a wide variety of devices, including PDAs, smartphones, and Tablet PCs. SALT has a low-level API, and VoiceXML has a high-level API. This gives SALT finer control over the interface with the user. VoiceXML does not natively support multimodal applications and is used primarily for limited IVR applications. Because of this, Microsoft Speech Server does not support VoiceXML. However, everything that can be accomplished with VoiceXML can be accomplished with SALT.

Telephony Applications

The Microsoft Speech Application SDK enables developers to create telephony applications, in which data can be accessed over a phone. Prior to the Speech Application SDK, one option for creating voice-only applications was the Telephony API (TAPI), version 3.0, which shipped with Windows 2000. This COM-based API allowed developers to build interactive voice systems. TAPI allowed developers to create telephony applications that communicated over a Public Switched Telephone Network (PSTN) or over existing networks and the Internet. It was responsible for handling the communication between telephone and computer.

Telephony application development would further incorporate the Speech Application Programming Interface (SAPI), version 5.1, to provide speech recognition and speech synthesis services. This API is COM based and designed primarily for desktop applications. Like TAPI, it does not offer the same tools and controls available with the new .NET version. Most important, SAPI is not SALT compliant and therefore does not utilize a common platform.
Telephony applications built with the SASDK are accessed by clients using telephones, mobile phones, or smartphones. They require a third-party Telephony Interface Manager (TIM) to interpret signals sent from the telephone to the telephony card. The TIM then communicates with Telephony Application Services (TAS), the Speech Server component responsible for handling incoming telephony calls (see Figure 2.1). Depending on which version of Speech Server is used, TAS can handle up to ninety-six telephony ports per node, with the ability to add an unlimited number of additional nodes.

Figure 2.1. The main components involved when telephony applications are received. The user's telephone communicates directly with the server's telephony card across the public telephone network. The third-party Telephony Interface Manager (TIM) then communicates with Telephony Application Services (TAS), a key component of Speech Server 2004.

Telephony applications can be voice-only, DTMF (Dual Tone Multi-frequency) only, or a mixture of the two. DTMF applications involve the user pressing keys on the telephone keypad. This is useful when the user is required to enter sensitive numerical sequences such as passwords or account numbers. In some cases, speaking these types of numerical sequences might entail a security violation, because someone might overhear the user.

Call centers typically use telephony applications to route calls to appropriate areas or to automate some basic function. For instance, a telephony application can be used to reset passwords or request certain information. By automating tasks handled by telephone support employees, telephony applications can offer significant cost savings.

Telephony applications can also be useful when the user needs to iterate through a large list of information. The user hears a shortened version of the item text and can navigate through the list by speaking certain commands.
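As a rough illustration of this kind of spoken list navigation, the following Python sketch tracks a position in a list and maps an already recognized command word to the text the application would speak next. The class name, the command words ("next", "previous", "read"), and the prompt wording are hypothetical and do not come from the SASDK; speech recognition and synthesis themselves are outside the sketch.

```python
from typing import List, Tuple


class SpokenListNavigator:
    """Tracks a position in a list of (summary, full_text) items."""

    def __init__(self, items: List[Tuple[str, str]]) -> None:
        if not items:
            raise ValueError("items must not be empty")
        self.items = items
        self.index = 0  # start at the first item

    def handle(self, command: str) -> str:
        """Return the text to speak in response to a recognized command."""
        word = command.strip().lower()
        if word == "next":
            if self.index == len(self.items) - 1:
                return "You are at the end of the list."
            self.index += 1
        elif word == "previous":
            if self.index == 0:
                return "You are at the start of the list."
            self.index -= 1
        elif word == "read":
            # Speak the full text of the current item.
            return self.items[self.index][1]
        else:
            return "Sorry, I did not understand."
        # After moving, speak the shortened version of the new item.
        return self.items[self.index][0]


if __name__ == "__main__":
    nav = SpokenListNavigator([
        ("Item one", "Full text of item one."),
        ("Item two", "Full text of item two."),
    ])
    for spoken in ("Next", "Read", "Previous"):
        print(spoken, "->", nav.handle(spoken))
```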
For example, if the telephony application is used to recite e-mail, the user can listen as the subjects of all unread e-mails are recited. A user who wants to hear the text of a specific e-mail can speak a command such as "Read e-mail." The user can then navigate through the list by speaking commands such as "Next" or "Previous."

Multimodal Applications

Multimodal applications allow the user to choose the appropriate input method, whether speech or traditional Web controls. Because the user can choose, the application can serve a larger customer base. Not all customers will have access to microphones, so a multimodal application offers speech functionality without forcing it on the user.

Multimodal applications are accessed via Microsoft Internet Explorer (IE) on the user's PC or with IE for the Pocket PC (see Figure 2.2). Both versions of IE require the installation of a speech add-in. Users indicate that they wish to use speech by triggering an event, such as clicking an icon or button.

Figure 2.2. The high-level process by which multimodal applications communicate with Speech Server. The ASP.NET application is accessed either by a computer running Internet Explorer (IE) with the speech add-in or by Pocket IE with the speech add-in.

The speech add-in for IE, necessary for interpreting SALT, is provided with the SASDK. It should be installed on any computer or Pocket PC device accessing the speech application. In addition to providing SALT recognition, the add-in displays an audio meter that visually indicates the volume level of the audio input.