Chapter 2. Creating Applications That Talk | Building Intelligent .NET Applications: Agents, Data Mining, Rule-Based Systems, and Speech Processing

In 1939, Bell Labs demonstrated a talking machine named "Voder" at the New York World's Fair. The machine was not well received because its voice sounded robotic and unnatural. Since then, many advances have been made in speech synthesis, particularly in the last five years. Also known as text-to-speech, speech synthesis is one of two key technologies in the area of speech applications.

The second technology is speech recognition. For decades, science fiction movies have featured talking computers capable of accepting spoken directions from their users. What once existed only in the imaginations of writers and filmmakers may soon become part of everyday life. In the last few years, researchers such as the Speech Technology Group at Microsoft Research have made many advances in speech recognition.

Speech-based applications have been slowly entering the marketplace. Many banks allow customers to access their account data through automated telephone systems, also known as Interactive Voice Response (IVR) systems. Yahoo and AOL have set up systems that read e-mail to their users. The National Weather Service, part of the National Oceanic and Atmospheric Administration (NOAA), has an application that reads the weather.

Speech processing is an important technology for enhanced computing because it provides a natural and intuitive interface for the user. People communicate with one another through conversation, so it is comfortable and efficient to use the same method for communication with computers.

Microsoft recently released Speech Server as part of an effort to make speech applications more mainstream. Microsoft Speech Server (MSS) has three main components:

Speech Application SDK (SASDK)
Speech Engine Services (SES)
Telephony Application Services (TAS)

All three components are bundled into both the Standard Edition and the Enterprise Edition. The primary difference between the two editions is the number of concurrent users your application must support.

Speech Engine Services (SES) and Telephony Application Services (TAS) are the components that run on the Speech Server. The Speech Server is responsible for interfacing with the Web server and the telephony hardware. Web-based applications can be accessed from traditional Web browsers, telephones, mobile phones, Pocket PCs, and smartphones.

This chapter focuses primarily on specific parts of the SASDK, since the SASDK is the component most applicable to developers. The installation files for the SASDK are available as a free download from the Microsoft Speech Web site at http://www.microsoft.com/speech/. Chapters 3 and 4 expand on the use of the SASDK and introduce two fictional companies and the speech-based applications they developed.
