Speech-Recognition Applications: A Typical Example | The Art and Business of Speech Recognition: Creating the Noble Voice

Before we get into the particulars of the design process, let's examine what a typical speech-recognition application looks like. Here's an example of a person calling the AirTran flight information toll-free number. The following exchange ensues.

SYSTEM:	Welcome to the AirTran Airways Flight Information System. If you've never called before, say "Instructions." Do you know the flight number?
CALLER:	Yes.
SYSTEM:	OK. What is it?
CALLER:	Three forty-one.
SYSTEM:	Thanks. Do you want arrival or departure information?
CALLER:	Departure.
SYSTEM:	Is it departing Saturday, Sunday, or Monday?
CALLER:	Sunday.
SYSTEM:	Great. Hold on while I check on that flight. Here it is. Flight three forty-one departed on time from Boston, Massachusetts at 6:40 A.M. and is scheduled to arrive on time in Atlanta, Georgia at 9:00 A.M. on September 23rd. Please check for specific gate information at the airport. You can say, "Repeat that," "Check another flight," or "Good-bye."
CALLER:	Good-bye.
SYSTEM:	Thanks for calling AirTran Airways. Good-bye.

Let's examine some elements of this conversation.

When the system answers the telephone call, it plays an audio file: a recorded, spoken prompt. The caller then responds. At this and each succeeding turn (also called a state, the point in a speech-recognition application where the system asks the caller a question and then listens for an answer), the system analyzes the sound of what it has heard to determine whether or not it was something it expected to hear. So, for example, when the system asks the caller for "arrival or departure information," it is expecting to hear "Arrival" or "Departure" or even similar phrases, such as "Arrival information" or "Departure information."
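To make the idea of a state concrete, here is a minimal Python sketch of the arrival-or-departure turn. It is only an illustration: the names `State`, `normalize`, `interpret`, and `ARRIVAL_OR_DEPARTURE` are hypothetical, and the sketch works on text that has already been transcribed, whereas a real recognizer analyzes the audio itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class State:
    """One turn: the system plays a prompt, then listens for an expected answer."""
    prompt: str               # wording of the recorded prompt
    expected: Dict[str, str]  # accepted phrase -> what the phrase means


def normalize(utterance: str) -> str:
    """Lowercase and drop punctuation so that 'Departure.' matches 'departure'."""
    kept = "".join(ch for ch in utterance.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def interpret(state: State, utterance: str) -> Optional[str]:
    """Return the meaning of an expected phrase, or None if it was not expected."""
    return state.expected.get(normalize(utterance))


# Hypothetical definition of the state from the sample call.
ARRIVAL_OR_DEPARTURE = State(
    prompt="Do you want arrival or departure information?",
    expected={
        "arrival": "arrival",
        "arrival information": "arrival",
        "departure": "departure",
        "departure information": "departure",
    },
)

print(interpret(ARRIVAL_OR_DEPARTURE, "Departure."))
print(interpret(ARRIVAL_OR_DEPARTURE, "Could you say that again?"))
```

In this sketch, a result of `None` stands for "the system did not hear something it expected"; what an application does next (repeating the prompt, for instance) is left open here.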

However, when it asks for the flight number, it's expecting to hear one of many, many different responses. For example, the caller might say, "Three twenty-one," "Three, two, one," "Flight three twenty-one," "Flight three, two, one," or any other number within a particular range of a few thousand possible flight numbers. Because of this, speech-recognition systems have to be designed and tested to ensure that they can "understand" virtually all possible responses that callers to the system are likely to make. That's why the effectiveness of a speech-recognition system can depend greatly on the capabilities of its core technology (the speech recognizer) and the way it is used.
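The variety of ways to say one flight number can be illustrated with a small Python sketch that maps several spoken forms to the same number. It is a simplification under stated assumptions: the function name `parse_flight_number` and the four-digit limit are hypothetical, the input is already-transcribed text, and only the kinds of phrasing quoted above are handled (forms such as "three hundred twenty-one" are not).

```python
import re
from typing import Optional

UNITS = {"zero": 0, "oh": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
         "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def parse_flight_number(utterance: str) -> Optional[int]:
    """Turn phrases such as 'Flight three twenty-one' into 321, or return None."""
    # Split into lowercase words; commas and hyphens act as separators.
    words = re.findall(r"[a-z]+", utterance.lower())
    if words and words[0] == "flight":
        words = words[1:]  # the leading word "flight" is optional

    digits = ""
    i = 0
    while i < len(words):
        word = words[i]
        if word in UNITS:
            digits += str(UNITS[word])
        elif word in TEENS:
            digits += str(TEENS[word])
        elif word in TENS:
            value = TENS[word]
            following = words[i + 1] if i + 1 < len(words) else None
            # "twenty" followed by "one" is read together as 21.
            if following in UNITS and UNITS[following] != 0:
                value += UNITS[following]
                i += 1
            digits += str(value)
        else:
            return None  # a word this sketch does not expect
        i += 1

    # Assumed limit: flight numbers have at most four digits.
    if not digits or len(digits) > 4:
        return None
    return int(digits)


for phrase in ["Three twenty-one", "Three, two, one",
               "Flight three twenty-one", "Flight three, two, one"]:
    print(phrase, "->", parse_flight_number(phrase))
```

All four phrasings in the loop resolve to the same flight number, which is the behavior the caller takes for granted and the designer has to plan and test for.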