Why Designing a Speech-Recognition Application Is Challenging

Think back to the AirTran example in Chapter 1. It was very short, wasn't it? Look how many points of discussion it raised, and we're only scratching the surface. Speech-recognition applications are powerful because they appear simple. Touchtone systems really are simple because callers know that their responses are limited to pressing any of 12 keys. And in a person-to-person conversation, people can reasonably expect that the person on the other end of the line can understand most spoken ideas. Speech-recognition systems fit somewhere in the middle. They have a seemingly natural interface, but the recognition application doesn't yet converse on a human level.

Callers don't know what the system can and can't understand if the system doesn't let them know the parameters (or degrees of freedom) of the interaction. Until the system tells them, callers don't know if they can ask natural questions, such as "What's the traffic like on I-93 South?" or if they must first indicate they want "Traffic information" and then indicate the route. It's up to the system (via the designer) to inform them.

Even the simplest questions can cause problems in a poorly designed speech-recognition system. For example, a large telephone company in the southern United States deployed a system back in 1997. Whenever this system attempted to confirm a caller's selection, it asked, "Is that correct?" The system failed repeatedly at that point in the conversation, and its designers were mystified. How could anyone misunderstand a question so simple and unambiguous? They assumed most callers would reply "Yes," "No," "Correct," "Incorrect," or some variation thereof. But even with all of these synonyms and variations programmed into the recognizer, the failure rate remained high.

After listening to calls, the designers finally figured out the problem. Being gracious and polite southern folks, many of the callers were replying "Yes, ma'am" or "No, ma'am." The designers hadn't programmed "ma'am" into the recognizer. A change was made to the system so that it recognized the phrases callers actually said, and within minutes of the change the problem was solved.
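The failure above comes down to grammar coverage: the recognizer accepted only the answers the designers anticipated. A production system would normally express this in a dedicated grammar format (for example, W3C SRGS) inside the recognizer. The Python sketch below only illustrates the idea on text transcripts. The word lists, `build_pattern`, and `interpret` are hypothetical and are not the telephone company's actual grammar. It shows how the same replies fall out of grammar until an optional politeness tag such as "ma'am" is allowed after the core answer.

```python
import re

# Hypothetical word lists for illustration only; not the deployed system's vocabulary.
YES_WORDS = ["yes", "yeah", "correct", "right", "that's right"]
NO_WORDS = ["no", "nope", "incorrect", "wrong"]
POLITE_TAGS = ["ma'am", "sir"]


def build_pattern(core_words, tags=()):
    """Compile a pattern accepting a core answer plus an optional politeness tag."""
    core = "|".join(re.escape(w) for w in core_words)
    suffix = ""
    if tags:
        alts = "|".join(re.escape(t) for t in tags)
        # Optional comma, then whitespace, then one of the tags ("Yes, ma'am").
        suffix = rf"(?:,?\s+(?:{alts}))?"
    # Allow an optional trailing period or exclamation mark in the transcript.
    return re.compile(rf"^(?:{core}){suffix}[.!]?$", re.IGNORECASE)


def interpret(utterance, tags=()):
    """Map a transcribed reply to 'yes', 'no', or None when it is out of grammar."""
    text = utterance.strip()
    if build_pattern(YES_WORDS, tags).match(text):
        return "yes"
    if build_pattern(NO_WORDS, tags).match(text):
        return "no"
    return None


if __name__ == "__main__":
    for reply in ["Yes", "Correct.", "Yes, ma'am", "No, ma'am"]:
        # Compare the grammar without politeness tags to the grammar with them.
        print(reply, interpret(reply), interpret(reply, POLITE_TAGS))
```

Without `POLITE_TAGS`, "Yes, ma'am" is rejected as out of grammar even though its meaning is obvious. Adding the optional suffix fixes it without having to list every "yes ... ma'am" combination by hand.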



The Art and Business of Speech Recognition: Creating the Noble Voice
ISBN: 0321154924
Year: 2005
Pages: 105
Authors: Blade Kotelly
