What the Recognizer Hears (and the Need for Confirmation)

Telephone audio quality poses additional challenges. Conventional telephone microphones do not capture the full frequency range of spoken language. They often cut off higher-frequency sounds, such as much of the hiss produced when someone says the letter s. Furthermore, every time someone speaks into a telephone microphone, the signal is compressed and transmitted across the telephone network, inevitably losing quality along the way. And these days, more people are speaking into tiny microphones on wireless phones in noisy environments, with less-than-crystal-clear reception and transmission. By the time the signal reaches the computer, there is not much left that resembles the original utterance. The recognizer has to interpret what the caller is saying from the very limited data it receives, a job that can be difficult even for people to do well.

We've all become accustomed to listening to voices on the phone and figuring out what people are saying, even when a cell phone connection drops seconds of a call. Still, we often confirm information with each other on the phone when accuracy matters ("OK, so we're going to meet at eight o'clock at Radius on High Street, right?"). We may even confirm details when talking face-to-face, because anyone can mishear something. Similarly, speech-recognition systems often confirm information with callers to make sure they correctly understood what was said.

Confirmation is also used to minimize ambiguity and imprecision. American English is full of homonyms and words with multiple meanings, which can create ambiguity. Take the word "mean," for example. According to the Oxford English Dictionary, that word has a total of 21 meanings: 7 as a verb, 11 as an adjective, and 3 as a noun. Multiple meanings and homonyms are a challenge for computers, which is why word-processing spelling and grammar checkers often miss mistakes that are obvious to a human. Add these issues to the other ambiguities of the language and the imprecision of colloquial speech, and it's amazing that speech-recognition systems work at all!

For example, say a large software/hardware company uses a speech-recognition system to answer help desk calls and solve basic problems. Imagine that the system has great speech-recognition technology, understands a very large vocabulary of technical terms, and can recognize many types of questions. Here's how a typical call might go.

SYSTEM:

Welcome to the MegaComp help desk line. How may I help you?

CALLER:

Uh, I'm having a networking problem with my new software and server.

Despite all its advanced capabilities, the computer would still have to ask clarifying questions because of the ambiguity and imprecision of spoken language.

In this example, the caller could mean any of the following:

  • There's a problem between the new software and an existing server.

  • There's a problem between the new software and a new server.

  • There's a problem on the network as a result of installing new software and a new server.

  • There's a problem on the network as a result of installing new software on an existing server.

  • There's a problem on the network as a result of installing new software, and there's a separate, unrelated problem with the server.

Designers have to take all of these possible meanings into account and structure the application accordingly. By writing prompts carefully and inserting confirmations at every point where callers provide critical information, designers can create a more efficient, friendly, and accurate system.
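To make the idea of a confirmation step concrete, here is a minimal sketch in Python of how an application might collect one critical piece of information and read it back before moving on. It assumes a hypothetical `ask` callable that plays a prompt to the caller and returns the recognizer's transcription of the reply; the function name, the list of "yes" words, and the three-attempt limit are illustrative assumptions, not taken from the book or from any particular speech platform.

```python
from typing import Callable, Optional

# Hypothetical stand-in for the telephony platform: plays a prompt to the
# caller and returns the recognizer's best transcription of the reply.
Ask = Callable[[str], str]

# Illustrative set of replies treated as a clear "yes" (an assumption).
YES_WORDS = {"yes", "yeah", "yep", "right", "correct"}


def confirm_critical_value(ask: Ask, question: str, max_attempts: int = 3) -> Optional[str]:
    """Collect a critical value and confirm it with the caller.

    Returns the confirmed value, or None if the caller never confirmed
    within max_attempts (an application might then transfer the call
    to a human agent).
    """
    for _ in range(max_attempts):
        value = ask(question).strip()
        if not value:
            continue  # nothing recognized; ask the question again
        # Read the value back, the way people do on the phone:
        # "...at eight o'clock at Radius on High Street, right?"
        answer = ask(f"I heard {value}. Is that right?").strip().lower()
        if answer in YES_WORDS:
            return value
        # Anything other than a clear "yes" counts as "no"; re-ask.
    return None


# Usage with scripted caller replies in place of a live recognizer.
replies = iter(["networking problem", "no", "network problem with my new server", "yes"])
result = confirm_critical_value(lambda prompt: next(replies), "What problem are you having?")
print(result)  # network problem with my new server
```

Treating anything other than a clear "yes" as a "no" errs on the side of asking again rather than acting on a misrecognition; a production application would also need to handle silence, out-of-vocabulary replies, and callers who correct the value directly.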



The Art and Business of Speech Recognition: Creating the Noble Voice
ISBN: 0321154924
EAN: 2147483647
Year: 2005
Pages: 105
Authors: Blade Kotelly
