Ask "Dr." Blade

I get a lot of questions about the psychological aspects of speech-recognition systems. Here are some of the more common questions, along with my answers.

Should we use a male or a female voice for our system?

Social psychologists have found that in North America there are stereotypes associated with male and female voices. Male voices are seen as having more authority on technical matters, and female voices are seen as being more nurturing. [1]

[1] Clifford Nass and Byron Reeves, The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places (New York: Cambridge University Press, 1996), Chapters 14 and 15.

If a speech-recognition system were to inform callers about something sensitive or depressing, perhaps a female voice would be more appropriate. Likewise, if we wanted a system to tell people how to fix their computers, they might prefer to hear it from a male voice.

There are other factors, however, that a designer should consider when choosing the gender of the voice for a client's speech-recognition system. For example, what about the company's current spokesperson: is it a woman or a man? What about the user population? Is it primarily male or female?

If we were to design a horse-betting application that needed to understand callers saying "Race three, three dollar trifecta, one-two-seven, over eleven-eight-three, over nine," we might be tempted to choose a male voice, since the information is fairly technical. But I would disagree. I'm told that most people who bet on horse races are older men in their 50s, 60s, and 70s. I'd be inclined to think that these people would rather hear a smart, efficient, yet sexy female voice. So perhaps it doesn't always make sense to follow stereotypes. There are no hard-and-fast rules for choosing the gender of a voice for a particular company, but it is important to consider the psychological issues and not just the marketing ones.

Should we anthropomorphize our system?

People often ask me if a system should say things like "I'd like to do that, but..." or "I'm sorry, I didn't understand you" rather than the more impersonal "We're sorry, but..." or "The system didn't understand you."

In all the work I've done for United Airlines, FedEx, and myriad other companies reaching millions and millions of people, the anthropomorphized, first-person approach has worked best.

Speech-recognition systems should talk to people just as other people would talk to them, because ultimately the system is just another social actor, like anyone else on the other end of the phone. These systems are unlike touchtone systems, which don't have the same need to say "I" because they don't really listen to callers the way a speech-recognition system does. Also, it would sound strange for a system to refer to itself in the third person when it apologizes or provides information: "The system will read the following..." Huh? Which system? Some other system? I mean, unless we're talking about a speech-recognition system for Buckingham Palace, using the "royal we" only conveys a feeling of removed formality. This would undermine what speech-recognition systems do best, which is to work collaboratively with callers to achieve a goal. Have you ever tried to set up a tent with someone who said "we" when they meant "I"? (Quite annoying: "We don't think that you're placing the pole correctly.")

In my experience, callers prefer anthropomorphized systems. For example, after a usability test for a flight information system, one participant commented, "I felt that he (if I may call him that) really wanted to help me get the [flight] information I wanted."

Can a designer go too far with anthropomorphism?

Problems arise when a system acts as if it has feelings, is untruthful simply to manipulate people, or acts so familiar and chatty that it prevents callers from completing their tasks in a timely manner. A good speech-recognition system is a lot like P. G. Wodehouse's Jeeves the valet: efficient, helpful, discreet, and unobtrusive.

Should speech-recognition systems be humorous?

Humor is fine in the right context and done in the right way. In fact, it's a great way to engage and disarm callers if they expect a stern-sounding, formal company. But it has its limitations, for a variety of reasons.

As every great comedian knows, in comedy, timing is everything. A speech-recognition system cannot be expected to know when a caller wants to hear a joke. And people sometimes see humor in a commercial setting as an indication that the company doesn't take its business or its customers' needs seriously.

To add a more universal form of humor, some systems use the aural equivalent of a sight gag. Wildfire had a humor setting that could be set from "low" to "high." In the high position, when a caller checking messages instructs the system to "Sort [the messages] the other way," Wildfire responds with the sound of cards shuffling before proceeding to the next prompt. It doesn't take much time, and it's evoked so rarely that most callers don't get tired of it quickly.

Also, most applications, particularly corporate call centers that handle tasks such as flight information, product orders, or rental car reservations, don't lend themselves well to humor. Actually, Southwest Airlines Co. is one airline that allows flight attendants to use humor when they recite the in-flight information. But consider this: if people call in to the same system often, that one joke is quickly going to become very tiresome, particularly since callers tend to pay more attention to a system on a phone than travelers do to flight attendants. Another reason to minimize humor is that it is highly subjective. The more diverse the calling population, the less likely it is that all of them will find a particular remark funny. If, on the other hand, the calling population is more homogeneous, targeted humor will be more successful.

What about those words like "Oops," "Oh!" and "OK"? You don't need those, right?

When people read my design documents, they usually look at these small words and wonder, "Why are all these little words in here? We never use those in our touchtone system." Perhaps a system doesn't strictly need to say words like "OK" when it is providing information, but doing so is a fast, easy, friendly way to let callers know that the system understood what they said before it goes on to ask the next question.

It's also polite and natural. Imagine a state motor vehicle department where the employees ask several questions in rapid-fire order without acknowledging they heard the answers.

CLERK: Make of car?

DRIVER: Uh, Mercedes.

CLERK: Model year?

DRIVER: It's a 1970.

CLERK: Color?

DRIVER: Silver... well, silver-gray.

CLERK: Gray?

DRIVER: Well, I guess.

People who are perceived as polite and as actively listening never talk that way. They use linguistic discourse markers, [2] such as "OK" and "I see," to let other people know that their ideas are being understood. If the motor vehicle department employee were acting more like a human and less like a machine, the conversation would feel more polite, friendly, and natural.

[2] Deborah Schiffrin, Discourse Markers (Cambridge, England: Cambridge University Press, 1987).

CLERK: What's the make of your car?

DRIVER: Uh, Mercedes.

CLERK: OK. And the model year?

DRIVER: It's a 1970.

CLERK: Got it. What's the color?

DRIVER: Silver... well, silver-gray.

CLERK: All right, looks like we're done.
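For designers who assemble prompts in software, the acknowledgment pattern in the clerk's lines above might be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `build_prompts`, the `ACKNOWLEDGMENTS` list, and the idea of rotating through the phrases are all hypothetical and are not taken from any particular speech platform.

```python
from itertools import cycle

# Hypothetical acknowledgment phrases, borrowed from the sample dialogue.
ACKNOWLEDGMENTS = ["OK.", "Got it."]


def build_prompts(questions, closing="All right, looks like we're done."):
    """Return the system's turns in order.

    The first question is asked plainly. Each later question is prefixed
    with a short acknowledgment of the caller's previous answer, and a
    closing line follows the final answer.
    """
    acks = cycle(ACKNOWLEDGMENTS)  # rotate so the same word is not repeated back to back
    prompts = []
    for index, question in enumerate(questions):
        if index == 0:
            prompts.append(question)
        else:
            prompts.append(f"{next(acks)} {question}")
    prompts.append(closing)
    return prompts


questions = [
    "What's the make of your car?",
    "And the model year?",
    "What's the color?",
]
for line in build_prompts(questions):
    print("CLERK:", line)
```

Rotating through a small set of markers is one possible design choice; a real application would likely pick each acknowledgment to suit the context of the caller's answer.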

Words like "Oh!" can be used to reorient someone's thoughts. After a long instructional statement, a designer can add the word "Oh!" to shift the caller's attention to the last statement.

SYSTEM: ...and those are the primary commands you'll be using with this system. Oh! And if you ever get stuck, just say "Help" and I'll take things from there.

That one little word recaptures the caller's attention and highlights an important piece of information.



The Art and Business of Speech Recognition: Creating the Noble Voice
ISBN: 0321154924
Year: 2005
Pages: 105
Authors: Blade Kotelly