13.3 Recovering from Errors | Voice User Interface Design 2004

One of the fundamental challenges for speech technology (and therefore a challenge for the VUI designer) is the capability to know when there is a problem. When a false accept occurs, the system proceeds, given what it understood the caller to be saying, without realizing that something is wrong. To deal with this, the application must use a variety of techniques to confirm results with callers, when necessary and appropriate, to avoid taking undesired actions.

When the system does know there is a problem (e.g., the recognizer returns a reject), a gap often still exists between what the system knows and the root cause of the problem. In the case of a reject, the system knows only that there was no path through the recognition model that was a good match for the input. However, the reject may be caused by out-of-grammar input from the caller, a noisy cellular connection, a rock 'n' roll band playing in the caller's living room, or a variety of other problems. Despite the uncertainty about the underlying cause, the application must get the dialog back on track.

This section covers confirmation strategies to detect and fix recognition errors when they occur, and recovery strategies to use when the recognizer returns an error condition such as a reject or timeout.

13.3.1 Confirmation Strategies

False accepts may cause serious problems. The system may take an action (e.g., buy 1,000 shares of IBM) or transition to an unintended subdialog. In general, detecting false accepts requires some sort of confirmation dialog in which the system validates its hypothesis with the caller (e.g., "Going from New York to San Francisco. Is that correct?"). There are a number of issues concerning the design of confirmation dialogs, including

When to confirm
How to confirm
Avoiding repeat errors

When to Confirm

Confirmation is necessary if the cost of an error is high, if it would be hard to get back on track, or if recognition confidence is low. For example, before executing a transaction (e.g., transferring money, booking a flight), you would certainly want to confirm the parameters of the transaction with the caller. Failing to confirm in such cases, even if the system is highly confident that it has the right information, not only risks a costly mistake but also tends to reduce caller confidence.

Keep in mind that there are two roles of confirmation: first, to allow the system to validate its recognition hypotheses, and second, to communicate correct recognition to the caller, thereby buttressing user confidence and making transparent the current state of system knowledge.

On the other hand, there are cases when confirmation is not necessary. Many stock quote systems don't explicitly confirm company names. If the company name is misrecognized, the caller loses a few seconds hearing the wrong quote but then can proceed without problem. In fact, callers can barge in as soon as they hear the wrong company name at the beginning of the quote and ask again for the desired company. The cost of the error is low in terms of time lost and in terms of user confusion. It is clear how to proceed. Furthermore, stock quote systems typically have very high accuracy, further reducing the need for confirmation.

Another approach is to use the confidence level returned by the recognizer to decide whether or not confirmation is needed. This is sometimes referred to as three-tiered confidence: If the confidence is high, don't confirm; if it is lower but still above the reject threshold, confirm; otherwise, reject.
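The three tiers can be sketched in a few lines of code. The following is a minimal illustration, not a prescribed implementation: the threshold values, the 0.0 to 1.0 confidence scale, and the names Action, three_tiered_action, and high_cost are all assumptions made for the example. The high_cost flag reflects the earlier point that costly transactions should be confirmed even when confidence is high.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"    # high confidence: proceed without confirming
    CONFIRM = "confirm"  # middle tier: validate the hypothesis with the caller
    REJECT = "reject"    # below the reject threshold: treat as a reject


# Hypothetical thresholds on an assumed 0.0-1.0 confidence scale.
# Real values would be tuned per dialog state from pilot data.
REJECT_THRESHOLD = 0.40
CONFIRM_THRESHOLD = 0.80


def three_tiered_action(confidence: float, high_cost: bool = False) -> Action:
    """Map a recognizer confidence score to accept, confirm, or reject."""
    if confidence < REJECT_THRESHOLD:
        return Action.REJECT
    # Costly actions are confirmed even when the recognizer is confident.
    if high_cost or confidence < CONFIRM_THRESHOLD:
        return Action.CONFIRM
    return Action.ACCEPT
```

With these assumed thresholds, a stock quote request recognized at 0.95 would be accepted outright, whereas a money transfer recognized at the same confidence would still be confirmed.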

How to Confirm

When confirmation is clearly needed, there are a number of ways to handle it. You can confirm each item as it is spoken, or you can group related items for confirmation before completing a transaction or other application task. In general, the latter approach makes sense: Confirm a group of related items at the same time.

Let's compare the following two confirmation approaches for a travel application. The first one immediately confirms each item:

(1)

SYSTEM:	What's the departure city?
CALLER:	San Francisco.
SYSTEM:	San Francisco. Is that correct?
CALLER:	Yes.
SYSTEM:	To what city?
CALLER:	New York.
SYSTEM:	New York. Is that right?
CALLER:	Yes.
SYSTEM:	Great. What day do you want to leave?
CALLER:	I wanna leave March fifth.
SYSTEM:	March fifth. Is that correct?
CALLER:	Yes.

This approach can lead to a rather tedious and stilted dialog. In contrast, the following approach groups items for confirmation:

(2)

SYSTEM:	What's the departure city?
CALLER:	San Francisco.
SYSTEM:	To what city?
CALLER:	To New York.
SYSTEM:	And on what date?
CALLER:	I'm leaving on March fifth.
SYSTEM:	Now let me confirm that: San Francisco to New York on March fifth. Is that correct?
CALLER:	Yes.

This approach leads to a much more fluid and efficient dialog. In general, when you group items for confirmation, they should be logically related (e.g., the three parameters for a leg of a trip). Furthermore, only a small number of items should be grouped. A good rule of thumb is to group no more than four items for one confirmation.
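As a rough sketch of how an application might assemble a grouped confirmation like the one in example (2), the function below fills a prompt template with the collected items and enforces the four-item rule of thumb. The names grouped_confirmation and MAX_ITEMS_PER_CONFIRMATION, and the template mechanism itself, are assumptions for illustration only.

```python
MAX_ITEMS_PER_CONFIRMATION = 4  # rule of thumb: no more than four items


def grouped_confirmation(template: str, items: dict[str, str]) -> str:
    """Build one confirmation prompt covering a group of related items.

    `template` names the items in braces, e.g. "{origin} to {destination}".
    """
    if not 1 <= len(items) <= MAX_ITEMS_PER_CONFIRMATION:
        raise ValueError(
            f"confirm between 1 and {MAX_ITEMS_PER_CONFIRMATION} items at once"
        )
    summary = template.format(**items)
    return f"Now let me confirm that: {summary}. Is that correct?"


prompt = grouped_confirmation(
    "{origin} to {destination} on {date}",
    {"origin": "San Francisco", "destination": "New York", "date": "March fifth"},
)
```

Deciding which items are logically related remains a design decision; the code only guards against grouping too many of them.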

Avoiding Repeat Errors

When there is a negative response to a confirmation prompt, the system has some valuable information. It knows what the caller does not want. Systems should make sure not to repeat the same mistake.

Here's an example of this type of mistake:

(3)

SYSTEM:	What's the flight number?
CALLER:	Sixteen fifty-five.
SYSTEM:	Now is this for arrival or departure information?
CALLER:	Arrival.
SYSTEM:	Let me confirm that: Flight 6355 arriving on
CALLER:	No.
SYSTEM:	Sorry, which part should I change, the flight number or the date?
CALLER:	The flight number.
. . .
SYSTEM:	What's the flight number?
CALLER:	Sixteen fifty-five.
SYSTEM:	Let me confirm that: Flight 6355 arriving on
CALLER:	No! <hang up>

In addition to frustrating the caller, risking endless loops through the same error (until the caller hangs up in anger), and wasting time, this mistake hurts the image of the system persona. It appears to lack basic intelligence, memory, and the ability to track the conversation.

To avoid repeating mistakes, disconfirmed items should be put in a skip list. The ensuing recognition should be configured to return an N-best list. If the first item in the N-best list is on the skip list, then the application should skip (ignore) this item because it already knows that the information is incorrect. The next item on the N-best list that is not on the skip list should be the hypothesis.
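The skip-list logic is simple enough to sketch directly. In this minimal example, the N-best list is assumed to arrive as a list of strings ordered from most to least likely, and the function name pick_hypothesis and the sample flight numbers in the N-best list are hypothetical.

```python
from typing import Optional


def pick_hypothesis(n_best: list[str], skip_list: set[str]) -> Optional[str]:
    """Return the best-ranked hypothesis the caller has not already disconfirmed.

    Returns None when every entry is on the skip list, in which case the
    application would fall back to its normal error recovery.
    """
    for hypothesis in n_best:
        if hypothesis not in skip_list:
            return hypothesis
    return None


# The caller said "No" to flight 6355, so it goes on the skip list.
skip_list = {"6355"}

# On the retry the recognizer again ranks 6355 first (assumed N-best output).
n_best = ["6355", "1655", "1650"]

hypothesis = pick_hypothesis(n_best, skip_list)  # "1655"
```

If the caller disconfirms the new hypothesis as well, it is added to the skip list before the next attempt, so the set grows for as long as the caller stays in the state.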

13.3.2 Recovering from Rejects and Timeouts

Chapter 5 reviews basic error recovery strategies, including escalating detail and rapid reprompt. It also discusses the use of maximum error thresholds (the number of successive retries allowed before transferring the caller to a customer service agent). Here we discuss the design of the error prompts themselves, specific to each dialog state.

When you're designing error recovery prompts, the challenge is that you do not know the root cause of the problem. If the recognizer returns a reject, there can be many underlying reasons. If the reason is that the caller spoke out-of-grammar, that, in turn, may be the result of a number of different problems.

The best approach to designing error prompts is to try to understand the most likely reasons for the problem, considering the context of the dialog state. Given the limits described in Chapter 5 on the number of retries, you get only a few opportunities (typically no more than three) to fix the problem before a caller gives up. Therefore, error recovery prompts should be designed to optimize the chance of successful recovery for the most likely problems. After the application is in pilot and real in-service data become available for analysis, you will have concrete data on the problems that callers are experiencing and may decide to further refine some of the error messages at that point.

Let's look at an example of failed error recovery:

(4)

CALLER:	Order status.
SYSTEM:	To better assist you in checking the status of your order, we need your order number. Please say your order number.
CALLER:	I don't know what it is. <reject>
SYSTEM:	I'm sorry, I didn't understand. Please say or key in your eight-digit order number. You can say, "Help" for more details.
CALLER:	Help.
SYSTEM:	Okay, here's some help. In order to assist you, we need your order number. Please speak naturally and say the eight digits of your order number. You can also enter them on your touchtone keypad.
CALLER:	<hang up>

In this case, the caller hears the initial prompt, the first error recovery prompt, and the help prompt. As he transitions through the various prompts, he receives more instruction about how to say his order number. He even hears that he can key it in.

One implicit assumption behind this design (an assumption that turns out to be false) is that the caller knows his order number or has his invoice and knows where to find it. Other assumptions (which could easily have been false) are that the caller landed in this dialog state for the right reasons (e.g., not because of a recognition error or caller confusion) and that he never changed his mind about what he wanted to do.

Assuming an escalating error handling strategy, here is an alternative set of prompts for that dialog state:

INITIAL PROMPT:	Okay, order status. Please say your order number.
REJECT 1:	I'm sorry, I didn't understand. To check the status, I'll need your order number. It appears in the upper left-hand corner of your invoice. Please say your eight-digit order number now. You can also say, "Help" for more information.
REJECT 2:	I'm sorry, I still didn't understand. If you know your order number, please key it in. Otherwise, say, "I don't know" and I'll connect you to someone who can help you.
REJECT 3:	I'm sorry, I'm still having trouble. Please hold while I transfer you to a customer service representative who can help you.
HELP:	Okay, here's some help. I need your order number so I can check the status. If you know it, you can say it or key it in now. Otherwise, say, "I don't know," and I'll connect you to someone who can help you. At any time, you can go back to the beginning by saying, "Main menu."

This approach accounts for a number of possible problems that the earlier strategy did not. First, we are handling the case in which the caller does not know his order number (assume that this grammar includes phrases such as "I don't know," "I don't know what it is," "I don't have it," etc.). Second, the help prompt lets the caller know about the main menu universal, so there is a way to recover if the caller got to the order status part of the application by mistake or has changed his mind. Third, given that for tasks like this the caller will often have his invoice in hand (assume we learned this from researching usage profiles during requirements definition), he is told how to find his order number.
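One way to picture the escalating strategy for this dialog state is as a lookup keyed by the number of consecutive rejects, with a transfer once the maximum error threshold is reached. The sketch below uses the reject prompts from the alternative set for this state; the function names on_reject and wants_agent, the threshold constant, and the simple text normalization are assumptions for illustration, and a real application would match "I don't know" phrases in the recognition grammar rather than by string comparison.

```python
MAX_REJECTS = 3  # assumed maximum error threshold for this state

REJECT_PROMPTS = [
    "I'm sorry, I didn't understand. To check the status, I'll need your "
    "order number. It appears in the upper left-hand corner of your invoice. "
    "Please say your eight-digit order number now. You can also say, "
    '"Help" for more information.',
    "I'm sorry, I still didn't understand. If you know your order number, "
    'please key it in. Otherwise, say, "I don\'t know" and I\'ll connect '
    "you to someone who can help you.",
    "I'm sorry, I'm still having trouble. Please hold while I transfer you "
    "to a customer service representative who can help you.",
]

# In-grammar phrases for callers who do not have the order number.
DONT_KNOW_PHRASES = {"i don't know", "i don't know what it is", "i don't have it"}


def on_reject(reject_count: int) -> tuple[str, bool]:
    """Return (prompt, transfer_to_agent) for the nth consecutive reject."""
    if reject_count < 1:
        raise ValueError("reject_count is 1-based")
    index = min(reject_count, MAX_REJECTS) - 1
    return REJECT_PROMPTS[index], reject_count >= MAX_REJECTS


def wants_agent(utterance: str) -> bool:
    """True if the caller said they do not know the order number."""
    return utterance.strip().lower().rstrip(".") in DONT_KNOW_PHRASES
```

Keeping the prompts in an ordered list makes it straightforward to refine individual messages once pilot data show which problems callers actually encounter.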

When there is more than one reject in a state, it is often advisable to try an alternative approach, if one is available, to help the caller complete the task, as when the preceding example allows touchtone input of the order number. The following example, from a flight information system, shows another approach:

(5)

SYSTEM:	What's the flight number?
CALLER:	. . . <reject>
SYSTEM:	What was that?
CALLER:	. . . <reject>
SYSTEM:	I still didn't get it. Let's try another approach. What's the origin city of the flight?
CALLER:	Boston.
SYSTEM:	And what's the destination?
CALLER:	New York, Kennedy Airport.
SYSTEM:	Okay. And what time is it due?
CALLER:	Around noon.
SYSTEM:	Got it. American Airlines flight 135 will be arriving at JFK on time at 12:15 p.m. at gate 3.

You can apply the following guidelines when crafting error recovery and help prompts. These guidelines are based on an analysis of errors observed in deployment data across many applications.

When appropriate, provide more specific detail about what can be said (what is in-grammar).
Always provide an escape (the ability to navigate elsewhere, such as back to the main menu). Callers may be in the dialog state because of a recognition error or confusion, or they may change their minds about what to do.
Teach callers about universals.
After a second error, present an alternative means to accomplish the goal, if there is one.
When requesting specific pieces of information (e.g., account numbers, ID numbers, medical group numbers), accommodate callers who do not have the information. If the information is easily available on something callers are likely to have handy (e.g., an invoice, a medical card), tell them how to find it.

When appropriate, provide examples. Callers often pattern their answers after examples. Therefore, examples can help with types of information that can have many formats (such as dates) and when callers are using unexpected fillers. For example:

(6)

SYSTEM:	What's the date of travel?
CALLER:	Um, I gotta get there by next Saturday. <reject>
SYSTEM:	I'm sorry, I didn't understand. Please say the date of travel, for example, "January eighth" or "April twenty-third."

When there are rejects after confirmation prompts, tell callers to answer "Yes" or "No." Often, when callers respond negatively to a confirmation prompt, they try to correct the error in the same response (e.g., "No, not Boston, Austin"). Because it is hard for a grammar to cover the great variety of ways callers may combine a disconfirmation with a correction, it is best to constrain them to a yes/no grammar and then step them through the correction process.
If lots of no-speech timeouts are observed for a state (either in usability data or pilot data), consider a delayed help approach (see Chapter 12) for the initial prompt.