5.1 Dialog Strategy and Grammar Type | Voice User Interface Design 2004

The basic choice of dialog strategy and grammar type will significantly affect all other design decisions. In general, you should consider dialog strategy and grammar type together, because a particular dialog strategy may demand a particular type of grammar (e.g., rule-based versus statistical).

One common way to classify dialog strategies is to separate them into directed-dialog (or system initiative) versus mixed-initiative types. In a directed dialog, the system asks the caller very specific questions and expects very specific answers. In effect, the system initiates, and closely directs, all interaction. Most systems deployed thus far have used a directed-dialog strategy. A directed dialog for travel planning might sound like this:

(1)

SYSTEM:	What's the departure city?
CALLER:	Um, San Francisco.
SYSTEM:	And the arrival city?
CALLER:	I wanna go to New York.
SYSTEM:	OK, what day are you leaving?
CALLER:	Next Tuesday.
SYSTEM:	Great. And what time do you want to go?
CALLER:	Sometime after ten a.m.

This directed dialog is an example of form-filling. The caller is asked a series of directed questions as if the caller were filling out a form.
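To make the form-filling structure concrete, here is a minimal Python sketch of dialog 1. It assumes a hypothetical `ask(prompt)` callback that plays a prompt and returns the recognized answer. The slot names and `run_directed_form` are illustrative only and do not come from any particular platform. The point is that the system owns the order of the questions, and each prompt expects exactly one piece of information.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical slot names, each paired with the prompt that asks for it,
# in the fixed order the system follows (dialog 1).
TRAVEL_FORM: List[Tuple[str, str]] = [
    ("departure_city", "What's the departure city?"),
    ("arrival_city", "And the arrival city?"),
    ("date", "OK, what day are you leaving?"),
    ("time", "Great. And what time do you want to go?"),
]


def run_directed_form(ask: Callable[[str], str]) -> Dict[str, str]:
    """Ask for each slot in turn; the system holds the initiative throughout."""
    form: Dict[str, str] = {}
    for slot, prompt in TRAVEL_FORM:
        # The grammar active for this prompt would accept only values for `slot`.
        form[slot] = ask(prompt)
    return form


# Usage with a scripted caller standing in for the recognizer.
scripted = iter(["San Francisco", "New York", "next Tuesday", "after ten a.m."])
print(run_directed_form(lambda prompt: next(scripted)))
```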

This is one of the two most common types of directed dialog structures. The other is a menu hierarchy. In menu hierarchies, the caller is presented with a number of options in a menu (e.g., "Which would you like to do: get flight information, make a reservation, or hear about our special offers?"). Once the caller makes a selection, the system offers another menu until the application settles on the one item or action the caller wants. Typical applications use form-filling and menu hierarchies at different points, depending on the nature of the information and the caller's mental model.
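A menu hierarchy can be sketched as a tree that the dialog descends one level per caller choice. In this minimal Python illustration, the tree contents (including the arrivals/departures submenu), the action names, and the `choose` callback are all hypothetical. `choose` stands in for a recognizer whose grammar at each level accepts only the options just offered.

```python
from typing import Callable, Dict, List, Union

# A node is either a submenu (option -> node) or a leaf action name.
# The tree contents and action names are hypothetical.
MenuNode = Union[Dict[str, "MenuNode"], str]

MAIN_MENU: MenuNode = {
    "get flight information": {
        "arrivals": "ACTION_ARRIVALS",
        "departures": "ACTION_DEPARTURES",
    },
    "make a reservation": "ACTION_RESERVATION",
    "hear about our special offers": "ACTION_OFFERS",
}


def menu_prompt(options: List[str]) -> str:
    """Read the options aloud as a list, e.g. 'a, b, or c' or 'a or b'."""
    if len(options) == 1:
        listed = options[0]
    elif len(options) == 2:
        listed = f"{options[0]} or {options[1]}"
    else:
        listed = ", ".join(options[:-1]) + ", or " + options[-1]
    return f"Which would you like to do: {listed}?"


def run_menu(root: MenuNode, choose: Callable[[str, List[str]], str]) -> str:
    """Descend one menu level per caller choice until a leaf action is reached."""
    node = root
    while isinstance(node, dict):
        options = list(node)
        # `choose` is assumed to return one of `options`, because the grammar
        # active at this level accepts only the options just offered.
        node = node[choose(menu_prompt(options), options)]
    return node
```

For example, a caller who picks "get flight information" and then "departures" reaches `ACTION_DEPARTURES` after two menu turns, while "make a reservation" is a leaf reached in one.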

With a mixed-initiative dialog strategy, the same travel dialog might give callers more flexibility in what they can say. The caller takes the initiative first; depending on the caller's response, the system may then take the initiative and prompt for missing information:

(2)

SYSTEM:	What are your travel plans?
CALLER:	I wanna go to New York next Tuesday morning.
SYSTEM:	OK, and what's the departure city?

In this second example, the caller provides several pieces of information for the trip, and then the system takes the initiative and prompts for the rest. All mixed-initiative dialogs need to include back-off strategies to capture missing pieces of information.
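The back-off behavior can be sketched as slot filling in which the caller may fill any subset of slots at once. This is a minimal Python sketch under stated assumptions: `ask(prompt)` is a hypothetical callback returning the slots that recognition and natural language understanding extracted from the caller's reply (for example, `{"arrival_city": "New York"}`), and the retry limit of three is an arbitrary illustrative value.

```python
from typing import Callable, Dict, List

# Hypothetical slot names and back-off prompts for the travel example.
REQUIRED_SLOTS: List[str] = ["departure_city", "arrival_city", "date", "time"]
BACKOFF_PROMPTS: Dict[str, str] = {
    "departure_city": "OK, and what's the departure city?",
    "arrival_city": "And what's the arrival city?",
    "date": "What day are you leaving?",
    "time": "What time do you want to go?",
}


def run_mixed_initiative(
    ask: Callable[[str], Dict[str, str]], max_tries: int = 3
) -> Dict[str, str]:
    """Open prompt first, then back off to directed prompts for missing slots."""
    # Caller initiative: any subset of slots may be filled in one utterance.
    form: Dict[str, str] = dict(ask("What are your travel plans?"))
    # System initiative: prompt only for slots that are still empty.
    for slot in REQUIRED_SLOTS:
        tries = 0
        while slot not in form and tries < max_tries:
            # Keep everything the caller says, including volunteered extra slots.
            form.update(ask(BACKOFF_PROMPTS[slot]))
            tries += 1
    # The form may still be incomplete after max_tries; a real application
    # would then need some fallback behavior, which is not shown here.
    return form
```

Note that the back-off loop keeps anything extra the caller volunteers, so a caller who answers the departure-city prompt with both a city and a time is not asked for the time again.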

There are many shades of gray between directed-dialog and mixed-initiative approaches. In the following example, the system makes a direct request, but the recognition grammar is flexible, allowing the caller to provide additional information.

(3)

SYSTEM:	What's the arrival city?
CALLER:	I want to go to New York next Tuesday.
SYSTEM:	OK. What time do you want to leave?

In this piece of dialog, the system makes a directed request for the traveler's destination. However, the caller replies with both the arrival city and the day of travel. The system is capable of handling this input and proceeds to prompt for the departure time, the only remaining piece of information needed. Although this approach is convenient for the caller, it makes the recognition task a bit more challenging: the grammar must cover more variation, which increases the size of the recognition model. This kind of strategy makes sense when many callers include additional information in their responses. Some applications combine mixed-initiative and directed-dialog portions, again depending on the nature of the information and the caller's mental model.
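The flexible grammar in dialog 3 can be illustrated with a handcrafted rule-based grammar, written here as a Python regular expression for brevity. The tiny city and day vocabularies are hypothetical, and a production system would normally express such a grammar in a format its recognizer supports rather than as a regex. The optional day phrase is what lets the arrival-city prompt also capture the travel day, and it is also exactly the extra coverage that enlarges the grammar.

```python
import re
from typing import Dict, Optional

# Hypothetical, deliberately tiny vocabularies.
CITIES = ["San Francisco", "New York", "Boston", "Chicago"]
DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
CANONICAL_CITY = {city.lower(): city for city in CITIES}

# Optional filler and carrier phrase, the required city, then an OPTIONAL day.
ARRIVAL_GRAMMAR = re.compile(
    r"^(?:(?:um|uh)\s+)?"
    r"(?:(?:i\s+)?(?:want|wanna)\s+(?:to\s+)?go\s+to\s+)?"
    r"(?P<city>" + "|".join(CANONICAL_CITY) + r")"
    r"(?:\s+(?:on\s+)?(?P<day>(?:next\s+)?(?:" + "|".join(DAYS) + r")))?$"
)


def parse_arrival(utterance: str) -> Optional[Dict[str, str]]:
    """Return the slots filled by an answer to "What's the arrival city?"."""
    # Normalize: lowercase, drop punctuation, collapse whitespace.
    text = " ".join(re.sub(r"[^\w\s]", " ", utterance.lower()).split())
    match = ARRIVAL_GRAMMAR.match(text)
    if match is None:
        return None  # Out of grammar: the recognizer would reject this input.
    slots = {"arrival_city": CANONICAL_CITY[match.group("city")]}
    if match.group("day"):
        slots["date"] = match.group("day")
    return slots
```

With this grammar, "I want to go to New York next Tuesday." fills both the arrival city and the date, while a plain "Boston" fills only the city.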

Call routing, as discussed in Chapter 1, is the most commonly deployed type of mixed-initiative system. Call routing is popular because it fulfills a strong business need. Many companies provide telephone access to a wide array of services, transactions, and information sources, often numbering in the hundreds. But previous automated approaches to helping callers reach the appropriate service have been problematic. Some companies have tried touchtone systems with long menu hierarchies. These have proven extremely difficult to use because it is hard to map a large number of diverse services onto an intuitive menu structure. Both hang-up rates and misrouting rates have been very high in these systems. Other companies have a pool of 800 numbers, but many callers do not know which number is most appropriate for their issue and end up calling the wrong service. This costs the business money in transfers and wastes the caller's time. Speech-based applications that use statistical grammars and advanced natural language understanding have proven very effective at solving this important business problem.
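To give a feel for the statistical side of call routing, here is a toy Python router. It is a sketch under loud assumptions: it classifies text that has already been recognized, the training utterances and route names are invented for illustration, and multinomial naive Bayes is simply a stand-in classifier chosen for this sketch, not a description of how any deployed router works.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical labeled utterances; a real router is trained on large
# numbers of transcribed calls for each destination.
TRAINING: List[Tuple[str, str]] = [
    ("i want to check my bill", "billing"),
    ("there is a charge on my bill i do not recognize", "billing"),
    ("how much do i owe this month", "billing"),
    ("my internet is not working", "tech_support"),
    ("the connection keeps dropping", "tech_support"),
    ("i need help setting up my modem", "tech_support"),
    ("i want to cancel my service", "cancellations"),
    ("please close my account", "cancellations"),
]


class NaiveBayesRouter:
    """Multinomial naive Bayes over words, with add-one (Laplace) smoothing."""

    def __init__(self, examples: List[Tuple[str, str]]) -> None:
        self.route_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()
        for text, route in examples:
            words = text.lower().split()
            self.route_counts[route] += 1
            self.word_counts[route].update(words)
            self.vocab.update(words)
        self.total = sum(self.route_counts.values())

    def log_score(self, text: str, route: str) -> float:
        """log P(route) plus log P(word | route) for each known word."""
        counts = self.word_counts[route]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.route_counts[route] / self.total)
        for word in text.lower().split():
            if word in self.vocab:  # Words never seen in training are ignored.
                score += math.log((counts[word] + 1) / denom)
        return score

    def route(self, text: str) -> str:
        """Pick the destination with the highest score."""
        return max(self.route_counts, key=lambda r: self.log_score(text, r))


router = NaiveBayesRouter(TRAINING)
print(router.route("there is a problem with my bill"))
```

Because the router weighs every word the caller says against every destination, it needs no menu structure at all, which is why this kind of approach pairs naturally with an open prompt.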

Mixed-initiative systems of the type shown in dialog 2 are relatively rare but are likely to become more common. Although such an approach can greatly improve efficiency and ease of use, the business drivers for deploying such systems are not as strong as for call routers because a directed dialog can often solve callers' problems, even if less efficiently.

Choosing a mixed-initiative approach usually implies the need for a statistical grammar, that is, a statistical language model (SLM). The amount of variation to expect from callers is simply too great to capture with a handcrafted rule-based grammar: it is hard to predict all the variations, and the grammar would grow huge and complex.

The following simple guidelines will help you decide whether to use a statistical or rule-based grammar. Use a rule-based grammar in the following situations:

When caller input can be constrained
When a handcrafted grammar can achieve high coverage
When free-form speech is not expected

Use a statistical grammar in these cases:

When the preceding guidelines for rule-based grammars do not hold
When an open prompt is required rather than a prompt that explicitly describes what to say (e.g., for call routing)
When a grammar with trained probabilities can perform better because of contextual information (e.g., name spelling, given that certain letter combinations are far more likely than others)
When efficiency is highly important and you wish to minimize the number of steps to complete a task
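The name-spelling case can be illustrated with a letter-bigram model. The minimal Python sketch below trains bigram probabilities with add-one smoothing on a small, hypothetical list of surnames and then picks the more plausible of two acoustically confusable spellings (M and N are easily confused over the phone). For simplicity it rescores a list of hypotheses after recognition; in a real statistical grammar, the probabilities would typically be built into the recognition search itself.

```python
import math
from collections import Counter
from typing import List, Tuple

# Hypothetical training names; a real system would use a large name directory.
NAMES = ["SMITH", "SMYTHE", "SIMMONS", "JOHNSON", "JONES", "THOMPSON", "SMALL", "SMART"]
START, END = "^", "$"
NUM_NEXT_SYMBOLS = 26 + 1  # 26 letters plus the end-of-name marker


def train_bigrams(names: List[str]) -> Tuple[Counter, Counter]:
    """Count letter bigrams (with start/end markers) and their left contexts."""
    bigrams: Counter = Counter()
    contexts: Counter = Counter()
    for name in names:
        padded = START + name + END
        for left, right in zip(padded, padded[1:]):
            bigrams[(left, right)] += 1
            contexts[left] += 1
    return bigrams, contexts


def log_prob(spelling: str, bigrams: Counter, contexts: Counter) -> float:
    """Add-one-smoothed log probability of a spelled name under the bigram model."""
    padded = START + spelling + END
    return sum(
        math.log((bigrams[(left, right)] + 1) / (contexts[left] + NUM_NEXT_SYMBOLS))
        for left, right in zip(padded, padded[1:])
    )


def best_spelling(hypotheses: List[str], bigrams: Counter, contexts: Counter) -> str:
    """Choose the hypothesis the letter model finds most plausible."""
    return max(hypotheses, key=lambda h: log_prob(h, bigrams, contexts))


bigrams, contexts = train_bigrams(NAMES)
print(best_spelling(["SNITH", "SMITH"], bigrams, contexts))  # prints SMITH
```

"SMITH" wins because "SM" and "MI" both occur in the training names, while "SN" and "NI" never do; that is the contextual information a rule-based letter grammar, which treats every letter sequence alike, cannot exploit.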

When you use a mixed-initiative/SLM approach, it cannot be stressed enough how important it is to use appropriate VUI design strategies. Several aspects of design differ significantly between mixed-initiative and directed dialogs, including prompting, error recovery, and other strategies. The mental model that must be communicated to the caller is different for an SLM-based system than for a directed dialog. With a directed dialog, you can direct the caller quite explicitly. With a mixed-initiative system, the range of possible responses is simply too large to spell out, so you must find other ways of conveying how callers should speak to the system. When there are problems (e.g., a recognition reject), you must carefully design error prompts that get callers back on track and clarify how they should speak. Chapter 12 presents specific guidelines for mixed-initiative prompting.

SLM-based grammars have one other advantage that many people don't recognize. When using an SLM and analyzing data during the tuning stage, you often learn about things callers are asking for that you never would have discovered with a touchtone or directed-dialog speech system. We have added routes to call routing systems based on lessons we learned from the data, often teaching the deploying company things it never knew before about the needs of its customers.

Some practitioners claim that the main purpose of using advanced natural language understanding (mixed-initiative) technology is to make callers believe they are talking to a live human being. That is absolutely not the case! We strongly advise against any attempts to fool callers into thinking they are interacting with a person rather than a machine, for the same reasons stated in the discussion of persona design in Chapter 4. Such an incorrect mental model is likely to lead to problems, given that even the most advanced technology is far less intelligent and capable than a human. The purpose of using natural language technology is to better meet business needs and user needs in those cases when it is the best solution for the application.

As with any design decision, when you are selecting between a directed-dialog and a mixed-initiative strategy, consult all the information gathered in the requirements phase about the needs of the business, user, and application, and keep in mind the key design criteria for the application.