8.1 Anatomy of a Dialog State | Voice User Interface Design 2004

The smallest unit typically represented in a call flow diagram is a single dialog state. Most often, a dialog state involves a single interchange between the caller and the system. However, if the system handles unexpected input, it may reprompt the caller within the same dialog state. Here's an example:

SYSTEM:	How much do you want to transfer?
CALLER:	<unrecognized speech; recognizer returns reject>
SYSTEM:	Sorry, how much do you want to transfer? [rising intonation]

In this case, the system is still in the same state, still trying to elicit the transfer amount, and using the same grammar to try to recognize the input.
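
The reprompt behavior above can be sketched as a small loop. This is only an illustration under stated assumptions, not an implementation from the text: `run_state`, the `REJECT` marker, the `max_attempts` limit, and the scripted recognizer are all hypothetical names introduced here.

```python
from typing import Callable, List, Optional, Tuple

REJECT = "reject"  # hypothetical marker for a recognizer reject


def run_state(
    initial_prompt: str,
    reprompt: str,
    recognize: Callable[[str], str],
    max_attempts: int = 3,  # assumed limit; the text does not specify one
) -> Tuple[Optional[str], List[str]]:
    """Stay in one dialog state, reprompting whenever the recognizer rejects."""
    played: List[str] = []
    prompt = initial_prompt
    for _ in range(max_attempts):
        played.append(prompt)
        result = recognize(prompt)  # same grammar on every attempt
        if result != REJECT:
            return result, played
        prompt = reprompt  # still in the same state, only the prompt changes
    return None, played


# Scripted stand-in for a recognizer: rejects once, then recognizes an amount.
scripted = iter([REJECT, "fifty dollars"])
result, prompts_played = run_state(
    "How much do you want to transfer?",
    "Sorry, how much do you want to transfer?",
    lambda prompt: next(scripted),
)
```

The point of the sketch is that the state, its goal, and its grammar stay fixed across attempts; only the prompt wording changes.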

For each dialog state, you must describe a number of components. A typical case includes the following:

Initial prompt(s): This is the prompt that is played when the conversation first reaches this dialog state. It is often important to define more than one possible initial prompt, depending on recent history (e.g., which state preceded the current one). Section 8.3 explains why and describes the methodology for defining prompt wording.
Recognition grammar: Although the full development of the recognition grammar is saved for a later step, it is appropriate to create a high-level definition of the grammar as you define a dialog state. This definition will be used as a guide by the grammar developer. The high-level definition should describe the specific items of information to be returned from the grammar (e.g., the destination city for a travel application), possibly by specifying the names of slots the grammar can fill. Additionally, it is often useful to provide a number of sample expressions to give the grammar developer an idea of the range of expression expected. For example, if you provide sample expressions for a yes/no grammar such as "Yup," "Okay," and "Yeah, that's what I want," the grammar writer will be aware of the need to flesh out the grammar to cover the wide variety of ways callers may express themselves to indicate yes or no.
Error handling: Many things can go wrong during recognition. The caller may say something not covered by the grammar, or background noise may prevent accurate recognition. In both of these cases, the recognizer is likely to return a reject. The caller may not respond at all, in which case the recognizer may return a no-speech timeout. For each type of error that the recognizer may return, the designer must specify appropriate handling (such as the reprompt, "Sorry, how much do you want to transfer?" in the earlier example). Chapter 13 discusses error handling in detail.
Handling of universals: Chapter 5 describes universal commands. Occasionally, a universal must be overridden in a particular dialog state.
Action specification: You should specify any actions that may happen during execution of a dialog state. For example, the application may access a backend database or other system. If so, you must specify handling for all possible failure modes as well as for success. Additionally, you should specify transitions to other dialog states along with any logic or conditions for each transition (the call flow diagram, shown in a moment, is a convenient place to specify transitions).
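
One way to picture how these components fit together is as a single record per dialog state. The following is a minimal sketch, not a format prescribed by the text: the class `DialogStateSpec`, its field names, the state names, the slot name `amount`, and all prompt wordings other than the reprompt quoted earlier are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DialogStateSpec:
    """Hypothetical record holding the components of one dialog state."""
    name: str
    # Initial prompts keyed by the preceding state; "default" is the fallback.
    initial_prompts: Dict[str, str]
    # High-level grammar definition: slots to fill plus sample expressions.
    grammar_slots: List[str]
    sample_expressions: List[str]
    # Handling per recognizer error type (here, simply a reprompt).
    error_prompts: Dict[str, str] = field(default_factory=dict)
    # Universal commands overridden in this state (command -> handling note).
    universal_overrides: Dict[str, str] = field(default_factory=dict)
    # Transitions: outcome of the state's actions -> next state name.
    transitions: Dict[str, str] = field(default_factory=dict)

    def initial_prompt(self, previous_state: str) -> str:
        """Pick the initial prompt based on which state preceded this one."""
        return self.initial_prompts.get(
            previous_state, self.initial_prompts["default"]
        )

    def next_state(self, outcome: str) -> str:
        """Look up the transition for a success or failure outcome."""
        return self.transitions[outcome]


get_amount = DialogStateSpec(
    name="GetTransferAmount",
    initial_prompts={
        "default": "How much do you want to transfer?",
        "ConfirmTransfer": "Okay, what amount should I transfer instead?",
    },
    grammar_slots=["amount"],
    sample_expressions=["fifty dollars", "um, two hundred", "all of it"],
    error_prompts={
        "reject": "Sorry, how much do you want to transfer?",
        "no_speech_timeout": "Sorry, I didn't hear you. How much do you want to transfer?",
    },
    transitions={
        "success": "ConfirmTransfer",
        "backend_failure": "BackendError",
    },
)
```

Keeping success and failure outcomes in the same transition table is a design choice of this sketch; it makes a missing failure mode easy to spot when reviewing the specification against the call flow diagram.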

These are the components of a typical dialog state. The next two sections cover the methodology for creating the two primary elements of a design: the call flow and the prompts.