14.1 Call Flow Design | Voice User Interface Design 2004

We start by creating a high-level call flow of the entire system (see Figure 14-1). The call flow shows the basic architecture of the dialog. The major components of the system are encapsulated in subdialogs. Callers first enter the Welcome state, where they hear the branding earcon followed by the welcome prompt. Then they enter the Login subdialog, which we will flesh out later.

Figure 14-1. The high-level call flow for Lexington Brokerage shows the basic architecture of the application.


After successfully logging in to the system, new callers will hear the first-time caller message, which introduces the customer to the application. Rather than use an extremely long first-time caller message or offer a tutorial, we decide to keep the first-time caller message short and use just-in-time instruction to gradually teach users about system capabilities. Just-in-time instructions will be triggered occasionally to teach callers about functionality they are not using and to help them the first time they exercise a feature.

Following login and the first-time caller message, callers reach the main menu. Here they will be given the flexibility to make almost any request. For example, they can say, "Give me a quote on Cisco," "I wanna make a trade," "Buy 100 shares of Apple," "Buy 100 shares of Apple at the market," and so on. The details of each type of operation are then handled in the subdialogs that follow the main menu.
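The top-level flow in Figure 14-1 can be pictured as a short sequence of states. The sketch below is a minimal, hypothetical Python rendering that assumes the main menu has already classified the caller's request into an intent label. The state names Welcome, Login, and the Quotes and Trading subdialogs follow the text; `top_level_path`, the intent labels, and the `TransferToAgent` state are illustrative inventions.

```python
# Hypothetical mapping from main-menu intent labels to the subdialogs that handle them.
SUBDIALOGS = {"quote": "Quotes", "trade": "Trading"}


def top_level_path(is_new_caller: bool, login_succeeded: bool, intent: str) -> list[str]:
    """Return the top-level states a call passes through for a single request."""
    path = ["Welcome", "Login"]  # branding earcon + welcome prompt, then the Login subdialog
    if not login_succeeded:
        # Login problems end in a transfer to a live agent rather than a hang-up.
        return path + ["TransferToAgent"]
    if is_new_caller:
        # Short introduction only; features are taught later by just-in-time instruction.
        path.append("FirstTimeCallerMessage")
    path.append("MainMenu")  # open prompt: "Give me a quote on Cisco", "Buy 100 shares of Apple", ...
    path.append(SUBDIALOGS[intent])  # the subdialog handles the details of the request
    return path
```

For a new caller asking for a quote, `top_level_path(True, True, "quote")` returns `["Welcome", "Login", "FirstTimeCallerMessage", "MainMenu", "Quotes"]`.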

To prepare for our Wizard of Oz (WOZ) usability test, we must specify the designs for the Login, Quotes, and Trading subdialogs. Ultimately, all the subdialogs will be fleshed out so that we have call flows that show every dialog state in the system.

14.1.1 The Login Subdialog

Figure 14-2 shows the call flow for the Login subdialog. A number of important dialog strategies are used in this subdialog. First, this is where we execute the approach discussed in Chapter 7 to optimize the accuracy of account number recognition. We do N-best recognition in the GetAccountNumber state, and in the ensuing state we test the checksum digits of the items on the N-best list to see which ones are valid account numbers. For valid account numbers, we query the account number database to find out whether they are actually in use. We choose the account number with the highest confidence (earliest on the N-best list) that has a valid checksum and exists in the database. If no items on the N-best list are valid, we query the caller again.
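The N-best filtering in GetAccountNumber and the state that follows it can be sketched as below. The text does not say which checksum scheme Lexington's account numbers use, so this minimal Python sketch assumes the Luhn (mod 10) check purely for illustration, and it stands in for the account number database with an in-memory set. `luhn_valid` and `pick_account` are hypothetical names.

```python
from typing import Optional


def luhn_valid(digits: str) -> bool:
    """Luhn (mod 10) check, ASSUMED here as the account-number checksum."""
    if not digits.isdecimal():
        return False
    total = 0
    for position, ch in enumerate(reversed(digits)):
        d = int(ch)
        if position % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def pick_account(nbest: list[str], accounts_in_use: set[str]) -> Optional[str]:
    """Return the highest-confidence hypothesis that passes the checksum and is in use."""
    for hypothesis in nbest:  # the N-best list is ordered best-first
        # Checksum first; only checksum-valid numbers reach the (simulated) database lookup.
        if luhn_valid(hypothesis) and hypothesis in accounts_in_use:
            return hypothesis
    return None  # no valid item: query the caller again in GetAccountNumber
```

Ordering the checks this way means the database is consulted only for hypotheses that are structurally possible account numbers.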

Figure 14-2. The call flow for the Login subdialog shows how we optimize accuracy and handle login problems.


Another design choice shown in the Login subdialog is what we do when callers do not know their account number. We learned in our observational study that some callers do not remember their account number and do not have an account statement handy. In those cases, the brokers can locate the caller's account by asking for some personal information (e.g., name, address, mother's maiden name). Rather than terminate the call when the account number is unknown, we decide to transfer callers to an agent who can help them. Furthermore, we include phrases such as "I don't know" in the grammar for the GetAccountNumber state, and when there are rejects or requests for help, we tell callers that they can say "I don't know" and get assistance.

If callers are having trouble logging in with their account number, we want to avoid hang-ups by transferring them to an agent who can help them. To strike a balance between maximizing automation rates and avoiding hang-ups, we choose to transfer them after the third failure in the GetAccountNumber state. Whenever we transfer callers to live agents, we first play a message clarifying what is happening (e.g., see states LoginFailureandTransferMessage and AcknowledgeandTransferMessage in Figure 14-2). This accords with results reported in CCIR-2 (1999), in which callers expressed a strong preference to be told when and why they were being transferred.
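A minimal sketch of the GetAccountNumber retry policy follows. It assumes the recognition result has already been reduced to one of a few hypothetical outcome labels, and that a request for help does not count as a failure (the text does not say either way). Which of the two transfer-message states from Figure 14-2 each path leads to is inferred from the state names, and `LoginComplete` is an invented name for the exit state.

```python
from dataclasses import dataclass

MAX_FAILURES = 3  # transfer after the third failure in GetAccountNumber


@dataclass
class LoginAttempts:
    failures: int = 0


def next_login_state(attempts: LoginAttempts, outcome: str) -> str:
    """Map a hypothetical GetAccountNumber outcome label to the next dialog state."""
    if outcome == "valid_account":
        return "LoginComplete"
    if outcome == "dont_know":
        # Caller said "I don't know": explain what is happening, then transfer.
        return "AcknowledgeandTransferMessage"
    if outcome == "help":
        # Assumed not to count as a failure; the help prompt mentions "I don't know".
        return "GetAccountNumber"
    # Any other outcome (a reject, or no valid account on the N-best list) is a failure.
    attempts.failures += 1
    if attempts.failures >= MAX_FAILURES:
        return "LoginFailureandTransferMessage"  # explain the transfer before making it
    return "GetAccountNumber"  # reprompt, again mentioning "I don't know"
```

Keeping the failure count in a per-call object lets the threshold be tuned in one place when balancing automation rate against hang-ups.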

14.1.2 The Quotes Subdialog

Two dialog strategy issues come up in the design of the Quotes subdialog: handling ambiguous company names and the use of just-in-time instruction to teach users about watch lists.

Many company names are ambiguous. For example, Cisco (Systems) and Sysco (Foods) are pronounced the same way. There are five different companies that can be referred to as Genesys. When a quote is requested on an ambiguous company name, the system must figure out which company is intended.

There are a number of possible strategies. One approach is to simply tell callers that there are a number of companies with that name and ask them to use the full company name. This approach has the advantage of efficiency. Another technique is to provide a list of the full names of all the companies and ask callers to repeat the name when they hear the one they want. This approach is less efficient than the first one, but it has the advantage that the caller is told the full names and thus gains greater clarity about what to say.

For the Lexington application we decide on a hybrid approach that maximizes efficiency but ultimately provides a list if necessary. Our approach has three stages (specific prompt wordings for these are worked out later in section 14.2):

If there are two or three companies, say the full names and ask which one is desired (e.g., "Do you want Cisco Systems or Sysco Foods?").
If there are more than three companies, tell callers that there are multiple companies with that name and ask them to say the full name.
If there are more than three companies and the caller does not respond successfully to the system's request for the full name (e.g., there is a <reject> or a request for help), list the full names of the companies. Before presenting the list, ask callers to repeat the name of the company they want as soon as they hear it. Supporting caller barge-in as soon as they hear the name is effective because it turns the task into a recognition problem rather than a recall problem: callers respond as soon as they recognize the company instead of trying to recall it after hearing the entire list, which lowers cognitive load. Because of possible timing problems, it works better to have callers repeat the name than to say "That one"; by the time the caller says "That one," the system may already be listing the next company.
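The three-stage disambiguation strategy reduces to a small decision function. In this hypothetical Python sketch the prompt texts are placeholders (the real wordings are worked out in section 14.2), apart from the two-company example taken from the text; `disambiguation_action` and its strategy labels are invented for illustration.

```python
def disambiguation_action(matches: list[str], full_name_failed: bool = False) -> tuple[str, str]:
    """Choose a strategy and placeholder prompt for the companies matching a spoken name."""
    n = len(matches)
    if n <= 1:
        return ("not_ambiguous", "")
    if n <= 3:
        # Stage 1: two or three companies, so read out the full names and ask which one.
        options = f"{matches[0]} or {matches[1]}" if n == 2 else f"{matches[0]}, {matches[1]}, or {matches[2]}"
        return ("offer_choices", f"Do you want {options}?")
    if not full_name_failed:
        # Stage 2: too many to list up front, so ask for the full company name.
        return ("ask_full_name", "There are several companies with that name. Please say the full name.")
    # Stage 3: the full-name request failed (reject or help), so list the names and
    # let the caller barge in by repeating a name as soon as they recognize it.
    return ("list_names", "As soon as you hear the company you want, say its name. " + ", ".join(matches) + ".")
```

For example, `disambiguation_action(["Cisco Systems", "Sysco Foods"])` yields the prompt "Do you want Cisco Systems or Sysco Foods?", while a five-way match such as Genesys first gets the full-name request and only falls back to the list after a failure.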

We decide to use just-in-time instruction to teach callers about the watch list feature. From our observational study and from usage statistics of the touchtone system, we know that callers often ask for numerous quotes, so we believe they will find the watch list feature useful. We also want to encourage the use of a watch list, because it personalizes the system to customers' needs and makes the system more sticky.

We want to be careful not to trigger the watch list instruction too often. We set the following criteria:

Do not trigger the watch list instruction on a user's first call to the system. The first-time caller already has a lot to learn.
On the user's second call to the system, if she has not yet set up a watch list and if she has asked for at least two quotes on this call, trigger the watch list instruction.
On the user's fourth call to the system, if he has not yet set up a watch list and if he has asked for at least two quotes on this call, trigger the watch list instruction.
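The watch list criteria translate directly into a predicate. This Python sketch assumes that calls are counted from 1 and that the application stores a per-caller call count and a per-call quote count; the function name is hypothetical.

```python
def should_offer_watch_list_tip(call_number: int, has_watch_list: bool, quotes_this_call: int) -> bool:
    """Apply the watch list just-in-time criteria; call_number counts from 1."""
    if has_watch_list:
        return False  # the caller already uses the feature
    if call_number not in (2, 4):
        return False  # never on the first call; otherwise only on calls 2 and 4
    return quotes_this_call >= 2  # at least two quotes requested on this call
```

Restricting the tip to two specific calls keeps it from becoming a nuisance for callers who simply prefer not to set up a watch list.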

14.1.3 The Trading Subdialog

If a caller specifies a trade while at the main menu, the system transitions to the Trading subdialog. There are two important dialog strategy issues to consider for the Trading subdialog: transitioning between a user-initiative and a system-initiative dialog; and performing careful confirmation before completing any transactions.

At the main menu, callers may provide different amounts of detail about a trade. For example, all the following would be valid requests: "Buy Intel," "Buy a hundred shares of Intel," "Sell all my shares of Intel," "Buy Intel at twenty-eight," "I wanna make a trade," "Sell," and "Buy Intel at the market, good for the day." If the request is incomplete (that is, if any information is missing), the system must take over the initiative and query the caller for the missing information.

We decide to query for one item at a time, in a directed fashion, in order to complete the request. For example, if the caller says "Buy Intel," the system will first ask, "How many shares do you want to buy?" and then will ask, "At what price?" We decide to accept complex input even during directed querying for single items; for example, after asking, "How many shares do you want to buy?" our grammar will accept responses such as, "Buy fifty shares at twenty-two point three three." In this way, we are not forcing callers to change their mental model from one that allows them to use complex sentences with multiple information items, even though we have transitioned into a directed prompting style.
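The directed, one-item-at-a-time querying that still accepts complex answers can be sketched as two small functions: one merges every item the caller supplied, even items the current prompt did not ask for, and one picks the single next question. The slot names, their order, and the prompts other than "How many shares do you want to buy?" and "At what price?" are assumptions; the order duration ("good for the day") is left out for brevity.

```python
from typing import Optional

# Assumed order in which missing trade details are requested, one at a time.
SLOT_PROMPTS = [
    ("action", "Do you want to buy or sell?"),  # hypothetical wording
    ("company", "For which company?"),          # hypothetical wording
    ("shares", "How many shares do you want to {action}?"),
    ("price", "At what price?"),
]


def merge(order: dict, parsed: dict) -> dict:
    """Fold in every item the caller supplied, even ones the prompt did not ask for."""
    return {**order, **{slot: value for slot, value in parsed.items() if value is not None}}


def next_prompt(order: dict) -> Optional[str]:
    """Return the directed prompt for the first missing item, or None when the order is complete."""
    for slot, prompt in SLOT_PROMPTS:
        if order.get(slot) is None:
            return prompt.format(action=order.get("action") or "trade")
    return None  # nothing missing: move on to confirmation
```

After "Buy Intel," `next_prompt` asks for the number of shares; if the caller answers "Buy fifty shares at twenty-two point three three," `merge` fills both `shares` and `price`, so the price question is skipped.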

Before completing the trade, the system will carefully confirm the details with the caller. We will slow the pace a little and play a prompt such as, "You want to buy two hundred shares of Intel, at the market, good for the day. Is that correct?" We will move ahead with the transaction only if we are highly confident that the caller has confirmed it. Therefore, we use a strict yes/no grammar: one that accepts only "Yes" or "No" and does not accept input such as, "Yeah, that's right," "Okay," and so on. Furthermore, we raise the reject threshold a bit to reduce the chance of false accepts of "Yes." If we do need to go back and correct anything, we use skip lists to make sure we do not make the same mistake twice.
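The confirmation step and the skip-list correction can be sketched as follows. The strict grammar is simulated by exact matching on the recognized text, and the threshold value 0.7 is a hypothetical placeholder, not a figure from the text; `confirm_trade` and `best_unrejected` are illustrative names.

```python
from typing import Optional

STRICT_GRAMMAR = {"yes", "no"}  # "Yeah, that's right", "Okay", etc. are out of grammar
REJECT_THRESHOLD = 0.7  # hypothetical value, set higher than in less critical states


def confirm_trade(text: str, confidence: float, threshold: float = REJECT_THRESHOLD) -> str:
    """Return "yes", "no", or "reject"; a doubtful answer is never treated as "yes"."""
    word = text.strip().lower()
    if word not in STRICT_GRAMMAR or confidence < threshold:
        return "reject"  # reprompt instead of executing the trade
    return word


def best_unrejected(nbest: list[str], skip_list: set[str]) -> Optional[str]:
    """Pick the best hypothesis the caller has not already rejected."""
    for hypothesis in nbest:
        if hypothesis not in skip_list:
            return hypothesis
    return None  # every hypothesis was already rejected: ask again
```

For example, if the caller rejects "two hundred shares," adding `"200"` to the skip list means a re-recognition whose N-best list is `["200", "300"]` yields `"300"`.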

We complete the designs of the Quotes and Trading subdialogs, producing call flow diagrams describing every dialog state.