14.3 User Testing


To prepare for the Wizard-of-Oz (WOZ) test, we start by establishing our goals. First, as described earlier, we want to test the main paths of the design to catch problems early. We also identify two high-risk dialog strategies we want to be sure to test: company name disambiguation and recovery from recognition errors when users specify trades. Additionally, we want feedback on the persona design.
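
Company name disambiguation is one of the strategies under test, so it helps to be concrete about what the wizard will simulate. A minimal sketch of the underlying idea, assuming a hand-built table that maps an ambiguous spoken name to its candidate listings (the names and data below are purely illustrative, not from the project):

    # Sketch of company-name disambiguation (hypothetical data): when a
    # spoken name matches more than one listing, ask the caller to choose.

    AMBIGUOUS_NAMES = {
        "american": ["American Airlines", "American Express", "American Funds"],
        "first national": ["First National Bancorp", "First National Holdings"],
    }

    def disambiguation_prompt(spoken_name: str) -> str:
        """Build the prompt played when a recognized name is ambiguous."""
        candidates = AMBIGUOUS_NAMES.get(spoken_name.lower(), [])
        if len(candidates) < 2:
            return ""  # unambiguous; no extra dialog turn needed
        options = ", ".join(candidates[:-1]) + ", or " + candidates[-1]
        return f"I found several matches. Did you mean {options}?"

    print(disambiguation_prompt("American"))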

We record a set of preliminary prompts covering the dialogs we have designed, as well as a set of company names that will be used in the test. Although audio production, which includes choosing a voice actor, recording, postprocessing prompt files, and creating nonverbal audio (NVA), is considered part of the development phase of the project, it often begins early.

Choosing the voice actor early gives us an opportunity for early feedback on the persona design. For the Lexington project, it is important to begin the recording process early to avoid delays, because we will ultimately need audio files covering approximately 15,000 names of companies, mutual funds, and indexes (spanning the NASDAQ, the New York Stock Exchange, and the American Stock Exchange). The recording of company names can begin early, before the final wording of other prompts is determined. For this project, we have already chosen the voice actor, and therefore we can create the WOZ recordings in his voice (the details of choosing the voice actor are covered in Chapter 18).
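
Recording 15,000 names calls for some tooling to keep the sessions organized. As a sketch of one plausible approach (the input format, ID scheme, and filenames are assumptions, not details from the project), a short script can turn the list of names into a recording manifest pairing each name with a stable prompt filename:

    # Sketch: build a recording manifest pairing each listing name with a
    # stable audio filename, so recording can be split across sessions and
    # the files tracked. Input format and naming scheme are assumed.
    import csv
    import re

    def slug(name: str) -> str:
        """Turn 'Cisco Systems, Inc.' into 'cisco_systems_inc'."""
        return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")

    def write_manifest(names: list[str], out_path: str) -> None:
        with open(out_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["prompt_id", "text_to_record", "audio_file"])
            for i, name in enumerate(sorted(set(names)), start=1):
                writer.writerow([f"co{i:05d}", name, f"names/{slug(name)}.wav"])

    write_manifest(["Cisco Systems, Inc.", "Apple Computer"], "manifest.csv")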

The designer will act as the wizard, using a tool that allows control of the call flow and rapid selection of prompts to play. We also design a number of task scenarios for participants to perform, including logging in, getting stock quotes, and making trades. We make sure to include some companies with ambiguous names in the test of the Quotes subdialog. We also intentionally introduce some "recognition errors" in order to test the confirmation and correction approaches in the Trading subdialog.
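
The wizard tool itself is not described in detail here; a minimal sketch of its core, assuming a keyboard-driven console in which each call-flow state exposes a handful of prompts the wizard can fire with a single keystroke (the state table is illustrative, and play() is a stub where a real rig would route audio into the phone call):

    # Sketch of a wizard console: each call-flow state offers a few prompts
    # the wizard can trigger quickly while listening to the caller.

    PROMPTS = {
        "main_menu": {"1": "main_menu.wav", "2": "main_menu_reprompt.wav"},
        "quotes":    {"1": "quote_which_company.wav", "2": "quote_ambiguous.wav"},
        "trading":   {"1": "trade_confirm.wav", "2": "trade_correction.wav"},
    }

    def play(audio_file: str) -> None:
        # Stub: a real WOZ rig would play this file into the live call.
        print(f"[playing {audio_file}]")

    def run_wizard(state: str = "main_menu") -> None:
        while True:
            choice = input(f"{state}> prompt key, state name, or q: ").strip()
            if choice == "q":
                break
            if choice in PROMPTS:            # jump to another call-flow state
                state = choice
            elif choice in PROMPTS[state]:   # fire a prompt in this state
                play(PROMPTS[state][choice])

    run_wizard()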

We formulate a questionnaire for each participant to complete after talking with the system. In this way, we can measure subjective preferences consistently across participants and get an average score that we can use as a baseline for future usability tests. Our survey includes all the relevant questions in the standard questionnaire Lexington has used to assess its touchtone system. Figure 14-3 shows some of the questions we included in the survey.

Figure 14-3. You can use a questionnaire like this one in early user testing.


Questions about ease of use, efficiency, and flexibility are included because these are important metrics we chose in the requirements phase. We omit questions about accuracy until a later usability test, when callers are interacting with the real speech system. In this test, given that we are purposely introducing recognition errors, measures of callers' perceptions of accuracy would be meaningless.

In addition to the questions shown in Figure 14-3, we include a few questions about the persona. Two are open-ended: "What did you like or dislike about the voice?" and "Please describe the qualities that the voice conveys, as if the voice were a person." A few others are Likert-scaled, to be rated from "strongly disagree" to "strongly agree": "The character behind the voice seemed knowledgeable," ". . . energetic," and so on.
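
Scoring the questionnaire is straightforward: average each Likert item across participants to get the per-question baseline. A minimal sketch, assuming a 1-to-7 scale (consistent with the "average score is 4, neutral" result reported below) and made-up response data:

    # Sketch: average Likert ratings per question across participants.
    # A 1-7 scale (4 = neutral) is assumed; the responses are made up.
    from statistics import mean

    responses = {
        "q1_easy_to_use":        [6, 5, 7, 6, 5],
        "q2_quick_efficient":    [4, 3, 5, 4, 4],
        "persona_knowledgeable": [6, 6, 5, 7, 6],
    }

    for question, scores in responses.items():
        print(f"{question}: mean = {mean(scores):.1f} (n = {len(scores)})")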

Next, we work with Lexington to get the names of customers who might be willing to participate in the test. The plan is to run the test over the phone; we will call the participants at home. We identify 12 participants.

We ask our usability engineer to run the study because she has not been working directly on the project and will remain an objective evaluator. We get everything scheduled. Before the first day of testing, we run some pilot tests with a few people internally to make sure the WOZ system is functioning properly and also to give our wizard some practice in responding to callers' utterances.

In the course of two days, we run all the test participants. We learn a number of things that help us improve some prompt wording and error messages. The participants complete most of the tasks they are assigned, and they find the system easy to use. They respond positively to the persona.

However, the subjective ratings on question 2 ("The system was quick and efficient") are lower than we had hoped; the average score is 4 (neutral). Upon review of the data, we conclude that most subjects do not take advantage of the mixed-initiative dialog strategy. When placing a trade, most subjects say things such as "Place a trade" at the main menu and then rely on the system to lead them through the steps.

As we review the dialog, it becomes clear that we have not adequately taught users about the advanced capabilities of the system. There are no instructions or examples that encourage them to place trades or get quotes directly from the main menu state. Similarly, users have not been shown that they can describe all the parameters of a trade in a single sentence.

Rather than add elaborate instructions to the beginning of the application, we decide to use additional just-in-time instructions to gradually teach callers about the advanced capabilities. For example, immediately after callers complete a trade step-by-step, we will play the message, "By the way, in the future you can save time by describing your trade in a single sentence. For example, you can say, 'Buy one hundred shares of Cisco at eighteen dollars' or 'Sell all of my shares of Apple at the market.'" A follow-up usability test will be run to verify that the just-in-time instruction approach has worked.
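
In dialog-logic terms, this amounts to tracking how the caller completed the trade and playing the hint only after step-by-step completions, at most once per call. A minimal sketch; the flag and function names are illustrative, not the project's actual implementation:

    # Sketch of just-in-time instruction: after a trade completed
    # step-by-step, play a one-time hint about the single-sentence form.

    JIT_TRADE_HINT = (
        "By the way, in the future you can save time by describing your "
        "trade in a single sentence. For example, you can say, 'Buy one "
        "hundred shares of Cisco at eighteen dollars.'"
    )

    def after_trade_prompt(call_state: dict, one_utterance: bool) -> str:
        """Return any follow-up prompt to play once a trade completes."""
        if one_utterance or call_state.get("jit_hint_played"):
            return ""  # caller already uses (or has heard about) the shortcut
        call_state["jit_hint_played"] = True
        return JIT_TRADE_HINT

    state = {}
    print(after_trade_prompt(state, one_utterance=False))  # plays the hint
    print(after_trade_prompt(state, one_utterance=False))  # stays silent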

The follow-up usability test will also use a WOZ approach, again conducted over the telephone. Before it begins, we complete the designs of the Portfolio and Account Information subdialogs. We create three tasks for each participant, all of which include trading. If participants complete their trades step-by-step, they will hear the just-in-time instruction. One task will also include quotes, the second portfolio information, and the third account information. We use the same questionnaire as in the first usability test.

With the help of Lexington, we line up 10 participants. In the course of two days, we run all the tests. We analyze the data, and the results are positive: the just-in-time instruction approach seems to be working. On the first task, most participants place their trades step-by-step and hear the just-in-time instruction. By the second task, seven of the ten use more complex and efficient approaches (e.g., "Buy fifty shares of Intel," "I wanna buy fifty shares of Intel at the market"), and by the third task, all of them do. The average score on question 2 (about efficiency) goes up from 4 to 6.2.
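
Tallying that adoption pattern from the session logs can be as simple as classifying each trade request as one-shot or step-by-step. A rough sketch using a keyword heuristic (both the heuristic and the utterances are illustrative, not the project's actual analysis):

    # Rough sketch: classify each trade request as one-shot (action plus
    # parameters in a single utterance) or step-by-step, then tally by task.
    import re

    def is_one_shot(utterance: str) -> bool:
        """Count as one-shot if a buy/sell verb co-occurs with 'share(s)'."""
        u = utterance.lower()
        return bool(re.search(r"\b(buy|sell)\b", u) and re.search(r"\bshares?\b", u))

    sessions = {
        "task 1": ["place a trade", "i want to make a trade", "buy fifty shares of intel"],
        "task 2": ["buy fifty shares of intel", "sell all of my shares of apple", "place a trade"],
    }

    for task, utterances in sessions.items():
        hits = sum(is_one_shot(u) for u in utterances)
        print(f"{task}: {hits}/{len(utterances)} one-shot trade requests")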

After the usability test, we make a few minor changes based on what we have learned, and we complete the design of the system. Our ultimate design is informed by the work we have done thus far in defining call flow, crafting prompts, and testing the design on end users.


