15.2 Testing


Individual components of the application (e.g., dialog code, integration code) are unit-tested (tested in isolation) after they are developed. The testing we discuss in this section involves the fully integrated system. There are a number of tests to perform before release of the system for pilots or deployment, including application testing, recognition testing, and evaluative usability testing. All these tests are performed with the complete working system.

15.2.1 Application Testing

The application must be tested to make sure it meets the dialog design precisely as specified, does not have critical bugs, and is adequately provisioned to meet the expected call volumes. The tests performed at this stage include the dialog traversal test, the system QA test, and the load test.

Dialog Traversal Test

The purpose of dialog traversal testing is to make sure that the system accurately implements the dialog specification in complete detail. You perform the test with the live system, over the telephone, exercising a test script that thoroughly traverses the dialog. The correct actions must be taken at each step, and the correct prompts must be played.

Every dialog state must be visited during the test. Within each dialog state, every universal and every error condition must be tested. For example, you should try an out-of-grammar utterance to test the behavior in response to a recognition reject. You should try silence to test no-speech timeouts. You should impose multiple successive errors within dialog states to ensure proper behavior. If your design associates global behaviors with error counts (for example, automatically transferring callers to an operator if they exceed a fixed number of errors in a single phone call), you must also exercise that behavior.
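The error-count escalation described above can be sketched as a small test harness. This is an illustrative sketch only, not a real IVR platform API; the class, method names, and the threshold of three errors are all assumptions for the example.

```python
# Sketch of exercising a global error-count escalation rule: after
# MAX_ERRORS failed recognitions in one call, the caller is transferred
# to an operator. All names here are hypothetical.

MAX_ERRORS = 3  # assumed per-call error threshold from the dialog spec

class CallSession:
    def __init__(self):
        self.error_count = 0
        self.transferred = False

    def handle_recognition(self, result):
        """result is None for a recognition reject or no-speech timeout."""
        if result is None:
            self.error_count += 1
            if self.error_count >= MAX_ERRORS:
                self.transferred = True  # global escalation behavior
        # otherwise: normal dialog transition (omitted in this sketch)

def traversal_test_error_escalation():
    session = CallSession()
    # Impose successive errors, as the traversal test script requires
    for _ in range(MAX_ERRORS):
        assert not session.transferred  # must not escalate early
        session.handle_recognition(None)
    return session.transferred

print(traversal_test_error_escalation())  # prints True
```

The traversal script would drive the live system over the telephone rather than a stub like this, but the pass/fail logic is the same: the escalation must fire exactly at the specified error count, not before.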

Dialog states that have multiple entry prompts should be tested from each possible entry point. For example, if a state has an alternative prompt if callers enter it after a disconfirmation, the test script should include a path covering the disconfirmation. You should test every transition out of a dialog state to make sure that the system transitions to the correct next dialog state. While traversing the dialog, in addition to noting the correctness of system behavior, the tester should note any prompt (or other audio) recordings that are distorted or have other audio problems.
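One way to confirm that the test script exercises every transition is to compare the transitions in the dialog specification against the steps the script covers. The state and transition names below are invented for illustration; a real project would extract both sets from its own specification and script documents.

```python
# Hypothetical coverage check: every (state, transition) pair in the
# dialog specification must appear somewhere in the traversal script.
spec_transitions = {
    ("GetAccount", "success"): "GetPIN",
    ("GetAccount", "reject"): "GetAccountRetry",
    ("GetPIN", "success"): "MainMenu",
    ("GetPIN", "disconfirm"): "GetPIN",  # re-entry after a disconfirmation
}

script_steps = [
    ("GetAccount", "success"),
    ("GetAccount", "reject"),
    ("GetPIN", "success"),
]

uncovered = set(spec_transitions) - set(script_steps)
print(sorted(uncovered))  # prints [('GetPIN', 'disconfirm')]
```

Here the check reveals that the script never enters GetPIN after a disconfirmation, so the alternative entry prompt for that path would go untested.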

System QA Test

The system QA test is similar to other integration tests of large software systems. A test suite is executed that exercises all integrations, all conditions, and all failure modes.

Load Test

The purpose of a load test is to make sure that the system can efficiently handle the peak loads expected during the busiest hour of usage. The load test is typically run after the software is installed on the final system in the call center or at the service provider. You can run load tests by having numerous people call in to the system simultaneously. More typically, you use a software system that simulates the load by placing multiple calls to the system.
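The structure of a software load driver can be sketched as follows. This is a rough illustration, not a telephony tool: a real driver places actual calls into the system under test, whereas this stand-in merely runs scripted "calls" concurrently and counts completions. All names and the concurrency figure are assumptions.

```python
# Rough sketch of a load driver: launch the expected busy-hour number of
# simultaneous scripted calls and verify they all complete.
import time
from concurrent.futures import ThreadPoolExecutor

def simulated_call(call_id):
    """Stand-in for one scripted call; a real driver would dial the system."""
    time.sleep(0.01)  # pretend the call occupies the system briefly
    return (call_id, "completed")

PEAK_CONCURRENT_CALLS = 50  # assumed busy-hour concurrency target

with ThreadPoolExecutor(max_workers=PEAK_CONCURRENT_CALLS) as pool:
    results = list(pool.map(simulated_call, range(PEAK_CONCURRENT_CALLS)))

completed = sum(1 for _, status in results if status == "completed")
print(f"{completed}/{PEAK_CONCURRENT_CALLS} calls completed")
```

In a real load test you would also measure response latencies and failure rates at peak concurrency, not just completion counts.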

15.2.2 Recognition Testing

In general, reliable testing of recognition accuracy and tuning of recognition parameters can be accomplished only with a substantial amount of data collected from an in-service system. The purpose of the prepilot recognition test is to make sure that recognition accuracy is in the ballpark (roughly what you would expect given the complexity of the recognition problem in each dialog state) and that the initial values of the recognition parameters are reasonable.

You will fine-tune recognition parameters and accuracy during pilots and early deployment. However, the recognition test assures you that you will not expose pilot callers to unnecessarily poor performance. It is also important not to waste the early pilot period fixing bugs and adjusting parameters that should have been set properly at installation time. The recognition test also ensures that the results of evaluative usability testing (see the next section) will not be tainted by poor recognition.

The recognition test is based on a relatively small number of callers (10 to 20). To make sure that the grammars are thoroughly exercised, you give each caller a script.
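Because each caller follows a script, you know what was said in every utterance and can score recognition accuracy per dialog state directly. The sketch below assumes a hypothetical log format (state, scripted utterance, recognizer output); the state names and values are invented for illustration.

```python
# Hypothetical scoring sketch for a scripted recognition test: each
# logged utterance records the dialog state, what the caller was
# scripted to say, and what the recognizer returned.
from collections import defaultdict

utterance_log = [
    ("GetAccount", "1234", "1234"),
    ("GetAccount", "5678", "5678"),
    ("GetDate", "march third", "march third"),
    ("GetDate", "may first", "may fourth"),  # a misrecognition
]

totals, correct = defaultdict(int), defaultdict(int)
for state, said, recognized in utterance_log:
    totals[state] += 1
    correct[state] += (said == recognized)

for state in totals:
    print(state, f"{correct[state] / totals[state]:.0%}")
# prints:
# GetAccount 100%
# GetDate 50%
```

Per-state accuracy is the useful breakdown here, since it flags dialog states whose grammars or parameters are well below what the complexity of the recognition task would predict.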

As an example of the value of the prepilot recognition test, occasionally issues with endpointing parameters will surface. One of the endpointing parameters controls how long, after speech has been detected, the recognizer will listen to a silence before deciding that caller speech has ended. From time to time, that parameter will be too short for certain dialog states. In particular, certain types of caller speech may have natural pause points that lead to longer than usual pauses in the middle of an utterance. For example, North American phone numbers are usually spoken with a pause after the three-digit area code and another pause after the three-digit exchange.
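The effect of that end-of-speech timeout can be illustrated with a toy model. This is not a real recognizer API; the event representation, function, and durations below are invented solely to show why a too-short timeout truncates a phone number at its natural pauses.

```python
# Illustrative endpointing sketch: after speech onset, the recognizer
# stops listening once a silence exceeds the end-of-speech timeout.
def endpoint(events, timeout):
    """events: ("speech", dur) / ("silence", dur) pairs after speech onset.
    Returns how many speech segments are heard before endpointing fires."""
    heard = 0
    for kind, dur in events:
        if kind == "silence" and dur > timeout:
            break  # recognizer decides the caller has finished
        if kind == "speech":
            heard += 1
    return heard

# A phone number spoken as "415 <pause> 555 <pause> 1212"
phone_number = [("speech", 1.0), ("silence", 0.9),
                ("speech", 1.0), ("silence", 0.9),
                ("speech", 1.0)]

print(endpoint(phone_number, timeout=0.75))  # prints 1: cut off after the area code
print(endpoint(phone_number, timeout=1.5))   # prints 3: all three digit groups heard
```

With the shorter timeout, the recognizer endpoints at the caller's first natural pause and hears only the area code; the prepilot recognition test is where this kind of per-state parameter problem tends to surface.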

15.2.3 Evaluative Usability Testing

Evaluative usability testing is performed before the system is released for pilot testing. The general approach is the same as outlined for the iterative usability testing in Chapter 8 (in the detailed design phase) except that evaluative usability testing is run with the complete working system.

This approach allows you to detect problems with the pacing of the system. For example, you can find out whether there are delays or latencies that frustrate or confuse callers. At this time you will evaluate the appropriate success criteria (those that can be measured on usability data) that were specified during requirements definition. This will help to determine whether the system is ready for pilot.
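Checking a measurable success criterion against usability-test data can be as simple as the sketch below. The criterion, target value, and outcome data are all hypothetical; a real project would take the target from its requirements document.

```python
# Hypothetical check of one success criterion (task completion rate)
# against evaluative usability-test outcomes.
TASK_COMPLETION_TARGET = 0.85  # assumed criterion from requirements definition

# One boolean per usability-test task attempt: did the caller complete it?
task_outcomes = [True, True, True, False, True, True, True, True, False, True]

rate = sum(task_outcomes) / len(task_outcomes)
print(f"completion rate {rate:.0%}, meets target: {rate >= TASK_COMPLETION_TARGET}")
# prints: completion rate 80%, meets target: False
```

A miss on a criterion like this does not by itself block the pilot, but it tells you where to look before releasing the system.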



Voice User Interface Design 2004