How to Analyze It


Although some things are going to be obvious, a formal analysis is necessary to get to underlying causes and to extract the most value from the interviews. Analyzing the output is a three-stage process: collecting observations, organizing observations, and extracting trends from the observations.

Note

The moderator and analyst are referred to as separate people here, but in practice the two roles are often performed by the same person.

Collecting Observations

There are three sets of observations to be collected: the moderator's, the observers', and the analyst's.

Collecting the moderator's and observers' notes is pretty straightforward. Get their notes (or copies), and have them walk you through them, explaining what each one means. In addition, interview them for additional observations that were not in their notes. These are frequently large-scale perspectives on the situation that the person formed in the days or hours following the last test.

The analyst's notes are the most important and time-consuming part of the data collection process. The analyst should go through at least four of the videotapes and note all situations where there were mistakes or confusion or where the evaluators expressed an opinion about the product or its features. He or she should note which features the evaluators had problems with and under what circumstances, and should provide a detailed description of each problem. The majority of the usability problems in the product will likely be found during this phase, as patterns in people's behavior and expectations emerge.

Quantitative information, although not generalizable to the target market at large, is often useful when summarizing and comparing behavior (though it's fraught with potential problems, since people reading reports can latch on to largely meaningless numbers as some kind of absolute truth). To collect quantitative information, first create a measurement range for each question that everyone on the analysis team agrees upon. Don't use a stopwatch to take exact numbers: the statistical error inherent in the small sample of people in a usability test swamps the accuracy of a stopwatch. The most useful metrics are the most general ones. Flow Interactive, Limited (www.flow-interactive.com), a U.K. user experience design and evaluation consulting company, uses the following range to measure how long people take to perform a task:

  • 0—Fail

  • 1—Succeed very slowly in a roundabout way

  • 2—Succeed a little slowly

  • 3—Succeed quickly

Most of the time, this is all the precision you need since an order-of-magnitude measure is all that's necessary to be able to make critical comparisons. Each scale should have three or five steps (don't use two, four, or six since it's hard to find a middle value; don't use more than five because it tends to get confusing) and a separate value for failure.
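If it helps keep scoring consistent across everyone reviewing tapes, the agreed-upon range can be written down once in a form all reviewers share. The following Python sketch is purely illustrative (the names are mine, not part of any formal method); it simply records the completion scale above and rejects values that fall off it.

    # Illustrative sketch: the agreed-upon 0-3 completion scale as a shared lookup,
    # so every reviewer scores tapes against the same labels.
    COMPLETION_SCALE = {
        0: "Fail",
        1: "Succeed very slowly in a roundabout way",
        2: "Succeed a little slowly",
        3: "Succeed quickly",
    }

    def completion_label(score):
        """Return the agreed-upon label for a score, rejecting values off the scale."""
        if score not in COMPLETION_SCALE:
            raise ValueError("%r is not on the agreed 0-3 scale" % score)
        return COMPLETION_SCALE[score]

    print(completion_label(1))  # Succeed very slowly in a roundabout way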

Make a grid for each participant consisting of the task metrics you're going to collect. As the videotapes are being watched, note the severity in each cell (when appropriate, define severity using the same language and scale that is used by the development team to define how serious code bugs are). For the fork tasks, the following table would reflect one person's performance.

start sidebar
MARLON'S TASK PERFORMANCE

User: Marlon

Task                  Time to Read   Errors   Time to Complete
Find Louis XIV              1           3             1
Buy replacement             3           1             2
Find similar forks          1           2             0

Key

Time to Read: 0—Don't read; 1—Read very slowly; 2—Read moderately slowly; 3—Read quickly
Errors: 0—Fail because of errors; 1—Many errors; 2—Some errors; 3—Few or no errors
Time to Complete: 0—Fail; 1—Succeed very slowly in a roundabout way; 2—Succeed a little slowly; 3—Succeed quickly

end sidebar
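If you prefer to keep the grids in machine-readable form so they can be tallied later, one minimal way to do it (a sketch only; the task and metric names simply echo the fork example) is a small per-participant record:

    # Minimal sketch: one participant's grid, keyed by task, holding the three
    # metric scores from the table above. All names are illustrative.
    marlon = {
        "Find Louis XIV":     {"time_to_read": 1, "errors": 3, "time_to_complete": 1},
        "Buy replacement":    {"time_to_read": 3, "errors": 1, "time_to_complete": 2},
        "Find similar forks": {"time_to_read": 1, "errors": 2, "time_to_complete": 0},
    }

    print(marlon["Buy replacement"]["time_to_complete"])  # 2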

Then, when compiling the final analysis, create a table for each metric that summarizes the whole user group's experience. For the completion time metric, the table could look as follows.

start sidebar
TASK PERFORMANCE TIME MEASURES

Task                 Marlon   Eva   Marc   Barb   Jon   Avg.
Find Louis XIV          1      2      1      0     2    1.2
Buy replacement         2      3      2      1     1    1.8
Find similar forks      0      0      1      1     0    0.4

end sidebar

The average numbers, although not meaningful in an absolute context, provide a way to compare tasks to each other and between designs.
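Assuming each participant's scores were captured in a grid like the earlier sketch, the per-task averages in the summary table come from a few lines of arithmetic. The data below simply repeats the completion scores from the table; everything else is illustrative.

    # Illustrative sketch: average each task's completion score across participants,
    # reproducing the Avg. column of the summary table above.
    completion_scores = {
        "Marlon": {"Find Louis XIV": 1, "Buy replacement": 2, "Find similar forks": 0},
        "Eva":    {"Find Louis XIV": 2, "Buy replacement": 3, "Find similar forks": 0},
        "Marc":   {"Find Louis XIV": 1, "Buy replacement": 2, "Find similar forks": 1},
        "Barb":   {"Find Louis XIV": 0, "Buy replacement": 1, "Find similar forks": 1},
        "Jon":    {"Find Louis XIV": 2, "Buy replacement": 1, "Find similar forks": 0},
    }

    for task in ["Find Louis XIV", "Buy replacement", "Find similar forks"]:
        scores = [by_task[task] for by_task in completion_scores.values()]
        print("%s: %.1f" % (task, sum(scores) / len(scores)))
    # Find Louis XIV: 1.2
    # Buy replacement: 1.8
    # Find similar forks: 0.4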

Note down feature requests and verbatim quotations from the evaluators, especially ones that encapsulate a particular behavior ("I don't understand what 'Forkopolis' means, so I wouldn't click there," for example). Feature requests are often attempts to articulate a problem that the evaluator can't express in any other way. However, they can also be innovative solutions to those same problems, so they should be captured regardless.

start sidebar
2x Video Decks Are Cool

To make the video review process go faster, I recommend using a video deck (or a digital video player) that can play back video and audio at 1.5 or 2 times natural speed. The speech is still understandable (although silly, since people sound like chipmunks unless the voice is pitch-shifted down, as is done on Sony's professional-grade video hardware), and it's possible to make your way through a tape much faster.

end sidebar

If time and budget allow, a transcription of the whole session is helpful, but it should be used only as an aid in observing the tapes because it misses the vocal inflection and behavior that can really clarify some situations. For example, a confused pause of five seconds while an evaluator passes his pointer over every single visual element on the screen looking for somewhere to click is insufficiently conveyed by his statement of "Aha! There it is."

Organizing Observations

First, read through all the notes once to get a feeling for the material. Look for repetition and things that may be caused by common underlying problems.

Note

Much as with analyzing contextual inquiry information or focus group observations, organizing usability testing information and extracting trends can be done in a group with the development team (and other stakeholders, as appropriate). This allows the group to use its collected knowledge to flesh out the understanding of the problem and to begin working on solutions.

Then put all the observations into a pile (literally, or in a single large document). Opening a separate document in a word processor, go through each observation and group it with other similar observations in the new document. Grouping can be based on superficial similarity ("Term not understood"), feature cluster ("Shopping cart problems"), or underlying cause ("Confusing information architecture"). Where possible, group observations by the most broadly sweeping underlying causes. Pull quotations out and group them with the causes they best illustrate.

Extracting Trends

Having grouped all the observations, go through the groups and consolidate them, splitting apart any groups that mix unrelated topics. Throw away those that only have one or two individual observations. For each group, try to characterize the problem in a single short sentence, with a couple of sentences to fully describe the phenomenon. Explain the underlying cause as much as possible, separating the explanation of the phenomenon from your hypothesis of its cause. Concentrate on describing the problem, its immediate impact on the user experience, and the place where the problem occurred. Be very careful when suggesting solutions. Ultimately, the development team knows more about the technology and the assumptions that went into the product, and the responsibility for isolating underlying causes and finding solutions is theirs. Your recommendations should serve as a guide to where solutions could be found, not edicts about what must be done.
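When the observations are collected electronically, the bookkeeping in these two steps (grouping by underlying cause, then discarding causes supported by only one or two observations) can be sketched in a few lines. The cause labels and observations below are hypothetical examples, not output from any tool.

    from collections import defaultdict

    # Hypothetical observations, each tagged with a suspected underlying cause.
    observations = [
        ("Confusing information architecture", "Marlon couldn't find the search page"),
        ("Confusing information architecture", "Eva looked for forks under 'Cutlery'"),
        ("Confusing information architecture", "Jon gave up browsing and used search"),
        ("Term not understood", "Barb didn't know what 'Forkopolis' meant"),
    ]

    groups = defaultdict(list)
    for cause, note in observations:
        groups[cause].append(note)

    # Keep only causes supported by three or more individual observations.
    trends = {cause: notes for cause, notes in groups.items() if len(notes) >= 3}
    for cause, notes in trends.items():
        print("%s (%d observations)" % (cause, len(notes)))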

Warning

It's tempting to turn user severity measures directly into development priorities for the project. This is generally inappropriate. What's most important to a user's success with the product is not necessarily what's most important to the product's success. The product team should be informed of problem severity from the user perspective and then use that to determine project priorities, but the two aren't the same.

Describe the severity of the problem from the user's perspective, but don't give observations numerical severity grades. If a shorthand for the characterization of observations is desired or requested, categorize the observations in terms of the effects they have on the user experience, rather than giving them an arbitrary severity. Such a scale could be "Prevents an activity," "Causes confusion," "Does not match expectations," "Seen as unnecessary."
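As a hypothetical illustration of what that categorical shorthand might look like in an analysis spreadsheet or script (the category names echo the scale just described; nothing here is prescribed), each observation simply carries an effect label rather than a number:

    # Hypothetical effect-on-experience categories, echoing the scale suggested above.
    EFFECTS = (
        "Prevents an activity",
        "Causes confusion",
        "Does not match expectations",
        "Seen as unnecessary",
    )

    # An observation is tagged with a category, not a numeric severity grade.
    observation = ("'Forkopolis' label not understood", "Causes confusion")
    assert observation[1] in EFFECTS
    print("%s -> %s" % observation)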

Once all this is done, you should have a list of observations, hypotheses for what caused the phenomena, and quotations that reinforce and summarize the observations. You're ready to present your results to the team! Effective presentations are covered in Chapter 17.



