4-2 GOMS Keystroke-Level Model


The Humane Interface: New Directions for Designing Interactive Systems
By Jef Raskin
Chapter Four.  Quantification

The aim of exact science is to reduce the problems of nature to the determination of quantities by operations with numbers.

- James Clerk Maxwell, On Faraday's Lines of Force (1856)

I will introduce only the simplest yet nonetheless valuable aspect of the GOMS method: the keystroke-level model. We designers who know GOMS rarely use a detailed and formal analysis of an interface design, but that is due, in part, to our having absorbed the fundamentals of GOMS and of other quantitative methods such that our designs inherently incorporate GOMS teachings. We do bring formal analysis into play when choosing between two approaches to interface design in which small differences in speed can have significant economic or psychological effects. We can sometimes benefit from the impressive accuracy of the more complete GOMS models, such as critical-path method GOMS (CPM-GOMS) or a version called natural GOMS language (NGOMSL), which takes into account nonexpert behavior, such as learning times. We can, for example, predict how long it will take a user to execute a particular set of interface actions to within an absolute error of less than 5 percent. In these advanced models, almost all predictions fall within 1 standard deviation of the measured times (Gray, John, and Atwood 1993, p. 278). In a field in which religious wars are waged over interface designs and in which gurus often have widely varying opinions, it is advantageous to have in your armamentarium quantitative, experimentally validated, and theoretically sound techniques. For a good overview and bibliography of the various GOMS models, including her own CPM-GOMS model, see John 1995.

4-2-1 Interface Timings

Numerical precision is the very soul of science.

- D'Arcy Wentworth Thompson, On Growth and Form (1917)

When they developed the GOMS model, its inventors observed that the time it takes the user-computer system to perform a task is the sum of the times it takes for the system to perform the serial elementary gestures that the task comprises. Although different users might have widely varying times, the researchers found that for many comparative analyses of tasks involving use of a keyboard and a graphical input device, you could use a set of typical times rather than measuring the times of individuals. By means of careful laboratory experiments, they developed a set of timings for different gestures. In giving the timings, I follow the original nomenclature, in which each of the times is designated by a one-letter mnemonic (Card, Moran, and Newell 1983):

K = 0.2 sec

Keying: The time it takes to tap a key on the keyboard

P = 1.1 sec

Pointing: The time it takes a user to point to a position on a display

H = 0.4 sec

Homing: The time it takes a user's hand to move from the keyboard to the GID or from the GID to the keyboard

M = 1.35 sec

Mentally preparing: The time it takes a user to prepare mentally for the next step

R

Responding: The time a user must wait for a computer to respond to input

In practice, these numbers vary widely; K can be 0.08 sec for a 135-wpm highly skilled typist, 0.2 sec for a more typical 55-wpm skilled typist, 0.28 sec for a 40-wpm average unskilled typist, or 1.2 sec for a beginning typist. Typing speed is not independent of what is being typed: It takes most people about 0.5 sec to type a random letter, given a set of randomly chosen letters to type. Typing messy codes (for example, e-mail addresses) takes most people about 0.75 sec per character. The value K includes the time it takes the user to make corrections that he has caught immediately. Shift is counted as a separate keystroke.

The wide variability of each measure explains why we cannot use this simplified model to obtain absolute timings with any degree of certainty; by using the typical values, however, we usually obtain the correct ranking of the performance times of two interface designs. If you are evaluating complex interfaces that include overlapping time dependencies or if you must generate accurate absolute times, you should use the more complete models, which are not discussed in this book, such as CPM-GOMS.
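As a rough illustration of how these typical values are used, the Python sketch below sums operator times for a method written as a space-separated string of mnemonics. The names OPERATOR_SECONDS, K_BY_SKILL, and method_seconds are assumptions made for this example, not part of the GOMS literature; R is left out because no typical value is given for it.

```python
# Typical operator times in seconds, as listed in the text
# (Card, Moran, and Newell 1983).
OPERATOR_SECONDS = {"K": 0.2, "P": 1.1, "H": 0.4, "M": 1.35}

# Alternative values of K quoted in the text for different typists.
K_BY_SKILL = {
    "135-wpm highly skilled typist": 0.08,
    "55-wpm skilled typist": 0.2,
    "40-wpm average unskilled typist": 0.28,
    "beginning typist": 1.2,
}


def method_seconds(sequence, k_seconds=None):
    """Sum the operator times for a method such as "H K"."""
    times = dict(OPERATOR_SECONDS)
    if k_seconds is not None:
        times["K"] = k_seconds  # substitute a typist-specific keying time
    return sum(times[op] for op in sequence.split())


if __name__ == "__main__":
    # "H K": move the hand to the keyboard, then tap one key.
    for typist, k in K_BY_SKILL.items():
        print(f"{typist}: {method_seconds('H K', k):.2f} sec")
```

Swapping in a different K changes the absolute total, which is why the typical values are best used for comparing designs rather than for predicting exact times.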

Double Dysclicksia

Double clicking, that is, tapping the GID button twice within a small time window and without any significant cursor movement between the taps, suffers from several problems as an interface technique. You cannot always predict what objects on the display will or will not respond to a double click, and it is not always clear what will happen if there is a response. There is no indication on displayable items that double clicking is supposed to produce a response: The functionality is invisible. The way that double clicking is used in many current interfaces, the user must remember not only which items are double clickable but also how different classes of interface features respond to this action.

The first two burdens on the user could be at least partially alleviated by new screen conventions. The act of double clicking is, however, itself problematical. Double clicking requires operating a mouse button twice at the same location or at two locations in very close proximity and, in most cases, within a short time, typically 500 msec. If the user clicks too slowly, the machine responds to two single clicks rather than to one double click. If the user jiggles the mouse excessively between clicks, the same error occurs. If the user taps the GID button twice in too short a time period, as when trying to select text within a word while working within certain word processors, the machine considers the two taps as a double click and selects the whole word.
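The timing and proximity test described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the logic of any particular system: the 0.5-second window comes from the text, whereas the 4-unit movement tolerance and the names Click and classify_pair are assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Click:
    t: float  # time of the button press, in seconds
    x: float  # pointer position at the press
    y: float


def classify_pair(first, second, max_interval=0.5, max_travel=4.0):
    """Decide whether two presses count as one double click.

    max_interval: time window in seconds (the text cites about 500 msec).
    max_travel: allowed pointer movement between presses (assumed value).
    """
    interval = second.t - first.t
    travel = math.hypot(second.x - first.x, second.y - first.y)
    if interval <= max_interval and travel <= max_travel:
        return "double click"
    return "two single clicks"


if __name__ == "__main__":
    print(classify_pair(Click(0.0, 10, 10), Click(0.3, 11, 10)))  # quick and steady
    print(classify_pair(Click(0.0, 10, 10), Click(0.7, 10, 10)))  # too slow
    print(classify_pair(Click(0.0, 10, 10), Click(0.3, 30, 10)))  # mouse jiggled
```

Both thresholds are invisible to the user, which is part of the problem: the same two taps can mean different things depending on a few milliseconds or a slight movement of the hand.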

A problem arises when the user is trying to select a graphical item that can be repositioned with the GID. Because the GID is likely to move when the user is pressing the GID buttons quickly, graphical applications, instead of reading a double click, may read a drag-and-drop and change the item's position. Similarly, a user who double clicks to change the text in a text box may first have to reposition the accidentally moved box and only then make the text edit originally intended.

Some of us are unaffected by dysclicksia: These lucky people never miss with the mouse; they single and double click with insouciance and panache, do not suffer from side effects of clicking, always remember what will and what will not respond to double clicking, and can shoot a flying bird with a .357-caliber revolver while driving along a twisty mountain road. But we can't assume that all users are so lucky. We must design for the dysclicksic user and remain aware of the problems inherent in using double clicks in an interface.[1]

[1] The term dysclicksia, a disease for which the only permanent cure is good design, was coined by Pam Martin (personal communication 1997).

The duration of the machine response time, R, can have an unexpected effect on user actions. If a user operates a control and nothing appears on the display for more than approximately 250 msec, she is likely to become uneasy, to try again, or to begin to wonder whether the system is failing.

We cannot build products that can complete any operation within human reaction time, but our interfaces can always, within that time, give feedback that the input has been received and recognized. Otherwise, user actions (often flailing at the keyboard, trying to get a response during a delay) can start the system off on unintended activities, causing further delay or damaging the user's content. For example, if you try to download a file while accessing America Online from a browser, such as Netscape's, there is often a long delay. No feedback lets you know that progress is being made; a small, static message far from the locus of attention says only that the computer is awaiting reply. After a few seconds, inexperienced users start clicking at buttons on the display, which stops the download, again without feedback.

It is important that interfaces provide feedback if delays are unavoidable; display a progress bar (Figure 4.1) that accurately reflects the time remaining. If you cannot predict how much time an operation will take, say so! Do not lie to or misinform users.

Figure 4.1. A progress bar. It is important that it represent time linearly. A textual statement of time remaining, if accurate, is also a humane feature when delays are unavoidable.
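A minimal text-mode sketch of such feedback follows, assuming the program can supply the elapsed time and, when one exists, an honest estimate of the total time. The function name progress_line and the bar layout are assumptions made for this example. The bar fills in proportion to elapsed time, and when no estimate is available the message says so rather than inventing one.

```python
def progress_line(elapsed, estimated_total=None, width=20):
    """Return one line of progress feedback.

    elapsed: seconds since the operation began.
    estimated_total: predicted total seconds, or None if it cannot be predicted.
    width: number of character cells in the bar.
    """
    if estimated_total is None or estimated_total <= 0:
        # Admit that the duration is unknown instead of misleading the user.
        return f"Working... {elapsed:.0f} s elapsed (time remaining unknown)"
    fraction = min(elapsed / estimated_total, 1.0)  # linear in time
    filled = round(fraction * width)
    remaining = max(estimated_total - elapsed, 0.0)
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] about {remaining:.0f} s remaining"


if __name__ == "__main__":
    print(progress_line(5, 10, width=10))
    print(progress_line(3))
```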


4-2-2 GOMS Calculations

We begin the calculation of the time it takes to perform a method, such as "move your hand from the graphical input device to the keyboard and type a letter," by listing the operations from the GOMS list of gestures (see Section 4-2-1) used in this method, in this case H K. Listing the gestures (K, P, and H) is the easy part of creating an instance of a GOMS model. The more difficult part of developing an instance of a keystroke-level GOMS model is figuring out at what points the user will stop to perform an unconscious mental operation: the mental preparation (M) times. The basic rules (following the methods of Card, Moran, and Newell 1983, p. 265) for deciding where mental operations occur in a method are presented in Table 4.1. In Section 4-2-3, we look at how these rules are applied in practice.

Table 4.1. Heuristics for Placing Mental Operators

Rule 0 Initial insertion of candidate Ms

Insert Ms in front of all Ks (keystrokes). Place Ms in front of all Ps (acts of pointing with the GID) that select commands, but do not place Ms in front of any Ps that point to arguments of those commands.

Rule 1 Deletion of anticipated Ms

If an operator following an M is fully anticipated in an operator just previous to that M, then delete that M. For example, if you move the GID with the intent of tapping the GID button when you reach the target of your GID move, then you delete, by this rule, the M you inserted as a consequence of rule 0. In this case, P M K becomes P K.

Rule 2 Deletion of Ms within cognitive units

If a string of M Ks belongs to a cognitive unit, then delete all the Ms but the first. A cognitive unit is a contiguous sequence of typed characters that form a command name or that is required as an argument to a command. For example, Y, move, Helen of Troy, or 4564.23 can be examples of cognitive units.

Rule 3 Deletion of Ms before consecutive terminators

If a K is a redundant delimiter at the end of a cognitive unit, such as the delimiter of a command immediately following the delimiter of its argument, then delete the M in front of it.

Rule 4 Deletion of Ms that are terminators of commands

If a K is a delimiter that follows a constant string (for example, a command name or any typed entity that is the same every time that you use it), then delete the M in front of it. (Adding the delimiter will have become habitual, and thus the delimiter will have become part of the string and not require a separate M.) But if the K is a delimiter for an argument string or any string that can vary, then keep the M in front of it.

Rule 5 Deletion of overlapped Ms

Do not count any portion of an M that overlaps an R (a delay, with the user waiting for a response from the computer).

In these rules, a string is a sequence of characters. A delimiter is a character that marks the beginning or the end of a meaningful string of text, such as a natural-language word or a telephone number. For example, spaces are the delimiters for most words; a period is the most common delimiter at the end of a sentence; parentheses delimit parenthetical remarks; and so on. The operators are K, P, and H. When a command needs information, such as when you use the command that sets the time for an alarm to go off and have to supply the time, the information you supply is an argument for that command.

4-2-3 GOMS Calculation Examples

An interface design usually begins with a task or a set of tasks that need to be accomplished. A statement of the task and the means available for implementing a solution are often formulated as a requirement or specification. In this example, the user is personified as Hal, a laboratory assistant.


Hal works at a computer, typing reports; he is occasionally interrupted by one or another of the researchers in the room, and is asked to convert a temperature reading from degrees Fahrenheit (F) or Celsius (C) to degrees C or F, respectively. For example, Hal might be asked, "Please convert 302.25 degrees from Fahrenheit to Celsius." Hal must use the keyboard or GID to enter the temperature provided; voice or other input means are not available. Conversions from C to F and from F to C are approximately equally likely to be required. About 25 percent of the temperatures called out are negative, although the digits are unpredictable and equally distributed, and only 10 percent of the temperatures have integer values, such as 37 degrees. The numerical result must appear on the display; no other output means are available. Hal reads to the researcher the converted value from the screen. The input and the output must allow for at least ten digits on each side of the decimal point.

In designing an interface for a system that allows Hal to do his job, your goal is to minimize the time it takes Hal to do the conversion. Speed and accuracy must be maximized; screen real estate is not limited. The window, or area of the display in which the temperature conversion takes place, is already active and waiting for Hal's input via GID or keyboard. The way Hal interacts with the interface to return to his typing on the computer is not your concern; your job is finished as soon as the result is displayed.

In estimating the time it takes Hal to use the interface, assume an average of four typed characters in an entered temperature, including any decimal point and sign. Also assume (unrealistically, but for simplicity's sake) that Hal's typing is perfect; error detection and notification are not needed.

Now, I would like you to stop reading so that you can design an interface for this simple example. It will not take long to write down your proposed solution, along with sketches of the display that Hal will see; do not just think about this problem but rather write about it as well. (You will be tempted to read on without honoring my request. Please reconsider. The next few sections will make much more interesting reading if you have already tried to solve the problem yourself.) After designing your interface, read the two GOMS analyses that follow. Then you will be ready to analyze your own interface.

4-2-3-1 Hal's Interface: Solution 1, Dialog Box

The instructions in Figure 4.2 are reasonably clear; from them we can write down the method that Hal must use in terms of the gestures of the GOMS model. The GOMS representation is shown growing incrementally as each new gesture is added to the method.

  • Move hand to the graphical input device:

    H

  • Point to the desired radio button:

    H P

  • Click on the radio button:

    H P K

Figure 4.2. A dialog box solution with radio buttons.


Half of the time, the interface will already have the correct conversion chosen, and Hal will not need to click on the radio button. We consider first the case in which it is not the one already chosen.

  • Move hands back to the keyboard:

    H P K H

  • Type the four characters:

    H P K H K K K K

  • Tap Enter:

    H P K H K K K K K

The keystroke for the tap of the Enter key completes the method portion of the analysis. Using rule 0, we add Ms in front of all of the Ks and Ps except those Ps that point to arguments, of which there are none in this example:

    H M P M K H M K M K M K M K M K

Rule 1 tells us to change P M K to P K and to eliminate any other fully anticipated Ms, of which there are none in this example. Rule 2 eliminates Ms in the middle of strings, such as in the string that represents the temperature. Applying these two rules leaves

    H M P K H M K K K K M K

The M before the final K is required by rule 4. Rules 3 and 5 do not apply in this example.
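The mechanical part of this derivation can be sketched in Python. The sketch covers only rules 0 through 2, and it depends on judgments that the analyst must still supply: which keystrokes form a cognitive unit and which Ps point to arguments. Rule 1 is approximated here as "a K that immediately follows a P is the anticipated button tap," which is an assumption that fits this example but not every method. The function and parameter names are hypothetical.

```python
def place_mental_operators(ops, cognitive_units=(), argument_points=()):
    """Insert M operators into a list of K, P, and H gestures.

    ops: list of operator letters, for example ["H", "P", "K"].
    cognitive_units: (start, end) index pairs into ops, end exclusive,
        marking runs of Ks typed as one unit (supplied by the analyst).
    argument_points: indexes of Ps that point to arguments, not commands.
    """
    # Positions inside a cognitive unit other than its first keystroke.
    unit_interior = set()
    for start, end in cognitive_units:
        unit_interior.update(range(start + 1, end))

    result = []
    for i, op in enumerate(ops):
        # Rule 0: candidate M before every K and every command-selecting P.
        needs_m = op == "K" or (op == "P" and i not in argument_points)
        # Rule 1 (simplified): the tap right after pointing is anticipated.
        if needs_m and op == "K" and i > 0 and ops[i - 1] == "P":
            needs_m = False
        # Rule 2: keep only the first M of a cognitive unit.
        if needs_m and i in unit_interior:
            needs_m = False
        if needs_m:
            result.append("M")
        result.append(op)
    return result


if __name__ == "__main__":
    dialog_box = "H P K H K K K K K".split()
    # The four temperature characters occupy indexes 4 through 7.
    print(" ".join(place_mental_operators(dialog_box, cognitive_units=[(4, 8)])))
```

For the dialog box method, with the four typed characters marked as one cognitive unit, the sketch yields the same sequence as the hand derivation: H M P K H M K K K K M K.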

The next step is to add the times represented by the letters. (Recall that K = 0.2, P = 1.1, H = 0.4, and M = 1.35):

H + M + P + K + H + M + K + K + K + K + M + K =

0.4 + 1.35 + 1.1 + 0.2 + 0.4 + 1.35 + 4 * (0.2) + 1.35 + 0.2 = 7.15 seconds

In the case in which the correct conversion is already selected, the method is

    M K K K K M K

M + K + K + K + K + M + K = 3.7 sec

By the requirements document, these two cases are equally likely. Thus, the average time it will take Hal to use this interface for one conversion task will be (7.15 + 3.7) / 2 ≈ 5.4 seconds. But, because the two methods that Hal has to use are different, it will be difficult for him to operate this interface automatically. One of the open problems in the quantitative analysis of interfaces is how to estimate error rates from a given interface design.
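The arithmetic for the dialog box can be checked with a short Python sketch. It assumes the typical operator times from Section 4-2-1 and the equal likelihood of the two cases stated in the requirements; the names used are illustrative only.

```python
# Typical operator times in seconds from Section 4-2-1.
OPERATOR_SECONDS = {"K": 0.2, "P": 1.1, "H": 0.4, "M": 1.35}


def method_seconds(sequence):
    """Sum the operator times of a space-separated method string."""
    return sum(OPERATOR_SECONDS[op] for op in sequence.split())


# Case 1: the radio button must be changed before typing.
change_selection = method_seconds("H M P K H M K K K K M K")
# Case 2: the correct conversion is already selected.
keep_selection = method_seconds("M K K K K M K")

# The requirements say the two cases are equally likely.
expected = 0.5 * change_selection + 0.5 * keep_selection

if __name__ == "__main__":
    print(f"change selection: {change_selection:.2f} sec")
    print(f"keep selection:   {keep_selection:.2f} sec")
    print(f"expected time:    {expected:.1f} sec")
```

If the two cases were not equally likely, the weights of 0.5 would be replaced by the observed proportions.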

Next, we explore a graphical interface that makes extensive use of a familiar metaphor.

4-2-3-2 Hal's Interface: Solution 2, GUI

The interface shown in Figure 4.3 uses realistic representations of thermometers to indicate temperature. Hal can lower or raise the pointer on each thermometer in Figure 4.3 by using the drag method with the GID. Hal indicates which conversion he wants by moving the arrow on either the Celsius or the Fahrenheit thermometer. He does not type any characters; he simply selects the temperature on the input thermometer. As he moves one of the pointers, the pointer on the other thermometer moves to the corresponding temperature. To set the required precision, Hal expands and contracts the scales; he can also change the range. When Hal changes the scale or the range on one thermometer, those on the other thermometer change automatically to cover approximately the same set of temperatures. Numerical readouts are provided on the movable arrow. The temperature is indicated both numerically and with a bar, so Hal can use either the graphical or the character-based representations of the data to accommodate his learning style or personal preferences. The Auto-Med feature changes the ranges such that they are centered on 37 degrees Celsius and 98.6 degrees Fahrenheit, in case someone in the lab is working with human body temperatures; this feature is designed to save time.

Figure 4.3. A GUI for Hal's interface.


Clicking on Expand Scales or Compress Scales increases or decreases by a factor of 10 the values at tick marks on the vertical thermometers. To get quickly to a far-distant temperature, Hal expands the scale and scrolls up or down until the desired range is in view, puts the arrow near the desired temperature, and then compresses the scale, adjusting the arrow if necessary, until the desired precision is attained.

A GOMS keystroke-level analysis of this graphical interface is complex because the method Hal uses depends on where the converter is presently set and what range and precision Hal needs. We look first at the fastest case, in which the range and the precision of the C or the F thermometer happen to be already set as Hal wants them to be. This analysis will give us the minimum time needed to use this interface.

  • Write down the gestures Hal uses as he moves his hand to the GID and clicks and holds down the GID button on the desired arrow:

    H P K

  • Continue listing gestures as Hal moves the arrow until it points to the correct value and then releases the GID button:

    H P K P K

  • Place Ms according to rule 0:

    H M P M K M P M K

  • Eliminate two Ms according to rule 1:

    H M P K M P K

There are no cognitive units, no consecutive terminators, and no other reasons to apply rules 2 through 5. We find the total time by adding the times for each gesture:

H + M + P + K + M + P + K =

0.4 + 1.35 + 1.1 + 0.2 + 1.35 + 1.1 + 0.2 = 5.7 seconds

This calculation applies to the lucky case in which the input thermometer was preset to the appropriate range and resolution. Now consider the case in which Hal wants to expand the scale factor so that he can see the desired temperature, change the range, compress the scale factor to get adequate resolution, and then move the arrow. I will write down the method Hal uses, without going through a step-by-step derivation. (I assume that Hal is a perfect user and does not have to juggle back and forth to find the right places on the thermometer.) Hal has to use the arrows to scroll several times. Each scrolling operation may require several gestures; the computer then animates the scrolling operation, which takes time. To estimate scrolling times for the analysis, I built a similar interface and measured scrolling times, which were all 3 seconds or longer. Using S to represent the scrolling times, we can write the sequence of gestures that Hal uses as follows:

    H P K S K P K S K P K S K P K K

Using the rules to place Ms, we get

H + 3(M + P + K + S + K) + M + P + K + K

0.4 + 3 * (1.35 + 1.1 + 0.2 + 3.0 + 0.2) + 1.35 + 1.1 + 0.2 + 0.2 = 20.8 seconds
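Both GUI cases can be recomputed with a small Python sketch that adds the scrolling operator S to the table of times. It assumes S = 3.0 sec, the shortest scrolling time reported above, so the slow case is a lower bound; the variable names are illustrative.

```python
# Typical operator times in seconds, plus S for one animated scroll.
# S = 3.0 is the minimum measured value quoted in the text.
GESTURE_SECONDS = {"K": 0.2, "P": 1.1, "H": 0.4, "M": 1.35, "S": 3.0}


def total_seconds(sequence):
    """Sum the times of a space-separated gesture sequence."""
    return sum(GESTURE_SECONDS[g] for g in sequence.split())


# Fastest case: range and precision are already set correctly.
best_case = total_seconds("H M P K M P K")

# Slow case: three scroll or rescale steps, then the final drag.
scroll_step = "M P K S K"
slow_sequence = " ".join(["H"] + [scroll_step] * 3 + ["M P K K"])
slow_case = total_seconds(slow_sequence)

if __name__ == "__main__":
    print(f"best case: {best_case:.1f} sec")
    print(f"slow case: {slow_case:.1f} sec")
```

Because S dominates each scroll step, any reduction in animation time or in the number of scroll steps has a larger effect on the total than shaving keystrokes.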

Except for the rare case in which the thermometer scales are correctly set at the beginning of the problem, a perfect user will need more than 16 seconds to accomplish a temperature conversion using this method. A real imperfect user would jog the scales and the arrows back and forth and thus take even longer.


    The Humane Interface. New Directions for Designing Interactive Systems ACM Press Series
    ISBN: 1591403723
    EAN: N/A
    Year: 2000
    Pages: 54
