Chapter 17: Improving Data Entry

In Chapter 14, we discussed the need for considerate software that knows when to bend the rules. One of the ways in which software is least capable is in regard to how it handles data entry. This is an artifact of the history of software development—in particular, the development of database software. In this chapter, we'll discuss the problems with existing ways of dealing with data entry and some possible ways to make this process more focused on human needs and less focused on the needs of the database.

Data Integrity versus Data Immunity

The development imperative regarding data entry and data processing is simple: Never let tainted, unclean data get into the software. The programmer erects barriers in the user interface so that bad data can never enter the system. This pure internal state is commonly called data integrity.

The imperative of data integrity posits that there is a world of chaotic information out there, and before any of it gets inside the computer it must be filtered and cleaned up. The software must maintain a vigilant watch for bad data, like customs officials at a border crossing (see Figure 17-1). All data is made valid at its point of entry. Anything on the outside is assumed to be suspect, and after it has run the gauntlet and been allowed inside, it is assumed to be pristine. The advantage is that once inside the database, the code doesn't have to bother with successive, repetitive checks on the validity or appropriateness of the data.

Figure 17-1: Underneath the rhetoric of data integrity—an objective imperative of protecting the user and computer with sanctified data—there is a disturbing subtext: that humans are ill-intentioned screw-ups and that users will, given the chance, enter the most bizarre garbage possible in a deliberate attempt to bring the system to its knees. This is not true. Users will inadvertently enter erroneous data, but that is different from implying that they do it intentionally. Users are very sensitive to subtext; they will quickly perceive that the program doesn't trust them. Data integrity not only hampers the system from serving the user for the dubious benefit of easing the programmer's burden, but it also offends the user with its high-handed attitude. It's another case of requiring users to adapt to the needs of the computer, rather than the computer meeting the needs of users.

The disadvantage of this method is simple: It places the needs of the database before those of the user, subjecting him to the equivalent of a shakedown every time he enters a scrap of data into the computer. Note that this isn't a problem with most personal software: PowerPoint doesn't know or care if you've formatted your presentation correctly. But as soon as you deal with a large corporation, whether you are a clerk performing data entry for an enterprise management system, or a Web surfer buying DVDs online, you come face to face with the border patrol.

Humans, especially those filling out lots of forms every day as part of their job, know that data isn't provided to them in the pristine form that their software demands. They know that their information is incomplete, and sometimes wrong. They know that sometimes they need to expedite processing to make their customers happy. But when confronted with a system that is entirely inflexible in such matters, data processors must either grind to a halt or find some way to subvert the system to get things done. If, however, the software recognized these facts of human existence and allowed for them in its interface, everyone would benefit.

Make no mistake: When our software shakes down data at the point of entry, when it strip-searches the user to ensure that he isn't carrying any contraband into the high-security depths of the computer, it makes a very clear statement. It tells us that the user is insignificant and that the program is omnipotent—that the user works for the good of the program and not vice versa. This is not the impression that we want to give. We want the user to feel in charge; to feel that the program works for him; that the program is doing the work while the user makes the decisions.

Happily, there's more than one way to protect software from bad data. Instead of keeping it out of the system, the programmer needs to make the system immune to inconsistencies and gaps in the information. This method involves writing much smarter, more sophisticated code that can robustly handle all permutations of data, giving the program a kind of data immunity.

Data immunity

To implement data immunity, our programs must be trained to look before they leap, and they must be trained to ask for help. Most software blindly performs arithmetic on numbers without actually examining them first. The program assumes that a number field must contain a number—data integrity tells it so. If the user entered the word "nine" instead of the number "9", the program would croak, but a human reading the form wouldn't even blink. If the program simply looked at the data before it acted, it would see that a simple math function wouldn't do the trick.

We must train our programs to believe that the user will enter what he means to enter, and if the user wants to correct things, he will do so without our paranoid insistence. But the program can look elsewhere in the computer for assistance. Is there a module that knows how to make numeric sense of alphabetic text? Is there a history of corrections that might shed some light on the user's intent?

If all else fails, the program must add annotations to the data so that when—and if—the user comes to examine the problem, he finds accurate and complete notes that describe what happened and what steps the program took.
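The look-before-you-leap and annotation ideas above can be sketched in code. This is a minimal illustration, not the book's implementation: the word list and the note format are invented for the example.

```python
# A sketch of "data immunity": examine the input, try to make sense of it,
# and annotate what couldn't be resolved instead of rejecting it outright.
# The WORD_NUMBERS table and note strings are hypothetical illustrations.

WORD_NUMBERS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}

def interpret_amount(raw):
    """Return (value, note): the best numeric reading of the user's input.

    A human reading "nine" on a form wouldn't blink; neither should the
    program. Anything it cannot interpret is kept and annotated for later
    review rather than halting the entry process.
    """
    text = raw.strip()
    try:
        return float(text), None                   # "9.38" parses directly
    except ValueError:
        pass
    word = text.lower()
    if word in WORD_NUMBERS:                       # "nine" -> 9
        return float(WORD_NUMBERS[word]), f'read "{raw}" as a number word'
    # Give up gracefully: leave a complete note describing what happened.
    return None, f'could not interpret "{raw}"; left for review'
```

The key design choice is that every path returns something usable—either a value or an accurate note—so processing never grinds to a halt at the point of entry.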

Yes, if users enter "asdf" instead of "9.38" the program won't be able to arrive at satisfactory results. But stopping the program to resolve this right now is not a satisfactory process either; the entry process is just as important as the end report. If the user interface is designed correctly, the program provides visual feedback when the user enters "asdf", so the likelihood of the user entering hundreds of bad records is very low. Generally, users only act stupidly when programs treat them stupidly.

Most often, the incorrect data that the user enters is still reasonable for the situation. If the program expects a two-letter state code, the user may enter "TZ" by accident. However, if that same user enters "Dallas" for the city name, it doesn't take a lot of intelligence to figure out the problem. Fixing missing postal codes won't tax our modern, powerful computers. In the rare cases where a postal code locator program might fail, most humans would likely fail, too.
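A cross-field plausibility check of the kind described above might look like the following sketch. The tiny lookup table is hypothetical—a real system would consult a postal or geographic database.

```python
# A hedged sketch of catching a reasonable-but-wrong entry ("TZ" next to
# "Dallas") the way a human reader would. The city table is illustrative.

KNOWN_CITIES = {
    "dallas": "TX",
    "chicago": "IL",
    "seattle": "WA",
}

def suggest_state(city, state_code):
    """Return a plausible state code, correcting it from the city if needed.

    When the entered code conflicts with what the city implies, the program
    quietly suggests the implied code instead of stopping the proceedings.
    """
    implied = KNOWN_CITIES.get(city.strip().lower())
    if implied and implied != state_code.upper():
        return implied
    return state_code.upper()
```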

What about lost data?

It is clearly counter to everyone's wishes if information is lost. The data entry clerk who fails to key in the invoice amount and then discards the invoice creates a real problem. But is it really the righteous duty of the program to stop the user and point out this failure? No, it isn't. You have to consider the situation. If the application is a desktop productivity program, the user is interacting with it, and the results of his error will likely become apparent. In any case, the user will be driving the program like a car, and he won't take kindly to having the steering wheel lock up because the Chevy discovered it was low on windshield-washer fluid.

On the other hand, let's say the user is a full-time data-entry clerk keying forms into a corporate data-entry program. Our clerk does this one job for a living, and he has spent hundreds—maybe thousands—of hours using the program. He has a sixth sense for what is happening on the screen and knows at a glance whether he has entered bad data, particularly if the program is using subtle, modeless visual and audible cues to keep him informed of the status of the data.

The program is also helping out: Data items, like part numbers that must be valid, aren't going to be typed in, but are entered through List views or other bounded controls. Addresses and phone numbers are entered more naturally into smart fields that can help parse the data. The program gives the user frequent positive feedback, so the program begins to act as a partner, helping him stay aware of the status of his work. So, how serious is the loss of data?

In a data-entry situation, a missing field can be serious, but the field is usually entered incorrectly rather than just omitted. The program can easily help the clerk detect the problem and change it to a valid entry without stopping the proceedings. If the clerk is determined to omit necessary fields, the problem is the clerk and not the program. The percentage of clerks who fail either because of lack of ability or sociopathic tendencies is likely quite low. It isn't the job of the data-entry program to treat all data entry clerks as though they can't be trusted to do a simple job just because one out of a hundred can't.

Most of our information processing systems are tolerant of missing information. A missing name, code, number, or price can almost always be reconstructed from other data in the record. If not, the data can always be reconstructed by asking the various parties involved in the transaction. The cost is high, but not as high as the cost of technical help centers, for example. Our information processing systems can work just fine with missing data. The programmers who write these systems just don't like all the extra work involved in dealing with missing data, so they invoke data integrity as an unbreakable, deified law. Thousands of clerks must, therefore, interact with rigid fascist-ware to keep databases from crashing—not to prevent their business from failing.

It is obviously counter-productive to treat all your workers like idiots to protect against those few who are. It lowers everyone's productivity, encourages rapid, expensive, and error-causing turnover, and it decreases morale, which increases the unintentional error rate of the clerks who want to do well. It is a self-fulfilling prophecy to assume that your information workers are untrustworthy.

The stereotypical role of the data-entry clerk mindlessly keypunching from stacks of paper forms while sitting in a boiler room among hundreds of identical clerks doing identical jobs is rapidly evaporating. The task of data entry is becoming less a mass-production job and more of a productivity job: a job performed by intelligent, capable professionals and, with the advent of e-commerce, directly by customers. In other words, the population interacting with data-entry software is increasingly less tolerant of being treated like unambitious, uneducated, unintelligent peons. Users won't tolerate stupid software that insults them, not when they can push a button and surf for another few seconds until they find another vendor who presents an interface that treats them with respect.

Data entry and fudgeability

When entry systems work to keep bad data out of the system, they almost never allow the user to fudge. There is no way to make marginal comments or to add an annotation next to a field. For example, a vitally necessary item of data may be missing, an interest rate, say. If the system won't allow the transaction to be entered without a valid interest rate, it stops the company from doing business. What if the interest rate field on the loan application had a penciled note next to it, initialed by the bank president, that said: "Prime plus three the day the cash is delivered"? The system, working hard to maintain perfection, fails the reality test.

If an automated data processing system is too rigid, it won't model the real world. A system that rejects reality is not helpful, even if all its fields are valid. In this case, you must ask yourself the question: "Which is more important, the database or the business it is trying to support?" The people who manage the database and create the data-entry programs that feed it are serving only the CPU. It is a significant conflict of interest that only interaction design, knowledgeable in but detached from development, can resolve.

Fudgeability can be difficult to build into a computer system because it demands a considerably more capable interface. The clerk cannot move a document to the top of the queue unless the queue, the document, and its position in the queue can be easily seen. The tools for pulling a document out of the electronic stack and placing it on the top must also be present and obvious in their functions. Fudgeability also requires facilities to hold records in suspense, but an undo facility has similar requirements. A more significant problem is that fudging admits the potential for abuse.
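The fudgeable-queue interface described above can be sketched as a small data structure. The names and operations here are illustrative assumptions, not a prescribed design.

```python
# An illustrative sketch of a "fudgeable" work queue: the clerk can see the
# queue, pull a document to the top, or hold one in suspense without losing
# it. Class and method names are hypothetical.

class WorkQueue:
    def __init__(self):
        self.pending = []     # the visible, ordered queue of documents
        self.suspended = []   # records held in suspense, never discarded

    def add(self, doc):
        self.pending.append(doc)

    def expedite(self, doc):
        """Pull a document out of the stack and place it on top."""
        self.pending.remove(doc)
        self.pending.insert(0, doc)

    def suspend(self, doc):
        """Set a problem document aside without rejecting or losing it."""
        self.pending.remove(doc)
        self.suspended.append(doc)
```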

The saving grace to avoid abuse is that the computer also has the power to easily track all the user's actions, recording them in detail for any outside observer. The principle is a simple one: Let the user do whatever he wants, but keep very detailed records of those actions so that full accountability is easy.
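The accountability principle—let the user act freely, but record everything—can be sketched as a simple audit trail. The event fields shown are illustrative assumptions.

```python
import datetime

# A minimal sketch of full accountability: every user action is appended
# to a timestamped, detailed log that any outside observer can inspect.
# The field names ("when", "user", "action", "details") are hypothetical.

class AuditTrail:
    def __init__(self):
        self.events = []

    def record(self, user, action, **details):
        """Append a timestamped, fully detailed record of a user action."""
        self.events.append({
            "when": datetime.datetime.now().isoformat(),
            "user": user,
            "action": action,
            "details": details,
        })

    def history(self, user):
        """Everything a given user did, available for later review."""
        return [e for e in self.events if e["user"] == user]
```

Because nothing is ever blocked—only recorded—the user stays in charge while abuse remains easy to detect after the fact.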




About Face 2.0: The Essentials of Interaction Design
Year: 2006
Pages: 263
