3.2 Intelligent Input Interfaces

3.2.1 Introduction

With the miniaturization of devices, traditional point-and-click interfaces will no longer suffice. We will interact with devices the way we interact with each other: through speech, eye movement, gesture, and touch. To make these interfaces useful, they need to accommodate such sensory channels in parallel. Perceptual user interfaces will provide unobtrusive (for example, vision-based) sensing and mimic aspects of human communication, reacting to our identity, posture, gesture, gaze, and even mood and intent.

Humans can already process input on multiple channels in much the way we would hope computers could. Brain activity and eye movement have been used successfully to enable paralyzed people to use computers. The Air Force uses gaze input as an alternative way for fighter pilots to control their cockpits if they are temporarily paralyzed in high-G-force dives.

Interaction methods that have been hampered by intensive processing requirements and noisy signals will benefit from the processing capabilities of high-speed Internet appliances. For example, researchers have struggled for years to remove environmental noise from speech signals. With advancements in wireless microphones and the new device capabilities, speech input is becoming more realistic. Speech alone will not be reliable as an input means, but combined with an alternative like gaze input, it will enable us to interact with these smaller interfaces. These multimodal interfaces, capitalizing on multiple senses, will become the norm.

These new intelligent interfaces will make me-centric computing successful, because the devices can adapt to the needs of the user. The user needs to perceive the interface as invisible and the service as useful. Although many people do not realize it, using a PC means learning to interact in a totally new way. You have to learn how to use a mouse, a GUI, and cryptic key shortcuts, which interrupt the task you are trying to perform. New devices such as mobile phones or PDAs have no better interface; they often just use a subset of a computer's user interface, making interaction even more difficult because the instructions on screen are even more cryptic. The best-known example of cryptic commands may be a car stereo's clock. In many cases, you have to remember a complex key combination to change the time. The same applies to VCRs, where many people still do not know how to record a program in advance or do anything other than simply play a tape. Partly this is a problem of interface design, and partly it is a problem of loose integration into the television system.

In a me-centric computing world, manuals should become obsolete. Using these devices should be intuitive and natural. A secretary, concierge, or restaurant wait-person does not require a user manual to delegate tasks and have desires addressed. If you look at simple tools, such as a screwdriver or a hammer, nobody expects a hundred-page manual for these devices; me-centric devices need to become just as simple to use. Inside the devices, complex algorithms may be at work, but to the person using the device, this should not matter.

3.2.2 Input Purposes

In order to decide which input technology is the most appropriate, it is important to understand what type of input is required from the user. Therefore, the service should be designed first, and then the appropriate device. Many high-tech devices have failed because they were impractical, even though they may have been technically highly advanced.

One aspect of this paradigm change is that a specific device will be used for a specific task only. The personal computer was more like a Swiss army knife: it handled many different tasks, but was not perfect at any of them. With the introduction of smart devices, this changes dramatically. These devices will do only one thing, but in a highly supportive and effective way.

There are many different types of input purposes and associated services. Just imagine a user who wants to order a pizza through his mobile device. All he needs to do is select the pizza; the rest, such as the delivery address and finding the nearest pizza service, should be handled by the device. A driver in a car needs continuous control over the car and therefore a totally different type of input device. A chemist trying to create a new formula, in turn, will need to enter lots of parameters into his device.

For the pizza selection, a simple selection ball could be used together with a small display, allowing the user to scroll up and down and press the ball to confirm the selected pizza.

A steering wheel has been used to control cars for the past hundred years, but more and more automobile manufacturers are looking into alternative control elements, such as joysticks or even automatic steering, to make life easier and safer for the driver. The chemist could work with a traditional desktop PC to enter all the complex data, as this may still be the best solution. If he does not have his hands free because of other tasks, a head-up display with voice recognition could work as well.

With input devices, six basic tasks can be achieved:

  • Positioning The user chooses a point for an object to be placed in an n-dimensional space.

  • Selection The user chooses from a set of items.

  • Orientation The user chooses a direction in a two- or higher-dimensional space. This is used to rotate a symbol on the screen or to control the operation of a robot arm.

  • Path The user rapidly performs a series of positioning and orientation operations. The path may be realized as a curving line in a drawing program, such as a route on a map.

  • Quantification The user specifies a numeric value. It is a one-dimensional selection of integer or real values to set parameters, such as the page number of a document or the size of a drawing.

  • Text The user enters, moves, and edits text in a 2D space.

These pure input types can be combined to produce the impression of a wide variety of different types of "inputs." For example, in PTC's [5] CAD product, one generates solid shapes by applying a wide variety of transforms to chosen objects for a specific period of time or until a particular condition is satisfied. This gives you the impression that you are inputting transformations and control parameters for transforming processes. But, in fact, these are actually being provided by a combination of selection, positioning, orientation, path, and quantification inputs.

[5] http://www.ptc.com/
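
To make the distinction concrete, here is a minimal Python sketch of how a toolkit might represent the six primitives as a stream of typed input events; a compound operation such as a CAD transform then reduces to a sequence of these. The names and fields are invented for illustration, not an existing API.

    # Hypothetical sketch: the six input primitives as event types a toolkit
    # might emit. Names and fields are illustrative only.
    from dataclasses import dataclass, field
    from typing import List, Sequence, Union

    @dataclass
    class Position:            # choose a point in n-dimensional space
        coords: Sequence[float]

    @dataclass
    class Selection:           # choose one item from a set
        item: str

    @dataclass
    class Orientation:         # choose a direction (2D or higher)
        angles: Sequence[float]

    @dataclass
    class Path:                # a rapid series of position/orientation samples
        samples: List[Position] = field(default_factory=list)

    @dataclass
    class Quantification:      # a single numeric value, e.g. a page number
        value: float

    @dataclass
    class Text:                # entered or edited text plus where it goes
        content: str
        at: Position

    InputEvent = Union[Position, Selection, Orientation, Path, Quantification, Text]

    def describe(event: InputEvent) -> str:
        """A compound operation (e.g. a CAD transform) is just a stream of these."""
        return f"{type(event).__name__}: {event}"

    if __name__ == "__main__":
        stream = [Selection("extrude"), Quantification(25.0), Orientation([0.0, 90.0])]
        for e in stream:
            print(describe(e))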

3.2.3 Input Technologies

To communicate with computers, different technologies have been established to enable the input of information and commands and the selection of data. The following gives an overview of the most commonly used technologies. The role of interfaces is to communicate the system image to the user. They should teach users about the system and help users to develop skills and achieve their goals. Interfaces should map intent onto results and enable tasks to be performed. They should also provide direct access to system functionality.

Keyboard

Since punch cards were abandoned, keyboards have been the most popular input device for computing devices. Through the keyboard, it is possible to enter commands into a command line, as with a UNIX shell, or to navigate through menus, as in Windows using the Alt key. Many systems are keyboard-driven, such as the terminals used for check-in at airports. In other places, such as supermarkets, keyboards have been replaced with scanners, because it is faster and less error-prone to scan prices than to type them.

Mice, Trackballs, and Pens

Mice became increasingly popular in the late 1980s. Operating systems such as Amiga OS, Atari OS, and Mac OS were the first to introduce this concept to a broad audience. Windows followed later, and today it is almost impossible to think of a computer without a mouse or a similar device such as a trackball or a pen.

The most common mouse in use today is opto-mechanical. Its ball is steel for weight and rubber-coated for grip; as it rotates, it drives two rollers, one each for x and y displacement. A third, spring-loaded roller holds the ball in place against the other two.

The mouse can be used to point at and click commands or to select files, directories, services, or data. It can be used to select areas of an image, for example using rubber-band lines, and it can be used to drag and drop objects from one place to another. A trackball provides the same features and similar paradigms to operate a system. Pen-based technologies can be used in the same way as a mouse or a trackball, but can provide additional functions, such as handwriting or gestures, which are difficult to achieve with a normal mouse.

Speech

Most people would feel more comfortable if they could communicate with their computing devices via speech. Actually, many do this already today: if something goes wrong, they start shouting at their PC, unfortunately with no response from the system. Speech recognition has been researched for many years. (Voice recognition, in contrast, usually refers to systems that identify users by the characteristics of their voices.)

Actually, a toy company had its first product decades before major research in the area was even considered. "Radio Rex" was a celluloid dog that responded to its name. Lacking the computational ability that powers recognition devices today, Radio Rex was a simple electromechanical device.

The dog was held within its house by an electromagnet. As current flowed through a circuit bridge, the magnet was energized. The bridge was sensitive to acoustic energy around 500 cps (cycles per second). The energy of the vowel sound of the word "Rex" caused the bridge to vibrate, breaking the electrical circuit and allowing a spring to push Rex out of his house.
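
The principle behind Radio Rex can be approximated in a few lines of code: measure how much of the signal's energy falls into a narrow band around 500 Hz and trip a threshold. This is only an illustrative sketch; the filter width and threshold are assumptions, not a reconstruction of the original toy.

    # Hypothetical sketch of a Radio-Rex-style trigger: release the latch when
    # enough acoustic energy falls into a narrow band around 500 Hz.
    # Band width and threshold are assumptions chosen for illustration.
    import numpy as np

    def band_energy(samples, rate, center=500.0, width=100.0):
        """Energy of the signal within +/- width/2 Hz of the center frequency."""
        spectrum = np.abs(np.fft.rfft(samples)) ** 2
        freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
        mask = np.abs(freqs - center) <= width / 2
        return spectrum[mask].sum() / len(samples)

    def rex_should_jump(samples, rate, threshold=1.0):
        return band_energy(samples, rate) > threshold

    if __name__ == "__main__":
        rate = 8000
        t = np.arange(rate) / rate
        vowel_like = np.sin(2 * np.pi * 500 * t)      # energy near 500 Hz
        rumble = 0.5 * np.sin(2 * np.pi * 60 * t)     # low-frequency noise
        print(rex_should_jump(vowel_like, rate))      # True
        print(rex_should_jump(rumble, rate))          # False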

This toy was invented in the 1920s. If you look at speech recognition today, some may wonder whether any advances have been made at all. Just imagine the computerized telephone hotlines, where you have to state your name and the purpose of your call before you can get through to a human. In some cases you have to type numbers, but some systems use "highly advanced" speech recognition that still responds to 50 percent of all input with "I do not understand."

In the early 1990s, many thought that all problems with speech recognition were solved. The reason for such excitement was that the technical problems that had bottled up the technology in labs for decades were finally being addressed. The most basic issue had been associating specific vocalizations with specific phonemes, the basic units of speech. By recognizing phonemes instead of words, the complexity of these systems was greatly reduced.

Making the associations required compiling huge databases of how the more than 40 English-language phonemes are spoken by those of different ages, genders, and linguistic cultures, and under different phone-line conditions. Developers then had to write programs that could find the degree of fit between a given user's vocalization and one of those samples.
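
The "degree of fit" idea can be illustrated with a toy example: summarize a vocalization as a small feature vector and find the nearest stored phoneme template. Real recognizers use statistical acoustic models over much richer features; the templates and the distance measure below are invented purely for illustration.

    # Toy sketch of "degree of fit": match a vocalization's feature vector
    # against stored phoneme templates. Features and templates are invented;
    # real systems use statistical acoustic models.
    import math

    # Pretend each phoneme is summarized by two formant frequencies (Hz).
    PHONEME_TEMPLATES = {
        "AA": (730, 1090),   # as in "father"
        "IY": (270, 2290),   # as in "see"
        "UW": (300, 870),    # as in "boot"
    }

    def best_phoneme(observed):
        """Return the closest phoneme and its distance (lower = better fit)."""
        scored = [(math.dist(observed, template), name)
                  for name, template in PHONEME_TEMPLATES.items()]
        dist, name = min(scored)
        return name, dist

    if __name__ == "__main__":
        print(best_phoneme((700, 1100)))   # close to "AA"
        print(best_phoneme((290, 2200)))   # close to "IY"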

It turned out, however, that speech recognition is only partly a technical problem in phonology. Recognition implies a conversation, and conversations make sense only in the context of relationships. When humans enter relationships, they immediately impose a structure of assumptions and expectations. Is the person smart? Knowledgeable? Nice? Lazy? Snobbish? That structure controls the interaction. If a comprehension problem comes up during a conversation with a smart person, we assume we are at fault and take on the responsibility of working it out. We do the same if we think our respondent is not too bright but basically nice. On the other hand, if we think the other party is lazy, doesn't care, or (worse) is trying to manipulate us, we behave very differently.

Those relationship issues are just as important when talking with machines as with people; even more so, since most users were and are uncertain about how to talk to software. To show that speech recognition is not perfect yet, consider two examples: an automated call center of a retailer and a navigation system in cars.

"Wood order blue kirk!" That's how a call center equipped with cheap speech-recognition technology might interpret a customer's request "I would like to order a blue skirt." That's because many systems can understand only precise, clear syntax that bears little resemblance to the way most people speak. Computers in cars that are used for navigation, for example, can also be controlled via voice in order to reduce the distraction for the driver.

Most speech recognition systems currently claim accuracies of 90 to 95 percent, but some say that such claims are averages, which hold true in a car at 30 mph but not at higher speeds. At 70 mph, for example, some engineers say that the accuracy figure dips to about 70 percent. Even with 90 percent accuracy, one out of every ten phone digits that you dictate is going to be wrong. A human ear can easily separate speech from such background noise, but most speech recognition systems have problems doing so. Therefore, few applications have really been successful.

During the past few years, the underlying technology has continued to improve. Good speech recognition is perfectly capable of handling a complete sentence, such as "I want to take the TGV from Brussels to Paris a week from Saturday," but most users still want a highly structured interaction that prompts for each element of the transaction. As users gain more experience with speech recognition applications, they will relax, and conversations will become more ambitious and wide-ranging; this will help make me-centric computing successful.

IBM Research [6] is planning to launch the Super Human Speech Recognition Project that aims to solve common speech-recognition problems and deliver systems capable of not just linguistic comprehension but contextual understanding.

[6] http://www.research.ibm.com/

The development of software that uses a language model to predict which words are most likely to follow other words is among the numerous approaches the company is taking. IBM Research is also using an acoustic model in which software predicts all the ways a particular word might sound given various pronunciations, cadences, or background interference.
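
How the two models interact can be written down as a simple scoring rule: each candidate word is scored by how well it matches the audio and by how likely it is to follow the preceding words. The probabilities below are made up, and the sketch is not meant to represent IBM's actual system.

    # Minimal sketch of combining an acoustic model and a language model.
    # All probabilities are invented for illustration.
    import math

    # P(audio | word): how well the sound matches each candidate word.
    ACOUSTIC = {"wreck": 0.30, "recognize": 0.25}
    # P(word | previous word): how likely the word is in context.
    LANGUAGE = {("to", "wreck"): 0.001, ("to", "recognize"): 0.05}

    def score(prev_word, candidate, lm_weight=1.0):
        """Log-domain combination: acoustic score + weighted language score."""
        return (math.log(ACOUSTIC[candidate])
                + lm_weight * math.log(LANGUAGE[(prev_word, candidate)]))

    def pick(prev_word, candidates):
        return max(candidates, key=lambda w: score(prev_word, w))

    if __name__ == "__main__":
        # The acoustics slightly favor "wreck", but context rescues "recognize".
        print(pick("to", ["wreck", "recognize"]))   # -> "recognize"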

The real challenge is building systems that can understand multifaceted conversations or respond to open-ended questions. To that end, IBM is working on an approach called domain-specific interpretation. Systems designed for use in, say, a travel agency would be programmed to minimize the relevance of conversational elements not related to travel to generate the best response.

One of the first applications that IBM has produced in cooperation with Honda [7] is a telematics system for cars. Drivers can ask questions about the route and about nearby restaurants and sights to visit, and the car responds in voice. This system, implemented in the Honda Accord, is planned to be launched in the United States. More "talk-to" applications are on the way. Nokia and Ericsson [8] have introduced this feature in their mobile phones: instead of typing in a number, you can speak the number or say a name from the address book.

[7] http://www.honda.com/

[8] http://www.ericsson.com/

We believe that we will soon see a major breakthrough in human-machine conversation. Extempo [9], for example, provides technology to build intelligent characters (Figure 3.1). The tools enable a character to get the gist of a conversation, have emotional reactions, and work towards its own purposes interactively with the human. It does not need perfect input, and it can be coupled with various sensory inputs and outputs. Max the dog and Erin the bartender, two characters developed by Extempo, both have a great tolerance for ambiguity and a wicked sense of humor! So we are slowly getting there.

[9] http://www.extempo.com/

Figure 3.1. Extempo's Characters

graphics/03fig01.jpg

Speech recognition plays an important role in situations where it is impossible to use the hands for typing or for devices that are too small to have a keyboard attached. In our car navigation example, taking your hands off the steering wheel or your eyes off the road while driving is actually illegal in many countries. There are many other situations where people do not have the chance to use their hands and are forced to communicate orally; just imagine doctors in an emergency situation or workers on a construction site.

Speech recognition systems will need to understand people in many different languages and have good translations available for different languages. Language-specific expressions need to be matched in order to provide the right level of response.

Touchscreens

A touchscreen is an intuitive computer input device that works by simply touching the display screen with a finger, stylus, or similar device, rather than typing on a keyboard or pointing with a mouse. Computers with touchscreens have a smaller footprint and therefore can be mounted in smaller spaces; they have fewer movable parts and can be sealed. Touchscreens may be built in or added on. Add-on touchscreens are external frames with a clear, see-through touchscreen that mount onto the monitor bezel and have a controller built into their frame. Built-in touchscreens are internal, heavy-duty touchscreens mounted directly onto the CRT.

The touchscreen is simple, intuitive, easy to learn, and is becoming the interface of choice for a wide variety of applications. Public information systems, such as information kiosks, tourism displays, and other electronic displays, are used by many people who have little or no computing experience. The touchscreen makes this information accessible to the widest possible audience.

Touchscreens are also commonly used in restaurant point-of-sale (POS) systems, where they reduce the overall training time for new employees. Work is done faster because employees can simply touch the screen to perform tasks, rather than entering complex keystrokes or commands.

The touchscreen interface is also very useful in systems ranging from industrial process control to home automation. By integrating the input device with the display, valuable workspace can be saved. And with a graphical interface, operators can monitor and control complex operations in real time by simply touching the screen.

Self-service touchscreen terminals can be used to improve customer service at banks, busy shops, restaurants, and transportation hubs, for example. Customers can quickly place their own orders or check themselves in or out, saving them time and decreasing wait times for other customers.

Eye Movement

Eye tracking is a technique used to determine where a person is looking. The concepts underlying eye tracking are deceptively simple: track the movements of the user's eyes and note what the pupils are doing while the user is looking at a particular feature. In practice, however, these measures are difficult to achieve and require high-precision instruments as well as sophisticated data analysis and interpretation. Machines used to do this are called eye trackers.

Eye movements made during reading and identifying pictures provide useful information about the processes by which people understand visual input and integrate it with knowledge and memory. Until recently, most applications of eye tracking have been in psychological research for probing into subjects' perceptual or cognitive processes, for example when driving in traffic, reading text, solving problems, looking at pictures, scanning instrument panels, and performing complex tasks. Research into human visual search strategies has been based on tracking subjects' gaze during "target object" search tasks. Based on this research, the evaluation of computer displays has been modeled upon recordings of fixation patterns.
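
A typical first step in such analysis is fixation detection. The sketch below uses a simple dispersion threshold: a run of gaze samples that stays within a small spatial window for long enough counts as one fixation. The thresholds are illustrative assumptions.

    # Sketch of dispersion-threshold fixation detection: a run of gaze samples
    # that stays within a small window for a minimum duration is one fixation.
    # Thresholds are illustrative assumptions.

    def detect_fixations(samples, max_dispersion=30, min_samples=5):
        """samples: list of (x, y) gaze points at a fixed sampling rate.
        Returns a list of (centroid_x, centroid_y, length) fixations."""
        fixations, window = [], []
        for point in samples:
            window.append(point)
            xs = [p[0] for p in window]
            ys = [p[1] for p in window]
            dispersion = (max(xs) - min(xs)) + (max(ys) - min(ys))
            if dispersion > max_dispersion:
                if len(window) - 1 >= min_samples:
                    done = window[:-1]
                    cx = sum(p[0] for p in done) / len(done)
                    cy = sum(p[1] for p in done) / len(done)
                    fixations.append((cx, cy, len(done)))
                window = [point]          # start a new window at the outlier
        if len(window) >= min_samples:
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append((cx, cy, len(window)))
        return fixations

    if __name__ == "__main__":
        gaze = [(100, 100), (102, 98), (101, 101), (99, 100), (100, 102),
                (400, 300), (402, 299), (401, 301), (399, 300), (400, 298)]
        print(detect_fixations(gaze))    # two fixations, one per cluster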

Commercial companies have also found eye movement tracking to be of interest. It is used today in market research studies, for example to evaluate Web site usability and new product designs. With eye movement tracking, it is possible to see how users react to visual input from these devices or Web sites; it can show whether they are distracted or whether they can process the input efficiently.

Military research developed head-up displays (displays integrated into the windshields of aircraft, so that instrument data is displayed "on top of" the surrounding flight scene) combined with eye tracking for guiding missile systems, thus freeing the pilot's hands for other tasks. This naturally required on-line processing of the tracking data, and this processing was not aimed at probing the pilot's perceptual or cognitive processes, but rather at letting the pilot use the eyes as an extra manipulation channel. Later, the on-line processing of eye-gaze tracking data was extended to user interfaces for non-military purposes. It is used, for example, in Formula 1 racing, where it enables the projection of messages into the driver's direct field of vision.

Besides this passive usage of eye movement tracking, it will become an integral part of me-centric applications. Eye movement can be used in various ways in a me-centric world. One is, of course, passive tracking of eye movements to check whether the user is looking in the right direction. Another purpose is to assess the user's state of mind. Is the user angry or happy? Does he or she have trouble with the lighting? Is the user insecure? This input can help make the device easier to use without requiring the user to state that there are problems. The device would be able to adapt, for example by adjusting the lighting or by adding information to guide the user.

Eye movement also has medical uses for disabled persons who are partially or completely paralyzed. In some cases, these people cannot even speak and can only move their eyes. Through eye trackers, these people are able to communicate with other people or use the Internet. Although it is a very slow form of communication, it means a lot to these people, who would otherwise be completely cut off. And the group of disabled people is not small. In the United States, 22 percent of the population 16 years or older is affected, and more than 50 percent of Americans over the age of 65 report a disability. This is more than 54 million Americans and more than 750 million people worldwide who could be easily helped if low-cost devices were available to support their needs. Anyone can be affected by disability at any time.

The success of the computer mouse as a pointing device combined with direct manipulation interfaces comes from the fact that it is based on human abilities. Man has developed the skill of grabbing and moving objects with his hands and working on them. There are some problems with eye-gaze control in comparison to the mouse, however. There is no analogy to the mouse button: you cannot "close your eye around an object" to pick it up, as you can with a mouse. It is not possible to "let go" with the eyes as you can with a mouse when you do not wish to manipulate anything in the display. The human eye-gaze pattern is not calm and controlled like the movement of the mouse; the eyes dart rapidly from spot to spot, and keeping your eyes fixed on a specific point is unnatural and can even be a strain. And it can be difficult to point at a blank area of the screen, because the eye is normally attracted to information-carrying features.
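
A common workaround for the missing "eye button" is dwell time: a target is activated only after the gaze has rested on it for a fixed period. A minimal sketch, with the dwell threshold and sampling interval chosen arbitrarily:

    # Minimal sketch of dwell-time selection: a target is "clicked" only after
    # the gaze has rested on it continuously for dwell_ms. The threshold and
    # the gaze trace are assumptions for illustration.

    class DwellSelector:
        def __init__(self, dwell_ms=800):
            self.dwell_ms = dwell_ms
            self.current = None      # target currently being looked at
            self.elapsed = 0

        def update(self, target, dt_ms):
            """Feed the target under the gaze every dt_ms; returns a target
            name when a selection fires, otherwise None."""
            if target != self.current:
                self.current, self.elapsed = target, 0
                return None
            self.elapsed += dt_ms
            if target is not None and self.elapsed >= self.dwell_ms:
                self.elapsed = 0
                return target
            return None

    if __name__ == "__main__":
        selector = DwellSelector(dwell_ms=600)
        gaze_trace = ["menu", "menu", "menu", None, "ok", "ok", "ok", "ok"]
        for hit in gaze_trace:                 # one sample every 200 ms
            fired = selector.update(hit, dt_ms=200)
            if fired:
                print("selected:", fired)
        # The brief glance at "menu" never reaches the dwell threshold;
        # only the longer dwell on "ok" triggers a selection.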

Although there are still some problems with the correct interpretation of the eye movement, it will become part of the me-centric devices to assist users in what they are trying to do.

Scanners and Digital Cameras

Scanners and digital cameras have become the input devices of choice for images and pictures. While images were drawn pixel by pixel on the computer screen in the early days of computing, most images we see on the computer today are direct representations of how we see the world. While scanners have been common for over a decade now, digital cameras have become so affordable that many people own at least one.

While few people use their more advanced features, scanners and digital cameras have lots of input parameters that can be tweaked to modify the resulting images. The scanner, for example, enables you to scan only certain colors or regions of a given image. The camera enables you to focus on certain aspects of the image. Figure 3.2 shows a digital photo of a tropical flower taken in Malaysia.

Figure 3.2. Picture taken by Digital Camera

graphics/03fig02.gif

While photography has been a hobby for many people for over a century now, there are many who complain that they can't take quality pictures or that their images always look funny. Many would ascribe these failures to shooting technique or digital technology, but most models of digital cameras are designed to obtain adequate image quality with relatively simple operation. The problem is that today's digital cameras do not provide agents that take pictures on behalf of the user; the digital camera just replaces the chemical film with a memory card. The images still need to be taken by the human, and the quality of the photo depends on how good the user is.

Gesture

Adam Kendon explained the difference between gestures and speech quite well: "Gesture has properties different from speech. In particular, it employs space as well as time in the creation of expressive forms, whereas speech can use only time. Therefore, the way information may be preserved in speech, as compared to gesture, tends to be very different." [10]

[10] http://www.univie.ac.at/Wissenschaftstheorie/srb/srb/gesture.html

Gesture recognition is human interaction with a computer in which human gestures, usually hand or body motions, are recognized by the computer. Recognizing gestures as input might make computers more accessible for the physically impaired and make interaction more natural for young children. It could also provide a more expressive and nuanced communication with a computer.

The first trainable gesture recognizer was developed in 1964. More recently, a prototype motion processor developed by Toshiba [11] allows a computer to recognize hand motions and to display them in real time on the computer's display. Proposed applications include word processing with hand sign language input, games and other entertainment, and educational approaches in which hand motion could result in multimedia effects.

[11] http://www.toshiba.com/

Toshiba's motion processor works by emitting an infrared transmission light near the hand area and "reading" the light reflected back from the hand. Reflections from areas beyond the hand don't occur because the light is quickly dissipated over distance. The reflected light allows the computer to continuously build a 3D motion image of the hand, which can be displayed or not.

Research in gesture recognition can be divided into four areas: hand gestures, body gestures, mouse-based gestures, and sign-language recognition. Gesture recognition is done through various types of sensors.

There are many applications for sensor-based gesture recognition. Navigation commands come to mind, where the user can gesture toward a desired direction. Applications in medicine and industry include precise remote control of surgical robots, or of robots and other machinery in outer space and other locations too dangerous for humans to access. Applications can even be enabled to enhance communications for people who are both deaf and mute: intercommunication among themselves and communication with hearing persons.

Imagine a 3D application where a user can move and rotate objects simply by moving and rotating his or her hand. The user can get more than six degrees of freedom simultaneously (X-, Y-, Z-rotation, X-, Y-, Z-translation, and additional degrees from the hand-form, e.g., how far the thumb is spread from the hand). The user gets the impression of an easy and intuitive interface, because she or he can change tools or give commands by showing another hand gesture. For example, moving and rotating is enabled when the hand is open, selecting is enabled when the index finger is shown, zooming when both the index finger and the thumb are shown. Many dialog states can be saved in this way.
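
Such mode switching boils down to a lookup from a recognized hand pose to an interaction mode. The sketch below mirrors the example just given; the pose names, the scene representation, and the tracker data are assumptions for illustration.

    # Sketch of mapping recognized hand poses to interaction modes, mirroring
    # the example above. Pose names and the scene data are assumptions.

    POSE_TO_MODE = {
        "open_hand": "move_and_rotate",
        "index_finger": "select",
        "index_and_thumb": "zoom",
    }

    def apply_gesture(scene, pose, translation, rotation):
        """Dispatch one tracked hand frame to the current interaction mode."""
        mode = POSE_TO_MODE.get(pose)
        if mode == "move_and_rotate":
            scene["position"] = [p + t for p, t in zip(scene["position"], translation)]
            scene["rotation"] = [r + d for r, d in zip(scene["rotation"], rotation)]
        elif mode == "select":
            scene["selected"] = True
        elif mode == "zoom":
            scene["zoom"] *= 1.0 + translation[2] * 0.01   # push/pull to zoom
        return scene

    if __name__ == "__main__":
        scene = {"position": [0, 0, 0], "rotation": [0, 0, 0],
                 "selected": False, "zoom": 1.0}
        scene = apply_gesture(scene, "open_hand", [1, 0, 0], [0, 15, 0])
        scene = apply_gesture(scene, "index_and_thumb", [0, 0, 10], [0, 0, 0])
        print(scene)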

The technology also has the potential to change the way users interact with computers by eliminating input devices such as joysticks, mice, and keyboards, and allowing the unencumbered body to give signals to the computer through gestures such as finger pointing.

Unlike touch-sensitive interfaces, gesture recognition does not require the user to wear any special equipment or attach any devices to the body. The gestures of the body are read by a camera instead of by sensors attached to a device such as a data glove. The device, for example, needs to decode facial expressions as displayed in Figure 3.3.

Figure 3.3. Example Gestures

graphics/03fig03.jpg

Haptics is the science of applying tactile sensation to human interaction with computers. A haptic device is one that involves physical contact between the computer and the user, usually through an input/output device, such as a joystick or data glove, that senses the body's movements. By using haptic devices, the user not only can feed information to the computer but also can receive information from the computer in the form of a felt sensation on some part of the body. This is referred to as a haptic interface. For example, in a virtual reality environment, a user can pick up a virtual tennis ball using a data glove. The computer senses the movement and moves the virtual ball on the display. However, because of the nature of a haptic interface, the user will feel the tennis ball in his hand through tactile sensations that the computer sends through the data glove, mimicking the feel of the tennis ball in the user's hand.
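
The sense-and-feel cycle of a haptic interface can be sketched as a loop that reads the glove position, checks for contact with the virtual ball, and sends a force back when the hand penetrates it. The spring-style contact model and all constants below are assumptions for illustration, not a description of any actual device.

    # Sketch of a haptic loop for the virtual-ball example: read the hand
    # position, check for contact with the ball, and send back a spring-like
    # force. Contact model and constants are illustrative assumptions.
    import math

    BALL_CENTER = (0.0, 0.0, 0.0)
    BALL_RADIUS = 0.05          # metres
    STIFFNESS = 300.0           # N/m, toy value

    def contact_force(hand_pos):
        """Force pushing the hand out of the ball; zero when not touching."""
        dx = [h - c for h, c in zip(hand_pos, BALL_CENTER)]
        dist = math.sqrt(sum(d * d for d in dx))
        if dist >= BALL_RADIUS or dist == 0.0:
            return (0.0, 0.0, 0.0)
        depth = BALL_RADIUS - dist
        return tuple(STIFFNESS * depth * d / dist for d in dx)

    def haptic_step(read_glove, send_force):
        """One iteration of the loop, normally run at a high, fixed rate."""
        hand = read_glove()
        send_force(contact_force(hand))

    if __name__ == "__main__":
        haptic_step(lambda: (0.0, 0.0, 0.04),        # hand 1 cm inside the ball
                    lambda f: print("force to glove:", f))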

Mouse-based gestures have become of interest lately. They allow users to send commands to the system without needing to click through a navigation structure or select from a list. Depending on the context, a certain mouse movement will execute a command, which reduces the number of steps needed to reach the goal. One piece of software that uses this technology is the Opera [12] browser. For example, holding the secondary mouse button down while sliding the mouse downward will open a new window. Pen-based gestures, which fall into the same category as mouse-based gestures, are widely used on PDAs, for example. A gesture-based text editor using proofreading symbols was developed at CMU [13] by Michael Coleman as early as 1969. Gesture recognition has been used in commercial CAD systems since the 1970s and came to universal notice with the Apple Newton in 1992.

[12] http://www.opera.com/

[13] http://www.cmu.edu/
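
A single-stroke mouse gesture of this kind needs very little machinery: while the secondary button is held, accumulate the pointer's displacement and classify the dominant direction on release. The gesture-to-command mapping below is an assumption modeled on the Opera example above, not that browser's actual implementation.

    # Sketch of recognizing a simple one-stroke mouse gesture: while the
    # secondary button is held, accumulate movement, then classify the
    # dominant direction on release. The mapping is an illustrative assumption.

    GESTURE_COMMANDS = {"down": "open new window", "left": "go back",
                        "right": "go forward", "up": "scroll to top"}

    def classify_stroke(dx, dy, min_distance=40):
        """Return the dominant direction of the stroke, or None if too short."""
        if abs(dx) < min_distance and abs(dy) < min_distance:
            return None
        if abs(dx) >= abs(dy):
            return "right" if dx > 0 else "left"
        return "down" if dy > 0 else "up"        # screen y grows downward

    def handle_gesture(moves):
        """moves: list of (dx, dy) deltas recorded while the button was held."""
        dx = sum(m[0] for m in moves)
        dy = sum(m[1] for m in moves)
        stroke = classify_stroke(dx, dy)
        return GESTURE_COMMANDS.get(stroke, "no gesture")

    if __name__ == "__main__":
        print(handle_gesture([(0, 20), (2, 30), (-1, 25)]))   # open new window
        print(handle_gesture([(3, 2), (1, 1)]))               # no gesture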

Gestures are already widely used in touch-panel displays, where fingers are used, and on PDAs, both for picking with a pen and for "writing" with Graffiti on the Palm. It is our belief that pointing and activating with fingers in the air are likely to become very common. Wireless handheld mice are already used today by speakers to control presentations in front of an audience.

It is important to note that gestures may be specific to a certain culture. This means that it is important to understand who is actually making the gesture and in which cultural context; otherwise, it may lead to the kinds of miscommunication that are not unknown in human communication today. Gestures matter because they are part of non-verbal communication, which is important to humans. Non-verbal communication is more than gestures; it includes processes without the use of language proper, e.g., body movements and smells, but also such extralinguistic features of speech as intonation, speed, and pause. Non-verbal communication is expressive and manifest, as opposed to being about something outside the communicator; it tends to provide the context of verbal communication and has the power to disambiguate (but also to invalidate) the content of linguistic expressions.

Multimodal

Most computing interfaces still use only pointing and typing. Some systems, such as interactive voice response (IVR) systems, use a combination of speech and typing (e.g., for mobile banking); others use pointing and speech; but very few use gestures in combination with other interfacing modes. Few systems pay any attention to what we do with our bodies (e.g., position, pose, orientation, and gaze). So there is room for improvement.

In many cases, when people speak about multimodal systems, they speak about systems that can use only one modality at a time. In some cases, the different modalities are used to adapt to hardware configurations (e.g., using a PC-based Web browser or a mobile phone). In some cases, the user is simply given a choice, such as speech or typing.

In some cases, a multimodal system accepts input from the user through different modes at the same time or nearly simultaneously. This could be, for example, two-handed input, which combines actions of the two hands to trigger a specific task. The most obvious combination for multimodal interfaces is speech and hand gestures, because this mimics the way humans delegate tasks to one another.
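
The classic illustration is a command like "move that to there," which becomes complete only when paired with pointing gestures that arrive at roughly the same time. A minimal sketch of such time-window fusion follows; the window length, event format, and resolution rule are assumptions.

    # Minimal sketch of multimodal fusion: a spoken command such as "move that
    # to there" is resolved by pairing it with pointing gestures that arrived
    # within a small time window. The window length is an assumption.

    def fuse(speech_events, pointing_events, window_s=1.0):
        """speech_events: list of (time, utterance); pointing_events: list of
        (time, target). Returns resolved commands."""
        commands = []
        for s_time, utterance in speech_events:
            # Gestures close enough in time to this utterance, oldest first.
            nearby = sorted(p for p in pointing_events
                            if abs(p[0] - s_time) <= window_s)
            resolved = utterance
            for _, target in nearby:
                if "that" in resolved:
                    resolved = resolved.replace("that", target, 1)
                elif "there" in resolved:
                    resolved = resolved.replace("there", target, 1)
            commands.append(resolved)
        return commands

    if __name__ == "__main__":
        speech = [(10.2, "move that to there")]
        pointing = [(10.0, "the blue block"), (10.8, "the top shelf"),
                    (20.0, "the door")]        # the last gesture is too late
        print(fuse(speech, pointing))   # ['move the blue block to the top shelf']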

Much research on multimodal input has been done, but little is really being used so far. It is expected that future devices and systems will become multimodal to make it easier for people to use. The device will adapt to the needs of the user and not the other way round.

3.2.4 Input Issues

Although many input techniques have been in operation for decades, there are still some issues that need to be taken into consideration when building human-computer interfaces. While the keyboard provides very exact input, its main issue is that it requires a lot of keystrokes to get things done. Using a command line interpreter to enter commands, for example, is a very effective tool for experienced users who can type very fast. Mice, joysticks, and trackballs are very good at selecting and clicking commands that appear on screen. The only issue with these input devices is exact positioning. If you miss the command on screen (e.g., because the ball inside the mouse is dirty and does not allow exact positioning) and you click on something else instead, you may be doing the exact opposite of what you wanted. Critical commands should be reconfirmed before they are executed.

Speech input unfortunately does not work perfectly yet and can lead to errors in communication. Just imagine that you wanted to go to Paris and your car navigation system responded with "I don't understand." But what did the computer not understand? Was it the pronunciation? The usage? The logical thread? Humans react to this message the same way they would in a conversation, with resentment and irritation. They raise their voices and sound out words as if they were speaking to a child. Their voices become stressed. They change their pitch and may even swear. As a result, the program is even more confused. Even worse is a car navigation system that tries to guess when it does not understand; this creates even more confusion.

The crux of the problem is that vehicles, unlike desktop PCs, are subjected to a wide variety of noises that can confuse software-based speech recognizers. The systems have to worry about more than just the noise generated by the vehicle. There are many different sources of noise. The road, wind, defroster, fan, radio, windshield wipers, and backseat occupants are just a few.

Me-centric devices will have similar problems as car navigation systems. They will be used in all sorts of environments, such as at the airport, in hotels, at the swimming pool, on the road, on the plane, or anywhere else. With the introduction of me-centric devices, it is important to give the user as much help as possible, which might mean building another database (this time of the most common errors). Even more important is to support the user who gave an unclear instruction by proposing specific solutions to the user's perceived problem. For example, the computer could ask, "Do you want Paris or Brussels?" This does more than locate the problem as a pronunciation issue; it reassures the user that the program is intelligent enough to understand the situation and is willing to help the speaker solve it, which in turn makes users more disposed to working with the program.
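
This behavior can be reduced to a simple rule over the recognizer's ranked hypotheses: act when the top candidate is clearly ahead, otherwise ask a narrowing question built from the best few candidates. The scores and thresholds below are illustrative assumptions.

    # Sketch of turning "I don't understand" into a narrowing question: act on
    # the top hypothesis only when it is clearly ahead, otherwise offer the
    # best two candidates. Thresholds and scores are illustrative assumptions.

    def respond(hypotheses, confident=0.75, margin=0.2):
        """hypotheses: list of (destination, confidence), best first."""
        if not hypotheses:
            return "Please repeat the destination."
        best = hypotheses[0]
        second = hypotheses[1] if len(hypotheses) > 1 else None
        clearly_ahead = second is None or best[1] - second[1] >= margin
        if best[1] >= confident and clearly_ahead:
            return f"Setting route to {best[0]}."
        if second:
            return f"Do you want {best[0]} or {second[0]}?"
        return f"Did you say {best[0]}?"

    if __name__ == "__main__":
        print(respond([("Paris", 0.48), ("Brussels", 0.44)]))  # narrowing question
        print(respond([("Paris", 0.92), ("Berlin", 0.05)]))    # act directly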

Touchscreens are good for simple commands but not suitable for complex input, as it would take too long to key them into a touchscreen. Therefore, one should make sure that the application or service that runs on a touchscreen does not require huge amounts of input.

An important issue when designing interfaces based on eye-gaze control is exactly how to use the gaze direction. The point of regard in a display can be used as it is, for positioning an invisible mouse pointer perhaps, and using this to select from some sort of menu-based system. But it can also be processed further, using the knowledge of the connection between eye-gaze and interest.

One of the biggest challenges with scanners and digital cameras is the correct use of colors. While this seems trivial, it requires considerable technology to make a given image appear on every computer screen in exactly the same way. It becomes even more difficult if you want to print out an image: it is common for an image to look great on the screen, but not so great on paper.

When designing gesture-controlled interfaces, it is important to understand that gestures are culture-dependent. Therefore, the cultural background of the target group for a given service or application needs to be analyzed carefully, and it must be ensured that other gestures users may make will not interfere with the system.

For multimodal input, it is important to find ways to coordinate the input. It is also important to find out if input from different modes is contradictory; for example, you say "no" and nod your head. In most countries, nodding means "yes," but there are some countries where nodding means "no." In cases where the modes are contradictory, the system should verify the input by reconfirming what it understood.
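
The "no plus nod" case can be handled with a small consistency check before any command is executed. The culture table and the confirmation wording below are illustrative assumptions only.

    # Sketch of a cross-modal consistency check: interpret the head gesture in
    # its cultural context, compare it with the spoken answer, and reconfirm
    # on conflict. Culture table and confirmation text are assumptions.

    NOD_MEANING = {"US": "yes", "DE": "yes", "BG": "no"}    # illustrative only
    SHAKE_MEANING = {"US": "no", "DE": "no", "BG": "yes"}

    def interpret(speech, head_gesture, culture="US"):
        gesture_meaning = (NOD_MEANING if head_gesture == "nod"
                           else SHAKE_MEANING).get(culture)
        if gesture_meaning == speech:
            return speech                             # modes agree: proceed
        return f"confirm: did you mean '{speech}'?"   # modes conflict: reconfirm

    if __name__ == "__main__":
        print(interpret("no", "nod", culture="US"))   # conflict -> reconfirm
        print(interpret("no", "nod", culture="BG"))   # agreement -> "no"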


