User Interfaces and Human-Computer Interaction

In addition to developments in core computing and networking technologies such as grid computing and power-line networking, and advances in devices and sensors, changes are underway in the way we interact with computers. For a long time, we have relied upon the keyboard, the mouse, the graphical user interface, and the general desktop metaphor. New advances in human-computer interaction will enable faster and more effective forms of interaction that incorporate speech recognition, natural language processing, artificial intelligence, and visual interfaces that go beyond the traditional desktop metaphor.

Artificial Intelligence

The work on artificial intelligence has historically focused on attempting to make computers think like humans, giving them the ability to learn, reason, and create. One of the best-known examples from science fiction is the movie 2001: A Space Odyssey, which portrayed HAL, an artificially intelligent computer able to see and hear, talk, reason, plan, and even lip-read. While the industry has not succeeded in creating a computer with the all-around capabilities envisioned in HAL, there have been many successes in focused areas where computers have been able to apply their massive processing capability. For example, IBM's Deep Blue computer system was designed to play chess at the grandmaster level and indeed defeated the reigning World Chess Champion, Garry Kasparov, in May 1997. Deep Blue, a massively parallel RS/6000 SP system, could examine and evaluate up to 200 million chess positions per second. On the IBM Research Web site, the company points out that while Deep Blue is highly effective at solving chess problems, it is not a "learning system" and has limited intelligence compared to human beings.

Even though computers cannot yet think in the same manner as the human brain or apply common sense and language understanding, their own strengths and characteristics can be applied to solve a variety of everyday tasks. In this way, artificial intelligence is increasingly being applied in a number of business scenarios to improve productivity and decision making. A well-known example is the Microsoft Office Assistant, the animated character that offers help when performing various tasks within programs such as Microsoft Word. It can aid in troubleshooting by asking a series of questions related to the task at hand. Microsoft Windows XP adds further artificial intelligence capabilities, such as the Search Companion tool that assists with searches on the computer, the home or office network, or the Internet. More recent examples can be found in some of the projects that Microsoft Research is conducting. The group is applying artificial intelligence to a variety of productivity applications, including improved search capabilities, email filtering and prioritization, system troubleshooting, meeting facilitation, data mining, multimodal interfaces, and notification platforms.

Artificial intelligence can be applied to improve worker productivity when using email and other applications by helping to prioritize the most important messages and tasks. Software can look at various aspects of an email message, such as the subject line and body text, the relationship between the sender and the worker in terms of the company organization chart, and the history of communications between these individuals, including response times, in order to determine how important a communication may be and the potential costs of a delayed response. Microsoft's Priorities system does just this. The goal is to help users get the right information at the right time on the right device. The Priorities system is part of a larger project named the Notification Platform. This platform, being developed by Microsoft Research, is part of its Attentional User Interface (AUI) project, which focuses on attention- and context-sensitive costs and benefits of information and services.

The Priorities system goes well beyond analysis of an email message when determining a priority and deciding when to alert an end user to the arrival of an incoming message. It uses a number of HAL-like techniques to determine the user's context and readiness to be alerted to an inbound priority message. Beyond email, the inbound message could be a telephone call, an instant message, or an information feed. The system visually observes the user's activity, listens to the surrounding sounds, checks the user's calendar, and makes decisions as to the appropriate timing and manner in which to deliver information. In observing the user's activity, the system uses a Bayesian vision system to determine the context of activity. Thomas Bayes was an 18th-century English mathematician who established a mathematical basis for probability inference; Bayesian systems are based upon his theories of statistical probability. The context of the user's activity can be determined by observing where the user's attention is focused. If the user is looking constantly at the computer screen, then it is likely he or she is working on some solo activity on the computer. If the user is away from the screen and the system observes several faces in the room, then it is likely the user is in some form of meeting and does not want to be disturbed. By adding the user's calendar and audio information to this analysis, the system is able to make even more informed decisions.

Should these systems become commercialized in the future, the challenge will be to ensure user confidence in the privacy of the information observed. Employees may well fear that this type of information could be used to report back to management on their behavior and general productivity.
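To make the prioritization idea concrete, the short Python sketch below scores a message by combining a few of the signals mentioned above (sender relationship, reply history, deadline wording) into a probability of urgency and an expected cost of delaying the alert. It is a toy model in the spirit of the approach, not Microsoft's Priorities system; the features, weights, and cost figure are invented for illustration.

import math
from dataclasses import dataclass

# Toy expected-cost triage model; feature names, weights, and costs are
# hypothetical, not those of the Priorities system.

@dataclass
class Message:
    subject: str
    sender_is_manager: bool   # organizational relationship to the recipient
    past_reply_rate: float    # fraction of this sender's mail the user answers
    deadline_keywords: int    # count of words such as "today", "urgent", "EOD"

def urgency_probability(msg: Message) -> float:
    # Simple logistic combination of features; a real system would learn these
    # weights from labeled mail (e.g., with naive Bayes or logistic regression).
    score = (-2.0
             + 1.5 * msg.sender_is_manager
             + 2.0 * msg.past_reply_rate
             + 0.8 * msg.deadline_keywords)
    return 1.0 / (1.0 + math.exp(-score))

def expected_cost_of_delay(msg: Message, cost_if_urgent: float = 10.0) -> float:
    # Expected cost (arbitrary units) of deferring the alert by one review cycle.
    return urgency_probability(msg) * cost_if_urgent

msg = Message("Budget due today", sender_is_manager=True,
              past_reply_rate=0.9, deadline_keywords=2)
print(urgency_probability(msg), expected_cost_of_delay(msg))

A fuller Bayesian treatment would replace the hand-set weights with probabilities learned from the user's past behavior, and would weigh the expected cost of an interruption against the expected cost of the delay before deciding whether to alert the user.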

Speech Technology

Speech technology encompasses a number of disciplines focused on spoken language. It can include speech recognition, speech synthesis (text-to-speech), speaker identification, and multimodal technologies that combine speech with other forms of user interaction in order to enhance the computing environment for end users. Multimodal technologies are particularly interesting because they can help make devices more usable. For example, personal digital assistants are useful for highly mobile employees due to their portability and wireless connectivity, but they are often difficult to use for data entry: inputting large amounts of text is hard with handwriting recognition or the graphical software-based keyboard. In this situation, speech recognition can be applied to improve the input capabilities of these devices and make them much easier to use. When combined with natural language processing, speech recognition lets end users essentially talk to their devices and have them understand the intended meaning of their commands. Since mobile devices often lack the processing power to perform continuous speech recognition on their own, techniques such as distributed speech recognition can be applied: the audio signal is streamed over the wireless network to a server for processing and interpretation, and the resulting command is then executed back on the mobile device. This technique can be applied to general dictation as well as more advanced functionality such as meeting scheduling. Tasks such as scheduling can be made more accurate by combining modalities. For example, a user can verbally request a meeting while tapping certain areas of his or her calendar at the same time in order to give the computer more guidance.
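The following Python sketch illustrates the distributed speech recognition pattern just described: the device captures audio, ships it over the network to a recognition server, and executes the command that comes back. The server URL, the audio-capture stub, the response format, and the command set are all hypothetical placeholders.

import json
import urllib.request

SPEECH_SERVER = "http://speech-server.example/recognize"  # hypothetical endpoint

def capture_audio() -> bytes:
    # Stand-in for the device's microphone capture; returns raw audio bytes.
    return b"\x00" * 16000

def recognize_remotely(audio: bytes) -> str:
    # Upload the audio and return the recognized command text from the server.
    request = urllib.request.Request(
        SPEECH_SERVER, data=audio,
        headers={"Content-Type": "application/octet-stream"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["command"]   # e.g. "schedule meeting tuesday 3pm"

def execute_command(command: str) -> None:
    # Dispatch the recognized command locally on the device.
    if command.startswith("schedule meeting"):
        print("Opening calendar for:", command)
    else:
        print("Unrecognized command:", command)

execute_command(recognize_remotely(capture_audio()))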

Text-to-speech capabilities enable a number of useful scenarios as well. For example, individuals driving in their cars or using their cell phones can listen to email as it is read aloud by a text-to-speech converter. Depending on the device in use and the user's activity or context, different forms of interaction become the optimal mode of human-computer interaction.

An interesting extension of speech technology and multimodal technologies is that they can be decoupled from the actual device in use and become part of a computer-based service running on the network that can follow the users around as they move from their homes to their cars and to their offices. It becomes a virtual assistant that follows the user's context and can aid with a variety of tasks regardless of the current device being used.

Microsoft Research has a project called Dr. Who that is investigating these types of opportunities. The researchers see it becoming a Web service that specializes in a particular domain, such as scheduling, and that can be looped into human conversations in order to execute a task. For example, the service could be asked to find a certain type of restaurant within close proximity. One can also imagine location-based services being applied to automatically determine the user's current location. When location-based services, Web services, and multimodal user interaction techniques are combined, they open up powerful opportunities for computers to create new forms of value that improve productivity.
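As a rough illustration of that combination, the Python sketch below composes two hypothetical Web services: one that reports the user's current location and one that searches for restaurants near a point. The URLs, parameters, and response formats are invented; they are not the Dr. Who project's actual interfaces.

import json
import urllib.parse
import urllib.request

LOCATION_SERVICE = "http://location.example/whereis"   # hypothetical
DINING_SERVICE = "http://dining.example/search"        # hypothetical

def current_location(user_id: str) -> dict:
    # Ask the location service where the user is right now.
    url = LOCATION_SERVICE + "?" + urllib.parse.urlencode({"user": user_id})
    with urllib.request.urlopen(url) as response:
        return json.load(response)    # e.g. {"lat": 47.64, "lon": -122.13}

def nearby_restaurants(cuisine: str, lat: float, lon: float, radius_km: float = 2.0) -> list:
    # Query the domain-specific service for matches near the given coordinates.
    params = urllib.parse.urlencode(
        {"cuisine": cuisine, "lat": lat, "lon": lon, "radius_km": radius_km})
    with urllib.request.urlopen(DINING_SERVICE + "?" + params) as response:
        return json.load(response)["results"]

loc = current_location("alice")
for place in nearby_restaurants("thai", loc["lat"], loc["lon"]):
    print(place["name"], place["distance_km"])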

Visual Interfaces

A final area within the category of user interfaces and human-computer interaction is the visual interface itself. The screen is the primary way in which we communicate with the computer in our day-to-day activities, which typically involve reading, writing, managing and organizing content, interacting with others, and receiving notifications related to email, instant messaging, alerts, and appointments. One of the issues over the past several years has been the ever-increasing flow of notifications, which have the potential to distract us from our primary work. A constant flood of email alerts and other notifications throughout the day can be a large distraction for most knowledge workers.

Several major companies such as IBM and Microsoft, together with numerous startups, are currently looking into new ways to present information and maximize the quality of information conveyed while minimizing effort on the part of the end user. If smarter visual interfaces can be developed that can present more information more rapidly or in a more accessible manner, then they can have a significant effect on productivity.

One example is the Scope application from Microsoft Research. The Scope is designed to summarize a variety of notifications in a glanceable visualization tool. Figure 9-1 shows a sample of the interface. Notifications are grouped into alerts, inbox items such as emails, calendar items, and tasks. Objects near the center of the Scope are the higher-priority items. The notifications can be prioritized using the Microsoft Notification Platform and Priorities system discussed earlier; these systems apply artificial intelligence techniques to determine the relative priority of one task over another. Users can select and zoom in on notifications using the Scope and drill down into them to gain more details. As we adapt our work habits around computing, research projects such as the Scope have the potential to help us gain back some of the time and attention spent switching between activities and deciding on task priorities.

Figure 9-1. The "Scope" Glanceable Notification Summarizer from Microsoft Research. Source: Microsoft.

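A rough sketch of the layout idea: each notification's category selects a sector of the circle and its priority sets the distance from the center, so the most important items sit closest to the middle. The sector assignments and scaling below are illustrative only, not the Scope's actual layout algorithm.

import math

SECTORS = {"alerts": 0, "inbox": 1, "calendar": 2, "tasks": 3}   # one quadrant each

def place(category: str, priority: float, radius: float = 100.0) -> tuple:
    # priority is in [0, 1]; 1 is most important and lands nearest the center.
    angle = math.radians(SECTORS[category] * 90 + 45)   # middle of the quadrant
    distance = radius * (1.0 - priority)                # important items pull inward
    return (distance * math.cos(angle), distance * math.sin(angle))

print(place("inbox", priority=0.9))   # near the center of the inbox quadrant
print(place("tasks", priority=0.1))   # out toward the edge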

For many years, we have relied upon the desktop metaphor of Microsoft Windows and the Macintosh. These are two-dimensional metaphors for managing documents and applications. Microsoft Research is also looking into ways to apply three-dimensional graphics to increase productivity around information management and to make the desktop metaphor more intuitive. Its TaskGallery research prototype uses a three-dimensional office metaphor instead of a two-dimensional desktop. Objects can be placed on the walls, ceiling, or floor of this 3D space and can be ordered by depth. The TaskGallery also provides an interesting transition vehicle from 2D to 3D in terms of user adoption, because unmodified Windows applications can be brought into the environment.

Other technologies in the visual interface category include the Scopeware product from Mirror Worlds Technologies and the Star Tree product from Inxight. Scopeware is a knowledge management solution that locates and presents business information in more accessible formats for end users. One of the Scopeware products, Scopeware Mobile, provides mobile users with a "rolling window around now": a stream of their most relevant information that is updated in real time. The core platform behind Scopeware is the Information Management Infrastructure, or IMI, which aims to increase the value of information by making it more searchable and accessible. Inxight's products aid in unstructured data management by providing software for analyzing, organizing, categorizing, and navigating information. Its Star Tree product helps companies navigate and visualize large hierarchies of information. Figure 9-2 shows a sample Web site published as a Star Tree. Studies at Xerox PARC have shown the Star Tree technology to be 62 percent more effective than Windows tree controls when navigating collections of Web pages.

Figure 9-2. The "Star Tree" Viewer Technology from Inxight. Source: Inxight.

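To convey how a large hierarchy can be taken in at a glance, the Python sketch below lays out a tree radially: each node's depth sets its radius, and each subtree receives an angular wedge proportional to its number of leaves. This is a generic radial layout for illustration only, not Inxight's Star Tree algorithm, and the sample site structure is made up.

import math

def leaf_count(tree: dict) -> int:
    # A leaf is a name mapped to an empty dict.
    return 1 if not tree else sum(leaf_count(child) for child in tree.values())

def layout(tree: dict, start=0.0, end=2 * math.pi, depth=1, positions=None) -> dict:
    # Assign each node an (x, y) position: depth sets the radius, and each
    # subtree gets an angular wedge proportional to its number of leaves.
    if positions is None:
        positions = {}
    total = leaf_count(tree)
    angle = start
    for name, child in tree.items():
        span = (end - start) * leaf_count(child) / total
        mid = angle + span / 2
        positions[name] = (depth * math.cos(mid), depth * math.sin(mid))
        layout(child, angle, angle + span, depth + 1, positions)
        angle += span
    return positions

site = {"Home": {"Products": {"A": {}, "B": {}}, "Support": {"FAQ": {}}}}
for page, (x, y) in layout(site).items():
    print(page, round(x, 2), round(y, 2))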

While visual interfaces are often overlooked due to the dominance of operating system platforms such as Microsoft Windows and the Macintosh, it is important for businesses to stay tuned to these emerging developments and alternative solutions. While they may not replace well-established modes of visual interaction with Web pages and the traditional desktop, they can very well be applied to enhance the experience. In certain specialty applications, they can also be the optimal solution for navigating and visualizing large amounts of data, helping to turn that data into meaningful information that enables business understanding and business decisions.

This snapshot of future trends in computing has aimed to illustrate where some of the developments are actually occurring. There are obviously many other areas that are equally or even more important. The general theme is that the trends span the network layer, the hardware layer, and the software layer. Gradually, the computing options available to business users are becoming more flexible, more open, more intelligent, and more usable. The way in which we interact with computers, and the way in which our customers and business partners interact with computers, are changing. We are finally gaining the ability to deliver the right information to the right person at the right time, a capability that will be a key competitive advantage for businesses in the future.
