Aspects of Research

Research for a speech-recognition system is not done in a vacuum; there is always a goal on which to focus as we gain more information. Part of that goal is an easy-to-use application whose terminology is appropriate to its audience. A helpful technique is to perform a morphological analysis: examine the problem at hand by looking at similar problems and relating what those solutions teach us to our own. Particularly in a speech environment, it's important to look at these situations carefully and understand that what clients say they want and what they actually want the speech system to do may be two different things. We also must examine the limitations that may constrain the system by performing feasibility and risk assessments.

It's important to remember that when you finish the research phase, you should have a full understanding of the requirements of the system. At that moment the designer and the client will agree on what the completed system will comprise. This will all be documented in a Requirements Specification.

Usability

The most successful products are usually those that are designed for the needs of a specific audience. Apple Computer's original Macintosh computer is a great example. Because the Mac was intended to be "a computer for the rest of us," its developers made it as simple to learn and use as possible. To ensure that you create a system that works for its intended audience, you also need to research the callers' level of knowledge and sophistication and how they will use the product. Designers who intimately understand the people who will use their product create the most usable systems.

Jargon/Terminology

One of the most vital aspects of designing a speech-recognition system is learning how people talk about their tasks. For example, did you know that flight attendants almost never "book" flights when they want to travel? Instead, they "list" a flight. To them, booking means paying for a flight, and free airfare is one of the great perks of working for an airline. So it wouldn't be appropriate to design an application serving flight attendants that asks, "Would you like to book that flight?" Many of them might hang up, thinking they would have to pay for tickets. If designers of such a system had only read an overly simplified Requirements Specification (perhaps presented to the designer by the client) outlining the functionality of the system, but had not researched the calling population themselves, they probably wouldn't have designed the right question (or prompt).

Morphological Analysis

One of the most common ways designers solve problems is by drawing parallels to previous problems or real-life experiences. This approach can be particularly useful when designing a speech-recognition system due to the relatively few deployments of speech systems (in comparison to graphical user interfaces). By performing morphological analyses, the designer compares similar design solutions and how they relate to one another, and can analyze that design space in a structured way.

To perform a morphological analysis literally means to analyze the structure of something, but here the definition is expanded to include the analysis of features as well. For example, if we were asked to design a new VCR, we might look at many of the popular (and not so popular) VCRs and other similar types of video playback/recording systems (like TiVo and DVD players) sold today, and compare them to each other. We would create a grid, listing all the devices on one axis and their features on the other. By placing marks in the grid to specify which models have which features, we could begin to see trends forming. For example, we would learn that all video delivery methods have a play function, but very few of them have one-touch recording of live TV. Table 4.1 illustrates what the beginning of a morphological chart for VCRs and other video delivery systems might look like.
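The grid described above can be sketched in a few lines of Python. This is only an illustration: the marks in `GRID` are hypothetical placeholders chosen to match the trends mentioned in the text (every device plays, very few offer one-touch recording), not the actual marks from Table 4.1, and the function names and the 25 percent "rare" threshold are assumptions of this sketch.

```python
# Devices on one axis of the grid.
DEVICES = ["VCR Model 1", "VCR Model 2", "DVD Player 1", "DVD Player 2", "TiVo"]

# Features on the other axis. Each feature maps to the set of devices that
# offer it. These marks are assumed for illustration, not taken from the book.
GRID = {
    "Play/pause action": set(DEVICES),
    "Provide TV programming info": {"TiVo"},
    "Cable box control": {"VCR Model 2", "TiVo"},
    "One-touch recording of live TV": {"TiVo"},
}


def coverage(grid, devices):
    """Return {feature: fraction of the listed devices that have it}."""
    return {
        feature: len(owners & set(devices)) / len(devices)
        for feature, owners in grid.items()
    }


def universal_and_rare(grid, devices, rare_threshold=0.25):
    """Split features into those every device has and those few devices have."""
    cov = coverage(grid, devices)
    universal = [f for f, c in cov.items() if c == 1.0]
    rare = [f for f, c in cov.items() if 0 < c <= rare_threshold]
    return universal, rare


def render(grid, devices):
    """Render the grid as text: one row per feature, 'x' where a device has it."""
    width = max(len(feature) for feature in grid)
    rows = []
    for feature, owners in grid.items():
        marks = " ".join("x" if device in owners else "." for device in devices)
        rows.append(f"{feature.ljust(width)}  {marks}")
    return "\n".join(rows)
```

Calling `universal_and_rare(GRID, DEVICES)` surfaces the trends a designer would otherwise read off the chart by eye. What to conclude from a rare feature is still a design judgment that the code cannot make.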

Different designers may very well draw different ideas from these data. For example, one designer might conclude that because the one-touch recording feature is rare, having this feature on a new VCR would help differentiate it in the marketplace and generate sales. Another designer may think that if the feature is so rare, it might not be something that consumers want. I assume that very few designers would think that a consumer VCR shouldn't have a play/pause feature, but perhaps someone could make a compelling case for it. The point is, each designer may draw a different set of conclusions from a common set of data, and design a different solution as a result.

Table 4.1. A Morphological Analysis of Video Playback/Recording Systems

Video delivery systems (columns): VCR Model 1, VCR Model 2, DVD Player 1, DVD Player 2, TiVo

Features (rows):
Provide TV programming info
Play/pause action
Cable box control
One-touch recording of live TV

Sometimes it's helpful to expand the morphological analysis beyond the obvious comparisons. A person designing a VCR might learn something important, or become inspired, by performing a morphological analysis of audio cassette players or other related devices. For example, some cassette players have built-in headphone jacks with a separate volume control. Perhaps a designer would choose to market a VCR with its own headphone jack, so that people could connect their headphones directly to the VCR instead of to the TV (which often doesn't have a headphone jack) or instead of running the sound through an amplifier. This could be an attractive feature for people who want to watch videos without disturbing others. We might not think of potentially worthwhile design features for speech-recognition systems if we don't look to other systems and solutions for inspiration.

But how can we perform morphological analyses with speech systems? At the moment, there aren't as many types of speech-recognition systems in the world as there are VCRs, so it's not easy to compare them directly to each other. However, we are able to compare the various features of different types of systems that perform similar tasks.

If we wanted to design a telephone-based banking system using speech recognition, we could examine the features of different banking channels (phone, Web, live-teller, ATM, and so on). By comparing these different channels, we could gain an understanding about the kinds of features to put in a speech system. Table 4.2 shows how the beginning of a banking morphological chart might look.

This simple analysis might suggest that the speech-recognition system should perform all the tasks that the touchtone system can, as well as some others that can be automated using speech recognition. In this hypothetical example, we might know that people most often reorder checks because they have moved, rather than because they just want the next checks in their series. We might also discover that the touchtone system lacks this feature because collecting an address using only touchtone input is so complex. From this we could conclude that the speech-recognition system should have this feature, which would help offload the task from the bank representatives.

Table 4.2. A Morphological Analysis of Banking Features

Banking channels (columns): Branch, ATM, Web, Touchtone

Features (rows):
Talk to agent
Withdraw funds
Get account balances
Find a check
Reorder checks

Every speech-recognition system is a service channel, and it can provide many types of services. That's why it's important to know not only what services the system will provide, but also what services it will not provide. Besides helping to determine the scope and structure of the system, understanding its limitations lets you create a system that prevents caller confusion or frustration. For example, if a system provided a variety of information, such as weather, stocks, horoscopes, and breaking news stories, a caller might reasonably assume that the system would also provide related news-like information, such as traffic reports. If the system does not include that functionality, the design needs either to make sure that callers don't expect to find that information there, or to accommodate them by allowing them to ask for it and then letting them know that the system doesn't perform that function. When you understand the scope of the services to be provided, you can make sure the prompts and menu options are clear and specific, and that they help callers form an accurate mental model of the system.

What Clients Say versus What Clients Want

It is the rare client or caller who will tell the designer exactly what he or she wants in a speech-recognition system; most clients simply don't understand the technology and design issues well enough to know its capabilities and limitations, particularly since technologies and standards change over the course of months and years. Sometimes, however, people will say they require a particular feature that may not be feasible or even possible for the designer to provide. In these cases, the designer must try to extract the underlying need behind the request for that feature, and then consider the ramifications of excluding it or think about ways to design around it.

For example, if we were trying to design a flight information system, and we asked several people what they wanted to know about their flights, many of them might tell us that they want to know "when the plane is going to arrive." If we were to interpret this requirement literally, we might have the design provide the exact time of arrival: "Flight 1687 will arrive at 6:02 P.M." If, however, those people actually want to know whether the flight is on time or delayed, then we might adopt a broader interpretation and change the prompt to say "Flight 1687 is arriving on time, at 6:02 P.M." This prompt goes one step further by telling the caller that everything about the flight is going according to plan. The first prompt ("Flight 1687 will arrive at 6:02 P.M.") simply reports the time but leaves out the most important information: the status of the flight.

Another way to examine this is to think about the delayed flight. Let's say we were about to take a trip and we called the airline to check on our flight, which was scheduled to depart at 1:55 P.M. If the system response only said "Flight 16 is leaving at 4:45 P.M." (a nearly three-hour difference from our expectations), we might think we had received information about the wrong flight. But if the system replied, "Flight 16 is delayed and will be leaving at 4:45 P.M.," we would gain important additional information to clarify the situation.
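The difference between reporting a time and reporting a status can be made concrete with a small prompt builder. The sketch below is an assumption-laden illustration, not the design of any real airline system: the function names are hypothetical, times are compared within a single day, and a flight estimated at or before its scheduled time is simply treated as on time.

```python
from datetime import time


def spoken_time(t):
    """Format a time the way the prompts in the text read it, e.g. '6:02 P.M.'."""
    hour = t.hour % 12 or 12  # 0 and 12 o'clock are both spoken as 12
    suffix = "A.M." if t.hour < 12 else "P.M."
    return f"{hour}:{t.minute:02d} {suffix}"


def status_prompt(flight, event, scheduled, estimated):
    """Build a prompt that leads with the flight's status, then gives the time.

    `event` is the verb the caller cares about, such as 'arriving' or 'leaving'.
    """
    if estimated > scheduled:
        # Say "delayed" first so the caller does not suspect the wrong flight.
        return f"Flight {flight} is delayed and will be {event} at {spoken_time(estimated)}"
    return f"Flight {flight} is {event} on time, at {spoken_time(estimated)}"
```

With these assumptions, `status_prompt("16", "leaving", time(13, 55), time(16, 45))` produces the clarified prompt from the delayed-flight example, and an on-time arrival for Flight 1687 produces the broader prompt from the earlier example.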

An airline might also want to include the on-time information if its high on-time record is one of its key marketing messages and competitive differentiators. Good designers account for the functional considerations of an application as well as the marketing considerations.

As these examples illustrate, it's not enough to ask clients what information they want the system to provide; you must also understand their intention.

Feasibility Research

Although technology is advancing rapidly, all speech-recognition systems have limitations. These may be technical (the capabilities of the speech-recognition engine are limited), time-related (the design schedule may be extremely tight), resource-related (the client database experts are too busy), or language-related (the application is multilingual and simple translation won't ensure a usable system). Before designing the application, the designer needs to understand those inevitable limitations and their implications.

But understanding the limitations is only one aspect of a larger issue: feasibility. A particular design feature may be desirable to the client or callers, but it may not be feasible because of cost, complexity, resource issues, schedule deadlines, or performance requirements. In such cases, you must work with the client to develop a more feasible alternative, or determine whether the feature is essential to the success of the application.

Designers must know the limitations (particularly the technical ones) so they can design something that will actually work. That sounds obvious, and it is, but it is extremely important: designers who know the boundaries before they start brainstorming are better prepared to let their ideas flow. And of course, boundaries are not always absolute or even detrimental; some of the best ideas come when designers are confronted with barriers to overcome or work around.

Risk Assessment/Hazard Analysis

Many designs don't pose any risk to callers or clients, but if, for example, a system could mistakenly execute a stock purchase or sale (or worse, an options trade!), or if a system could reveal sensitive corporate or personal data, then appropriate security measures must be taken to ensure that certain errors do not occur. There are several ways to secure a system, but during the research phase designers must first determine what risks the system could pose to callers or clients so that these issues can be dealt with in the design.

A simple way to evaluate risk is to determine if a user error is recoverable or unrecoverable. A recoverable error is one that can be easily rectified. For example, if a system were to transfer funds incorrectly from one linked bank account to another (perhaps due to a recognition error), then the caller could simply move the funds back and rectify the situation. However, imagine what would happen if a system were to buy the wrong stock; the caller might not be able to recover from the error before the price of the stock dropped. This unrecoverable error could lead to lawsuits and potential financial ruin (for either the company or the caller!).
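One way to carry this distinction into a design is to tag each action with its recoverability during research and let that tag decide how much protection the action receives. The sketch below is a hypothetical policy, not a prescribed one: the `Action` type, the safeguard names, and the choice of which safeguards apply to which class are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    recoverable: bool  # can the caller easily undo a mistaken execution?


def safeguards(action):
    """Return the (assumed) protective steps to apply before executing an action."""
    steps = ["explicit confirmation"]
    if not action.recoverable:
        # Unrecoverable errors get stronger protection in this sketch.
        steps += ["no interrupting the confirmation prompt", "re-enter PIN or pass code"]
    return steps


# The two examples from the text, classified as the text classifies them.
transfer = Action("transfer between linked accounts", recoverable=True)
stock_buy = Action("buy stock", recoverable=False)
```

Here `safeguards(transfer)` returns a single step, while `safeguards(stock_buy)` returns all three. A real project would set this policy with the client after asking what the worst outcome of each action could be.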

By asking clients "What's the worst that could happen?" before the system is designed, you can build in the security features needed to prevent such a worst-case scenario or minimize its risk. Typical security features range from design practices to technologies. For example, after a caller logs in to a stock trading system using an account number and a PIN, we might assume that the caller is in control for the rest of the call and that there is no further need to verify their identity. However, many callers and service providers want to ensure that neither the system nor the caller accidentally executes the wrong trade, whether because of errors in the speech-recognition technology or because a willful intruder has taken control of the call.

Several steps can be taken to protect both parties. First, the system can explicitly confirm a trade and not allow a caller to interrupt the prompt while it's playing. Second, the caller can then either be required to enter or say his PIN number again, or enter or say a separate trading pass code to confirm the trade. Sometimes a caller might become familiar with the touchtone equivalents for some of the commands. Typically a system would assign the 1-key to indicate a confirmation, and the 2-key to indicate a negation (as in "Is that correct? Say 'Yes' or press 1 or 'No' or press 2"). However, it would be preferable to assign different keys, such as the 7-key as the confirmation key and the #-key as the negation in an attempt to prevent people from accidentally pressing the 1-key when they mean to press the 2-key, and vice versa.

And last, there is a technology known as voice authentication or speaker verification, the best of which enables a system to verify simultaneously that the person is saying a particular pass phrase and that the biometrics of her voice match her previously registered voice print. And of course, all these methods could be used together for even more security, but that's not always necessary.
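The first two steps can be sketched as a small confirmation handler. This is a minimal illustration under stated assumptions: the function names are hypothetical, the reply is assumed to arrive as recognized text or a single keypress, and preventing the caller from interrupting the prompt is left to the telephony platform rather than shown here. The key assignments follow the 7-key and #-key suggestion above.

```python
import hmac

# Keys that do not sit next to each other on the keypad, per the suggestion
# in the text, so a slip of the finger is less likely to flip the answer.
CONFIRM_KEY = "7"
NEGATE_KEY = "#"


def interpret_reply(reply):
    """Map a spoken word or keypress to True (confirm), False (negate), or None."""
    normalized = reply.strip().lower()
    if normalized in {"yes", CONFIRM_KEY}:
        return True
    if normalized in {"no", NEGATE_KEY}:
        return False
    return None  # unrecognized input: the system should re-prompt, not guess


def confirm_trade(reply, entered_code, expected_code):
    """Allow the trade only on an explicit confirmation plus a matching pass code."""
    confirmed = interpret_reply(reply) is True
    # compare_digest compares in constant time; plain equality would also work
    # for this sketch.
    code_ok = hmac.compare_digest(entered_code.encode(), expected_code.encode())
    return confirmed and code_ok
```

Note that pressing the 1-key returns `None` here, so a caller who falls back on a habit from other systems is asked again instead of confirming a trade by accident.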



The Art and Business of Speech Recognition: Creating the Noble Voice
ISBN: 0321154924
EAN: 2147483647
Year: 2005
Pages: 105
Authors: Blade Kotelly
