11.7 Pauses


Just as the white areas on this page enable you to discriminate letters and words and sentences, brief pauses play an important role in the perception of speech and language. If your goal is to create a natural-sounding, familiar interaction for your users, you must consider the placement and duration of pauses.

Consider the error or help prompt exemplified in (28). Assume that the prompt must be concatenated because the items in the list are generated dynamically.

(28)

Here's what you can say: | "mailbox," | "address book," | "sports," | "news," | or "weather."


The syntax of (28) requires a pause, as indicated by the colon, following "say." Additional pauses, as indicated by the commas separating the list items, will facilitate naturalness and comprehensibility. In other words, there is a pause at each concatenation break. The duration of these pauses varies according to a number of factors, including the rate of the speaker's delivery and the use of the utterance in context. A good first try might be 400 milliseconds (0.4 seconds) for the colon and 200 milliseconds (0.2 seconds) for each comma; increase or decrease from there, letting your ear be your guide.
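As a minimal sketch of how a dynamically generated list like (28) might be assembled, the following Python builds an ordered plan of recordings and silences. The function name build_list_prompt, the constants, and the representation of silence as a plain integer are assumptions made for illustration; they do not come from any particular platform. The 400 and 200 millisecond values are only the starting points suggested above.

```python
from typing import List, Union

# Starting values suggested in the text; expect to tune them by ear.
COLON_PAUSE_MS = 400
COMMA_PAUSE_MS = 200

# Hypothetical representation: a str names a recording, an int is silence in ms.
Segment = Union[str, int]


def build_list_prompt(intro: str, items: List[str]) -> List[Segment]:
    """Return recordings and silences, in playback order, for a list prompt."""
    plan: List[Segment] = [intro, COLON_PAUSE_MS]
    for index, item in enumerate(items):
        is_last = index == len(items) - 1
        if is_last and len(items) > 1:
            # "or" leads directly into the final item, with no silence between.
            plan.append("or")
        plan.append(item)
        if not is_last:
            plan.append(COMMA_PAUSE_MS)
    return plan


if __name__ == "__main__":
    options = ["mailbox", "address book", "sports", "news", "weather"]
    for segment in build_list_prompt("Here's what you can say", options):
        print(segment)
```

Because the durations live in two named constants, they can be changed and retested without touching any audio file.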

Example (29) illustrates the case of a recording ending in a period, or full stop, preceding another recording.

(29)

Sorry, I still didn't get that. Tell me your PIN one more time.


Depending again on rate of delivery as well as the desired effect, a period can vary considerably in duration, although it is generally longer than any commas in the vicinity. Here, for example, there's a comma between "Sorry" and "I." A good first try might be in the range of 300 to 400 milliseconds. This pause should probably be adjusted in the code rather than in the actual audio file so that it can easily be tested and adjusted later.

Here are some rough guidelines to ensure proper rhythm. Insert silence in the following cases:

  • After nonrecognition messages such as, "Sorry, I didn't catch that," "Let's try this a different way," or "OK, let's go back to the main menu" (assuming of course that these are followed by another recording)

  • After an audio file when it is desirable to suggest a comma, a colon, or a long dash at the end; for example, "Here's the number I heard: . . ." or "This is the account number I've got on file for you: . . ."

  • After a recording that is followed by text-to-speech (to ensure a less jarring hand-off of voices); for example, "The street address is . . ." [TTS]

  • Before a prompt if it is preceded by TTS (again, for a less jarring hand-off)

Do not insert silence in these cases:

  • Until you are sure you have identified all the places where the silence will occur in the call flow and are confident that the result will be appropriate

  • At the end of one file and again at the beginning of another file if there exists a point in the call flow where these two files abut
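The last caution, about two silences meeting where files abut, can be checked mechanically if silences are kept in the code rather than in the audio. The sketch below is one possible approach, not a prescribed one: it assumes a plan is a list in which strings name recordings and integers are milliseconds of silence, and it assumes that when two silences end up adjacent, keeping only the longer one is the desired outcome. The function name collapse_adjacent_silence is hypothetical.

```python
from typing import List, Union

# Hypothetical representation: a str names a recording, an int is silence in ms.
Segment = Union[str, int]


def collapse_adjacent_silence(plan: List[Segment]) -> List[Segment]:
    """Merge runs of consecutive silences, keeping the longest of each run."""
    collapsed: List[Segment] = []
    for segment in plan:
        previous_is_silence = bool(collapsed) and isinstance(collapsed[-1], int)
        if isinstance(segment, int) and previous_is_silence:
            # Assumption: the longer pause wins; the two are not added together.
            collapsed[-1] = max(collapsed[-1], segment)
        else:
            collapsed.append(segment)
    return collapsed


if __name__ == "__main__":
    # The first prompt ends with silence and the second begins with silence.
    doubled = ["Sorry, I didn't catch that", 400, 200, "Tell me your PIN"]
    print(collapse_adjacent_silence(doubled))
```

A check like this does not replace listening to the call flow; it only catches the doubled silences that are visible in the plan itself.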

Many of these guidelines come together in example (30), which consists of a sample e-mail header.

(30)

Here's your first new message. Friday, 9:45 a.m. Subject: [performance reviews]. Message from: [Mitzi Dalton Huntley].


Following is a concatenation plan to realize this example. We are using the dollar sign ($) to indicate "milliseconds of silence" and [square brackets] to indicate TTS recordings. Notice that careful punctuation in the header systematically corresponds to different durations of silence in playback.

Concatenation Plan

"Here's your first new message" + $400

"Friday" + $200

"nine" + "forty-five" + "a.m." + $400

"Subject:" + $200

[TTS]

$400 + "Message from:" + $200

[TTS]

Note that silence should not be inserted between the hour and the minutes. The colon used in times of day ("9:45") is one of the more unusual examples of punctuation that serves a purely graphical rather than prosodic function. Nor should any silence be inserted between the minutes and a.m. or p.m.
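The concatenation plan for (30) might be expressed in code along the following lines. This is a sketch under stated assumptions: the Recording, Silence, and TTS classes, the header_plan function, and the constant names are all invented here for illustration, and the caller is assumed to supply the names of prerecorded segments for the day, hour, minutes, and a.m. or p.m. The silence values are those of the plan above.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Recording:
    name: str  # name or text of a prerecorded audio segment


@dataclass(frozen=True)
class Silence:
    ms: int  # duration of inserted silence in milliseconds


@dataclass(frozen=True)
class TTS:
    text: str  # text to be rendered by the text-to-speech engine


Segment = Union[Recording, Silence, TTS]

# Durations taken from the concatenation plan; tune them by ear.
LONG_PAUSE_MS = 400
SHORT_PAUSE_MS = 200


def header_plan(day: str, hour: str, minutes: str, meridiem: str,
                subject: str, sender: str) -> List[Segment]:
    """Build the playback plan for a message header like example (30)."""
    return [
        Recording("Here's your first new message"), Silence(LONG_PAUSE_MS),
        Recording(day), Silence(SHORT_PAUSE_MS),
        # No silence between the hour, the minutes, and a.m. or p.m.
        Recording(hour), Recording(minutes), Recording(meridiem),
        Silence(LONG_PAUSE_MS),
        Recording("Subject:"), Silence(SHORT_PAUSE_MS),
        TTS(subject),
        Silence(LONG_PAUSE_MS), Recording("Message from:"),
        Silence(SHORT_PAUSE_MS),
        TTS(sender),
    ]


def total_silence_ms(plan: List[Segment]) -> int:
    """Sum the inserted silence, useful when comparing alternative timings."""
    return sum(segment.ms for segment in plan if isinstance(segment, Silence))


if __name__ == "__main__":
    plan = header_plan("Friday", "nine", "forty-five", "a.m.",
                       "performance reviews", "Mitzi Dalton Huntley")
    for segment in plan:
        print(segment)
    print("Total inserted silence (ms):", total_silence_ms(plan))
```

Keeping the three segment types distinct makes the hand-offs between recorded audio and TTS visible in the plan, which is where the guidelines above call for silence.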

Pauses can also communicate other kinds of higher-level information. Consider the messages in (31), (32), and (33).

(31)

Sure, sports! <pause> Oh, looks like that's not available at the moment.


(32)

Hold on while I get your reservation. <pause> Sorry, but there seems to be a technical problem.


(33)

Just a moment while I look that up. <pause> Actually, I'm going to have to transfer you to a customer service representative because there are more than nine people in your party.


In each case, if the second sentence immediately followed the first, without a sufficiently long pause, these messages would sound absurd. These silences are important because they help the listener reconcile the superficially incompatible relationship between the two sentences. The system response in (31) starts with the implication that the sports option is available. The content of the second sentence, however, contradicts the initial assumption. The function of the pause plus "oh" is to communicate a reasonable change in the speaker's knowledge, in this case concerning the availability of sports information. This creates the sense that the VUI's persona has only just now found out about the unavailability of the sports option and should therefore not be held at fault.

Such pauses are important because they comply with our expectations of the amount of time that elapses when people retrieve information for others in real life. Furthermore, phrases such as "hold on" and "just a moment," as in (32) and (33), induce the caller to take a temporary mental break. Without a sufficiently long pause following such phrases, the caller will be caught off-guard by the next utterance.

The pauses indicated in these examples should be dependent on the sentences that follow them so that they do not add to any latency inherent in the system. The duration of the pause should be relatively long; otherwise, the sequence will seem implausible. Depending on the general tempo of delivery, try one, one and a half, or two seconds. Again, the duration of such pauses should be adjusted in the code, for ease of testing and later adjustability.
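One possible reading of this advice, offered only as an assumption, is that the pause is stored with the follow-up sentence and shortened by whatever time the system has already spent on the lookup, so that the caller never waits longer than the intended pause plus any unavoidable excess. The sketch below illustrates that reading; the function name follow_up_plan, its parameters, and the 1,500 millisecond default are hypothetical choices within the range suggested above.

```python
from typing import List, Union

# Hypothetical representation: a str names a recording, an int is silence in ms.
Segment = Union[str, int]

# Middle of the suggested range (one to two seconds); adjust during testing.
DEFAULT_TARGET_PAUSE_MS = 1500


def follow_up_plan(follow_up: str, elapsed_latency_ms: int,
                   target_pause_ms: int = DEFAULT_TARGET_PAUSE_MS) -> List[Segment]:
    """Prefix the follow-up sentence with only the silence still needed."""
    # Assumption: time already spent waiting on the system counts toward the pause.
    remaining_ms = max(0, target_pause_ms - elapsed_latency_ms)
    plan: List[Segment] = []
    if remaining_ms > 0:
        plan.append(remaining_ms)
    plan.append(follow_up)
    return plan


if __name__ == "__main__":
    message = "Sorry, but there seems to be a technical problem."
    print(follow_up_plan(message, elapsed_latency_ms=600))
    print(follow_up_plan(message, elapsed_latency_ms=2000))
```

With this arrangement, a lookup that already took longer than the target pause adds no further silence before the follow-up sentence.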



Voice User Interface Design 2004
ISBN: 321185765
EAN: N/A
Year: 2005
Pages: 117