11.6 Minimizing Concatenation Splices


Because concatenation most often relies on splicing together what would otherwise be a smooth phonetic stream, it often creates a sense of phonetic disruption, irregularity, and artificiality, a problem that is separate from the problem of intonation. It is therefore important to plan concatenation breaks wisely and to be mindful of the phonetic flow of natural speech. In some cases, as you will see, certain concatenation breaks can and should be avoided altogether.

Example (20) illustrates a scenario in which a poorly devised concatenation plan calls attention to the unnaturalness of concatenation. (Recall that a vertical bar indicates a new sound file.)

(20)

This is the | [first second third . . . thirtieth] | saved message.


The plan in (20) reflects a popular but undesirable method of concatenation. The plan is as follows: "Just switch out the word that varies, and nothing more." This kind of thinking is rooted in a concern for economy and efficiency, and that is fine; but the pronunciation of "the" depends on the word that follows it, at least in Standard English. Before a word starting with a consonant, such as "seventh," we pronounce "the" with a schwa, the reduced-energy, centrally placed vowel in English ("uh"). Before words that start with a vowel, such as "eighth," "eleventh," and "eighteenth," however, the vowel in "the" is pronounced tensely, as in "thee." Notice that in Standard English these words also require the article "an" rather than "a" (for example, "an eighth").

Furthermore, for technical reasons that we cannot delve into here, the articles "the," "a," and "an" form a single phonological word with whatever word follows. Planning a concatenation break between "the" (or "an") and "eighth" would be phonologically similar to planning a break between "Se" and "attle" in "Seattle."

A superior concatenation unit would consist of "the" plus the ordinal: for example, "the seventh," "the eighth," "the ninth," and so on. Better still, the first concatenation unit could consist of bigger chunks, such as "This is the seventh," "This is the eighth," "This is the ninth," and so on. In addition to respecting the phonological word (for example, "the eighth"), this plan is also desirable because the fewer the concatenation units, the more natural the result.
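The recommended plan for (20) can be sketched in code. The following Python fragment is only an illustration under stated assumptions: the file names, the list `ORDINALS`, and the helpers `takes_tense_the` and `saved_message_plan` are hypothetical and do not come from any particular platform. It shows why a single recording of "the" cannot serve all thirty ordinals, and how folding "This is the" into each ordinal's recording removes the splice inside the phonological word.

```python
from typing import List

# Hypothetical sketch: names and file names are illustrative only.
ORDINALS = [
    "first", "second", "third", "fourth", "fifth", "sixth", "seventh",
    "eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth",
    "fourteenth", "fifteenth", "sixteenth", "seventeenth", "eighteenth",
    "nineteenth", "twentieth", "twenty-first", "twenty-second",
    "twenty-third", "twenty-fourth", "twenty-fifth", "twenty-sixth",
    "twenty-seventh", "twenty-eighth", "twenty-ninth", "thirtieth",
]


def takes_tense_the(word: str) -> bool:
    """Rough spelling check: does the word begin with a vowel letter?

    Adequate for the ordinals above, but spelling is only a proxy for
    sound, so it should not be trusted for arbitrary words.
    """
    return word[0].lower() in "aeiou"


def saved_message_plan(position: int) -> List[str]:
    """Return the sound files to play, in order, for message `position`."""
    if not 1 <= position <= len(ORDINALS):
        raise ValueError("position must be between 1 and 30")
    ordinal = ORDINALS[position - 1].replace("-", "_")
    # One recording covers "This is the <ordinal>", so "the" is never
    # spliced away from the word that determines its pronunciation.
    return [f"this_is_the_{ordinal}.wav", "saved_message.wav"]
```

Filtering `ORDINALS` with `takes_tense_the` picks out "eighth," "eleventh," and "eighteenth," the three cases in which a schwa recording of "the" would be wrong. The cost of the plan is thirty recordings of the opening chunk instead of one, traded for one splice per prompt instead of two.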

The same mistake is made in the concatenation plan of a touchtone prompt of an airline application, as illustrated in (21). The goal here was to recycle the chunk "Please enter the."

(21)

Please enter the | hour, one to twelve, followed by the star key.

Please enter the | minutes, one to fifty-nine, followed by the star key.


The break planned between "the" and what follows is in the middle of a phonological word. Furthermore, because "hour" and "minutes" require distinct pronunciations of "the," whichever way "the" is recorded is sure to be the wrong pronunciation in one of the two cases. As in (20), a better strategy would be to record "the hour" and "the minutes" each as a unit. Better yet, do not concatenate; instead, record the sentences whole.
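With only two prompts at stake, the whole-sentence approach is simple to express in an application. The following Python sketch assumes hypothetical file names and a hypothetical lookup helper, `time_entry_prompt`; the point is merely that each sentence maps to one recording and nothing is spliced.

```python
# Hypothetical sketch: the keys and file names are illustrative only.
# Each prompt in (21) is recorded whole, so no splice falls after "the".
TIME_ENTRY_PROMPTS = {
    "hour": "please_enter_the_hour_1_to_12_then_star.wav",
    "minutes": "please_enter_the_minutes_1_to_59_then_star.wav",
}


def time_entry_prompt(field: str) -> str:
    """Return the single whole-sentence recording for the given field."""
    try:
        return TIME_ENTRY_PROMPTS[field]
    except KeyError:
        raise ValueError(f"no prompt recorded for field {field!r}") from None
```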

Earlier in this chapter, you saw several concatenation plans in which concatenation junctures align with pauses that would occur in natural speech (for example, in lists and phone number groupings). By exploiting the pauses that occur naturally in speech, you avoid the physiologically impossible splice effects that frequently mark concatenation junctures and mar the naturalness of prompts. The concatenation break is less perceptible, and the overall result is smoother-sounding. To this end, we now demonstrate how you can make minor changes in wording to produce concatenation that sounds more natural.

The first example is drawn from the design of a weather application in a voice portal. The wordings of (22) and (23) are nearly identical, but (22) will be easier to concatenate for more natural-sounding results.

(22)

Today, | it'll be partly cloudy, | with a high of | 67 | and a low of | 51.


(23)

Today | will be partly cloudy, | with a high of | 67 | and a low of | 51.


The difference is that the grammatical function of "today" in (22) permits a following pause, whereas in (23) a pause is prohibited. In (22), "today" is an introductory adverbial set off from the clause; in (23), it is the subject of "will be." The concatenation splice between "today" and "will" in (23) therefore interrupts what would be a continuous stream in natural speech.

Keeping in mind how naturally occurring pauses can mask concatenation splices, consider the error prompt in (24).

(24)

Is 555 593 1367 the number you want me to call?


No matter how the phone number is read back, there are at least two concatenation splices: one between "is" and the first digit of the phone number, and the other between the last digit and "the number . . . ." But in conversation, these are not natural points to introduce an interruption of any sort. The entire question in (24) constitutes a single breath group, describing an intonation contour type 3. It can be rewritten, however, so that the concatenation breaks coincide naturally with breath group boundaries, as in (25).

(25)

This is the number I heard: 555 847 7037. Did I get that right?


In (25), the phone number is naturally cushioned by silence at either edge. Another wording that obliges a pause to precede the phone number is, "Confirming: . . . ." If the phone number is concatenated by the grouping strategy, listeners probably won't notice that the prompt was spliced together.
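The wording in (25) translates into a playback plan in which every splice outside the phone number falls on a pause. The Python sketch below is a hedged illustration, not a prescribed implementation: the step labels, file names, and helpers `split_into_groups` and `confirmation_plan` are hypothetical, the 300-millisecond pause is an assumed placeholder value, and the 3-3-4 grouping simply follows the way the numbers are written in (24) and (25). How each group is itself rendered is left to the grouping strategy.

```python
from typing import List, Tuple

# A step is ("audio", file_name), ("pause", milliseconds), or
# ("group", digits). All labels and file names are hypothetical.
Step = Tuple[str, object]


def split_into_groups(number: str) -> List[str]:
    """Split a 10-digit number into 3-3-4 groups, ignoring spacing."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) != 10:
        raise ValueError("expected a 10-digit phone number")
    return [digits[:3], digits[3:6], digits[6:]]


def confirmation_plan(number: str, pause_ms: int = 300) -> List[Step]:
    """Build the plan for (25); pause_ms is an assumed value."""
    plan: List[Step] = [
        ("audio", "this_is_the_number_i_heard.wav"),
        ("pause", pause_ms),  # silence cushions the left edge
    ]
    plan += [("group", group) for group in split_into_groups(number)]
    plan += [
        ("pause", pause_ms),  # silence cushions the right edge
        ("audio", "did_i_get_that_right.wav"),
    ]
    return plan
```

Because the carrier phrases are whole recordings bounded by silence, the only junctures left to tune are those inside the phone number.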

It would be difficult, perhaps impossible, to make the concatenation strategy in (24) sound natural, for two reasons. First, splices before and after the phone number will be obvious and phonetically disruptive. Second, the phone number in (24) does not end on any particular boundary tone, because it occurs in the middle of a single, overarching type 3 contour, at least in natural speech. In contrast, (25) naturally comprises three prosodic "islands," each with its own distinct contour:

"This is the number I heard"     contour 1
PHONE_NUMBER                     contour 1
"Did I get that right?"          contour 3

The only real concatenation challenge in (25) is the handling of the phone number. The plan in (25) is thus easier to execute than that of (24) and virtually guarantees smooth results.

The next example shows that less (in wording) is sometimes more (in naturalness). Let's first examine the concatenation strategy we recommend. The context in (26) is the read-back of credit card information.

(26)

Expiration date: October, oh-one.


This is easy to concatenate for natural-sounding results. The recommended concatenation plan is presented in Figure 11-16.

Figure 11-16. Concatenation plan for reading back credit card information.

That is, "expiration date" will be delivered with contour 3, followed by a pause lasting about a quarter of a second, followed by two concatenation units that together express contour 1. This is a natural prosodic pattern that is perfectly suited to the reading aloud of items on a form (for example, "Ethnicity: Pacific Islander" on a census form).
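The plan just described can be written down as an ordered list of playback steps. The following Python sketch rests on assumptions: the file names, the `MONTHS` list, and the function `expiration_plan` are hypothetical, and the default of 250 milliseconds simply encodes "about a quarter of a second." Recording the month and year units so that together they express contour 1 is a matter for the recording session and is not modeled here.

```python
from typing import List, Tuple

# Hypothetical sketch: step labels and file names are illustrative only.
MONTHS = [
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
]


def expiration_plan(month: int, year: int,
                    pause_ms: int = 250) -> List[Tuple[str, object]]:
    """Build the read-back plan for a card expiration date."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return [
        ("audio", "expiration_date.wav"),  # label, recorded as one unit
        ("pause", pause_ms),               # about a quarter of a second
        ("audio", f"month_{MONTHS[month - 1]}.wav"),
        ("audio", f"year_{year % 100:02d}.wav"),  # two-digit year
    ]
```

For October 2001, the plan plays the label, the pause, and then the month and year units, so the only splice not masked by silence is the one between month and year.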

An alternative is the wording in (27).

(27)

Your card expires in October, oh-one.


In natural speech, (27) represents a single breath group, but to concatenate this message, three concatenation units must be spliced together, as shown in Figure 11-17.

Figure 11-17. An alternative concatenation plan for reading back credit card information.

Concatenating three units so that they suggest a single breath group is more likely to yield an unnatural-sounding result than concatenating three units to suggest two breath groups. Recall that in Figure 11-16 only the MONTH and YEAR are subsumed by contour 1. Furthermore, a natural rendering of (27) would require that the n of "in" serve as the phonetic onset to the first syllable of "October," thus yielding the syllable "noc" ("in October"). Yet it is in the very midst of this prosodic atom, or syllable, that a concatenation splice has been planned. If the goal is a smooth and natural sound, Figure 11-16 is a safer bet than Figure 11-17.

This credit card example demonstrates that a brief, telegraphic recording, which is acceptable in this context, can offer fewer prosodic risks and can more robustly suggest natural prosody under concatenation than a chattier, prompt-only-in-complete-sentences approach.



Voice User Interface Design 2004
ISBN: 321185765
EAN: N/A
Year: 2005
Pages: 117