11.5 Concatenating Phone Numbers


When we consider the prosodic structure of phone numbers in natural discourse, we can imagine various ways to concatenate them. For example, phone numbers can be concatenated one digit at a time (the digit-by-digit method) or by groups of digits. The digit-by-digit method is by far the more common, because it requires the fewest recordings. In contrast, the grouping method requires more than a thousand recordings, but it can yield results that approach natural speech.

Either way, good-sounding concatenation requires a careful plan, executed by a team consisting of a director, a sound engineer, and a voice actor. Regardless of the method, attention to boundary tones and pauses is essential for the comprehensibility of the phone numbers, as well as for the likeability of the persona who delivers them.

11.5.1 The Prosodic Structure of Phone Numbers

In natural speech, phone numbers are often spoken as sentences consisting of two clauses (for seven-digit numbers) or three clauses (for ten-digit numbers). These cases are depicted in Figures 11-12 and 11-13, respectively. In both cases, the last clause has contour 1, and any preceding clauses have contour 2.

Figure 11-12. Seven-digit phone numbers are spoken as a sentence with two clauses.

graphics/11fig12.gif

Figure 11-13. Ten-digit phone numbers are spoken as a sentence with three clauses.

graphics/11fig13.gif

Taking the natural prosody of phone numbers as a departure point, we will describe two strategies for concatenation: digit-by-digit concatenation and concatenation by digit groups. Either way, it is critical that you attend to boundary tones, because they are a fundamental element of the prosodic grammar. In the representation in Figure 11-13, for example, the first and second groups are congruent, both exhibiting a type 2 contour, and the last group describes a type 1 contour.

11.5.2 Concatenation Digit-by-Digit

Concatenation digit-by-digit sounds far from perfectly natural, given the inevitable splices that separate one digit from the next, thereby creating phonetically anomalous sound sequences. However, you can build phone numbers with three sets of digits zero through nine so that at least you satisfy users' basic prosodic expectations. By exerting control over the boundary tones, tempo, and pauses of the concatenated phone numbers, you comply with these basic expectations and thus ensure the comprehensibility of your concatenated message.

You can concatenate phone numbers by using three recorded versions of the digits zero through nine, as presented in Figure 11-14.[6]

[6] For the second of the last four digits, instead of contour 3 you can use a neutral, flatter alternative, as long as it is "high."

Figure 11-14. This plan for concatenating an example ten-digit phone number relies on three recorded versions of the digits zero through nine.

graphics/11fig14.gif

To supply the boundary tones that listeners will naturally expect, the last digit of the phone number must fall to a low tone of 1 (assuming that the phone number is at the end of a declarative sentence), and the last digit of the area code and the last digit of the prefix must end up at the midrange tone of 2, as in contour 2. The remaining digits will be contour 3, as in list intonation, or a flatter, more neutral, but high alternative. In the recording studio, these recordings can be elicited in the context of actual phone numbers. Finally, we insert pauses between the digit groups for proper phrasing.
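The assignment of contours and pauses described above can be sketched in a few lines of Python. This is only an illustration of the rule, not a production prompt player: the directory layout in `AUDIO_PATTERN`, the `PAUSE` file, and the function name `digit_by_digit_plan` are all assumptions made for the example.

```python
import re

# Hypothetical layout: one directory per contour, one file per digit.
AUDIO_PATTERN = "digits/contour{contour}/{digit}.wav"
PAUSE = "pause.wav"  # hypothetical short-silence file placed between groups


def digit_by_digit_plan(phone_number: str) -> list[str]:
    """Return the ordered audio files for a seven- or ten-digit number."""
    digits = re.sub(r"\D", "", phone_number)  # keep digits only
    if len(digits) == 10:
        groups = [digits[:3], digits[3:6], digits[6:]]
    elif len(digits) == 7:
        groups = [digits[:3], digits[3:]]
    else:
        raise ValueError("expected a seven- or ten-digit phone number")

    plan = []
    for index, group in enumerate(groups):
        is_last_group = index == len(groups) - 1
        for position, digit in enumerate(group):
            if position < len(group) - 1:
                contour = 3  # nonfinal digit: list intonation
            elif is_last_group:
                contour = 1  # last digit of the number: falls to the low tone
            else:
                contour = 2  # last digit of area code or prefix: midrange tone
            plan.append(AUDIO_PATTERN.format(contour=contour, digit=digit))
        if not is_last_group:
            plan.append(PAUSE)  # pause between digit groups for phrasing
    return plan
```

Calling `digit_by_digit_plan("415-555-1367")` returns twelve entries: ten digit recordings and two pauses. The sketch assumes that the number ends a declarative sentence, so the final digit always takes contour 1.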

Again, the two most important considerations for facilitating the comprehension of concatenated phone numbers are pauses and boundary tones. Together, these elements give listeners an implicit sense of exactly where they are in the sequence, as well as what to expect next. The same concerns hold for the concatenation of phone numbers by groups, the topic of the next section.

11.5.3 Concatenation by Groups

For very natural-sounding results, at the cost of creating a great many more sound files, you can concatenate seven- or ten-digit phone numbers by groups, as in the example in Figure 11-15.

Figure 11-15. Concatenation by groups results in natural-sounding phone numbers.

graphics/11fig15.gif

Here, group A refers to three-digit chunks recorded with contour 2. In other words, both the area code and the prefix should be drawn from the same directory of audio files because they both consist of three digits and conform to contour 2. Group B refers to two-digit chunks of contour 2. Group C also refers to two-digit chunks, but of contour 1. Together, B and C are best recorded as a four-digit unit consisting of two subgroups (e.g., "one three" and "six seven"). There should be a slight pause in the middle, sufficient for the sound editor to make a clean cut.
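As a rough illustration, the mapping from a phone number to group A, B, and C recordings might be sketched as follows. The directory names in `GROUP_DIRS` and the function name `group_plan` are hypothetical and chosen only for this example.

```python
import re

# Hypothetical directories: group A holds triplets (contour 2),
# group B holds nonfinal pairs (contour 2), group C holds final pairs (contour 1).
GROUP_DIRS = {"A": "groupA", "B": "groupB", "C": "groupC"}


def group_plan(phone_number: str) -> list[str]:
    """Return the ordered group recordings for a seven- or ten-digit number."""
    digits = re.sub(r"\D", "", phone_number)  # keep digits only
    if len(digits) not in (7, 10):
        raise ValueError("expected a seven- or ten-digit phone number")

    leading, subscriber = digits[:-4], digits[-4:]
    # The area code (if present) and the prefix both come from group A.
    triplets = [leading[i:i + 3] for i in range(0, len(leading), 3)]
    plan = [f"{GROUP_DIRS['A']}/{triplet}.wav" for triplet in triplets]
    # The last four digits split into a nonfinal pair (B) and a final pair (C).
    plan.append(f"{GROUP_DIRS['B']}/{subscriber[:2]}.wav")
    plan.append(f"{GROUP_DIRS['C']}/{subscriber[2:]}.wav")
    return plan
```

For example, `group_plan("415-555-1367")` yields `groupA/415.wav`, `groupA/555.wav`, `groupB/13.wav`, and `groupC/67.wav`. No separate pause files appear in this sketch, on the assumption that each recording already ends at a natural phrase break.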

Table 11-2. Sound Files Required for Group Concatenation

GROUP   REQUIRED WAVE FILES        DESCRIPTION
A       000, 001, 002, . . . 999   Triplet, nonfinal; for area codes and prefixes
B       00, 01, 02, . . . 99       Pair, nonfinal; for beginning of subscriber number
C       00, 01, 02, . . . 99       Pair, final; for end of subscriber number

An alternative but perhaps more difficult strategy is for groups B and C to form a single type 1 contour. In this case, B must rise from 2 to 3, and C must fall from 3 to 1. Again, group C should fall to %1 only if the phone number comes at the end of an assertive utterance.

This approach to concatenation thus requires the inventory of sound files shown in Table 11-2.

The total number of recordings needed for this strategy is therefore 1,200: 1,000 triplets for group A plus 100 pairs each for groups B and C. Recording this many files may be too time-consuming and unwieldy for constrained development situations, but given the time and resources, you can get amazingly natural results. As in all concatenation projects, natural-sounding results depend on recordings that match in volume, speed, timbre, energy level, and personality.
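The count can be checked by enumerating the file names for each group. The helper `zero_padded` and the variable names below are assumptions made for this sketch. The digit-by-digit figure of 30 is derived from the three recorded versions of the digits zero through nine described in Section 11.5.2.

```python
from itertools import product


def zero_padded(width: int) -> list[str]:
    """All digit strings of the given width, e.g. '00' through '99'."""
    return ["".join(combo) for combo in product("0123456789", repeat=width)]


group_inventory = {
    "A": zero_padded(3),  # triplets 000 through 999
    "B": zero_padded(2),  # nonfinal pairs 00 through 99
    "C": zero_padded(2),  # final pairs 00 through 99
}
group_total = sum(len(names) for names in group_inventory.values())
digit_total = 3 * 10  # three contour versions of the digits zero through nine

print(group_total)  # 1200
print(digit_total)  # 30
```

Under these assumptions, the group method needs forty times as many recordings as the digit-by-digit method.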

What makes this particular approach sound convincingly real is that the concatenation breaks coincide exactly with the points at which speakers might naturally insert pauses. Because of this alignment, there are no undesirable concatenation splices to disrupt the phonetic flow. Avoiding such splices is the topic of the next section.



Voice User Interface Design 2004
ISBN: 321185765
Year: 2005
Pages: 117
