11.4 Intonation


The most critical aspects of nonverbal meaning are often communicated through intonation. Technically defined, intonation refers to the linguistic use of pitch movement, but we can think of it simply as the melody of speech. Intonational meaning is expressed by different pitch levels, called tones, which are connected to form tonal sequences known as contours. Intonation is rarely represented in writing. One obvious exception is punctuation, which marks the opposition between a statement and a yes/no question; another is the use of italics for emphasis. In general, however, intonational features can be written down only by using special transcription systems (Crystal 1995).

Our discussion of intonation is divided into two parts. The first is a brief introduction to the basic intonation patterns, or contours, of American English. The second discusses the uses of these contours in context.

11.4.1 Basic Intonation Contours

Think of some different ways that you can say the number "four." You can say "four" as an answer to a question: "Four!" The same word can function as a question all by itself: "Four?" Yet another possibility is, "If you guessed 'four,' you're absolutely right."

These three ways of saying "four" exemplify the basic contours that characterize American English intonation patterns. There are many others, but these three are essential for prosody-sensitive concatenation. The basic contours are as follows:

  • Rising-falling, final (e.g., "Four!")

  • Rising (e.g., "Four?")

  • Rising-falling, nonfinal (e.g., "If you guessed 'four' . . .")


To characterize these three contours, let's assume for simplicity only three relative levels of pitch: Level 1 is the lowest, level 2 is midrange, and level 3 is the highest.

A few disclaimers are in order:

  • These numbers are not in any way intended to represent precise pitch values. They are best regarded as ranges of possible pitches. For example, the rising-falling, nonfinal contour both starts and ends at a midpitch level, but this does not mean that it starts and ends on exactly the same melodic note. Rather, the contour starts and ends in the same relative pitch range.

  • The intonation system proposed here is not meant to suggest that there are only three levels of tone. For example, a fourth tone, higher than level 3, is often used in particularly emotive discourse, but this is not relevant to the VUIs we have deployed.

  • The inventory of contours presented here is not intended to be exhaustive. It covers only the most basic ones.

  • The contours presented here are greatly simplified and are intended to serve as abstractions. For example, there may be a number of intonational peaks in any of these contours, depending on the details of sentence-level stress placement.

Rising-Falling, Final

Figure 11-1 shows the default pattern for simple declarative sentences. However, like many of the patterns that follow, this particular contour has additional uses. These are relevant for concatenation and are described shortly.

Figure 11-1. The rising-falling, final contour is the default pattern for simple declarative sentences.

Remember that the graphical representations of intonation contours throughout this chapter are not intended for accuracy but serve only as basic sketches. These abstractions are effective in illustrating certain key aspects of intonational grammar, in particular the concept of boundary tones, which is explained shortly.

The delivery of a statement as depicted in Figure 11-1 is neutral. By default, it begins at a midlevel, then rises, and then falls to a relatively low pitch. As noted earlier, the peak, which reaches level 3, predictably falls on the stressed syllable of the rightmost open-class item of the utterance, here "VOICE mail." (The prominence peak is marked by the use of SMALL CAPS.) Of course, because the statement peaks on "VOICE mail," it cannot serve as a response to "WHO is checking voice mail?" or "What are you DOING with your voice mail?" In these cases, the peaks would be "I" and "CHECKING," respectively.

Significantly, in English intonation structure, intonational meaning is signaled at the right edge (the end) of the utterance, and often at the right edge of smaller phrasal constituents. These tonal markers are known as boundary tones (Pierrehumbert 1980). This concept is crucial not only for proper concatenation but also for the proper recording of prompts in general.

For convenience, linguists represent boundary tones by using the percent sign (%). This use of the percent sign has nothing at all to do with arithmetic; for example, "%1" means that a particular utterance or phrase ends at pitch level 1. Because the three intonation contours are distinct at their right edges and because the rising-falling, final pattern ends at pitch level 1, let's refer to this pattern as contour 1.

Rising

The rising contour (Figure 11-2) is most often associated with yes/no questions. This contour also serves other functions, described later.

Figure 11-2. The rising contour is often associated with yes/no questions.

Like contour 1 in Figure 11-1, the rising contour starts at midlevel but ends in a high boundary tone, %3. We therefore refer to the rising pattern as contour 3.

Rising-Falling, Nonfinal

This pattern corresponds to the first of the pair of clauses (the dependent or subordinate clause) in each sentence in Figure 11-3. (The second clause in each pair is contour 1, as described earlier.)

Figure 11-3. The rising-falling, nonfinal contour reflects complex sentences.

The dependent clauses "as soon as Pat got home," "to speak to an operator," and "as of June twenty-third" are in the nonfinal position. There is more to follow to complete the thought, and this is signaled intonationally via a contour that does not fall as low as in the case of final phrases. Because this pattern ends at midlevel, %2, let's refer to it as contour 2.
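The naming scheme for the three contours can be summarized as a small lookup. The following Python sketch is illustrative only: the names Contour and boundary_tone are assumptions introduced here and do not come from any speech platform.

```python
from enum import Enum


class Contour(Enum):
    """The three basic contours, numbered by the pitch level at their right edge."""
    FINAL = 1     # rising-falling, final: ends low
    NONFINAL = 2  # rising-falling, nonfinal: ends at midlevel
    RISING = 3    # rising: ends high


def boundary_tone(contour):
    """Return the boundary tone in percent-sign notation, for example '%1'."""
    return f"%{contour.value}"


for contour in Contour:
    print(contour.name, boundary_tone(contour))
```

The numbers are relative pitch ranges, as in the disclaimers above, not precise pitch values.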

Contour 2 can sometimes be used as a nonfinal building block for longer contour 1 messages. See, for example, the concatenation plan for sentences such as "Transferring one hundred dollars from checking to savings" in Figure 11-4.

Figure 11-4. You can use contour 2 in association with contour 1 messages.

Ideally, this utterance would constitute a single breath group, ending at %1. To facilitate recording and concatenation, however, it may be helpful to think of Figure 11-4 as a linking of two smaller constituents: contour 2 plus contour 1. In this example, the concatenated phrase "Transferring | X dollars" could be spoken as a contour 2 pattern, whereas the phrase "from checking to savings" must be a contour 1 pattern.
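One way to write down this two-part plan is as a list of phrases, each tagged with the contour it should carry. The sketch below is a minimal illustration, not the plan in Figure 11-4 itself: the function name transfer_plan and the decision to treat "from ... to ..." as a single recorded unit are assumptions made for the example.

```python
def transfer_plan(amount_words, source, target):
    """Return (units, contour) pairs for a transfer confirmation.

    The nonfinal phrase is assembled from two recorded units and ends at
    midlevel (%2); the final phrase falls to a low tone (%1).
    """
    nonfinal = (["Transferring", f"{amount_words} dollars"], 2)
    final = ([f"from {source} to {target}"], 1)  # assumed to be one unit
    return [nonfinal, final]


for units, contour in transfer_plan("one hundred", "checking", "savings"):
    print(" | ".join(units), f"-> contour {contour}")
```

The vertical bar in the printed output marks a concatenation splice inside a phrase, following the notation used in the text.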

Later in this chapter, you will see that contour 2 is especially relevant for the concatenation of phone numbers, which begin with a kind of subordinate clause (in the case of seven-digit numbers) or with two such clauses (in the case of ten-digit numbers).

11.4.2 Contours in Context

Now we turn to the use of the basic contours in context. This discussion is particularly relevant to the concatenation of messages.

Lists

Lists are ubiquitous in speech applications, particularly when you offer the user numerous choices, as in menu and help prompts. Depending on the nature of the application, it is sometimes desirable or even necessary to concatenate lists when the choices vary or when the length of the list itself is variable, as is often the case with dynamically generated lists.

With proper planning, it is easy to make concatenated lists sound natural. This is because list items are often surrounded by a slight pause in naturally occurring discourse, especially when the delivery is slow or emphatic. Because these concatenation units are cushioned by silence, less attention is drawn to the undesirable splicing effect that generally marks concatenation junctures.

Figure 11-5 depicts default list intonation, although other list patterns are also possible.

Figure 11-5. This sentence uses default list intonation.

Lists consist of a series of contour 3 phrases, ending with contour 1.[4]

[4] There are other, less common intonation patterns for lists. See, for example, Quirk and Greenbaum (1973).

Figure 11-6 shows the concatenation plan for an error prompt that provides the caller with a list of choices.

Figure 11-6. This concatenation plan helps listeners comprehend a list of choices.

Note that "and sports" and "and rye" are each best treated as a single concatenation unit. The reason is that in natural speech, such a phrase is phonetically continuous, with no break between the conjunction ("and") and the final option. Notice also the use of pauses. These correspond to naturally occurring breaks and ensure a more natural result despite concatenation.
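The default list pattern lends itself to a simple rule for dynamically generated lists. The following Python sketch shows one possible way to assign contours to list units. The function name list_plan and the items "news" and "weather" are assumptions for illustration; only "and sports" comes from the text.

```python
def list_plan(items, conjunction="and"):
    """Assign a contour to each concatenation unit of a spoken list.

    Nonfinal items get contour 3 (rising). The conjunction is fused with
    the last item into one unit, which gets contour 1 (falling to low).
    """
    if not items:
        return []
    if len(items) == 1:
        return [(items[0], 1)]
    plan = [(item, 3) for item in items[:-1]]
    plan.append((f"{conjunction} {items[-1]}", 1))
    return plan


print(list_plan(["news", "weather", "sports"]))
# [('news', 3), ('weather', 3), ('and sports', 1)]
```

In a deployed application, each pair would typically select a recording of that item spoken with the given contour, so every list item needs both a contour 3 version and a fused contour 1 version.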

Yes/No Questions

Questions that expect a response of either yes or no are called yes/no questions. In North American English, these sentences generally conform to the basic rising pattern. Concatenation items that fall at the end of such questions must therefore conform to contour 3, ending in a high tone.

The example in Figure 11-7 shows the concatenation plan for a prompt confirming dates, for example, "Did you say Saturday, January first?" or "Did you say Monday, May twenty-second?"

Figure 11-7. This concatenation plan covers a prompt that confirms dates.

The concatenated result should convey contour 3, that is, a steady rise from level 2 to 3. In addition, phrasing should reflect natural speech. When we say a date such as "Wednesday, June seventh," it is possible to pause between the day of the week and the rest, as indicated by the comma. Observing this pause during recording will facilitate natural-sounding concatenation, because the pause will coincide with the concatenation splice. In contrast, because the month and the ordinal constitute a continuous, fluid stream in natural speech, the voice actor should leave only the slightest of pauses between the month and the ordinal: only enough for the sound engineer to isolate the desired wave files, but not so much as to suggest an explicit break or sense of separation.
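The phrasing advice above can be captured by marking, for each unit, whether an audible pause may follow it. The Python sketch below is a hypothetical illustration: the Unit class, the pause_after flag, and the way the carrier phrase "Did you say" is segmented are assumptions, not a transcription of Figure 11-7.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    text: str
    pause_after: bool  # True where natural speech allows an audible break


def date_confirmation_plan(weekday, month, ordinal):
    """Units for "Did you say <weekday>, <month> <ordinal>?"

    The assembled prompt should rise steadily from level 2 to level 3
    (contour 3), so the last unit must be recorded with a high ending.
    """
    return [
        Unit("Did you say", pause_after=False),
        Unit(f"{weekday},", pause_after=True),   # splice hidden in the pause
        Unit(month, pause_after=False),          # flows straight into the ordinal
        Unit(f"{ordinal}?", pause_after=False),  # carries the high boundary tone %3
    ]


plan = date_confirmation_plan("Saturday", "January", "first")
print(" ".join(unit.text for unit in plan))
```

Only the weekday is followed by a pause, which is where the splice is least noticeable.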

If it is feasible, you can avoid concatenation altogether by recording all necessary yes/no possibilities for a given context, as in the prompt set in (17).

(17)

Did you want the home phone?

Did you want the work number?

Did you want the cell phone number?


There is no need to concatenate such simple questions as these. Nonconcatenated prompts will sound better.

Wh- Questions

Wh- questions are questions that start with who, what, where, when, why, and how. They are also called information questions, because their function is to elicit information, in contrast to yes/no questions, whose function is to elicit affirmation, agreement, or acceptance, on the one hand, or denial, dissent, or rejection, on the other.

Wh- questions have two possible intonation structures, depending on the underlying intent. Usually, Wh- questions function as first-time requests for information. Figure 11-8 depicts the basic pattern for Wh- questions.

Figure 11-8. The basic intonation pattern of Wh- questions is contour 1.

The default intonation structure of Wh- questions is contour 1. There is, however, another possible use of Wh- questions, which is to request repetition or clarification, as when we hear "What's your name?" or "What was your name again?" (Figure 11-9).

Figure 11-9. The default intonation structure of Wh- questions that ask for repetition or clarification is contour 3.

The intonation of requests for repetition or clarification is contour 3, the same as for yes/no questions.

The prosodic distinction between Figure 11-8 and Figure 11-9 is important for recording prompts and messages, regardless of concatenation. The function of most error prompts is to request repetition. The voice actor should therefore inflect such requests as in Figure 11-9, with contour 3, rather than as in Figure 11-8, with contour 1. Of course, if the voice actor is reading a list of prompts bereft of context and if direction is inadequate, there is little chance of capturing the appropriate prosody.

As with the appropriate use of contrastive stress, the use of suitable intonation contours gives the impression that the system possesses humanlike intelligence and is attentive to the progress of the dialog. Consider example (18).

(18)

CALLER:

Get a quote.

SYSTEM:

Who do you want a quote for?

CALLER:

[utterance not recognized]

SYSTEM:

Sorry, who do you want a quote for?


The first prompt should be contour 1 (falling), and the second prompt contour 3 (rising). In this way, the system's intonational behavior will comply with callers' expectations based on their experience with authentic conversations. If, however, the first prompt is recorded with contour 3, then it will appear to the caller that the system misrecognized "Get a quote" as a request for a quote on a specific company but that it failed to recognize the company name, as if the caller had said "Get me a quote for Blah Blah Blah Incorporated." The caller would be justified in responding, "Hold on! I haven't said for who yet!" The appropriate use of intonation is crucial not only for instantiating a persona that possesses linguistic intelligence and attentiveness to the caller's needs, but also for making the interface comprehensible and usable.
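The choice between the two recordings in example (18) depends only on whether the question is being asked for the first time. The following Python sketch shows one way a dialog might make that choice. The function names and the attempt counter are assumptions for illustration, and a real application would select between separately recorded prompts rather than build text.

```python
def wh_prompt_contour(attempt):
    """Pick the contour for a Wh- prompt.

    Attempt 1 is a first-time request for information: contour 1 (falling).
    Any later attempt is a request for repetition: contour 3 (rising).
    """
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    return 1 if attempt == 1 else 3


def wh_prompt(attempt, question):
    """Return the prompt wording and the contour it should be recorded with."""
    if attempt == 1:
        text = question
    else:
        text = f"Sorry, {question[0].lower()}{question[1:]}"
    return text, wh_prompt_contour(attempt)


print(wh_prompt(1, "Who do you want a quote for?"))
# ('Who do you want a quote for?', 1)
print(wh_prompt(2, "Who do you want a quote for?"))
# ('Sorry, who do you want a quote for?', 3)
```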

Before we move on to the next intonation pattern, let's briefly evaluate a popular but wrong idea about questions in English. It can be summed up as follows: "Whenever you see a question mark, your voice should go up." Some English teachers dispense this advice to nonnative speakers, and there are even text-to-speech engines that have been designed to uniformly impose a rising intonation contour on all sentences ending in a question mark, including Wh- questions. As you have seen in this section, however, first-time requests for Wh- information do not "go up" without coming back down, and down low. The next section describes another type of question that also ends on a low tone.

Either/Or Questions

Either/or questions aim to get the listener to choose one of two or more explicitly stated options rather than provide a simple yes or no. Because they function as information questions, they end on a low tone (%1), in accordance with first-time information requests, described earlier. Figure 11-10 shows an example of an either/or question.[5]

[5] Of course, it is also possible to use a continuously rising pattern, but then the question in Figure 11-10 becomes a yes/no question. The answer would be "Yeah, soup" or "Yes please, salad."

Figure 11-10. An either/or question ends on a low tone.

Questions with more than two options, as in Figure 11-11, follow the intonation pattern typical of lists.

Figure 11-11. Multiple-choice questions follow the intonation pattern of lists.

The intonation patterns of simple as well as complex either/or questions can be generalized as follows: The last list item must fall to a low tone (contour 1), whereas the others must rise to a high tone (contour 3).

As in the case of lists, "or" plus the final option should be recorded as a continuous chunk, without the intrusion of a concatenation splice.
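The question types covered in this section differ in the contour their final item carries, which a rule based on the question mark alone cannot capture. The Python sketch below contrasts the two approaches. The type labels and function names are assumptions chosen for the example.

```python
# Contour carried by the end of each question type discussed in this section.
FINAL_CONTOUR = {
    "yes_no": 3,         # rises to a high boundary tone
    "wh_first_time": 1,  # first-time request for information: falls low
    "wh_repetition": 3,  # request for repetition or clarification: rises
    "either_or": 1,      # last option falls low; earlier options rise
}


def final_contour(question_type):
    """Look up the contour that the end of the question should carry."""
    return FINAL_CONTOUR[question_type]


def question_mark_rule(text):
    """The misconception: anything ending in a question mark rises."""
    return 3 if text.rstrip().endswith("?") else 1


prompt = "Who do you want a quote for?"
print(question_mark_rule(prompt))      # 3
print(final_contour("wh_first_time"))  # 1
```

The mismatch on the last two lines is the error described earlier: a first-time Wh- question is given a rise it should not have.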

This pattern has other uses, such as reporting a pair of temperatures in a weather report (19).

(19)

. . . with a high of twenty-TWO, and a low of thirTEEN.




Voice User Interface Design 2004
ISBN: 321185765
EAN: N/A
Year: 2005
Pages: 117
