Many activities during design and development contribute to accuracy. Some we have covered earlier (e.g., the use of acoustic adaptation, mentioned in Chapter 2). Others are covered in Chapter 15 (tuning recognition parameters) and Chapter 16 (tuning grammar coverage). In this chapter, we cover design choices you can make during detailed design that help optimize accuracy on difficult recognition tasks.
A number of common recognition tasks are extremely challenging. One example is the alphabet. The names of 25 of the 26 letters in the English alphabet have one syllable (W is the exception) and thus provide limited acoustic information. Furthermore, 9 of the letter names rhyme with E; these are often referred to as the E-set. Another 4 have a vowel rhyming with A. Two others, F and S, are hard to distinguish over the phone: telephones don't transmit frequencies higher than about 3,500 cycles per second, which is where most of the information distinguishing F and S lies. Recognition of alphabetics comes up in applications requiring spelling (e.g., of person names or street names) and often as part of account IDs.
Another common and challenging recognition task is digit strings. Most of the digit names have one syllable. Even if accuracy on individual digits is extremely high, the recognition rate on an entire string (getting every digit right) may be low when many digits are strung together, because an error on any single digit makes the whole string wrong. Many common tasks require digit-string recognition: for example, account numbers, PINs, telephone numbers, credit card numbers, and social security numbers.
When handling tough recognition problems such as strings of alphabetics, digits, or a combination of both (alphanumerics), you should look for every possible way to constrain which strings are valid, as well as every source of knowledge that can be brought to bear to limit the possibilities the recognizer must consider.
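To make the digit-string point concrete, here is a small Python calculation. It assumes, as a simplification, that every digit is recognized independently with the same accuracy p, so a string of n digits is entirely correct with probability p^n. Real recognizer errors are not fully independent, and the 99% per-digit figure is purely illustrative, not a measured result.

```python
def string_accuracy(per_digit: float, length: int) -> float:
    """Probability that every digit in a string is recognized correctly,
    assuming independent errors with identical per-digit accuracy."""
    if not 0.0 <= per_digit <= 1.0:
        raise ValueError("per_digit must be a probability between 0 and 1")
    return per_digit ** length


# Illustrative per-digit accuracy only; not a measured figure.
for n in (4, 10, 16):
    print(f"{n:2d} digits at 99% per digit -> "
          f"{string_accuracy(0.99, n):.1%} of strings fully correct")
```

Under these assumptions, 99% per-digit accuracy yields about 96.1% on a 4-digit PIN, 90.4% on a 10-digit phone number, and only about 85.1% on a 16-digit credit card number.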
In general, you can apply these constraints either by building them into the grammar or by postprocessing an N-best list (i.e., choosing the first item on the N-best list that satisfies the constraint). Build the constraints into the grammar whenever possible. This approach results in higher accuracy, because the recognizer never considers invalid strings, whereas postprocessing helps only when the correct hypothesis appears somewhere on the N-best list. Here are examples of encoding structure in grammar:
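As one hypothetical illustration of building structure into the grammar, the Python function below writes a grammar in W3C SRGS ABNF form for an assumed account-ID format: one letter drawn from a small set of valid first letters, followed by exactly six digits. The ID format, the rule names, and the helper function are illustrative assumptions; whether your recognizer accepts SRGS ABNF (rather than the XML form of SRGS or a vendor-specific grammar format), and how it maps a letter token such as "b" to a pronunciation, depends on the platform.

```python
# Spoken forms of the digits; a production grammar would likely also accept "oh".
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def account_id_grammar(first_letters: list[str], n_digits: int) -> str:
    """Return an SRGS ABNF grammar that accepts only IDs of the form
    <one of first_letters> followed by exactly n_digits digits."""
    if not first_letters or n_digits < 1:
        raise ValueError("need at least one letter and at least one digit")
    letters = " | ".join(letter.lower() for letter in first_letters)
    digits = " | ".join(DIGIT_WORDS)
    return "\n".join([
        "#ABNF 1.0 UTF-8;",
        "language en-US;",
        "mode voice;",
        "root $account_id;",
        f"$letter = {letters};",
        f"$digit = {digits};",
        # <n> is the SRGS repeat operator: exactly n occurrences of $digit.
        f"public $account_id = $letter $digit <{n_digits}>;",
    ])


print(account_id_grammar(["B", "K", "M"], 6))
```

With three valid first letters and six digits, this grammar admits 3 × 10^6 = 3 million strings, compared with 36^7 (about 78 billion) for an unconstrained seven-character alphanumeric grammar, so the recognizer has vastly fewer hypotheses to weigh.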
Here are examples of encoding structure by postprocessing an N-best list:
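The Python sketch below shows N-best postprocessing with one well-known constraint: the last digit of a credit card number is a Luhn (mod 10) check digit. The helper walks the N-best list in rank order and returns the first hypothesis that passes the check, or None if none does (at which point the application might reprompt or confirm). The N-best list here is assumed to be a plain list of digit strings, best first; real recognizers return richer result objects, and the function names are hypothetical.

```python
from typing import Callable, Iterable, Optional


def luhn_valid(number: str) -> bool:
    """True if the digit string passes the Luhn (mod 10) checksum."""
    if not number or any(c not in "0123456789" for c in number):
        return False
    total = 0
    # Walk from the rightmost (check) digit; double every second digit,
    # starting with the digit just left of the check digit.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def first_valid(nbest: Iterable[str],
                is_valid: Callable[[str], bool]) -> Optional[str]:
    """Return the highest-ranked hypothesis that satisfies the constraint."""
    for hypothesis in nbest:
        if is_valid(hypothesis):
            return hypothesis
    return None


# Hypothetical N-best list, best hypothesis first.
nbest = ["4111111111111112", "4111111111111111", "4111111111117111"]
print(first_valid(nbest, luhn_valid))  # prints 4111111111111111
```

Because `first_valid` takes the constraint as a parameter, the same loop works with other checks, such as a lookup against the set of account numbers that actually exist.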
Another difficult recognition problem is recognition from an extremely long list, such as street names. One approach to maximizing accuracy is to constrain the grammar dynamically (while the application is running) based on information the caller supplied earlier. For example, if the zip code is known, you can dynamically load a grammar containing only the streets in that zip code. Dynamic grammars are discussed in Chapter 16. More generally, recognition from long lists can be improved by incorporating probabilities into the grammar based on in-service data from the application.
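As a sketch of both ideas, the Python below builds a street-name grammar at run time from a hypothetical zip-code-to-streets table and attaches SRGS ABNF weights derived from hypothetical in-service counts of how often callers selected each street. The zip codes, street names, counts, the add-one smoothing (so streets never seen in service remain recognizable), and the rule name are all made-up assumptions for illustration; how a dynamically built grammar is actually loaded depends on the platform.

```python
# Hypothetical data: streets per zip code, and how often callers in
# in-service logs ended up selecting each street.
STREETS_BY_ZIP = {
    "11111": ["main street", "oak avenue", "pine road"],
    "22222": ["elm street", "lake drive"],
}
IN_SERVICE_COUNTS = {"main street": 120, "oak avenue": 45, "lake drive": 30}


def street_grammar(zip_code: str) -> str:
    """Build an SRGS ABNF grammar containing only the streets for zip_code,
    weighting each alternative by its in-service count plus one."""
    streets = STREETS_BY_ZIP.get(zip_code)
    if not streets:
        raise KeyError(zip_code)
    # "/w/" before an alternative is the SRGS ABNF weight syntax.
    alternatives = " | ".join(
        f"/{IN_SERVICE_COUNTS.get(street, 0) + 1}/ {street}" for street in streets
    )
    return "\n".join([
        "#ABNF 1.0 UTF-8;",
        "language en-US;",
        "mode voice;",
        "root $street;",
        f"public $street = {alternatives};",
    ])


print(street_grammar("11111"))
```

Because the grammar for a given zip code holds only a handful of streets instead of every street in the region, the recognizer has far fewer confusable alternatives, and the weights bias it toward the streets callers actually request most often.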