Building a Multimodal Application
Design Considerations
Encouraging users to speak commands means offering them incentives to do so. You cannot simply speech-enable a traditional point-and-click application. It is not enough to place microphone icons
The
Loading the Sample Application
Table 4.1 lists the single project that makes up the sample application. To run the sample code provided on the book's Web site, complete the following steps:
Table 4.1. Code Listing for the Chapter4 project. The files can be accessed by opening the Chapter4.sln file. Multimodal applications, unlike telephony applications, do not utilize prompts, so there is no Prompt database project for this solution.
Examining the Sample Web Pages
Login.aspx simply retrieves the login and password and then
Tip
Readers executing the sample application can use the following login and password values to log in:
Login - JONESA
Password - 030478
The main menu, which offers only two choices, Query Course Catalog and View My Schedule, is also not an appropriate place for speech processing. It is just as fast for the user to click a button as to speak a command phrase.
Main Query Page
The query page (Review.aspx), however, is a good place to offer speech capabilities. Available when the user clicks Query Course Catalog, this page offers five options to narrow the search: course title, department, instructor, days, and start time. Figure 4.2 is a screenshot of the Review.aspx page as it appears on the client device. From this page, the student can click the Speak Query button and begin speaking a query. The sample application allows a combination of two of the search criteria.
Figure 4.2. Screenshot of the Query Course Catalog page, Review.aspx. To speak a query, the user must click the Speak Query button. The student can also listen to detailed instructions by clicking the Instructions button.
Note
It would have been possible to allow for more combinations, but this would have expanded the
When designing a grammar file, developers should consider the size of the grammar file. A large grammar file is cumbersome to debug. Moreover, even when preloaded, it may slow the processing of the Web application. This is a
For tips on how to write effective grammars, refer to the Technical FAQ section of the Microsoft Speech Web site at http://www.microsoft.com/speech/techinfo/faq/speechsdk.asp.
Creating a Query Script
Building grammars with the Grammar Editor was introduced in Chapter 2. This chapter uses a complex grammar for querying the course catalog. It can be helpful to list possible queries before beginning the grammar-building process. Listing 4.1 shows potential queries for the Review.aspx page. Brainstorming such a list helps to gauge the complexity of the grammar. It also provides a testing script that can be used later to confirm the grammar rules.
Note
The queries in Listing 4.1 are
Listing 4.1. Potential search queries
I want all courses for Financial Accounting 1
List classes where Doctor Jones is the teacher
List all classes for Jan Jessup that start before nine a.m.
Give me classes starting after two o'clock
List all classes in the English department
Show classes on Tuesday and Thursday that start after noon
List Business classes scheduled for Monday, Wednesday and Friday
List all classes in the Math department that start after six p.m.
Controls Used on the Query Page
The query page, Review.aspx, contains Listen and Prompt controls. These controls are hidden from the user, but are activated when the user clicks one of two command
function SpeakQuery() {
    // Start recognition on the hidden Listen control
    Listen1.Start();
}
As opposed to the voice-only application in Chapter 3, the multimodal application relies on visible controls and not prompts to initiate dialogs. An exception is the instructional prompt, available when the user clicks the Instructions button.
Instructional prompts are useful for providing detailed information to the user and minimizing the amount of screen space necessary for such instructions. When the user clicks Instructions on Review.aspx, the Prompt control is
Understanding the Grammar
The Listen control uses a grammar file to identify valid user responses. For the query page, the Listen control references the SpeakQuery.grxml grammar file. The root rule for this grammar is named TopLevel. It is referenced in the Src property of the Grammar control as
<speech:Grammar Src="Grammars/SpeakQuery.grxml#TopLevel"
>
To keep the grammar file organized and readable, it is broken down into component grammar files. The additional grammars are then referenced through RuleRef elements. Since there are five query choices, each is represented by a separate grammar file.
Most of the grammar files are built dynamically because they involve matching values in a changing database. Introduced in Chapter 3, dynamic grammars involve the creation of XML-based grammar files at the time of application execution. This is useful when the grammar contains content that is likely to change. As opposed to grammar files designed by the developer using the Grammar Editor, dynamic grammars are rewritten as often as necessary. The following code is used to build grammar files containing department
'** Example of programmatically defining the grammar.
'   Name, sp, and fileloc are assumed to be supplied by the enclosing routine.
Dim sb As New StringBuilder
'Define the high-level tags used in the grammar file
sb.Append( _
    "<grammar xml:lang=""en-US"" tag-format=""semantics-ms/1.0"" " _
    & "version=""1.0"" root=""" & Name & "Rule"" mode=""voice"" " _
    & "xmlns=""http://www.w3.org/2001/06/grammar"">")
'The rule id must match the root attribute declared above
sb.Append("<rule id=""" & Name & "Rule"" scope=""public"">")
'Define the one-of element, which indicates a list of alternatives
sb.Append("<one-of>")
Dim dr As SqlDataReader = SqlHelper.ExecuteReader( _
    AppSettings("Chapter4.Connection"), _
    CommandType.StoredProcedure, sp)
'Loop through the data reader, adding one item per row
Dim _DynamicGrammar As New DynamicGrammar
Do While dr.Read()
    sb.Append(_DynamicGrammar.ConvertGrammarItem(Name, _
        dr.GetString(0), dr.GetValue(1)))
Loop
_DynamicGrammar = Nothing
'Closing tags
sb.Append("</one-of>")
sb.Append("</rule>")
sb.Append("</grammar>")
'Write out the new file
Dim xDoc As XmlDocument = New XmlDocument
xDoc.LoadXml(sb.ToString)
xDoc.Save(fileloc)
'Release resources
If Not dr.IsClosed Then
    dr.Close()
End If
dr = Nothing
sb = Nothing
The dynamic grammar code is used to create an XML-based file that if
<grammar xml:lang="en-US" tag-format="semantics-ms/1.0"
         version="1.0" root="DepartmentRule" mode="voice"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="DepartmentRule" scope="public">
    <one-of>
      <item>
        <item>Business</item>
        <tag>$.Department = "5"</tag>
      </item>
      <item>
        <item>English</item>
        <tag>$.Department = "4"</tag>
      </item>
      <item>
        <item>Liberal Arts</item>
        <tag>$.Department = "2"</tag>
      </item>
      <item>
        <item>Math</item>
        <tag>$.Department = "1"</tag>
      </item>
      <item>
        <item>Science</item>
        <tag>$.Department = "3"</tag>
      </item>
    </one-of>
  </rule>
</grammar>
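The string-building steps behind a grammar like this one can also be sketched in plain JavaScript, which may make the structure easier to follow. This is an illustration only and not the sample application's code: buildListGrammar, escapeXml, and the rows array are hypothetical names standing in for the stored procedure results, and the second sample row is invented to show why phrases should be XML-escaped.

```javascript
// Escape the characters that are significant in XML text and attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a grammar string with one <item> per row.
// name: the semantic property, for example "Department".
// rows: hypothetical array of { phrase, id } objects (stand-in for the data reader).
function buildListGrammar(name, rows) {
  const ruleId = name + "Rule";
  const items = rows.map((row) =>
    "<item><item>" + escapeXml(row.phrase) + "</item>" +
    "<tag>$." + name + ' = "' + escapeXml(row.id) + '"</tag></item>'
  );
  return (
    '<grammar xml:lang="en-US" tag-format="semantics-ms/1.0" version="1.0" ' +
    'root="' + ruleId + '" mode="voice" ' +
    'xmlns="http://www.w3.org/2001/06/grammar">' +
    // The rule id must match the root attribute, or the root rule cannot be resolved.
    '<rule id="' + ruleId + '" scope="public"><one-of>' +
    items.join("") +
    "</one-of></rule></grammar>"
  );
}

// Example rows: the first comes from the listing above, the second is invented.
const sampleGrammar = buildListGrammar("Department", [
  { phrase: "Business", id: 5 },
  { phrase: "Arts & Crafts", id: 6 },
]);
console.log(sampleGrammar);
```

Escaping matters here because department and course names come from a database; an unescaped ampersand in a title would produce a grammar file that fails to load as XML.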
To improve scalability, the grammar files are built once, when the application session is initialized, rather than every time the query page is loaded. In the Page_Load function for Review.aspx, a public variable named blnQueryGrammarsLoaded is checked. If it contains a value of false, the dynamic grammars for title, department, and instructors are built and the session variable is assigned a value of true.
Preambles and Postambles
Since the user's query will probably be phrased as a question, the grammar needs to account for unnecessary language. In the phrase "List classes where Doctor Jones is the teacher," the application is only interested in "Doctor Jones" and not the other words in the phrase. Optional words in a spoken phrase are known as preambles and postambles. A special grammar file is created for each of these. In the case of preambles, the PreambleRule contains three optional lists of possible words or subphrases. In Visual Studio .NET, the PreambleRule can be viewed by double-clicking the QueryPreamble.grxml file in the Grammars subdirectory of Solution Explorer. From the Grammar Explorer window, double-click PreambleRule.
Figure 4.3 shows the contents of the PreambleRule. Note that a circular icon and the
Figure 4.3. Screenshot of the PreambleRule as seen in the Grammar Editor inside Visual Studio. Preambles represent optional words spoken at the beginning or middle of a student's query.
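The effect of optional preambles and postambles around a required key phrase can be illustrated with an ordinary regular expression. This is only an analogy under stated assumptions: the speech engine does not work this way, the word lists below are hypothetical and not the contents of QueryPreamble.grxml or QueryPostamble.grxml, and buildMatcher and extractKeyPhrase are invented helper names.

```javascript
// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a matcher: optional preamble phrases, one required key phrase,
// then optional postamble phrases.
function buildMatcher(preambles, keyPhrases, postambles) {
  const alt = (list) => list.map(escapeRegExp).join("|");
  const pattern =
    "^(?:(?:" + alt(preambles) + ")\\s+)*" + // zero or more preamble phrases
    "(" + alt(keyPhrases) + ")" +            // exactly one required key phrase
    "(?:\\s+(?:" + alt(postambles) + "))*$"; // zero or more postamble phrases
  return new RegExp(pattern, "i");
}

// Return the key phrase, or null when the utterance has no required phrase.
function extractKeyPhrase(matcher, utterance) {
  const match = matcher.exec(utterance.trim());
  return match ? match[1] : null;
}

// Hypothetical word lists for illustration.
const matcher = buildMatcher(
  ["list", "give me", "all", "classes", "courses", "for", "where"],
  ["Doctor Jones", "Doctor Davis"],
  ["is the teacher", "please"]
);

console.log(extractKeyPhrase(matcher, "List classes where Doctor Jones is the teacher"));
console.log(extractKeyPhrase(matcher, "List classes"));
```

The second call returns null because the optional words alone do not form a valid query; only the key phrase is required, which mirrors why an optional rule is paired with a rule whose phrases are not optional.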
The preamble rule should be used with other rules in which the phrases are not optional. In this manner, spoken phrases that include any of the
For example, if the user spoke the query "Give me all classes for Doctor Davis," the following SML would be returned:
<SML confidence="1.000" text="Give me all classes for Doctor Davis"
     utteranceConfidence="1.000">
  <Instructor confidence="1.000">
    <InstructorTitle>Doctor</InstructorTitle>
    <InstructorLastName>Davis</InstructorLastName>
  </Instructor>
</SML>
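Page script usually needs the semantic items rather than the raw SML. The following is a minimal sketch, assuming the SML is available as a string; readSml is a hypothetical helper, and regular expressions stand in for a proper XML parser only to keep the example self-contained.

```javascript
// Pull the utterance text, overall confidence, and leaf semantic items
// out of an SML string. Sketch only: a real page would use an XML parser.
function readSml(sml) {
  const root = /<SML\b([^>]*)>/.exec(sml);
  if (!root) {
    throw new Error("No SML root element found");
  }
  // Read one attribute value from the root element's attribute text.
  const attr = (name) => {
    const m = new RegExp("\\b" + name + '="([^"]*)"').exec(root[1]);
    return m ? m[1] : null;
  };
  const items = {};
  // Leaf elements only: an opening tag, text with no nested markup,
  // and the matching closing tag.
  const leaf = /<(\w+)\b[^>]*>([^<]+)<\/\1>/g;
  let m;
  while ((m = leaf.exec(sml)) !== null) {
    items[m[1]] = m[2].trim();
  }
  return {
    text: attr("text"),
    confidence: Number(attr("confidence")),
    items: items,
  };
}

const sampleSml =
  '<SML confidence="1.000" text="Give me all classes for Doctor Davis" ' +
  'utteranceConfidence="1.000">' +
  '<Instructor confidence="1.000">' +
  "<InstructorTitle>Doctor</InstructorTitle>" +
  "<InstructorLastName>Davis</InstructorLastName>" +
  "</Instructor></SML>";

const result = readSml(sampleSml);
console.log(result.items.InstructorTitle, result.items.InstructorLastName);
```

Container elements such as Instructor are skipped on purpose, so the result holds only the values a query would actually use.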
In this case, the query was typed into the Recognition string textbox inside the Grammar editor. Therefore, the speech recognition engine ranked the confidence score as 1.000, or 100 percent. The confidence score is a value
The SML also contains values for two semantic items, InstructorTitle and InstructorLastName. Alternatively, if the user spoke the query "Courses for Doctor Davis," the same two semantic items would be returned. In the case of the second query, items from only two of the lists used in the preamble rule were
The PostambleRule, part of the QueryPostamble.grxml file, works in much the same way as the PreambleRule. In this case, optional words are spoken after the key words. Since these words are optional, a phrase such as "Doctor Davis
Base Rules
The TitleRule, DepartmentRule, InstructorRule, DayRule, and StartTimeRule are base rules included in SpeakQuery.grxml. For each of the page elements, such as title and department, there is a corresponding base rule. If you look at the TitleQuery rule in the Grammar Editor (see Figure 4.4), you will see that the PreambleRule and PostambleRule are both referenced. The TitleRule, which lies in the middle, is dynamically built and contains an entry for each current course title in the database.
Figure 4.4. Screenshot of the Grammar Editor as it displays the TitleQuery rule, which is referenced in SpeakQuery.grxml.
Once the base rules were built, the TopLevel rule was
Figure 4.5. Screenshot of the grammar editor as it displays the TopLevel rule. Phrases can be