Custom Grammars for Speech Recognition


Overview

In the previous chapter, we built an application using the SAPI 5.1 SDK. With very little voice training, an end user could use the program for basic dictation and text-to-speech output. The general grammar used by that application works well for dictation, but a narrow, application-specific grammar can recognize speech more accurately when the program only needs to understand a small set of words and phrases.

In this chapter, we build an application that lets the user enter numbers and mathematical operations by speech. The program then computes the answer automatically.

  Note 

The source code for the projects is located on the CD-ROM in the PROJECTS folder. You can either type the code in as you go or copy the projects from the CD-ROM to your hard drive for editing.

Creating a Custom Grammar

Much of the programming in this application is similar to the previous SAPI example, so we begin by creating our custom grammar. SAPI grammar rules are defined in XML (Extensible Markup Language), which will look familiar to anyone with an HTML or XML background and makes writing a grammar fairly easy.

We use Notepad, although you could use any text editor, XML editor, HTML editor, or even Visual Studio if you prefer. Let's begin by opening Notepad. Add the following line to the empty document:

The grammar itself is enclosed in GRAMMAR tags. Inside it are the rules; the first (and, for our application, only) rule is the number rule. This rule is marked ACTIVE, meaning the speech recognition engine should listen for it.

Add the following lines to your code:


 

At this point, your Notepad document should contain the following lines:




Before moving on, we need to take a quick look at the various elements we'll encounter:

<L>: Defines a list of alternative phrases. Each child element is a separate possible recognition that can stand in place of the list. L is a synonym for the LIST element. An empty L element is not valid; it must have children. A LIST element can define a default property name (PROPNAME) or ID (PROPID), which its child PHRASE elements inherit.

<P>: A synonym for the PHRASE element, which describes a phrase to be recognized. Its property name/value pair is generated only if the contents of the element are recognized. Note that an empty P element is not allowed.

We also need to understand the grammar attributes:

<... VALSTR=''>: Specifies the string value to be associated with the semantic property (name/value pair)

<... PROPNAME=''>: Specifies the string identifier to be associated with the semantic property (name/value pair)

<... VAL=''>: Specifies the numeric value to be associated with the semantic property (name/value pair)
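To see how GRAMMAR, RULE, L, and P nest and where these attributes go, the following minimal VB.NET sketch writes a two-phrase grammar with System.Xml.XmlWriter. It is an illustration, not the chapter's grammar file. The rule ID of 1 mirrors the CmdSetRuleIdState(1, ...) call used when the grammar is loaded; the LANGID value "409" (U.S. English), the RULE attribute names ID and TOPLEVEL, the property name "value", and the VALSTR of "+" for "plus" are assumptions based on the SAPI 5.1 grammar format rather than details taken from the book.

```vbnet
Imports System
Imports System.Text
Imports System.Xml

Module GrammarSketch
    ' Writes a tiny SAPI-style grammar to a string.
    ' Assumptions (not from the book): LANGID "409", the RULE attributes
    ' ID/NAME/TOPLEVEL, the property name "value", and VALSTR "+" for "plus".
    Public Function BuildGrammar() As String
        Dim settings As New XmlWriterSettings()
        settings.Indent = True
        settings.OmitXmlDeclaration = True

        Dim sb As New StringBuilder()
        Using writer As XmlWriter = XmlWriter.Create(sb, settings)
            writer.WriteStartElement("GRAMMAR")
            writer.WriteAttributeString("LANGID", "409")

            ' The single top-level rule; ID 1 matches CmdSetRuleIdState(1, ...).
            writer.WriteStartElement("RULE")
            writer.WriteAttributeString("ID", "1")
            writer.WriteAttributeString("NAME", "number")
            writer.WriteAttributeString("TOPLEVEL", "ACTIVE")

            ' <L> lists the alternatives; its PROPNAME is inherited by each <P>.
            writer.WriteStartElement("L")
            writer.WriteAttributeString("PROPNAME", "value")
            WritePhrase(writer, "plus", "VALSTR", "+") ' string value
            WritePhrase(writer, "three", "VAL", "3")   ' numeric value
            writer.WriteEndElement() ' closes L
            writer.WriteEndElement() ' closes RULE
            writer.WriteEndElement() ' closes GRAMMAR
        End Using ' disposing the writer flushes everything into sb
        Return sb.ToString()
    End Function

    ' Writes <P attr="value">text</P>; the text is required because an empty P is invalid.
    Private Sub WritePhrase(writer As XmlWriter, text As String, attr As String, value As String)
        writer.WriteStartElement("P")
        writer.WriteAttributeString(attr, value)
        writer.WriteString(text)
        writer.WriteEndElement()
    End Sub
End Module
```

Writing the file by hand in Notepad, as the chapter does, produces the same kind of structure; the sketch just makes the nesting explicit.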

Now that we have the elements and attributes to work with, it's easy to fill in the rest of the grammar. We need to recognize the following operations and commands: plus, minus, times, divided by, quit, equal, and new.

Here are the operations and commands:


 

plus, minus, times, divided by, QUIT, EQUAL, NEW

The last part of the XML file should handle all of the numeric entries from 0 to 30. Here are the entries:


 

zero, one, two, three, four, five, six, seven, eight, nine, ten
eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty
twenty-one, twenty-two, twenty-three, twenty-four, twenty-five, twenty-six, twenty-seven, twenty-eight, twenty-nine, thirty

  Note 

The application recognizes only the values that you put in the file. If you need a larger number, such as 45, you must add it to the file; to keep the range continuous, add every value up to 45.
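Rather than typing every extra value by hand, you can generate the number phrases. The sketch below is a hypothetical helper (the NumberPhrases module and its ToWords and PhraseLines functions are illustrative names) that spells out 0 to 99 in the same hyphenated style the grammar uses and emits one P line per value. It assumes each number phrase carries its numeric value in a VAL attribute, which is an assumption about the grammar's layout.

```vbnet
Imports System
Imports System.Text

Module NumberPhrases
    Private ReadOnly Ones() As String = {"zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"}
    Private ReadOnly Tens() As String = {"", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"}

    ' Spells out 0-99 in the grammar's style, e.g. "twenty-one" or "forty-five".
    Public Function ToWords(n As Integer) As String
        If n < 0 OrElse n > 99 Then Throw New ArgumentOutOfRangeException("n")
        If n < 20 Then Return Ones(n)
        Dim word As String = Tens(n \ 10)
        If n Mod 10 <> 0 Then word &= "-" & Ones(n Mod 10)
        Return word
    End Function

    ' Emits one <P VAL="n">words</P> line for every value from 0 to maxValue.
    ' Assumption: the number list stores its value in VAL, as sketched here.
    Public Function PhraseLines(maxValue As Integer) As String
        Dim sb As New StringBuilder()
        For i As Integer = 0 To maxValue
            sb.AppendLine(String.Format("<P VAL=""{0}"">{1}</P>", i, ToWords(i)))
        Next
        Return sb.ToString()
    End Function
End Module
```

For example, PhraseLines(45) produces the lines for zero through forty-five, which you could paste into the number list of grammar.xml.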

The complete XML file contains all of these phrases, the operations and commands followed by the numbers from zero to thirty, inside the number rule and enclosed by the GRAMMAR tags.

After you have entered all of the text into Notepad, save the file. To save a file in Notepad with an extension other than '.txt,' choose Save As from the File menu and put quotation marks around the filename. In our case, save the file as 'grammar.xml.' You can save it to your desktop or any other easily accessible location. Later, we'll copy this file to the 'bin' directory of our application so that it is available when we run the program.

User Interface

The custom grammar is now finished and is probably the most important thing we are going to create. However, in order to test the XML-based grammar, we need to build an application that loads it. We begin with a GUI that consists of a few controls, shown in Figure 22.1. You can use the figure as a guide to add the controls found in Table 22.1 to the form:

Figure 22.1: The finished GUI.

Table 22.1: Adding controls to the GUI

Type      Name        Text
TextBox   txtSpeech   TextBox1
Label     lblFirst    0
Label     lblOperand  +
Label     lblSecond   0
Label     lblAnswer   =

This application differs from the previous example mainly because it loads a custom grammar, but that is not the only change. Recognition is also handled differently: rather than clicking a button to start recognition, the application is ready for speech as soon as it starts. All input is speech-enabled, so you never need a mouse or pen; you can enter every value and even close the application using only speech.

Load Grammar

We begin programming the application by adding a reference to the Microsoft Speech Object Library and then adding the Imports statement, as we did in the previous example. We also declare the same variables, except m_bRecoRunning, which we no longer need because the recognition engine is always running.

Here are the three Dim statements:

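' Shared recognition context; WithEvents lets the form handle its Recognition event
Dim WithEvents RecoContext As SpeechLib.SpSharedRecoContext
' Grammar object that holds the rules loaded from grammar.xml
Dim Grammar As SpeechLib.ISpeechRecoGrammar
' Character counter carried over from the previous example (only reset here)
Dim m_cChars As Short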

The Form_Load event initializes several variables and loads the grammar. Except for loading the grammar, the code should look very similar to the previous example. As you can see from the following code, we load our custom grammar.xml file from the application's base directory. This is a good time to copy grammar.xml from where you saved it earlier to the project's 'bin' folder. Without this file, you simply receive an error message and the application does not run.

Here is the code for the procedure:

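' Assumes Imports SpeechLib at the top of the form's code file, as in the previous chapter
Private Sub Form_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles MyBase.Load
    ' Start with an empty problem
    txtSpeech.Text = ""
    m_cChars = 0
    lblFirst.Text = ""
    lblSecond.Text = ""
    If (RecoContext Is Nothing) Then
        ' Create the shared recognition context and a grammar object
        RecoContext = New SpeechLib.SpSharedRecoContext()
        Grammar = RecoContext.CreateGrammar(1)
        ' Load grammar.xml from the application's base folder (bin)
        Grammar.CmdLoadFromFile(System.AppDomain.CurrentDomain.BaseDirectory() & "grammar.xml", SpeechLoadOption.SLOStatic)
        ' Turn off free dictation and activate rule ID 1 from grammar.xml
        Grammar.DictationSetState(SpeechRuleState.SGDSInactive)
        Grammar.CmdSetRuleIdState(1, SpeechRuleState.SGDSActive)
    End If
End Sub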
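A missing grammar.xml only shows up as an error once CmdLoadFromFile runs, so it can help to check for the file first. The following sketch assumes a hypothetical helper, FindGrammarFile, that builds the same location the Form_Load code uses (the application's base directory plus the file name) with Path.Combine and returns Nothing when the file has not been copied there.

```vbnet
Imports System
Imports System.IO

Module GrammarPath
    ' Hypothetical helper: returns the full path of fileName in the application's
    ' base directory (the 'bin' folder when run from Visual Studio), or Nothing
    ' if the file is not there.
    Public Function FindGrammarFile(fileName As String) As String
        Dim fullPath As String = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, fileName)
        If File.Exists(fullPath) Then Return fullPath
        Return Nothing
    End Function
End Module
```

In Form_Load you could call FindGrammarFile("grammar.xml") before creating the grammar, show a MessageBox that names the expected folder when it returns Nothing, and otherwise pass the returned path to CmdLoadFromFile.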

Recognition

The Recognition event is handled the same way as in the earlier application. A Select Case statement examines the recognized text, and the code updates the labels and text box accordingly. The application works as follows:

  1. The application is opened and everything is blank.
  2. Recognition starts for the first number (lblFirst).
  3. Recognition occurs for the operand (lblOperand).
  4. Recognition for the last number takes place (lblSecond).
  5. The user says 'Equal' to perform the calculation (lblAnswer).
  6. The user now has several options: dictate a different operand or second number and say 'Equal' again to get a new answer, say 'New' to start a new problem, or say 'Quit' to exit the application. A new problem starts back at step 2.
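The steps above amount to a small slot-filling state machine. The sketch below models it without SAPI or Windows Forms so the logic can be tried on its own. The CalculationState class and its Apply method are hypothetical names, and the value argument stands in for the semantic value the recognizer reports for a number phrase. Following step 6, a number spoken after both slots are full replaces the second number.

```vbnet
Imports System

' Hypothetical model of the slots shown by lblFirst, lblOperand and lblSecond.
Public Class CalculationState
    Public Property First As String = ""
    Public Property Op As String = "+"
    Public Property Second As String = ""

    ' phrase: the recognized text (as returned by GetText)
    ' value:  the semantic value reported for a number phrase (assumed)
    Public Sub Apply(phrase As String, value As String)
        Select Case phrase
            Case "plus"
                Op = "+"
            Case "minus"
                Op = "-"
            Case "times"
                Op = "*"
            Case "divided by"
                Op = "/"
            Case "EQUAL", "QUIT"
                ' Nothing to store: the form computes the answer or exits
            Case "NEW"
                ' Clear both numbers and start again at step 2
                First = ""
                Second = ""
            Case Else
                ' Steps 2 and 4: the first number fills First; later numbers
                ' fill (or replace) Second so the user can try another value
                If First = "" Then
                    First = value
                Else
                    Second = value
                End If
        End Select
    End Sub
End Class
```

Keeping this logic separate from the event handler is one possible design; the chapter's program updates the labels directly instead.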

Here is the code for the procedure:

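Private Sub RecoContext_Recognition(ByVal StreamNumber As Integer, ByVal StreamPosition As Object, ByVal RecognitionType As SpeechLib.SpeechRecognitionType, ByVal Result As SpeechLib.ISpeechRecoResult) Handles RecoContext.Recognition
    ' Text of the recognized phrase, such as "plus" or "three"
    Dim strText As String = Result.PhraseInfo.GetText()

    ' Semantic value attached to the phrase in grammar.xml (for example, 3 for "three");
    ' fall back to the phrase text if the phrase carries no property
    Dim strValue As String = strText
    If Not (Result.PhraseInfo.Properties Is Nothing) AndAlso Result.PhraseInfo.Properties.Count > 0 Then
        strValue = CStr(Result.PhraseInfo.Properties.Item(0).Value)
    End If

    Select Case strText
        Case "plus"
            lblOperand.Text = "+"
        Case "minus"
            lblOperand.Text = "-"
        Case "divided by"
            lblOperand.Text = "/"
        Case "times"
            lblOperand.Text = "*"
        Case "QUIT"
            ' Close the form (and the application) rather than ending abruptly with End
            Me.Close()
            Exit Sub
        Case "EQUAL"
            ' Calculate only when both numbers have been recognized
            If lblFirst.Text <> "" AndAlso lblSecond.Text <> "" Then
                Dim X, Y As Integer
                X = Int32.Parse(lblFirst.Text)
                Y = Int32.Parse(lblSecond.Text)
                If lblOperand.Text = "+" Then
                    lblAnswer.Text = "= " & (X + Y).ToString
                ElseIf lblOperand.Text = "-" Then
                    lblAnswer.Text = "= " & (X - Y).ToString
                ElseIf lblOperand.Text = "/" Then
                    ' X / Y produces a Double in VB.NET; guard against division by zero
                    If Y = 0 Then
                        lblAnswer.Text = "= undefined"
                    Else
                        lblAnswer.Text = "= " & (X / Y).ToString
                    End If
                ElseIf lblOperand.Text = "*" Then
                    lblAnswer.Text = "= " & (X * Y).ToString
                End If
            End If
        Case "NEW"
            ' Clear the problem; the operand label keeps its last value
            lblFirst.Text = ""
            lblSecond.Text = ""
            lblAnswer.Text = "="
        Case Else
            ' A number: fill the first slot, then the second; a later number
            ' replaces the second so the user can try another value
            If lblFirst.Text = "" Then
                lblFirst.Text = strValue
            Else
                lblSecond.Text = strValue
            End If
    End Select

    ' Echo what was heard
    txtSpeech.Text = strValue
End Sub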
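VB.NET's / operator converts Integer operands to Double, so 7 / 2 gives 3.5, and an unguarded division by zero produces an infinity or NaN value rather than an exception. A minimal sketch of a separate evaluation function that makes these cases explicit follows; the Calculator module, the Evaluate name, and the "undefined" result for division by zero are illustrative choices, not part of the original program.

```vbnet
Imports System
Imports System.Globalization

Module Calculator
    ' Evaluates "x op y" for the four operators the grammar recognizes and
    ' returns the text to show after "= ". Division is done in Double so 7 / 2
    ' gives 3.5; dividing by zero returns "undefined" instead of an infinity/NaN.
    Public Function Evaluate(x As Integer, op As String, y As Integer) As String
        Select Case op
            Case "+"
                Return (x + y).ToString(CultureInfo.InvariantCulture)
            Case "-"
                Return (x - y).ToString(CultureInfo.InvariantCulture)
            Case "*"
                Return (x * y).ToString(CultureInfo.InvariantCulture)
            Case "/"
                If y = 0 Then Return "undefined"
                Return (CDbl(x) / y).ToString(CultureInfo.InvariantCulture)
        End Select
        Throw New ArgumentException("Unknown operator: " & op, "op")
    End Function
End Module
```

With a function like this, the EQUAL branch could shrink to a single line such as lblAnswer.Text = "= " & Evaluate(X, lblOperand.Text, Y).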

Testing the Application

Testing this application is very simple. When you run it, it looks like Figure 22.2. Say a number, such as 'Three.' The application displays the recognized number in the text box and also sets the first of the two number labels to that number (see Figure 22.3). Next, say the operation you want to perform; for example, say 'Times' for multiplication (see Figure 22.4). The second number is entered the same way as the first, so if you say 'Five,' it is displayed in the text box and in the second label (see Figure 22.5). Finally, say 'Equal' to perform the calculation (see Figure 22.6).

Figure 22.2: The application on startup.

Figure 22.3: The application receives its first digit.

Figure 22.4: The operand is recognized.

Figure 22.5: The final number is recognized.

Figure 22.6: The calculation is complete.

If you want to continue performing calculations, you can say 'New' to start over.

Summary

In this chapter, we created an application that uses speech recognition to perform basic calculations. We created a custom grammar so that the application recognizes only the phrases we want it to. As you can see, a custom grammar can be a very effective way to increase speech recognition accuracy for certain types of applications. In the next two chapters, we begin our look at the hardware of the Tablet PC and how we can develop software around it.



Developing Tablet PC Applications (Charles River Media Programming)
ISBN: 1584502525
EAN: 2147483647
Year: 2003
Pages: 191
