Advanced Microsoft Agent

In this chapter, we're going to continue on with the example we started in the previous chapter. In addition to offering the ability to convert text information to speech, Agent also allows us to take speech input, which can be manipulated in a variety of ways.

The source code for the projects is located on the CD-ROM in the PROJECTS folder. You can either type the projects in as you go or copy them from the CD-ROM to your hard drive for editing.

Introduction to Speech Recognition

If you remember back to Chapter 19, Getting Started with Microsoft Agent, we installed several engines when we installed Microsoft Agent. One of these engines was the speech recognition engine that we use in this chapter. Most speech recognition engines convert incoming audio data to engine-specific phonemes (the smallest structural unit of sound that can be used to distinguish one utterance from another in a spoken language), which are then translated into text that an application can use.

In continuous speech recognition, clients can speak to the system naturally, and the system keeps up with them. On the other hand, discrete recognition requires a user to speak very deliberately and pause between each word. At first glance, it might appear that continuous recognition would always be preferred over discrete recognition. After all, anything you would ever want to do could be accomplished with it. However, continuous speech recognition requires much more processing power, which isn't always available.

If you are planning to develop an application for dictation, you must support a very large vocabulary of words, whereas smaller vocabularies are a satisfactory way to allow a user to send commands to their computers. We use small vocabularies for our applications because we are developing command-based programs.

If you have ever used any type of speech recognition before, you know that you are often required to go through a series of tests to train your system for your voice and speaker. Speaker-independent speech recognition works well with very little or even absolutely zero training, whereas speaker-dependent systems require training, which could amount to hours of your time.

Agent uses 'Command and Control' speech recognition, which is continuous, has a small vocabulary, and is speaker independent. With this engine, we can create several hundred different commands or phrases for a program to recognize. If the engine does not recognize a command you give it, one of two things happens: the engine returns 'not recognized,' or it mistakes what you said for a command similar to the one you intended. With this in mind, the user of an application must be given the list of phrases they can say, and it is preferable to list them on the screen. You can display the list of commands a given application 'listens' for by using the Voice Commands window (see Figure 20.1). If speech recognition is disabled, the Voice Commands window is still displayed with the text 'Speech input disabled.' A character can also have a language setting, and if no installed speech engine matches that setting, the window displays 'Speech input not available.' If the application has not defined voice parameters for its commands, the window displays 'No voice commands.' You can also query the properties of the Voice Commands window regardless of whether speech input is disabled or a compatible speech engine is installed.

Figure 20.1: The Voice Commands window.
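
As a small illustration of querying the window, the fragment below is a sketch that assumes it runs inside the form from the previous chapter, where the Agent control is named Agent1. It is not part of the project on the CD-ROM. It relies on the control's CommandsWindow object to show the window and read back its position; the variable name position is made up for this example.

```vbnet
' Fragment: assumes a form that already hosts the Agent1 control.
' Show the Voice Commands window so the user can see the available phrases.
Agent1.CommandsWindow.Visible = True

' Query the window's current position on the screen, in pixels.
Dim position As String = Agent1.CommandsWindow.Left & ", " & Agent1.CommandsWindow.Top
MessageBox.Show("Voice Commands window at " & position)
```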

Speech recognition is not enabled at all times, so the end user of an Agent application needs to press a 'push-to-talk' key before voice input is enabled. Once it is enabled, a special ToolTip appears. This listening tip (see Figure 20.2) displays contextual information associated with the current input state.

Figure 20.2: The current state of the character is visible in the listening tip.

Programming Basics

With the background we have from the previous chapter, it's very easy to make a character recognize a user's speech input. The first thing we need to do is add commands to a character. The commands are the words or phrases that Agent can recognize when a user speaks into a microphone. The character will not recognize anything the user says unless you program a command for it. To add a command to a character, add the following to your code:

Character.Commands.Add("Say Hello.", , "Say Hello.", True, True)

Let's break down the previous line into its pieces so we can understand the 'Say Hello.' command. The first 'Say Hello.' is the command name, the identifier that the Command event reports back to our code when the command is recognized. The empty slot that follows is the caption, the text shown for the command in the character's pop-up menu and in the Voice Commands window; we leave it out here. The second 'Say Hello.' is the command voice, the phrase (or grammar) that the speech engine listens for. For example, if we had a command name of 'Hello' and a command voice of 'Hello1,' the user would have to say 'Hello1,' and our code would receive a command named 'Hello.' The first of the two 'True' values specifies whether the command is enabled, whereas the last one sets its visibility. We're going to define some real commands for our character later in this chapter, but this is an easy-to-follow example so you can see how the commands are broken down.

Adding the command is only part of the equation. At this point, the character does not do anything when it hears our command. We need to add a procedure to make the character respond to the 'Say Hello.' command. We can use the Agent1_Command Sub and an If...Then statement to determine what the character says and how it responds. Continuing with our 'Say Hello.' example, the following procedure allows the character to say hello:

Private Sub Agent1_Command(ByVal sender As Object, ByVal e As _
AxAgentObjects._AgentEvents_CommandEvent) Handles Agent1.Command
    ' The event argument carries the user input that Agent matched.
    Dim command As AgentObjects.IAgentCtlUserInput = CType(e.userInput, _
    AgentObjects.IAgentCtlUserInput)
    ' Respond only when the recognized command is "Say Hello."
    If command.Name = "Say Hello." Then
        Character.Speak("Hello")
    End If
End Sub

Advanced Commands

Although the 'Say Hello.' command we used to demonstrate the speech recognition process is very simple, the commands we use often become much more complicated. When we define the voice grammar, we can use punctuation and symbols to make a command more flexible. These options include brackets ([ ]), stars (*), plus signs (+), parentheses ('(' and ')'), and vertical bars (|).

The following list details the grammar options:

'*' and '+': You use the star to specify zero or more instances of a word. For example, if you use a command such as 'Please* open file,' the character recognizes it regardless of how many times 'Please' was said. That is, if 'Open file' is said, 'Please open file' is recognized. Likewise, 'Please Please open file' is also recognized as 'Please open file.' You use the plus sign in the same way, but the plus sign requires one or more instances of the word.

'[ ]': You can use brackets to indicate optional words. This is similar to the '*' option, except that the bracketed word is said at most once. For example, we can use '[Please] open file' much as we did 'Please* open file.' 'Please' is optional in either example.

'( )': Parentheses group alternative words and are used together with the vertical bar. As an example, you can use 'Please open (the|this) file,' which requires the user to say either 'the' or 'this' at that point in the command. To make a word optional, use brackets instead.

'|': You use the vertical bar to separate the alternatives inside a pair of parentheses. As an example, some people may pronounce the word 'the' differently. That is, someone could pronounce it as 'thee,' whereas others pronounce it as 'the.' You can take care of this problem using the vertical bar as follows: 'Open (thee|the) file.' This allows the user to say 'Open thee file' or 'Open the file.' Wrapping the group in brackets, as in 'Open [(thee|the)] file,' combines two of the options so that 'Open file' is recognized as well. You could also use numbers with the vertical bar: 'Please call (one|two|five)+' allows the person to say 'Please call five five five one two one two' for 555-1212.
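
To get a feel for how these symbols combine, the following is a minimal sketch that approximates the grammar with .NET regular expressions. It is only an illustration and not how the Agent speech engine works: real recognition operates on audio rather than typed text, and the module GrammarSketch along with the helpers ToPattern, FlushWord, and Accepts are hypothetical names introduced here.

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module GrammarSketch

    ' Translates a voice grammar string into an approximate regular expression.
    Function ToPattern(ByVal grammar As String) As String
        Dim pattern As New StringBuilder("^")
        Dim word As New StringBuilder()

        For Each ch As Char In grammar
            If Char.IsLetterOrDigit(ch) Then
                word.Append(ch)
            Else
                FlushWord(word, pattern)
                Select Case ch
                    Case "["c, "("c
                        pattern.Append("(?:")      ' open a group
                    Case "]"c
                        pattern.Append(")?")       ' brackets: the group is optional
                    Case ")"c, "|"c, "*"c, "+"c
                        pattern.Append(ch)         ' same meaning in a regex
                End Select                         ' spaces and other characters are skipped
            End If
        Next
        FlushWord(word, pattern)
        Return pattern.Append("$").ToString()
    End Function

    ' Emits the pending word as a group followed by optional white space.
    Private Sub FlushWord(ByVal word As StringBuilder, ByVal pattern As StringBuilder)
        If word.Length > 0 Then
            pattern.Append("(?:" & Regex.Escape(word.ToString()) & "\s*)")
            word.Length = 0
        End If
    End Sub

    ' Returns True when the typed phrase fits the grammar, ignoring case.
    Function Accepts(ByVal grammar As String, ByVal phrase As String) As Boolean
        Return Regex.IsMatch(phrase.Trim(), ToPattern(grammar), RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Console.WriteLine(Accepts("Please* open file", "please please open file"))   ' True
        Console.WriteLine(Accepts("[Please] open file", "open file"))                 ' True
        Console.WriteLine(Accepts("Please+ open file", "open file"))                  ' False
        Console.WriteLine(Accepts("Open (thee|the) file", "open thee file"))          ' True
    End Sub

End Module
```

Trying a few phrases against a grammar this way can help when deciding whether a word should be starred, bracketed, or listed as an alternative before the command is handed to Agent.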

Reacting to Other Events

Agent can also respond to other events such as when a user clicks on the character. The click action causes Agent to fire an event that passes back the button that was clicked, any modifier keys that were pressed, and the x and y coordinates of the mouse. An example event handler for this is as follows:

Sub Agent_Click(ByVal CharacterID As String, ByVal Button As Integer, _
ByVal Shift As Integer, ByVal X As Integer, ByVal Y As Integer)

    ' Have the character announce that it was clicked.
    Character.Speak("Click event")

End Sub

Note

Agent also displays a context menu when the user right-clicks the character.

Real World Usefulness

Now that we have seen how to add commands to Agent, we will use them in our example. Open the file from the previous chapter and add two buttons with the properties shown in Table 20.1.

Table 20.1: Adding btnProperties and btnListen
Name	Text
btnProperties	Properties
btnListen	Listen

You can also replace the TextBox1 control with an InkEdit control. You can see the buttons added to the form and the replaced controls in Figure 20.3.

Figure 20.3: The GUI is complete.

Next, in the Code Editor, change any references for TextBox1 to InkEdit1, and remove any references that set a character to show and hide because we want the character to be visible at all times. In addition, remove the 'Hello World!' text from initializing InkEdit1 as follows:

Private Sub Form1_Load(ByVal eventSender As System.Object, ByVal _
eventArgs As System.EventArgs) Handles MyBase.Load
    On Error GoTo handler
    Agent1.Characters.Load("Merlin", sChar)
    Character = Agent1.Characters("Merlin")
    Character.LanguageID = &H409S
    InkEdit1.Text = ""
handler:
    ' Execution also falls through to here on success, so check for an actual error.
    If Err.Number <> 0 Then
        MessageBox.Show("Description: " & Err.Description, "Error!", MessageBoxButtons.OK, _
        MessageBoxIcon.Warning, MessageBoxDefaultButton.Button1)
        Err.Clear()
    End If
End Sub

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
 Character.Speak(InkEdit1.Text)
End Sub

Next, we can add the commands to our character in the Form1_Load event. For starters, let's add a caption for our commands (add the lines directly beneath the line that initializes InkEdit1 with an empty string, before the error handler):

Character.Commands.Caption = "Test commands"

The next step is to add several commands to our character. We're going to instruct the character to read the clipboard, say the contents of the InkEdit control, and close the application via speech input. Here is the code:

' Arguments: name, caption, voice grammar, enabled, visible
Character.Commands.Add("Read Clipboard", "Read Clipboard", "Read Clipboard", True, True)
Character.Commands.Add("Say Ink", "Say Ink", "Say Ink", True, True)
Character.Commands.Add("exit", "Exit", "(exit|close|quit)", True, True)
Character.Show()

We now need to create the event to respond to the Agent's commands. We use a Case statement to look at the possible choices and respond appropriately. Here is the code for the procedure:

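Private Sub Agent1_Command(ByVal sender As Object, ByVal e As _
AxAgentObjects._AgentEvents_CommandEvent) Handles Agent1.Command
    Dim command As AgentObjects.IAgentCtlUserInput = CType(e.userInput, AgentObjects.IAgentCtlUserInput)

    Select Case command.Name
        Case "exit"
            ' Terminate the application.
            End
        Case "Say Ink"
            ' Reuse the existing button handler, which speaks InkEdit1.Text.
            Button1.PerformClick()
        Case "Read Clipboard"
            ' Speak the clipboard contents only if they are available as text.
            Dim data As IDataObject = Clipboard.GetDataObject()
            If Not data Is Nothing AndAlso data.GetDataPresent(DataFormats.Text) Then
                Character.Speak(CStr(data.GetData(DataFormats.Text)))
            End If
    End Select
End Sub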

By default, the Scroll Lock key instructs the Agent character to listen for commands. Although this is fine for many situations, a Tablet PC user may or may not have access to a keyboard. Therefore, we provide two ways to cause the character to listen for commands.

First, we create a click event for btnListen. This procedure begins by stopping anything that the character is currently doing. Next, we set the character to listen:

Private Sub btnListen_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnListen.Click
 Character.StopAll()
 Character.Listen(True)
End Sub

We've already mentioned the Tablet PC user may or may not have access to their keyboard. At this time, the user could click btnListen to start listening for any voice commands. This works similarly to the push-to-talk Listening hotkey, but if the user prefers to press one of the buttons they have available (such as the Tab key, which is included on the HP 1000 Tablet PC when in slate mode), we should provide this option. Unfortunately, we cannot set the push-to-talk Listening hotkey programmatically in our application, but we do have access to the Agent's Property sheet (see Figure 20.4), which allows the user to set the key themselves:

Private Sub btnProperties_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnProperties.Click
 Agent1.PropertySheet.Visible = True
End Sub

Figure 20.4: The Property sheet provides options for the end user.

You can now test the application to make sure that it functions correctly. You should spend the time to test both the voice input (see Figure 20.5) and speech output (see Figure 20.6), along with changes to the Agent Property sheet.

Figure 20.5: Listening for a command.

Figure 20.6: Reading the content of the InkEdit control.

Summary

In this chapter, we built our second application with Microsoft Agent. In this example, we used Agent and its speech recognition engine to allow us to give commands verbally. In Chapter 21, Speech Input with SAPI, we build another speech recognition program, but it will be our first with the Speech SDK version 5.1.