Section 17.14. Microsoft Agent

Microsoft Agent is a technology used to add interactive animated characters to Windows applications or Web pages. Microsoft Agent characters can speak and respond to user input via speech recognition and synthesis. Microsoft employs its Agent technology in applications such as Word, Excel and PowerPoint. Agents in these programs aid users in finding answers to questions and in understanding how the applications function.

The Microsoft Agent control provides programmers with access to four predefined characters: Genie (a genie), Merlin (a wizard), Peedy (a parrot) and Robby (a robot). Each character has a unique set of animations that programmers can use in their applications to illustrate different points and functions. For instance, the Peedy character-animation set includes different flying animations, which the programmer might use to move Peedy on the screen. Microsoft provides basic information on Agent technology at

www.microsoft.com/msagent

Microsoft Agent technology enables users to interact with applications and Web pages through speech, the most natural form of human communication. To understand speech, the control uses a speech-recognition engine, an application that translates vocal sound input from a microphone to language that the computer understands. The Microsoft Agent control also uses a text-to-speech engine, which generates characters' spoken responses. A text-to-speech engine is an application that translates typed words into audio sound that users hear through headphones or speakers connected to a computer. Microsoft provides speech-recognition and text-to-speech engines for several languages at

www.microsoft.com/msagent/downloads/user.asp

Programmers can even create their own animated characters with the help of the Microsoft Agent Character Editor and the Microsoft Linguistic Sound Editing Tool. These products are available free for download from

www.microsoft.com/msagent/downloads/developer.asp

This section introduces the basic capabilities of the Microsoft Agent control. For complete details on downloading this control, visit

www.microsoft.com/msagent/downloads/user.asp

The following example, Peedy's Pizza Palace, was developed by Microsoft to illustrate the capabilities of the Microsoft Agent control. Peedy's Pizza Palace is an online pizza shop where users can place their orders via voice input. The Peedy character interacts with users by helping them choose toppings and calculating the totals for their orders. You can view this example at

www.microsoft.com/agent2/sdk/samples/html/peedypza.htm

To run the example, you must go to www.microsoft.com/msagent/downloads/user.asp and download and install the Peedy character file, a text-to-speech engine and a speech-recognition engine.

When the window opens, Peedy introduces himself (Fig. 17.28), and the words he speaks appear in a cartoon bubble above his head. Note that Peedy's animations correspond to the words he speaks.

Figure 17.28. Peedy introducing himself when the window opens.

Programmers can synchronize character animations with speech output to illustrate a point or to convey a character's mood. For instance, Fig. 17.29 depicts Peedy's Pleased animation. The Peedy character-animation set includes 85 different animations, each of which is unique to the Peedy character.

Figure 17.29. Peedy's Pleased animation.

Look-and-Feel Observation 17.1

Agent characters remain on top of all active windows while a Microsoft Agent application is running. Their motions are not limited by the boundaries of the browser or the application window.

Peedy also responds to input from the keyboard and mouse. Figure 17.30 shows what happens when a user clicks Peedy with the mouse pointer. Peedy jumps up, ruffles his feathers and exclaims, "Hey, that tickles!" or "Be careful with that pointer!" Users can relocate Peedy on the screen by dragging him with the mouse. However, even when the user moves Peedy to a different part of the screen, he continues to perform his preset animations and location changes.

Figure 17.30. Peedy's reaction when he is clicked.

Many location changes involve animations. For instance, Peedy can hop from one screen location to another, or he can fly (Fig. 17.31).

Figure 17.31. Peedy's flying animation.

Once Peedy completes the ordering instructions, a tool tip appears beneath him indicating that he is listening for a voice command (Fig. 17.32). You can enter the type of pizza to order either by speaking the style name into a microphone or by clicking the radio button corresponding to your choice.

Figure 17.32. Peedy waiting for speech input.

If you choose speech input, a box appears below Peedy displaying the words that Peedy "heard" (i.e., the words translated to the program by the speech-recognition engine). Once he recognizes your input, Peedy gives you a description of the selected pizza. Figure 17.33 shows what happens when you choose Seattle as the pizza style.

Figure 17.33. Peedy repeating a request for Seattle-style pizza.

Peedy then asks you to choose additional toppings. Again, you can either speak or use the mouse to make a selection. Checkboxes corresponding to toppings that come with the selected pizza style are checked for you. Figure 17.34 shows what happens when you choose anchovies as an additional topping. Peedy makes a wisecrack about your choice.

Figure 17.34. Peedy repeating a request for anchovies as an additional topping.

You can submit the order either by pressing the Place My Order button or by speaking "Place order" into the microphone. Peedy recounts the order while writing down the order items on his notepad (Fig. 17.35). He then calculates the figures on his calculator and reports the total price (Fig. 17.36).

Figure 17.35. Peedy recounting the order.

Figure 17.36. Peedy calculating the total.

Creating an Application That Uses Microsoft Agent

[Note: Before running this example, you must first download and install the Microsoft Agent control, a speech-recognition engine, a text-to-speech engine and the four character definitions from the Microsoft Agent Web site, as we discussed at the beginning of this section.]

The following example (Fig. 17.37) demonstrates how to build a simple application with the Microsoft Agent control. This application contains two drop-down lists from which the user can choose an Agent character and a character animation. When the user chooses from these lists, the chosen character appears and performs the selected animation. The application uses speech recognition and synthesis to control the character animations and speech: you can tell the character which animation to perform by pressing the Scroll Lock key, then speaking the animation name into a microphone.

Figure 17.37. Microsoft Agent demonstration.

  1  ' Fig. 17.37: FrmAgent.vb
  2  ' Microsoft Agent demonstration.
  3  Imports System.IO
  4
  5  Public Class FrmAgent
  6     ' current agent object
  7     Private speaker As AgentObjects.IAgentCtlCharacter
  8
  9     ' parameterless constructor
 10     Public Sub New()
 11        InitializeComponent()
 12
 13        ' initialize the characters
 14        Try
 15           ' load characters into agent object
 16           mainAgent.Characters.Load("Genie", _
 17              "C:\windows\msagent\chars\Genie.acs")
 18           mainAgent.Characters.Load("Merlin", _
 19              "C:\windows\msagent\chars\Merlin.acs")
 20           mainAgent.Characters.Load("Peedy", _
 21              "C:\windows\msagent\chars\Peedy.acs")
 22           mainAgent.Characters.Load("Robby", _
 23              "C:\windows\msagent\chars\Robby.acs")
 24
 25           ' set current character to Genie and show him
 26           speaker = mainAgent.Characters("Genie")
 27           GetAnimationNames() ' obtain an animation name list
 28           speaker.Show(0) ' display Genie
 29           cboCharacter.SelectedText = "Genie"
 30        Catch fileNotFound As FileNotFoundException
 31           MessageBox.Show("Invalid character location", _
 32              "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
 33        End Try
 34     End Sub ' New
 35
 36     ' event handler for Speak Button
 37     Private Sub btnSpeak_Click(ByVal sender As System.Object, _
 38        ByVal e As System.EventArgs) Handles btnSpeak.Click
 39        ' if TextBox is empty, have the character ask
 40        ' user to type the words into the TextBox; otherwise,
 41        ' have the character say the words in the TextBox
 42        If txtSpeech.Text = "" Then
 43           speaker.Speak("Please, type the words you want me to speak", "")
 44        Else
 45           speaker.Speak(txtSpeech.Text, "")
 46        End If
 47     End Sub ' btnSpeak_Click
 48
 49     ' event handler for Agent control's ClickEvent
 50     Private Sub mainAgent_ClickEvent(ByVal sender As Object, _
 51        ByVal e As AxAgentObjects._AgentEvents_ClickEvent) _
 52        Handles mainAgent.ClickEvent
 53
 54        speaker.Play("Confused")
 55        speaker.Speak("Why are you poking me?", "")
 56        speaker.Play("RestPose")
 57     End Sub ' mainAgent_ClickEvent
 58
 59     ' ComboBox changed event, switch active agent character
 60     Private Sub cboCharacter_SelectedIndexChanged( _
 61        ByVal sender As System.Object, ByVal e As System.EventArgs) _
 62        Handles cboCharacter.SelectedIndexChanged
 63
 64        ChangeCharacter(cboCharacter.Text)
 65     End Sub ' cboCharacter_SelectedIndexChanged
 66
 67     ' utility method to change characters
 68     Private Sub ChangeCharacter(ByVal name As String)
 69        speaker.StopAll("Play")
 70        speaker.Hide(0)
 71        speaker = mainAgent.Characters(name)
 72
 73        ' regenerate animation name list
 74        GetAnimationNames()
 75        speaker.Show(0)
 76     End Sub ' ChangeCharacter
 77
 78     ' get animation names and store them in the cboActions ComboBox
 79     Private Sub GetAnimationNames()
 80        ' ensure thread safety
 81        SyncLock (Me)
 82           ' get animation names
 83           Dim enumerator As IEnumerator = mainAgent.Characters( _
 84              speaker.Name).AnimationNames.GetEnumerator()
 85
 86           Dim voiceString As String
 87
 88           ' clear cboActions and the character's commands
 89           cboActions.Items.Clear()
 90           speaker.Commands.RemoveAll()
 91
 92           ' copy enumeration to the ComboBox and the command list
 93           While enumerator.MoveNext()
 94              ' replace underscores in speech string
 95              voiceString = enumerator.Current.ToString()
 96              voiceString = voiceString.Replace("_", "underscore")
 97
 98              cboActions.Items.Add(enumerator.Current)
 99
100              ' add all animations as voice enabled commands
101              speaker.Commands.Add(enumerator.Current.ToString(), _
102                 enumerator.Current, voiceString, True, False)
103           End While
104
105           ' add custom command
106           speaker.Commands.Add("MoveToMouse", "MoveToMouse", _
107              "MoveToMouse", True, True)
108        End SyncLock
109     End Sub ' GetAnimationNames
110
111     ' user selects new action
112     Private Sub cboActions_SelectedIndexChanged( _
113        ByVal sender As System.Object, ByVal e As System.EventArgs) _
114        Handles cboActions.SelectedIndexChanged
115
116        speaker.StopAll("Play")
117        speaker.Play(cboActions.Text)
118        speaker.Play("RestPose")
119     End Sub ' cboActions_SelectedIndexChanged
120
121     ' event handler for Agent commands
122     Private Sub mainAgent_Command(ByVal sender As Object, _
123        ByVal e As AxAgentObjects._AgentEvents_CommandEvent) _
124        Handles mainAgent.Command
125        ' get UserInput object
126        Dim command As AgentObjects.IAgentCtlUserInput = _
127           CType(e.userInput, AgentObjects.IAgentCtlUserInput)
128
129        ' change character if user speaks character name
130        If command.Voice = "Peedy" OrElse command.Voice = "Robby" OrElse _
131           command.Voice = "Merlin" OrElse command.Voice = "Genie" Then
132           ChangeCharacter(command.Voice)
133           Return
134        End If
135
136        ' send agent to mouse
137        If command.Voice = "MoveToMouse" Then
138           speaker.MoveTo( _
139              Convert.ToInt16(Windows.Forms.Cursor.Position.X - 60), _
140              Convert.ToInt16(Windows.Forms.Cursor.Position.Y - 60), 5)
141           Return
142        End If
143
144        ' play new animation
145        speaker.StopAll("Play")
146        speaker.Play(command.Name)
147     End Sub ' mainAgent_Command
148  End Class ' FrmAgent

The example also allows you to switch to a new character by speaking its name and creates a custom command, MoveToMouse. In addition, when you press the Speak Button, the characters speak any text that you typed in the TextBox.

To use the Microsoft Agent control, you must add it to the Toolbox. Select Tools > Choose Toolbox Items... to display the Choose Toolbox Items dialog. In the dialog, select the COM Components tab, then scroll down and select the Microsoft Agent Control 2.0 option. When this option is selected properly, a small check mark appears in the box to the left of the option. Click OK to dismiss the dialog. The icon for the Microsoft Agent control now appears at the bottom of the Toolbox. Drag the Microsoft Agent Control 2.0 control onto your Form and name the object mainAgent.

In addition to the Microsoft Agent object mainAgent (of type AxAgent) that manages the characters, you also need a variable of type IAgentCtlCharacter to represent the current character. We create this variable, named speaker, in line 7.

When you execute this program, class FrmAgent's constructor (lines 10-34) loads the character descriptions for the predefined animated characters (lines 16-23). If the specified location of the characters is incorrect, or if any character is missing, a FileNotFoundException is thrown. By default, the character descriptions are stored in C:\Windows\msagent\chars. If your system uses another name for the Windows directory, you'll need to modify the paths in lines 16-23.
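Rather than editing the hard-coded paths by hand, the location could be derived at run time. The following is a minimal sketch, not part of the original example: the module name CharacterPaths and the helpers CharacterPath and WindowsDirectory are hypothetical, and the sketch assumes that the windir environment variable identifies the Windows directory and that the characters live in its msagent\chars subdirectory, as described above.

```vb
Imports System
Imports System.IO

Module CharacterPaths
   ' Hypothetical helper: builds the full path of a character file
   ' beneath the given Windows directory (msagent\chars\<name>.acs).
   Function CharacterPath(ByVal windowsDir As String, _
      ByVal characterName As String) As String

      Dim charsDir As String = _
         Path.Combine(Path.Combine(windowsDir, "msagent"), "chars")
      Return Path.Combine(charsDir, characterName & ".acs")
   End Function

   ' Assumption: the windir environment variable names the Windows
   ' directory; fall back to C:\Windows when it is not set.
   Function WindowsDirectory() As String
      Dim windowsDir As String = Environment.GetEnvironmentVariable("windir")

      If String.IsNullOrEmpty(windowsDir) Then
         windowsDir = "C:\Windows"
      End If

      Return windowsDir
   End Function

   Sub Main()
      ' report where each predefined character is expected to be
      For Each name As String In New String() {"Genie", "Merlin", "Peedy", "Robby"}
         Dim fullPath As String = CharacterPath(WindowsDirectory(), name)
         Console.WriteLine("{0}: {1} (exists: {2})", _
            name, fullPath, File.Exists(fullPath))
      Next
   End Sub
End Module
```

With such a helper, each call in lines 16-23 could pass CharacterPath(WindowsDirectory(), "Genie") (and so on) as the second argument to Load, and checking File.Exists beforehand would let the program report exactly which character file is missing.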

Lines 26-28 set Genie as the default character, obtain all animation names via our utility method GetAnimationNames and call IAgentCtlCharacter method Show to display the character. We access characters through property Characters of mainAgent, which contains all the characters that have been loaded. We use the indexer of the Characters property to specify the name of the loaded character we wish to use (Genie).

Responding to the Agent Control's ClickEvent

When a user clicks the character (i.e., pokes it with the mouse), event handler mainAgent_ClickEvent (lines 50-57) executes. First, speaker method Play plays an animation. This method accepts as an argument a String representing one of the predefined animations for the character (a list of animations for each character is available at the Microsoft Agent Web site; each character provides over 70 animations). In our example, the argument to Play is "Confused"; this animation is defined for all four characters, each of which expresses this emotion in a unique way. The character then speaks, "Why are you poking me?" via a call to method Speak. Finally, we play the RestPose animation, which returns the character to its neutral, resting pose.

Obtaining a Character's List of Animations and Defining Its Commands

The list of valid commands for a character is contained in property Commands of the IAgentCtlCharacter object (speaker, in this example). The commands for an Agent character can be viewed in the Commands pop-up window, which displays when the user right-clicks an Agent character (the last screenshot in Fig. 17.37). Method Add of property Commands adds a new command to the command list. Method Add takes three String arguments and two Boolean arguments. The first String argument identifies the name of the command, which we use to identify the command programmatically. The second String defines the command name as it appears in the Commands pop-up window. The third String defines the voice input that triggers the command. The first Boolean specifies whether the command is active, and the second Boolean indicates whether the command is visible in the Commands pop-up window. A command is triggered when the user selects the command from the Commands pop-up window or speaks the voice input into a microphone. Command logic is handled in the Command event handler of the AxAgent control (mainAgent, in this example). In addition, Agent defines several global commands that have predefined functions (for example, speaking a character name causes that character to appear).

Method GetAnimationNames (lines 79-109) fills the cboActions ComboBox with the current character's animation listing and defines the valid commands that can be used with the character. The method contains a SyncLock block to prevent errors resulting from rapid character changes. The method uses an IEnumerator (lines 83-84) to obtain the current character's animations. Lines 89-90 clear the existing items in the ComboBox and the character's Commands property. Lines 93-103 iterate through all the items in the animation-name enumerator. For each animation, line 95 assigns the animation name to String voiceString. Line 96 replaces any underscore characters (_) with the String "underscore"; this changes the String so that a user can pronounce and employ it as a command activator. Line 98 adds the animation's name to the cboActions ComboBox. The Add method of the Commands property (lines 101-102) adds a new command to the current character. In this example, we add every animation name as a command. Each call to Add receives the animation name as both the name of the command and the string that appears in the Commands pop-up window. The third argument is the voice command, and the last two arguments enable the command but indicate that it is not available via the Commands pop-up window. Thus, the command can be activated only by voice input. Lines 106-107 create a new command, named MoveToMouse, which is visible in the Commands pop-up window.
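The five arguments to Add and the underscore substitution can be tried without the Agent control installed. The sketch below is only a stand-in model: the CommandSpec structure, the BuildCommands function and the module name are hypothetical and are not part of the Agent API, and "Idle1_1" is used merely as an illustration of an animation name that contains an underscore.

```vb
Imports System
Imports System.Collections.Generic

Module VoiceCommands
   ' Hypothetical record mirroring the five arguments passed to Commands.Add.
   Structure CommandSpec
      Public Name As String      ' programmatic identifier
      Public Caption As String   ' text shown in the Commands pop-up window
      Public Voice As String     ' spoken phrase that triggers the command
      Public Enabled As Boolean  ' whether the command is active
      Public Visible As Boolean  ' whether it is listed in the pop-up window

      Public Sub New(ByVal commandName As String, _
         ByVal commandCaption As String, ByVal voiceText As String, _
         ByVal isEnabled As Boolean, ByVal isVisible As Boolean)

         Name = commandName
         Caption = commandCaption
         Voice = voiceText
         Enabled = isEnabled
         Visible = isVisible
      End Sub
   End Structure

   ' Same substitution as line 96 of the listing.
   Function ToVoiceString(ByVal animationName As String) As String
      Return animationName.Replace("_", "underscore")
   End Function

   ' Builds one voice-only command per animation name, then appends
   ' the MoveToMouse command, which is also visible in the pop-up window.
   Function BuildCommands(ByVal animationNames As IEnumerable(Of String)) _
      As List(Of CommandSpec)

      Dim specs As New List(Of CommandSpec)()

      For Each animationName As String In animationNames
         specs.Add(New CommandSpec(animationName, animationName, _
            ToVoiceString(animationName), True, False))
      Next

      specs.Add(New CommandSpec("MoveToMouse", "MoveToMouse", _
         "MoveToMouse", True, True))
      Return specs
   End Function

   Sub Main()
      ' "Idle1_1" stands in for any animation name containing an underscore
      Dim names As String() = {"Wave", "Idle1_1"}

      For Each spec As CommandSpec In BuildCommands(names)
         Console.WriteLine("{0}: say ""{1}"" (in pop-up window: {2})", _
            spec.Name, spec.Voice, spec.Visible)
      Next
   End Sub
End Module
```

Here ToVoiceString("Idle1_1") yields "Idle1underscore1", so the programmatic name keeps its underscore while the voice text becomes something a user can say aloud.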

Responding to Selections from the cboActions ComboBox

After the GetAnimationNames method has been called, the user can select a value from the cboActions ComboBox. Event handler cboActions_SelectedIndexChanged (lines 112-119) stops any current animation, then plays the animation that the user selects from the ComboBox, followed by the RestPose animation.
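The reason for calling StopAll("Play") before the new Play requests can be pictured with a small model. This is a simplified sketch under an assumption, namely that a character's Play requests wait in a queue and run in the order they were issued; the PendingAnimations class is hypothetical and does not reproduce the Agent control's actual behavior in detail. The names "Greet" and "Wave" are illustrative animation names.

```vb
Imports System
Imports System.Collections.Generic

Module RequestQueueSketch
   ' Simplified, hypothetical stand-in for a character's pending Play requests.
   Class PendingAnimations
      Private ReadOnly requests As New Queue(Of String)()

      ' queue an animation behind any that are already waiting
      Public Sub Play(ByVal animationName As String)
         requests.Enqueue(animationName)
      End Sub

      ' discard every waiting Play request, as StopAll("Play") is assumed to do
      Public Sub StopAllPlay()
         requests.Clear()
      End Sub

      ' return the waiting requests in the order they would run
      Public Function Snapshot() As String()
         Return requests.ToArray()
      End Function
   End Class

   Sub Main()
      Dim pending As New PendingAnimations()

      ' first selection from the ComboBox
      pending.Play("Greet")
      pending.Play("RestPose")

      ' the user selects another animation before the first ones finish
      pending.StopAllPlay()
      pending.Play("Wave")
      pending.Play("RestPose")

      Console.WriteLine(String.Join(", ", pending.Snapshot())) ' Wave, RestPose
   End Sub
End Module
```

Without the StopAllPlay call, the model would still hold Greet and RestPose ahead of the new selection, which mirrors a character finishing stale animations before reacting to the user's latest choice.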

Speaking the Text Typed by the User

You can also type text in the TextBox and click Speak. This causes event handler btnSpeak_Click (lines 37-47) to call speaker method Speak, supplying as an argument the text in txtSpeech. If the user clicks Speak without providing text, the character speaks, "Please, type the words you want me to speak".

Changing Characters

At any point in the program, the user can choose a different character from the cboCharacter ComboBox. When this happens, the SelectedIndexChanged event handler for cboCharacter (lines 60-65) executes. The event handler calls method ChangeCharacter (declared in lines 68-76) with the text in cboCharacter as an argument. Method ChangeCharacter stops any current animation, then calls the Hide method of speaker (line 70) to remove the current character from view. Line 71 assigns the newly selected character to speaker, line 74 generates the character's animation names and commands, and line 75 displays the character via a call to method Show.

Responding to Commands

Each time a user presses the Scroll Lock key and speaks into a microphone or selects a command from the Commands pop-up window, event handler mainAgent_Command (lines 122-147) is called. This method receives an argument of type AxAgentObjects._AgentEvents_CommandEvent, which contains a single property, userInput. This property returns an Object that can be converted to type AgentObjects.IAgentCtlUserInput. Lines 126-127 assign the userInput object to an IAgentCtlUserInput object named command, which is used to identify the command, so that the program can respond appropriately. Lines 130-134 use method ChangeCharacter to change the current Agent character if the user speaks a character name. Microsoft Agent always will show a character when a user speaks its name; however, by controlling the character change, we can ensure that only one Agent character is displayed at a time. Lines 137-142 move the character to the current mouse location if the user invokes the MoveToMouse command. Agent method MoveTo takes x- and y-coordinate arguments and moves the character to the specified screen position, applying appropriate movement animations. For all other commands, we Play the command name as an animation in line 146.
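The decision logic of this event handler can be separated from the Agent control and exercised on its own. The following sketch is an illustration only: the CommandAction enumeration and the functions Classify and MoveTarget are hypothetical names, and a single String stands in for the Voice text that the handler inspects in lines 130-137.

```vb
Imports System

Module CommandDispatch
   ' Hypothetical labels for the three branches of mainAgent_Command.
   Enum CommandAction
      ChangeCharacter
      MoveToMouse
      PlayAnimation
   End Enum

   ' Mirrors the order of the tests in lines 130-146: character names
   ' first, then MoveToMouse; anything else is played as an animation.
   Function Classify(ByVal voice As String) As CommandAction
      Select Case voice
         Case "Peedy", "Robby", "Merlin", "Genie"
            Return CommandAction.ChangeCharacter
         Case "MoveToMouse"
            Return CommandAction.MoveToMouse
         Case Else
            Return CommandAction.PlayAnimation
      End Select
   End Function

   ' Applies the 60-pixel offset from lines 139-140 and converts to the
   ' 16-bit values used there; Convert.ToInt16 throws an OverflowException
   ' if a result falls outside the Short range.
   Function MoveTarget(ByVal cursorX As Integer, _
      ByVal cursorY As Integer) As Short()

      Return New Short() {Convert.ToInt16(cursorX - 60), _
         Convert.ToInt16(cursorY - 60)}
   End Function

   Sub Main()
      For Each voice As String In New String() {"Merlin", "MoveToMouse", "Wave"}
         Console.WriteLine("{0} -> {1}", voice, Classify(voice))
      Next

      Dim target As Short() = MoveTarget(400, 300)
      Console.WriteLine("MoveTo({0}, {1}, 5)", target(0), target(1))
   End Sub
End Module
```

For a cursor at (400, 300), MoveTarget returns 340 and 240, the coordinates the handler would pass to MoveTo. Keeping the classification in a pure function like this makes it possible to check the branch order without a microphone or installed characters.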