Hack 60. Control Your Car PC with Voice Recognition

You can use software to make your car PC's navigation system and other functions voice-operated, just like the cool cars in the commercials.

My initial inspiration for implementing voice recognition in my car PC came from the commercials I kept seeing for voice-activated navigation systems, like those in Hondas and Acuras.

Speech recognition requires the computer to accept spoken words as input and interpret what has been spoken. To make the job of understanding speech easier for the computer, this hack uses a method of speech input called command and control. With command and control, the computer listens only for a limited set of specified voice commands, which greatly decreases the chances of errors in interpretation.
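To illustrate why a small command set helps, here is a minimal sketch that snaps whatever was "heard" to the closest phrase in a fixed list and rejects everything else. This is only an illustration of the idea: SAPI constrains the recognizer itself through a grammar rather than fuzzy-matching text afterward, and the command list, the `match_command` helper, and the 0.8 cutoff are all made up for the example.

```python
import difflib

# Hypothetical command set; a real grammar would list the phrases you actually use.
COMMANDS = ["enter destination", "zoom in", "zoom out", "go home", "done"]


def match_command(heard, commands=COMMANDS, cutoff=0.8):
    """Return the closest allowed command, or None if nothing is close enough."""
    # Normalize case and whitespace before comparing.
    normalized = " ".join(heard.lower().split())
    matches = difflib.get_close_matches(normalized, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for phrase in ["Zoom in", "enter destinations", "what time is it"]:
        print(repr(phrase), "->", match_command(phrase))
```

With only a handful of valid phrases, a slightly misheard command still lands on the right entry, while ordinary conversation matches nothing and is ignored.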

If you install Microsoft's Speech API (SAPI) and my NaviVoice program (http://www.whipflash.com/vamr/routisvoice.htm) on your Windows computer, you can configure it to respond to voice commands. The text of the commands that you can use is stored in an XML (eXtensible Markup Language) file. In the XML file that comes with NaviVoice, each command is paired with a number; when a command is uttered, SAPI sends the corresponding number to NaviVoice. NaviVoice responds to that number by executing commands stored in INI files. These INI files specify what should be done to respond to the recognized voice command (e.g., launching a file or executing a macro).
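The phrase-to-number-to-action chain can be sketched as two lookups. The XML below is loosely modeled on the SAPI 5 text grammar format, and the INI layout (one section per command number, with `type` and `value` keys) is purely hypothetical; NaviVoice's real files may be organized differently, and the phrases, numbers, and path are invented for the example.

```python
import configparser
import xml.etree.ElementTree as ET

# Hypothetical grammar: each phrase carries the number reported on recognition.
GRAMMAR_XML = """
<GRAMMAR LANGID="409">
  <RULE NAME="commands" TOPLEVEL="ACTIVE">
    <L PROPNAME="command">
      <P VAL="1">enter destination</P>
      <P VAL="2">zoom in</P>
      <P VAL="3">go home</P>
    </L>
  </RULE>
</GRAMMAR>
"""

# Hypothetical INI layout: one section per command number.
ACTIONS_INI = """
[1]
type = macro
value = Alt-F, Enter, Enter, Enter

[2]
type = macro
value = Alt-V, Z

[3]
type = launch
value = C:/CarPC/home_route.bat
"""


def load_phrases(xml_text):
    """Map each spoken phrase to its command number."""
    root = ET.fromstring(xml_text.strip())
    return {p.text.strip().lower(): int(p.get("VAL")) for p in root.iter("P")}


def load_actions(ini_text):
    """Map each command number to an (action type, value) pair."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {int(s): (parser[s]["type"], parser[s]["value"]) for s in parser.sections()}


def resolve(phrase, phrases, actions):
    """Follow phrase -> number -> action; return None for unknown phrases."""
    number = phrases.get(phrase.strip().lower())
    return None if number is None else actions.get(number)


if __name__ == "__main__":
    phrases = load_phrases(GRAMMAR_XML)
    actions = load_actions(ACTIONS_INI)
    print(resolve("Enter Destination", phrases, actions))
```

Keeping the phrases and the actions in separate files means you can reword a voice command without touching what it does, as long as the number stays the same.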

NaviVoice implements a macro system so that it can control closed-source applications for which no documentation or programming API exists. Basically, each macro is a series of emulated user commands (such as menu selections) that are executed very quickly and automatically.

5.7.1. Controlling Routis and iGuidance

Routis and iGuidance are popular in-car computer navigation programs [Hack #71], but neither of these programs provides a way to access their commands from software, so I had to develop a workaround. NaviVoice emulates the keyboard and mouse inputs a user would normally give to the program. When you want to enter a destination, you speak a voice command and NaviVoice "presses" Alt-F, Enter, Enter, Enter in the NAV application on your behalf. When you then spell out an address and say "Enter" twice, NaviVoice enters the information in the host NAV application for you and has it perform a search for directions.
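A keystroke macro such as "Alt-F, Enter, Enter, Enter" can be sketched as a parser plus a player. The macro syntax and the `parse_macro` and `play_macro` helpers are assumptions for illustration, not NaviVoice's actual format. The `send` callback stands in for whatever delivers key events to the navigation window; here it just records them.

```python
def parse_macro(text):
    """Split 'Alt-F, Enter' into steps of (modifiers, key)."""
    steps = []
    for chunk in text.split(","):
        parts = [p.strip().lower() for p in chunk.split("-") if p.strip()]
        if not parts:
            continue
        *modifiers, key = parts  # the last part is the key, the rest are modifiers
        steps.append((tuple(modifiers), key))
    return steps


def play_macro(steps, send):
    """Replay each step: hold modifiers, tap the key, release modifiers."""
    for modifiers, key in steps:
        for mod in modifiers:
            send("down", mod)
        send("down", key)
        send("up", key)
        for mod in reversed(modifiers):
            send("up", mod)


if __name__ == "__main__":
    events = []
    play_macro(parse_macro("Alt-F, Enter, Enter, Enter"),
               lambda action, key: events.append((action, key)))
    for event in events:
        print(event)
```

On a real system the callback would have to inject input into the target application, which is platform-specific and not shown here.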

The process of accessing favorite locations is even easier. Enter your desired favorite into the configuration program, and add the desired voice command and its corresponding number into the XML file. When you want to go to that favorite, speak the voice command you associated with it, and NaviVoice will do all the typing for you. Then you can just follow the directions to your destination.

Currently, NaviVoice voice-enables almost every function that either Routis or iGuidance can perform, including access to some nested commands, such as automatic speed warnings, map orientation, map size, 3D map view, point of interest icons on/off, map or guidance view, route mode, and so on.

Ninja Monkey (from the http://www.mp3car.com forums) has developed another popular navigation program based on the Destinator engine, called Map Monkey. NaviVoice controls this version of Destinator just as well.

In earlier versions of NaviVoice, the application was always listening, leading to erroneous command recognition when the driver spoke to passengers or talked on a mobile phone. The solution I implemented was to give NaviVoice a trigger word that, when spoken by the user, opens up a user-configurable "window" (in time) in which the user may speak all of his commands. Each successful command recognition resets the window, so you won't have to say the trigger command constantly. This way, NaviVoice listens constantly just for the trigger word, and only when it hears it will it then listen for other navigation-specific commands. If the user needs to close the window early, there is a "done" command; there is also an infinite time window command to eliminate the need for a trigger word. When NaviVoice is listening for commands, the system tray icon will indicate this by changing from a globe with headphones to a microphone.
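The trigger-word window described above behaves like a small state machine, sketched here under some assumptions: the trigger word "navi", the phrase "listen always" for the infinite window, and the `ListeningWindow` class are all hypothetical names. The clock is injectable so the logic can be exercised without waiting.

```python
import time

CLOSED = float("-inf")   # window already expired
ALWAYS = float("inf")    # window never expires


class ListeningWindow:
    def __init__(self, trigger, window_seconds, clock=time.monotonic):
        self.trigger = trigger.lower()
        self.window_seconds = window_seconds
        self.clock = clock
        self.open_until = CLOSED

    def is_open(self):
        return self.clock() < self.open_until

    def _extend(self):
        # Restart the countdown unless the window is set to stay open forever.
        if self.open_until != ALWAYS:
            self.open_until = self.clock() + self.window_seconds

    def hear(self, phrase):
        """Return the phrase if it should be acted on, else None."""
        phrase = phrase.strip().lower()
        if phrase == self.trigger:
            self._extend()            # trigger word opens (or refreshes) the window
            return None
        if not self.is_open():
            return None               # ignore chatter while the window is closed
        if phrase == "done":
            self.open_until = CLOSED  # close the window early
            return None
        if phrase == "listen always":
            self.open_until = ALWAYS  # no trigger word needed from now on
            return None
        self._extend()                # each accepted command resets the window
        return phrase


if __name__ == "__main__":
    window = ListeningWindow(trigger="navi", window_seconds=10.0)
    for spoken in ["zoom in", "navi", "zoom in", "done", "zoom out"]:
        print(repr(spoken), "->", window.hear(spoken))
```

Only phrases returned by `hear` would be passed on to the command lookup; everything else is dropped.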

Problems Caused by Standby or Hibernation

SAPI has an unfortunate tendency not to resume after Windows comes out of either standby or hibernation, both of which are often used with in-car computers. However, I've noticed that if I pause recognition before the system goes into standby or hibernation, SAPI works again once the system is back up and I click on resume recognition. So now, NaviVoice monitors system power events, pausing recognition before the system enters standby or hibernation and resuming it when the system wakes.
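The pause-before-suspend, resume-after-wake logic can be sketched as a handler for the Windows power broadcast message. The numeric constants are the standard Win32 values for `WM_POWERBROADCAST` and its suspend and resume events; the `Recognizer` class is a stand-in for the real speech engine wrapper, and how NaviVoice actually hooks these events is an assumption.

```python
# Standard Win32 message and event codes.
WM_POWERBROADCAST = 0x0218
PBT_APMSUSPEND = 0x0004
PBT_APMRESUMESUSPEND = 0x0007
PBT_APMRESUMEAUTOMATIC = 0x0012


class Recognizer:
    """Stand-in for the speech recognizer; records what was asked of it."""

    def __init__(self):
        self.paused = False
        self.log = []

    def pause(self):
        if not self.paused:
            self.paused = True
            self.log.append("pause")

    def resume(self):
        # Idempotent: Windows may deliver more than one resume event.
        if self.paused:
            self.paused = False
            self.log.append("resume")


def on_window_message(msg, wparam, recognizer):
    """Return True if the message was a power event handled here."""
    if msg != WM_POWERBROADCAST:
        return False
    if wparam == PBT_APMSUSPEND:
        recognizer.pause()
    elif wparam in (PBT_APMRESUMESUSPEND, PBT_APMRESUMEAUTOMATIC):
        recognizer.resume()
    return True


if __name__ == "__main__":
    rec = Recognizer()
    on_window_message(WM_POWERBROADCAST, PBT_APMSUSPEND, rec)
    on_window_message(WM_POWERBROADCAST, PBT_APMRESUMEAUTOMATIC, rec)
    on_window_message(WM_POWERBROADCAST, PBT_APMRESUMESUSPEND, rec)
    print(rec.log)
```

In a real application this function would be called from the window procedure of a (possibly hidden) window, since that is where Windows delivers power broadcasts.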

Another problem that was fixed by monitoring the system power events was a bug in which Routis/iGuidance did not recognize USB GPS receivers when the computer resumed from standby or hibernation. In NaviVoice.ini, you can configure NaviVoice so that when resumption is detected, NaviVoice will dismiss the GPS error dialog and command either Routis or iGuidance to recognize the GPS receiver.

5.7.2. Text-to-Speech Output

Text-to-speech (TTS) is another technology offered by SAPI. The speech is totally synthesized in software, and additional voices are available in your favorite dialects and languages [Hack #59].

Routis, iGuidance, and many other navigation applications use prerecorded voice prompts instead of generic voice synthesis. This is because they have a standard set of 50 or so phrases that they say a lot (e.g., "turn left"), which are recorded by a voice actor instead of being synthesized. However, it can be unsettling to have multiple voices in the car (one for general TTS and one for prerecorded navigation prompts). With help from Frodo (the author of FrodoPlayer [Hack #75]), I added a feature to the NaviVoice configuration application that uses the SAPI 5 engine to output the navigation text prompts into WAV format, suitable for replacing the prerecorded prompts in Routis/iGuidance and providing a more consistent listening experience.
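Rendering a set of prompts to WAV files could look roughly like the sketch below. The `sapi_synthesize` function uses the SAPI 5 automation objects (`SAPI.SpVoice` and `SAPI.SpFileStream`) through pywin32, so it only runs on Windows with pywin32 installed, and it has not been checked against any particular SAPI build. The prompt table and file names are hypothetical; the real names must match whatever the navigation program expects.

```python
import os

# Hypothetical prompt table: output file name -> text to speak.
PROMPTS = {
    "turn_left.wav": "Turn left",
    "turn_right.wav": "Turn right",
}


def sapi_synthesize(text, path):
    """Speak text into a WAV file. Windows only; assumes pywin32 and SAPI 5."""
    import win32com.client  # imported lazily so the rest works on any platform

    voice = win32com.client.Dispatch("SAPI.SpVoice")
    stream = win32com.client.Dispatch("SAPI.SpFileStream")
    stream.Open(path, 3)  # 3 = SSFMCreateForWrite
    voice.AudioOutputStream = stream
    voice.Speak(text)
    stream.Close()


def render_prompts(prompts, out_dir, synthesize=sapi_synthesize):
    """Write one file per prompt and return the paths in sorted order."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for filename, text in sorted(prompts.items()):
        path = os.path.join(out_dir, filename)
        synthesize(text, path)
        written.append(path)
    return written


if __name__ == "__main__":
    # Dry run with a fake synthesizer that only reports what it would do.
    render_prompts(PROMPTS, "prompts_out",
                   synthesize=lambda text, path: print(path, "<-", text))
```

Passing the synthesizer in as a parameter makes it easy to preview the file list before overwriting any of the original prompts; back those up first.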

5.7.3. SAPI Hints

Here are some useful hints for using the Microsoft Speech API:

Update to the latest version of SAPI (6) included with Microsoft Office 2003.
Increase the "Pronunciation Sensitivity" and "Accuracy vs. Recognition Time" settings (Start → Control Panel → Speech → Settings).
Turn off SAPI's background learning (Start → Control Panel → Speech → Settings). Do you really want SAPI learning from what it hears? Probably not.
Training seems like an obvious requirement, but most people seem to skip this. The original SAPI training session is the means by which a properly configured SAPI learns to understand you. The more you train, the more accurate SAPI will be.

5.7.4. Further Development

For future versions, I would love to enable album and artist selection of MP3s using voice recognition. I know that this is possible; it's just a matter of figuring out how to add voice commands after SAPI has initialized. Also, I would love to add support for PhoneControl.NET [Hack #63], with the goal of being able to say things like "call home" and have your home number automatically dialed.

5.7.5. See Also

"Car-Enable Clunky Applications" [Hack #58]
"Choose Your in-Car Navigation Software" [Hack #71]

David Burban