Hack 60. Control Your Car PC with Voice Recognition
You can use software to make your car PC's navigation system and other functions voice-operated, just like the cool cars in the commercials.
My initial inspiration for implementing voice recognition in my car PC came because I kept seeing commercials for voice-activated navigation systems, like those in Hondas and Acuras. Speech recognition requires the computer to accept spoken words as input and interpret what has been spoken. To make the job of understanding speech easier for the computer, a method of speech input called command and control is used. With command and control, a limited number of voice commands are specified and listened for by the computer, which greatly decreases the chances of errors in interpretation.
If you install Microsoft's Speech API (SAPI) and my NaviVoice (http://www.whipflash.com/vamr/routisvoice.htm) program on your Windows computer, you can configure it to respond to voice commands. The text commands that you can use are stored in an XML (eXtensible Markup Language) file. In the XML file that comes with NaviVoice, each command is next to a number; when a command is uttered, the corresponding number is sent to NaviVoice, which then executes the command that is associated with that number.
NaviVoice itself responds to the number that it receives from SAPI by executing commands stored in INI files. These INI files specify what should be done to respond to the recognized voice command (e.g., launching a file or executing a macro). NaviVoice implements a macro system so that it can control closed source applications for which no documentation or programming API exists. Basically, the macros are series of emulated user commands (such as menu selections) that are executed very quickly and automatically.
5.7.1. Controlling Routis and iGuidance
Routis and iGuidance are popular in-car computer navigation programs [Hack #71], but neither of these programs provides a way to access their commands from software, so I had to develop a workaround.
NaviVoice emulates the keyboard and mouse inputs a user would normally give to the program. When you want to enter a destination, you speak a voice command and NaviVoice "presses" Alt-F, Enter, Enter, Enter in the NAV application on your behalf. When you then spell out an address and say "Enter" twice, NaviVoice enters the information in the host NAV application for you and has it perform a search for directions. The process of accessing favorite locations is even easier. Enter your desired favorite into the configuration program, and add the desired voice command and its corresponding number into the XML file. When you want to go to that favorite, speak the voice command you associated with it, and NaviVoice will do all the typing for you. Then you can just follow the directions to your destination. Currently, NaviVoice voice-enables almost every function that either Routis or iGuidance can perform, including access to some nested commands, such as automatic speed warnings, map orientation, map size, 3D map view, point of interest icons on/off, map or guidance view, route mode, and so on.
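The chain described above (spoken phrase, then command number, then a macro of emulated keystrokes) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the XML and INI layouts, the `text:` prefix for typed strings, the "go home" favorite, and all function names are invented for the example and are not NaviVoice's actual file formats or code. The `send_key` callback stands in for whatever mechanism really injects keystrokes into the navigation program.

```python
import configparser
import xml.etree.ElementTree as ET

# Hypothetical grammar file: each phrase sits next to a number.
GRAMMAR_XML = """
<commands>
  <command id="1">enter destination</command>
  <command id="2">go home</command>
</commands>
"""

# Hypothetical INI file: one section per command number, holding a macro.
ACTIONS_INI = """
[1]
type = macro
keys = Alt+F, Enter, Enter, Enter

[2]
type = macro
keys = Alt+F, Enter, Enter, Enter, text:123 Main Street, Enter, Enter
"""


def load_grammar(xml_text):
    """Map each spoken phrase (lowercased) to its command number."""
    root = ET.fromstring(xml_text)
    return {
        cmd.text.strip().lower(): int(cmd.get("id"))
        for cmd in root.iter("command")
    }


def load_actions(ini_text):
    """Map each command number to its list of macro steps."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {
        int(section): [step.strip() for step in parser[section]["keys"].split(",")]
        for section in parser.sections()
    }


def run_command(phrase, grammar, actions, send_key):
    """Look up the phrase and replay its macro through send_key.

    Returns False if the phrase is not in the grammar, True otherwise.
    """
    number = grammar.get(phrase.strip().lower())
    if number is None:
        return False
    for step in actions[number]:
        if step.startswith("text:"):
            # Typed text is sent one character at a time.
            for ch in step[len("text:"):]:
                send_key(ch)
        else:
            send_key(step)
    return True


if __name__ == "__main__":
    pressed = []  # stand-in for real keystroke injection
    run_command("enter destination", load_grammar(GRAMMAR_XML),
                load_actions(ACTIONS_INI), pressed.append)
    print(pressed)
```

Keeping the phrase-to-number table separate from the number-to-macro table mirrors the split between the XML file and the INI files: a favorite can be added by editing data rather than code.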
In earlier versions of NaviVoice, the application was always listening, leading to erroneous command recognition when the driver spoke to passengers or talked on a mobile phone. The solution I implemented was to give NaviVoice a trigger word that, when spoken by the user, opens up a user-configurable "window" (in time) in which the user may speak all of his commands. Each successful command recognition resets the window, so you won't have to say the trigger command constantly. This way, NaviVoice listens constantly just for the trigger word, and only when it hears it will it then listen for other navigation-specific commands. If the user needs to close the window early, there is a "done" command; there is also an infinite time window command to eliminate the need for a trigger word. When NaviVoice is listening for commands, the system tray icon will indicate this by changing from a globe with headphones to a microphone.
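The trigger-word behavior can be modeled as a small state machine with a deadline. The sketch below is an assumption-laden illustration, not NaviVoice's implementation: the class name, the trigger word "navi", the ten-second default, and the "stay open" phrase for the infinite window are all hypothetical; only the "done" command and the reset-on-success rule come from the description above. The clock is injectable so the logic can be exercised without waiting in real time.

```python
import time


class ListenWindow:
    """Accept commands only inside a time window opened by a trigger word."""

    def __init__(self, trigger="navi", window_seconds=10.0, clock=time.monotonic):
        self.trigger = trigger          # hypothetical trigger word
        self.window = window_seconds    # user-configurable window length
        self._clock = clock
        self._deadline = None           # None means the window is closed

    def is_open(self):
        return self._deadline is not None and self._clock() < self._deadline

    def hear(self, phrase, commands):
        """Process one recognized phrase; return the accepted command or None."""
        phrase = phrase.strip().lower()
        if phrase == self.trigger:
            # The trigger word is always listened for and opens the window.
            self._deadline = self._clock() + self.window
            return None
        if not self.is_open():
            return None                 # ignore chatter outside the window
        if phrase == "done":
            self._deadline = None       # close the window early
            return None
        if phrase == "stay open":       # hypothetical infinite-window command
            self._deadline = float("inf")
            return None
        if phrase in commands:
            if self._deadline != float("inf"):
                # Each successful recognition resets the window.
                self._deadline = self._clock() + self.window
            return phrase
        return None


if __name__ == "__main__":
    window = ListenWindow()
    known = {"zoom in", "go home"}
    for spoken in ["zoom in", "navi", "zoom in", "done", "go home"]:
        print(repr(spoken), "->", window.hear(spoken, known))
```

Because a successful command pushes the deadline forward, a driver issuing several commands in a row only needs to say the trigger word once.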
5.7.2. Text-to-Speech Output
Text-to-speech (TTS) is another technology offered by SAPI. The speech is totally synthesized in software, and additional voices are available in your favorite dialects and languages [Hack #59]. Routis, iGuidance, and many other navigation applications use prerecorded voice prompts instead of generic voice synthesis. This is because they have a standard set of 50 or so phrases that they say a lot (e.g., "turn left"), which are recorded by a voice actor instead of being synthesized.
However, it can be unsettling to have multiple voices in the car (one for general TTS and one for prerecorded navigation prompts). With help from Frodo (the author of FrodoPlayer [Hack #75]), I added a feature to the NaviVoice configuration application that uses the SAPI 5 engine to output the navigation text prompts into WAV format, suitable for replacing the prerecorded prompts in Routis/iGuidance and providing a more consistent listening experience.
5.7.3. SAPI Hints
Here are some useful hints for using the Microsoft Speech API:
5.7.4. Further Development
For future versions, I would love to enable album and artist selection of MP3s using voice recognition. I know that this is possible; it's just a matter of figuring out how to add voice commands after SAPI has initialized. Also, I would love to add support for PhoneControl.NET [Hack #63], with the goal of being able to say things like "call home" and have your home number automatically dialed.
5.7.5. See Also
David Burban