The JSAPI packages contain a number of interfaces and classes that support speech technology. The packages used in JSAPI are:
javax.speech
javax.speech.synthesis
javax.speech.recognition
To learn more about JSAPI, refer to the link http://java.sun.com/products/java-media/speech/reference/api/index.html.
You can download the free implementation of JSAPI from the following URL:
http://freetts.sourceforge.net/docs/index.php
To install JSAPI, extract the downloaded file, freetts-1.2beta2-bin.zip, to the C: drive and add the JAR files in the lib directory of the FreeTTS implementation to the classpath by executing the following command at the command prompt:
set classpath=%classpath%;c:\freetts-bin-1_2_beta\lib\freetts.jar;c:\freetts-bin-1_2_beta\lib\cmulex.jar;c:\freetts-bin-1_2_beta\lib\jsapi.jar;
The javax.speech package contains classes and interfaces that define how the speech engine functions. A speech engine is a system that manages speech input and output. The javax.speech package defines the basic properties of a speech engine.
The commonly used classes of the javax.speech package are:
AudioEvent
Central
EngineModeDesc
EngineList
The commonly used interfaces of the javax.speech package are:
Engine
AudioManager
VocabManager
The AudioEvent class specifies the events related to audio input for the speech recognizer and audio output for speech synthesis. The AudioEvent class defines a method, paramString(), which returns a parameter string that identifies the event that occurred. This method is used for debugging and for maintaining event logs.
The Central class allows you to access all the speech input and output functions of a speech engine. This class provides methods to locate, select, and create speech engines, such as speech recognizers and speech synthesizers. A Java application can use a speech engine if the speech engine is registered with the Central class. The various methods declared in the Central class are:
availableRecognizers(): Returns a list of available speech recognizers according to the required properties specified in the input parameter, such as the EngineModeDesc class or the RecognizerModeDesc class. If the parameter passed is null, the availableRecognizers() method lists all the available known recognizers.
availableSynthesizers(): Returns a list of available synthesizers according to the required properties specified in the input parameter, such as the EngineModeDesc class. If the parameter passed is null, the availableSynthesizers() method lists all the available known synthesizers.
createRecognizer(): Creates a recognizer according to the specified properties in the input parameter, such as the EngineModeDesc class or the RecognizerModeDesc class. The createRecognizer() method returns null if there is no recognizer with the specified properties.
createSynthesizer(): Creates a synthesizer according to the specified properties in the input parameter, such as the EngineModeDesc class or the SynthesizerModeDesc class. The createSynthesizer() method returns null if there is no synthesizer with the specified properties.
registerEngineCentral(): Registers a speech engine with the Central class. The registerEngineCentral() method takes an object of the String class as an input parameter. The registerEngineCentral() method adds the specified class name to the list of engines.
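The following sketch shows one way an application might use these methods to list the registered synthesizers and create one of them. It is a minimal example, assuming a JSAPI implementation such as FreeTTS is installed and registered; the class name TestCentral is arbitrary.
import java.util.Locale;
import javax.speech.Central;
import javax.speech.EngineList;
import javax.speech.EngineModeDesc;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerModeDesc;

public class TestCentral {
    public static void main(String[] args) throws Exception {
        // Passing null lists every known synthesizer mode.
        EngineList list = Central.availableSynthesizers(null);
        for (int i = 0; i < list.size(); i++) {
            EngineModeDesc desc = (EngineModeDesc) list.elementAt(i);
            System.out.println(desc.getEngineName() + " / " + desc.getModeName());
        }

        // Create a synthesizer for US English; null is returned if none matches.
        Synthesizer synth =
                Central.createSynthesizer(new SynthesizerModeDesc(Locale.US));
        System.out.println(synth != null ? "Synthesizer created" : "No matching synthesizer");
    }
}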
The EngineModeDesc class defines the basic properties of a speech engine that determine its mode of operation, such as Spanish or English dictation. The various methods declared in the EngineModeDesc class are:
getEngineName(): Returns the engine name, which should be a unique string across the provider's engines.
setEngineName(): Sets the name of the engine as provided in the input parameter string.
getModeName(): Returns the mode name, which uniquely identifies the single mode of operation of the speech engine.
setModeName(): Sets the mode name as provided in the input parameter string.
getLocale(): Returns the object of the Locale class for the engine mode.
setLocale(): Sets the Locale of the engine according to the specified input parameter, which is an object of the Locale class.
getRunning(): Returns a Boolean value indicating whether or not the speech engine is already running.
setRunning(): Sets the running property, which indicates whether an already running engine is required, according to the Boolean input parameter.
match(): Returns a Boolean value to determine whether or not the EngineModeDesc object input parameter has all the defined features.
equals(): Returns a Boolean value, which is true if the EngineModeDesc object input parameter is not null and has equal values for engine name, mode name, and Locale.
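The following is a minimal sketch of how these properties can be used to describe and compare engine modes; the engine and mode names are only illustrative, and the no-argument EngineModeDesc constructor is assumed.
import java.util.Locale;
import javax.speech.EngineModeDesc;

public class ModeDescExample {
    public static void main(String[] args) {
        // Properties the application requires from an engine.
        EngineModeDesc required = new EngineModeDesc(Locale.US);

        // Properties of an engine mode, set with the methods listed above
        // (the values are illustrative only).
        EngineModeDesc installed = new EngineModeDesc();
        installed.setEngineName("ExampleEngine");
        installed.setModeName("general");
        installed.setLocale(Locale.US);

        // true: 'installed' defines every property that is set in 'required'.
        System.out.println(installed.match(required));
    }
}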
The EngineList class selects the appropriate speech engine with the help of the methods of the Central class. The EngineList class contains a set of EngineModeDesc class objects. The various methods available in the EngineList class are:
anyMatch(): Returns a Boolean value, which is true if one or more of the EngineModeDesc class objects in the EngineList class match the EngineModeDesc class object in the input parameter.
requireMatch(): Removes the EngineModeDesc class object entries from the EngineList class that do not match the EngineModeDesc class object specified as the input parameter. For each EngineModeDesc class object in the list, the match method is called. If the match method returns false, the corresponding entry is removed from the list.
rejectMatch(): Removes the EngineModeDesc class object entries from the EngineList class that match the EngineModeDesc class object specified as the input parameter. For each EngineModeDesc class object in the list, the match method is called. If the match method returns true, the corresponding entry is removed from the list.
orderByMatch(): Orders the list that matches the required features. This method takes the EngineModeDesc class object as an input parameter.
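The following sketch shows how an EngineList returned by the Central class might be narrowed and ordered with these methods; the US English locale used as the filter is only an example.
import java.util.Locale;
import javax.speech.Central;
import javax.speech.EngineList;
import javax.speech.EngineModeDesc;

public class EngineListExample {
    public static void main(String[] args) throws Exception {
        // All known synthesizer modes.
        EngineList list = Central.availableSynthesizers(null);

        // Keep only US English modes and put the closest matches first.
        EngineModeDesc required = new EngineModeDesc(Locale.US);
        if (list.anyMatch(required)) {
            list.requireMatch(required);
            list.orderByMatch(required);
        }

        for (int i = 0; i < list.size(); i++) {
            EngineModeDesc desc = (EngineModeDesc) list.elementAt(i);
            System.out.println(desc.getEngineName());
        }
    }
}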
The Engine interface is the parent interface for all speech engines. From the Engine interface, speech engines inherit basic functions, such as the allocation and deallocation of engine resources, access to the EngineProperties and EngineModeDesc objects, and the pause() and resume() methods. Some of the methods defined by the Engine interface are:
allocate(): Allocates the resources required by the Engine interface and sets the state of the Engine interface as ALLOCATED. When the method executes, the Engine interface is in the ALLOCATING_RESOURCES state.
deallocate(): Deallocates the resources of the engine, which are acquired at the ALLOCATED state and during the operation. This method sets the state of the engine as DEALLOCATED.
pause(): Pauses the audio stream of the engine and sets the state of the engine as PAUSED.
resume(): Resumes the audio stream to or from a paused engine and sets the state of the engine as RESUMED.
getEngineState(): Returns the current state of the Engine interface.
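The following is a minimal sketch of the engine life cycle, using a synthesizer as the concrete engine; any engine obtained from the Central class follows the same state transitions.
import java.util.Locale;
import javax.speech.Central;
import javax.speech.Engine;
import javax.speech.synthesis.SynthesizerModeDesc;

public class EngineLifeCycle {
    public static void main(String[] args) throws Exception {
        Engine engine = Central.createSynthesizer(new SynthesizerModeDesc(Locale.US));

        engine.allocate();                        // DEALLOCATED -> ALLOCATING_RESOURCES -> ALLOCATED
        engine.waitEngineState(Engine.ALLOCATED); // allocation may complete asynchronously

        engine.pause();                           // the engine is now PAUSED
        engine.resume();                          // the engine is now RESUMED

        // getEngineState() returns a long whose bits encode the current states.
        System.out.println("Allocated: " + engine.testEngineState(Engine.ALLOCATED));

        engine.deallocate();                      // the engine is now DEALLOCATED
    }
}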
The AudioManager interface allows an application to control and monitor the audio input and output, and other audio-related events, such as start and stop audio. The methods provided by this interface are:
addAudioListener(): Requests notifications of audio events to the AudioListener object specified as an input parameter.
removeAudioListener(): Removes the object of the AudioListener interface specified as an input parameter from the AudioManager interface.
The VocabManager interface manages words that the speech engine uses. This interface provides information about difficult words to the speech engine. Some of the methods provided by this interface are:
addWord(): Adds a word to the vocabulary of the speech engine. This method takes an object of the Word class as an input parameter.
addWords(): Adds an array of words to the vocabulary of the speech engine. This method takes an object array of the Word class as an input parameter.
removeWord(): Removes a word from the vocabulary of the speech engine. This method takes an object of the Word class as an input parameter.
removeWords(): Removes an array of words from the vocabulary of the speech engine. This method takes an object array of the Word class as an input parameter.
listProblemWords(): Returns an array of words that cause problems for the speech engine, such as words it is likely to mispronounce or misrecognize.
The javax.speech.recognition package provides classes and interfaces that support speech recognition. This package inherits the basic functioning from the javax.speech package. The speech recognizer is a type of speech engine that has the ability to recognize and convert incoming speech to text.
The commonly used classes of the javax.speech.recognition package are:
RecognizerModeDesc
Rule
GrammarEvent
The commonly used interfaces of the javax.speech.recognition package are:
Grammar
Recognizer
Result
The RecognizerModeDesc class extends the basic functioning of the EngineModeDesc class with properties specific to a speech recognizer. Some commonly used methods of the RecognizerModeDesc class are:
isDictationGrammarSupported(): Returns a Boolean value indicating whether or not the engine mode provides an object of the DictationGrammar interface.
addSpeakerProfile(): Adds a speaker profile specified in an input parameter to the object array of the SpeakerProfile class.
match(): Returns a Boolean value depending on whether or not the RecognizerModeDesc object contains all the features specified by the input parameter. The input parameter can be an object of the RecognizerModeDesc class or the EngineModeDesc class. For the EngineModeDesc class, the match() method checks whether or not all the features supported by the EngineModeDesc class are defined.
getSpeakerProfiles(): Returns an array of the SpeakerProfile class containing a list of speaker profiles known to the current mode of the speech recognizer.
The Rule class defines the basic component of the RuleGrammar interface. The methods provided by this class are:
copy(): Returns a copy of the Rule class and all its subrules, which includes the RuleAlternatives, RuleCount, RuleParse, RuleSequence, and RuleTag classes.
toString(): Returns a string representing the portion of Java Speech Grammar Format (JSGF) that appears on the right of a rule definition.
The Recognizer interface extends the functioning of the Engine interface of the javax.speech package. The Recognizer interface is created by using the createRecognizer() method of the Central class. Some methods defined in the Recognizer interface are:
newRuleGrammar(): Creates a new object of the RuleGrammar interface for the Recognizer interface with the name specified as the input string parameter.
getRuleGrammar(): Returns the object of the RuleGrammar interface specified as the input string parameter. If the grammar is not known to the Recognizer interface, the method returns a null value.
getDictationGrammar(): Returns the dictation grammar corresponding to the name specified in the input string parameter.
commitChanges(): Commits the changes made to the loaded grammars so that they take effect in the recognition process.
removeResultListener(): Removes an object of the ResultListener interface, specified as the input parameter, from the recognizer.
getSpeakerManager(): Returns an object of the SpeakerManager interface, which allows management of the speakers of a Recognizer interface, such as storing speaker data.
suspend(): Suspends speech recognition temporarily and places the Recognizer interface in the SUSPENDED state. The incoming audio is buffered while the recognizer is suspended.
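The following sketch shows how a recognizer is typically created, loaded with a grammar written in JSGF, and started; the grammar text and class name are only illustrative.
import java.io.StringReader;
import java.util.Locale;
import javax.speech.Central;
import javax.speech.EngineModeDesc;
import javax.speech.recognition.Recognizer;
import javax.speech.recognition.RuleGrammar;

public class SimpleRecognizer {
    public static void main(String[] args) throws Exception {
        // A small JSGF grammar with a single public rule.
        String jsgf = "#JSGF V1.0;\n"
                + "grammar commands;\n"
                + "public <command> = (open | close | save) [the] file;\n";

        Recognizer recognizer =
                Central.createRecognizer(new EngineModeDesc(Locale.ENGLISH));
        recognizer.allocate();

        RuleGrammar grammar = recognizer.loadJSGF(new StringReader(jsgf));
        grammar.setEnabled(true);
        recognizer.commitChanges();   // make the enabled grammar active

        recognizer.requestFocus();    // request recognition focus
        recognizer.resume();          // start processing incoming audio
    }
}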
The Result interface represents incoming audio that matched an active grammar, which is an object of the Grammar interface. When incoming speech is recognized, the Result interface provides information such as the sequence of finalized and unfinalized words, the matched grammar, and the result state. The result state can be UNFINALIZED, ACCEPTED, or REJECTED. A new object of the Result interface is created when the recognizer identifies incoming speech that matches an active grammar. Some methods of the Result interface are:
getResultState(): Returns the current state of the Result interface object in the form of an integer. The values can be UNFINALIZED, ACCEPTED, and REJECTED.
getGrammar(): Returns an object of the Grammar interface that matches the finalized tokens of the Result interface.
numTokens(): Returns the integer number of the finalized tokens in the Result interface.
removeResultListener(): Removes a listener from the Result interface that corresponds to the object of the ResultListener interface input parameter.
getBestTokens(): Returns an array of all the finalized tokens for the Result interface.
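A minimal sketch of how an application might receive and print results, assuming a recognizer set up as in the previous sketch; the ResultAdapter class of the javax.speech.recognition package provides empty implementations of the ResultListener methods.
import javax.speech.recognition.Recognizer;
import javax.speech.recognition.Result;
import javax.speech.recognition.ResultAdapter;
import javax.speech.recognition.ResultEvent;
import javax.speech.recognition.ResultToken;

public class PrintResultListener extends ResultAdapter {
    // Called when a result reaches the ACCEPTED state.
    public void resultAccepted(ResultEvent e) {
        Result result = (Result) e.getSource();
        System.out.println("Result state: " + result.getResultState()); // Result.ACCEPTED

        ResultToken[] tokens = result.getBestTokens();
        for (int i = 0; i < tokens.length; i++) {
            System.out.print(tokens[i].getSpokenText() + " ");
        }
        System.out.println();
    }

    // Attach the listener to an allocated recognizer.
    public static void attach(Recognizer recognizer) {
        recognizer.addResultListener(new PrintResultListener());
    }
}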
The javax.speech.synthesis package provides classes and interfaces that support synthesis of speech. A speech synthesizer is a speech engine that converts text to speech. A synthesizer is created, selected, and searched through the Central class of the javax.speech package. Some commonly used classes of the javax.speech.synthesis package are:
Voice
SynthesizerModeDesc
Some commonly used interfaces of the javax.speech.synthesis package are:
Synthesizer
SynthesizerProperties
The Voice class defines one output voice for the speech synthesizer. The class supports fields, such as GENDER_MALE, GENDER_FEMALE, AGE_CHILD, and AGE_TEENAGER to describe the synthesizer voice. Some methods provided by the Voice class are:
getName(): Returns the voice name as a string.
setName(): Sets the voice name according to the input string parameter.
getGender(): Returns the integer value of the gender of the voice.
setGender(): Sets the voice gender according to the specified integer input parameter.
getAge(): Returns the integer value of the age of the voice.
clone(): Creates a copy of the voice.
match(): Returns a Boolean value specifying whether or not the Voice class has all the features corresponding to the voice object in the input parameter.
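The following is a short sketch of how a required voice can be described and matched. The four-argument Voice constructor (name, gender, age, style) and the voice name kevin16, one of the FreeTTS voices, are used here only as an example.
import javax.speech.synthesis.Voice;

public class VoiceMatchExample {
    public static void main(String[] args) {
        // The voice the application would like to use; DONT_CARE values match anything.
        Voice required = new Voice("kevin16",
                Voice.GENDER_DONT_CARE, Voice.AGE_DONT_CARE, null);

        // A voice as it might be reported by a synthesizer (illustrative values).
        Voice candidate = new Voice("kevin16",
                Voice.GENDER_MALE, Voice.AGE_YOUNGER_ADULT, null);

        // true: 'candidate' satisfies every property defined in 'required'.
        System.out.println(candidate.match(required));
    }
}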
The SynthesizerModeDesc class extends the functioning of the EngineModeDesc class of the javax.speech package. Apart from the engine name, locale, mode name, and running properties inherited from the EngineModeDesc class, the SynthesizerModeDesc class includes two properties, the voice to be loaded when the synthesizer is started and the list of voices provided by the synthesizer. Some methods provided by the SynthesizerModeDesc class are:
addVoice(): Adds a voice, specified in the voice input parameter, to the existing list of voices.
equals(): Returns a Boolean value, which is true if the object of the SynthesizerModeDesc class and the specified input parameter have equal values of properties, such as engine name, locale, mode name, and all voices.
match(): Returns a Boolean value depending on whether or not the object of the SynthesizerModeDesc class has all the features specified by the input parameter. The input parameter can be SynthesizerModeDesc or EngineModeDesc. If the input parameter is EngineModeDesc, the method checks only for the features of the EngineModeDesc class.
getVoices(): Returns an array of the list of voices available in the synthesizer.
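The following sketch lists the voices reported for each available synthesizer mode; it assumes at least one synthesizer is registered and that the returned mode descriptors are SynthesizerModeDesc objects.
import javax.speech.Central;
import javax.speech.EngineList;
import javax.speech.synthesis.SynthesizerModeDesc;
import javax.speech.synthesis.Voice;

public class ListVoices {
    public static void main(String[] args) throws Exception {
        EngineList list = Central.availableSynthesizers(null);
        for (int i = 0; i < list.size(); i++) {
            SynthesizerModeDesc desc = (SynthesizerModeDesc) list.elementAt(i);
            System.out.println(desc.getEngineName() + " (" + desc.getModeName() + "):");

            Voice[] voices = desc.getVoices();
            for (int j = 0; voices != null && j < voices.length; j++) {
                System.out.println("  " + voices[j].getName());
            }
        }
    }
}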
The Synthesizer interface provides an extension to the Engine interface of the javax.speech package. The Synthesizer interface is created by using the createSynthesizer() method of the Central class. Some methods defined by the Synthesizer interface are:
speak(): Reads out text from a Uniform Resource Locator (URL) that has been formatted with the Java Speech Markup Language (JSML). This method accepts two input parameters, the URL containing the JSML text and the SpeakableListener interface object to which the Synthesizer interface sends the notifications of events. The Synthesizer interface checks the text specified in the URL for JSML formatting and places it in the output queue.
speakPlainText(): Reads out a plain text string. This method accepts two input parameters, the string containing text and the SpeakableListener interface object to which the notifications of events are sent during the synthesis process.
phoneme(): Returns the phoneme string for the corresponding text string input parameter. The input string can be simple text without JSML formatting.
enumerationQueue(): Returns an enumeration containing the list of all the objects present in the output queue. This method returns the objects placed on the speech output queue by the current application only. The top of the queue represents the first item.
cancelAll(): Cancels all the objects in the speech output queue and stops the audio process of the current object in the top of the queue.
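A minimal sketch of producing speech output with the Synthesizer interface, assuming an English synthesizer such as FreeTTS is installed and registered.
import java.util.Locale;
import javax.speech.Central;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerModeDesc;

public class HelloSpeech {
    public static void main(String[] args) throws Exception {
        Synthesizer synthesizer =
                Central.createSynthesizer(new SynthesizerModeDesc(Locale.US));

        synthesizer.allocate();
        synthesizer.resume();

        // Place plain text on the output queue; no listener is registered (null).
        synthesizer.speakPlainText("Hello, world!", null);

        // Wait until everything on the queue has been spoken, then release the engine.
        synthesizer.waitEngineState(Synthesizer.QUEUE_EMPTY);
        synthesizer.deallocate();
    }
}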
The SynthesizerProperties interface provides an extension to the EngineProperties interface of the javax.speech package. This interface allows you to control the run time properties, such as voice, speech rate, pitch range, and volume. Some methods provided by the SynthesizerProperties interface are:
getVoice(): Returns the current synthesizer's voice.
setVoice(): Sets the current synthesizer's voice according to the specified voice input parameter.
getPitch(): Returns the baseline pitch for synthesis as a float value.
setPitchRange(): Sets the pitch range according to the input float parameter.
setSpeakingRate(): Sets the target speech rate according to the input float parameter. The rate is usually represented as number of words per minute.
getVolume(): Returns the volume of speech.
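The following is a short sketch of adjusting the run-time properties of an allocated synthesizer, such as one created in the previous example; the property values are only illustrative.
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerProperties;
import javax.speech.synthesis.Voice;

public class TuneSynthesizer {
    // Adjust the speaking rate, pitch, and volume of an allocated synthesizer.
    public static void tune(Synthesizer synthesizer) throws Exception {
        SynthesizerProperties props = synthesizer.getSynthesizerProperties();

        props.setSpeakingRate(120.0f); // words per minute
        props.setPitch(150.0f);        // baseline pitch in hertz
        props.setPitchRange(30.0f);    // variation around the baseline
        props.setVolume(0.8f);         // 0.0 (silent) to 1.0 (loudest)

        Voice voice = props.getVoice();
        if (voice != null) {
            System.out.println("Current voice: " + voice.getName());
        }
    }
}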