Speech Recognition | Java InstantCode. Developing Applications using Java Speech API

Speech recognition is the process of converting speech to text. Some features of a speech recognizer are:

Supports different specifications for different languages.
Accepts a single audio stream as input.
Provides the ability to simulate an end user's voice.
Provides ability to update grammar at run time.
Defines a set of properties that the application can control, such as confidence level, sensitivity, and the trade-off between speed and accuracy.

The various phases of producing text from speech are:

Grammar recognition: The speech recognizer establishes standards for words based on the manner in which they are spoken by an end user.
Signal processing: The speech recognizer analyzes the incoming audio stream to extract its spectrum (frequency) characteristics.
Phoneme recognition: The speech recognizer compares the spectrum characteristics generated in the previous phase with the phoneme patterns of the specified language.
Word recognition: The speech recognizer matches the sequence of recognized phonemes against the word patterns defined by the active grammar of the specified language. In JSAPI, a grammar is an object that implements the Grammar interface; the active grammar tells the speech recognizer which words an end user might say and how those words can be arranged. The basic types of grammar supported by the Java Speech API are:
- Rule Grammar: In speech recognition systems employing rule-based grammar, the application is responsible for letting the computer know what to expect from an end user. Rule grammar is based on commonly used terms.
- Dictation Grammar: In speech recognition systems employing dictation grammar, an end user can provide input in free form. This type of grammar does not put constraints on what an end user says because it is not based on commonly used terms. Dictation grammar can be optimized for a certain set of words, such as words used in law or medicine. JSAPI has a built-in dictation grammar and dictation recognizer.
Result generation: The speech recognizer provides the output in the form of text. The speech recognizer generates output as soon as a single expression, such as a sentence, is complete.
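
To make the word recognition phase more concrete, the following is a minimal sketch in plain Java of how a rule-based grammar can constrain recognition. It does not use JSAPI: the class ToyWordRecognizer, its methods, and the phoneme symbols are hypothetical names chosen for illustration, and a real engine scores many candidate matches statistically rather than requiring an exact match. The sketch assumes the earlier phases have already turned the audio into a list of phoneme symbols.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Toy model of word recognition against a rule grammar.
// Nothing here is part of JSAPI; every name is an illustrative assumption.
class ToyWordRecognizer {
    // Hypothetical pronunciation lexicon: word -> phoneme sequence.
    private final Map<String, List<String>> lexicon = new LinkedHashMap<>();
    // Stand-in for the active rule grammar: the phrases the application expects.
    private final List<List<String>> activePhrases = new ArrayList<>();

    void addWord(String word, String... phonemes) {
        lexicon.put(word, List.of(phonemes));
    }

    // Phrases can be added at any time, mimicking a grammar update at run time.
    void addPhrase(String... words) {
        for (String word : words) {
            if (!lexicon.containsKey(word)) {
                throw new IllegalArgumentException("No pronunciation for: " + word);
            }
        }
        activePhrases.add(List.of(words));
    }

    // Returns the first active phrase whose phonemes match the input exactly.
    Optional<String> recognize(List<String> phonemes) {
        for (List<String> phrase : activePhrases) {
            List<String> expected = new ArrayList<>();
            for (String word : phrase) {
                expected.addAll(lexicon.get(word));
            }
            if (expected.equals(phonemes)) {
                return Optional.of(String.join(" ", phrase));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ToyWordRecognizer recognizer = new ToyWordRecognizer();
        recognizer.addWord("open", "OW", "P", "AH", "N");
        recognizer.addWord("close", "K", "L", "OW", "Z");
        recognizer.addWord("file", "F", "AY", "L");
        recognizer.addPhrase("open", "file");

        List<String> openFile = List.of("OW", "P", "AH", "N", "F", "AY", "L");
        List<String> closeFile = List.of("K", "L", "OW", "Z", "F", "AY", "L");

        System.out.println(recognizer.recognize(openFile));  // Optional[open file]
        System.out.println(recognizer.recognize(closeFile)); // Optional.empty

        // Extend the grammar at run time; the same input is now recognized.
        recognizer.addPhrase("close", "file");
        System.out.println(recognizer.recognize(closeFile)); // Optional[close file]
    }
}
```

The second call returns an empty result even though both words are in the lexicon, because the phrase is not yet part of the active grammar. This is the constraint a rule grammar imposes: the application decides what the recognizer should expect. A dictation grammar would instead accept any word sequence the lexicon can explain.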