The Java Speech API (JSAPI) helps you design speech-enabled Java applications and provides support for speech synthesizers, command-and-control recognizers, and dictation systems. JSAPI is part of the Java Media API, a group of APIs that add audio, video, animation, graphics, and imaging capabilities to Java applications.
This chapter describes the JSAPI technologies and packages that help you create speech-enabled applications for converting text to speech.
The two main speech technologies supported by JSAPI are:
Speech Synthesis: Converts the input text to speech.
Speech Recognition: Converts the input audio to text.
Speech synthesis technology helps you develop Java applications that convert text to speech. This process is known as text-to-speech (TTS) conversion.
The various phases in TTS conversion are:
Formation analysis: The speech synthesizer analyzes the input text to determine the start and end of paragraphs, sentences, and words.
Text analysis: The speech synthesizer scans the text for special language constructs, such as abbreviations, dates, times, e-mail addresses, and credit card numbers.
Text-to-phoneme conversion: The speech synthesizer converts each unit of text to phonemes. A phoneme is a basic unit of sound in a language. Different languages have different sets of phonemes.
Waveform production: The speech synthesizer uses the phonemes generated in the previous phase to produce the audio waveform. A waveform is generated for each unit of text.
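The first three phases listed above can be illustrated with a small, self-contained Java sketch. It does not use JSAPI and is not how any real synthesizer is implemented. The class name TtsPhasesSketch, the EXPANSIONS table, and the PHONEMES dictionary (which uses ARPAbet-style symbols) are hypothetical and exist only for this illustration. Real engines rely on large, language-specific rule sets. The final phase is omitted because producing audio depends on the synthesizer engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy illustration of the first three TTS phases. Not part of JSAPI.
class TtsPhasesSketch {

    // Hypothetical expansion table used by the text analysis phase.
    static final Map<String, String> EXPANSIONS = Map.of("2", "two", "&", "and");

    // Hypothetical pronunciation dictionary (ARPAbet-style symbols).
    static final Map<String, String> PHONEMES = Map.of(
            "hi", "HH AY",
            "bob", "B AA B",
            "see", "S IY",
            "two", "T UW",
            "cats", "K AE T S");

    // Phase 1: find sentence boundaries (whitespace that follows '.', '!' or '?').
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    // Phase 1, continued: find word boundaries and drop trailing punctuation.
    static List<String> splitWords(String sentence) {
        List<String> words = new ArrayList<>();
        for (String w : sentence.split("\\s+")) {
            String cleaned = w.replaceAll("[.!?,;:]+$", "");
            if (!cleaned.isEmpty()) {
                words.add(cleaned);
            }
        }
        return words;
    }

    // Phase 2: expand special constructs (here, only the entries in EXPANSIONS).
    static String normalize(String word) {
        return EXPANSIONS.getOrDefault(word, word).toLowerCase(Locale.ROOT);
    }

    // Phase 3: look up the phonemes for a normalized word.
    static String toPhonemes(String word) {
        return PHONEMES.getOrDefault(word, "<unknown:" + word + ">");
    }

    // Runs phases 1 to 3 and returns one phoneme string per word.
    static List<String> analyze(String text) {
        List<String> result = new ArrayList<>();
        for (String sentence : splitSentences(text)) {
            for (String word : splitWords(sentence)) {
                result.add(toPhonemes(normalize(word)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [HH AY, B AA B, S IY, T UW, K AE T S]
        System.out.println(analyze("Hi Bob. See 2 cats."));
    }
}
```

The simple sentence rule in this sketch would wrongly split after an abbreviation such as "Dr.", which is one reason a real synthesizer needs a dedicated analysis phase for special constructs.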