Speech Synthesis Packages

At this point you may be asking, "Okay, enough about the theory. I just want a program that will work." Never fear; there are several packages that will work under Linux. In a later section we'll see how to combine them with the other tools on a typical Linux system to perform useful tasks. However, it still helps to know what the software is doing, so that you can better make it do what you want.

Rsynth

Rsynth is one of the simplest, fastest speech synthesizers available for Linux. It is a public domain program that originally appeared on the comp.speech newsgroup. Unfortunately, due to its simplicity it has relatively poor sound quality. The basic usage is:

 say -d a "Hello world."

 echo "Goodbye cruel world." | say -d a

The -d option selects which pronunciation dictionary to use: a means American, and b means British. The pronunciation dictionaries must be set up in a library directory, usually /usr/local/lib (/usr/lib/dict on Debian systems). Rsynth can be used without a pronunciation dictionary, but because English spelling is so irregular, it performs atrociously. Luckily, the software comes with a utility called mkdictdb, which converts plain-text dictionaries into specially indexed dictionaries that rsynth can use.

There are many plain-text pronunciation dictionaries available online. One of the most commonly used American English dictionaries is available from Carnegie Mellon University. You will need to convert it to rsynth format in order to use it:

 $ wget ftp://ftp.cs.cmu.edu/project/fgdata/dict/cmudict-0.6.gz

 $ gunzip cmudict-0.6.gz

 $ mkdictdb cmudict-0.6 /usr/local/lib/aDict.db

If you don't already have it, wget is a nifty utility for retrieving files from known URLs. See Chapter 15, Tools You Should Know, for a discussion of wget. The final line runs mkdictdb to index the dictionary and place the output file in the directory that rsynth expects.
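
Because rsynth performs so poorly without a dictionary, it can be worth checking that the indexed file is in place before speaking. The following is only a sketch: the function name say_checked and the RSYNTH_LIB variable are invented for illustration, and it assumes that each dictionary is named after its -d letter in the same way that aDict.db is above.

```shell
# Speak text with rsynth only if the indexed dictionary for the dialect exists.
# RSYNTH_LIB is a hypothetical override; it defaults to /usr/local/lib as in the text.
say_checked() {
    dialect=$1
    shift
    # Assumed naming scheme: <letter>Dict.db, as with aDict.db for "-d a".
    dict="${RSYNTH_LIB:-/usr/local/lib}/${dialect}Dict.db"
    if [ ! -r "$dict" ]; then
        echo "say_checked: no dictionary at $dict; run mkdictdb first" >&2
        return 1
    fi
    say -d "$dialect" "$@"
}
```

For example, say_checked a "Hello world." speaks the phrase with the American dictionary, or prints an error and returns a nonzero status if the dictionary file cannot be read.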

There is also a British English pronunciation dictionary available at the following URL:

 ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/dictionaries/beep-1.0.tar.gz

Rsynth allows you to insert specific phonemes in the plain text. This provides an easy way to correct cases in which a word is not in the pronunciation dictionary and rsynth's internal rules produce the wrong pronunciation. For example, when rsynth is given the word Linux, it pronounces it "Line-ux" instead of "Lin-ux." The proper pronunciation is obtained by supplying the list of phonemes enclosed in brackets:

 say -d a "[lInVks]"

The correspondence between characters and phonemes follows the same standard as the prebuilt pronunciation dictionaries. Rather than detailing the codes for each phoneme, it's actually easier to experiment a little to find the correct pronunciation for a word. A good approach is to look at the encoding for a word that sounds similar and try modifying it as necessary.
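
One way to apply such corrections routinely is to filter the text before it reaches rsynth. This is a minimal sketch that assumes bracketed phonemes may appear in the middle of ordinary text; the function name fix_pronunciation is invented for the example, and [lInVks] is the only phoneme string taken from above.

```shell
# Replace words that rsynth mispronounces with bracketed phoneme strings.
# Add one -e expression per word; only the Linux entry comes from the text.
fix_pronunciation() {
    sed -e 's/Linux/[lInVks]/g'
}
```

The filter slots into a pipeline ahead of say, for example: echo "I run Linux at home." | fix_pronunciation | say -d a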

Festival

Festival is an open source speech synthesis package developed primarily at the University of Edinburgh Centre for Speech Technology Research. It is distributed under an X11-style license. The core speech engine, written in C++, includes a command interpreter based on Scheme, one of the many dialects of LISP. Inveterate users of the advanced features of Emacs should feel right at home in the Festival command interpreter. Festival aims to provide a complete set of tools addressing all levels of speech synthesis, and it is deliberately built to support multiple (human) languages.

Festival can be obtained as a Red Hat or Debian package, or downloaded directly from the University of Edinburgh:

http://www.cstr.ed.ac.uk/projects/festival/

Festival provides several pronunciation and phoneme dictionaries, and additional voices are available from other sources at no cost or for a small fee. The basic package includes:

· voice_rab_diphone: A British English male speaker. The accent is linguistically known as RP, which would be most familiar to nonlinguists as the "voice of the BBC."

· voice_don_diphone: Another British English male speaker, which sounds better than rab.

· voice_ked_diphone: An American English male speaker.

· voice_kal_diphone: Another American English male speaker, which sounds better than ked.

· voice_ell_diphone: A Castilian Spanish male speaker.

Despite its underlying complexity, Festival is actually very easy to use. The simplest method of invocation is via the --tts (text-to-speech) option. With it, Festival reads a file or standard input and speaks the incoming words:

  fortune | festival --tts -

Without the text-to-speech option, Festival invokes a Scheme interpreter that takes commands from standard input. For example,

  festival <<END

  (Parameter.set 'Audio_Method 'linux16audio)

  (voice_kal_diphone)

  (SayText "Hello, world!")

  (SayText "You can hear me now!")

END

This example shows a few of the features of Scheme. Commands are executed by enclosing the command name and arguments in parentheses. Strings are represented by enclosing them in double quotes (not single quotes). Festival populates the interpreter environment with a number of commands, but the two basic ones you need to know are the (voice_*) commands, which select a particular voice, and the (SayText "...") command, which does exactly what it appears to do: parse a phrase and speak it to the audio device.

The remaining command, (Parameter.set), sets the audio output method. If you are using an audio server that has a large output buffer, Festival may start sending a sentence to the audio mixer before the previous one is done, resulting in an unexpected muddle of voices. To avoid this problem, you can specifically set the audio output method in the interpreter to linux16audio, which tells Festival to use the /dev/audio device.
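
When the text to speak comes from a script rather than being typed by hand, the same three commands can be generated on the fly. The function below is a sketch rather than part of Festival: the name festival_commands is invented here, and it assumes that the interpreter accepts backslash-escaped double quotes and backslashes inside strings.

```shell
# Print Festival Scheme commands that select a voice and speak each argument.
# Usage: festival_commands VOICE SENTENCE...
festival_commands() {
    voice=$1
    shift
    # Same audio setting as in the example above.
    printf "(Parameter.set 'Audio_Method 'linux16audio)\n"
    printf '(%s)\n' "$voice"
    for sentence in "$@"; do
        # Escape backslashes and double quotes so the Scheme string stays valid
        # (assumes the interpreter honors backslash escapes).
        escaped=$(printf '%s' "$sentence" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
        printf '(SayText "%s")\n' "$escaped"
    done
}
```

The output is meant to be piped into the interpreter, for example: festival_commands voice_kal_diphone "Hello, world!" "You can hear me now!" | festival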

You can also modify the interpreter's initial environment to specify a different default voice and audio mode. Included in the Festival distribution is a file called init.scm; its location in the file system may vary depending on whether you compiled Festival from source or obtained it as a DEB or RPM. This file contains most of the default settings and can be hand-edited to set the desired parameters.

Festival was originally built on Unix-based systems, and as a consequence it is easily integrated with the other Unix command line tools. We'll see an example of this later in the chapter. It also has excellent documentation, which is always an asset when learning to use new software!