Phonetic Analysis


So far, this chapter has covered traditional encoding methods and hash generation. Codec also offers a set of classes for phonetic analysis.

It's beyond the scope of this text to delve into the theory behind Soundex and Metaphone phonetic analysis, but suffice it to say that these algorithms provide different mechanisms for generating a phonetic key. The key is designed to help answer the question, "How much does this word sound like another?"
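To make the idea of a phonetic key concrete, here is a minimal sketch of the classic American Soundex rules in plain Java. It is not Codec's implementation, and the class name SoundexSketch and its CODES table are hypothetical names introduced for illustration. The sketch keeps the first letter, maps the remaining consonants to digits, collapses adjacent letters that share a digit, and pads the result to four characters.

```java
import java.util.Locale;

// A simplified, illustrative Soundex; an assumption-laden sketch,
// not the code that ships with Commons Codec.
final class SoundexSketch {

    // Codes for A..Z: '0' marks vowels and Y, '-' marks H and W.
    private static final String CODES = "0123012-02245501262301-202";

    static String encode(String word) {
        // Keep only the letters A-Z, in upper case.
        String s = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder key = new StringBuilder().append(s.charAt(0));
        char last = CODES.charAt(s.charAt(0) - 'A');
        for (int i = 1; i < s.length() && key.length() < 4; i++) {
            char code = CODES.charAt(s.charAt(i) - 'A');
            if (code == '-') {
                continue; // H and W are skipped and do not separate duplicates
            }
            if (code != '0' && code != last) {
                key.append(code); // a new consonant sound
            }
            last = code; // vowels reset 'last', so a repeated sound is coded again
        }
        while (key.length() < 4) {
            key.append('0'); // pad short keys to four characters
        }
        return key.toString();
    }

    public static void main(String[] args) {
        for (String w : new String[] { "hello", "fellow", "mellow", "monster" }) {
            System.out.println(w + " -> " + encode(w));
        }
    }
}
```

Words that differ only in their first letter, such as hello, fellow, and mellow, end up with keys that share the same digits, which is what makes the keys useful for comparing sounds.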

Table 12-1 shows how the phonetic algorithms included with Codec translate words into phonetic keys. Depending on the application you are building, different algorithms may be of interest. For example, a spell-checking application might use Metaphone to find alternative words, whereas a voice-recognition system might rely on the RefinedSoundEx algorithm to determine the most likely words the speaker is trying to convey. Notice that while SoundEx produces a key of the same length for every word, the other algorithms produce variable-length keys. (Codec's Metaphone and Double Metaphone encoders cap keys at four characters by default, which is why monster and monstrous share the key MNST.)

Table 12-1. Phonetic Algorithm Results

Word        SoundEx   RefinedSoundEx   Metaphone   Double Metaphone
hello       H400      H070             HL          HL
fellow      F400      F2070            FL          FL
mellow      M400      M8070            ML          ML
monster     M523      M8083609         MNST        MNST
monstrous   M523      M80836903        MNST        MNST


By using these algorithms and a very large set of words, you can generate a set of keys that link the words phonetically, which is the foundation of a spell checker.
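The grouping step described above can be sketched as a map from phonetic key to the words that share it. This is only an illustration under stated assumptions: PhoneticIndexSketch, index, and TABLE_KEYS are hypothetical names, and the encoder is a stand-in that looks up the Metaphone keys from Table 12-1 so the example runs without Codec on the classpath. In a real application the encoder would presumably be one of Codec's classes, such as Metaphone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Illustrative sketch: groups words by phonetic key.
final class PhoneticIndexSketch {

    // Stand-in encoder data: Metaphone keys copied from Table 12-1.
    static final Map<String, String> TABLE_KEYS = new HashMap<>();
    static {
        TABLE_KEYS.put("hello", "HL");
        TABLE_KEYS.put("fellow", "FL");
        TABLE_KEYS.put("mellow", "ML");
        TABLE_KEYS.put("monster", "MNST");
        TABLE_KEYS.put("monstrous", "MNST");
    }

    // Builds key -> words, using whatever encoder the caller supplies.
    // The encoder must return a non-null key for every word passed in.
    static Map<String, List<String>> index(List<String> words,
                                           Function<String, String> encoder) {
        Map<String, List<String>> byKey = new TreeMap<>();
        for (String word : words) {
            byKey.computeIfAbsent(encoder.apply(word), k -> new ArrayList<>())
                 .add(word);
        }
        return byKey;
    }

    public static void main(String[] args) {
        List<String> words =
            Arrays.asList("hello", "fellow", "mellow", "monster", "monstrous");
        Map<String, List<String>> byKey = index(words, TABLE_KEYS::get);
        // Words listed under the same key are candidate "sounds like" matches.
        byKey.forEach((key, group) -> System.out.println(key + " -> " + group));
    }
}
```

A spell checker built on this idea would encode the misspelled word, look up its key, and offer the words stored under that key as suggestions.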

Listing 12-10 shows the code used to generate the keys shown in Table 12-1.

Listing 12-10. Phonetic Key Generation Source
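    import org.apache.commons.codec.language.DoubleMetaphone;
    import org.apache.commons.codec.language.Metaphone;
    import org.apache.commons.codec.language.RefinedSoundex;
    import org.apache.commons.codec.language.Soundex;

    // printHeader() is a helper defined elsewhere in the sample class.
    public static void phoneticDemo() {
        printHeader("Phonetic Demo");
        String[] words =
            { "hello", "fellow", "mellow", "monster", "monstrous" };
        System.out.println("Notice the sounds of the words, and");
        System.out.println("how the sounds are translated into");
        System.out.println("phonetic keys.");
        // Column headings for the comma-separated output
        System.out.print("Word, SoundEx, RefinedSoundEx, ");
        System.out.println("Metaphone, DoubleMetaphone");
        for (int i = 0; i < words.length; i++)
        {
            // One output row per word: the word, then one key per algorithm
            System.out.print(words[i] + ", ");
            System.out.print(new Soundex().encode(words[i]) + ", ");
            System.out.print(
                new RefinedSoundex().encode(words[i]) + ", ");
            System.out.print(
                new Metaphone().encode(words[i]) + ", ");
            System.out.print(
                new DoubleMetaphone().encode(words[i]));
            System.out.println();
        }
    }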

For more information on phonetic analysis, see the following sites:

http://encyclopedia.thefreedictionary.com/Soundex

http://www.archives.gov/research_room/genealogy/census/soundex.html

http://aspell.sourceforge.net/metaphone/



    Apache Jakarta Commons(c) Reusable Java Components
    Real World Web Services
    Year: 2006
    Pages: 137
    Authors: Will Iverson
