Section 9.4. Example: Counting Frequencies of Letters


[Page 416 (continued)]

9.4. Example: Counting Frequencies of Letters

Suppose you wish to write a program to break a text message encrypted with one of the historical ciphers discussed in the two preceding chapters. It is well known that historical ciphers often can be broken. Put differently, the plaintext can be found from the ciphertext, by examining the frequencies of letters and comparing them to the average frequencies of typical samples of plaintext. For example, E and T are the two most frequently used letters in the English language. So, in a ciphertext encrypted with a Caesar cipher, E and T are good guesses as the plaintext letter corresponding to the most frequent letter in the ciphertext message.

Let's write a program that will count how many times each of the 26 letters of the English language appears in a given string. There are a number of ways to design such a program, depending on how flexible you wish it to be. Let us keep this example simple by assuming that we will only be interested in counting occurrences of the letters A through Z and not occurrences of spaces or punctuation marks. Assume further that we will change lowercase letters in our string sample to uppercase before counting letters and that we will want to print out the frequencies of letters to the console window. Finally, assume that, later in the chapter after we discuss sorting arrays, we will want to enhance our program so that it can print out the letter frequencies in order of increasing frequency.

9.4.1. A Class to Store the Frequency of One Letter

It is clear that an array should be used for storing the frequencies, but a decision must also be made as to what to store as the array elements. If we store letter frequencies as int values, with the frequency of A stored at index 0, and the frequency of B at index 1, and so forth, we will not be able to rearrange the frequencies into increasing order without losing track of which letter corresponds to which frequency. One way of solving this problem is to create an array of objects, where each object stores both a letter and its frequency.


[Page 417]

So let us design a LetterFreq class that stores a letter in an instance variable of type char and its frequency in an instance variable of type int. These instance variables can be declared as:

private char letter;   // A character being counted private int freq;      // The frequency of letter 


We will want a constructor that can initialize these two values and two accessor methods to return these values. Since we are familiar enough with these kinds of methods, it will not be necessary to discuss them any further. We need one additional method to increment freq whenever we encounter the letter while processing the string:

public void incrFreq() {     freq++; } // setFreq() 


A UML diagram for the LetterFreq class is given in Figure 9.9, and the class definition is given in Figure 9.10. We will have to make a minor modification to this class later in the chapter to enable us to sort an array of objects from the class.

Figure 9.9. UML for LetterFreq.


Figure 9.10. The LetterFreq class definition.

public class LetterFreq {     private char letter;    // A character being counted     private int freq;       // The frequency of letter     public LetterFreq(char ch, int fre) {         letter = ch;         freq = fre;     }     public char getLetter() {         return letter;     }     public int getFreq() {         return freq;     }     public void incrFreq() {         freq++;     } } // LetterFreq class 


[Page 418]

9.4.2. A Class to Count Letter Frequencies

Now let us turn to designing a class named AnalyzeFreq that will use an array of objects of type LetterFreq to count the frequencies of the letters A through Z in a given string. The arraylet's call it freqArrwill be the only instance variable of the class. The class needs a constructor to instantiate the array and to create the 26 array elements, each with a different letter and an initial frequency of 0. This class should also have two methods: one to count the frequencies of the 26 letters in a given string and, the other to print out the frequency of each letter to the console window. The UML diagram for the class is given in Figure 9.11.

Figure 9.11. UML for AnalyzeFreq.


The array instance variable can be declared by:

private LetterFreq[] freqArr; // An array of frequencies 


The constructor creates an array of 26 elements to store references to LetterFreq objects with the statement

freqArr = new LetterFreq[26]; 


The indices of the array range from 0 to 25, and the elements at these locations should store the letters A to Z. Recall that in Java, char data are a form of int data and can be used in arithmetic. If we let k be an integer that ranges between 0 and 25, then the expression (char)('A' + k) will correspond to the letters A to Z. Thus, the following loop will initialize the array correctly.

for (int k = 0; k < 26; k++) {     freqArr[k] = new LetterFreq((char)('A' + k), 0); } // for 


The countLetters() method must identify the array index for LetterFreq object that stores a letter between A and Z. If let is a char variable that stores such a letter, then the expression (let - 'A') will give the index of the array element corresponding to let. Thus the following code will calculate the frequencies of the letters in the string parameter str:

public void countLetters(String str) {     char let;                                // For use in the loop.     str = str.toUpperCase();     for (int k = 0; k < str.length(); k++) {         let = str.charAt(k);         if ((let >= 'A') && (let <= 'Z')) {             freqArr[let - 'A'].incrFreq();         } // if     } // for } // countLetters() 



[Page 419]

The definition of the printArray() method is completely straightforward:

public void printArray() {   for (int k = 0; k < 26; k++) {     System.out.print("letter: " + freqArr[k].getLetter());     System.out.println("    freq: " + freqArr[k].getFreq());   } // for } // printArray() 


The entire definition of AnalyzeFreq is given in Figure 9.12. We will modify this class later in the chapter to be able to sort the array after counting. The following main() method, either in this class or in its own class, will demonstrate how the class methods are used.

Figure 9.12. The AnalyzeFreq class definition.

public class AnalyzeFreq {   private LetterFreq[] freqArr;                       // An array of frequencies   public AnalyzeFreq() {     freqArr = new LetterFreq[26];     for (int k = 0; k < 26; k++) {       freqArr[k] = new LetterFreq((char)('A' + k), 0);     } // for   }   public void countLetters(String str) {     char let;                                         // For use in the loop.     str = str.toUpperCase();     for (int k = 0; k < str.length(); k++) {       let = str.charAt(k);       if ((let >= 'A') && (let <= 'Z')) {         freqArr[let - 'A'].incrFreq();       } // if     } // for   }   public void printArray() {     for (int k = 0; k < 26; k++) {       System.out.print("letter: " + freqArr[k].getLetter());       System.out.println(" freq: " + freqArr[k].getFreq());     } // for   } } // AnalyzeFreq class 

public static void main(String[] argv) {   AnalyzeFreq af = new AnalyzeFreq();   af.countLetters("Now is the time for all good students" +      " to study computer related topics.");   af.printArray(); } // main() 



[Page 420]
Self-Study Exercise

Exercise 9.7

Rewrite the main() of the AnalyzeFreq class so that it opens a file named freqtest.txt and counts the frequencies of the letters of the text stored in the file. You will need to use the Scanner class to read from the file, as was done in Chapter 4. Create a file named freqtest.txt that contains several hundred characters of typical English text to test the new main() method.




Java, Java, Java(c) Object-Orienting Problem Solving
Java, Java, Java, Object-Oriented Problem Solving (3rd Edition)
ISBN: 0131474340
EAN: 2147483647
Year: 2005
Pages: 275

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net