A Word-Count Program

Now you have the tools to make a word-counting program, that is, a program that reads input and reports the number of words it finds. You may as well count characters and lines while you are at it. Let's see what such a program involves.

First, the program should read input character-by-character, and it should have some way of knowing when to stop. Second, it should be able to recognize and count the following units: characters, lines, words. Here's a pseudocode representation:

 read a character
 while there is more input
      increment character count
      if a line has been read, increment line count
      if a word has been read, increment word count
      read next character

You already have a model for the input loop:

 while ((ch = getchar()) != STOP)
 {
   ...
 }

Here, STOP represents some value for ch that signals the end of the input. So far you have used the newline character and a period for this purpose, but neither is satisfactory for a general word-counting program. For the present, choose a character (|) that is not common in text. In Chapter 8, "Character Input/Output and Redirection," we'll present a better solution that lets the program read text files as well as keyboard input.

Now let's consider the body of the loop. Because the program uses getchar() for input, it can count characters by incrementing a counter during each loop cycle. To count lines, the program can check for newline characters. If a character is a newline, then the program should increment the line count.

The trickiest part is identifying words. First, you have to define what you mean by a word. Let's take a relatively simple approach and define a word as a sequence of characters that contains no whitespace (that is, no spaces, tabs, or newlines). Therefore, "glymxck" and "r2d2" are words. A word starts, then, when a program first encounters non-whitespace, and it ends when the next whitespace character shows up. The most straightforward test expression for detecting non-whitespace is this:

 c != ' ' && c != '\n' && c != '\t'   /* true if c is not whitespace */

And the most straightforward test for detecting whitespace is this:

 c == ' ' || c == '\n' || c == '\t'   /* true if c is whitespace */

However, it is simpler to use the ctype.h function isspace(), which returns true if its argument is a whitespace character. So isspace(c) is true if c is whitespace, and !isspace(c) is true if c isn't whitespace.

To keep track of whether a character is in a word, you can set a flag (call it wordflag) to 1 when the first character in a word is read. You can also increment the word count at that point. Then, as long as wordflag remains 1, subsequent non-whitespace characters don't mark the beginning of a word. At the next whitespace character, you must reset the flag to 0, and the program will be ready to find the next word. Let's put that into pseudocode:

 if c is not whitespace and wordflag is 0
      set wordflag to 1 and count the word
 if c is whitespace and wordflag is 1
      set wordflag to 0

This approach sets wordflag to 1 at the beginning of each word and to 0 at the end of each word. Words are counted only at the time the flag setting is changed from 0 to 1. Listing 7.7 translates these ideas into C.

Listing 7.7 The wordcnt.c program.
 /* wordcnt.c -- counts characters, words, lines */
 #include <stdio.h>
 #include <ctype.h>         /* for isspace()            */
 #define STOP '|'
 #define YES 1
 #define NO 0
 int main(void)
 {
    int c;                 /* read in character (int, so it can hold EOF) */
    long n_chars = 0L;     /* number of characters     */
    int n_lines = 0;       /* number of lines          */
    int n_words = 0;       /* number of words          */
    int wordflag = NO;     /* ==YES if c is in a word  */
    printf("Enter text to be analyzed (| to terminate):\n");
    while ((c = getchar()) != STOP && c != EOF)  /* EOF test avoids an endless loop */
    {
       n_chars++;              /* count characters    */
       if (c == '\n')
          n_lines++;           /* count lines         */
       if (!isspace(c) && wordflag == NO)
       {
          wordflag = YES;      /* starting a new word */
          n_words++;           /* count word          */
       }
       if (isspace(c) && wordflag == YES)
          wordflag = NO;       /* reached end of word */
    }
    printf("characters = %ld, words = %d, lines = %d\n",
          n_chars, n_words, n_lines);
    return 0;
 }

Here is a sample run:

 Enter text to be analyzed (| to terminate):
 Reason is a
 powerful servant but
 an inadequate master.
 |
 characters = 55, words = 9, lines = 3

The program uses logical operators to translate the pseudocode to C. For example,

 if c is not whitespace and wordflag is 0 

gets translated to the following:

 if (!isspace(c) && wordflag == NO) 

This certainly is more readable than testing for each whitespace character individually:

 if (c != ' ' && c != '\n' && c != '\t' && wordflag == NO)

Either form says, "If c is not whitespace, and if you are not in a word." If both conditions are met, then you must be starting a new word, and n_words is incremented. If you are in the middle of a word, then the first condition holds, but wordflag will be YES, and n_words is not incremented. When you reach the next whitespace character, you set wordflag equal to NO again. Check the code to see whether the program gets confused when several spaces separate one word from the next. In Chapter 8, we'll show how to modify this program to count words in a file.

C Primer Plus (5th Edition)
ISBN: 0672326965
Year: 2000
Pages: 314
Authors: Stephen Prata
