Counting the Number of Characters, Words, and Lines in a Text File

Problem

You need to count the number of characters, words, and lines (or some other type of text element) in a text file.

Solution

Use an input stream to read the characters in, one at a time, and increment local statistics as you encounter characters, words, and line breaks. Example 4-26 contains the function countStuff, which does exactly that.

Example 4-26. Calculating statistics about a text file

#include <iostream>
#include <fstream>
#include <cstdlib>
#include <cctype>

using namespace std;

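void countStuff(istream& in,
                int& chars,
                int& words,
                int& lines) {

   char cur = '\0';
   char last = '\0';
   chars = words = lines = 0;

   while (in.get(cur)) {
      if (cur == '\n') {
         lines++;
         if (last == '\r')   // CR LF is a single line break, so
            chars--;         // don't count the CR as a character
      }
      else
         chars++;
      // The <cctype> functions expect an unsigned char value
      if (!std::isalnum(static_cast<unsigned char>(cur)) &&  // This is the end of a
          std::isalnum(static_cast<unsigned char>(last)))    // word
         words++;
      last = cur;
   }
   if (chars > 0) {                                          // Adjust word and line
      if (std::isalnum(static_cast<unsigned char>(last)))    // counts for special
         words++;                                            // case
      lines++;
   }
}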

int main(int argc, char** argv) {

   if (argc < 2)
      return(EXIT_FAILURE);

   ifstream in(argv[1]);

   if (!in)
      exit(EXIT_FAILURE);

   int c, w, l;

   countStuff(in, c, w, l);

   cout << "chars: " << c << '\n';
   cout << "words: " << w << '\n';
   cout << "lines: " << l << '\n';
}

 

Discussion

The algorithm here is straightforward. Characters are easy: increment the character count each time get returns a character that isn't a line break. Lines are only slightly more difficult, since the way a line ends depends on the operating system. Thankfully, it's usually either a newline character (\n) or a carriage return/line feed sequence (\r\n). By keeping track of the current and last characters, you can easily capture occurrences of this sequence. Words are easy or hard, depending on your definition of a word.
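To make the line-ending point concrete, here is a minimal sketch of the current/last technique applied to line breaks alone. It is not part of Example 4-26: the name countLineBreaks is hypothetical, and the choice to treat a lone carriage return as a break of its own is an assumption you may not want.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not from Example 4-26): counts line breaks, treating
// "\n", "\r\n", and a lone "\r" each as a single break.
int countLineBreaks(std::istream& in) {
    int breaks = 0;
    char cur = '\0';
    char last = '\0';
    while (in.get(cur)) {
        if (cur == '\r')
            ++breaks;                        // CR alone, or the start of CR LF
        else if (cur == '\n' && last != '\r')
            ++breaks;                        // LF not already counted via its CR
        last = cur;
    }
    return breaks;
}

// Convenience overload for in-memory text.
int countLineBreaks(const std::string& text) {
    std::istringstream in(text);
    return countLineBreaks(in);
}
```

With this sketch, the text "a\r\nb\rc\n" contains three breaks. Keep in mind that on some platforms a file stream opened in text mode translates \r\n to \n before your code sees it; opening the file with std::ios::binary lets you inspect the raw bytes.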

For Example 4-26, I consider a word to be a contiguous sequence of alphanumeric characters. As I look at each character in the input stream, when I encounter a nonalphanumeric character, I look at the previous character to see if it was alphanumeric. If it was, then a word has just ended and I can increment the word count. I can tell if a character is alphanumeric by using isalnum from <cctype>. But that's not all: you can test characters for a number of different qualities with similar functions. See Table 4-3 for the functions you can use to test character qualities. For wide characters, use the functions of the same name but with a "w" after the "is," e.g., iswspace. The wide-character versions are declared in the header <cwctype>.
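Since the result depends on how you define a word, it can help to compare against a different definition. The sketch below assumes a word is any run of non-whitespace characters, which is what the stream extraction operator reads into a string. The name countWhitespaceWords is hypothetical and this is only one possible alternative, not the approach of Example 4-26.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical alternative to the isalnum-based rule: a word is any run of
// non-whitespace characters, which is what operator>> extracts into a string.
int countWhitespaceWords(std::istream& in) {
    int words = 0;
    std::string token;
    while (in >> token)      // skips leading whitespace, reads one token
        ++words;
    return words;
}

// Convenience overload for in-memory text.
int countWhitespaceWords(const std::string& text) {
    std::istringstream in(text);
    return countWhitespaceWords(in);
}
```

For the text "don't stop-me now", this sketch reports three words, whereas the alphanumeric rule described above sees five (don, t, stop, me, now), because the apostrophe and the hyphen each end a word.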

Table 4-3. Character test functions from <cctype> and <cwctype>

Function               Description
---------------------  ----------------------------------------------------------
isalpha, iswalpha      Alpha characters: a-z, A-Z (upper- or lowercase).
isupper, iswupper      Alpha characters in uppercase only: A-Z.
islower, iswlower      Alpha characters in lowercase only: a-z.
isdigit, iswdigit      Numeric characters: 0-9.
isxdigit, iswxdigit    Hexadecimal numeric characters: 0-9, a-f, A-F.
isspace, iswspace      Whitespace characters: ' ', \t, \n, \v, \f, \r.
iscntrl, iswcntrl      Control characters: ASCII 0-31 and 127.
ispunct, iswpunct      Punctuation characters that don't belong to the previous groups.
isalnum, iswalnum      isalpha or isdigit is true.
isprint, iswprint      Printable ASCII characters, including the space character.
isgraph, iswgraph      isalpha or isdigit or ispunct is true.
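As a quick illustration of the functions in Table 4-3, the following sketch reports the first category that matches a character. The helper name describeChar and the order of the checks are assumptions made for this example, and the results noted afterward assume the default "C" locale.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: names the first matching category for a character.
std::string describeChar(char c) {
    // The <cctype> functions require a value representable as unsigned char
    // (or EOF), so convert before calling them.
    const unsigned char uc = static_cast<unsigned char>(c);
    if (std::isupper(uc)) return "uppercase letter";
    if (std::islower(uc)) return "lowercase letter";
    if (std::isdigit(uc)) return "digit";
    if (std::isspace(uc)) return "whitespace";   // checked before iscntrl: \t is both
    if (std::ispunct(uc)) return "punctuation";
    if (std::iscntrl(uc)) return "control";
    return "other";
}
```

In the "C" locale, describeChar('A') yields "uppercase letter", describeChar('7') yields "digit", describeChar('\t') yields "whitespace", and describeChar('!') yields "punctuation". A tab satisfies both isspace and iscntrl, so the order of the tests matters whenever categories overlap.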

After all characters have been read in and the end of the stream has been reached, there is a bit of adjustment to do. First, the loop only counts line breaks, and not, strictly speaking, lines. When the last line has no trailing line break, the count is therefore one less than the actual number of lines. To make this problem go away I just increment the line count by one if there are more than zero characters in the file. (A side effect is that a file whose last line does end with a line break is reported as having one extra, empty line.) Second, if the stream ends with an alphanumeric character, the test for the end of the last word will never occur because I can't test the next character. To account for this, I check if the last character in the stream is alphanumeric (also only when there are more than zero characters in the file) and increment the word count by one.
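The line adjustment described above adds one whenever the file is not empty. If you would rather not count an empty line after a final newline, one possible variant is to add the extra line only when the stream does not already end with a newline. The sketch below assumes \n line endings, and the name countLines is hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical variant of the line-count adjustment: add one for the final
// line only when the stream does not already end with a newline.
int countLines(std::istream& in) {
    int lines = 0;
    char cur = '\0';
    char last = '\n';        // an empty stream behaves as if already terminated
    while (in.get(cur)) {
        if (cur == '\n')
            ++lines;         // a terminated line
        last = cur;
    }
    if (last != '\n')
        ++lines;             // unterminated final line
    return lines;
}

// Convenience overload for in-memory text.
int countLines(const std::string& text) {
    std::istringstream in(text);
    return countLines(in);
}
```

Under these assumptions, "a\n" and "a" both count as one line, "a\nb" counts as two, and an empty stream counts as zero.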

The technique in Example 4-26 of using streams is nearly identical to that described in Recipe 4.14 and Recipe 4.15, but simpler since it's just inspecting the file and not making any changes.

See Also

Recipe 4.14 and Recipe 4.15

C++ Cookbook