You want to do something with every word in a file. For example, you want to build a concordance that records how many times each word is used, so that you can compute similarities between documents.
Read in each line with fgets( ), separate the line into words, and process each word, as in Example 23-24.
Processing each word in a file
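A minimal sketch of this approach follows. The helper name process_words( ) and the sample text are assumptions made for illustration. The sketch reads from an in-memory stream (php://memory) so that it runs on its own; a real script would instead pass a handle returned by fopen( ) on an actual file. Here the per-word processing builds a simple concordance.

```php
<?php
// Hypothetical helper: calls $callback once for each whitespace-delimited
// word read from the open stream $fh.
function process_words($fh, $callback) {
    // fgets() returns false at end of file, which ends the loop
    while (($line = fgets($fh)) !== false) {
        // Split on runs of whitespace and drop empty pieces
        $words = preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            call_user_func($callback, $word);
        }
    }
}

// Sample input in memory (an assumption; normally fopen('some-file.txt', 'r'))
$fh = fopen('php://memory', 'w+');
fwrite($fh, "the quick brown fox\njumps over the lazy dog\n");
rewind($fh);

// Count how many times each word appears
$counts = array();
process_words($fh, function ($word) use (&$counts) {
    $counts[$word] = isset($counts[$word]) ? $counts[$word] + 1 : 1;
});
fclose($fh);
```

Passing the per-word work in as a callback keeps the reading and splitting logic separate from whatever is done with each word.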
Example 23-25 calculates the average word length in a file.
Calculating average word length
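One way the calculation could look is sketched below. The function name average_word_length( ) and the sample text are hypothetical, and the in-memory stream stands in for a real file handle. Note that strlen( ) counts bytes, so this sketch assumes single-byte text.

```php
<?php
// Hypothetical helper: returns array(word count, average word length)
// for the whitespace-delimited words in the open stream $fh.
function average_word_length($fh) {
    $word_count = 0;
    $total_length = 0;
    while (($line = fgets($fh)) !== false) {
        $words = preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            $word_count++;
            $total_length += strlen($word); // bytes, not multibyte characters
        }
    }
    // Avoid dividing by zero when the input contains no words
    $average = $word_count > 0 ? $total_length / $word_count : 0.0;
    return array($word_count, $average);
}

// Sample input in memory (an assumption; normally a handle from fopen())
$fh = fopen('php://memory', 'w+');
fwrite($fh, "one three\nfive\n");
rewind($fh);
list($count, $average) = average_word_length($fh);
fclose($fh);

printf("The average word length over %d words is %.02f characters.\n",
       $count, $average);
```

The guard against a zero word count matters for empty files, where a plain division would fail.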
How you process every word depends on how "word" is defined. The code in this recipe uses the Perl-compatible regular expression engine's \s whitespace metacharacter, which includes space, tab, newline, carriage return, and formfeed. Recipe 1.5 breaks apart a line into words by splitting on a space, which is useful in that recipe because the words have to be rejoined with spaces. The Perl-compatible engine also has a word-boundary assertion (\b) that matches between a word character (a letter, digit, or underscore) and a non-word character (anything else). Using \b instead of \s to delimit words makes the most noticeable difference for words with embedded punctuation. The term 6 o'clock is two words when split by whitespace (6 and o'clock); it's four words when split by word boundaries (6, o, ', and clock).
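The difference can be seen in a short sketch. The variable names are illustrative only. Splitting on \b also returns the whitespace between words as a piece of its own, so this sketch discards whitespace-only pieces to arrive at the four words described above.

```php
<?php
$term = "6 o'clock";

// Split on runs of whitespace: array('6', "o'clock")
$by_space = preg_split('/\s+/', $term, -1, PREG_SPLIT_NO_EMPTY);

// Split at word boundaries; the pieces include the single space
$pieces = preg_split('/\b/', $term, -1, PREG_SPLIT_NO_EMPTY);

// Drop whitespace-only pieces and reindex: array('6', 'o', "'", 'clock')
$by_boundary = array_values(array_filter($pieces, function ($piece) {
    return trim($piece) !== '';
}));

printf("%d words by whitespace, %d words by word boundary\n",
       count($by_space), count($by_boundary));
```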
23.7.4. See Also
Recipe 22.2 discusses regular expressions that match words; Recipe 1.5 covers breaking apart a line by words; documentation on fgets( ) is at http://www.php.net/fgets, on preg_split( ) at http://www.php.net/preg-split, and on the Perl-compatible regular expression extension at http://www.php.net/pcre.