27.2 Code Design | Practical C Programming, 3rd Edition


There are several schools of code design. In structured programming, you divide the code into modules, the modules into submodules, the submodules into sub-submodules, and so on. This is also known as procedure-oriented programming. In object-oriented programming, you try to think of the problem as a collection of data that you manipulate through member functions.

There are also other approaches, such as state tables and transition diagrams. All of these have the same basic principle at heart: "Arrange the program's information in the clearest and simplest way possible and try to turn it into C++ code."

Our program breaks down into several logical modules. First, there is a token scanner, which reads raw C++ code and turns it into tokens. The scanner itself subdivides into two smaller modules. The first reads the input stream and determines what type of character we have. The second takes the character-type information and uses it to assemble tokens. The remaining module contains the statistics gathering and a small main program.

27.2.1 Token Module

Our program scans C++ source code and uses the tokens to generate statistics. A token is a group of characters that form a single word, number, or symbol. For example, the line:

 answer = (123 + 456) / 89;  // Compute some sort of result

consists of the tokens:

 T_ID          The word "answer"
 T_OPERATOR    The character "="
 T_L_PAREN     Left parenthesis
 T_NUMBER      The number 123
 T_OPERATOR    The character "+"
 T_NUMBER      The number 456
 T_R_PAREN     Right parenthesis
 T_OPERATOR    The divide operator
 T_NUMBER      The number 89
 T_OPERATOR    The semicolon
 T_COMMENT     The // comment
 T_NEW_LINE    The end-of-line character

Our token module needs to identify groups of characters. For example, an identifier is defined as a letter or underscore, followed by any number of letters or digits. Our tokenizer thus needs to contain the pseudocode:

 If the current character is a letter then
     scan until we get a character that's not a letter or digit

As you can see from the pseudocode, our tokenizer depends a great deal on character types, so we need a module to help us with the type information.
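The identifier pseudocode above might be sketched in C++ as follows. The function name `identifier_length` is hypothetical, and the sketch leans on the standard `<cctype>` classification functions rather than the character-type module described next. It also assumes that underscores are accepted after the first character, as C++ itself allows, which is slightly broader than the definition given in the text.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the length of the identifier that starts at
// text[start], or 0 if no identifier starts there.
std::size_t identifier_length(const std::string& text, std::size_t start)
{
    if (start >= text.size())
        return 0;

    // An identifier must begin with a letter or underscore.
    unsigned char first = static_cast<unsigned char>(text[start]);
    if (!std::isalpha(first) && first != '_')
        return 0;

    // Scan until we reach a character that is not a letter or digit.
    // Assumption: underscores are also accepted here, as in real C++.
    std::size_t end = start + 1;
    while (end < text.size()) {
        unsigned char ch = static_cast<unsigned char>(text[end]);
        if (!std::isalnum(ch) && ch != '_')
            break;
        ++end;
    }
    return end - start;
}
```

Returning a length rather than the text itself leaves it to the caller to decide whether to copy the characters or simply skip over them.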

27.2.2 Character-Type Module

The purpose of the character-type module is to read characters and decode their types. Some types overlap. For example, C_ALPHA_NUMERIC includes the C_NUMERIC character set. This module stores most of the type information in an array and requires only a little logic to handle the special types like C_ALPHA_NUMERIC.
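A table-driven character classifier along these lines could look like the sketch below. Only `C_ALPHA_NUMERIC` and `C_NUMERIC` come from the text. The other type names, the class name `char_type`, and its member functions are assumptions made for illustration, and the sketch classifies characters passed to it rather than reading them from a file.

```cpp
#include <array>

// C_NUMERIC and C_ALPHA_NUMERIC appear in the text; the rest are assumed.
enum CHAR_TYPE {
    C_ALPHA,         // Letters and underscore (assumed)
    C_NUMERIC,       // Digits
    C_WHITE,         // Space and tab (assumed)
    C_OTHER,         // Anything not classified above (assumed)
    C_ALPHA_NUMERIC  // Special type: C_ALPHA or C_NUMERIC, never stored
};

class char_type {
public:
    char_type()
    {
        table_.fill(C_OTHER);
        mark("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_", C_ALPHA);
        mark("0123456789", C_NUMERIC);
        mark(" \t", C_WHITE);
    }

    // Returns true if ch belongs to the given type. Most answers come
    // straight from the table; the overlapping type needs a little logic.
    bool is(unsigned char ch, CHAR_TYPE kind) const
    {
        if (kind == C_ALPHA_NUMERIC)
            return table_[ch] == C_ALPHA || table_[ch] == C_NUMERIC;
        return table_[ch] == kind;
    }

private:
    std::array<CHAR_TYPE, 256> table_;  // One entry per possible char value

    // Assign one type to every character in a null-terminated list.
    void mark(const char* chars, CHAR_TYPE kind)
    {
        for (; *chars != '\0'; ++chars)
            table_[static_cast<unsigned char>(*chars)] = kind;
    }
};
```

Keeping the overlapping types out of the table means each character has exactly one stored type, so the array stays a simple one-dimensional lookup.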

27.2.3 Statistics Class

In this program, a statistic is an object that consumes tokens and outputs statistics. We start by defining an abstract class for our statistics. This class is used as the basis for the statistics we are collecting. The class diagram can be seen in Figure 27-1.

Figure 27-1. Statistics class hierarchy

Our definition of a statistic is "something that uses tokens to collect statistics." These statistics may be printed at the beginning of each line or at the end of the file.

Our four statistics are more specific. For example, the class paren_counter tracks the current nesting level of parentheses as well as the maximum nesting level. The current nesting level is printed at the beginning of each line (the "(" number), and the maximum nesting level is written out at the end of the file.

The other classes are defined in a similar manner. The only trick used here is that we've made the line numbering a statistic. It counts the number of T_NEW_LINE tokens and outputs that count at the start of each line.
