27.4 Functional Description | Practical C Programming, 3rd Edition


This section describes all the classes and major functions in our program. For a more complete and detailed description, take a look at the listings at the end of this chapter.

27.4.1 char_type Class

The char_type class determines the type of a character. For the most part, this is done through a table named type_info. Some types, such as C_ALPHA_NUMERIC, cover two different types of characters, C_ALPHA and C_DIGIT, so a single table entry per character cannot represent them. Therefore, in addition to our table, we need a little code for the special cases.
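A minimal sketch of the table-plus-special-case idea follows. Only type_info, C_ALPHA, C_DIGIT, and C_ALPHA_NUMERIC come from the text; the other enum values, the member function is, and the helper check_char_type are assumptions made for illustration and may differ from the listing at the end of the chapter.

```cpp
#include <cassert>
#include <cctype>

// Character types. Only C_ALPHA, C_DIGIT and C_ALPHA_NUMERIC are named
// in the text; the remaining values are assumed for this sketch.
enum CHAR_TYPE {
    C_EOF, C_WHITE, C_NEWLINE, C_ALPHA, C_DIGIT, C_OPERATOR,
    C_ALPHA_NUMERIC     // Compound type: C_ALPHA or C_DIGIT
};

class char_type {
    public:
        // Fill in the table once, one simple type per character
        char_type() {
            for (int ch = 0; ch < 256; ++ch) {
                if (std::isalpha(ch) || ch == '_')
                    type_info[ch] = C_ALPHA;
                else if (std::isdigit(ch))
                    type_info[ch] = C_DIGIT;
                else if (ch == '\n')
                    type_info[ch] = C_NEWLINE;
                else if (std::isspace(ch))
                    type_info[ch] = C_WHITE;
                else
                    type_info[ch] = C_OPERATOR;
            }
        }

        // Return true if ch belongs to the given type
        bool is(int ch, CHAR_TYPE kind) const {
            if (ch < 0 || ch > 255)         // Outside the table (e.g. EOF)
                return kind == C_EOF;
            if (kind == C_ALPHA_NUMERIC)    // Special case: two table types
                return type_info[ch] == C_ALPHA || type_info[ch] == C_DIGIT;
            return type_info[ch] == kind;   // Normal case: table lookup
        }
    private:
        CHAR_TYPE type_info[256];           // Simple type of each character
};

// Usage: the compound type matches letters and digits alike
void check_char_type() {
    char_type types;
    assert(types.is('a', C_ALPHA_NUMERIC));
    assert(types.is('7', C_ALPHA_NUMERIC));
    assert(!types.is('+', C_ALPHA_NUMERIC));
}
```

The table handles every simple type with a single lookup, so only the compound type needs explicit code.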

27.4.2 input_file Class

This class reads data from the input file one character at a time. It buffers a line and on command writes the line to the output.

27.4.3 token Class

We want an input stream of tokens. We have an input stream consisting of characters. The main function of this class, next_token, turns characters into tokens. Actually, our tokenizer is rather simple, because we don't have to deal with most of the details that a full C++ tokenizer must handle.

The coding for this function is fairly straightforward, except for the fact that it breaks up multiline comments into a series of T_COMMENT and T_NEWLINE tokens.
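To make the comment splitting concrete, here is a small sketch. It is not the chapter's next_token: the helper split_comment is hypothetical, it assumes it is handed one complete comment as a string, and it uses the T_NEWLINE spelling from the TOKEN_LIST macro. It only shows the shape of the token sequence that a multiline comment produces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum TOKEN_TYPE { T_COMMENT, T_NEWLINE };   // Shortened stand-in list

// Hypothetical helper: "comment" holds one whole comment, from the
// opening /* through the closing */.  Each line of the comment yields
// a T_COMMENT token, and each line break yields a T_NEWLINE token.
std::vector<TOKEN_TYPE> split_comment(const std::string &comment) {
    std::vector<TOKEN_TYPE> tokens;

    for (std::size_t i = 0; i < comment.size(); ++i) {
        if (comment[i] == '\n') {
            tokens.push_back(T_COMMENT);    // Comment text on the line just ended
            tokens.push_back(T_NEWLINE);    // The line break itself
        }
    }
    tokens.push_back(T_COMMENT);            // Last line, containing */
    return tokens;
}

// Usage: a three-line comment becomes five tokens
void check_split_comment() {
    const std::vector<TOKEN_TYPE> tokens =
        split_comment("/* one\n * two\n */");
    assert(tokens.size() == 5);
    assert(tokens[0] == T_COMMENT);
    assert(tokens[1] == T_NEWLINE);
    assert(tokens[4] == T_COMMENT);
}
```

Splitting the comment this way lets any class that works line by line see a comment token on every line the comment touches.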

One clever trick is used in this section. The TOKEN_LIST macro is used to generate an enumerated list of token types and a string array containing the names of each of the tokens. Let's examine how this is done in more detail.

The definition of the TOKEN_LIST macro is:

    #define TOKEN_LIST \
        T(T_NUMBER),      /* Simple number (floating point or integer) */ \
        T(T_STRING),      /* String or character constant */              \
        T(T_COMMENT),     /* Comment */                                   \
        T(T_NEWLINE),     /* Newline character */                         \
        T(T_OPERATOR),    /* Arithmetic operator */                       \
        T(T_L_PAREN),     /* Character "(" */                             \
        T(T_R_PAREN),     /* Character ")" */                             \
        T(T_L_CURLY),     /* Character "{" */                             \
        T(T_R_CURLY),     /* Character "}" */                             \
        T(T_ID),          /* Identifier */                                \
        T(T_EOF)          /* End of File */

When invoked, this macro will generate the code:

 T(T_NUMBER), T(T_STRING), // .. and so on

If we define a T macro, it will be expanded when the TOKEN_LIST macro is expanded. We would like to use the TOKEN_LIST macro to generate a list of names, so we define the T macro as:

 #define T(x) x          // Define T(  ) as the name

Now, our TOKEN_LIST macro will generate:

 T_NUMBER, T_STRING, // .. and so on

Putting all this together with a little more code, we get a way to generate a TOKEN_TYPE enum list:

    #define T(x) x          // Define T() as the name
    enum TOKEN_TYPE {
        TOKEN_LIST
    };
    #undef T                // Remove old temporary macro

Later we redefine T so it generates a string:

 #define T(x) #x         // Define x as a string

This allows us to use TOKEN_LIST to generate a list of strings containing the names of the tokens:

    #define T(x) #x         // Define x as a string
    const char *const TOKEN_NAMES[] = {
        TOKEN_LIST
    };
    #undef T                // Remove old temporary macro

When expanded, this macro generates:

    const char *const TOKEN_NAMES[] = {
        "T_NUMBER",
        "T_STRING",
        //....
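The pieces above can be assembled into one small, compilable unit. The sketch below shortens the token list to three entries to save space. The helper functions token_name and check_token_names are hypothetical additions for illustration and are not part of the chapter's code.

```cpp
#include <cassert>
#include <cstring>

// Shortened token list: one T() entry per token
#define TOKEN_LIST \
    T(T_NUMBER),      /* Simple number */               \
    T(T_STRING),      /* String or character constant */ \
    T(T_EOF)          /* End of File */

// First expansion: T(x) is the bare name, giving the enum members
#define T(x) x
enum TOKEN_TYPE {
    TOKEN_LIST
};
#undef T

// Second expansion: T(x) is the name as a string literal
#define T(x) #x
const char *const TOKEN_NAMES[] = {
    TOKEN_LIST
};
#undef T

// Hypothetical helper: look up the printable name of a token
const char *token_name(TOKEN_TYPE token) {
    return TOKEN_NAMES[token];
}

// Usage: the enum values and the strings stay in step
void check_token_names() {
    assert(std::strcmp(token_name(T_NUMBER), "T_NUMBER") == 0);
    assert(std::strcmp(token_name(T_EOF), "T_EOF") == 0);
}
```

Because both the enum and the array come from the same list, adding a token to TOKEN_LIST updates both at once and they cannot drift apart.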

Using tricks like this is acceptable in limited cases. However, such tricks should be extensively commented so the maintenance programmer who has to fix your code can understand what you did.

27.4.4 stat Class

The stat class is an abstract class that is used as a basis for the four real statistics we are collecting. It starts with a member function to consume tokens. This function is a pure virtual function, which means that any derived class must define the function take_token:

    class stat {
        public:
            virtual void take_token(TOKEN_TYPE token) = 0;

The function take_token generates statistics from tokens. We need some way of printing them in two places. The first is at the beginning of each line, and the second is at the end of the file. Our abstract class contains two virtual functions to handle these two cases:

            virtual void line_start() {}
            virtual void eof() {}
    };

Unlike take_token, these functions have default bodies: empty bodies, but bodies just the same. What does this mean? Our derived classes must define take_token. They don't have to define line_start or eof.
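The following sketch shows what this means for a derived class. The class id_counter, the helper check_id_counter, and the shortened TOKEN_TYPE list are hypothetical and exist only for illustration. The virtual destructor is also an addition of this sketch and does not appear in the fragments shown above.

```cpp
#include <cassert>

enum TOKEN_TYPE { T_NUMBER, T_ID, T_EOF };  // Shortened stand-in list

class stat {
    public:
        virtual ~stat() {}                  // Added in this sketch only
        virtual void take_token(TOKEN_TYPE token) = 0;  // Must be defined
        virtual void line_start() {}        // May be overridden
        virtual void eof() {}               // May be overridden
};

// Hypothetical statistic: counts identifiers.  It defines take_token,
// as required, and inherits the empty line_start and eof.
class id_counter : public stat {
    public:
        id_counter() : count(0) {}
        virtual void take_token(TOKEN_TYPE token) {
            if (token == T_ID)
                ++count;
        }
        int count;                          // Identifiers seen so far
};

// Usage: every call goes through a pointer to the base class
void check_id_counter() {
    id_counter ids;
    stat *any_stat = &ids;

    any_stat->take_token(T_ID);
    any_stat->take_token(T_NUMBER);
    any_stat->line_start();                 // Inherited empty body
    any_stat->eof();                        // Inherited empty body
    assert(ids.count == 1);
}
```

Leaving take_token out of id_counter would make id_counter abstract as well, and the declaration of ids would not compile.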

27.4.5 line_counter Class

The simplest statistic we collect is a count of the number of lines processed so far. This counting is done through the line_counter class. The only token it cares about is T_NEWLINE. At the beginning of each line it outputs the line number (the current count of the T_NEWLINE tokens). At the end of file, this class outputs nothing. As a matter of fact, the line_counter class doesn't even define an eof function. Instead, we let the default in the base class (stat) do the "work."

27.4.6 brace_counter Class

This class keeps track of the nesting level of the curly braces { }. We feed the class a stream of tokens through the take_token member function. This function keeps track of the left and right curly braces and ignores everything else:

    // Consume tokens, count the nesting of {}
    void brace_counter::take_token(TOKEN_TYPE token) {
        switch (token) {
            case T_L_CURLY:
                ++cur_level;
                if (cur_level > max_level)
                    max_level = cur_level;
                break;
            case T_R_CURLY:
                --cur_level;
                break;
            default:
                // Ignore
                break;
        }
    }

The results of this statistic are printed in two places. The first is at the beginning of each line. The second is at the end-of-file. We define two member functions to print these statistics:

    // Output start of line statistics,
    // namely the current nesting level
    void brace_counter::line_start() {
        std::cout.setf(std::ios::left);
        std::cout.width(2);
        std::cout << '{' << cur_level << ' ';
        std::cout.unsetf(std::ios::left);
        std::cout.width();
    }

    // Output eof statistics,
    // namely the maximum nesting level
    void brace_counter::eof() {
        std::cout << "Maximum nesting of {} : " << max_level << '\n';
    }

27.4.7 paren_counter Class

This class is very similar to the brace_counter class. As a matter of fact, it was created by copying the brace_counter class and performing a few simple edits.

We probably should combine the paren_counter class and the brace_counter class into one class that uses a parameter to tell it what to count. Oh well, something for the next version.
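One way such a combined class might look is sketched below. This is not code from the book: the name nest_counter, its constructor parameters, and the accessors current and maximum are all assumptions. The stat base class and the printing functions are left out to keep the sketch short.

```cpp
#include <cassert>

// Shortened stand-in token list
enum TOKEN_TYPE { T_L_PAREN, T_R_PAREN, T_L_CURLY, T_R_CURLY, T_ID };

// Hypothetical merged counter: the constructor is told which pair of
// tokens opens and closes a nesting level.
class nest_counter {
    public:
        nest_counter(TOKEN_TYPE open, TOKEN_TYPE close)
            : open_token(open), close_token(close),
              cur_level(0), max_level(0) {}

        // Consume tokens, count the nesting of the chosen pair
        void take_token(TOKEN_TYPE token) {
            if (token == open_token) {
                ++cur_level;
                if (cur_level > max_level)
                    max_level = cur_level;
            } else if (token == close_token) {
                --cur_level;
            }
            // Everything else is ignored
        }

        int current() const { return cur_level; }
        int maximum() const { return max_level; }
    private:
        const TOKEN_TYPE open_token;    // Token that opens a level
        const TOKEN_TYPE close_token;   // Token that closes a level
        int cur_level;                  // Current nesting level
        int max_level;                  // Deepest nesting seen so far
};

// Usage: one class, two counters
void check_nest_counter() {
    nest_counter braces(T_L_CURLY, T_R_CURLY);
    nest_counter parens(T_L_PAREN, T_R_PAREN);
    const TOKEN_TYPE input[] =
        { T_L_CURLY, T_L_CURLY, T_L_PAREN, T_R_CURLY };

    for (int i = 0; i < 4; ++i) {
        braces.take_token(input[i]);
        parens.take_token(input[i]);
    }
    assert(braces.current() == 1 && braces.maximum() == 2);
    assert(parens.current() == 1 && parens.maximum() == 1);
}
```

A real replacement would also need the label printed by line_start and eof passed in as a parameter, since the two existing classes print different text.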

27.4.8 comment_counter Class

In this class, we keep track of lines with comments in them, lines with code in them, lines with both comments and code, and lines with none. The results are printed at the end of file.
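The bookkeeping described here can be sketched with two flags per line. The version below is an assumption about how such a class might work, not the chapter's listing: the LINE_KIND enum, the count accessor, and the shortened token list are invented for illustration, and the stat base class and the end-of-file printing are omitted.

```cpp
#include <cassert>

enum TOKEN_TYPE { T_COMMENT, T_NEWLINE, T_ID, T_EOF };  // Stand-in list

class comment_counter {
    public:
        // The four kinds of line.  The values double as array indexes:
        // bit 0 is "saw a comment", bit 1 is "saw code".
        enum LINE_KIND { BLANK = 0, COMMENT_ONLY = 1, CODE_ONLY = 2, BOTH = 3 };

        comment_counter() : saw_comment(false), saw_code(false) {
            for (int i = 0; i < 4; ++i)
                counts[i] = 0;
        }

        // Consume tokens, classify each line when it ends
        void take_token(TOKEN_TYPE token) {
            switch (token) {
                case T_COMMENT:
                    saw_comment = true;
                    break;
                case T_NEWLINE:
                    ++counts[(saw_comment ? 1 : 0) + (saw_code ? 2 : 0)];
                    saw_comment = false;    // Start the next line fresh
                    saw_code = false;
                    break;
                case T_EOF:
                    break;                  // Nothing to count
                default:
                    saw_code = true;        // Any other token is code
                    break;
            }
        }

        int count(LINE_KIND kind) const { return counts[kind]; }
    private:
        bool saw_comment;   // Current line contains a comment
        bool saw_code;      // Current line contains code
        int counts[4];      // Number of lines of each kind
};

// Usage: four lines, one of each kind
void check_comment_counter() {
    comment_counter counter;
    const TOKEN_TYPE input[] = {
        T_ID, T_COMMENT, T_NEWLINE,     // Code and comment
        T_NEWLINE,                      // Blank
        T_COMMENT, T_NEWLINE,           // Comment only
        T_ID, T_NEWLINE,                // Code only
        T_EOF
    };
    for (int i = 0; i < 9; ++i)
        counter.take_token(input[i]);
    assert(counter.count(comment_counter::BOTH) == 1);
    assert(counter.count(comment_counter::BLANK) == 1);
}
```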

27.4.9 do_file Procedure

The do_file procedure reads each file one token at a time and passes each token to the take_token member function of every statistic object. But how does it know which statistics classes to use? There is a list:

    static line_counter line_count;         // Counter of lines
    static paren_counter paren_count;       // Counter of () levels
    static brace_counter brace_count;       // Counter of {} levels
    static comment_counter comment_count;   // Counter of comment info

    // A list of the statistics we are collecting
    static stat *stat_list[] = {
        &line_count,
        &paren_count,
        &brace_count,
        &comment_count,
        NULL
    };

A couple of things should be noted about this list: although line_count, paren_count, brace_count, and comment_count are all different types, they are all derived from the type stat. This means that we can put pointers to all of them in a single array called stat_list. This design also makes it easy to add another statistic to the list. All we have to do is define a new class derived from stat and put a new entry in stat_list.
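The loop that drives such a list can be sketched as follows. This is a simplified stand-in for do_file, not the chapter's code: feed_tokens is a hypothetical function that takes an already tokenized array instead of reading a file, the line_start calls are omitted, and token_counter is an invented statistic that shows how a new entry joins the list. The virtual destructor in stat and the lines and total accessors are also additions of this sketch.

```cpp
#include <cassert>
#include <cstddef>

enum TOKEN_TYPE { T_ID, T_NEWLINE, T_EOF };     // Shortened stand-in list

class stat {
    public:
        virtual ~stat() {}                      // Added in this sketch only
        virtual void take_token(TOKEN_TYPE token) = 0;
        virtual void line_start() {}
        virtual void eof() {}
};

class line_counter : public stat {
    public:
        line_counter() : cur_line(0) {}
        virtual void take_token(TOKEN_TYPE token) {
            if (token == T_NEWLINE)
                ++cur_line;
        }
        int lines() const { return cur_line; }
    private:
        int cur_line;                           // Lines seen so far
};

// Hypothetical new statistic: counts every token
class token_counter : public stat {
    public:
        token_counter() : count(0) {}
        virtual void take_token(TOKEN_TYPE) { ++count; }
        int total() const { return count; }
    private:
        int count;                              // Tokens seen so far
};

static line_counter line_count;                 // Counter of lines
static token_counter token_count;               // Counter of tokens

// A list of the statistics we are collecting
static stat *stat_list[] = {
    &line_count,
    &token_count,       // Adding a statistic is one new entry
    NULL
};

// Hypothetical stand-in for do_file: "tokens" must end with T_EOF
void feed_tokens(const TOKEN_TYPE tokens[]) {
    for (std::size_t i = 0; ; ++i) {
        for (stat **cur = stat_list; *cur != NULL; ++cur)
            (*cur)->take_token(tokens[i]);
        if (tokens[i] == T_EOF)
            break;
    }
    for (stat **cur = stat_list; *cur != NULL; ++cur)
        (*cur)->eof();
}

// Usage: two lines, six tokens in all
void check_feed_tokens() {
    const TOKEN_TYPE input[] =
        { T_ID, T_NEWLINE, T_ID, T_ID, T_NEWLINE, T_EOF };
    feed_tokens(input);
    assert(line_count.lines() == 2);
    assert(token_count.total() == 6);
}
```

The loop never names a concrete statistic class, which is why a new statistic needs no change to the loop itself.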
