Dissecting an Expression


To evaluate an expression, a parser needs to be fed the individual components of that expression. For example, the expression

 A * B  (W + 10)

contains these individual parts: A, *, B, , (, W, +, 10, and ). Each component of an expression is called a token, and each token represents an indivisible unit of the expression. Since tokenizing an expression is fundamental to parsing, let’s look at it before examining the parser itself.

To render an expression into tokens, you need a method that sequentially returns each individual token in the expression, moving from start to finish. The method must also be able to skip over spaces and detect the end of the expression. In the parser developed here, the method that performs this task is called GetToken( ).

Both parsers in this chapter are encapsulated within the Parser class. Although this class is described in detail later, the first part of it needs to be shown here so that the workings of GetToken( ) can be explained. Parser begins by defining the enumerations and fields shown here:

 class Parser {   // Enumerate token types.   enum Types { NONE, DELIMITER, VARIABLE, NUMBER };   // Enumerate error types.   enum Errors { SYNTAX, UNBALPARENS, NOEXP, DIVBYZERO };   string exp;    // refers to expression string   int expIdx;    // current index into the expression   string token;  // holds current token   Types tokType; // holds token's type

Notice first the two enumerations. The first is Types. When parsing an expression, a token typically has a type associated with it. For the parsers developed in this chapter, only three types are needed: variable, number, and delimiter. These are represented by the values VARIABLE, NUMBER, and DELIMITER defined by the Types enumeration. The DELIMITER category is used for both operators and parentheses. The NONE type is just a placeholder value for an undefined token. The Errors enumeration represents various parsing errors.

A reference to the string that holds the expression being parsed is stored in exp. Thus, exp will refer to a string such as “10+4”. The index of the next token within that string is held in expIdx, which is initially zero. The token that is obtained is stored in token, and its type is stored in tokType.

The GetToken( ) method that is used by the parser is shown here. Each time it is called, it obtains the next token from an expression in the string referred to by exp, beginning at expIdx. In other words, each time GetToken( ) is called, it obtains the next token at exp[expIdx]. It puts this token into the token field. It puts the type of the token into tokType. GetToken( ) uses the IsDelim( ) method, which is also shown here:

 // Obtain the next token. void GetToken() {   tokType = Types.NONE;   token = "";   if(expIdx == exp.Length) return; // at end of expression   // skip over whitespace   while(expIdx < exp.Length &&         Char.IsWhiteSpace(exp[expIdx])) ++expIdx;   // trailing whitespace ends expression   if(expIdx == exp.Length) return;   if(IsDelim(exp[expIdx])) { // is operator     token += exp[expIdx];     expIdx++;     tokType = Types.DELIMITER;   }   else if(Char.IsLetter(exp[expIdx])) { // is variable     while(!IsDelim(exp[expIdx])) {       token += exp[expIdx];       expIdx++;       if(expIdx >= exp.Length) break;     }     tokType = Types.VARIABLE;   }   else if(Char.IsDigit(exp[expIdx])) { // is number     while(!IsDelim(exp[expIdx])) {       token += exp[expIdx];       expIdx++;       if(expIdx >= exp.Length) break;     }     tokType = Types.NUMBER;   } } // Return true if c is a delimiter. bool IsDelim(char c) {   if((" +-/*%^=()".IndexOf(c) != -1))     return true;   return false; }

Look closely at GetToken( ). After the first two initializations, GetToken( ) checks to determine whether the end of the expression has been reached by seeing if expIdx is equal to exp.Length. Since expIdx is an index into the expression being analyzed, if it equals the length of the expression string, the expression has been fully parsed.

If there are still more tokens to retrieve from the expression, GetToken( ) first skips over any leading spaces. Once the spaces have been skipped, exp[expIdx] contains either a digit, a variable, an operator, or, if trailing spaces end the expression, a space. If the next character is an operator, it is returned as a string in token, and Types.DELIMITER is stored in tokType. If the next character is a letter instead, it is assumed to be one of the variables. It is returned as a string in token, and tokType is assigned the value Types.VARIABLE. If the next character is a digit, the entire number is read and stored in its string form in token and its type is Types.NUMBER. Finally, if the next character is none of the preceding, token will contain a null string and tokType will contain Types.NONE.

To keep the code in GetToken( ) clear, a certain amount of error checking has been omitted, and some assumptions have been made. For example, any unrecognized character can end an expression as long as it is preceded by a space. Also, in this version, variables can be of any length, but only the first letter is significant. You can add more error checking and other details as your specific application dictates.

To better understand how GetToken( ) works, consider this expression:

 A + 100  (B * C) / 2

When tokenizing this expression, GetToken( ) returns the following tokens and types:

Token

Token Type

A

VARIABLE

+

DELIMITER

100

NUMBER

DELIMITER

(

DELIMITER

B

VARIABLE

*

DELIMITER

C

VARIABLE

)

DELIMITER

/

DELIMITER

2

NUMBER

Remember that token always holds a string, even if it contains just a single character.




C# 2.0(c) The Complete Reference
C# 2.0: The Complete Reference (Complete Reference Series)
ISBN: 0072262095
EAN: 2147483647
Year: 2006
Pages: 300

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net