8.3 A Regular Expression Grammar

Building Parsers with Java
By Steven  John  Metsker

Table of Contents
Chapter  8.   Parsing Regular Expressions


A modestly ambitious regular expression language allows the user to express sequences, alternations , and repetitions of characters . The following grammar is based on rules given in Introduction to Automata Theory, Languages and Computation [Hopcroft and Ullman]. The rules recognize conventional operator precedence, and they avoid the problem of left recursion.

 expression    = term ('' term)*;  term          = factor factor*; factor        = phrase  (phrase '*'); phrase        = letterOrDigit  '(' expression ')'; letterOrDigit = Letter  Digit; 

Note that there is a major difference between the meaning of a vertical bar by itself and a vertical bar in single quotes. There is also a major difference between an asterisk and an asterisk in single quotes. A character in single quotes means that the quoted character must appear in text that the grammar is matching. Without these quotes, a vertical bar means alternation , and an asterisk means repetition.

The function of the term rule would be more apparent if you required an ampersand, say, to mean sequence. Then, instead of typing the expression "jaJaJA" the user would have to type "j&aJ&aJ&A" . The grammar rule would appear as:

 term = factor ('&' factor)*; 

The given grammar follows the convention of using juxtaposition to indicate sequence. A sequence of letters such as "ja" implies a sequence to match, and no operator is necessary.


Building Parsers with Java
Building Parsers With Javaв„ў
ISBN: 0201719622
EAN: 2147483647
Year: 2000
Pages: 169

Similar book on Amazon

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net