14.2 A Logikus Grammar


 
Building Parsers with Java
By Steven  John  Metsker

Table of Contents
Chapter  14.   Parsing a Logic Language

    Content

The building blocks of Logikus programs are structures and variables . A typical structure has a string as its functor and has terms enclosed in parentheses. For now, let's sketch a grammar that will recognize a structure such as

 starred(jamesCagney, "Mister Roberts", Year) 

The following grammar gives a first draft for recognizing Logikus structures and variables:

 structure    = functor ('(' commaList(term) ')'  Empty);  functor      = LowercaseWord  QuotedString; term         = structure  variable; variable     = UppercaseWord; commaList(p) = p (',' p)*; 

Note that the functor of a structure cannot be an uppercase word. Logikus shares this feature with Prolog; users need not and cannot declare variables. A string that begins with an uppercase letter is a variable, and a string that begins with a lowercase letter is a functor.

A Logikus program is a series of axioms terminated by semicolons. For example, Logikus must recognize the following three axioms:

 customer("Jasmine Jones", 2093);  order(2093, "Brimful", 2); q (Name, Type) :- customer(Name, CustomerNum),     order(CustomerNum, Type, Pounds); 

Here is an initial draft of a grammar for Logikus axioms:

 axiom   = structure (ruleDef  Empty);  ruleDef = ":-" commaList(structure); 

The grammar so far defines a fairly comprehensive logic language, but it does not yet provide for all the features of Logikus. In particular, you need to broaden the concept of the composition of a rule. Rules in Logikus are series of conditions, and a normal structure is only one type of condition. Other conditions you wish to allow are lists, comparisons, negations, and evaluations. For example, Logikus must recognize this rule:

 travel(AltBand, Name) :-      for (AltBand, 1000, 20000),     city(Name, Alt),     >(Alt, AltBand - 1000),     <= (Alt, AltBand); 

Logikus allows any combination of arithmetic operators and parentheses in its expressions. To capture this in the grammar, you can extend the definition of axioms as follows :

 axiom      = structure (ruleDef  Empty);  structure  = functor ('(' commaList(term) ')'  Empty); functor    = LowercaseWord  QuotedString; term       = structure  variable; variable   = UppercaseWord; ruleDef    = ":-" commaList(condition); condition  = structure  evaluation  comparison; evaluation =      '#' '(' arg ',' arg ')'; comparison = operator '(' arg ',' arg ')'; arg        = expression  functor; operator   = '<'  '>'  '='  "<=" ">="  "!=" ; expression = phrase ('+' phrase  '-' phrase)*; phrase     = factor ('*' factor  '/' factor)*; factor     = '(' expression ')'  Num  variable; commaList(p) = p (',' p)*; 

The evaluation rule recognizes any string of the form #(arg, arg) , such as #(LowerPlus, Lower + 1000) . The expressions allowed by evaluations and comparisons draw on the grammar developed in Chapter 7, "Parsing Arithmetic." One difference is that Logikus evaluations and comparisons work with either strings or numbers . For example, >( tyrannosaurus , triceratops ) is a valid Logikus comparison.

Another valid condition is the not condition. You can extend the grammar with

 condition = structure  evaluation  comparison  not;  not       = "not" structure; 

Logikus also supports lists. A list is a pair of brackets that may or may not have contents. The nonempty contents of a list are a series of terms separated by commas. A list may follow these terms with a vertical bar and a tail, which must be either another list or a variable. The following grammar rules succinctly state these principles:

 list         = '[' (listContents  Empty) ']';  listContents = commaList(term) listTail; listTail     = ('' (variable  list))  Empty; 

These rules refer to term , which we said earlier was either a structure or variable.

You need to widen this to

 term = structure  Num  list  variable; 

You also need to allow anonymous variables, and that means

 variable = UppercaseWord  '_'; 

Finally, you need to allow a single dot as a valid functor for a structure. This specifically allows the manual construction of lists, such as

 test(.(A, .(B, .(C, [])))) 

To allow the dot functor, you extend the functor rule

 functor = '.'  LowercaseWord  QuotedString; 

Here is the complete grammar for a Logikus axiom:

 axiom        = structure (ruleDef  Empty);  structure    = functor ('(' commaList(term) ')'  Empty); functor      = '.'  LowercaseWord  QuotedString; term         = structure  Num  list  variable; variable     = UppercaseWord  '_'; ruleDef      = ":-" commaList(condition); condition    = structure  not  evaluation  comparison                 list; not          = "not" structure; evaluation   =      '#' '(' arg ',' arg ')'; comparison   = operator '(' arg ',' arg ')'; arg          = expression  functor; operator     = '<'  '>'  '='  "<=" ">="  "!="; expression   = phrase ('+' phrase  '-' phrase)*; phrase       = factor ('*' factor  '/' factor)*; factor       = '(' expression ')'  Num  variable; list         = '[' (listContents  Empty) ']'; listContents = commaList(term) listTail; listTail     = ('' (variable  list))  Empty; commaList(p) = p (',' p)*; 

14.2.1 Comments in Logikus

The preceding grammar recognizes Logikus axioms, but it does not allow for comments. In fact, you want to define the Logikus language to allow comments both within and between axioms. You can keep the grammar simple if you allow for an understanding that you will, in practice, rely on a tokenizer to screen out comments. The implementation of Logikus in sjm.examples.logic uses a default Tokenizer object from sjm.parse.tokens . This class by default allows // comments, which comment out characters to the end of a line, and /**/ comments, which comment out their contents. See Section 9.7, "Tokenizer States," for an explanation of how Tokenizer handles comments.

14.2.2 Logikus Programs

The grammar developed in Section 14.2 recognizes Logikus axioms but not entire Logikus programs. You could extend the grammar to observe that a program is a semicolon-separated series of axioms, but it is more efficient to use a simple text-processing class to separate the input into axioms and then apply a parser to each axiom. The class TokenStringSource in sjm.parse.tokens parses a Logikus program string into statements. This simplifies the work of constructing the parser, and it reduces the amount of work the parser must perform.

Knowing that you need to build a parser for just Logikus axioms, you can generate the parser almost directly from the grammar. The parser, however, needs assemblers to build Structure objects, Variable objects, and the other components that can be composed into a Program object.


   
Top


Building Parsers with Java
Building Parsers With Javaв„ў
ISBN: 0201719622
EAN: 2147483647
Year: 2000
Pages: 169

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net