5.5 DATA STRUCTURES FOR REPRESENTING PARSING TABLES


5.4 THE LR PARSER

The LR parser is a shift-reduce parser that makes use of a deterministic finite automata , recognizing the set of all viable prefixes by reading the stack from bottom to top. It determines what handle, if any, is available. A viable prefix of a right sentential form is that prefix that contains a handle, but no symbol to the right of the handle. Therefore, if a finite-state machine that recognizes viable prefixes of the right sentential forms is constructed , it can be used to guide the handle selection in the shift-reduce parser.

Since the LR parser makes use of a DFA that recognizes viable prefixes to guide the selection of handles, it must keep track of the states of the DFA. Hence, the LR parser stack contains two types of symbols: state symbols used to identify the states of the DFA and grammar symbols. The parser starts with the initial state of a DFA 10 on the stack. The parser operates by looking at the next input symbol a and the state symbol I i on the top of the stack. If there is a transition from the state I i on a in the DFA going to state I j , then it shifts the symbol a , followed by the state symbol I j , onto the stack. If there is no transition from I i on a in the DFA, and if the state I i on the top of the stack recognizes, when entered, a viable prefix that contains the handle A ± , then the parser carries out the reduction by popping ± and pushing A onto the stack. This is equivalent to making a backward transition from I i on ± in the DFA and then making a forward transition on A . Every shift action of the parser corresponds to a transition on a terminal symbol in the DFA. Therefore, the current state of the DFA and the next input symbol determine whether the parser shifts the next input symbol goes for reduction.

If we construct a table mapping every state and input symbol pair as either "shift," "reduce," "accept," or "error," we get a table that can be used to guide the parsing process. Such a table is called a parsing "action" table. When carrying out the reduction by A ± , the parser has to pop ± and push A onto the stack. This requires knowledge of where the transition goes in a DFA from the state brought onto the top of the stack after popping ± on the nonterminal A ; and hence, we require another table mapping of every state and nonterminal pair into a state. The table of transitions on the nonterminals in the DFA is called a "goto" table. Therefore, to create an LR parser we require an ActionGOTO table.

If the current state of a DFA has a transition on the terminal symbol a to the state I j , then the next move will be to shift the symbol a and enter the state I j . But if the current state of the DFA is one in which when entered recognizes that a viable prefix contains the handle, then the next move of the parser will be to reduce.

Therefore, an LR parser is comprised of an input buffer (which holds the input string w to be parsed and assumed to be terminated by the right endmarker $), a stack holding the viable prefixes of the right sentential forms, and a parsing table that is obtain by mapping the moves of a DFA that recognizes viable prefixes and controls the parsing actions. The configuration of a parser is given by a token pair: the first component is a stack's content, and second component is unexpended input. If, at a particular instant (and $ is used as bottom-of-the-stack marker, also), a parser is configured as follows :

where I i is a state symbol identifying the state of a DFA recognizing the viable prefixes, and X i is the grammar symbol. The parser consults the parsing action table entry, [ I m , a i ]. If action[ I m , a i ] = S j , then the parser shifts the next input symbol followed by the state I j on the stack and enters into the configuration:

If action[ I m , a i ] = reduce by production A ± , then the parser carries out the reduction as follows. If ± = r , then the parser pops two r symbols from the stack (because every shift action shifts a grammar symbol as well as state symbol), thereby bringing I m ˆ’ r on the top. It then consults the goto table entry, goto[ I m ˆ’ r , A ]. If goto[ I m ˆ’ r , A ] = I k , then it shifts A followed by I k onto the stack, thereby entering into the configuration:

If action[ I m , a i ] = accept, then the parser halts and accepts the input string. If action[ I m , a i ] = error, then the parser invokes a suitable error-recovery routine. Initially the parser will be in the configuration given by the pair ($ I , w $). Therefore, we conclude that parsing table construction involves constructing a DFA that recognizes the viable prefixes of the right sentential forms, using the given grammar, and then maps its the moves into the form of the ActionGOTO table. To construct such a DFA, we make use of the items that are part of a grammar's productions . Here, an item called the " LR (0)" of a production is a production with a dot placed at some position on the right side of the production. For example if A XYZ is a production, then the following items can be generated from it:

If the length of the right side of the production is n, then there are ( n +1) different positions on the right side of a production where a dot can be placed. Hence, the number of items that can be generated are ( n +1).

The dot's position on the right side tells us how much of the right-hand side of the production is seen in the process of parsing. For example, the item A X . YZ tells us that we have already seen a string derivable from X in the input and expect to see the string derivable from YZ next in the input.

5.4.1 Augmented Grammar

To construct a DFA that recognizes the viable prefixes, we make use of augmented grammar, which is defined as follows: if G = ( V, T, P, S ) is a given grammar, then the augmented grammar will be G 1 = ( V ˆ { S 1 }, T , P ˆ { S 1 S }, S 1 ); that is, we add a unit production S 1 S to the grammar G and make S 1 the new start symbol. The resulting grammar will be an augmented grammar. The purpose of augmenting the grammar is to make it explicitly clear to parser when to accept the string. Parsing will stop when the parser is on the verge of carrying out the reduction using S 1 S . A NFA that recognizes the viable prefixes will be a finite automata whose states correspond to the production items of the augmented grammar. Every item represents one state in the automata, with the initial state corresponding to an item S 1 S . The transitions in the automata are defined as follows:

( A ± . B ² , ˆˆ ) = B . ³ (This transition is required, because if the current state is A ± . B ² , that means we have not yet seen a string derivable from the nonterminal B ; and since B ³ is a production of the grammar, unless we see ³ , we will not get B . Therefore, we have to travel the path that recognizes ³ , which requires entering into the state identified by B . ³ without consuming any input symbols.)

This NFA can then be transformed into a DFA using the subset construction method. For example, consider the following grammar:

The augmented grammar is:

The items that can be generated using these productions are:

Therefore, the transition diagram of the NFA that recognizes viable prefixes is as shown in Figure 5.1.

click to expand
Figure 5.1: NFA transition diagram recognizes viable prefixes.

The DFA equivalent of the NFA shown in Figure 5.1 is, by using subset construction, illustrated in Figure 5.2.

click to expand
Figure 5.2: Using subset construction, a DFA equivalent is derived from the transition diagram in Figure 5.1.

Therefore, every state of the DFA that recognizes viable prefixes is a set of items; and hence, the set of DFA states will be a collection of sets of items ”but any arbitrary collection of set of items will not correspond to the DFA set of states. A set of items that corresponds to the states of a DFA that recognizes viable prefixes is called a "canonical collection". Therefore, construction of a DFA involves finding canonical collection. An algorithm exists that directly obtains the canonical collection of LR(0) sets of items, thereby allowing us to obtain the DFA. Using this algorithm, we can directly obtain a DFA that recognizes the viable prefixes, rather than going through NFA to DFA transformation, as explained above. The algorithm for finding out the canonical collection of LR(0) sets of items makes use of the closure and goto functions. The set closure( I ), where I is a set of items, is computed as follows:

  1. Add every item in I to closure( I )

  2. Repeat
    For every item of the form A ± . B ² in closure( I ) do

    For every production B ² do
    Add B ² . to closure( I )

    Until no new item can be added to closure( I )

For example, consider the following grammar:

That is, to find out goto from I on X , first identify all the items in I in which the dot precedes X on the right side. Then, move the dot in all the selected items one position to the right(i.e., over X ), and then take a closure of the set of these items.

5.4.2 An Algorithm for Finding the Canonical Collection of Sets of LR(0) Items

/* Let C be the canonical collection of sets of LR(0) items. We maintain C new and C old to continue the iterations*/
Input: augmented grammar
Output: canonical collection of sets of LR(0) items (i.e., set C )

  1. C old =

  2. add closure ({ S 1 . S }) to C

  3. while C old ‰  C new do

  4. C = C new

For example consider the following grammar:

The augmented grammar is:

Initially, C old = . First we obtain:

We call it I and add it to C new . Therefore:

In the first iteration, we obtain the goto from I on every grammar symbol, as shown below:

Add it to C new :

Add it to C new :

Add it to C new :

Add it to C new :

Therefore, at the end of first iteration:

In the second the iteration:

So, in the second iteration, we obtain goto from { I 1 , I 2 , I 3 , I 4 }on every grammar symbol, as shown below:

Add it to C new :

Add it to C new :

Therefore, at the end of the second iteration:

In the third iteration:

In the third iteration, we obtain goto from { I 5 , I 6 } on every grammar symbol, as shown below:

Add it to C new :

Add it to C new :

Therefore, at the end of the third iteration:

In the fourth iteration:

So, in the fourth iteration, we obtain a goto from { I 7 , I 8 } on every grammar symbol, as shown below:

At the end of fourth iteration:

The transition diagram of the DFA is shown in Figure 5.3.

click to expand
Figure 5.3: DFA transition diagram showing four iterations for a canonical collection of sets.

5.4.3 Construction of a Parsing ActionGOTO Table for an SLR(1) Parser

The methods for constructing the parsing ActionGOTO table are described below.

Construction of the Action Table

  1. For every state I 1 in C do
    for every terminal symbol a do
    if goto( I i , a ) = I j , then
    make action[ I i , a ] = S j /*for shift and enter into the state j */

  2. For every state I i in C whose underlying set of LR(0) items contains an item of the form A ± .do
    for every b in FOLLOW( A ) do
    make action[ I i , b ] = R k /*where k is the number of the production A ± standing for reduce by A ± */

  3. Make [ I i , $) = accept if I i contains an item S 1 S .

It is obvious that if a state I i has a transition on a terminal a going to I j , then the parser's next move will be to shift and enter into state j . Therefore, the shift entries in the action table are the mappings of the transitions in the DFA on terminals. Similarly, if state I i corresponds to the viable prefix that contains the right side of the production A ± , then the parser will call a reduction by A ± on all those symbols that are in the FOLLOW( A ). This is because if the next input symbol happens to be a terminal symbol that can FOLLOW( A ), then only the reduction by A ± may lead to a previous right-most derivation. That is, if the next input symbol belongs to FOLLOW( A ), then the position of ± can be considered to be the one where, if it is replaced by A , we might get a previous right-most derivation. Whether or not A ± is a handle is decided in this manner.

The initial state is the one whose underlying set of items' representations contain an item S 1 . S . This method is called "SLR(1)" ” ± Simple LR; and the (1) indicates a length of one lookahead (the next symbol used by the parser to decide its next move) used. Therefore, this parsing table is an SLR parsing table. (When the parentheses are not specified, the length of the lookahead is assumed to be one.)

Construction of the Goto Table

A goto table is simply a mapping of transitions in the DFA on nonterminals. Therefore, it is constructed as follows:

  • For every I i in C do

  • For every nonterminal A do

    if goto( I i , A ) = I i then

  • Make GOTO[ I i , A ) = j

Therefore, the SLR parsing table for the grammar having the following productions is shown in Table 5.4.

Table 5.4: ActionGOTO SLR Parsing Table

Action Table

GOTO Table

 

id

+

*

$

E

T

F

I

S 4

     

1

2

3

I 1

 

S 5

 

Accept

     

I 2

 

R 2

S 6

R 2

     

I 3

 

R 4

R 4

R 4

     

I 4

 

R 5

R 5

R 5

     

I 5

S 4

       

7

3

I 6

S 4

         

8

I 7

 

R 1

S 6

R 1

     

I 8

 

R 3

R 3

R 3

     

The productions are numbered as:

EXAMPLE 5.2
start example

Consider the following grammar:

The augmented grammar is:

end example
 

The canonical collection of sets of LR(0) items are computed as follows.

The transition diagram of the DFA is shown in Figure 5.4.

click to expand
Figure 5.4: Transition diagram for Example 5.2 DFA.

Therefore, the grammar has the following productions:

which are numbered as:

has an SLR parsing table as shown in Table 5.5.

Table 5.5: SLR Parsing Table

Action Table

GOTO Table

 

c

d

$

S

C

I

S 3

S 4

 

1

2

I 1

   

accept

   

I 2

S 3

S 4

   

5

I 3

S 3

S 4

   

6

I 4

R 3

R 3

R 3

   

I 5

   

R 1

   

I 6

R 2

R 2

R 2

   

By using the method discussed above, a parsing table can be obtained for any grammar. But the action table obtained from the method above will not necessarily be without multiple entries for every grammar. Therefore, we define a SLR(1) grammar as one in which we can obtain the action table without multiple entries by using the method described. If the action table contains multiple entries, then the grammar for which the table is obtained is not SLR(1) grammar.

For example, consider the following grammar:

The augmented grammar will be:

The canonical collection sets of LR(0) items are computed as follows.

The transition diagram for the DFA is shown in Figure 5.5.

click to expand
Figure 5.5: DFA Transition diagram.

Table 5.6 shows the SLR parsing table for the grammar having the following productions:

Table 5.6: Action GOTO SLR Parsing Table

Action Table

GOTO Table

 

a

b

$

S

A

B

I

R 3 / R 4

R 3 / R 4

 

1

2

3

I 1

   

Accept

     

I 2

S 4

         

I 3

 

S 5

       

I 4

R 3

R 3

   

6

 

I 5

R 4

R 4

     

7

I 6

 

S 8

       

I 7

S 9

         

I 8

   

R 1

     

I 9

   

R 2

     

The productions are numbered as follows:

Since the action table shown in Table 5.6 contains multiple entries, the above grammar is not SLR(1).

SLR(1) grammars constitute a small subset of context-free grammars, so an SLR parser can only succeed on a small number of context-free grammars. That means an SLR parser is a less-powerful LR parser. (The power of the parser is measured in terms of the number of grammars on which it can succeed.) This is because when an SLR parser sees a right-hand-side production rule A ± on the top of the stack, it replaces this rule by the left-hand-side nonterminal A if the next input symbol can FOLLOW the nonterminal A . But sometimes this reduction may not lead to the generation of previous right-most derivations . For example, the parser constructed above can do the reduction by the production A ˆˆ in the state I if the next input symbol happens to be either a or b , because both a and b are in the FOLLOW( A ). But since the reduction by A ˆˆ in I leads to the generation of a first instance of A in the sentential form AaAb , this reduction proves to be a proper one if the next input symbol is a . This is because the first instance of A in the sentential form AaAb is followed by a . But if the next input symbol is b , then this is not a proper reduction, because even though b follows A , b follows a second instance of A in the sentential form AaAb . Similarly, if the parser carries out the reduction by A ˆˆ in the state I 4 , then it should be done if the next input symbol is b , because reduction by A ˆˆ in I 4 leads to the generation of a second instance of A in the sentential form AaAb .

Therefore, we conclude that if:

  1. We let terminal a follow the first instance of A and let terminal b follow the second instance of A in the sentential form AaAb ;

  2. We associate a with the item A . in I and terminal b with item A . in I 4 ;

  3. The parser has been asked to carry out a reduction by A ˆˆ in I on those terminals associated with the item A . in I , and carry out the reduction A ˆˆ in I 4 on those terminals associated with the item A . in I 4 ;

then there would have been no conflict and the parser would have been more powerful. But this requires associating a list of terminals (lookaheads) with the items. You may recall (see Chapter 4) that lookaheads are symbols that the parser uses to ˜look ahead in the input buffer to decide whether or not reduction is to be done. That is, we have to work with items of the form A ± . X ² . The item a is called as an LR(1) item, because the length of the lookahead is one; therefore, an item without a lookahead is one with lookahead length of zero 0, an LR(0) item. In the SLR method, we were working with LR(0) items. Therefore, we define an LR( k ) item to be an item using lookaheads of length k . So, an LR(1) item is comprised of two parts : the LR(0) item and the lookahead associated with the item.

Note  

We conclude that if we work with LR(1) items instead of using LR(0) items, then every state of the parser will correspond to a set of LR(1) items. When the parser looks ahead in the input buffer to decide whether reduction is to be done or not, the information about the terminals will be available in the state of the parser itself, which is not the case with the SLR parser state. Hence, with LR(1), we get a more powerful parser.

Therefore, if we modify the closure and the goto functions to work suitably with the LR(1) items, by allowing them to compute the lookaheads, we can obtain the canonical collection of sets of LR(1). And from this we can obtain the parsing ActionGOTO table. For example, closure( I ), where I is a set of LR(1) items, is computed as follows:

  1. Add every item in I to closure( I ).

  2. Repeat

    • For every item of the form A ± . B ² , a in closure( I ) do

    • For every production B ³ do

    • Add B . ³ , FIRST( ² a ) to closure( I )

/* because the reduction by B ³ generates B preceding ² in the right side of A ± B ² ; and hence, the reduction by B ³ is proper only on those symbols that are in the FIRST( ² ). But if ² derives to an empty string, then a will also follow B, and the lookaheads of B ³ will be FIRST( ² a)
until no new item can be added to closure( I )
For example, consider the following grammar:

goto( I, X ) = closure({ A ± X . ² , a A ± . X ² , a is in I })

That is, to find out goto from I on X , first identify all the items in I with a dot preceding X in the LR(0) section of the item. Then, move the dot in all the selected items one position to the right (i.e., over X ), and then take this new set's closure. For example:

5.4.4 An Algorithm for Finding the Canonical Collection of Sets of LR(1) Items

/* Let C be the canonical collection of sets of LR(1) items. We maintain C new and C old to continue the iterations */
Input : augmented grammar
Output: canonical collection of sets of LR(1) items (i.e., set C )

  1. C old =

  2. add closure({ S 1 . S , $}) to C

  3. while C old ‰  C new do

     temp =  C   new     C   old   C   old  =  C   new  for every  I  in temp do for every  X  in  V     T  (i.e., for every grammar symbol  X  ) do if goto(  I  ,  X  ) is not empty and not in  C   new  , then add goto(  I  ,  X  ) to  C   new  } 
  4. C = C new

For example, consider the following grammar:

The augmented grammar will be:

The canonical collection of sets of LR(1) items are computed as follows:

The transition diagram of the DFA is shown in Figure 5.6.

click to expand
Figure 5.6: Transition diagram for the canonical collection of sets of LR(1) items.

5.4.5 Construction of the ActionGOTO Table for the LR(1) Parser

The following steps will construct the parsing action table for the LR(1) parser:

  1. for every state I i in C do

     for every terminal symbol  a  do if goto(  I  i   ,  a  ) =  I  j   then     make action[  I  i   ,  a  ] =  S  j   /*for shift and enter     into the state j*/ 
  2. for every state I i in C whose underlying set of LR(1) items contains an item of the form A ± ., a do

     make action[  I  i   ,  a  ] =  R  k   /*where  k  is the number of     the production  A      , standing for reduce by  A      */ 
  3. make [ I i , $] = accept if I i contains an item S 1 S ., $

The goto table is simply a mapping of transitions in the DFA on nonterminals. Therefore, it is constructed as follows:

 for every  I  i   in  C  do for every nonterminal  A  do   if goto (  I  i   ,  A  ) =  I  j   then make goto[  I  i   ,  A  ] =  j  

This method is called as CLR(1) or LR(1), and is more powerful than SLR(1). Therefore, the CLR or LR parsing table for the grammar having the following productions is:

Table 5.7: CLR/LR Parsing Action GOTO Table

Action Table

GOTO Table

 

a

b

$

S

A

B

I

R 3

R 4

 

1

2

3

I 1

   

Accept

     

I 2

S 4

         

I 3

 

S 5

       

I 4

R 3

R 3

   

6

 

I 5

R 4

R 4

     

7

I 6

 

S 8

       

I 7

S 9

         

I 8

   

R 1

     

I 9

   

R 2

     

The productions are numbered as follows:

By comparing the SLR(1) parser with the CLR(1) parser, we find that the CLR(1) parser is more powerful. But the CLR(1) has a greater number of states than the SLR(1) parser; hence, its storage requirement is also greater than the SLR(1) parser. Therefore, we can devise a parser that is an intermediate between the two; that is, the parser's power will be in between that of SLR(1) and CLR(1), and its storage requirement will be the same as SLR(1)'s. Such a parser, LALR(1), will be much more useful: since each of its states corresponds to the set of LR(1) items, the information about the lookaheads is available in the state itself, making it more powerful than the SLR parser. But a state of the LALR(1) parser is obtained by combining those states of the CLR parser that have identical LR(0) ( core ) items, but which differ in the lookaheads in their item set representations. Therefore, even if there is no reduce-reduce conflict in the states of the CLR parser that has been combined to form an LALR parser, a conflict may get generated in the state of LALR parser. We may be able to obtain a CLR parsing table without multiple entries for a grammar, but when we construct the LALR parsing table for the same grammar, it might have multiple entries.

5.4.6 Construction of the LALR Parsing Table

The steps in constructing an LALR parsing table are as follows:

  1. Obtain the canonical collection of sets of LR(1) items.

  2. If more than one set of LR(1) items exists in the canonical collection obtained that have identical cores or LR(0)s, but which have different in lookaheads, then combine these sets of LR(1) items to obtain a reduced collection, C 1 , of sets of LR(1) items.

  3. Construct the parsing table by using this reduced collection, as follows.

Construction of the Action Table

  1. for every state I i in C 1 do

     for every terminal symbol  a  do if goto(  I  i   ,  a  ) =  I  j   then     make action[  I  i   ,  a  ] =  S  j   /*for shift and enter     into the state j*/ 
  2. for every state I i in C 1 whose underlying set of LR(1) items contains an item of the form A ± ., a , do

     make action[  I  i   ,  a  ] =  R  k   /*where  k  is the number of the production  A      standing for reduce by  A      */ 
  3. make [ I i , $] = accept if I i contains an item S 1 S ., $

Construction of the Goto Table

The goto table simply maps transitions on nonterminals in the DFA. Therefore, the table is constructed as follows:

 for every  I  i   in  C   1  do for every nonterminal  A  do   if goto(  I  i   ,  A  ) =  I  j   then make goto(  I  i   ,  A  ) =  j  

For example, consider the following grammar:

The augmented grammar is:

The canonical collection of sets of LR(1) items are computed as follows:

We see that the states I 3 and I 6 have identical LR(0) set items that differ only in their lookaheads. The same goes for the pair of states I 4 , I 7 and the pair of states I 8 , I 9 . Hence, we can combine I 3 with I 6 , I 4 with I 7 , and I 8 with I 9 to obtain the reduced collection shown below:

where I 36 stands for combination of I 3 and I 6 , I 47 stands for the combination of I 4 and I 7 , and I 89 stands for the combination of I 8 and I 9 . The transition diagram of the DFA using the reduced collection is shown in Figure 5.7.

click to expand
Figure 5.7: Transition diagram for a DFA using a reduced collection.

Therefore, Table 5.8 shows the LALR parsing table for the grammar having the following productions:

which are numbered as:

Table 5.8: LALR Parsing Table for a DFA Using a Reduced Collection

Action Table

GOTO Table

 

a

b

$

S

A

I

S 36

S 47

 

1

2

I 1

   

Accept

   

I 2

S 36

S 47

   

5

I 36

S 36

S 47

   

89

I 47

R 3

R 3

R 3

   

I 5

   

R 1

   

I 89

R 2

R 2

R 2

   

5.4.7 Parser Conflicts

An LR parser may encounter two types of conflicts: shift-reduce conflicts and reduce-reduce conflicts.

Shift-Reduce Conflict

A shift-reduce ( S-R ) conflict occurs in an SLR parser state if the underlying set of LR(0) item representations contains items of the form depicted in Figure 5.8, and FOLLOW( B ) contains terminal a .


Figure 5.8: LR(0) underlying set representations that can cause SLR parser conflicts.

Similarly, an S-R conflict occurs in a state of the CLR or LALR parser if the underlying set of LR(1) items representation contains items of the form shown in Figure 5.9.


Figure 5.9: LR(1) underlying set representations that can cause CLR/LALR parser conflicts.

Reduce-Reduce Conflict

A reduce-reduce ( R-R ) conflict occurs if the underlying set of LR(0) items representation in a state of an SLR parser contains items of the form shown in Figure 5.10, and FOLLOW( A ) and FOLLOW( B ) are not disjoint sets.


Figure 5.10: LR(0) underlying set representations that can cause an SLR parser reduce-reduce conflict .

Similarly an R-R conflict occurs if the underlying set of LR(1) items representation in a state of a CLR or LALR parser contains items of the form shown in Figure 5.11.


Figure 5.11: LR(1) underlying set representations that can cause an CLR/LALR parser.

If a set of items' representation contains only nonfinal items, then there is no conflict in the corresponding state. (An item in which the dot is in the right-most position, like A XYZ ., is called as a final item; and an item in which the dot is not in the right-most position, like A X.YZ , is a nonfinal item).

Even if there is no R-R conflict in the states of a CLR parser, conflicts may be generated in the state of a LALR parser that is obtained by combining the states of the CLR parser. We combine the states of the CLR parser in order to form an LALR state. The items' lookaheads in the LALR parser state are obtained by combining the lookaheads of the corresponding items in the states of the CLR parser. And since reduction depends on the lookaheads, even if there is no R-R conflict in the states of the CLR parser, a conflict may become generated in the state of the LALR parser as a result of this state combination. For example, consider the sets of LR(1) items that represent the two different states of the CLR(1) parser, as shown in Figure 5.12.

click to expand
Figure 5.12: Sets of LR(1) items represent two different CLR(1) parser states.

There is no R-R conflict in these states. But when we combine these states to form an LALR, the state's set of items representation will be as shown in Figure 5.13.


Figure 5.13: States are combined to form an LALR.

We see that there is an R-R conflict in this state, because the parser will call a reduction by A ± as well as by B ³ on both a and b . If there is a S-R conflict in the CLR(1) states, it will never be reflected in the LALR(1) state obtained by combining the CLR(1) states. For example consider the sets of LR(1) items representing the two different states of the CLR(1) parser as shown in Figure 5.14.

click to expand
Figure 5.14: LR(1) items represent two different states of the CLR(1) parser.

There is no S-R conflict in these states. But when we combine these states, the resulting LALR state set will be as shown in Figure 5.15. There is no S-R conflict in this state, as well.


Figure 5.15: LALR state set resulting from the combination of CLR(1) state sets.

5.4.8 Handling Ambiguous Grammars

Since every ambiguous grammar fails to be LR, they will not belong to either the SLR, CLR, or LALR grammar classes. But some ambiguous grammars are quite useful for specifying languages. Hence, the question is how to deal with these grammars in the framework of LR parsing. For example, the natural grammar that specifies nonparenthesized expressions with + and * operators is:

But this is ambiguous grammar, because the precedence and associations of the operators has not been specified. Even so, we prefer this grammar, because we can easily change the precedence and associations as required, thereby allowing us more flexibility. Similarly, if we use unambiguous grammar instead of the above grammar to specify the same language, it will have the following productions:

This parser will spend a substantial portion its time in carrying out reductions by the unit productions E T and T F . These production are in the grammar to enforce associations and precedence, thereby making the parsing time excessive. With an ambiguous grammar, every LR parser construction method will have conflicts. But these conflicts can be resolved by using the precedence and association information of + and * as per the language's usage. For example, consider the SLR parser construction for the above grammar. The augmented grammar is:

The canonical collection of sets of LR(0) items is shown below:

The transition diagram of the DFA for the augmented grammar is shown in Figure 5.16. The SLR parsing table is shown in Table 5.9.

click to expand
Figure 5.16: Transition diagram for augmented grammar DFA.
Table 5.9: SLR Parsing Table for Augmented Grammar

Action Table

GOTO Table

 

+

*

id

$

E

I

   

S 2

 

1

I 1

S 3

S 4

 

accept

 

I 2

R 3

R 3

 

R 3

 

I 3

   

S 2

 

5

I 4

   

S 2

 

6

I 5

S 3 / R 1

S 4 / R 1

 

R 1

 

I 6

S 3 / R 2

S 4 / R 2

 

R 2

 

We see that there are shift-reduce conflicts in I 5 and I 6 on + as well as *. Therefore, for an input string id + id + id$, when the parser enters into the state I 5 , the parser will be in the configuration:

Hence, the parser can either reduce by E E + E or shift the + onto the stack and enter into the state I 3 . To resolve this conflict, we make use of associations. If we want left-associativity, then a reduction by E E + E is the right choice. Whereas if we want right-associativity, then shift is a right choice.

Similarly, if the input string is id + id * id$ when the parser enters into the state I 5 , it will be in the configuration:

Hence, the parser can either reduce by E E + E or shift the * onto the stack and enter into the state I 4 in order to resolve this conflict. We must make use of precedence if we want a higher precedence for + then the reduction by E E + E . If we want a higher precedence for *, then shift is a right choice.

Similarly if the input string is id * id + id$ when the parser enters into the state I 6 , it will be in the configuration:

Hence, the parser can either reduce by E E * E or shift the + onto the stack and enter into the state I 3 in order to resolve this conflict. We have to make use of precedence if we want a higher precedence for *; then reduction by E E * E is a right choice. Whereas if we want a higher precedence for +, then shift is right choice.

Similarly, if the input string is id * id * id$ when the parser enters into the state I 6, the parser will be in the configuration:

The parser can either reduce by E E * E or shift the * onto the stack and enter into the state I 4 . To resolve this conflict, we have to make use of associations. If we want left-associativity, then a reduction by E E * E is a right choice. If we want right-associativity, then shift is a right choice.

Therefore, for a higher precedence to *, and for left-associativity for both + and *, we get the SLR parsing table shown in Table 5.10.

Table 5.10: SLR Parsing Table Reflects Higher Precedence and Left-Associativity

Action Table

GOTO Table

 

+

*

id

$

E

I

   

S 2

 

1

I 1

S 3

S 4

 

Accept

 

I 2

R 3

R 3

 

R 3

 

I 3

   

S 2

 

5

I 4

   

S 2

 

6

I 5

R 1

S 4

 

R 1

 

I 6

R 2

R 2

 

R 2

 

Therefore, we have a way to deal with ambiguous grammars. We can make use of nonambiguous rules to resolve parsing action conflicts.




Algorithms for Compiler Design
Algorithms for Compiler Design (Electrical and Computer Engineering Series)
ISBN: 1584501006
EAN: 2147483647
Year: 2005
Pages: 108
Authors: O G Kakde

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net