6.8 IMPLEMENTATION OF INCREMENT AND DECREMENT OPERATORS | Algorithms for Compiler Design (Electrical and Computer Engineering Series)

6.7 SYNTAX-DIRECTED TRANSLATION SCHEMES TO SPECIFY THE TRANSLATION OF VARIOUS PROGRAMMING LANGUAGE CONSTRUCTS

Specifying the translation of the construct involves specifying the construct's syntactic structure, using CFG, and associating suitable semantic actions with the productions of the CFG. For example, if we want to specify the translation of the arithmetic expressions into postfix notation so they can be carried along with the parsing, and if the parsing method is LR , then first we write a grammar that specifies the syntactic structure of the arithmetic expressions. We then associate suitable semantic actions with the productions of the grammar. The expressions used for these associations are covered below.

6.7.1 Arithmetic Expressions

The grammar that specifies the syntactic structure of the expressions in a typical programming language will have the following productions:

Translating arithmetic expressions involves generating code to evaluate the given expression. Hence, for an expression a + b * c , the three-address code that is required to be generated is:

where t 1 and t 2 are pointers to the symbol table records that contain compiler-generated temporaries, and a , b , and c are pointers to the symbol table records that contain the programmer-defined names a , b , and c , respectively. Syntax-directed translation schemes to specify the translation of an expression into postfix notation are as follows :

where code is a string value attribute used to hold the postfix expression, and place is pointer value attribute used to link the pointer to the symbol record that contains the name of the identifier. The label getname returns the name of the identifier from the symbol table record that is pointed to by ptr, and concate( s ₁ , s ₂ , s ₃ ) returns the concatenation of the strings s ₁ , s ₂ , and s ₃ , respectively. For the string a + b * c , the values of the attributes at the parse tree node are shown in Figure 6.7.

Figure 6.7: Values of attributes at the parse tree node for the string a + b * c .

id.place = addr(symtab rec of a )

Syntax-directed translation schemes to specify the translation of an expression into the syntax tree are as follows:

where ptr is pointer value attribute used to link the pointer to a node in the syntax tree, and place is pointer value attribute used to link the pointer to the symbol record that contains the name of the identifier. The mkleaf generates leaf nodes, and mknode generates intermediate nodes.

For the string a + b * c , the values of the attributes at the parse tree nodes are shown in Figure 6.8.

Figure 6.8: Values of the attributes at the parse tree nodes for a + b * c , id.place = addr(symtab rec of a).

id.place = addr(sumtab rec of a )

Syntax-directed translation schemes specify the translation of an expression into three-address code, as follows:

where ptr is a pointer value attribute used to link the pointer to the symbol record that contains the name of the identifier, mkleaf generates leafnodes, and mknode generates intermediate nodes. For the string a + b * c , the values of the attributes at the parse tree nodes are shown in Figure 6.9.

Figure 6.9: Values of the attributes at the parse tree nodes for a + b * c, id.place = addr(sumtab rec of a).

6.7.2 Boolean Expressions

One way of translating a Boolean expression is to encode the expression's true and false values as the integers one and zero, respectively. The code to evaluate the value of the expression in some temporary is generated as shown below:

  E     E  ₁ relop  E  ₂ {  t  1 = gentemp();              gencode(if  E  ₁ .place relop.val  E  ₂ .place              goto(nextquad + 3));              gencode(  t  1 = 0);              gencode(goto(nextquad+2))              gencode(  t  1 = 1)}  E  .place =  t  1;           }

where nextquad keeps track of the index in the code array. The next statement will be inserted by the gencode procedure, and will update the value of nextquad. The following translation scheme:

translates the expression a < b to the following three-address code:

Similarly, a Boolean expression formed by using logical operators involves generating code to evaluate those operators in some temporary form, as shown below:

  E     E  1 lop  E  2    {  t  1 = gentemp();      gencode (  t  1 =  E  1.place lop.val  E  2.place);  E  .place =  t  1;    }  E    not  E  1    {  t  1 = gentemp();      gencode (  t  1 = not  E  1.place)  E  .place =  t  1    } lop   and { lop.val = and} lop   or { lop.val = or}

The translation scheme above translates the expressions a < b and c > d to the following three-address code:

Another way to translate a Boolean expression is to represent its value by a position in the three-address code sequence. For example, if we point to the statement labeled L 1, then the value of the expression is true (1); whereas if we point to the statement labeled L 2, then the value of the expression is false (0). In this case, the use of a temporary to hold either a one or zero, depending upon the true of false value of the expression, becomes redundant. This also makes it possible to decide the value of the expression without evaluating it completely. This is called a "short circuit" or "jumping the code". To discover the true/false value of the expression a < b or c > d , it is not necessary to completely evaluate the expression; if a < b is true, then the entire expression will be true. Similarly to discover the true/false value of the expression a < b and c > d , it is not necessary to completely evaluate the expression, because if a < b is false, then the entire expression will be false.

Tip

Therefore a Boolean expression can be translated into two to three address statements, a conditional jump, and an unconditional jump. But the targets of these jumps are known at the time of translating a Boolean expression; hence, these jumps are generated without their targets, which are filled in later on.

Therefore, we must remember the indices of these jumps in the code array by using suitable attributes of E . For this, we use two pointer value attributes: E .true and E .false. The attribute E .true will hold the pointer to the list that contains the index of the conditional jump in the code array, whereas the attribute E .false will hold the pointer to the list that contains the index of the unconditional jump. The translation scheme for the Boolean expression that uses relational operators is as follows:

  E   E  ₁ relop  E  ₂ {  E  .true = mklist(nextquad);  E  .false = mklist(nextquad + l);      gencode (if  E  ₁ .place relop.val  E  ₂ .place goto);      gencode (goto_);    }

where mklist(ind) is a procedure that creates a list containing ind and returns a pointer to the created list.

The above translation scheme translates the expression a < b to the following three address code:

6.7.3 Short-Circuit Code for Logical Expressions

There are several methods to adequately handle the various elements of Boolean operators. These are covered by type below.

AND

Logical expressions that use the ˜and operator are expressions defined by the production E ’ E 1 and E 2. Generating the short-circuit code for these logical expressions involves setting the true value of the first expression, E 1, to the start of the second expression, E 2, in the code array. We make the true value of E the same as the true value of expression E 2; and we make the false value of E the same as the false values of both E 1 and E 2. This requires remembering where E 2 starts in the code array index, which means we must provision the memory of the nextquad value just before E 2 is processed . This can accomplished by introducing a nullable nonterminal M before E 2 in the above production, providing for a reduction by M ’ ˆˆ just before the processing of E 2. Hence, we can get a semantic action associated with this production and executed at this point. We therefore have a method for remembering the value of nextquad just before the E 2 code is generated.

  E     E  ₁ and  M E  ₂ {        backpatch(  E  1.true,  M  .quad);  E  .true =  E  2.true;  E  .false = merge(E1.false, E2.false);                        }  M      {  M  .quad = nextquad; }

where backpatch(ptr, L ) is a procedure that takes a pointer ptr to a list containing indices of the code array and fills the target of the statements at these indices in the code array by L .

OR

For an expression using the ˜or operator-that is, an expression defined by the production E ’ E 1 or E 2 ”generating the short-circuit code involves setting the false value of the first expression, E 1, to the start of E 2 in the code array, and making the false value of E the same as the false value of E 2. The true value of E is assigned the same true value as both E 1 and E 2. This requires remembering where E 2 starts in the code array index, which requires making a provision for remembering the value of nextquad just before the expression E 2 is processed. This can achieved by introducing a nullable nonterminal M before E 2 in the above production, providing for a reduction by M ’ ˆˆ just before the processing of E 2. Hence, we obtain a semantic action that is associated with this production and executed at this point; therefore, we have provisioned the recall of the value of nextquad just before the E 2 code is generated.

  E     E  1 or  M E  2              {    backpatch(  E  1.false,  M  .quad);  E  .false =  E  2.false;  E  .true = merge(  E  1.true,  E  2.true);                             }  M      {  M  .quad = nextquad; }

NOT

For the logical expression using the ˜not operator, that is, one defined by the production E ’ not E 1, generating the short-circuit code involves making the false value of the expression E the same as the true value of E 1. And the true value of E is assigned the false value of E 1.

  E    not  E  1   {  E  .true =  E  1.false  E  .false =  E  1.true              }

The above translation scheme translates the expression a < b and c > d to the following three-address code:

For example, consider the following Boolean expression:

When the above translation scheme is used to translate this construct, the three-address code generated for it is as shown below, and the translation scheme is shown in Figure 6.10.

Figure 6.10: Translation scheme for a Boolean expression containing and, not, and or.

IF-THEN-ELSE

Since an if-then-else statement is composed of three components ”a boolean expression E , a statement S 1 that is to be executed when E is true, and a statement S 2 that is to be executed when E is false ”the translation of if-then-else involves making a provision for transferring control to the start of S 1 if E is true, for transferring control to the start of S 2 if E is false, and for transferring control to the next statement after the execution of S 1 and S 2 is over. This requires remembering where S 1 starts in the index of the code array as well as remembering where S 2 starts in the index of the code array.

This is achieved by introducing a nullable nonterminal M 1 before the S 1 and a nullable nonterminal M 2 before the S 2 in the above production, providing for the reduction by M 1 ’ ˆˆ just before processing S 1. Hence, we get a semantic action associated with this production and executed at this point, which enables the recall of the value of nextquad just before generating S 1 code. Similarly, it provides for the reduction by M 2 ’ ˆˆ just before processing S 2, and we get a semantic action associated with production executed at this point, enabling the recall of the value of nextquad just before generating S 2 code.

In addition, an unconditional jump is required at the end of S 1 in order to transfer control to the statement that follows the if-then-else statement. To generate this unconditional jump, we add a nullable nonterminal N after S 1 to the production and associate a semantic action with the production N ’ ˆˆ , which takes care of generating this unconditional jump, as shown in Figure 6.11.

  S    if  E  then  M  1  S  1  N  else  M  2  S  2 {                           backpatch (  E  .true,  M  1.quad)                           backpatch (  E  .false,  M  2.quad)  S  .next:                               = merge (  S  1.next,  S  2.next,  N  .next)                           }  M  1     {  M  1.quad = nextquad;}  M  2     {  M  2.quad = nextquad}  N      {  N  .next = mklist (nextquad);                     gencode (goto...);        }

Figure 6.11: The addition of the nullable nonterminal N facilitates an unconditional jump.

Hence, for the statement if a < b then x = y + z else p = q + r , the three-address code that is required to be generated is:

IF-THEN

Since an if-then statement is comprised of two components, a Boolean expression E and an S 1 statement that will be executed when E is true, the translation of if-then involves making a provision for transferring control to the start of S 1 code if E is either true or false, and a provision is made for transferring control to the next statement after the execution of S 1 is over. This requires recalling the index of the start of S 1 in the code array, and can be achieved by introducing a nullable nonterminal M before S 1 in the production. This will provide for a reduction by M ’ ˆˆ , just before the processing of S 1. Hence, we can get a semantic action associated with this production and executed at this point, which makes a provisioning the recall of for remembering the value of nextquad just before generating code of S 1 code is generated, as shown in Figure 6.12 below:

  S    if  E  then  M S  1     {                                  backpatch (  E  .true,  M  .quad);  S  .next = merge(  E  .false,  S  1.next)                        }  M      {  M  .quad = nextquad; }

Figure 6.12: A nullable nonterminal M provisions the translation of if-then.

Hence, for the statement if a < b then x = y + z , the three-address code that is required to be generated is:

WHILE

Since a while statement has two components, a Boolean expression E and a statement S 1, which is the statement to be executed repeatedly as long as E is true, the translation of while involves provisioning the transfer of control to the start of S 1 code if E is true. The expression must be tested again after S 1 execution is over, control must be transferred to the next statement if E is false. This requires remembering the index in the code array where S 1 code starts as well as where the E code starts. This can be achieved by introducing a nullable nonterminal M 2 before S 1 in the production. This will provide for the reduction by M 2 ’ ˆˆ just before the processing of S 1. Hence, a semantic action is associated with this production and is executed at this point, enabling the recall of the value of nextquad just before generating S code. Similarly, introducing a nullable nonterminal M 1 before E will provide for the reduction by M 1 ’ ˆˆ just before the processing of E . Hence, a semantic action is now associated with this production and is executed at this point, provisioning the recall of the value of nextquad just before E code is generated, as shown in Figure 6.13.

  S    while  M  1  E  do  M  2  S  1  {                        backpatch (  E  .true,  M  2.quad)                        backpatch (  S  1.next,  M  1.quad)  S  .next =  E  .false                        gencode (goto(  M  1.quad))                     }  M  1   {  M  1.quad = nextquad; }  M  2   {  M  2.quad = nextquad; }

Figure 6.13: The translation of the Boolean while statement is facilitated by a nullable nonterminal M.

Hence, for the statement while a < b do x = y + z , the three-address code that is required to be generated is:

DO-WHILE

Since a do-while statement is comprised of two components, a Boolean expression E and an S 1 statement that is executed repeatedly as long as E is true (as well as the test for whether E is true or false at the end of S 1 execution), translation involves provisioning the transfer of control to test the expression after the execution of S 1 is over. Control must also be transferred to the start of S 1 code if E is true, and conversely to the next statement if E is false.

This requires recalling the S 1 start index in the code array as well as the E start index. We introduce a nullable nonterminal M 1 before S 1 in the production, providing for the reduction by M1 ’ ˆˆ just before the processing of S 1. Hence, a semantic action is now associated with this production and is executed at this point, provisioning the recall of the value of nextquad just before S 1 code generates. Similarly, introducing a nullable nonterminal M 2 before E will provide for the reduction by M 2 ’ ˆˆ just before the processing of E . We then have a semantic action associated with this production and executed at this point, and which provisions the recall of the value of nextquad just before E code generates, as shown in Figure 6.14.

  S    do  M  1  S  1 while  M  2  E  {                                 backpatch (  E  .true,  M  1.quad)                                 backpatch (  S  1.next,  M  2.quad)  S  .next =  F  .false                         }  M  1     {  M  1.quad = nextquad; }  M  2     {  M  2.quad = nextquad; }

Figure 6.14: Translation of the Boolean do-while.

Hence, for a statement do x = y + z while a < b , the three-address code that is required to be generated is:

REPEAT-UNTIL

Since a repeat-until statement has two components, a Boolean expression E and an S 1 statement that is executed repeatedly until E becomes true (as well as the test of whether E is true or false at the end of S 1), the translation of repeat-until involves provisioning transfer of control to a test of the expression after the execution of S 1 is over. We must also engineer a transfer a control to the start code of S 1 if E is false and to the next statement if E is true.

This requires recalling the index in the code array where S 1 code starts as well as the index in the code array where E code starts. We achieve this by introducing a nullable nonterminal M 1 before S 1 in the production. This will provide for the reduction by M ₁ ’ ˆˆ , just before the processing of S 1. Hence, we can get a semantic action that is associated with this production and is executed at this point. This makes a provision for remembering the value of nextquad just before S code generates, and introduces a nullable non-terminal M 2 before E . This will provide for the reduction by M ₂ ’ ˆˆ , just before the processing of E . Now we can get a semantic action associated with this production and executed at this point, and which provisions the recall of the value of nextquad just before E code generates, as shown in Figure 6.15.

  S    repeat  M  1  S  1           until  M  2  E  {                           backpatch (  E  .false,  M  1.quad)                           backpatch (  S  1.next,  M  2.quad)  S  .next =  E  .true                         }  M  1     {  M  1.quad = nextquad; }  M  2     {  M  2.quad = nextquad; }

Figure 6.15: Translation of Boolean repeat-until.

Hence, for the Boolean statement repeat x = y + z until a < b , the three-address code that is required to be generated is:

FOR

A for statement is composed of four components: an expression E 1, which is used to initialize the iteration variable; an expression E 2, which is a Boolean expression used to test whether or not the value of the iteration variable exceeds the final value; an expression E 3, which is used to specify the step by which the value of the iteration variable is to be incremented or decremented; and an S 1 statement, which is the statement to be executed as long as the value of the iteration variable is less than or equal to the final value. Hence, the translation of a for statement involves provisioning the transfer a control to the start of S 1 code if E 2 is true, transferring control to the start of E 3 code after the execution of S 1 is over, transferring control to the start of E 2 code after E 3 code is executed, and transferring control to the next statement if E 2 is false, as shown in Figure 6.16.

  S    for (  E  1;  M  1  E  2;  M  2  E  3)  M  3  S  1                  {                                  backpatch (  E  2.true,  M  3.quad)                                  backpatch (  M  3.next,  M  1.quad)                                  backpatch (  S  1.next,  M  2.quad)                                  gencode (goto(  M  2.quad))  S  .next =  E  2.false                  }  M  1     {  M  1.quad = nextquad; }  M  2     {  M  2.quad = nextquad; }  M  3     {  M  3.next: = mklist (nextquad)                     gencode (goto...)  M  3.quad = nextquad;                   }

Figure 6.16: Handling the translation of the Boolean for.

Hence, for a statement for( i = 1; i < = 20; i + +) x = y + z , the three-address code that is required to be generated is: