6.6 REPRESENTING THREE-ADDRESS STATEMENTS


6.5 INTERMEDIATE CODE GENERATION

While translating a source program into a functionally equivalent object code representation, a parser may first generate an intermediate representation. This makes retargeting of the code possible and allows some optimizations to be carried out that would otherwise not be possible. The following are commonly used intermediate representations:

  1. Postfix notation

  2. Syntax tree

  3. Three-address code

Postfix Notation

In postfix notation, the operator follows the operand. For example, in the expression ( a ˆ’ b ) * ( c + d ) + ( a ˆ’ b ), the postfix representation is:

Syntax Tree

The syntax tree is nothing more than a condensed form of the parse tree. The operator and keyword nodes of the parse tree (Figure 6.5) are moved to their parent, and a chain of single productions is replaced by single link (Figure 6.6).

click to expand
Figure 6.5: Parse tree for the string id+id*id.

click to expand
Figure 6.6: Syntax tree for id+id*id.

Three-Address Code

Three address code is a sequence of statements of the form x = y op z . Since a statement involves no more than three references, it is called a "three-address statement," and a sequence of such statements is referred to as three-address code. For example, the three-address code for the expression a + b * c + d is:

Sometimes a statement might contain less than three references; but it is still called a three-address statement. The following are the three-address statements used to represent various programming language constructs:

  • Used for representing arithmetic expressions:

  • Used for representing Boolean expressions:

  • Used for representing array references and dereferencing operations:

  • Used for representing a procedure call:



6.6 REPRESENTING THREE-ADDRESS STATEMENTS

Records with fields for the operators and operands can be used to represent three-address statements. It is possible to use a record structure with four fields: the first holds the operator, the next two hold the operand1 and operand2, respectively, and the last one holds the result. This representation of a three-address statement is called a "quadruple representation".

Quadruple Representation

Using quadruple representation, the three-address statement x = y op z is represented by placing op in the operator field, y in the operand1 field, z in the operand2 field, and x in the result field. The statement x = op y , where op is a unary operator, is represented by placing op in the operator field, y in the operand1 field, and x in the result field; the operand2 field is not used. A statement like param t 1 is represented by placing param in the operator field and t 1 in the operand1 field; neither operand2 nor the result field are used. Unconditional and conditional jump statements are represented by placing the target labels in the result field. For example, a quadruple representation of the three-address code for the statement x = ( a + b ) * - c/d is shown in Table 6.1. The numbers in parentheses represent the pointers to the triple structure.

Table 6.1: Quadruple Representation of x = ( a + b ) * ˆ’ c/d
 

Operator

Operand1

Operand2

Result

(1)

+

a

b

t 1

(2)

ˆ’

c

 

t 2

(3)

*

t 1

t 2

t 3

(4)

/

t 3

d

t 4

(5)

=

t 4

 

x

Triple Representation

The contents of the operand1, operand2, and result fields are therefore normally the pointers to the symbol records for the names represented by these fields. Hence, it becomes necessary to enter temporary names into the symbol table as they are created. This can be avoided by using the position of the statement to refer to a temporary value. If this is done, then a record structure with three fields is enough to represent the three-address statements: the first holds the operator value, and the next two holding values for the operand1 and operand2, respectively. Such a representation is called a "triple representation". The contents of the operand1 and operand2 fields are either pointers to the symbol table records, or they are pointers to records (for temporary names ) within the triple representation itself. For example, a triple representation of the three-address code for the statement x = ( a + b )* ˆ’ c/d is shown in Table 6.2.

Table 6.2: Triple Representation of x = ( a + b ) * ˆ’ c/d
 

Operator

Operand1

Operand2

(1)

+

a

b

(2)

ˆ’

c

 

(3)

*

(1)

(2)

(4)

/

(3)

d

(5)

=

x

(4)

Indirect Triple Representation

Another representation uses an additional array to list the pointers to the triples in the desired order. This is called an indirect triple representation. For example, a triple representation of the three-address code for the statement x = ( a + b )* ˆ’ c/d is shown in Table 6.3.

Table 6.3: Indirect Triple Representation of x = ( a + b ) * ˆ’ c/d
   

Operator

Operand1

Operand2

(1)

(1)

+

a

b

(2)

(2)

ˆ’

c

 

(3)

(3)

*

(1)

(2)

(4)

(4)

/

(3)

d

(5)

(5)

=

x

(4)

Comparison

By using quadruples, we can move a statement that computes A without requiring any changes in the statements using A , because the result field is explicit. However, in a triple representation, if we want to move a statement that defines a temporary value, then we must change all of the pointers in the operand1 and operand2 fields of the records in which this temporary value is used. Thus, quadruple representation is easier to work with when using an optimizing compiler, which entails a lot of code movement. Indirect triple representation presents no such problems, because a separate list of pointers to the triple structure is maintained . When statements are moved, this list is reordered, and no change in the triple structure is necessary; hence, the utility of indirect triples is almost the same as that of quadruples.