11.5 USING DAG FOR CODE GENERATION


11.4 STRAIGHTFORWARD CODE GENERATION

Given a sequence of three-address statements partitioned into basic blocks, straightforward code generation involves generating code for each three-address statement in turn by taking the advantage of any of the operands of the three-address statements that are in the register, and leaving the computed result in the register as long as possible. We store it only if the register is needed for another computation or just before a procedure call, jump, or labeled statement, such as at the end of a basic block. The reason for this is that after leaving a basic block, we may go to several different blocks, or we may go to one particular block that can be reached from several others. In either case, we cannot assume that a datum used by a block appears in the same register, no matter how the program's control reached that block. Hence, to avoid possible error, our code-generation strategy stores everything across the basic block boundaries.

When generating code by using the above strategy, we need to keep track of what is currently in each register. For this, we maintain what is called a "register descriptor," which is simply a pointer to a list that contains information about what is currently in each of the registers. Initially, all of the registers are empty.

We also need to keep track of the locations for each name ”where the current value of the name can be found at run time. For this, we maintain what is called an "address descriptor" for each name in the block. This information can be stored in the symbol table.

We also need a location to perform the computation specified by each of the three-address statements. For this, we make use of the function getreg(). When called, getreg() returns a location specifying the computation performed by a three-address statement. For example, if x = y op z is performed, getreg() returns a location L where the computation y op z should be performed; and if possible, it returns a register.

Algorithm for the Function Getreg()

What follows is an algorithm for storing and returning the register locations for three-address statements by using the function getreg().

 { For every three-address statement of the form  x  =  y op z  in the basic block do { 
  1. Call getreg() to obtain the location L in which the computation y op z should be performed. /* This requires passing the three-address statement x = y op z as a parameter to getreg(), which can be done by passing the index of this statement in the quadruple array.

  2. Obtain the current location of the operand y by consulting its address descriptor, and if the value of y is currently both in the memory location as well as in the register, then prefer the register. If the value of y is currently not available in L , then generate an instruction MOV y, L (where y as assumed to represent the current location of y ).

  3. Generate the instruction OP z, L, and update the address descriptor of x to indicate that x is now available in L , and if L is in a register, then update its descriptor to indicate that it will contain the run-time value of x .

  4. If the current values of y and /or z are in the register, we have no further uses for them, and they are not live at the end of the block, then alter the register descriptor to indicate that after the execution of the statement x = y op z , those registers will no longer contain y and /or z .

 } Store all the results. } 

The function getreg(), when called upon to return a location where the computation specified by the three-address statement x = y op z should be performed, returns a location L as follows:

  1. First, it searches for a register already containing the name y . If such a register exists, and if y has no further use after the execution of x = y op z , and if it is not live at the end of the block and holds the value of no other name, then return the register for L .

  2. Otherwise, getreg() searches for an empty register; and if an empty register is available, then it returns it for L .

  3. If no empty register exists, and if x has further use in the block, or op is an operator such as indexing that requires a register, then getreg() it finds a suitable, occupied register. The register is emptied by storing its value in the proper memory location M , the address descriptor is updated, the register is returned for L . (The least-recently used strategy can be used to find a suitable, occupied register to be emptied.)

  4. If x is not used in the block or no suitable, occupied register can be found, getreg() selects a memory location of x and returns it for L .

EXAMPLE 11.1
start example

Consider the expression:

The three-address code for this is:

Applying the algorithm above results in Table 11.4.

end example
 
Table 11.4: Computation for the Expression x = ( a + b ) ˆ’ (( c + d ) ˆ’ e )

Statement

L

Instructions Generated

Register Descriptor

Address Descriptor

     

All registers empty

 

t 1 = a + b

R

MOV a , R 0 ADD b, R

R 0 will hold t 1

t 1 is in R

t 2 = c + d

R 1

MOV c, R 1 ADD d, R 1

R 1 will hold t 2

t 2 is in R 1

t 3 = t 2 ˆ’ e

R 1

SUB e, R 1

R 1 will hold t 3

t 3 is in R 1

x = t 1 ˆ’ t 3

R

SUB R 1, R

R 0 will hold x

x is in R

   

MOV R 0, x

 

x is in R 0 and memory

The algorithm makes use of the next-use information of each name in order to make more-informed decisions regarding register allocation. Therefore, it is required to compute the next -use information. If:

  • A statement at the index i in a block assigns a value to name x ,

  • And if a statement at the index j in the same block uses x as an operand,

  • And if the path from the statement at index i to the statement at index j is a path without any intervening assignment to name x , then

we say that the value of x computed by the statement at index i is used in the statement at index j . Hence, the next use of the name x in the statement i is statement j . For each three-address statement i , we need to compute information about those three-address statements in the block that are the next uses of the names coming in statement i . This requires the backward scanning of the basic block, which will allow us to attach to every statement i under consideration the information about those statements that are the next uses of each name in the statement i . The algorithm is as follows:

For each statement i of the form x = y op z do

 {  attach information about the next uses of  x, y  , and  z  to statement  i  set the information for  x  to no next-use /* This information  can be kept into the symbol table */  set the information for  y  and  z  to be the next use  in statement  i  } 

Consider the basic block:

When straightforward code generation is done using the above algorithm, and if only two registers, R 0 and R 1, are available, then the generated code is as shown in Table 11.5.

Table 11.5: Generated Code with Only Two Available Registers, R0 and R1

Statement

L

Instructions Generated

Cost

Register Descriptor

Address Descriptor

       

R 0 and R 1 empty

 

t 1 = a + b

R

MOV a, R
ADD b, R

2 words
2 words

R 0 will hold

t 1 is in t 1 R

t 2 = c + d

R 1

MOV c, R 1
ADD d, R 1

2 words
2 words

R 1 will hold t 2

t 2 is in R 1

t 3 = e ˆ’ t 2

 

MOV R 0, t 1 (generated memory by getreg())

2 words

 

t 1 is in

         

t 3 is in R

 

R

MOV e, R 0SUB R 1, R

2 words
1 word

R 0 will hold t 3
R 1 will be empty because t 2 has no next use

 

x = t 1 ˆ’ t 3

R 1

MOV t 1, R 1 SUB R 0, R 1

2 words
1 word

R 1 will hold x
R 0 will be empty because t 3 has no next use

x is in R 1

   

MOV R 1, x

2 words

 

x is in R 1 and memory

We see that the total length of the instruction sequence generated is 18 memory words. If we rearrange the final computations as:

and then generate the code, we get Table 11.6.

Table 11.6: Generated Code with Rearranged Computations

Statement

L

Instructions Generated

Cost

Register Descriptor

Address Descriptor

       

R 0 and R 1 empty

 

t 2 = c + d

R0

MOV c, R 0 ADD d, R

2 words
2 words

R 0 will hold t 2

t 2 is in R

t 3 = e ˆ’ t 2

R 1

MOV e, R 1SUB R 0, R 1

2 words
1 word

R 1 will hold t 3 R 0 will be empty because t 2 has no next use

t 3 is in R 1

t 1 = a + b

R

MOV a, R 0 ADD b, R

2 words
2 words

R 0 will hold t 1

t 1 is in R

x = t 1 ˆ’ t 3

R 1

SUB R 1, R

1 word

R 0 will hold x R 1 will be empty because t 3 has no next use

x is in R

   

MOV R 0, x

2 words

 

x is in R 0 and memory

Here, the length of the instruction sequence generated is 14 memory words. This indicates that the order of the computation is a deciding factor in the cost of the code generated. In the above example, the cost is reduced when the order t 2- t 3- t 1- t 4 is used, because t 1 gets computed immediately before the statement that computes t 4, which uses t 1 as its left operand. Hence, no intermediate store-and-load is required, as is the case when the order t 1- t 2- t 3- t 4 is used. Good code generation requires rearranging the final computation order, and this can be done conveniently with a DAG representation of a basic block rather than with a linear sequence of three-address statements.




Algorithms for Compiler Design
Algorithms for Compiler Design (Electrical and Computer Engineering Series)
ISBN: 1584501006
EAN: 2147483647
Year: 2005
Pages: 108
Authors: O G Kakde

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net