2.3. Assembly Language Example

We can now create a simple program to see how the different architectures produce assembly language for the same C code. For this experiment, we use the gcc compiler that came with Red Hat 9 and the gcc cross compiler for PowerPC. We present the C program and then, for comparison, the x86 code and the PPC code.

It might startle you to see how much assembly code is generated with just a few lines of C. Because we are just compiling from C to assembler, we are not linking in any environment code, such as the C runtime libraries or local stack creation/destruction, so the size is much smaller than an actual ELF executable.

Note that with assembler, you are closest to seeing exactly what the processor is fetching from cycle to cycle. Another way to look at it is that you have complete control of your code and the system. It is important to mention that even though instructions are fetched from memory in order, they might not always be executed in the order in which they were read. Some architectures order load and store operations separately.

Here is the example C code:

-----------------------------------------------------------------------
count.c
1 int main()
2 {
3    int i,j=0;
4
5    for(i=0;i<8;i++)
6       j=j+i;
7
8    return 0;
9 }
-----------------------------------------------------------------------

Line 1: This is the function definition main.
Line 3: This line declares the local variables i and j and initializes j to 0 (i is set to 0 by the for loop on line 5).
Line 5: The for loop: While i takes values from 0 to 7, set j equal to j plus i.
Line 8: The return marks the jump back to the calling program.

2.3.1. x86 Assembly Example

Here is the code generated for x86 by entering gcc -S count.c on the command line. Upon entering the code, the base of the stack is pointed to by ss:ebp. The code is produced in "AT&T" format, in which registers are prefixed with a % and constants are prefixed with a $.
The assembly instruction samples previously provided in this section should have prepared you for this simple program, but one variant of indirect addressing should be discussed before we go further. When referencing a location in memory (for example, the stack), the assembler uses a specific syntax for indexed addressing. By putting a base register in parentheses and an index (or offset) just outside the parentheses, the effective address is found by adding the index to the value in the register. For example, if %ebp was assigned the value 20, the effective address of -8(%ebp) would be (-8) + (20) = 12:

-----------------------------------------------------------------------
count.s
 1         .file "count.c"
 2         .version "01.01"
 3 gcc2_compiled.:
 4 .text
 5         .align 4
 6 .globl main
 7         .type main,@function
 8 main:
           #create a local memory area of 8 bytes for i and j.
 9         pushl %ebp
10         movl %esp, %ebp
11         subl $8, %esp
           #initialize i (ebp-4) and j (ebp-8) to zero.
12         movl $0, -8(%ebp)
13         movl $0, -4(%ebp)
14         .p2align 2
15 .L3:
           #This is the for-loop test
16         cmpl $7, -4(%ebp)
17         jle .L6
18         jmp .L4
19         .p2align 2
20 .L6:
           #This is the body of the for-loop
21         movl -4(%ebp), %eax
22         leal -8(%ebp), %edx
23         addl %eax, (%edx)
24         leal -4(%ebp), %eax
25         incl (%eax)
26         jmp .L3
27         .p2align 2
28 .L4:
           #Setup to exit the function
29         movl $0, %eax
30         leave
31         ret
-----------------------------------------------------------------------

Line 9: Push the stack base pointer onto the stack.
Line 10: Move the stack pointer into the base pointer.
Line 11: Reserve 8 bytes of stack memory below ebp.
Line 12: Move 0 into address ebp-8 (j).
Line 13: Move 0 into address ebp-4 (i).
Line 14: This is an assembler directive that aligns the next instruction on a 4-byte (2^2) boundary.
Line 15: This is an assembler-created label called .L3.
Line 16: This instruction compares the value of i to 7.
Line 17: Jump to label .L6 if -4(%ebp) is less than or equal to 7.
Line 18: Otherwise, jump to label .L4.
Line 19: Align as described in the Line 14 code commentary.
Line 20: Label .L6.
Line 21: Move i into eax.
Line 22: Load the address of j into edx.
Line 23: Add i (held in eax) to the value at the address in edx (j).
Line 24: Load the address of i into eax.
Line 25: Increment i.
Line 26: Jump back to the for-loop test.
Line 27: Align as described in the Line 14 code commentary.
Line 28: Label .L4.
Line 29: Set the return code in eax.
Line 30: Release the local memory area and restore the caller's base pointer.
Line 31: Pop the return address off the stack and jump back to the caller.

2.3.2. PowerPC Assembly Example

The following is the resulting PPC assembly code for the C program. If you are familiar with assembly language (and acronyms), the function of many PPC instructions is clear. There are, however, several derivative forms of the basic instructions that we must discuss here:
The following code was generated by entering gcc -S count.c on the command line:

-----------------------------------------------------------------------
countppc.s
 1         .file "count.c"
 2         .section ".text"
 3         .align 2
 4         .globl main
 5         .type main,@function
 6 main:
           #Create 32 byte memory area from stack space and initialize i and j.
 7         stwu 1,-32(1)   #Store stack ptr (r1) 32 bytes into the stack
 8         stw 31,28(1)    #Store word r31 into lower end of memory area
 9         mr 31,1         #Move contents of r1 into r31
10         li 0,0          #Load 0 into r0
11         stw 0,12(31)    #Store word r0 into effective address 12(r31), var j
12         li 0,0          #Load 0 into r0
13         stw 0,8(31)     #Store word r0 into effective address 8(r31), var i
14 .L2:
           #For-loop test
15         lwz 0,8(31)     #Load i into r0
16         cmpwi 0,0,7     #Compare word immediate r0 with integer value 7
17         ble 0,.L5       #Branch if less than or equal to label .L5
18         b .L3           #Branch unconditional to label .L3
19 .L5:
           #The body of the for-loop
20         lwz 9,12(31)    #Load j into r9
21         lwz 0,8(31)     #Load i into r0
22         add 0,9,0       #Add r0 to r9 and put result in r0
23         stw 0,12(31)    #Store r0 into j
24         lwz 9,8(31)     #Load i into r9
25         addi 0,9,1      #Add 1 to r9 and store in r0
26         stw 0,8(31)     #Store r0 into i
27         b .L2
28 .L3:
29         li 0,0          #Load 0 into r0
30         mr 3,0          #Move r0 to r3
31         lwz 11,0(1)     #Load the saved stack pointer at 0(r1) into r11
32         lwz 31,-4(11)   #Restore r31
33         mr 1,11         #Restore r1
34         blr             #Branch to Link Register contents
-----------------------------------------------------------------------

Line 7: Store stack ptr (r1) 32 bytes into the stack.
Line 8: Store word r31 into the lower end of the memory area.
Line 9: Move the contents of r1 into r31.
Line 10: Load 0 into r0.
Line 11: Store word r0 into effective address 12(r31), var j.
Line 12: Load 0 into r0.
Line 13: Store word r0 into effective address 8(r31), var i.
Line 14: Label .L2.
Line 15: Load i into r0.
Line 16: Compare word immediate r0 with integer value 7.
Line 17: Branch to label .L5 if r0 is less than or equal to 7.
Line 18: Branch unconditional to label .L3.
Line 19: Label .L5.
Line 20: Load j into r9.
Line 21: Load i into r0.
Line 22: Add r0 to r9 and put the result in r0.
Line 23: Store r0 into j.
Line 24: Load i into r9.
Line 25: Add 1 to r9 and store in r0.
Line 26: Store r0 into i.
Line 27: This is an unconditional branch to label .L2.
Line 28: Label .L3.
Line 29: Load 0 into r0.
Line 30: Move r0 to r3.
Line 31: Load the saved stack pointer, stored at 0(r1), into r11.
Line 32: Restore r31.
Line 33: Restore r1.
Line 34: This is an unconditional branch to the address held in the Link Register.

The two assembler files have nearly the same number of lines. On closer inspection, you can see that the RISC (PPC) processor characteristically uses many load and store instructions, while the CISC (x86) processor tends to use the mov instruction more often.