Section 2.3. Assembly Language Example | The Linux Kernel Primer. A Top-Down Approach for x86 and PowerPC Architectures

2.3. Assembly Language Example

We can now create a simple program to see how the different architectures produce assembly language for the same C code. For this experiment, we use the gcc compiler that came with Red Hat 9 and the gcc cross compiler for PowerPC. We present the C program and then, for comparison, the x86 code and the PPC code.

It might startle you to see how much assembly code is generated with just a few lines of C. Because we are just compiling from C to assembler, we are not linking in any environment code, such as the C runtime libraries or local stack creation/destruction, so the size is much smaller than an actual ELF executable.

Note that with assembler, you are closest to seeing exactly what the processor is fetching from cycle to cycle. Another way to look at it is that you have complete control of your code and the system. It is important to mention that even though instructions are fetched from memory in order, they might not always be executed in exactly the same order read in. Some architectures order load and store operations separately.

Here is the example C code:

-----------------------------------------------------------------------
count.c
1 int main()
2 {
3   int i,j=0;
4
5   for(i=0;i<8;i++)
6     j=j+i;
7
8   return 0;
9 }
-----------------------------------------------------------------------

Line 1

This is the function definition main.

Line 3

This line declares the local variables i and j and initializes j to 0. (The variable i is set to 0 by the for loop on line 5.)

Line 5

The for loop: While i takes values from 0 to 7, set j equal to j plus i.

Line 8

The return marks the jump back to the calling program.

2.3.1. x86 Assembly Example

Here is the code generated for x86 by entering gcc -S count.c on the command line. On entry to the code, the base of the stack is pointed to by ss:ebp. The code is produced in "AT&T" format, in which registers are prefixed with a % and constants are prefixed with a $. The assembly instruction samples previously provided in this section should have prepared you for this simple program, but one variant of indirect addressing should be discussed before we go further.

When referencing a location in memory (for example, the stack), the assembler uses a specific syntax for indexed addressing. By putting a base register in parentheses and an index (or offset) just outside the parentheses, the effective address is found by adding the index to the value in the register. For example, if %ebp was assigned the value 20, the effective address of -8(%ebp) would be (-8) + (20) = 12:

-----------------------------------------------------------------------
count.s
1   .file  "count.c"
2   .version  "01.01"
3  gcc2_compiled.:
4   .text
5   .align 4
6   .globl main
7   .type  main,@function
8  main:
    #create a local memory area of 8 bytes for i and j.
9   pushl  %ebp
10  movl  %esp, %ebp
11  subl  $8, %esp
    #initialize i (ebp-4) and j (ebp-8) to zero.
12  movl  $0, -8(%ebp)
13  movl  $0, -4(%ebp)
14  .p2align 2
15 .L3:
    #This is the for-loop test
16  cmpl  $7, -4(%ebp)
17  jle  .L6
18  jmp  .L4
19  .p2align 2
20 .L6:
    #This is the body of the for-loop
21  movl  -4(%ebp), %eax
22  leal  -8(%ebp), %edx
23  addl  %eax, (%edx)
24  leal  -4(%ebp), %eax
25  incl  (%eax)
26  jmp  .L3
27  .p2align 2
28 .L4:
    #Setup to exit the function
29  movl  $0, %eax
30  leave
31  ret
-----------------------------------------------------------------------
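Operands such as -8(%ebp) and -4(%ebp) in the listing above use the base-plus-displacement form just described. The address arithmetic can be sketched in C as follows. This is only an illustration: the function name effective_address is hypothetical, and a real processor performs this addition in hardware as part of address generation.

```c
#include <stdint.h>

/* Hypothetical helper (not part of any real API): models the AT&T
 * operand form disp(%reg) by adding a signed displacement to the
 * value held in the base register. */
uint32_t effective_address(uint32_t base, int32_t disp)
{
    /* Unsigned arithmetic wraps modulo 2^32, matching 32-bit
     * address calculation. */
    return base + (uint32_t)disp;
}
```

With a base value of 20, effective_address(20, -8) yields 12, which matches the worked example for -8(%ebp).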

Line 9

Push stack base pointer onto the stack.

Line 10

Move the stack pointer into the base pointer.

Line 11

Reserve 8 bytes of stack memory just below ebp by subtracting 8 from esp.

Line 12

Move 0 into address ebp-8 (j).

Line 13

Move 0 into address ebp-4 (i).

Line 14

This is an assembler directive that indicates the next instruction should be aligned on a 4-byte boundary (the operand is a power of 2, so 2 means 2^2 = 4 bytes).

Line 15

This is an assembler-created label called .L3.

Line 16

This instruction compares the value of i to 7.

Line 17

Jump to label .L6 if -4(%ebp) is less than or equal to 7.

Line 18

Otherwise, jump to label .L4.

Line 19

Align.

Line 20

Label .L6.

Line 21

Move i into eax.

Line 22

Load the address of j into edx.

Line 23

Add i (in eax) to the value stored at the address pointed to by edx (j).

Line 24

Load the address of i into eax.

Line 25

Increment i through the address in eax.

Line 26

Jump back to the for loop test.

Line 27

Align as described in Line 14 code commentary.

Line 28

Label .L4.

Line 29

Set the return code in eax.

Line 30

Release the local memory area.

Line 31

Pop the return address off the stack and jump back to the caller.
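To make the control flow of the x86 listing easier to follow, here is a rough C rendition that uses goto and labels named after .L3, .L6, and .L4. It is a sketch for reading purposes, not compiler output. The function name count_like_asm is hypothetical, and it returns j rather than 0 so that the result can be inspected. Note that the compiler turned the test i<8 into a comparison against 7 followed by a "less than or equal" jump.

```c
/* Hypothetical function mirroring the structure of count.s.
 * Each statement is annotated with the instruction it stands for. */
int count_like_asm(void)
{
    int j = 0;          /* movl $0, -8(%ebp) */
    int i = 0;          /* movl $0, -4(%ebp) */

L3:                     /* the for-loop test */
    if (i <= 7)         /* cmpl $7, -4(%ebp) ; jle .L6 */
        goto L6;
    goto L4;            /* jmp .L4 */

L6:                     /* the body of the for-loop */
    j += i;             /* movl, leal, addl %eax, (%edx) */
    i++;                /* leal, incl (%eax) */
    goto L3;            /* jmp .L3 */

L4:                     /* exit path */
    return j;           /* the original sets eax to 0 instead */
}
```

Calling count_like_asm() returns 28, the sum of the integers 0 through 7.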

2.3.2. PowerPC Assembly Example

The following is the resulting PPC assembly code for the C program. If you are familiar with assembly language (and acronyms), the function of many PPC instructions is clear. There are, however, several derivative forms of the basic instructions that we must discuss here:

stwu RS, D(RA) (Store Word with Update). This instruction takes the value in (GPR) register RS and stores it into the effective address formed by RA+D. The (GPR) register RA is then updated with this new effective address.
li RT, SI (Load Immediate). This is an extended mnemonic for a fixed-point add immediate instruction. It is equivalent to addi RT, 0, SI. In general, addi RT, RA, SI stores in RT the sum of (GPR) RA and SI, the 16-bit 2s complement integer. If the RA field is 0, the value SI alone is stored in RT. Note that the value being only 16 bit has to do with the fact that the opcode, registers, and value must all be encoded into a 32-bit instruction.
lwz RT, D(RA) (Load Word and Zero). This instruction forms an effective address as in stwu and loads a word of data from memory into (GPR) RT. The "and Zero" indicates that the upper 32 bits of RT are set to 0 on a 64-bit implementation, where the registers are 64 bits wide. (See the PowerPC Architecture Book I for more on implementations.)
blr (Branch to Link Register). This instruction is an unconditional branch to the 32-bit address in the link register. When calling a function, the caller puts the return address into the link register. Similar to the x86 ret instruction, blr is the common method of returning from a function.
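The register and memory effects of stwu, li, and lwz described above can be modeled with a small C sketch. Everything here is an assumption made for illustration: the struct ppc_state type, the 256-byte memory, and the function names are hypothetical, words are stored in host byte order, and there is no bounds checking. It models 32-bit registers only.

```c
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 256

/* Hypothetical machine state: 32 GPRs and a tiny byte-addressed memory. */
struct ppc_state {
    uint32_t gpr[32];
    uint8_t  mem[MEM_SIZE];
};

static void store_word(struct ppc_state *s, uint32_t ea, uint32_t v)
{
    memcpy(&s->mem[ea], &v, sizeof v);
}

static uint32_t load_word(const struct ppc_state *s, uint32_t ea)
{
    uint32_t v;
    memcpy(&v, &s->mem[ea], sizeof v);
    return v;
}

/* stwu RS, D(RA): store RS at RA+D, then update RA with that address. */
void stwu(struct ppc_state *s, int rs, int16_t d, int ra)
{
    uint32_t ea = s->gpr[ra] + (uint32_t)(int32_t)d;
    store_word(s, ea, s->gpr[rs]);   /* uses the old RS value, even if RS == RA */
    s->gpr[ra] = ea;
}

/* addi RT, RA, SI: RT = (RA field == 0 ? 0 : contents of RA) + SI. */
void addi(struct ppc_state *s, int rt, int ra, int16_t si)
{
    uint32_t base = (ra == 0) ? 0 : s->gpr[ra];
    s->gpr[rt] = base + (uint32_t)(int32_t)si;
}

/* li RT, SI: extended mnemonic for addi RT, 0, SI. */
void li(struct ppc_state *s, int rt, int16_t si)
{
    addi(s, rt, 0, si);
}

/* lwz RT, D(RA): load the word at RA+D into RT. */
void lwz(struct ppc_state *s, int rt, int16_t d, int ra)
{
    uint32_t base = (ra == 0) ? 0 : s->gpr[ra];
    s->gpr[rt] = load_word(s, base + (uint32_t)(int32_t)d);
}
```

For example, with r1 holding 128, stwu(&s, 1, -32, 1) writes 128 to address 96 and leaves r1 equal to 96. A later lwz(&s, 11, 0, 1) then reads the saved value 128 back into r11.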

The following code was generated by entering gcc -S count.c on the command line:

-----------------------------------------------------------------------
countppc.s
1   .file  "count.c"
2   .section  ".text"
3   .align 2
4   .globl main
5   .type  main,@function
6  main:
    #Create 32 byte memory area from stack space and initialize i and j.
7   stwu 1,-32(1)  #Store stack ptr (r1) at r1-32, then update r1
8   stw 31,28(1)   #Store word r31 into lower end of memory area
9   mr 31,1        #Move contents of r1 into r31
10  li 0,0         #Load 0 into r0
11  stw 0,12(31)   #Store word r0 into effective address 12(r31), var j
12  li 0,0         #Load 0 into r0
13  stw 0,8(31)    #Store word r0 into effective address 8(r31), var i
14 .L2:
    #For-loop test
15  lwz 0,8(31)    #Load i into r0
16  cmpwi 0,0,7    #Compare word immediate r0 with integer value 7
17  ble 0,.L5      #Branch if less than or equal to label .L5
18  b .L3          #Branch unconditional to label .L3
19 .L5:
    #The body of the for-loop
20  lwz 9,12(31)   #Load j into r9
21  lwz 0,8(31)    #Load i into r0
22  add 0,9,0      #Add r0 to r9 and put result in r0
23  stw 0,12(31)   #Store r0 into j
24  lwz 9,8(31)    #load i into r9
25  addi 0,9,1     #Add 1 to r9 and store in r0
26  stw 0,8(31)    #Store r0 into i
27  b .L2
28 .L3:
29  li 0,0         #Load 0 into r0
30  mr 3,0         #move r0 to r3
31  lwz 11,0(1)    #load saved stack ptr at 0(r1) into r11
32  lwz 31,-4(11)  #Restore r31
33  mr 1,11        #Restore r1
34  blr            #Branch to Link Register contents
-----------------------------------------------------------------------

Line 7

Store the stack pointer (r1) at the address 32 bytes below it, and then update r1 to point to that address.

Line 8

Store word r31 into the lower end of the memory area.

Line 9

Move the contents of r1 into r31.

Line 10

Load 0 into r0.

Line 11

Store word r0 into effective address 12(r31), var j.

Line 12

Load 0 into r0.

Line 13

Store word r0 into effective address 8(r31), var i.

Line 14

Label .L2:.

Line 15

Load i into r0.

Line 16

Compare word immediate r0 with integer value 7.

Line 17

Branch to label .L5 if r0 is less than or equal to 7.

Line 18

Branch unconditional to label .L3.

Line 19

Label .L5:.

Line 20

Load j into r9.

Line 21

Load i into r0.

Line 22

Add r0 to r9 and put the result in r0.

Line 23

Store r0 into j.

Line 24

Load i into r9.

Line 25

Add 1 to r9 and store in r0.

Line 26

Store r0 into i.

Line 27

This is an unconditional branch to label .L2.

Line 28

Label .L3:.

Line 29

Load 0 into r0.

Line 30

Move r0 to r3.

Line 31

Load the word at 0(r1), the stack pointer saved on line 7, into r11.

Line 32

Restore r31.

Line 33

Restore r1.

Line 34

This is an unconditional branch to the location indicated by Link Register contents.

The two assembler files have nearly the same number of lines. On closer inspection, you can see that the RISC (PPC) processor characteristically uses many load and store instructions, while the CISC (x86) processor tends to use the mov instruction more often.
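The load and store pattern of the PPC listing can be imitated in C by routing every operation through explicit "register" variables. The sketch below is an illustration only: the function name count_load_store, the frame array standing in for the stack slots 8(r31) and 12(r31), and the locals r0 and r9 are all hypothetical, and the function returns j so that the result can be checked.

```c
#include <stdint.h>

/* Hypothetical model of the PPC loop: memory is touched only by
 * explicit loads and stores, and arithmetic happens on "registers". */
int32_t count_load_store(void)
{
    int32_t frame[4] = {0};     /* stands in for the stack frame */
    enum { I = 2, J = 3 };      /* word indexes for offsets 8 and 12 */
    int32_t r0, r9;

    frame[J] = 0;               /* li 0,0 ; stw 0,12(31) */
    frame[I] = 0;               /* li 0,0 ; stw 0,8(31)  */

    for (;;) {
        r0 = frame[I];          /* lwz 0,8(31)  */
        if (!(r0 <= 7))         /* cmpwi 0,0,7 ; ble 0,.L5 */
            break;              /* b .L3 */
        r9 = frame[J];          /* lwz 9,12(31) */
        r0 = frame[I];          /* lwz 0,8(31)  */
        r0 = r9 + r0;           /* add 0,9,0    */
        frame[J] = r0;          /* stw 0,12(31) */
        r9 = frame[I];          /* lwz 9,8(31)  */
        r0 = r9 + 1;            /* addi 0,9,1   */
        frame[I] = r0;          /* stw 0,8(31)  */
    }
    return frame[J];
}
```

By contrast, the x86 listing performs the addition and the increment directly on memory operands (addl %eax, (%edx) and incl (%eax)), which is why it needs fewer separate load and store steps.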