Assembly-Language Functions using the C Calling Convention

You cannot write assembly-language functions without understanding how the computer's stack works. Each computer program that runs uses a region of memory called the stack to enable functions to work properly. Think of a stack as a pile of papers on your desk which can be added to indefinitely. You generally keep the things that you are working on toward the top, and you take things off as you are finished working with them.

Your computer has a stack, too. The computer's stack lives at the very top addresses of memory. You can push values onto the top of the stack through an instruction called pushl, which pushes either a register or memory value onto the top of the stack. Well, we say it's the top, but the "top" of the stack is actually the bottom of the stack's memory. Although this is confusing, the reason for it is that when we think of a stack of anything - dishes, papers, etc. - we think of adding and removing to the top of it. However, in memory the stack starts at the top of memory and grows downward due to architectural considerations. Therefore, when we refer to the "top of the stack" remember it's at the bottom of the stack's memory. You can also pop values off the top using an instruction called popl. This removes the top value from the stack and places it into a register or memory location of your choosing..

When we push a value onto the stack, the top of the stack moves to accomodate the additional value. We can actually continually push values onto the stack and it will keep growing further and further down in memory until we hit our code or data. So how do we know where the current "top" of the stack is? The stack register, %esp, always contains a pointer to the current top of the stack, wherever it is.

Every time we push something onto the stack with pushl, %esp gets subtracted by 4 so that it points to the new top of the stack (remember, each word is four bytes long, and the stack grows downward). If we want to remove something from the stack, we simply use the popl instruction, which adds 4 to %esp and puts the previous top value in whatever register you specified. pushl and popl each take one operand - the register to push onto the stack for pushl, or receive the data that is popped off the stack for popl.

If we simply want to access the value on the top of the stack without removing it, we can simply use the %esp register in indirect addressing mode. For example, the following code moves whatever is at the top of the stack into %eax:

 movl (%esp), %eax

If we were to just do this:

 movl %esp, %eax

then %eax would just hold the pointer to the top of the stack rather than the value at the top. Putting %esp in parenthesis causes the computer to go to indirect addressing mode, and therefore we get the value pointed to by %esp. If we want to access the value right below the top of the stack, we can simply issue this instruction:

 movl 4 (%esp), %eax

This instruction uses the base pointer addressing mode (see the Section called Data Accessing Methods in Chapter 2) which simply adds 4 to %esp before looking up the value being pointed to.

In the C language calling convention, the stack is the key element for implementing a function's local variables, parameters, and return address.

Before executing a function, a program pushes all of the parameters for the function onto the stack in the reverse order that they are documented. Then the program issues a call instruction indicating which function it wishes to start. The call instruction does two things. First it pushes the address of the next instruction, which is the return address, onto the stack. Then it modifies the instruction pointer (%eip) to point to the start of the function. So, at the time the function starts, the stack looks like this (the "top" of the stack is at the bottom on this example):

 Parameter #N ... Parameter 2 Parameter 1 Return Address <--- (%esp)

Each of the parameters of the function have been pushed onto the stack, and finally the return address is there. Now the function itself has some work to do.

The first thing it does is save the current base pointer register, %ebp, by doing pushl %ebp. The base pointer is a special register used for accessing function parameters and local variables. Next, it copies the stack pointer to %ebp by doing movl %esp, %ebp. This allows you to be able to access the function parameters as fixed indexes from the base pointer. You may think that you can use the stack pointer for this. However, during your program you may do other things with the stack such as pushing arguments to other functions.

Copying the stack pointer into the base pointer at the beginning of a function allows you to always know where your parameters are (and as we will see, local variables too), even while you may be pushing things on and off the stack. %ebp will always be where the stack pointer was at the beginning of the function, so it is more or less a constant reference to the stack frame (the stack frame consists of all of the stack variables used within a function, including parameters, local variables, and the return address).

At this point, the stack looks like this:

 Parameter #N   <--- N*4+4(%ebp) ... Parameter 2    <--- 12(%ebp) Parameter 1    <--- 8(%ebp) Return Address <--- 4(%ebp) Old %ebp       <--- (%esp) and (%ebp)

As you can see, each parameter can be accessed using base pointer addressing mode using the %ebp register.

Next, the function reserves space on the stack for any local variables it needs. This is done by simply moving the stack pointer out of the way. Let's say that we are going to need two words of memory to run a function. We can simply move the stack pointer down two words to reserve the space. This is done like this:

 subl $8, %esp

This subtracts 8 from %esp (remember, a word is four bytes long).^[4] This way, we can use the stack for variable storage without worring about clobbering them with pushes that we may make for function calls. Also, since it is allocated on the stack frame for this function call, the variable will only be alive during this function. When we return, the stack frame will go away, and so will these variables. That's why they are called local - they only exist while this function is being called.

Now we have two words for local storage. Our stack now looks like this:

 Parameter #N     <--- N*4+4(%ebp) ... Parameter 2      <--- 12(%ebp) Parameter 1      <--- 8(%ebp) Return Address   <--- 4(%ebp) Old %ebp         <--- (%ebp) Local Variable 1 <--- -4(%ebp) Local Variable 2 <--- -8(%ebp) and (%esp)

So we can now access all of the data we need for this function by using base pointer addressing using different offsets from %ebp. %ebp was made specifically for this purpose, which is why it is called the base pointer. You can use other registers in base pointer addressing mode, but the x86 architecture makes using the %ebp register a lot faster.

Global variables and static variables are accessed just like the memory we have been accessing memory in previous chapters. The only difference between the global and static variables is that static variables are only used by one function, while global variables are used by many functions. Assembly language treats them exactly the same, although most other languages distinguish them.

When a function is done executing, it does three things:

It stores its return value in %eax.
It resets the stack to what it was when it was called (it gets rid of the current stack frame and puts the stack frame of the calling code back into effect).
It returns control back to wherever it was called from. This is done using the ret instruction, which pops whatever value is at the top of the stack, and sets the instruction pointer, %eip, to that value.

So, before a function returns control to the code that called it, it must restore the previous stack frame. Note also that without doing this, ret wouldn't work, because in our current stack frame, the return address is not at the top of the stack. Therefore, before we return, we have to reset the stack pointer %esp and base pointer %ebp to what they were when the function began.

Therefore to return from the function you have to do the following:

 movl %ebp, %esp popl %ebp  ret

At this point, you should consider all local variables to be disposed of. The reason is that after you move the stack pointer back, future stack pushes will likely overwrite everything you put there. Therefore, you should never save the address of a local variable past the life of the function it was created in, or else it will be overwritten after the life of its stack frame ends.

Control has now been handed back to the calling code, which can now examine %eax for the return value. The calling code also needs to pop off all of the parameters it pushed onto the stack in order to get the stack pointer back where it was (you can also simply add 4 * number of parameters to %esp using the addl instruction, if you don't need the values of the parameters anymore).^[5]

Destruction of Registers

When you call a function, you should assume that everything currently in your registers will be wiped out. The only register that is guaranteed to be left with the value it started with are %ebp and a few others (the Linux C calling convention requires functions to preserve the values of %ebx, %edi, and %esi if they are altered - this is not strictly held during this book because these programs are self-contained and not called by outside functions). %ebx also has some other uses in position-independent code, which is not covered in this book. %eax is guaranteed to be overwritten with the return value, and the others likely are. If there are registers you want to save before calling a function, you need to save them by pushing them on the stack before pushing the function's parameters. You can then pop them back off in reverse order after popping off the parameters. Even if you know a function does not overwrite a register you should save it, because future versions of that function may.

Note that in Linux assembly language, functions are

Other languages' calling conventions may be different. For example, other calling conventions may place the burden on the function to save any registers it uses. Be sure to check to make sure the calling conventions of your languages are compatible before trying to mix languages. Or in the case of assembly language, be sure you know how to call the other language's functions.

Extended Specification: Details of the C language calling convention (also known as the ABI, or Application Binary Interface) is available online. We have oversimplified and left out several important pieces to make this simpler for new programmers. For full details, you should check out the documents available at http://www.linuxbase.org/spec/refspecs/ Specifically, you should look for the System V Application Binary Interface - Intel386 Architecture Processor Supplement.

^[4]Just a reminder - the dollar sign in front of the eight indicates immediate mode addressing, meaning that we subtract the number 8 itself from %esp rather than the value at address 8.

^[5]This is not always strictly needed unless you are saving registers on the stack before a function call. The base pointer keeps the stack frame in a reasonably consistent state. However, it is still a good idea, and is absolutely necessary if you are temporarily saving registers on the stack..