The Stack | Hacking Ubuntu: Serious Hacks Mods and Customizations (ExtremeTech)

As discussed in Chapter 1, the stack is a LIFO data structure. Much like a stack of plates in a cafeteria, the last element placed on the stack is the first element that must be removed. The boundary of the stack is defined by the extended stack pointer (ESP) register, which points to the top of the stack. The stack-specific instructions, PUSH and POP, use ESP to know where the stack is in memory. In most architectures, especially IA32, on which this chapter is focused, ESP points to the last address used by the stack. In other implementations, it points to the first free address.

Data is placed onto the stack using the PUSH instruction; it is removed from the stack using the POP instruction. These instructions are highly optimized and efficient at moving data onto and off of the stack. Let's execute two PUSH instructions and see how the stack changes.

 PUSH 1
 PUSH ADDR VAR

These two instructions will first place the value 1 on the stack, then place the address of variable VAR on top of it. The stack will look like that shown in Figure 2.1.

Figure 2.1: PUSHing values onto the stack

The ESP register will point to the top of the stack, address 643410h. Values are pushed onto the stack in the order of execution, so the value 1 is pushed first, followed by the address of variable VAR. When a PUSH instruction is executed, ESP is decremented by four, and the dword is written to the new address stored in the ESP register.
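The PUSH behavior just described can be modeled in a few lines of C. This is only a sketch under stated assumptions: the Machine type, the helper names, the starting ESP of 643418h, and the address used for VAR (00403000h) are all hypothetical, chosen so that two pushes leave ESP at 643410h as in the text.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BASE 0x643400u  /* assumed lowest address of the modeled region */
#define STACK_SIZE 0x40u      /* assumed size of the modeled region in bytes */

/* Toy model of a slice of memory plus the ESP register. */
typedef struct {
    uint8_t  mem[STACK_SIZE];
    uint32_t esp;
} Machine;

void machine_init(Machine *m, uint32_t initial_esp) {
    memset(m->mem, 0, sizeof m->mem);
    m->esp = initial_esp;
}

/* PUSH: decrement ESP by four, then write the dword at the new ESP. */
void push32(Machine *m, uint32_t value) {
    m->esp -= 4;
    assert(m->esp >= STACK_BASE && m->esp + 4 <= STACK_BASE + STACK_SIZE);
    memcpy(&m->mem[m->esp - STACK_BASE], &value, 4);
}

/* Read the dword stored at a modeled address. */
uint32_t read32(const Machine *m, uint32_t addr) {
    uint32_t value;
    assert(addr >= STACK_BASE && addr + 4 <= STACK_BASE + STACK_SIZE);
    memcpy(&value, &m->mem[addr - STACK_BASE], 4);
    return value;
}
```

Starting from machine_init(&m, 0x643418), calling push32(&m, 1) and then push32(&m, 0x00403000) leaves m.esp at 0x643410, with the stand-in for VAR's address on top and the value 1 four bytes above it.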

Once we have put something on the stack, inevitably, we will want to retrieve it; this is done with the POP instruction. Using the same example, let's retrieve our data and address from the stack.

 POP EAX
 POP EBX

First, we load the value at the top of the stack (where ESP is pointing) into EAX. Next, we repeat the POP instruction, but copy the data into EBX. The stack now looks like that shown in Figure 2.2.

Figure 2.2: POPing values from the stack

As you may have already guessed, the POP instruction only changes the value of ESP, adding four for each dword popped; it does not erase data from the stack. Rather, POP copies the data to its operand, in this case first writing the address of variable VAR to EAX and then writing the value 1 to EBX.
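To make the point about POP concrete, here is a small C sketch that starts from the state in Figure 2.1 and pops twice. The Stack type, the helper names, and the address used for VAR (00403000h) are assumptions made for illustration; only the 643410h top-of-stack address comes from the text.

```c
#include <assert.h>
#include <stdint.h>

#define VAR_ADDR 0x00403000u  /* hypothetical address of VAR */
#define NWORDS   2u           /* the model holds just the two pushed dwords */

typedef struct {
    uint32_t base;            /* address that words[0] represents */
    uint32_t words[NWORDS];   /* dwords at base, base+4, ... */
    uint32_t esp;
} Stack;

/* State after the two PUSH instructions: VAR's address on top, 1 beneath it. */
Stack figure_2_1(void) {
    Stack s = { 0x643410u, { VAR_ADDR, 1u }, 0x643410u };
    return s;
}

/* POP: copy the dword at ESP to the caller, then add four to ESP.
 * The slot that was read is left untouched in memory. */
uint32_t pop32(Stack *s) {
    assert(s->esp >= s->base && s->esp < s->base + 4u * NWORDS);
    uint32_t value = s->words[(s->esp - s->base) / 4u];
    s->esp += 4u;
    return value;
}
```

After two calls to pop32 on the value returned by figure_2_1(), the first result is the stand-in for VAR's address, the second is 1, ESP is back at 643418h, and both entries of words still hold their old contents.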

Another register relevant to the stack is EBP. EBP is usually used to calculate an address relative to another address; in this role it is sometimes called a frame pointer. Although it can be used as a general-purpose register, EBP has historically been used for working with the stack. For example, the following instruction makes use of EBP as an index:

 MOV EAX,[EBP+10h]

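This instruction will move a dword from 16 bytes (10h) above the address held in EBP into EAX. Because the stack grows downward, toward lower addresses, a positive offset from EBP reaches values that were placed on the stack earlier.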
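The address arithmetic behind an operand such as [EBP+10h] is ordinary 32-bit addition. The following C sketch shows it; the function names and the idea of passing in a byte image of memory are assumptions of this example, not part of the original text.

```c
#include <stdint.h>
#include <string.h>

/* Effective address for [EBP+disp]: 32-bit addition that wraps modulo 2^32,
 * so a negative displacement reaches addresses below EBP. */
uint32_t effective_address(uint32_t ebp, int32_t disp) {
    return ebp + (uint32_t)disp;
}

/* Model of the load itself: fetch the dword at addr from a caller-supplied
 * memory image, where image[0] stands for the byte at image_base.
 * The caller must ensure addr lies inside the image. */
uint32_t load_dword(const uint8_t *image, uint32_t image_base, uint32_t addr) {
    uint32_t value;
    memcpy(&value, image + (addr - image_base), sizeof value);
    return value;
}
```

With EBP equal to 643410h, effective_address(ebp, 0x10) yields 643420h, and effective_address(ebp, -4) yields 64340Ch.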

Functions and the Stack

The stack's primary purpose is to make the use of functions more efficient. From a low-level perspective, a function alters the flow of control of a program, so that an instruction or group of instructions can be executed independently from the rest of the program. More important, when a function has completed executing its instructions, it returns control to the original function caller. This concept of functions is most efficiently implemented with the use of the stack.

Let's take a look at a simple C function and how the stack is used by the function.

 #include <stdio.h>

 void function(int a, int b) {
     int array[5];
 }

 int main(void) {
     function(1, 2);
     printf("This is where the return address points");
 }

In this example, instructions in main are executed until a function call is encountered. The consecutive execution of the program now needs to be interrupted, and the instructions in function need to be executed. The first step is to push the arguments for function, a and b, onto the stack in reverse order (b first, then a). Once the arguments are on the stack, the function is called, which places the return address, or RET, onto the stack. RET is the value of the instruction pointer (EIP) at the time function is called, which is the address of the instruction following the call. It is the location at which execution continues when the function has completed, so that the rest of the program can execute. In this example, the address of the printf("This is where the return address points"); instruction will be pushed onto the stack.

Before any function instructions can be executed, the prolog is executed. In essence, the prolog stores some values onto the stack so that the function can execute cleanly. The current value of EBP is pushed onto the stack, because the value of EBP must be changed in order to reference values on the stack. When the function has completed, we will need this stored value of EBP in order to calculate address locations in main. Once EBP is stored on the stack, we are free to copy the current stack pointer (ESP) into EBP. Now we can easily reference addresses local to the stack.

The last thing the prolog does is to calculate the address space required for the variables local to function and reserve this space on the stack. Subtracting the size of the variables from ESP reserves the required space. The variables local to function, in this case simply array, then live in this reserved region; they are not pushed one at a time. Figure 2.3 represents how the stack looks at this point.
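The sequence of argument pushes, the call, and the prolog can be traced with simple address bookkeeping. The C sketch below does this; the Frame type, the build_frame name, and the starting ESP in the usage note are hypothetical, and the model ignores any extra padding a real compiler may insert.

```c
#include <stdint.h>

/* Addresses of the pieces of one stack frame, plus the final ESP and EBP. */
typedef struct {
    uint32_t addr_b;          /* second argument, pushed first */
    uint32_t addr_a;          /* first argument, pushed second */
    uint32_t addr_ret;        /* return address pushed by the call */
    uint32_t addr_saved_ebp;  /* caller's EBP saved by the prolog */
    uint32_t ebp;             /* frame pointer inside the function */
    uint32_t esp;             /* stack pointer after reserving locals */
} Frame;

/* Trace the steps described in the text, starting from the caller's ESP. */
Frame build_frame(uint32_t esp, uint32_t local_bytes) {
    Frame f;
    esp -= 4; f.addr_b = esp;          /* caller: push b */
    esp -= 4; f.addr_a = esp;          /* caller: push a */
    esp -= 4; f.addr_ret = esp;        /* call: push return address */
    esp -= 4; f.addr_saved_ebp = esp;  /* prolog: push EBP */
    f.ebp = esp;                       /* prolog: copy ESP into EBP */
    esp -= local_bytes;                /* prolog: reserve space for locals */
    f.esp = esp;
    return f;
}
```

For build_frame(0x643420, 8), the saved EBP sits at [EBP], the return address at [EBP+4], a at [EBP+8], b at [EBP+12], and the locals occupy the 8 bytes just below EBP.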

Figure 2.3: Visual representation of the stack after a function has been called

Now you should have a good understanding of how a function works with the stack. Let's get a little more in-depth and look at what is going on from an assembly perspective. Compile our simple C function with the following command:

 [root@localhost /]# gcc -mpreferred-stack-boundary=2 -ggdb function.c -o function

Make sure you use the -ggdb switch, since we want the binary to include debugging information for gdb. gdb is the GNU project debugger; you can read more about it at www.gnu.org/manual/gdb-4.17/gdb.html. We also want to use the -mpreferred-stack-boundary=2 switch, which sets up our stack in dword-size increments (2^2 = 4 bytes). Otherwise, gcc will optimize the stack and make things more difficult than they need to be at this point. Load your results into gdb.

 [root@localhost /]# gdb function
 GNU gdb 5.2.1
 Copyright 2002 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB. Type "show warranty" for details.
 This GDB was configured as "i386-redhat-linux"...
 (gdb)

First, look at how our function, function, is called. Disassemble main:

 (gdb) disas main
 Dump of assembler code for function main:
 0x8048438 <main>:       push   %ebp
 0x8048439 <main+1>:     mov    %esp,%ebp
 0x804843b <main+3>:     sub    $0x8,%esp
 0x804843e <main+6>:     sub    $0x8,%esp
 0x8048441 <main+9>:     push   $0x2
 0x8048443 <main+11>:    push   $0x1
 0x8048445 <main+13>:    call   0x8048430 <function>
 0x804844a <main+18>:    add    $0x10,%esp
 0x804844d <main+21>:    leave
 0x804844e <main+22>:    ret
 End of assembler dump.

At <main+9> and <main+11>, we see that the values of our two parameters are pushed onto the stack in reverse order (0x2 first, then 0x1). At <main+13>, we see the call instruction, which, although it is not expressly shown, pushes RET (the saved EIP, here 0x804844a, the address of the instruction after the call) onto the stack. The call then transfers the flow of execution to function, at address 0x8048430. Now, disassemble function and see what happens when control is transferred there.
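The return address that call pushes can be checked against the listing with a little arithmetic. The sketch below assumes the call at <main+13> is the usual 5-byte near call (one opcode byte followed by a 32-bit relative displacement), which agrees with the next instruction starting at <main+18>; the function names are made up for this example.

```c
#include <stdint.h>

#define NEAR_CALL_LENGTH 5u  /* assumed: 1 opcode byte + 4-byte displacement */

/* The pushed return address is the address of the instruction after the call. */
uint32_t return_address(uint32_t call_addr) {
    return call_addr + NEAR_CALL_LENGTH;
}

/* The displacement encoded in a relative call is measured from the
 * return address to the call target. */
int32_t call_displacement(uint32_t call_addr, uint32_t target) {
    int64_t delta = (int64_t)target - (int64_t)return_address(call_addr);
    return (int32_t)delta;
}
```

For the call at 0x8048445, return_address gives 0x804844a, matching <main+18>, and the displacement to function at 0x8048430 is -26 (that is, -1Ah).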

 (gdb) disas function
 Dump of assembler code for function function:
 0x8048430 <function>:      push   %ebp
 0x8048431 <function+1>:    mov    %esp,%ebp
 0x8048433 <function+3>:    sub    $0x8,%esp
 0x8048436 <function+6>:    leave
 0x8048437 <function+7>:    ret
 End of assembler dump.

Since our function does nothing but set up a local variable, array, the disassembly output is relatively simple. Essentially, all we have is the function prolog, and the function returning control to main. The prolog first stores the current frame pointer, EBP, onto the stack. It then copies the current stack pointer into EBP at <function+1>. Finally, the prolog creates space on the stack for our local variable, array, at <function+3>. The 8 bytes reserved there correspond to a 5-byte array rounded up to the next multiple of 4, since the stack must allocate memory in 4-byte chunks. Note that this matches a declaration such as char array[5]; with 4-byte ints, the int array[5] declared in the C listing would occupy 20 bytes and require a larger reservation.
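The rounding of local storage to 4-byte chunks is easy to express in C. The helper below is a minimal sketch with a hypothetical name; real compilers may reserve additional space for alignment or padding, so treat it as a lower bound rather than a prediction of gcc's output.

```c
#include <stddef.h>

/* Round n up to the next multiple of chunk (chunk must be nonzero). */
size_t round_up(size_t n, size_t chunk) {
    return (n + chunk - 1) / chunk * chunk;
}
```

Here round_up(5, 4) gives 8, the amount subtracted from ESP in the listing, while round_up(20, 4) stays at 20.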