The Sin Explained

The classic incarnation of a buffer overrun is known as smashing the stack. In a compiled program, the stack holds control information: function arguments, the address the application must return to once it is done with the current function, and, because of the small number of registers available on x86 processors, registers that are temporarily spilled to memory. Unfortunately, locally allocated variables are also stored on the stack. These stack variables are sometimes inaccurately described as statically allocated, as opposed to dynamically allocated heap memory. If you hear someone talking about a static buffer overrun, what they really mean is a stack buffer overrun. The root of the problem is that if the application writes beyond the bounds of an array allocated on the stack, the attacker gets to specify control information. And this is critical to success: the attacker wants to overwrite control data with values of his choosing.

One might ask why we continue to use such an obviously dangerous system. We had an opportunity to escape the problem, at least in part, with a migration to Intel's 64-bit Itanium chip, where return addresses are stored in a register. The problem is that we'd have to tolerate a significant loss of backward compatibility, and as of this writing, it appears that the x64 chip will likely end up the more popular chip.

You may also be asking why we don't all just migrate to code that performs strict array checking and disallows direct memory access. The problem is that for many types of applications, the performance characteristics of higher-level languages are not adequate. One middle ground is to use higher-level languages for the top-level interfaces that interact with dangerous things (like users!), and lower-level languages for the core code. Another solution is to fully use the capabilities of C++, with string libraries and collection classes. For example, the Internet Information Server (IIS) 6.0 web server switched entirely to a C++ string class for handling input, and one brave developer claimed he'd amputate his little finger if any buffer overruns were found in his code. As of this writing, the developer still has his finger, and no security bulletins have been issued against the web server in the nearly two years since its release. Modern compilers deal well with templatized classes, and it is possible to write very high-performance C++ code.

Enough theory; let's consider an example:

 #include <stdio.h>
 #include <string.h>   /* needed for strcpy */

 void DontDoThis(char* input)
 {
     char buf[16];
     strcpy(buf, input);
     printf("%s\n", buf);
 }

 int main(int argc, char* argv[])
 {
     //so we're not checking arguments
     //what do you expect from an app that uses strcpy?
     DontDoThis(argv[1]);
     return 0;
 }

Now let's compile the application and take a look at what happens. For this demonstration, the author used a release build with debugging symbols enabled and stack checking disabled. A good compiler will also want to inline a function as small as DontDoThis, especially if it is only called once, so he also disabled optimizations. Here's what the stack looks like on his system immediately prior to calling strcpy:

 0x0012FEC0  c8 fe 12 00   <- address of the buf argument
 0x0012FEC4  c4 18 32 00   <- address of the input argument
 0x0012FEC8  d0 fe 12 00   <- start of buf
 0x0012FECC  04 80 40 00
 0x0012FED0  e7 02 3f 4f
 0x0012FED4  66 00 00 00   <- end of buf
 0x0012FED8  e4 fe 12 00   <- contents of EBP register
 0x0012FEDC  3f 10 40 00   <- return address
 0x0012FEE0  c4 18 32 00   <- address of argument to DontDoThis
 0x0012FEE4  c0 ff 12 00
 0x0012FEE8  10 13 40 00   <- address main() will return to

Remember that all of the values on the stack appear backwards. This example is from an Intel system, which is little-endian: the least significant byte of a value comes first. So if you see a return address in memory as 3f 10 40 00, it's really address 0x0040103f.

Now let's look at what happens when buf is overwritten. The first piece of control information on the stack is the contents of the Extended Base Pointer (EBP) register. EBP contains the frame pointer, and if an off-by-one overflow happens, the terminating null zeroes out the low byte of the saved EBP. If the attacker can control the memory at 0x0012fe00 (the address produced by zeroing that last byte), the program jumps to that location and executes attacker-supplied code.

If the overrun isn't constrained to one byte, the next item to go is the return address. If the attacker can control this value, and is able to place enough assembly into a buffer whose location they know, you're looking at a classic exploitable buffer overrun. Note that the assembly code (often known as shell code, because the most common exploit is to invoke a command shell) doesn't have to be placed into the buffer that's being overwritten. That's the classic case, but in general, the arbitrary code that the attacker has placed into your program could be located elsewhere. Don't take any comfort from thinking that the overrun is confined to a small area.

Once the return address has been overwritten, the attacker gets to play with the arguments of the exploitable function. If the program writes to any of these arguments before returning, it represents an opportunity for additional mayhem. This point becomes important when considering the effectiveness of stack tampering countermeasures such as Crispin Cowan's Stackguard, IBM's ProPolice, and Microsoft's /GS compiler flag.

As you can see, we've just given the attacker at least three ways to take control of our application, and this is only in a very simple function. If a C++ class with virtual functions is declared on the stack, then the virtual function pointer table will be available, and this can easily lead to exploits. If one of the arguments to the function happens to be a function pointer, which is quite common in any windowing system (for example, the X Window System or Microsoft Windows), then overwriting the function pointer prior to use is an obvious way to divert control of the application.

Many, many more clever ways to seize control of an application exist than our feeble brains can think of. There is an imbalance between our abilities as developers and the abilities and resources of the attacker. You're not allowed an infinite amount of time to write your application, but attackers may have nothing better to do with their copious spare time than figure out how to make your code do what they want. Your code may protect an asset that's valuable enough to justify months of effort to subvert your application. Attackers spend a great deal of time learning about the latest developments in causing mayhem, and have resources like www.metasploit.com, where they can point and click their way to shell code that does nearly anything they want while operating within a constrained character set.

If you try to determine whether something is exploitable, it is highly likely that you will get it wrong. In most cases, it is only possible to prove that something is either exploitable or that you are not smart enough (or possibly have not spent enough time) to determine how to write an exploit. It is extremely rare to be able to prove with any confidence at all that an overrun is not exploitable.

The point of this diatribe is that the smart thing to do is to just fix the bugs! There have been multiple times that code quality improvements have turned out to be security fixes in retrospect. This author just spent more than three hours arguing with a development team about whether they ought to fix a bug. The e-mail thread had a total of eight people on it, and we easily spent 20 hours (half a person-week) debating whether to fix the problem, because the development team wanted proof that the code was exploitable. Once the security experts proved the bug was really a problem, the fix was estimated at one hour of developer time and a few hours of test time. That's an incredible waste of time.

The one time when you want to be analytical is immediately prior to shipping an application. If an application is in the final stages, you'd like to be able to make a good guess whether the problem is exploitable, to justify the risk of regressions and destabilizing the product.

It's a common misconception that overruns in heap buffers are less exploitable than stack overruns, but this turns out not to be the case. Most heap implementations suffer from the same basic flaw as the stack: the user data and the control data are intermingled. Depending on the implementation of the memory allocator, it is often possible to get the heap manager to place four bytes of the attacker's choice into the location specified by the attacker. The details of how to attack a heap are somewhat arcane. A recent and clearly written presentation on the topic, "Reliable Windows Heap Exploits" by Matthew "shok" Conover and Oded Horovitz, can be found at http://cansecwest.com/csw04/csw04-Oded+Connover.ppt. Even if the heap manager cannot be subverted to do an attacker's bidding, the data in the adjoining allocations may contain function pointers, or pointers that will be used to write information. At one time, exploiting heap overflows was considered exotic and hard; heap overflows are now some of the more frequently exploited types of errors.

Sinful C/C++

There are many, many ways to overrun a buffer in C/C++. Here's what caused the Morris finger worm:

 char buf[20];
 gets(buf);

There is absolutely no way to use gets to read input from stdin without risking an overflow of the buffer; use fgets instead. Perhaps the second most popular way to overflow buffers is to use strcpy (see the previous example). This is another way to cause problems:

 char buf[20];
 char prefix[] = "http://";
 strcpy(buf, prefix);
 strncat(buf, path, sizeof(buf));

What went wrong? The problem here is that strncat has a poorly designed interface: the function wants the number of characters of available buffer, or space left, not the total size of the destination buffer. Here's another favorite way to cause overflows:

 char buf[MAX_PATH];
 sprintf(buf, "%s - %d\n", path, errno);

It's nearly impossible, except in a few corner cases, to use sprintf safely. A critical security bulletin for Microsoft Windows was released because sprintf was used in a debug logging function. Refer to bulletin MS04-011 for more information (see the link in the Other Resources section).

Here's another favorite:

 char buf[32];
 strncpy(buf, data, strlen(data));

So what's wrong with this? The last argument is the length of the incoming buffer, not the size of the destination buffer!

Another way to cause problems is by mistaking character count for byte count. If you're dealing with ASCII characters, these are the same, but if you're dealing with Unicode, there are two bytes to one character. Here's an example:

 _snwprintf(wbuf, sizeof(wbuf), "%s\n", input); 

The following overrun is a little more interesting:

 bool CopyStructs(InputFile* pInFile, unsigned long count)
 {
     unsigned long i;
     m_pStructs = new Structs[count];
     for(i = 0; i < count; i++)
     {
         if(!ReadFromFile(pInFile, &(m_pStructs[i])))
             break;
     }
 }

How can this fail? Consider that when you call the C++ new[] operator, it is similar to the following code:

 ptr = malloc(sizeof(type) * count); 

If the user supplies the count, it isn't hard to specify a value that overflows the multiplication internally. You'll then allocate a buffer much smaller than you need, and the attacker is able to write over your buffer. The upcoming C++ compiler in Microsoft Visual Studio 2005 contains an internal check to prevent this problem. The same problem can happen internally in many implementations of calloc, which performs the same operation. This is the crux of many integer overflow bugs: it's not the integer overflow itself that causes the security problem; it's the buffer overrun that follows swiftly that causes the headaches. But more about this in Sin 3.

Here's another way a buffer overrun can get created:

 #define MAX_BUF 256

 void BadCode(char* input)
 {
     short len;
     char buf[MAX_BUF];
     len = strlen(input);

     //of course we can use strcpy safely
     if(len < MAX_BUF)
         strcpy(buf, input);
 }

This looks as if it ought to work, right? The code is actually riddled with problems. We'll get into this in more detail when we discuss integer overflows in Sin 3, but first consider that literals are always of type signed int. An input longer than 32K will flip len to a negative number; it will get upcast to an int, maintaining its sign, and will then always compare smaller than MAX_BUF, causing an overflow. A second way you'll encounter problems is if the string is larger than 64K: now you have a truncation error, and len will be a small positive number. The main fix is to remember that size_t is the type the language specification defines for variables that represent sizes. Another problem that's lurking is that input may not be null-terminated. Here's what better code looks like:

 const size_t MAX_BUF = 256;

 void LessBadCode(char* input)
 {
     size_t len;
     char buf[MAX_BUF];
     len = strlen(input);

     //of course we can use strcpy safely
     if(len < MAX_BUF)
         strcpy(buf, input);
 }

Related Sins

One closely related sin is integer overflows. If you do choose to mitigate buffer overruns by using counted string handling calls, or are trying to determine how much room to allocate on the heap, the arithmetic becomes critical to the safety of the application.

Format string bugs can be used to accomplish the same effect as a buffer overrun, but aren't truly overruns. A format string bug is normally accomplished without overrunning any buffers at all.

A variant on a buffer overrun is an unbounded write to an array. If the attacker can supply the index of your array, and you dont correctly validate whether its within the correct bounds of the array, a targeted write to a memory location of the attackers choosing will be performed. Not only can all of the same diversion of program flow happen, but also the attacker may not have to disrupt adjacent memory, which hampers any countermeasures you might have in place against buffer overruns.



19 Deadly Sins of Software Security. Programming Flaws and How to Fix Them
ISBN: 71626751
Year: 2003
Pages: 239
