Basic Exploit Methodology (t_delete) | Hacking Ubuntu: Serious Hacks Mods and Customizations (ExtremeTech)

Traditional heap overflow exploit methodology on Solaris is based on chunk consolidation. By overflowing outside the bounds of the current chunk , the header of the next chunk in memory is corrupted. When the corrupted chunk is processed by heap management routines, an arbitrary memory overwrite is achieved that eventually leads to shellcode execution.

The overflow results in the size of the next chunk being changed. If it is overwritten with an appropriate negative value, the next chunk will be found farther back in the overflow string. This is useful because a negative chunk size does not contain any null bytes, and can be copied by string library functions. A TREE structure can be constructed farther back in the overflow string. This can function as a fake chunk with which the corrupted chunk will be consolidated.

The simplest construction for this fake chunk is that which causes the function t_delete () to be called. This methodology was first outlined in the article in Phrack #57 entitled "Once Upon a free()" (August 11, 2001). The following code snippets can be found within malloc.c and mallint.h :

Within realfree() :

 /* see if coalescing with next block is warranted */      np = NEXT(tp);      if (!ISBIT0(SIZE(np))) {           if (np != Bottom)                t_delete(np);

And the function t_delete() :

 /*  * Delete a tree element  */ static void t_delete(TREE *op) {      TREE     *tp, *sp, *gp;          /* if this is a non-tree node */      if (ISNOTREE(op)) {           tp = LINKBAK(op);           if ((sp = LINKFOR(op)) != NULL)                LINKBAK(sp) = tp;           LINKFOR(tp) = sp;           return;      }

Some relevant macros are defined as:

 #define     SIZE(b)     (((b)->t_s).w_i) #define     PARENT(b)     (((b)->t_p).w_p) #define     LEFT(b)     (((b)->t_l).w_p) #define     RIGHT(b)     (((b)->t_r).w_p) #define     LINKFOR(b)     (((b)->t_n).w_p) #define     LINKBAK(b)     (((b)->t_p).w_p) #define     ISNOTREE(b)     (LEFT(b) == (TREE *)(-1))

As can be seen in the code, a TREE op structure is passed to t_delete() . This structure op is the fake chunk constructed and pointed to by the overflow. If ISNOTREE() is true, then two pointers tp and sp will be taken from the fake TREE structure op . These pointers are completely controlled by the attacker, and are TREE structure pointers. A field of each is set to a pointer to the other TREE structure.

The LINKFOR macro refers to the t_n field within the TREE structure, which is located at an offset 32 bytes into the structure, while the LINKBAK macro refers to the t_p field located 8 bytes into the structure. ISNOTREE is true if the t_l field of the TREE structure is -1 , and this field is located 16 bytes into the structure.

While this may seem slightly confusing, the ultimate result of the above code is the following:

If the t_l field of the TREE op is equal to -1, the resulting steps occur. This field is at an offset 16 bytes into the structure.
The TREE pointer tp is initialized via the LINKBAK macro, which takes the t_p field from op . This field is at an offset 8 bytes into the structure.
he TREE pointer sp is initialized via the LINKFOR macro, which takes the t_n field from op . This field is at an offset 32 bytes into the structure.
The t_p field of sp is set to the pointer tp via the macro LINKBAK . This field is located at an offset 8 bytes into the structure.
The t_n field of tp is set to the pointer sp via the macro LINKFOR . This field is located at an offset 32 bytes into the structure.

Steps 4 and 5 are the most interesting in this procedure, and may result in an arbitrary value being written to an arbitrary address in what is best described as a reciprocal write situation. This operation is analogous to removing an entry in the middle of a doubly linked list and re-linking the adjacent members . The TREE structure construction that can achieve this looks like that shown in Table 10.9.

Table 10.9: Required TREE Structure for a Reciprocal Write
FF FF FF F8 AA AA AA AA	TP TP TP TP AA AA AA AA
FF FF FF FF AA AA AA AA	AA AA AA AA AA AA AA AA
SP SP SP SP AA AA AA AA	AA AA AA AA AA AA AA AA

The preceding TREE construction will result in the value of tp being written to sp plus 8 bytes, as well as the value of sp being written to tp plus 32 bytes. For example, sp might point at a function pointer location minus 7 bytes, and tp might point at a location containing an NOP sled and shellcode. When the code within t_delete is executed, the function pointer will be overwritten with the value of tp which points to the shellcode. However, a value 32 bytes into the shellcode will also be overwritten with the value of sp .

The value 16 bytes into the tree structure of FF FF FF FF is the -1 needed to indicate that this structure is not part of a tree. The value at offset zero of FF FF FF F8 is the chunk size. It is convenient to make this value negative to avoid null bytes; however, it can be any realistic chunk size provided that the lowest two bits are not set. If the first bit is set, it would indicate that the chunk was in use and not suitable for consolidation. The second bit should also be clear to avoid consolidation with a previous chunk. All bytes indicated by AA are filler and can be any value.

Standard Heap Overflow Limitations

We previously touched on the first limitation of the non-tree deletion heap overflow mechanism. A 4-byte value at a predictable offset into the shellcode is corrupted in the free operation. A practical solution is to use NOP padding that consists of branch operations that jump ahead a fixed distance. This can be used to jump past the corruption that occurs with the reciprocal write, and continue to execute shellcode as normal.

If it is possible to include at least 256 padding instructions before the shellcode, the following branch instruction can be used as a padding instruction in heap overflows. It will jump ahead 0x404 bytes, skipping past the modification made by the reciprocal write. The branch distance is large in order to avoid null bytes, but if null bytes can be included in your shellcode then by all means reduce the branch distance.

 #define BRANCH_AHEAD "\x10\x80\x01\x01"

Note that if you choose to overwrite a return address on the stack, the sp member of the TREE structure must be made to point to this location minus 8 bytes. You could not point the tp member to the return location minus 32 bytes, because this would result in a value at the new return address plus 8 bytes being overwritten with a pointer that isn't valid code. Remember that ret is really a synthetic instruction that does jmpl %i7 + 8, %g0 . The register %i7 holds the address of the original call, so execution goes to that address plus 8 bytes (4 for the call , and 4 for the delay slot). If an address at an offset of 8 bytes into the return address were overwritten, this would be the first instruction executed, causing a crash for certain. If you instead overwrite a value 32 bytes into the shellcode and 24 past the first instruction, you then have a chance to branch past the corrupted address.

The reciprocal write situation introduces another limitation that is not generally critical in most cases, but is worth mentioning. Both the target address being overwritten and the value used to overwrite it must be valid writable addresses. They are both written to, and using a non-writable memory region for either value will result in a segmentation fault. Since normal code is not writable, this precludes return to libc type attacks, which try to make use of preexisting code found within the process address space.

Another limitation of exploiting the Solaris heap implementation is that a malloc or realloc must be called after a corrupted chunk is freed. Since free() only places a chunk into a free list, but does not actually perform any processing on it, it is necessary to cause realfree() to be called for the corrupted chunk. This is done almost immediately within malloc or realloc (via cleanfree ). If this is not possible, the corrupted chunk can be truly freed by causing free() to be called many times in a row. The free list holds a maximum of 32 entries, and when it is full each subsequent free() results in one entry being flushed from the free list via realfree() . malloc and realloc calls are fairly common in most applications and often isn't a huge limitation; however, in some cases where heap corruption isn't fully controllable, it is difficult to prevent an application from crashing before a malloc or realloc call occurs.

Certain characters are essential in order to use the method described above, including, specifically , the character 0xFF , which is necessary to make ISNOTREE() true. If character restrictions placed on input prevent these characters from being used as part of an overflow, it is always possible to perform an arbitrary overwrite by taking advantage of code farther down within t_delete() , as well as t_splay(). This code will process the TREE structure as though it is actually part of the free tree, making this overwrite much more complicated. More restrictions will be placed on the values written and addresses written to.

Targets for Overwrite

The ability to overwrite 4 bytes of memory at an arbitrary location is enough to cause arbitrary code execution; however, an attacker must be exact about what is overwritten in order to achieve this.

Overwriting a saved program counter on the stack is always a viable option, especially if an attack can be repeated. Small variations in command-line arguments or environment variables tend to shift stack addresses slightly, resulting in them varying from system to system. However, if the attack isn't one-shot, or an attacker has specific knowledge about the system, it's possible to perform a stack overwrite with success.

Unlike many other platforms, code within the Procedure Linkage Table (PLT) on Solaris/SPARC doesn't dereference a value within the Global Offset Table (GOT). As a result, there aren't many convenient function pointers to overwrite. Once lazy binding on external references is resolved on demand, and once external references have been resolved, the PLT is initialized to load the address of an external reference into %g1 and then JMP to that address. Although some attacks allow overwriting of the PLT with SPARC instructions, heap overflows aren't conducive to that in general. Since both the tp and sp members of the TREE structure must be valid writable addresses, the possibility of creating a single instruction that points to your shellcode and is also a valid writable address is slim at best.

However, there are many useful function pointers within libraries on Solaris. Simply tracing from the point of overflow in gdb is likely to reveal useful addresses to overwrite. It will likely be necessary to create a large list of library versions to make an exploit portable across multiple versions and installations of Solaris. For example, the function mutex_lock is commonly called by libc functions to execute non-thread-safe code. It's called immediately on malloc and free , among many others. This function accesses an address table called ti_jmp_table within the .data section of libc, and calls a function pointer located 4 bytes into this table.

Another possibly useful example is a function pointer called when a process calls exit() . Within a function called _exithandle , a function pointer is retrieved from an area of memory within the .data section of libc called static_mem . This function pointer normally points at the fini() routine called on exit to cleanup , but it can be overwritten to cause arbitrary code execution upon exit. Code such as this is relatively common throughout libc and other Solaris libraries, and provides a good opportunity for arbitrary code execution.

The Bottom Chunk

The Bottom chunk is the final chunk before the end of the heap and unpaged memory. This chunk is treated as a special case in most heap implementations , and Solaris is no exception. The Bottom chunk is almost always free if present, and therefore even if its header is corrupted it will never actually be freed. An alternative is necessary if you are unfortunate enough to be able to corrupt only the bottom chunk.

The following code can be found within _malloc_unlocked :

 /* if found none fitted in the tree */      if (!sp) {           if (Bottom && size <= SIZE(Bottom)) {                sp = Bottom;          .....          /* if the leftover is enough for a new free piece */      if ((n = (SIZE(sp) - size)) >= MINSIZE + WORDSIZE) {           n -= WORDSIZE;           SIZE(sp) = size;           tp = NEXT(sp);           SIZE(tp) = nBIT0;           realfree(DATA(tp));

In this case, if the size of the Bottom chunk were overwritten with a negative size, realfree() could be caused to be called on user -controlled data at an offset into the Bottom chunk.

In the code sample above, sp points at the Bottom chunk with a corrupted size. A portion of the Bottom chunk will be taken for the new memory allocation, and the new chunk tp will have its size set to n . The variable n in this case is the corrupted negative size, minus the size of the new allocation and WORDSIZE . Realfree() is then called on the newly constructed chunk, tp , which has a negative size. At this point the methodology mentioned previously using t_delete() will work well.

Small Chunk Corruption

The minimum size for a true malloc chunk is the 48 bytes necessary to store the TREE structure (this includes the size header). Rather than rounding all small malloc requests up to this rather large size, the Solaris heap implementation has an alternative way of dealing with small chunks . Any malloc() request for a size less than 40 bytes results in different processing than requests for larger sizes. This is implemented by the function _smalloc within malloc.c . Requests that round up in size to 8, 16, 24, or 32 bytes are handled by this code.

The function _smalloc allocates an array of same- sized memory blocks to fill small malloc requests. These blocks are arranged in a linked list, and when an allocation request is made for an appropriate size the head of the linked list is returned. When a small chunk is freed, it doesn't go through normal processing but simply is put back into the right linked list at its head. Libc maintains a static buffer containing the heads of the linked lists. Since these memory chunks do not go through normal processing, certain alternatives are needed to deal with overflows that occur in them.

The structure of a small malloc chunk is shown in Table 10.10.

Table 10.10: Structure of a Small malloc Chunk
WORD size (8 bytes)	WORD next (8 bytes)	User data (8, 16, 24 or 32 bytes large)

Because small chunks are differentiated from large chunks solely by their size field, it is possible to overwrite the size field of a small malloc chunk with a large or negative size. This would result in it going through normal chunk processing when it is freed and allowing for standard heap exploitation methods .

The linked-list nature of the small malloc chunks allows for another interesting exploit mechanism. In some situations, it is not possible to corrupt nearby chunk headers with attacker-controlled data. Personal experience has shown that this situation is not completely uncommon, and often occurs when the data that overwrites the chunk header is an arbitrary string or some other uncontrollable data. If it is possible to overwrite other portions of the heap with attacker-defined data, however, it is often possible to write into the small malloc chunk linked lists. By overwriting the next pointer in this linked list, it is possible to make malloc() return an arbitrary pointer anywhere in memory. Whatever program data is written to pointer returned from malloc() will then corrupt the address you have specified. This can be used to achieve an overwrite of more than four bytes via a heap overflow, and can make some otherwise tricky overflows exploitable.