Searching for Yourself | Shellcoders Programming Uncovered (Uncovered series)

The first task of the shellcode is to determine its own location in memory or, to be more precise, the current value of the instruction-pointer register (on x86 processors, this is the EIP register).

Static buffers located in the data section reside at predictable addresses that are easily revealed by disassembling the vulnerable application. However, these addresses are highly sensitive to the version of the attacked application and, to a lesser degree, to the operating system (different operating systems use different lowest addresses for loading applications). DLLs in most cases are relocatable and can be loaded into memory at different base addresses, although, with static linking, a given set of DLLs is always loaded in the same way. Automatic buffers located on the stack and dynamic buffers located in the heap always have barely predictable or even unpredictable addresses.

The use of absolute addressing (or strict linking to specific addresses, such as mov eax, [406090h]) makes shellcode dependent on the surrounding environment and crashes every vulnerable application in which the buffer happens to be located at an unexpected position. The earlier generations of hackers growl, complaining that contemporary hackers cannot even cause a buffer overflow without crashing the system. To avoid crashes, the shellcode must be fully relocatable; in other words, it must be capable of working at any address, which is not known to it beforehand.
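The difference between an absolute and a relocatable reference can be modeled in a few lines. The following is only a toy sketch in Python, not machine code: memory is a dictionary, and the names load_buffer, read_absolute, and read_relative, as well as the actual load address 12FF40h, are hypothetical and chosen for illustration. Only the address 406090h comes from the example above.

```python
# Toy model: memory is a dict mapping address -> byte value.
# All function names and the "actual" base address are illustrative assumptions.

def load_buffer(base, payload):
    """Place payload bytes into an otherwise empty memory starting at base."""
    return {base + i: b for i, b in enumerate(payload)}

def read_absolute(memory, address):
    """Absolute addressing: the address was fixed when the code was written."""
    return memory.get(address)  # None models an access to unmapped memory

def read_relative(memory, base, offset):
    """Relocatable addressing: base is discovered at run time, offset is fixed."""
    return memory.get(base + offset)

PAYLOAD = b"\x90\x90DATA"
DATA_OFFSET = 2            # offset of the first data byte inside the payload
EXPECTED_BASE = 0x406090   # where the author assumed the buffer would be
ACTUAL_BASE = 0x12FF40     # hypothetical address where it really ended up

memory = load_buffer(ACTUAL_BASE, PAYLOAD)
print(read_absolute(memory, EXPECTED_BASE + DATA_OFFSET))       # misses the buffer
print(read_relative(memory, ACTUAL_BASE, DATA_OFFSET))          # finds the data
```

The absolute read misses the buffer as soon as the load address differs from the assumed one, while the base-plus-offset read keeps working wherever the buffer lands.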

This problem can be solved using two approaches: either using only relative addressing (which cannot be fully achieved on the x86 platform) or determining the base loading address and then counting from that address. Both methods are considered below, with the detail level typical for hackers.

The x86 family of processors offers little support for relative addressing. Development of shellcode for such processors is an excellent intellectual puzzle, providing a large application area for various tricks. There are two relative commands: call (opcode E8h) and jmp/jcc (opcodes EBh and E9h for jmp; 7xh and 0F 8xh for conditional jumps). Both are flow-control commands. Direct use of the EIP register in address expressions is not allowed.

The use of relative CALL instructions in 32-bit mode has its own typical difficulties. The command argument is a signed 4-byte integer, counted from the start of the next command. Consequently, when calling subroutines located further down in memory (at higher addresses), the most significant bytes of the command argument contain only zeros. Because the zero character can appear in a string buffer only once, as the terminator, such shellcode won't work. If the zeros are replaced by something else, the call turns into a very long jump that lands far beyond the limits of the chosen memory block.
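The arithmetic behind this can be checked with a short sketch. It is written in Python rather than assembly, and the function names and the sample addresses (1000h, 1020h) are assumptions made for illustration; only the E8h opcode and the "signed 4-byte displacement counted from the next command" rule come from the text.

```python
import struct

CALL_LEN = 5  # E8h opcode byte + 4-byte displacement

def encode_call_rel32(call_addr, target_addr):
    """Encode a near relative call; the displacement counts from the next command."""
    disp = target_addr - (call_addr + CALL_LEN)
    return b"\xE8" + struct.pack("<i", disp)

def decode_disp(disp_bytes):
    """Interpret 4 little-endian bytes as a signed displacement."""
    return struct.unpack("<i", disp_bytes)[0]

forward = encode_call_rel32(0x1000, 0x1020)   # subroutine located further down
backward = encode_call_rel32(0x1020, 0x1000)  # subroutine located earlier

print(forward.hex(), "contains zero bytes:", 0 in forward)
print(backward.hex(), "contains zero bytes:", 0 in backward)

# Replacing the zero bytes of the forward displacement with 01h
# turns a 27-byte hop into a jump of roughly 16 MB.
print(decode_disp(b"\x1b\x01\x01\x01"))
```

A short forward displacement is padded with zero bytes, whereas a short backward (negative) displacement is padded with FFh bytes, which is why backward calls pass through string buffers and forward calls do not.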

To jump to an absolute address (for instance, to call some system function or a function of the vulnerable program), it is possible to use a construct like call register / jmp register, after loading the register with a command of the form mov register, immediate operand (zero characters can be excluded by using address-arithmetic commands). It is also possible to use the other forms of the call command: opcode FF /2 for a near indirect call through a register or memory operand, 9A for a direct far call, and FF /3 for a far call through an operand in memory.
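One way to picture "excluding zero characters by address arithmetic" is to split the desired constant into two zero-free constants that are combined at run time. The sketch below uses XOR for this; the choice of XOR, the function name split_xor, and the use of 406090h as the sample value are assumptions for illustration, since the text does not say which arithmetic operation is meant.

```python
def split_xor(value):
    """Split a 32-bit value into (masked, mask) with masked ^ mask == value
    and no zero byte in either half.

    Per byte: use mask 01h, except when the byte itself is 01h (then 02h),
    so that neither the mask byte nor the masked byte can be zero.
    """
    value_bytes = value.to_bytes(4, "little")
    mask = bytes(0x02 if b == 0x01 else 0x01 for b in value_bytes)
    masked = bytes(b ^ m for b, m in zip(value_bytes, mask))
    return int.from_bytes(masked, "little"), int.from_bytes(mask, "little")

masked, mask = split_xor(0x00406090)
print(f"{masked:08X} xor {mask:08X} = {masked ^ mask:08X}")
```

The same idea works with addition or subtraction; what matters is that each constant embedded in the code is free of zero bytes, and the true value appears only in a register.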

Relative addressing of data (including self-modifying code) is harder to achieve. The only relative data addressing available to the hacker is addressing in relation to the stack-top pointer register (on x86 processors, this is the ESP register). This is attractive but carries a certain risk. In general, the position of the stack pointer after the overflow is undefined, and the availability of the required amount of stack memory is not guaranteed. Thus, hackers must act at their own risk.

The stack can also be used for preparing string or numerical arguments of system functions: they are formed using the push command and passed using the relative ESP + X pointer, where X can be either a number or a register. Self-modifying code is prepared in a similar way: it is necessary to first push the code onto the stack and then modify it based on the value of the ESP register.
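The order in which a string has to be pushed is easy to get wrong, so a small model may help. The sketch below is Python, not shellcode; push_dwords and simulate_stack are hypothetical helper names, and the stack is modeled as a byte string whose first byte sits at ESP. It assumes 4-byte pushes and little-endian byte order, as on 32-bit x86.

```python
import struct

def push_dwords(text):
    """Return the 32-bit values to push, in push order, so that the
    NUL-terminated string ends up readable starting at ESP."""
    data = text + b"\x00"
    data += b"\x00" * (-len(data) % 4)           # pad to a multiple of 4 bytes
    dwords = [struct.unpack("<I", data[i:i + 4])[0]
              for i in range(0, len(data), 4)]
    return list(reversed(dwords))                # the stack grows downward

def simulate_stack(dwords):
    """Model successive pushes; the result is memory starting at ESP."""
    stack = b""
    for value in dwords:
        stack = struct.pack("<I", value) + stack  # each push lands below the last
    return stack

values = push_dwords(b"hello")
print([f"{v:08X}" for v in values])
print(simulate_stack(values))
```

Note that the padding and the terminator in this model are zero bytes; code delivered through a string buffer would have to produce them indirectly, which the model does not attempt to show.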

Supporters of the classical methods can choose another route and determine the current EIP value using the call $ + 5 / pop reg construct. Note, however, that such a sequence of machine commands cannot be passed through a string buffer as is, because the 32-bit argument of the call command contains several zero characters. In the simplest case, they can be eliminated by the 66 E8 FF FF C0 "magic spell," which is equivalent to the instructions call $ + 3 / inc eax superimposed over each other (the register need not be EAX, and the instruction need not be INC). Then, it only remains to pop the contents of the stack top into any general-purpose register, for example, EBP or EBX. Unfortunately, it is impossible to do without the stack, and the suggested method requires the stack-top pointer to point at an allocated memory region available for writing. To be on the safe side (if the overflowing buffer actually overflows the stack), the hacker must initialize the ESP register independently. This goal can be easily achieved because most register variables of the vulnerable program contain predictable values or, to be more precise, are used in a predictable way. For example, in C++ programs, ECX usually contains the this pointer (under the thiscall convention), and the this pointer is guaranteed to point to at least 4 bytes of available memory.
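The overlap inside the five-byte sequence can be verified by plain displacement arithmetic. The sketch below is Python; call_rel16_target is a hypothetical helper, offsets are counted from the first byte of the sequence, and the code models only how the call target is computed from the encoded bytes. It does not model other processor behavior, such as the effect a 16-bit operand size has on EIP.

```python
import struct

SEQ = bytes.fromhex("66e8ffffc0")

def call_rel16_target(code, offset=0):
    """Return the offset a '66 E8 rel16' call transfers control to,
    relative to the start of code."""
    assert code[offset] == 0x66 and code[offset + 1] == 0xE8
    (disp,) = struct.unpack_from("<h", code, offset + 2)  # signed 16-bit
    next_insn = offset + 4        # prefix + opcode + 2 displacement bytes
    return next_insn + disp

target = call_rel16_target(SEQ)
print("call target offset:", target)
print("bytes at target:", SEQ[target:target + 2].hex())   # FF C0 encodes inc eax
print("zero bytes present:", 0 in SEQ)
```

The displacement FFFFh is -1, so the call lands on its own last byte; that byte, FFh, together with the following C0h, decodes as inc eax, and the whole sequence contains no zero bytes.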

To develop this idea further, it would be unwise to ignore the values of the registers available to the shellcode when it starts execution. Most of them point to useful data structures and allocated memory regions, which can be used without much risk of causing an exception or other unexpected problems. Some register variables are sensitive to the version of the vulnerable application; some are sensitive to the compiler version and the command-line options used when compiling. Thus, these "guarantees" are relative (like everything that exists on the Earth).