Decompiling Shellcode | Shellcoders Programming Uncovered (Uncovered series)

Describing the various aspects of compiling shellcode is straightforward. The inverse problem is far more complicated: all the skills accumulated in writing shellcode become useless when it comes to analyzing shellcode written by someone else. The art of disassembling shellcode relies on a number of non-obvious tricks, some of which are covered in this section.

The first and most fundamental problem in shellcode analysis is finding the entry point. Most shellcode carriers (exploits and worms) encountered in the wild reach investigators either as a memory dump taken from an infected machine or as the chopped-off head of a worm; sometimes they appear as source code published in an e-zine.

At first glance, it might seem that having the source code leaves no room for questions. This is not so. Consider a fragment of the IIS-Worm source code with the shellcode inside (Listing 15.5).

Listing 15.5: Fragment of IIS-Worm with shellcode inside

 char sploit[] = {
     0x47, 0x45, 0x54, 0x20, 0x2F,                          /* "GET /" */
     0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41,  /* "AAAAAAAAA" */
     ...
     0x21, 0x21, 0x21, 0x21, 0x21, 0x21, 0x21,
     0x21, 0x21, 0x21, 0x21, 0x21, 0x21, 0x21,              /* "!!!!!!!!!!!!!!" */
     0x2E, 0x68, 0x74, 0x72,                                /* ".htr" */
     0x20, 0x48, 0x54, 0x54, 0x50, 0x2F, 0x31, 0x2E, 0x30,  /* " HTTP/1.0" */
     0x0D, 0x0A, 0x0D, 0x0A                                 /* "\r\n\r\n" */
 };

Any attempt to disassemble the shellcode directly will fail, because the worm's head starts with the GET /AAAAAAAAAAAAAAAAAA... string, which looks like plain text rather than machine code. It is not known beforehand at which byte the actual code begins. To locate the entry point, you could feed the worm's head to a vulnerable application and see where the EIP register points. In theory, this gives the entry point. In practice, it is an excellent way of wasting time and nothing else.

To begin with, recall that debugging is a potentially dangerous and unjustifiably aggressive method of investigation. No one will allow experiments on a "live" server. Thus, the vulnerable software must be installed on a standalone computer that contains nothing you would be sorry to lose. At the same time, it must be exactly the version of the software that the virus can infect without crashing anything else; otherwise, who knows what will gain control instead of the true entry point. However, not every investigator has a collection of different software versions and lots of operating systems.

Furthermore, no one can guarantee that you will correctly identify the moment at which control is passed to the shellcode. Blind tracing is of no use here, because contemporary software is too bulky: control might be passed after thousands or even hundreds of thousands of machine instructions, which might be executed in parallel threads. As far as I know, there are no debuggers capable of tracing several threads simultaneously. It is possible to set an execution breakpoint on the memory region containing the target buffer. However, this won't help when the shellcode is passed through a chain of buffers, only one of which is vulnerable to overflow while the others are normal.

On the other hand, the entry point can be determined visually. To achieve this, load the shellcode into a disassembler and try various starting addresses, choosing the one that produces the most meaningful code. The most convenient tool for this is HIEW or any other hex editor with similar capabilities, because IDA is too bulky and not flexible enough for this purpose. Be prepared to discover that the main body of the shellcode is encrypted and only the decryptor looks meaningful. Worse still, the decryptor might be spread over the entire worm's head and intentionally stuffed with garbage instructions.

If the shellcode passes control to itself using jmp esp (which is most frequently the case), the entry point is the first byte of the worm's head, in other words, the start of the GET /AAAAAAAAAAAAAAAAAA... string, not the first byte following it, contrary to the popular opinion expressed in some manuals. For example, the Code Red 1, Code Red 2, and IIS-Worm worms are organized exactly this way.

Situations in which control is passed into the middle of the shellcode are rarer. In this case, it makes sense to search for a chain of NOP instructions located near the entry point. The worm uses this chain of NOPs to ensure "compatibility" among different versions of the vulnerable software: when the software is recompiled, the location of the overflowing buffer might shift slightly, and the NOPs come to the worm's rescue, playing the same role as a funnel used to fill a bottle. The decryptor provides another clue: if you find the decryptor, you will also find the entry point. In addition, you can use IDA's flow chart visualizer, which displays control flow as something much like a large bunch of grapes, with the entry point playing the role of the stem (see Fig. 15.1).

Figure 15.1: IDA visualizer displaying control flow in the form of a diagram (large scale)

Now consider a more complicated case: the self-modifying head of the Code Red worm, which dynamically changes an unconditional jmp to pass control to a specific code section. IDA cannot automatically restore all cross-references, and some of the functions will "hang" separately from the main "bunch." As a result, you get four candidates for the role of the entry point. Three of them can be discarded immediately, because they contain meaningless code that accesses uninitialized registers and variables. Only the true entry point produces meaningful code (Fig. 15.1); on the diagram, it is the fourth point from the left.

It is more difficult to solve the problem of the shellcode's "binding" to its environment, such as the register contents the worm inherits from the vulnerable program. How can you discover which values they hold without access to the vulnerable program? Although it is impossible to tell for sure beforehand, in most cases it can be guessed. More precisely, by analyzing how the code uses these registers, you can determine what the worm expects from them. The worm is unlikely to rely on specific constants; it is more probable that it tries to invade a specific memory block whose pointer is stored in a particular register (for example, in code built with Microsoft compilers, which use the thiscall convention, the ECX register usually holds the this pointer).

It is much worse if the virus calls functions of the vulnerable program at fixed addresses. It is difficult to guess what each function does. The only clue is the arguments passed to it, and this clue is too weak for the results of the investigation to be considered reliable. In this case, an adequate analysis is impossible without disassembling the vulnerable application.