Principles of Building Self-Modifying Code | Shellcoders Programming Uncovered (Uncovered series)

Early models of x86 processors didn't support machine code coherence and didn't trace any attempts at modifying commands that have already entered the pipeline. On one hand, this has complicated the development of self-modifying code; on the other hand, it has allowed you to deceive the debugger operating in the tracing mode.

Consider the simple example in Listing 12.5.

Listing 12.5: Modifying a machine command that has already entered the pipeline

 MOV AL,  90h         LEA DI,  foo         STOSB foo:         INC AL

When this program is run on a "live" processor, the INC AL command is replaced by NOP ; however, because inc al has already entered the pipeline, the AL register still is incremented by one. Step-by-step tracing of the program behaves in a different manner. The debug exception generated directly after executing the STOSB instruction clears the pipeline, and NOP gains control instead of INC AL . Consequently, the AL register isn't incremented. If the al value is used for decrypting the program, the debugger will swear at the hacker.

Processors of the Pentium family trace modifications of commands that have already entered the pipeline; therefore, the software length of the pipeline equals zero. Consequently, protection mechanisms of the pipeline type, being executed on Pentium, erroneously assume that they are always executed under the debugger. This is a well-documented feature of the processor, which is expected to be preserved in the future models. The use of self-modifying code is legal. However, it is necessary to remember that excessive use will negatively affect the performance of the application being protected. The code cache of the first level is available only for reading, and direct writing into it is impossible . When machine commands are modified in memory, the data cache is being modified. Then the code cache is flushed urgently and modified cache lines are reloaded, which requires a large number of the processor ticks . Never execute self-modifying code in deeply-nested loops , unless you want to freeze your program so that it executes at the speed of a blacktop spreader.

It is rumored that self-modifying code is possible only in MS-DOS and that Windows prohibits its execution. This is only partially true; it is possible to bypass all prohibitions and limitations. First, it is necessary to grasp the idea of page or segment access attributes. Processors of the x86 family support three attributes for segment access (read, write, and execute) and two attributes for page access (access and write). Operating systems of the Windows family combine the code segment with the data segment within the unified address space; therefore, read and execute attributes are equivalent for them.

Executable code can reside anywhere in the available memory area ” stack, heap, global variables area, etc. By default, the stack and heap are available for writing and are quite suitable for holding self-modifying code. Constants and global and static variables are usually located in the .rdata section, which is available only for reading (and, naturally, for execution); therefore, any attempt at modifying them throws an exception.

Thus, all you need to do is copy the self-modifying code into the stack (heap), where it is free to do whatever it likes to itself. Consider the example in Listing 12.6.

Listing 12.6: Self-modifying code in the stack (heap)

 // Define the size of the self-modifying function. #define SELF_SIZE       ((int) x_self_mod_end - (int) x_self_mod_code) // Start of the self-modifying function. The naked // qualifier supported by the Microsoft Visual C compiler // instructs the compiler to create a naked // Assembly function, into which the compiler // must not include any unrelated garbage. __declspec(naked) int x_self_mod_code(int a, int b) { __asm{   begin_sm:                       ; Start of the self-modifying code.          MOV EAX, [ESP + 4]       ; Get the first argument.          CALL get_eip             ; Define the current position in memory.   get_eip:          ADD EAX, [ESP + 8 + 4]   ; Add or subtract the second argument                                   ; from the first one.          POP EDX                  ; EDX contains the starting address of                                   ; the ADD EAX, ... instruction.          XOR byte ptr [edx], 28h  ; Change ADD to SUB and vice versa.          RET                      ; Return to the parent function.   } } x_self_mod_end(){/* End of the self-modifying function */ } main() {   int a;   int (__cdecl *self_mod_code)(int a, int b);   // Uncomment the next string to make sure   // that self-modification under Windows is   // impossible (the system will throw an exception).   // self_mod_code(4, 2);   // Allocate memory from the heap (where   // self-modification is allowed). With the same success,   // it is possible to allocate memory in the stack:   // self_mod_code[SELF_SIZE];   self_mod_code = (int (__cdecl*)(int, int)) malloc(SELF_SIZE);   // Copy the self-modifying code into the stack or heap.   memcpy(self_mod_code, x_self_mod_code, SELF_SIZE);   // Call the self-modifying procedure ten times.  for (a=1; a<10; a++) printf("%02X",self_mod_code(4,2)); printf("\n"); }

Self-modifying code replaces the ADD machine code with SUB and sub with ADD ; therefore, calling self_mod_code in a loop returns the following sequence of numbers : 06 02 06 02... , thus confirming successful completion of self-modification.

Some programmers consider this technology too awkward . Others complain that the code being copied must be fully relocatable, which means that it must fully preserve its ability of working independently of the current location in memory. The code generated by the compiler, in general, doesn't provide this possibility, which forces the programmer to go down to the naked Assembly level. Stone Age! Haven't programmers invented any better and more advanced technologies since the time of Neanderthals, who made fire by friction and punched their punch cards with an awl made of bone? In fact, programmers have!

For diversity, try to create a simple encrypted procedure written entirely in a high-level language (C, for example, although the same techniques are suitable for Pascal with its crippled descendant called Delphi). When doing so, make the following assumptions: (l) the order of functions in memory coincides with the order, in which they are declared in the program (practically all compilers behave this way), and (2) the function being encrypted doesn't contain relocatable elements, also called fixups (this is true for most executable files; however, DLLs can't do without relocations).

To successfully decrypt the procedure, you'll need to determine its image base. This is not difficult. Contemporary high-level programming languages support operations with pointers to a function. In C/C++, this will look approximately as follows : void *p; p = (void*) func ;. Measuring the function length is considerably more difficult. Legal language tools do not provide such a possibility; therefore, it is necessary to resort to tricks, such as defining the length as the difference between two pointers: the pointer to the encrypted function and the pointer to the function located directly after its tail. If the compiler wants to violate the natural order of functions, this trick won't work and decryption will fail.

Finally, as far as I know, no compiler allows you to generate encrypted code. Therefore, this procedure must be carried out manually using HIEW or custom utilities developed on your own. How is it possible to find the function being encrypted in a binary file? Hackers use several competing technologies, giving preference to a specific one depending on the situation.

In the simplest case, the function being encrypted is enclosed in markers ” unique byte sequences guaranteed not to be encountered in any other part of the program. Usually, markers are specified using the _emit directive, which represents an analogue of the DB Assembly instruction. For example, the following construct will create the KPNC text string: __asm_emit 'K' __asm_emit 'P' __asm_emit 'N' _asm_emit 'C' . Do not try to place the markers within the function being encrypted. The processor won't understand this humor and will throw an exception. Place the markers at the top and bottom of the function, but never touch its body.

The choice of encryption algorithm is not a matter of critical importance. Some programmers use XOR , and others prefer the Data Encryption Standard (DES) or RSA. Naturally, XOR is much easier to crack, especially if the key length is small. However, in the example provided in Listing 12.7, I have chosen XOR because DES and RSA are too bulky. Furthermore, they are not illustrative .

Listing 12.7: Self-modification used for encryption

 #define CRYPT_LEN ((int)crypt_end - (int)for_crypt) // Starting marker mark_begin(){__asm _emit 'K' __asm _emit 'P' __asm _emit 'N' __asm _emit 'C'} // Encrypted function for_crypt(int a, int b) {        return a + b; } crypt_end(){} // End marker mark_end (){__asm _emit 'K' __asm. _emit 'P' __asm _emit 'N' __asm._emit 'C'} // Decryptor crypt_it(unsigned char *p, int c) {        int a;  for (a = 0; a < c; a++) *p++ ^= 0x66; } main() {        // Decrypt the protection function        crypt_it((unsigned char*) for_crypt, CRYPT_LEN);        // Call the protection function        printf("%02Xh\n", for_crypt(0x69, 0x66));        // Encrypt it again        crypt_it((unsigned char*) for_crypt, CRYPT_LEN); }

Having compiled this program in a normal way (for example, using the cl.exe /c filename.c command), you'll end up with the filename.obj object file. Now it is necessary to build the executable file, first disabling the protection of the code section against writing. In Microsoft's Link utility, this is achieved using the /SECTION command-line option followed by the section name and attributes assigned to it, for example, link.exe FileName.obj /FIXED /SECTION:.text, ERW . Here, /FIXED is the option that removes relocations. (Recall that the relocations must be deleted. When linking executable files, Microsoft Link uses this option by default, so if you happen to omit it nothing horrible will happen.) In addition, .text is the name of the code section, and ERW stands for Executable, Readable, Writable ” although executable might be omitted if desired because this has no effect on the usability of the executable file. Other linkers use other options, the descriptions of which can be found in the appropriate documentation. The name of the code section need not be .text , so if something goes wrong, use the Microsoft dumpbin utility to clarify the specific situation.

The file built by the linker is not still suitable for execution because the protected function has not been encrypted yet. To encrypt this function, start HIEW, switch to the HEX mode, and run the context search to find the marker string (<F7>, kPNC , <Enter>) (Fig. 12.1). Now, it only remains to encrypt everything enclosed by the KPNC markers. Press <F3> to switch to the editing mode, then press <F8> and specify the encryption mask (in this case, it is equal to 66h ). Every further stroke on the <F8> key encrypts 1 byte, moving the cursor over the text. Save the modifications by pressing <F9>. After completing the file encryption, markers are no longer needed. Therefore, if desired, you can overwrite them with some unintelligible garbage to make the protected procedure less noticeable.

Figure 12.1: Encrypting the protected procedure using HIEW

Now the file is ready for execution. Start it and, naturally, it will fail. Well! You must spoil before you spin, especially when dealing with self-modifying code. Try to determine what's wrong using the debugger, common sense, and disassembler. As The Matrix saying goes, "Everybody falls the first time."

Having succeeded, load the executable file into IDA Pro and view how the encrypted function appears ” ravings of a madman, if not something worse (Listing 12.8).

Listing 12.8: What the encrypted procedure looks likey

 .text:0040100C loc_40100C:                 ; CODE XREF: sub_40102E + 50p .text:0040100C           CMP    EAX, 0ED33A53Bh .text:00401011           MOV    CH, CH .text:00401013           AND    EBP, [ESI + 65h] .text:00401016           AND    EBP, [EDX + 3Bh] .text:00401019           MOVSD .text:0040101A .text:0040101A loc_40101A:                  ; DATA XREF: sub_40102E + 6   o .text:0040101A           XOR    EBP, EBP .text:0040101C           MOV    BH, [EBX] .text:0040101E           MOVSD .text:0040101F           XOR    EBP, EBP .text:00401021           MOV    DH, DS:502D3130h

Naturally, a spell of superimposed encryption can be easily removed ( experienced hackers can do this without exiting IDA Pro), so the level of this protection should not be overestimated. In addition, protection of the code section against writing was invented on purpose; therefore, disabling it is far from reasonable.

The virtualProtect API function allows manipulations with page attributes at your discretion. Using this function, it is possible to assign the writeable attributes only to pages that need modification and, immediately after completion of decryption, restore the protection against writing.

An improved variant of the crypt_it function might look as in Listing 12.9.

Listing 12.9: Using VirtualProtect to temporarily disable write protection on a local section

 crypt_it(unsigned char *p, int c) {        int a;        // Disable protection against writing.        VirtualProtect(p, c, PAGE_READWRITE, (DWORD*) &a);        // Decrypt the function.        for (a  =  0; a < c; a++) *p++ ^= 0x66;        // Restore protection.        VirtualProtect(p, c, PAGE_READONLY, (DWORD*) &a); }

Having compiled the file in a normal way, encrypt it according the preceding technique and then start it for execution. I hope that you'll succeed on the first attempt.

The Matrix

Full-value work with self-modifying code is impossible without knowing the opcodes of machine instructions and the principles of their encoding. Assume, for example, that you want to replace the X machine command with the Y machine command (for this example, let these be add eax, ebx and sub eax, ebx , respectively).

What specific actions should you take to achieve this goal? The simplest way is starting HIEW and, having switched into assembly mode, comparing their opcodes (without failing to remember about the correct choice of the mode ” 16 or 32 bit). As you can see, these commands differ from each other by a single byte, the replacement of which allows you to work wonders (Listing 12.10).

Listing 12.10: Using HIEW to obtain the opcodes of machine commands

 00000000: 03C3        ADD  EAX, EBX 00000002: 2BC3        SUB  EAX, EBX

Unfortunately, experiments with HIEW give catastrophically contradictory results; furthermore, they are not illustrative. Some Assembly instructions correspond to several machine commands. HIEW chooses the shortest one, which is not always convenient , because when designing self-modifying code it is necessary to choose instructions of strictly defined lengths.

For example, try to feed HIEW with the xor eax, 66 instruction. With immutable obstinacy, HIEW will assemble it into the following 3-byte machine command: 83 F0 66 . However, other variants exist (Listing 12.11).

Listing 12.11: Machine commands corresponding to the xor eax,66 command

 00000000: 83F066              XOR  EAX, 066       ; "f" 00000003: 3566000000          XOR  EAX, 000000066 ; "   f" 00000008: 81F066000000        XOR  EAX, 000000066 ; "   f"

The Tech Help electronic reference manual, available at many hacking sites, along with other invaluable information contains illustrative tables of opcodes that allow you to quickly determine, which machine commands can be obtained from a given instruction. Fig. 12.2 presents a fragment of the table of opcodes provided in this reference manual. Consider its use on examples.

	x0	x1	x2	x3	x4	x5	x6	x7
0x	ADD r/m,r8	ADD r/m,r16	ADD r8,r/m	ADD r16,r/m	ADD AL,im8	ADD AX,im16	PUSH ES	POP ES
1x	ADC r/m,r8	ADC r/m,r16	ADC r8,r/m	ADC r16,r/m	ADC AL,im8	ADC AX,im16	PUSH SS	POP SS
2x	AND r/m,r8	AND r/m,r16	AND r8,r/m	AND r16,r/m	AND AL,im8	AND AX,im16	SEG ES	DAA
3x	XOR r/m,r8	XOR r/m,r16	XOR r8,r/m	XOR r16,r/m	XOR AL,im8	XOR AX,im16	SEG SS	AAA
4x	INC AX	INC CX	INC DX	INC BX	INC SP	INC BP	INC ST	INC DI
5x	PUSH AX	PUSH CX	PUSH DX	PUSH BX	PUSH SP	PUSH BP	PUSH SI	PUSH DI
6x	* PUSHA	* POPA	* BOUND	ARPL	: SEG FS	: SEG GS	:opSize	: addrSiz

Figure 12.2: Fragment of the opcodes table from the Tech Help reference manual

Rows contain the least significant half byte of the first byte of the command opcode, and the most significant half bytes are contained in columns . Assembly instruction corresponding to the given machine code is contained at the row and column intersection.

For example, 40h stands for inc ax/inc eax (AX for 16-bit mode and EAX for 32-bit mode), and 16h stands for push ss . Consider a more sophisticated example: 03h corresponds to the add r16/32, r/m machine command. Here, r16/32 designates any 16/32-bit general-purpose register, and r/m designates any register/memory cell addressable through that register (for example, [EBX] ).

Now consider the generalized structure of a machine command in more detail. It comprises the following six fields:

Prefix ” An instruction can contain up to four prefixes (the Prefix field) or no prefixes. The size of each prefix is 1 byte.
Opcode ” The Opcode field contains 1- or 2-byte instruction code.
Mod R/M ” The choice of individual registers and methods of addressing is carried out in the second byte of the opcode, also called the Mod R/M field. The Mod R/M field specifies the required method of addressing and the registers to be used. It comprises three fields: Mod, Reg/Opcode , and R/M (Fig. 12.3), which requires a sophisticated interpretation (Fig. 12.4).
SIB ” The single-byte scale-Index-Base field defines the addressing method more precisely.
Offset ” The Offset field, depending on the addressing method, occupies 1, 2, or 4 bytes and contains the operand's offset.
Direct value ” The Direct value field represents the direct operand. This field might take 1, 2, or 4 bytes.

Figure 12.3: Generalized structure of the machine command

Figure 12.4: Possible values of the Mod R/M field

More detailed information on this topic is provided in the third volume of the Intel's reference manual (" Instruction Set Reference ").

Assume that it is necessary to add the EAX register to the EBX register. The first byte of the opcode of the ADD command has been defined already ” 03h . Now it is time to determine more precisely the registers and addressing methods. Consider (Fig. 12.4) that for direct addressing, the first 2 most significant bits of the second byte of the opcode must be equal to 11 . The next 3 bits (encoding the target register) are equal to 000 in this case, which corresponds to the EAX register. The 3 least significant bits of the second byte of opcode (encoding the source register) are equal to 011 , which corresponds to the EAX register. Thus, the entire byte is equal to C3h . The add eax, ebx instruction is assembled into 03 C3 , and HIEW confirms this.

Problems of Code Modification through the Internet

The technique of self-modification is closely related to the task of automatic code modification using the Internet. This is a complicated task that requires extensive knowledge and an engineering approach. What follows is an overview of the pitfalls you are likely to encounter on the way. How is it possible to build binary code into an executable file? How is it possible to inform all instances of the remote program about the update? How do you protect yourself against fictitious updates? Note that this list of questions is far from complete. Ideally, the answers require a separate book. Within the limited space here, it is only possible to briefly outline the problem.

To begin with, it is necessary to note that the concepts of modular and procedural programming (which are impossible to do without nowadays) need certain mechanisms for interprocedural communications. At the least, your procedures must be capable of calling each other (Listing 12.12).

Listing 12.12: Classical method of function calling makes code unrelocatable

 my_func() {        printf("go away\n"); }

What's wrong here? The printf function is outside the my_func function, and its address is not known beforehand. Normally, this problem is solved by the linker; however, you are not going to build it into the self-updating program, are you? Therefore, it is necessary to develop custom mechanisms of importing and exporting all required functions. Don't be afraid! Programming this mechanism is much easier than declaring the intention to do so.

In the simplest case, it would be sufficient to pass the function the pointers to all functions that it requires as arguments. In this case, the function will not be bound to its memory location and will be fully relocatable (Listing 12.13). Global and static variables and constant strings must not be used, because the compiler places them in another section. In addition, it is necessary to make sure that the compiler won't insert any garbage into the code (such as calls to functions that control stack boundaries to eliminate overflow). In most cases, this option can be easily disabled using the command-line options, a detailed description of which must be supplied in the companion documentation for the compiler.

Listing 12.13: Calling functions by pointers passed through arguments ensures the possibility of relocating the code

 my_func(void *f1, void *f2, void *f3, void *f4, void *f5...) {        int (__cdecl *f_1)(int a);        ...        f_1 = (int (__cdecl*)(int))f1;        ...        f_1(0x666); }

Having compiled the resulting file, it is necessary to link it into a 32-bit binary file. Not every linker is capable of doing so, and often the binary code can be cut off from the executable file by any HEX editor available to the user (such as HIEW).

Now you have a ready-to-use update module and have the program to be updated. It only remains to combine them. Because Windows blocks writing to all executable files that are currently being executed, the file cannot update itself. Consequently, this operation must be carried out in several stages. First, the executable file (conventionally designated as file A) renames itself file B (note that Windows doesn't prevent files currently being executed from being renamed ). Then, file B creates its copy under the name file A and adds the update module to its end as an overlay (experienced hackers can correct the value of the ImageSize field). After this, it terminates its execution and passes control to file A, which removes temporary file B from the disk. This is not the only possible method, and, to tell the truth, it is not the best one. However, at first even this method will do.

The topic of distributing updates over the Internet is more urgent. Why not upload updates to a specific server? Why not let remote applications (such as worms) periodically visit it and download the required updates? Well, how long would such a server exist? If it doesn't go down under the onslaught of exponentially self-reproducing worms, it will be closed by an infuriated administrator. It is necessary to proceed in strict accordance to the distributed design.

The simplest algorithm looks as follows: Let every worm save in its body the IP addresses of all machines that it must infect . In this case, all "parents" will know their "children" and the children will remember their parents up to the last generation. However, the opposite statement is not true. "Grandparents" will know only their direct descendants and will have no information about their "grandchildren," provided that they do not establish a direct connection to their grandparents and do not inform them about their addresses. The main goal here is evaluating the intensity of the information exchange to avoid congesting the network. Then, having updated a single worm, you'll be able to access all of the other ones. Note that this situation is hard to control. A distributed-updates system has no single center of coordination and, even if 99.9% of it is destroyed , retains its functionality.

To thwart worms, it is possible to start a kamikaze update, which would automatically destroy all worms that have tried to update it. Therefore, advanced virus writers actively use digital signature mechanisms and asymmetric cryptographic algorithms. If you are too lazy to develop your own custom engine, you can use PGP (because its source code is available).

The most urgent aspect here is being creative and knowing how to use the compiler and debugger. Everything else is a matter of time. Without fresh ideas, the self-modification technique is condemned to extinction . To keep it afloat, it is necessary to find the correct point, to which you should apply your forces, using self-modifying code only where it is useful and helpful.

Notes on Self-Modifying

Self-modifying code is possible to implement only on computers that have von Neumann's architecture (the same memory cells at different time instances can be interpreted both as code and as data).
Representatives of the Pentium family of processors are built according to Harvard architecture (code and data are processed separately). They only emulate von Neumann's architecture, and self-modifying code considerably degrades their performance.
Assembly fans state that Assembly language supports self-modifying code. This is not true. Assembly has no tools for working with self-modifying code except for the DB directive. Such "support," if you could call it that, is also present in C.