1.6. Debugging and Disassembling Assembly Programs

This section is dedicated to Assembly language because programs written in this language are usually the most convenient ones to debug and disassemble.

1.6.1. Examples of Code Disassembling

Consider several examples that, in my opinion, will help you quickly master this process.

Searching for Imported Functions

Consider an elementary example program written in Assembly language. The source code of this program is shown in Listing 1.43.

Listing 1.43: An elementary Assembly program

    .586P
    .MODEL FLAT, STDCALL
    includelib f:\masm32\lib\user32.lib
    EXTERN        MessageBoxA@16:NEAR
    ; Data segment
    _DATA SEGMENT
    TEXT1 DB 'No problem!', 0
    TEXT2 DB 'Message', 0
    _DATA ENDS
    ; Code segment
    _TEXT SEGMENT
    START:
            PUSH OFFSET 0
            PUSH OFFSET TEXT2
            PUSH OFFSET TEXT1
            PUSH 0
            CALL MessageBoxA@16
            RETN
    _TEXT ENDS
    END START

The program in Listing 1.43 is a trivial one. Its only goal is to display the MessageBox dialog. To obtain an executable module, issue the following two commands:

    ML  /c /coff  prog.asm
    LINK /subsystem:console prog.obj

That you are building a console application doesn't matter in this case. Alternatively, you could use the /subsystem:windows linking option and try to explain the difference in the behavior of the two programs.

As a result of translation, you'll obtain the executable file called prog.exe. All of this is self-evident for any programmer involved in Assembly programming. However, the disassembling procedure is less obvious, even for such a trivial example. For disassembling, you can use any suitable program, such as dumpbin.exe supplied as part of Microsoft Visual Studio .NET. Issue the following command: dumpbin /disasm prog.exe >prog.txt. The contents of the output file, prog.txt, are shown in Listing 1.44.

Listing 1.44: The result of disassembling the prog.exe program using the dumpbin.exe utility

    Microsoft (R) COFF/PE Dumper Version 7.10.3077
    Copyright (C) Microsoft Corporation.  All rights reserved.
    Dump of file r8.exe
    File Type: EXECUTABLE IMAGE
      00401000: 6A 00              push        0
      00401002: 68 0C 30 40 00     push        40300Ch
      00401007: 68 00 30 40 00     push        403000h
      0040100C: 6A 00              push        0
      0040100E: E8 01 00 00 00     call        00401014
      00401013: C3                 ret
      00401014: FF 25 00 20 40 00  jmp         dword ptr ds:[00402000h]
      Summary
            1000 .data
            1000 .rdata
            1000 .text

The dumpbin.exe program turned out to be efficient enough and disassembled this module satisfactorily. On the basis of the disassembled listing, it is easy to recognize the call to the imported MessageBox function. In particular, this follows from the parameter values. After carrying out the dumpbin /rawdata /section:.data prog.exe >prog.txt command, you'll obtain the contents of the .data section, where initialized data must reside (Listing 1.45).

Listing 1.45: The contents of the .data section of the test example in Listing 1.44

    RAW DATA #3
      00403000: 4E 6F 20 70 72 6F 62 6C 65 6D 21 00 4D 65 73 73  No problem!.Mess
      00403010: 61 67 65 00                                      age.

If you compare the parameter addresses from Listing 1.44 (40300Ch and 403000h) with the data from Listing 1.45, you can make sure that the CALL instruction is the call to the imported MessageBox function: The two addresses point at the 'Message' and 'No problem!' strings.
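The same comparison can be done mechanically. The following Python sketch takes the raw bytes shown in Listing 1.45 and lists every zero-terminated string together with its virtual address. The helper name strings_with_addresses is an illustrative assumption and is not part of dumpbin.exe or any other tool mentioned here.

```python
# Raw bytes of the .data section, copied from Listing 1.45.
DATA_BASE = 0x00403000
RAW = bytes.fromhex(
    "4E 6F 20 70 72 6F 62 6C 65 6D 21 00 4D 65 73 73 "
    "61 67 65 00"
)


def strings_with_addresses(raw, base):
    """Return (virtual address, text) for each zero-terminated string."""
    result = []
    start = 0
    for index, value in enumerate(raw):
        if value == 0:
            if index > start:
                text = raw[start:index].decode("ascii")
                result.append((base + start, text))
            start = index + 1
    return result


for address, text in strings_with_addresses(RAW, DATA_BASE):
    print(f"{address:08X}h  {text!r}")
```

The two addresses found this way are the operands of the second and third PUSH commands in the dumpbin.exe listing.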

However, Listing 1.44 still leaves one question open. The CALL instruction transfers control to the address of a JMP command rather than to the function itself. To understand why, recall Section 1.5.4, where the import table was considered. The import table is an array of IMAGE_IMPORT_DESCRIPTOR structures (see Listing 1.37), one structure for each implicitly linked DLL. Each structure contains the FirstThunk field, which points to an array of IMAGE_THUNK_DATA32 structures for that DLL. In the file, these structures are essentially pointers to the names of the imported functions. When the executable module is loaded, the loader replaces them with the addresses of the functions in the DLL. The jmp dword ptr ds:[00402000h] command therefore jumps to the imported function whose address is stored at 00402000h. Thus, 00402000h is the virtual address of the array element pointed at by the FirstThunk field.

If you use the program presented in Appendix 1, you'll be able to obtain the relative virtual address and the file offset of the array of IMAGE_THUNK_DATA32 structures (in that program, this array is called AdresImpArray). The relative virtual address turns out to be 2000h, which is correct because the virtual loading address is 400000h. The file offset is 600h. With this information, you can locate the array within the prog.exe file using the simplest hex viewer (for instance, the one that is part of FAR Manager). At offset 600h, you'll find the 38 20 00 00 sequence of bytes, in other words, the number 2038h. This number is the relative virtual address of the name of the imported MessageBox function, less 2 bytes: The name string itself starts at the relative virtual address 203Ah. After loading, the array element no longer holds 2038h, because the loader overwrites it with the address of the function. Again, you can use the program from Appendix 1 to make sure that the function name is located at file offset 63Ah. Open the prog.exe file and make sure that the MessageBoxA string is located at this offset.
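The address arithmetic in this discussion can be checked with a short Python sketch. It assumes that all the addresses involved lie in one section that starts at relative virtual address 2000h and at file offset 600h (the two values quoted in the text); the function names are illustrative. The 2 bytes that separate 2038h from the name correspond to the 16-bit Hint field that precedes each name in the PE format.

```python
IMAGE_BASE = 0x400000    # virtual loading address given in the text
SECTION_RVA = 0x2000     # relative virtual address of the thunk array
SECTION_OFFSET = 0x600   # file offset of the same array


def rva_to_offset(rva):
    """Convert a relative virtual address to a file offset (one section only)."""
    return rva - SECTION_RVA + SECTION_OFFSET


def rva_to_va(rva):
    """Convert a relative virtual address to a virtual address."""
    return IMAGE_BASE + rva


# The four bytes found at file offset 600h, least significant byte first.
thunk_bytes = bytes.fromhex("38 20 00 00")
hint_name_rva = int.from_bytes(thunk_bytes, "little")

# Skip the 2-byte Hint field to reach the name string itself.
name_rva = hint_name_rva + 2

print(f"thunk value      : {hint_name_rva:X}h")
print(f"name RVA         : {name_rva:X}h")
print(f"name file offset : {rva_to_offset(name_rva):X}h")
print(f"name VA          : {rva_to_va(name_rva):X}h")
```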

Perhaps, these considerations seem too complicated and bulky to you. If so, try to draw the same conclusion using the hiew.exe program. This program is one of the best hex editors, indispensable when correcting executable modules. In addition, it provides the possibility of disassembling PE modules. All examples and explanations provided in this book relate to version 6.11 of this program. To proceed, load the prog.exe executable module into hiew.exe. Consider what you'd discover at the 401000H address in the disassembling mode (Listing 1.46).

Listing 1.46: The results produced by hiew.exe when disassembling the prog.exe test program

    .00401000:        6A00               push 00
    .00401002:        680C304000         push 00040300C
    .00401007:        6800304000         push 000403000
    .0040100C:        6A00               push 00
    .0040100E:        E801000000         call .000401014
    .00401013:        C3                 retn
    .00401014:        FF2500204000       jmp MessageBoxA

As you can see, hiew.exe is a more advanced program than dumpbin.exe, because it has recognized the call to the MessageBoxA function. On the basis of the jmp command, it is possible to determine the jump address. This is the 402000h address, as should be expected (do not forget how the bytes of integer numbers are stored, and that the first 2 bytes of the command code are the opcode byte and the MOD R/M byte, as explained in Section 1.4). Now, switch to the hex viewing mode and go to the obtained address. At that address, as expected, the following sequence of bytes is located: 38 20 00 00. This is the 2038h number, representing a relative virtual address. To obtain the virtual address, it is necessary to add the base loading address of the module, which is 400000h. The address of the string that must contain the name of the imported function (which, in this case, is MessageBoxA) is obtained as follows: 400000h + 2038h + 2h = 40203Ah. Go to the obtained address, and you'll find the required name.
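To see how the jump address is read from the command bytes, consider the following Python sketch. It splits the 6 bytes of the jmp command into the opcode byte, the fields of the MOD R/M byte, and the 4 address bytes stored least significant byte first. Only this one addressing form is handled, and the function name is an assumption made for illustration.

```python
import struct

# Bytes of the last command in the disassembled listing.
JMP_CODE = bytes.fromhex("FF 25 00 20 40 00")


def decode_indirect_jmp(code):
    """Decode FF /4 with a 32-bit displacement: JMP DWORD PTR [disp32]."""
    opcode, modrm = code[0], code[1]
    mod = modrm >> 6          # top 2 bits
    reg = (modrm >> 3) & 7    # middle 3 bits: opcode extension
    rm = modrm & 7            # low 3 bits
    if opcode != 0xFF or reg != 4:
        raise ValueError("not a near indirect JMP")
    if mod != 0 or rm != 5:
        raise ValueError("only the [disp32] form is handled in this sketch")
    # The address follows the two leading bytes in little-endian order.
    (pointer,) = struct.unpack_from("<I", code, 2)
    return pointer


pointer = decode_indirect_jmp(JMP_CODE)
print(f"the jump goes through the pointer stored at {pointer:08X}h")
```

Applied to the FF 25 30 30 40 00 bytes, the same function returns 403030h.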

It would be interesting to view what the result would be if you compiled the program using TASM32. To achieve this, replace the MessageBoxA@16 name (Listing 1.43) with MessageBoxA, and the user32.lib import library with the Borland import32.lib library. To compile and link the program, use the following commands:

    tasm32 /ml  prog.asm
    tlink32 -ap  prog.obj

After compiling and linking, run hiew.exe and load the prog.exe executable module. Note that Borland's compiler creates larger executable modules than the similar Microsoft compiler. Listing 1.47 shows the disassembled text of the prog.exe module compiled and linked using TASM32. Compare it to the text provided in Listing 1.46. As you can see, the text is practically identical, but the addressing is slightly different.

Listing 1.47: The disassembled text of the prog.exe module compiled and linked using TASM32

    .00401000:        6A00               push 00
    .00401002:        680C204000         push 00040200C
    .00401007:        6800204000         push 000402000
    .0040100C:        6A00               push 00
    .0040100E:        E801000000         call .000401014
    .00401013:        C3                 retn
    .00401014:        FF2530304000       jmp MessageBoxA

Go to the 403030h address, which is the address of the array element pointing at the name of the imported function. There you'll find the following sequence of bytes: 44 30 00 00. This means that, as before, the name of the imported function must be located at the following address: 400000h + 3044h + 2h = 403046h.

Difficulties with Recognizing Executable Code

Although it might seem that disassembling executable modules written in Assembly language shouldn't cause any special problems, some problems still arise.

Consider the following test program (Listing 1.48). First compile it using MASM32.

Listing 1.48: The test Assembly program for illustrating difficulties with disassembling

    .586P
    .MODEL FLAT, STDCALL
    includelib f:\masm32\lib\user32.lib
    EXTERN        MessageBoxA@16:NEAR
    ; Data segment
    _DATA SEGMENT
    TEXT1 DB 'No problem!', 0
    TEXT2 DB 'Message', 0
    _DATA ENDS
    ; Code segment
    _TEXT SEGMENT
    START:
            PUSH OFFSET 0
            PUSH OFFSET TEXT2
            PUSH OFFSET TEXT1
            PUSH 0
            CALL MessageBoxA@16
            RETN
            DB 50
    l1:
            RETN
    _TEXT ENDS
    END START

The program in Listing 1.48 appears strange. For example, what is the purpose of the l1 label if there are no jumps to it? (The label will be needed later.) What purpose does the DB 50/RETN sequence serve if it never executes? All of these issues will be clarified in due course. The main goal of this program is to determine how contemporary disassemblers react to such a fragment. I assumed that all disassemblers would misinterpret the code following the first RETN command. By the way, how should it be interpreted? DB 50 produces the 32h byte (decimal 50), and RETN is the C3h byte. Together they form the 32 C3 sequence of bytes, which corresponds to the XOR AL, BL command. This assumption turned out to be true, because all disassemblers, including the fabulous IDA Pro, considered the RETN command to be followed by the XOR AL, BL command.
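Why do the data byte and the RETN command merge into one XOR command? The following Python sketch decodes the two bytes the way a disassembler does. It handles only the register-to-register form of the 32h opcode, and the function name is an illustrative assumption.

```python
# 8-bit register names indexed by their 3-bit encoding.
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]


def decode_xor_r8_rm8(code):
    """Decode opcode 32h (XOR r8, r/m8) with a register operand."""
    opcode, modrm = code[0], code[1]
    if opcode != 0x32:
        raise ValueError("not opcode 32h")
    mod = modrm >> 6
    reg = (modrm >> 3) & 7
    rm = modrm & 7
    if mod != 3:
        raise ValueError("memory operands are not handled in this sketch")
    return f"XOR {REG8[reg]}, {REG8[rm]}"


# What the source says: DB 50 (one data byte) followed by RETN (C3h).
as_written = bytes([50]) + bytes([0xC3])

# What a disassembler sees: the same two bytes read as one command.
print(as_written.hex(" ").upper(), "->", decode_xor_r8_rm8(as_written))
```

The bytes are identical in both readings. Only the knowledge that execution never reaches the first byte tells the two interpretations apart.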

Note

I was truly surprised by the OllyDbg debugger and disassembler. After loading this code, it displayed the DB 50/RETN sequence. I thought that this was mystical, and for a couple of seconds believed in the eminence of this debugger. Then I replaced the byte sequence with a single XOR AL, BL command. The debugger continued to blindly state that this was DB 50/RETN. So, it was a disappointment.

Now, modify this program by a single command: MOV EBX, OFFSET l1 (Listing 1.49). This command is meaningless. However, consider what the popular disassemblers would state.

Listing 1.49: The modified code of the test program shown in Listing 1.48

    .586P
    .MODEL FLAT, STDCALL
    includelib f:\masm32\lib\user32.lib
    EXTERN        MessageBoxA@16:NEAR
    ; Data segment
    _DATA SEGMENT
    TEXT1 DB 'No problem!', 0
    TEXT2 DB 'Message', 0
    _DATA ENDS
    ; Code segment
    _TEXT SEGMENT
    START:
            MOV  EBX, OFFSET l1
            PUSH OFFSET START
            PUSH OFFSET 0
            PUSH OFFSET TEXT2
            PUSH OFFSET TEXT1
            PUSH 0
            CALL MessageBoxA@16
            POP  EDX
            ADD  EDX, l1 - START
            CALL EDX
            RETN
            DB 50
    l1:
            RETN
    _TEXT ENDS
    END START

Now check the compiled code using three different disassemblers. Hiew.exe doesn't notice anything, which means that its interpretation of the code that follows the RETN command didn't change. The respectable W32Dasm program behaves the same way. IDA Pro (admittedly a superior product) reacts to the new command immediately. The fragment of the disassembled listing produced by IDA Pro is shown in Listing 1.50.

Listing 1.50: A fragment of the disassembled text produced by IDA Pro

    .text:00401000 ; ------- S U B R O U T I N E ---------------------------
    .text:00401000
    .text:00401000
    .text:00401000          public start
    .text:00401000 start    proc near               ; DATA XREF: start + 5↓o
    .text:00401000          mov     ebx, offset nullsub_1
    .text:00401005          push    offset start
    .text:0040100A          push    0               ; uType
    .text:0040100C          push    offset Caption  ; lpCaption
    .text:00401011          push    offset Text     ; lpText
    .text:00401016          push    0               ; hWnd
    .text:00401018          call    MessageBoxA
    .text:0040101D          pop     edx
    .text:0040101E          add     edx, 28h
    .text:00401024          call    edx
    .text:00401026          retn
    .text:00401026 start    endp
    .text:00401026
    .text:00401026 ;----------------------------------------------------------------
    .text:00401027           db 32h
    .text:00401028 ; [00000001 BYTES: COLLAPSED FUNCTION nullsub_1. PRESS KEYPAD "+"
    .text:00401028 ; TO EXPAND]

Note how the MOV EBX, OFFSET l1 command has been disassembled. The nullsub_1 name means that this label points at a procedure comprising only one RETN command, that is, an empty (null) procedure. The comment at the 00401028 address means that the procedure is collapsed. To expand the procedure (in other words, to view its text), press the <+> key on the numeric keypad. In this case, the expanded procedure contains only one command: RETN.

Thus, IDA Pro has separated the husk from the grain. In other words, it has separated the RETN command from the 32H code. Is this an advantage or is it a drawback? Are you surprised by this question? Assume that the source code contained simply a MOV EBX, N command, where N is some number. This number would happen to fall into some address range; however, it isn't an address of any command. Nevertheless, the disassembler would conclude that a procedure is located at this address. Such errors are not serious because there are no jumps to this address. There are also no jumps to the window procedure; however, in that case the address is determined by the call of one of the API functions (see Section 1.3).

Anyway, such erroneous detection of a procedure doesn't imply serious complications. However, if this turns out to be an address of some command, to which there will later be a "secret" jump (secret jumps will be covered further on in more detail), this might be helpful for the purposes of analyzing the code. Developers of IDA Pro were thinking logically when they considered that a number that has fallen into the range of command addresses is likely to represent an address. In my opinion, this was a correct choice.
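The reasoning attributed to the developers of IDA Pro can be illustrated by a toy check written in Python. This is only a sketch of the idea (a number that falls into the range of command addresses is treated as a possible address) and not the algorithm that IDA Pro actually uses. The section start is taken from the listings, the section size from the dumpbin.exe summary shown earlier, and the function name is hypothetical.

```python
TEXT_START = 0x00401000   # start of the code section in the listings
TEXT_SIZE = 0x1000        # section size reported in the dumpbin summary


def looks_like_code_address(value):
    """Toy heuristic: does the number fall into the code section?"""
    return TEXT_START <= value < TEXT_START + TEXT_SIZE


candidates = {
    0x00401028: "the operand of MOV EBX in the IDA Pro listing",
    0x00000028: "the constant added to EDX",
    0x00401003: "a number that points into the middle of MOV EBX",
}

for value, meaning in candidates.items():
    verdict = "possible address" if looks_like_code_address(value) else "plain number"
    print(f"{value:08X}h  {verdict:16}  {meaning}")
```

The last candidate shows the price of the heuristic: The number falls into the address range, although no command starts there.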

Now continue the empirical investigation. This time, replace the MOV EBX, OFFSET l1 command with the CALL l1 command. How would the most popular disassemblers handle this situation? IDA Pro tracks the procedure address and marks it in the listing. Hiew.exe still doesn't recognize a procedure, although it displays the CALL command. There is no reason to expect anything different from it, because its main goal is not disassembling. As relates to W32Dasm, this time this disassembler has put on a good show. Listing 1.51 shows a fragment of the listing produced by this program.

Listing 1.51: A fragment of the disassembled listing produced by W32Dasm

    //******************** Program Entry Point ********
    :00401000 E823000000              call 00401028
    :00401005 6800104000              push 00401000
    :0040100A 6A00                    push 00000000
    * Possible StringData Ref from Data Obj ->"Message"
                                      |
    :0040100C 680C304000              push 0040300C
    * Possible StringData Ref from Data Obj ->"No problem!"
                                      |
    :00401011 6800304000              push 00403000
    :00401016 6A00                    push 00000000
    * Reference To: user32.MessageBoxA, Ord:019Dh
                                      |
    :00401018 E80D000000              Call 0040102A
    :0040101D 5A                      pop edx
    :0040101E 81C228000000            add edx, 00000028
    :00401024 FFD2                    call edx
    :00401026 C3                      ret
    :00401027 32                      BYTE 32h
    * Referenced by a CALL at Address:
    |:00401000
    |
    :00401028 C3                      ret

As you can see, W32Dasm recognizes the 00401028h address as a procedure address (Referenced by a CALL at Address 00401000).
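The "Referenced by a CALL" comment can be reproduced from the listing itself. The following Python sketch walks through the address and code pairs copied from Listing 1.51, finds the direct CALL commands (the E8 opcode followed by a 32-bit displacement), and records which address calls which target. It is a simplified illustration and not the algorithm used by W32Dasm; the function name is an assumption.

```python
import struct
from collections import defaultdict

# (address, machine code) pairs copied from Listing 1.51.
INSTRUCTIONS = [
    (0x00401000, "E823000000"),
    (0x00401005, "6800104000"),
    (0x0040100A, "6A00"),
    (0x0040100C, "680C304000"),
    (0x00401011, "6800304000"),
    (0x00401016, "6A00"),
    (0x00401018, "E80D000000"),
    (0x0040101D, "5A"),
    (0x0040101E, "81C228000000"),
    (0x00401024, "FFD2"),
    (0x00401026, "C3"),
]


def build_call_xrefs(instructions):
    """Map each direct CALL target to the addresses that call it."""
    xrefs = defaultdict(list)
    for address, hex_code in instructions:
        code = bytes.fromhex(hex_code)
        if code[0] == 0xE8 and len(code) == 5:
            (displacement,) = struct.unpack("<i", code[1:5])
            # The displacement is counted from the next command.
            target = (address + len(code) + displacement) & 0xFFFFFFFF
            xrefs[target].append(address)
    return dict(xrefs)


for target, callers in build_call_xrefs(INSTRUCTIONS).items():
    print(f"{target:08X}h is called from", ", ".join(f"{c:08X}h" for c in callers))
```

Note that the call edx command (FF D2) produces no entry. Its target is known only at run time, which is exactly the situation discussed in the next section.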

Thus, the material provided in this section demonstrates that there are certain difficulties with disassembling code written in Assembly programming language. No disassembler is capable of exhaustively analyzing the code, so human investigators won't remain jobless.

Secret Jumps and Secrets of Jumps

I'd like to explain secret jumps. The following commands are widely used for passing control: JMP, the group of conditional jumps (Jxx), CALL, RETN, and LOOP. At the same time, each of these commands can imitate another command from the same list. The only reason such a programming style might be used is to confuse potential investigators of the program code. This section will cover this topic to help you take countermeasures against such tricks.

Consider the JMP command. This is the simplest command from the preceding list, provided that the jumps are considered within the framework of the flat memory model. At first glance, everything is clear and straightforward. The command carries out the jump to the specified address. When this happens, the contents of all registers (except for EIP) don't change. However, in addition to the standard jumps such as JMP l1 (where l1 is simply a label), there are indirect jumps:

JMP DWORD PTR [lo]: The lo variable contains the jump address.
JMP EBX: The EBX register contains the jump address.
JMP DWORD PTR [EBX]: The EBX register contains the address of some variable which, in turn, contains the jump address.

For example, what should you do if you see the JMP EAX command but do not know what is contained in the EAX register? This content might have been formed several hundred commands before the given command. In such a situation, no disassembler would help you. There are only two ways out: Manually analyze the text of the disassembled program, or resort to the debugger. After you finally determine the address contained in the required register, you'll be able to use the disassembler and insert a comment, specifying this value. Most contemporary disassemblers have already implemented this function. However, I'm not going to rush forward. In Chapter 2, when considering contemporary disassemblers, I'll cover this functional capability in more detail. For the moment, the most important goal is to understand the essence of the problem and find approaches to solving it.
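The difference between the jump forms can be shown by a small model in Python. The register contents, the memory contents, and all addresses are made-up values, and the resolve_jump function is hypothetical. The point is that only the direct form can be resolved from the command alone, while the other three need the state of the program at run time.

```python
def resolve_jump(kind, operand, registers, memory):
    """Return the jump target for one of the four forms of JMP."""
    if kind == "direct":          # JMP label
        return operand
    if kind == "memory":          # JMP DWORD PTR [variable]
        return memory[operand]
    if kind == "register":        # JMP EBX
        return registers[operand]
    if kind == "register_memory": # JMP DWORD PTR [EBX]
        return memory[registers[operand]]
    raise ValueError(f"unknown jump form: {kind}")


# Hypothetical run-time state: a variable at 403010h holds a code address.
registers = {"EBX": 0x00403010}
memory = {0x00403010: 0x00401028}

print(hex(resolve_jump("direct", 0x00401028, registers, memory)))
print(hex(resolve_jump("memory", 0x00403010, registers, memory)))
print(hex(resolve_jump("register", "EBX", registers, memory)))
print(hex(resolve_jump("register_memory", "EBX", registers, memory)))
```

A disassembler sees only the kind and the operand. The registers and memory arguments are what a debugger supplies.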

However, the problem being considered is complicated because any of the previously listed commands can "masquerade" as a different command. For example, the LOOP command might play the role of a short jump (up to 127 bytes forward or 128 bytes back) instead of indicating the presence of a loop.

Here are several examples. For instance, consider the program presented in Listing 1.52.

Listing 1.52: An example demonstrating nonstandard use of the RET command

    .586P
    .MODEL FLAT, STDCALL
    includelib f:\masm32\lib\user32.lib
    EXTERN        MessageBoxA@16:NEAR
    _DATA SEGMENT
    ; The address is stored here.
    meml  DD OFFSET l2
    TEXT1 DB 'No problem!', 0
    TEXT2 DB 'Message', 0
    _DATA ENDS
    _TEXT SEGMENT
    START:
            MOV  EAX, meml
    ; The next two commands are equivalent to JMP l2.
            PUSH EAX
            RETN
    l1:
            RETN
    l2:
            PUSH OFFSET 0
            PUSH OFFSET TEXT2
            PUSH OFFSET TEXT1
            PUSH 0
            CALL MessageBoxA@16
            RETN
    _TEXT ENDS
    END START

The program shown in Listing 1.52 demonstrates nonstandard use of the RET command. The PUSH/RET combination of commands is equivalent to the JMP command. Furthermore, it is possible to invent even trickier code; for example, consider Listing 1.53.

Listing 1.53: A fragment of code demonstrating another imitation of the JMP command

    MOV  EAX, meml
    SUB  ESP, 4
    MOV  DWORD PTR [ESP], EAX
    RETN

As a result, a simple jump to the l2 label takes place. Moreover, these commands might be interleaved with other commands, in which case it would be difficult to determine correctly where the jump actually leads. The main idea here is that the tricks with jumps to addresses stored in the stack can be complicated indefinitely because it is possible to place an arbitrary number of jump addresses into the stack. Furthermore, the commands can be mixed in any order. Assembly language provides unlimited possibilities in this respect.
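To make sure that both command sequences behave like a plain JMP, you can trace them on a small model of the stack. The following Python sketch is such a model, with a made-up initial ESP value and a made-up target address; the class and function names are assumptions introduced for illustration.

```python
class Cpu:
    """A minimal model: EAX, ESP, EIP, and a stack of 32-bit cells."""

    def __init__(self, esp=0x0012FFC0):
        self.eax = 0
        self.esp = esp
        self.eip = 0
        self.stack = {}

    def push(self, value):
        self.esp -= 4
        self.stack[self.esp] = value

    def retn(self):
        # RETN pops the top of the stack into EIP.
        self.eip = self.stack[self.esp]
        self.esp += 4


def jump_via_push_retn(target):
    """MOV EAX, target / PUSH EAX / RETN"""
    cpu = Cpu()
    initial_esp = cpu.esp
    cpu.eax = target
    cpu.push(cpu.eax)
    cpu.retn()
    return cpu.eip, cpu.esp == initial_esp


def jump_via_sub_mov_retn(target):
    """MOV EAX, target / SUB ESP, 4 / MOV DWORD PTR [ESP], EAX / RETN"""
    cpu = Cpu()
    initial_esp = cpu.esp
    cpu.eax = target
    cpu.esp -= 4
    cpu.stack[cpu.esp] = cpu.eax
    cpu.retn()
    return cpu.eip, cpu.esp == initial_esp


TARGET = 0x00401008  # hypothetical address of the jump destination
print(jump_via_push_retn(TARGET))
print(jump_via_sub_mov_retn(TARGET))
```

In both cases, EIP receives the target address and ESP returns to its initial value, which is exactly what a JMP to the same address would leave behind.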

The situation with conditional jumps is similar. It often is impossible to determine whether the jump condition is satisfied by analyzing the code. As a result, it becomes unclear, to which branch of the program control would be passed, and it becomes difficult even to determine whether any of the branches would be executed. For example, consider the sequences of commands provided in Listing 1.54.

Listing 1.54: A sequence of commands, for which it is hard to guess where the jump takes place

    ...
    CMP EAX, 100
    JA l1
    ...
    l1:

For such code fragments, it is difficult to determine where the jump takes place. This is because it is difficult to track what might be contained in the EAX register.

In essence, the JA command in Listing 1.54 might play the role of the JMP command, because in practice the number contained in the EAX register might be greater than 100, in which case the program fragment that follows the JA command has no practical meaning. Only a debugger might be of any help. Nevertheless, even the debugger cannot ensure the necessary results, because there is always a nonzero probability of program execution going the other way.

The technique I will describe now is called code overlapping. The main essence of this technique is as follows: Part of the command code might become a standalone command, the meaning of which is often difficult to guess. Consider the program fragment presented in Listing 1.55.

Listing 1.55: An example illustrating the code overlapping technique

    MOV  AX, 015EBH
    JMP  $ - 2
    PUSH OFFSET 0
    PUSH OFFSET TEXT2
    PUSH OFFSET TEXT1
    PUSH 0
    CALL MessageBoxA@16
    l1:
    RETN

Guessing that the 015EBH value is simply the code of the JMP SHORT l1 command (the EBh and 15h bytes) and that the MOV AX, 015EBH command simply disguises this jump to the l1 label is not a trivial task.
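The trick can be traced byte by byte. The following Python sketch assembles the fragment by hand and follows the two short jumps. It assumes that the assembler encodes MOV AX with the 66h operand-size prefix (as is usual in a 32-bit segment) and chooses the short form for JMP $ - 2. The address bytes of the PUSH and CALL commands are placeholders because only the command lengths matter here.

```python
# (source text, assumed encoding) for each command of the fragment.
FRAGMENT = [
    ("MOV AX, 015EBH",       "66 B8 EB 15"),
    ("JMP $ - 2",            "EB FC"),
    ("PUSH OFFSET 0",        "6A 00"),
    ("PUSH OFFSET TEXT2",    "68 00 00 00 00"),  # placeholder address
    ("PUSH OFFSET TEXT1",    "68 00 00 00 00"),  # placeholder address
    ("PUSH 0",               "6A 00"),
    ("CALL MessageBoxA@16",  "E8 00 00 00 00"),  # placeholder displacement
    ("RETN",                 "C3"),              # the labeled RETN
]

code = b"".join(bytes.fromhex(encoding) for _, encoding in FRAGMENT)
label_offset = len(code) - 1   # the final RETN is the last byte


def short_jump_target(code, offset):
    """Follow an EB rel8 jump located at the given offset."""
    if code[offset] != 0xEB:
        raise ValueError("no short jump at this offset")
    displacement = int.from_bytes(code[offset + 1:offset + 2], "little", signed=True)
    return offset + 2 + displacement


first_hop = short_jump_target(code, 4)            # the visible JMP $ - 2
second_hop = short_jump_target(code, first_hop)   # the jump hidden in MOV AX

print(f"JMP $ - 2 lands at offset {first_hop}, inside the MOV AX command")
print(f"the hidden jump lands at offset {second_hop}; the label is at {label_offset}")
```

Under these assumptions, the visible jump lands on the last 2 bytes of the MOV command, and those 2 bytes, read as a command, jump over the entire MessageBox call.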

Using Debug Information

The previous section explained the possibilities of confusing potential code investigators, in other words, protecting the program from anyone who would analyze it with malicious intentions. There is also the other side of the coin. Often, the developer must disassemble his or her own program to understand how it works and to eliminate implementation errors and bugs. For this purpose, debug info is often used (see Section 1.5.7).

Most contemporary debuggers and disassemblers interpret the debug info well and are capable of correctly reconstructing the program being investigated. Unfortunately, Assembly language makes little use of the debug info, which mainly relates to variable names. In principle, variable names are identified satisfactorily by disassemblers such as IDA Pro. The only limitation is that IDA Pro cannot determine the true name of a variable if the module doesn't contain the debug info.

To include the debug info when translating a program using MASM32, it is necessary to include the /Zi command-line option in the ml.exe command line and use the /DEBUG command-line option in the link.exe command line. The debug info is added into the file that has the same name as the executable module and the PDB file name extension (PDB stands for program database). It is also possible to use the /PDB:NONE command-line option, in which case the debug info will be placed into the executable module. Finally, it is possible to specify the type of the debug info using /DEBUGTYPE:{CV|COFF}, where CV designates the debug info intended for the CodeView debugger and COFF stands for the debug info in the COFF format. Similarly, when using the TASM32 assembler, the debug info can be included in the executable module. To achieve this, the tasm32.exe command line must include the /zi command-line option (include all debug info), and the tlink32.exe command line must include the /v command-line option. If these requirements have been observed, all information related to variables and operations over them will be placed into the executable module. Later, this information will be available to disassemblers and debuggers. Note that all information will be stored, even information about variables that are not used in the program.

1.6.2. About Dynamic Modification of the Executable Code

On one hand, self-modifying code doesn't correspond to the "code and data" programming paradigm, according to which the program is made up of the code that must be executed and the data that must be read and, if necessary, modified. On the other hand, there exists the Von Neumann principle, the rough interpretation of which makes no fundamental distinction between the data and the code. According to this interpretation, both the code and the data are simply sequences of bytes or bits (according to your preference). Therefore, dynamic code modification is an excellent technique that allows you to disguise the intentions of the program.

Programmers who have experience with MS-DOS programming know that code modification during its execution is a simple matter. Under MS-DOS, it is possible to modify the content of any memory cell, no matter what is contained there — the code or the data. Under Windows, code cannot be modified directly. Also, it is impossible to execute code located in the data segment or in the dynamic memory area. To obtain the possibility of doing so, the program must run in ring 0. Thus, for a normal program, all possibilities of dynamic code modification are prohibited. However, there are several ways out, which will be covered in this section.

Execution in the Stack

Code execution in the stack is probably the best method of self-modification that a program can implement. Memory pages allocated for the stack have attributes that allow reading and writing of data from and to the stack and even allow code to be executed there. The code can be modified as it is moved. Finally, Assembly commands can be stored in the data segment then moved to the stack and executed there. High-level programming languages allow you to use the stack, although with several limitations — sometimes considerable. Assembly language freely allows you to use the stack without encountering any serious difficulties. However, some problems can arise even here.

For instance, consider a simple console application (Listing 1.56).

Listing 1.56: A simple console application intended for investigating the code self-modification problem

    .586P
    .MODEL FLAT, STDCALL
    includelib f:\masm32\lib\user32.lib
    EXTERN MessageBoxA@16:NEAR
    ;-----------------------------------------------
    _DATA SEGMENT
    TEXT1 DB 'I am in the stack!', 0
    TEXT2 DB 'Message from the stack', 0
    _DATA ENDS
    _TEXT SEGMENT
    START:
    ; Call a procedure
          CALL PROC1
          RETN
    PROC1 PROC
          PUSH 0
          PUSH OFFSET TEXT2
          PUSH OFFSET TEXT1
          PUSH 0
          CALL MessageBoxA@16
          RETN
    PROC1 ENDP
    _TEXT ENDS
    END START

Name this program prog.asm. To compile and link it, issue the following commands:

    ML /c /coff prog.asm
    LINK /SUBSYSTEM:CONSOLE prog.obj

As a result, the prog.exe executable module will appear, which would display MessageBox with a corresponding message when it is started for execution.

Now, try to launch a frontal attack at the problem. Copy the contents of the PROC1 procedure into the stack and try to run the procedure there. The program that illustrates this approach is shown in Listing 1.57.

Listing 1.57: A program that copies the contents of PROC1 into the stack and tries to run it there

    .586P
    .MODEL FLAT, STDCALL
    includelib f:\masm32\lib\user32.lib
    EXTERN MessageBoxA@16:NEAR
    ;----------------------------------------------
    _DATA SEGMENT
    TEXT1 DB 'I am in the stack!', 0
    TEXT2 DB 'Message from the stack', 0
    _DATA ENDS
    _TEXT SEGMENT
    START:
    ; Prepare the stack.
          MOV EDX, ESP
          MOV ECX, OFFSET L1
          SUB ECX, PROC1
    ; Allocate space in the stack.
          SUB ESP, ECX
    ; Copy the code into the allocated space.
          MOV EDI, ESP
          LEA ESI, PROC1
          CLD
          REP MOVSB
    ; Call the procedure from the stack.
          CALL ESP
    ; Restore the stack.
          MOV ESP, EDX
          RETN
    PROC1 PROC
          PUSH 0
          PUSH OFFSET TEXT2
          PUSH OFFSET TEXT1
          PUSH 0
          CALL MessageBoxA@16
          RETN
    PROC1 ENDP
    ; L1 marks the end of the code to be copied.
    L1:
    _TEXT ENDS
    END START

The result will be disappointing. When you start this program for execution, the operating system would display an error message. Using the OllyDbg debugger, try to find out why this happens. Start the program under the debugger, and execute it in step-by-step mode. Having reached the CALL ESP command, press the <F7> key. You'll find yourself in the stack location, where the procedure was copied. At first glance, it seems that the code has been copied correctly (Listing 1.58).

Listing 1.58: The stack location, to which the code of the PROC1 procedure has been copied

    000CFFB0        6A        00              PUSH        0
    000CFFB2        68        0B304000        PUSH        40300B
    000CFFB7        68        00304000        PUSH        403000
    000CFFBC        6A        00              PUSH        0
    000CFFBE        E8        02000000        CALL        000CFFC5
    000CFFC3        C3                        RETN

However, the address to which the CALL command passes control (000CFFC5) also resides in the stack, and there is no jump to MessageBox at that address. The explanation is straightforward. For the CALL MessageBoxA@16 command, the assembler generates a relative address: The E8 opcode is followed by a displacement counted from the next command. When the code is moved, the same displacement points to a different location. This is the cause of the problem! What could be done about it? Do you really need to correct the address when moving the code to the stack? Fortunately, there is another way of calling the procedure. This call appears as follows: LEA EBX, MessageBoxA@16/CALL EBX. Here the register receives an absolute address, which doesn't depend on the location of the code. Check whether this works by rewriting the program (Listing 1.59).
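The effect of moving a relative CALL can be computed directly. The following Python sketch takes the 5 bytes of the CALL command from the stack dump and calculates its target at two locations: in the stack (the address is from the dump) and in the code section. The address in the code section is a made-up value, because the original address of the command isn't shown in the text.

```python
import struct

# The CALL command as it appears in the stack dump: E8 plus a displacement.
CALL_BYTES = bytes.fromhex("E8 02 00 00 00")


def call_target(address, code):
    """Target of an E8 rel32 call placed at the given address."""
    (displacement,) = struct.unpack("<i", code[1:5])
    # The displacement is added to the address of the next command.
    return (address + len(code) + displacement) & 0xFFFFFFFF


STACK_ADDRESS = 0x000CFFBE      # from the stack dump
ORIGINAL_ADDRESS = 0x00401030   # hypothetical address in the code section

in_stack = call_target(STACK_ADDRESS, CALL_BYTES)
in_code = call_target(ORIGINAL_ADDRESS, CALL_BYTES)

print(f"in the stack: CALL {in_stack:08X}h")
print(f"in the code : CALL {in_code:08X}h")
```

The bytes are the same, but the targets differ by exactly the distance between the two copies. The copy in the stack calls a location 7 bytes past its own start, where the jump to the imported function no longer exists.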

Listing 1.59: A modified version of the program presented in Listing 1.57

    .586P
    .MODEL FLAT, STDCALL
    includelib f:\masm32\lib\user32.lib
    EXTERN MessageBoxA@16:NEAR
    ;----------------------------------------------
    _DATA SEGMENT
    TEXT1 DB 'I am in the stack!', 0
    TEXT2 DB 'Message from the stack', 0
    _DATA ENDS
    _TEXT SEGMENT
    START:
    ; Prepare the stack.
          MOV EBP, ESP
          MOV ECX, OFFSET L1
          SUB ECX, PROC1
    ; Allocate space in the stack.
          SUB ESP, ECX
    ; Copy the code.
          MOV EDI, ESP
          LEA ESI, PROC1
          CLD
          REP MOVSB
    ; Call the procedure from the stack.
          CALL ESP
    ; Restore the stack.
          MOV ESP, EBP
          RETN
    PROC1 PROC
          PUSH 0
          PUSH OFFSET TEXT2
          PUSH OFFSET TEXT1
          PUSH 0
    ; Call through a register, using an absolute address.
          LEA EBX, MessageBoxA@16
          CALL EBX
          RETN
    PROC1 ENDP
    L1:
    _TEXT ENDS
    END START

Translate and run the program. This time there is no error; however, the MessageBox looks somewhat crippled. To be more precise, it doesn't contain any text. Running the program under the debugger doesn't clarify the situation, although this time it is possible to make sure that the call to MessageBox is carried out at the correct address. What's wrong? Conduct the following experiment. Replace the CALL ESP command with CALL PROC1; in other words, check whether the procedure as such would execute. Strangely, the result will be the same. What could be the cause of this error? Because the procedure executed correctly earlier, try to remove the commands for copying the procedure into the stack, one command at a time. This will help you detect which command produces the error. As it turns out, this is the SUB ESP, ECX command. At this point, some suspicions should arise. What's wrong with this command? Such commands are widely used by all assemblers and compilers. The value stored in ECX is not large enough to go beyond the stack boundaries, and even if this happened, it would cause a different error.

After some consideration, the following idea comes to mind: The address in the stack must be a multiple of four. In the program under consideration, this requirement has not been met. Try to correct the contents of ECX before subtracting it from ESP. There are different methods of achieving this goal. For instance, this might be done as follows: SHL ECX, 2. In other words, multiply the ECX content by four. The same result might be obtained as follows (if you consider multiplying by four to be too wasteful): AND ECX, 0FFFFFFFCH/SHL ECX, 1. In both cases, the result will be positive, because the code copied into the stack will work correctly. However, the simplest way of correcting this program is to use the ALIGN 4 directive to ensure that the addresses of the PROC1 procedure and the L1 label are aligned on a double-word boundary. The final version of the program that copies the procedure into the stack and executes it there is shown in Listing 1.60.

Listing 1.60: The final version that copies the procedure code into the stack and executes it there

.586P
.MODEL FLAT, STDCALL
includelib f:\masm32\lib\user32.lib
EXTERN MessageBoxA@16:NEAR
;----------------------------------------------
_DATA SEGMENT
TEXT1 DB "I'm in the stack!", 0
TEXT2 DB 'Message from the stack', 0
_DATA ENDS
_TEXT SEGMENT
START:
; Prepare the stack.
      MOV EBP, ESP
      MOV ECX, OFFSET L1
      SUB ECX, PROC1
; Allocate the space in the stack.
      SUB ESP, ECX
; Copy the code into the stack.
      MOV EDI, ESP
      LEA ESI, PROC1
      CLD
      REP MOVSB
; Call the procedure from the stack.
      CALL ESP
; Restore the stack.
      MOV ESP, EBP
      RETN
      ALIGN 4
PROC1 PROC
      PUSH 0
      PUSH OFFSET TEXT2
      PUSH OFFSET TEXT1
      PUSH 0
      LEA EBX, MessageBoxA@16
      CALL EBX
      RETN
PROC1 ENDP
      ALIGN 4
L1:
_TEXT ENDS
END START

Thus, everything is straightforward, provided that you follow two simple rules: Procedures must be called through a register, and the code must be aligned on a 4-byte boundary.

However, there is another problem. What should you do with jumps? If a jump uses a 4-byte address that must be stored in a relocatable fragment, then the code copied into the stack won't work correctly. This problem also has a simple solution: All such jumps must be short jumps, which store only a 1-byte offset relative to the next command. No special steps must be taken, because the assembler automatically makes a jump short if its target lies within the range of 128 bytes. You only need to ensure that all required jumps and procedure calls fall within this interval.
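Why a short jump survives being copied can be shown with a small model. The sketch below (Python, with hypothetical addresses and a hypothetical helper name) encodes the two-byte short form, opcode EBH followed by a signed 8-bit displacement counted from the end of the command, and compares the bytes produced at two different locations.

```python
def encode_short_jmp(jmp_addr, target_addr):
    """Return the bytes of a short JMP located at jmp_addr that targets target_addr."""
    # The displacement is relative to the command that follows the 2-byte jump.
    disp = target_addr - (jmp_addr + 2)
    if not -128 <= disp <= 127:
        raise ValueError("target is out of range for a short jump")
    return bytes([0xEB, disp & 0xFF])


if __name__ == "__main__":
    # The same jump, 16 bytes forward, in the code section and in the stack.
    in_code = encode_short_jmp(0x00401020, 0x00401030)
    in_stack = encode_short_jmp(0x0012FF80, 0x0012FF90)
    print(in_code.hex(), in_stack.hex(), in_code == in_stack)
```

Because only the distance is encoded, the bytes are identical wherever the fragment is placed, as long as the jump and its target are copied together.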

Using the WriteProcessMemory Function

Another method of modifying the code dynamically at run time is to use the WriteProcessMemory API function. Using this function, it is possible to write data into the address space of a process. The area into which the data are written must be available for writing; otherwise, the write operation won't be carried out and the function returns zero (after a successful write operation, the function returns a nonzero value). Consider the parameters of this function in more detail.

Parameter 1 — This is the descriptor of the process into whose address space the function is going to write the data.
Parameter 2 — This is the address of the process memory, into which the function is going to write.
Parameter 3 — This is the pointer to the data buffer, from which the data will be written into the process memory.
Parameter 4 — This is the number of bytes that will be written into the process memory.
Parameter 5 — This is the pointer to the variable that will store the number of bytes written into the process memory. If this parameter is zero, it will be ignored.

As already mentioned, before writing anything into the process memory, it is necessary to obtain the process descriptor. To achieve this, it is enough to open the process using the OpenProcess function. This function is used any time some other function requires the descriptor of the process to execute. Consider the parameters of this function:

Parameter 1 — This is the desired level of access to the process. All access levels are mapped to constants and listed in the documentation and header files. The names of these constants start with the PROCESS_ prefix. For writing into the process memory, the combination of the following two constants is needed: PROCESS_VM_OPERATION and PROCESS_VM_WRITE.
Parameter 2 — This parameter can take two values. If this parameter is set to one, then the descriptor can be inherited; otherwise (the parameter is set to zero), the descriptor cannot be inherited.
Parameter 3 — This is the identifier of the process that you need to open.
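As a quick check of the access mask, the two constants can be combined with a bitwise OR. In the Python sketch below, the constant values are the ones used in Listing 1.61; the helper name is hypothetical.

```python
PROCESS_VM_OPERATION = 0x0008
PROCESS_VM_WRITE = 0x0020


def can_write_memory(access_mask):
    """True if the mask contains both rights needed for writing into the process."""
    required = PROCESS_VM_OPERATION | PROCESS_VM_WRITE
    return (access_mask & required) == required


if __name__ == "__main__":
    mask = PROCESS_VM_OPERATION | PROCESS_VM_WRITE
    print(hex(mask), can_write_memory(mask), can_write_memory(PROCESS_VM_WRITE))
```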

Finally, it is necessary to describe how the process identifier can be obtained. Because you are studying the task of writing into your own code, it is possible to use the GetCurrentProcessId API function. This function doesn't require any parameter and returns the identifier of the calling process.

An example of a self-modifying program is shown in Listing 1.61. This console program writes the C3H byte (the opcode of the RETN command) at the RETE address. Without this write, the program would fall into an endless loop at the JMP RETE command and would never complete its execution without external influence.

Listing 1.61: An example of a self-modifying program that uses the WriteProcessMemory function

.586P
.MODEL FLAT, STDCALL
PROCESS_VM_OPERATION    =        0008H
PROCESS_VM_WRITE        =        0020H
PROCESS_VM_OW           =        PROCESS_VM_OPERATION OR PROCESS_VM_WRITE
includelib f:\masm32\lib\user32.lib
includelib f:\masm32\lib\kernel32.lib
EXTERN OpenProcess@12:NEAR
EXTERN WriteProcessMemory@20:NEAR
EXTERN GetCurrentProcessId@0:NEAR
;-----------------------------------------------
_DATA SEGMENT
OPC        DB 0C3H
_DATA ENDS
_TEXT SEGMENT
START:
        CALL GetCurrentProcessId@0
; EAX contains the identifier of the current process.
        PUSH EAX
        PUSH 1
        PUSH PROCESS_VM_OW
        CALL OpenProcess@12
; EAX contains the descriptor of the opened process.
        PUSH 0
        PUSH 1
        PUSH OFFSET OPC
        PUSH OFFSET RETE
        PUSH EAX
        CALL WriteProcessMemory@20
RETE:
        JMP  RETE
        RETN
_TEXT ENDS
END START
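Two conventions help when reading Listing 1.61: under STDCALL the arguments are pushed from the last parameter to the first, and the number after @ in a decorated name is the total size of the arguments in bytes. The following Python sketch (the helper names are hypothetical) derives the parameter count from the decorated name and produces the pushes in the order in which they appear in the listing.

```python
def param_count(decorated_name):
    """WriteProcessMemory@20 -> 5, because each argument occupies 4 bytes."""
    _, _, size = decorated_name.rpartition("@")
    nbytes = int(size)
    if nbytes % 4 != 0:
        raise ValueError("argument size must be a multiple of 4")
    return nbytes // 4


def push_sequence(decorated_name, args):
    """List the commands for a STDCALL call; args are given as parameters 1..N."""
    if len(args) != param_count(decorated_name):
        raise ValueError("argument count does not match the decorated name")
    # The last parameter is pushed first.
    return [f"PUSH {a}" for a in reversed(args)] + [f"CALL {decorated_name}"]


if __name__ == "__main__":
    call = push_sequence("WriteProcessMemory@20",
                         ["EAX", "OFFSET RETE", "OFFSET OPC", "1", "0"])
    print("\n".join(call))
```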

Note

After the descriptor of some object has been used, it is necessary to close it using the CloseHandle function. In the preceding example, the system closes all handles automatically.

The use of the WriteProcessMemory function has certain drawbacks compared with code execution in the stack. First, this function modifies the existing code of the process but cannot increase the memory size to add new code. Second, code execution in the stack is stealthier than a call to WriteProcessMemory, which can be easily detected by any literate code digger.

Using the VirtualProtectEx Function

Instead of writing into the process memory using the WriteProcessMemory function, it is possible to use the VirtualProtectEx API function to allow write access to the required bytes (more precisely, to the pages where these bytes reside) and then use the normal MOV command.

Listing 1.62 presents a program similar to the one shown in Listing 1.61. The difference between these programs is that the program in Listing 1.62 uses the VirtualProtectEx function. Like the previous example, the C3H byte is written at the RETE address; however, this time this goal is achieved using a simple MOV command.

Listing 1.62: An example of self-modifying code that uses the VirtualProtectEx function

.586P
.MODEL FLAT, STDCALL
PROCESS_VM_OPERATION    = 0008H
PROCESS_VM_WRITE        = 0020H
PROCESS_VM_OW           = PROCESS_VM_OPERATION OR PROCESS_VM_WRITE
PAGE_WRITECOPY          = 8
PAGE_EXECUTE            = 10h
includelib f:\masm32\lib\user32.lib
includelib f:\masm32\lib\kernel32.lib
; Imported functions
EXTERN OpenProcess@12:NEAR
EXTERN FlushInstructionCache@12:NEAR
EXTERN VirtualProtectEx@20:NEAR
EXTERN GetCurrentProcessId@0:NEAR
;--------------------------------------------------
_DATA SEGMENT
HANDLE  DD ?
NN      DD ?
_DATA ENDS
_TEXT SEGMENT
START:
        CALL GetCurrentProcessId@0
; Open the current process.
        PUSH EAX
        PUSH 1
        PUSH PROCESS_VM_OW
        CALL OpenProcess@12
; Allow copying of the byte at the RETE address.
        MOV  HANDLE, EAX
        PUSH OFFSET NN
        PUSH PAGE_WRITECOPY
        PUSH 1
        PUSH OFFSET RETE
        PUSH EAX
        CALL VirtualProtectEx@20
; Change the byte at the RETE address.
        LEA  EAX, RETE
        MOV  BYTE PTR [EAX], 0C3H
; Return the initial attribute to the byte.
        PUSH OFFSET NN
        PUSH PAGE_EXECUTE
        PUSH 1
        PUSH OFFSET RETE
        PUSH HANDLE
        CALL VirtualProtectEx@20
; Flush the cache.
        PUSH 1
        PUSH OFFSET RETE
        PUSH HANDLE
        CALL FlushInstructionCache@12
RETE:
        JMP  RETE
        RETN
_TEXT ENDS
END START

Consider the parameters accepted by the VirtualProtectEx function:

Parameter 1 — Handle of the process whose memory has to be modified.
Parameter 2 — Address of the memory region whose attribute is going to be modified.
Parameter 3 — Size of the memory region to be modified. The attribute is changed for all memory pages containing the bytes of the memory region to be modified.
Parameter 4 — Set of attributes (see Listing 1.62).
Parameter 5 — Address of the variable that will store the old attribute of the first of the range of pages (if there are several pages).
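Because the attribute is applied to whole pages, even a 1-byte request affects an entire page, and a region that straddles a page boundary affects two. A minimal sketch of this rounding in Python, assuming the usual 4,096-byte page size of 32-bit x86 Windows (the function name and the addresses are hypothetical):

```python
PAGE_SIZE = 0x1000  # assumed: 4,096-byte pages


def pages_affected(address, size):
    """Return the base addresses of all pages touched by [address, address + size)."""
    if size <= 0:
        raise ValueError("size must be positive")
    first = address & ~(PAGE_SIZE - 1)              # round down to a page base
    last = (address + size - 1) & ~(PAGE_SIZE - 1)  # page holding the last byte
    return list(range(first, last + PAGE_SIZE, PAGE_SIZE))


if __name__ == "__main__":
    # One byte in the middle of a page, then two bytes across a boundary.
    print([hex(p) for p in pages_affected(0x00401031, 1)])
    print([hex(p) for p in pages_affected(0x00401FFF, 2)])
```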

In addition, the program contains a function that wasn't described earlier. This is the FlushInstructionCache function. It is needed to flush the buffer containing commands. If this is not done, the processor might execute the old commands without noticing the changes introduced into the memory. The parameters of this function are as follows.

Parameter 1 — Descriptor of the process whose memory is going to be changed.
Parameter 2 — Address of the memory region that has been changed.
Parameter 3 — Size of the modified memory region.

At this point, coverage of self-modifying code has been completed. Do not forget about this capability when starting code analysis.