Understanding the program structure of the executable module is often more important than recognizing variables, because it allows you to understand the program's operating logic.
You have already encountered procedures and functions [7] many times. The main goal of this section is to generalize accumulated experience and investigate new features.
Until now, it was silently assumed that data are passed to the procedure through the stack. This mechanism, which will be considered in the next section, is common. However, this approach is not the only available one.
For the moment, abstract from compilers and simply consider how and in which way it is possible to pass parameters to the procedure. If you are working in Assembly, you'll be able to add all of these mechanisms to your arsenal. Furthermore, nothing can prevent you from combining several such mechanisms simultaneously. However, when working with compilers created for high-level programming languages, it is necessary to account for generally-adopted conventions, which will be covered in the next few sections.
Passing parameters through the stack is the most common and widely used mechanism. This approach allows you to create recursive procedures, but the use of other approaches makes recursion problematic. As a rule, parameters are placed into the stack using the PUSH commands. However, another method is possible, which you have encountered multiple times. It is possible to manually change the value of the stack pointer and then use normal MOV commands to place the parameters into the allocated region. For example, if two parameters are loaded into the EAX and EBX registers, respectively, then it is possible to place them into the stack using the following sequence of commands: SUB ESP, 8/MOV [ESP], EAX/MOV [ESP], EBX. This is equivalent to the two PUSH EAX/PUSH EBX commands (recall that the stack grows upward in the direction of smaller addresses).
When passing parameters, the most important issue is the order, in which the parameters appear in the stack. When receiving parameters from the stack, the called procedure follows a strictly defined order, which must be observed when calling that procedure. However, this is only one problem. The second problem is clearing the stack. After the called procedure has executed all required operations and returned control into the calling program fragment, the parameters passed to the procedure remain in the stack. If the procedure is called multiple times, this, in the long run, might crash the program. There are two practical approaches to solving this problem. The first method is used only in the C++ programming language. Using this method, the stack is released after the return from the called procedure. It is convenient because it is possible to use procedures with a variable number of parameters.
Note | The printf standard C library function is an example of such a procedure. The first parameter of this function is always a string that might contain special substrings (called format specifiers). Format specifiers start with the % character. The number of such specifiers is equal to the number of additional parameters of the printf function. |
As a rule, the stack is restored using the ADD ESP, 4*N command, where N is the number of 32-bit parameters. [8] However, alternative ways, such as using SUB ESP, -4*N or even PoP commands, are possible. It is important to understand their meaning. Sometimes the compiler, for economy, restores the stack after calling several procedures.
The second method of stack recovery consists of using the RETN 4*N command immediately after exiting the procedure. Again, N specifies the number of 32-bit parameters. This approach was initially used in Pascal compilers. This approach is slightly faster. However, it makes it problematic to call a procedure with a variable number of parameters.
The use of global variables for passing information into a procedure suggests itself. However, this approach is a persistent source of headaches. To avoid errors, you'll have to allocate an individual set of global parameters for every procedure, which requires additional memory resources. The use of global variables also makes recursive calls problematic, because you won't be able to use the same variables if they are already in use. However, this drawback doesn't mean that this approach is not used. Nothing prevents you from using it when writing a program in C++ or Delphi.
The preceding approach can be improved by using a specially organized memory block for passing parameters. You'll probably have to organize such a block individually for each procedure, although, in theory, it is possible to create a structure of universal buffer for passing parameters to all called procedures. The structure of such a buffer can be organized to make it possible to use recursive procedure calls.
Passing parameters through program code looks somewhat exotic. However, it is a realistic method, provided that you use Assembly language. For example, consider the algorithm in Listing 3.45.
Listing 3.45: Algorithm for passing parameters through the program code
... CALL PROC1 DB "This parameter is passed through the program code", 0 ; The PROC1 procedure will return control here. ... PROC1 PROC ; Pop the return address from the stack. ; Define the parameter addresses and length. ; Modify the return address in the stack. ; Process. Return from the procedure. RETN PROC1 ENDP
As you can see, this method doesn't contain anything too difficult or impossible. However, its implementation in a high-level programming language requires additional effort.
The method of passing parameters through registers is fast. However, it has certain limitations, because registers are few. This approach is mainly used with other mechanisms, such as passing parameters through the stack. When using this combined approach, the first parameters are usually passed through registers and the remaining parameters are passed through the stack. Later in this chapter, this approach will be covered in more detail.
Consider compilers of high-level programming languages. As you would expect, they mainly pass parameters through the stack. The main calling conventions used by contemporary compilers are listed in Table 3.2.
Calling convention | Order of parameters | Stack-clearing method | Comment |
---|---|---|---|
C convention (__cdecl) | From right to left | By the calling program | The compiler automatically inserts the underscore character (_) before the function name. |
Standard calling convention (__stdcall) | From right to left | By the called procedure | The compiler automatically inserts the underscore character (_) before the function name. The function name is terminated by the @ suffix followed by the number specifying the total length of all parameters (in bytes). |
Pascal calling convention (PASCAL) | From left to right | By the called procedure | This calling convention is used in Pascal and Delphi. |
Fast calling convention, also known as register call (__fastcall) | From left to right | By the called procedure | Microsoft's C++ compiler employs two registers (ECX and EDX). If this is not enough for passing all parameters, then the remaining parameters are passed through the stack. The Borland C++ compiler uses three registers (EAX, EDX, and ECX). |
Note | The calling conventions listed in Table 3.2 are not the only available ones. In different programming languages, there are language-specific conventions. For example, Delphi supports the safecall convention, and Basic has its individual calling convention. Some calling conventions have been gradually moved out of use. For example, the Pascal (__pascal) calling convention is no longer supported in Microsoft Visual C++. |
When writing programs in C++, the most common calling conventions are __cdecl (when working with normal and library functions) and __stdcall (when calling most API functions).
As an illustration of the use of register calling conventions (fast function calls), consider the simple program shown in Listing 3.46.
Listing 3.46: Simple program illustrating the use of the__fastcall calling convention
#include <stdio.h> int __fastcall add(int, int, int); void main() { int i = 10, j = 20, k = 30; printf("%d\n", add(i, j, k)); }; int__fastcall add(int a, int b, int c) { return a + b + c; };
As you can see, the program in Listing 3.46 contains a function declared as __fastcall. First, consider the disassembled text of the executable code of this program produced by the Microsoft Visual C++ compiler (Listing 3.47).
Listing 3.47: Disassembled text (Listing 3.46) produced by Microsoft Visual C++
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p .text:00401000 var_C = dword ptr -OCh .text:00401000 var_8 = dword ptr -8 .text:00401000 var_4 = dword ptr -4 .text:00401000 push ebp .text:00401001 mov ebp, esp .text:00401003 sub esp, 0Ch .text:00401006 mov [ebp + var_4], 0Ah .text:0040100D mov [ebp + var_C], 14h .text:00401014 mov [ebp + var_8], lEh .text:0040101B mov eax, [ebp + var_8] .text:0040101E push eax .text:0040101F mov edx, [ebp + var_C] .text:00401022 mov ecx, [ebp + var_4] .text:00401025 call sub_401040 .text:0040102A push eax .text:0040102B push offset unk_4060FC .text:00401030 call _printf .text:00401035 add esp, 8 .text:00401038 xor eax, eax .text:0040103A mov esp, ebp .text:0040103C pop ebp .text:0040103D retn .text:0040103D_main endp
The code presented in this listing is well known. However, there is one issue that you did not encounter earlier. According to the program (see Listing 3.46), the add function must have three parameters. Obviously, sub_401040 corresponds to the add function. Later, the mov eax, [ebp + var_8] /push eax commands send the last variable into the stack (this is the k variable). The values of the i and j variables are placed into the ECX and EDX registers, respectively. This corresponds to the fastcall calling convention typical for the Microsoft Visual C++ compiler. The documentation supplied with the compiler states that it uses the __fastcall calling convention whenever possible. As you can see, this is true. If the number of parameters is increased, then the compiler will pass the remaining parameters in a normal way, namely, through the stack. This can be easily explained because the procedure that will be called also needs registers. So, as the number of parameters is increased, the number of general-purpose registers will not be enough and it will be necessary to create local stack variables.
Listing 3.48 presents the disassembled code of the same program compiled using the Borland C++ compiler.
Listing 3.48: Disassembled code (Listing 3.46) compiled using the Borland C++ compiler
.text:00401108 _main proc near ; DATA XREF: .data:0040A0B8;↓o .text:00401108 argc = dword ptr l0h .text:00401108 argv = dword ptr 14h .text:00401108 envp = dword ptr 18h .text:00401108 push ebx .text:00401109 push esi .text:0040110A push edi .text:0040110B mov ebx, OAh .text:00401110 mov esi, 14h .text:00401115 mov edi, 1Eh .text:0040111A mov ecx, edi .text:0040111C mov edx, esi .text:0040111E mov eax, ebx .text:00401120 call sub_401138 .text:00401125 push eax .text:00401126 push offset format ; Format .text:0040112B call _printf .text:00401130 add esp, 8 .text:00401133 pop edi .text:00401134 pop esi .text:00401135 pop ebx .text:00401136 retn .text:00401136 _main endp
As you can see from Listing 3.48, the Borland C++ compiler sends parameters sequentially into the EAX, EDX, and ECX registers. Note that the Borland's compiler uses register variables in the EBX, EST, and EDI registers instead of stack variables. In contrast to the Microsoft Visual C++ compiler, the Borland C++ compiler is serious about the __fastcall modifier and doesn't neglect the instruction for using registers as the number of parameters increases.
Throughout this chapter, I have provided lots of different listings, in which I try to draw your attention to the locations of the return address, parameters, and local and temporary variables within the stack. The main goal of this section is to generalize accumulated experience and supply new information.
The standard stack structure in the course of a procedure call is shown in Fig. 3.2. This illustration shows the stages that the stack undergoes. The process of stack modification starts from the procedure call (stages 1–3), during which the parameters are placed into the stack and the procedure is called. During stages 4–5, memory is allocated and the registers that will be used within the procedure, whose values must not be changed after the call, are saved into the stack.
Figure 3.2: Standard stack structure in the course of a procedure call
Consider the stages shown in Fig. 3.2 in more detail:
Usually, parameters are placed into the stack using reg32 or PUSH DWORD PTR mem commands, where reg32 is a 32-bit register and mem is the address of the memory area (direct or indirect). However, another method of placing parameters into the stack is possible. First, the area for parameters is allocated in the stack. This can be carried out, for example, as follows: SUB ESP, N. Here, Nn is the number of bytes required for storing parameters, aligned by the 4-byte boundary. Then, the parameters are loaded into the stack using standard MOV commands. For example, this task can be carried out as follows: MOV DWORD PTR [ESP], EAX/MOV DWORD PTR [ESP + 41, EBX, etc. When dealing with double operands (which are 8 bytes in size), the ESTP command is used to place them into the stack, for example: FSTP DWORD PTR [ESP]. Thus, 8 bytes from the ST(0) FPU register will be sent to the stack (see Listing 3.12 and the comments that follow it).
The CALL command places the return address into the stack directly after parameters (if there are any). To correctly return from the procedure, this address must be located on the top of the stack. In addition, the CALL command jumps to the address specified to it. Now all work related to stack modification is delegated to the procedure. As a rule, the procedure starts with the PUSH EBP command. This command immediately assumes further use of EBP, and this register probably will be used for addressing the stack variable and parameters. The presence of the Mov EBP, ESP command confirms this assumption. For what purpose is this necessary? The ESP register is bound to the PUSH and POP commands that change it automatically. Consequently, if the parameter nearest to the stack top was located in the start of the procedure at the [ESP + 4] address, then after the execution of the PUSH command it will be located at the [ESP + 8] address. Thus, the EBP register is used to fix the reference point, from which locations of the parameters and stack variables are counted.
The next step in the procedure of forming the stack structure is allocation of the memory area for storing local variables. Note that if the use of local variables is not presumed, then the compiler skips this step. As a rule, stack allocation is carried out by the SUB ESP, N command, where N stands for the number of allocated bytes, aligned by the 4-byte boundary. In some cases, however, it is possible to use the ADD ESP, -N command or several PUSH commands. The use of the PUSH command is convenient, because within the same command it is possible to combine stack allocation and variable initialization (see Listing 3.36 and the comments that follow it). The sequence of the PUSH EBP/Mov EBP, ESP/SUB ESP, N commands can be replaced with a single ENTER N command, which, however, is rarely used by the compilers because of its slowness.
Finally, if it is presumed that the EBX, EST, and EDT registers are used in the procedure, they also must be saved in the stack.
In the end of the procedure, the stack must be returned to the state, in which the address of return from the procedure was located on its top. In addition, it is necessary to restore the EBP, EBX, EST, and EDT registers (provided that they were modified). The most common is the sequence of Mov ESP, EBP/POP EBP commands, which the compiler often replaces with a single leave command.
If the preceding method was strictly observed, then there will be no problems with recognizing the procedure in the course of disassembling, even if the procedure was called using indirect call commands (CALL reg32, CALL [reg32], and CALL[mem]). However, contemporary compilers, because of optimization, abandon the use of the EBP register for addressing stack variables and parameters (see Listing 3.38 and the comments that follow it).
In my opinion, the most interesting issue is the one related to nested procedures. In C++, nested functions are not possible. Pascal, in contrast, allows such constructs (Listing 3.49).
Listing 3.49: Pascal program with nested procedures
program Projectl; var a:integer; procedure procl(al:integer); var b, g, d, e:integer; procedure proc2(al:integer); var c:integer; begin c := 30; writeln(al, b, c, d, e, g); end; begin b := 20; g 30; d:= 40; e := 50; proc2(al); end; begin a := 10; procl(a); end.
Listing 3.50 provides the disassembled starting (main) part of the program (see Listing 3.49) compiled using Delphi.
Listing 3.50: Disassembled code of the program shown in Listing 3.49, compiled using Delphi
CODE:004039B4 public start CODE:004039B4 start: CODE:004039B4 push ebp CODE:004039B5 mov ebp, esp CODE:004039B7 add esp, OFFFFFFFOh CODE:004039BA mov eax, ds:off_4040A8 CODE:004039BF mov byte ptr [eax], 1 CODE:004039C2 mov eax, offset dword_403994 CODE:004039C7 call sub_403860 CODE:004039CC mov ds:dword_40565C, OAh CODE:004039D6 mov eax, ds:dword_40565C CODE:004039DB call sub_403938 CODE:004039EO call sub_403394
Listing 3.50 shows the starting part of the program (see Listing 3.49). Of the three procedure calls shown in this listing, one is the call to the procedure directly present in the application program (procl). This is the sub_403938 procedure. The other two procedures are system procedures executed when starting (start-up initialization) and when exiting the program. The sub_403938 procedure obtains its only parameter through the EAX register. The __fascall calling convention is "flourishing" in Delphi, although it wasn't declared in the program. I have even declined optimization when compiling this program. However, as you can see, Delphi made an independent decision. The dword_40565C name corresponds to the a variable in the program source code, and it is the one passed to the procedure through the register. Also, pay attention to the add esp, 0FFFFFFF0h command. I hope that you without trouble can guess that this is the add esp, -16 command, which is equivalent to sub esp, 16. In other words, 16 bytes are reserved.
Listing 3.51 provides the disassembled text of the compiled procl (sub_403938) procedure.
Listing 3.51: Disassembled text of the compiled proc1 procedure
CODE=00403938 sub_403938 proc near ; CODE XREF: CODE:004039DB↓p CODE:00403938 var_14 = dword ptr -14h CODE:00403938 var_10 = dword ptr -10h CODE:00403938 var_C = dword ptr -0Ch CODE:00403938 var_8 = dword ptr -8 CODE:00403938 var_4 = dword ptr -4 CODE=00403938 push ebp CODE:0040393E mov [ebp + var_14], eax CODE:00403941 mov [ebp + var_4], 14h CODE:00403948 mov [ebp + var_10], 1Eh CODE:0040394F mov [ebp + var_8], 28h CODE:00403956 mov [ebp + var_C], 32h CODE:0040395D push ebp CODE:0040395E mov eax, [ebp + var_14] CODE:00403961 call sub_4038DC CODE:00403966 pop ecx CODE:00403967 mov esp, ebp CODE:00403969 pop ebp CODE:0040396A retn CODE:0040396A sub_403938 endp
Note that four local variables are defined in the procl procedure. However, as you can see, five local variables are defined in the executable code. The var_14 variable is allocated for storing the parameter passed to the procedure (mov [ebp + var_14], eax); in other words, it is a temporary variable. The add esp, 0FFFFFFECh command is equivalent to add esp, -20. Everything is correct here (there are live variables, and 20 = 4*5).
There are even more interesting issues. For instance, consider the call to the proc2 procedure, to which the call sub_4038DC command in Listing 3.51 corresponds. Note that this time the parameter also is passed to the procedure through the EAX register. However, what does the push ebp command mean? Is it another parameter? In the source code of the program, there were no additional parameters. Furthermore, this doesn't correspond to the __fascall convention. Recall that the proc2 procedure is nested, and it must have access to the local variables of the procl procedure. This is why the EBP register is secretly passed to the proc2 procedure through this value. This is necessary to provide the nested procedure with access to local variables of the procl procedure. Also, note that the pop ecx command that follows the procedure call simply releases the stack from this "illegal" parameter.
Listing 3.52 provides the disassembled code of the proc2 procedure (see Listing 3.49).
Listing 3.52: Disassembled code of the proc2 procedure from Listing 3.49
CODE:004038DC sub_4038DC proc near ; CODE XREF: sub_403938 + 29↓p CODE:004038DC var_8 = dword ptr -8 CODE:004038DC var_4 = dword ptr -4 CODE:004038DC arg_0 = dword ptr 8 CODE:004038DC push ebp CODE:004038DD mov ebp, esp CODE:004038DF add esp, OFFFFFFF8h CODE:004038E2 mov [ebp + var_4], eax CODE:004038E5 mov [ebp + var_8], IEh CODE:004038EC mov eax, ds:off_4040A4 CODE:004038F1 mov edx, [ebp + var_4] CODE:004038F4 call sub_402B78 CODE:004038F9 mov edx, [ebp + arg_0] CODE:004038FC mov edx, [edx - 4] CODE:004038FF call sub_402B78 CODE:00403904 mov edx, [ebp + var_8] CODE=00403907 call sub_402B78 CODE:0040390C mov edx, [ebp + arg_0] CODE:0040390F mov edx, [edx - 8] CODE=00403912 call sub_402B78 CODE:00403917 mov edx, [ebp + arg_0] CODE:0040391A mov edx, [edx - 0Ch] CODE:0040391D call sub_402B78 CODE:00403922 mov edx, [ebp + arg_0] CODE=00403925 mov edx, [edx - 10h] CODE=00403928 call sub_402B78 CODE:0040392D call sub_402BA8 CODE=00403932 pop ecx CODE=00403933 pop ecx CODE=00403934 pop ebp CODE=00403935 retn CODE=00403935 sub_4038DC endp
The abundance of procedure calls immediately attracts attention. But you know that in the source code (see Listing 3.49), there is only the writeln function. However, writeln is not a function but an operator. The compiler transforms this operator into two procedure calls. The first procedure (sub_402B78) forms some resulting string, which will be printed. The number of calls to this procedure matches the number of parameters in the writeln operator. When the resulting string is formed, the sub_402BA8 procedure is called, which outputs the string to the console.
Pay special attention to the add esp, OFFFFFFF8h command. The memory for two stack variables is reserved. The parameter passed to the procedure is placed into the var_4 variable. The var_8 variable corresponds to the c local variable, which is assigned the value of 30 (lEh).
In addition to the two local variables, the procedure has the arg_0 parameter, which is nothing but the EBP value passed from the procl procedure, using which it is possible to access local variables of the procl procedure.
If you view the source code of the program, you'll immediately note that the proc2 procedure prints the values of al (the values passed from procl as a parameter), c (local variable of the procl procedure), and the values of four variables defined in procl.
The previously-considered arg_0 parameter is used for obtaining the values of variables defined in procl. For example, consider how the value of the b variable is retrieved: mov edx, [ebp + arg_0]/mov edx, [edx - 4]. Again, parameters are passed through registers. As relates to the EAX register, some ds:off_4040A4 parameter is placed there, the value of which is unknown. As you probably can guess, this parameter is required for the operation of the sub_402B78 procedure.
To identify a specific procedure, you need to determine the addresses of its start and its end. Second, it is necessary to determine the number and type (or at least the size) of the passed parameters, the stack variables used by this procedure, and the type of its return value. Consider the possibilities are available for completing this task:
The procedure call can be used. The CALL addr command explicitly specifies that some procedure is located at the addr address. However, there are two possible problems:
An indirect procedure call, such as CALL [EAX], causes difficulties for disassemblers. It is necessary to either resort to using a debugger or analyze the disassembled text manually. Furthermore, if the value of the EAX register is subject to change depending on the values of some other parameters, it becomes problematic to locate all procedures called using this method.
From Section 1.6.1, you know that there are lots of methods of calling procedures. This can be achieved even using the RET command. If you are dealing with a program written in Assembly or containing Assembly inserts, and the program's author aims to confuse potential code investigators, there are lots of possibilities of achieving this goal. However, if nonstandard procedure calls are used, this might be disclosed by the presence of commands like ADD ESP, N, SUB ESP, -N, or one or more POP instructions. If you encounter such patterns, this must inspire you to investigate the code more closely.
Functions can be identified by locating the standard function prologue. As a rule, the standard function prologue is made up of three commands directly following each other: PUSH EBP, Mov EBP, ESP, and SUB ESP, N. The final command might be different, for example, ADD ESP, -N or simply one or more sequential PUSH commands. It also is possible to do without stack allocation for local and temporary variables. This is the case when there are no such variables or if registers are used for passing such variables. Besides this, the procedure might start with the commands for saving the values of the EBX, EST, and EDT registers. In the course of optimization, the compiler might do without the standard prologues and address all stack variables and parameters using the ESP register. Finally, instead of the standard prologue it is possible to use the ENTER N command. The end of the function is easier to locate when the starting point of that function is known. However, in some cases the end of the function is the first to be located.
The procedure end can be easily located if a standard epilogue is present: Mov ESP, EBP/POP EBP. Sometimes, this sequence of commands is replaced with the LEAVE command. The epilogue is followed by the RETN command. In general, any RETN command (especially RETN N) must make the code investigator vigilant. Having encountered it, you should always check whether this is the procedure end. This is a good criterion; however, it is not always applicable. In particular, if the function contains more than one __try/__except blocks, the Microsoft Visual C++ compiler might generate several standard epilogues (for optimization). Thus, even the IDA Pro disassembler might be easily confused in such situations.
As mentioned earlier, it is easier to find the procedure end if the procedure start has been determined. This usually will be the first RETN command encountered. However, it is possible to exit in the middle of a procedure. In this case, some unconditional jump must precede the RETN command, which passes control to some location beyond the RETN command. For example, the pattern might appear as shown in Listing 3.53.
Listing 3.53: Sequence of commands typical for exiting in the middle of a procedure
CMP EAX, 1 JNZ L1 RETN L1:
Thus, it is possible to search for the procedure end starting from the Ll (label). If the procedure has to return something when terminating its execution (when it is a function), then the command setting the value of the EAX register must be present near its end. Such commands might appear as XOR EAX, EAX (return false) or MOV EAX, 1 (return true), or these might be some commands that modify the EAX value (MOV, ADD, SUB, etc.). If the type of the return value is 8 bytes is size, then this value is returned in the EDX:EAX pair of registers. Finally, values of the double type are returned in the ST(0) FPU register.
Note | If the return value is a structure, then the pointer to that structure, instead of the structure, is returned in the EAX register. The structure itself is created in the calling function. When the function of the "structure" type is called, the pointer to that structure is passed in the EAX register. Thus, the function will work with the structure that has already been created, and after completion it will return the pointer to the same structure. |
Most procedures and functions have either variables defined in the stack or parameters passed through the stack. This is an important indication because you will certainly encounter commands with addressing through the EBP or ESP registers. By carefully viewing the code above and below the encountered command, it is possible to determine the procedure start.
When stack variables are addressed in a standard way (through the EBP register), it won't be difficult to determine the amount of stack memory allocated to them (SUB ESP, N or any similar command). As relates to the parameters passed to the procedure, here the situation is slightly more complicated, because it is not known beforehand how much memory has been allocated for them. The easiest way of solving this problem is to find the call to this procedure, because all parameters are usually loaded into the stack using PUSH commands or another obvious method (for instance, see Listing 3.13 and the comments that follow it). If the location, from which the procedure under consideration was called, is not known beforehand, then it will be necessary to analyze its code. First, it will be necessary to find the maximum offset in relation to the EBP value in the direction of higher addresses. Because the return address and old EBP value were loaded into the stack after parameters, the first parameter (the one with the minimum address) will be located at the [EBP + 8] address (see Fig. 3.2). Thus, if the maximum offset using the [EBP + N] addressing is equal to max_off, then the number of bytes allocated for parameters will be equal to max_off-4. Assuming that all parameters are 32 bits in size and have a simple data type (these are not arrays or structures), an approximate number of parameters will be equal to (max_off-4) /4.
After these theoretical considerations and computations, consider a simple example program written in C++ (Listing 3.54).
Listing 3.54: C++ program illustrating the procedure of identifying function start and end
#include <stdio.h> #include <windows.h> double myfunc(double, __int64, int, BYTE); void main() { double ff = 10.45; __int64 ii = 1000; int jj = 200; BYTE bb = 50; double ss = myfunc(ff, ii, jj, bb); printf("%f\n", ff); }; double myfunc(double f, __int64 i, int j, BYTE b) { double s; s = f + i + j + b; printf("%f\n", s); return s; };
The disassembled text of the main function from this program is shown in Listing 3.55.
Listing 3.55: Disassembled code of the main function from Listing 3.54
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p .text:00401000 var_40 = qword ptr -40h .text:00401000 var_30 = qword ptr -30h .text:00401000 var_28 = qword ptr -28h .text:00401000 var_1C = dword ptr -1Ch .text:00401000 var_18 = qword ptr -18h .text:00401000 var_10 = dword ptr -l0h .text:00401000 var_C = dword ptr -0Ch .text:00401000 var_l = byte ptr -1 .text:00401000 push ebp .text:00401001 mov ebp, esp .text:00401003 sub esp, 28h .text:00401006 fld ds:dbl_408108 .text:0040100C fstp [ebp + var_28] .text:0040100F mov [ebp + var_10], 3E8h .text:00401016 mov [ebp + var_C], 0 .text:0040101D mov [ebp + var_1C], 0C8h .text:00401024 mov [ebp + var_1], 32h .text:00401028 mov al, [ebp + var_l] .text:0040102B push eax .text:0040102C mov ecx, [ebp + var 1C] .text:0040102F push ecx .text:00401030 mov edx, [ebp + var_C] .text:00401033 push edx .text:00401034 mov eax, [ebp + var_10] .text:00401037 push eax .text:00401038 fld [ebp + var_28] .text:0040103B sub esp, 8 .text:0040103E fstp [esp + 40h + var_40] .text:00401041 call sub_401070 .text:00401046 add esp, 18h .text:00401049 fstp [ebp + var_18] .text:0040104C fid [ebp + var_28] .text:0040104F sub esp, 8 .text:00401052 fstp [esp + 30h + var_30] .text:00401055 push offset unk_4080FC .text:0040105A call _printf .text:0040105F add esp, 0Ch .text:00401062 xor eax, eax .text:00401064 mov esp, ebp .text:00401066 pop ebp .text:00401067 retn .text:00401067 _main endp
First, identify four local variables defined in the main function. Discard the var_30 and var_40 names, because these are not variables. These are identifiers used by IDA Pro. For local variables, 40 bytes are allocated. This is too much for five variables. Thus, it is necessary to investigate these variables in more detail and in due order. The var_28 variable stands for the ff variable of the double type. Here, everything is clear: The initial value is loaded from the dbl_408108 constant using the fld/fstp commands. The mov [ebp + var_10], 3E8h/mov [ebp + var_C], 0 commands load the 1000 (3E8h) value into the ii variable. The disassembler didn't understand that this was a single 64-bit variable, and it interpreted this data item as two different variables. The var_1C variable designates the jj variable. Then, there is the var_l single-byte variable designating bb. Note that although this is a single-byte variable, it takes 4 bytes. This variable is followed by 4 more free bytes, and only after this interval does the var_c variable start. Thus, the compiler has aligned the data by the 8-byte boundary. This alone appears suspicious and causes the code investigator to assume that instead of two 4-byte variables there is one 8-byte variable. Now only the ss variable remains. Note that after the call to the myfunc function that has the double type, there is the fstp [ebp + var_18] command. This means that the value from the ST(0) FPU register is loaded into the var_18 variable. However, double variables are returned in the ST(0) register. Thus, it is possible to conclude that var_18 stands for the ss variable. Therefore, everything is OK. All variables have been identified, and the extra reservation was caused by data alignment.
Another interesting sequence of commands is as follows: mov al, [ebp + var_l] / push eax. At first glance, it appears that everything is all right here because, although the variable is 1 byte, it is necessary to load a 4-byte value into the stack. However, the most significant bytes of the EAX register were not cleared. Furthermore, the entire double word is sent to the stack as a parameter. This is possible only if the function strictly accounts for the parameter being 1 byte in size. By the way, pay special attention to the order, in which parameters are sent to the stack (from right to left). It is the calling function that clears the stack. This corresponds to the __cdecl calling convention (see Table 3.2). Then all other parameters are sent into the stack. The ii(var_10, var_c) variable is sent into the stack as two independent 4-byte variables. The ff variable is sent into the stack using the fstp command, as required. Then there is the call to the printf function. This isn't anything unusual. However, I would still like to draw your attention to the following issue: The first parameter of this function is the format string, which specifies all other parameters of the function. This specification is often helpful for determining the variable type and size. This is even truer because the C++ library provides several other functions similar to printf and operating over the format string.
The disassembled code of the myfunc function is provided in Listing 3.56.
Listing 3.56: Disassembled code of the myfunc function from Listing 3.54
.text:00401070 sub_401070 proc near ; CODE XREF: _main + 41↑p .text:00401070 var_14 = qword ptr -14h .text:00401070 var_C = dword ptr -0Ch .text:00401070 var_8 = qword ptr -8 .text:00401070 arg_0 = qword ptr 8 .text:00401070 arg_8 = qword ptr l0h .text:00401070 arg_10 = dword ptr 18h .text:00401070 arg_14 = byte ptr 1Ch .text:00401070 push ebp .text:00401071 mov ebp, esp .text:00401073 sub esp, 0Ch .text:00401076 fild [ebp + arg_8] .text:00401079 fadd [ebp + arg_0] .text:0040107C fiadd [ebp + arg_10] .text:0040107F movzx eax, [ebp + arg_14] .text:00401083 mov [ebp + var_C], eax .text:00401086 fild [ebp + var_C] .text:00401089 faddp st(1), st .text:0040108B fst [ebp + var_8] .text:0040108E sub esp, 8 .text:00401091 fstp [esp + 14h + var_14] .text:00401094 push offset byte_408100 .text:00401099 call _printf .text:0040109E add esp, 0Ch .text:004010A1 fld [ebp + var_8] .text:004010A4 mov esp, ebp .text:004010A6 pop ebp .text:004010A7 retn .text:004010A7 sub_401070 endp
Start the analysis by considering stack variables. There are only two such variables (the var_14 variable is not taken into account): var_8 and var_c. The var_8 variable takes 8 bytes, which leads to the conclusion that this is nothing but the s variable. This assumption will be further confirmed. No other variables were declared in the myfunc function. Consequently, the 4-byte var_c variable is simply a temporary variable.
It is time to consider the function parameters. Strangely, there are only four parameters. Recall that although there are four parameters, when disassembling the main function IDA Pro considered there to be five variables, which later are used as parameters. Nevertheless, there isn't anything difficult here. When the disassembler processed the main function, it didn't have groundwork for considering var_10 and var_c as a single variable. When processing the myfunc function, the disassembler has well-grounded reasons for considering arg_8 as a single 8-byte parameter or the __int64 number (see the fild command).
Now, consider the algorithm used for computing the value of the f + i + j + b expression. The fild command loads a long integer number (the i number) onto the top of the FPU stack (namely, into the ST(0) register). The next command, fadd, adds this number to a real number, f. The result is then loaded into ST(0) and interpreted as real. The next command, fiadd, adds the real number stored in ST(0) to the 32-bit integer number j. Again, the result is placed into ST(0). Then the movzx eax, [ebp + arg_14] command places 1 byte into the EAX register and clears the most significant bytes of the register. This issue has already been mentioned in comments that follow Listing 3.55. The byte is sent into the stack as part of a double word, and the calling party doesn't clear the most significant bits, while the called procedure does clear them; otherwise, error would be inevitable. Later, the var_c variable is used. The b number is loaded into it (mov [ebp + var_C], eax), after which the var_c variable is loaded into the ST(0) register and the old value of ST(0) is moved into the ST(1) register. Finally, there is the faddp st(1), st command, and the result of computing the f + i + j + b expression is placed into var_8 (the s variable) by the fst [ebp + var_8] command. The stack is not popped, and the result is still contained in ST(0). Thus, the sequence of the sub esp, 8/ fstp [esp + 14h + var_14] commands places this result into the stack for output using the printf function. The final stroke is the fld[ebp + var_8] command — the value returned by the function. Here, the compiler has made minor error. It wasn't necessary to use the fstp command, and without it the latter command also wouldn't be needed.
Buffer overflow is one of the methods often used by hackers for correcting the software at run time. By skillfully manipulating the input data, the hacker causes buffer overflow and passes control to the shellcode expertly inserted into the program. Here, only one type of overflow error will be considered, namely, stack overflow. Usually, stack overflow manifests in programs written in C++. It consists of intrusion into the executable program code through the program stack.
The stack overflow technique is mainly used for remote attacks. If you need to intrude the program that runs on the local computer, there are more powerful tools for achieving this goal. Besides, cracking some system running on a remote computer requires the intruder to carry out some preliminary investigations. This is why I decided to include this material in this book, even though it mainly relates to remote attacks.
When working with external devices, the program allocates buffers for storing sent and received data. The received data fill the allocated buffers either completely or partially. The program must ensure that the received data do not go beyond the buffer limits. If this happens, other data items of even the executable code might become damaged. If the data being loaded exceed the boundaries of the buffer allocated to them, this might result in a program malfunction or even in a total crash. A specific feature of such errors is that they are exceedingly difficult to detect. In some cases, programmers might gain a false impression that the errors are arbitrary and are not related to specific actions. Exceeding the array boundaries is a typical example of such errors.
Most contemporary compilers are capable of generating the code that can check whether the boundaries of allocated buffers have been exceeded. For example, the Microsoft C++ compiler provides the /GS command-line options. If this option is used, additional code is generated by the compiler, which checks all operations to find out whether the buffer limits have been exceeded. However, only buffers defined in the stack are checked, and this check is superficial. It doesn't guarantee that the data do not fall into another buffer after exceeding limits of the allocated one. The essence of the idea of checking the stack for overflow is to place certain predefined bytes at the boundaries of the allocated buffer. After any operations that write into the allocated buffer, a special procedure that checks these bytes must be called. If these bytes are modified, this is evidence that there was a buffer overflow error. This approach requires additional memory. Furthermore, it considerably slows program execution.
A natural question arises: How can the buffer overflow be exploited by those who try to crack the program or a system? There are several techniques of penetration:
If the buffer is located in the stack, then the most obvious mechanism of intrusion from the outside is modification of the return address from the function. The return address can be modified in such a way that it jumps to another function or to the address that also is located in the stack but contains malicious code instead of normal data. This method is the one that will be covered later in this section.
Another approach is to modify some pointers (including to pointers to functions, the jump table, etc.) at run time in such a way as to point to code other than that initially intended by the program developer. This code was previously inserted by the intruder and will run according to the intruder's plans.
Another approach is to modify data addresses in the course of overflow. The next portion of data will be supplied into a different program location, which would allow the intruder to insert an external code into the program.
Thus, it is possible to state that software is cracked by buffer overflow in the following two stages: It is necessary first to insert malicious code and then to pass control to it. In theory, the first stage might not be needed if the procedure that the hacker requires is part of the program being investigated. In this case, the only thing that the hacker needs is to pass control to that procedure at the required moment.
Why do I pay such attention to stack overflow? Windows protects the executable code from writing there (the topic of self-modification was covered in detail in Section 1.6.2). Also, Windows protects the data section from executing some code there. The stack is the only location where it is possible to both write data and execute commands. This fundamental property of the stack is universally applicable to most operating systems. In other words, it is possible to fill the stack buffer with executable code and then make the processor execute it.
As an example, consider the simple program provided in Listing 3.57. The getpassword function checks whether the supplied password is correct and, depending on the password's correctness, returns either false or true.
Listing 3.57: Example of a simple program vulnerable to stack overflow
#include <stdio.h> #include <string.h> int getpassword(char *); char *passw = "privet"; int main() { printf("Input password:\n"); if(!getpassword(passw))printf("You are registered!\n"); else printf("You are wrong!\n"); return 0; }; int getpassword(char *ss) { char s[13]; gets(s) ; if (!strcmp(s, ss))return 0; else return 1; }
Fig. 3.3 shows the design of the stack of the program presented in Listing 3.57. This is a standard arrangement with prologue and epilogue. As you can see, the general stack structure is divided into the stack structure of the main function and the stack structure of the getpassword function. Note that addresses decrease from bottom to top. The local variable shown in this illustration is the s variable. And, although the size of this parameter was specified to equal 13 bytes, the compiler aligns it by the 4-byte boundary so that the buffer size equals 16 bytes. Data are inserted into the s variable from lower addresses to higher addresses. Thus, the data that overflow the stack overwrite everything located below. The EBP value is the first to be overwritten. After it, there is the address of return from the getpassword function (which is the result longed for by the attacker). If the data go beyond the buffer limits and change the value of the return address, then the function would return control to a different location. But where to? This is the most interesting question. The attacker can place into the stack the address of some function present within the program so that in case of buffer overflow the program would execute according to the different method (which the programmer didn't expect).
Figure 3.3: The stack structure, with addresses decreasing from bottom to top
It is time to conduct an experiment by changing the return address from the getpassword function. First, it is necessary to carefully study the disassembled code of the program. Listings 3.58 and 3.59 show the disassembled texts of the main and getpassword functions, respectively.
Listing 3.58: Disassembled code of the main function from Listing 3.57
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p .text:00401000 push ebp .text:00401001 mov ebp, esp .text:00401003 push offset aInputPassword .text:00401008 call _printf .text:0040100D add esp, 4 .text:00401010 mov eax, dword_409040 .text:00401015 push eax .text:00401016 call sub_401050 .text:0040101B add esp, 4 .text:0040101E test eax, eax .text:00401020 jnz short loc_401031 .text:00401022 push offset aYouAreRegister .text:00401027 call _printf .text:0040102C add esp, 4 .text:0040102F jmp short loc_40103E .text:00401031 loc_401031: ; CODE XREF: _main + 20↑j .text:00401031 push offset aYouAreWrong .text:00401036 call _printf .text:0040103B add esp, 4 .text:0040103E loc_40103E: ;CODE XREF: _main + 2F↑j .text:0040103E xor eax, eax .text:00401040 pop ebp .text:00401041 retn .text:00401041 _main endp
Listing 3.59: Disassembled code of the getpassword function (Listing 3.57)
.text:00401050 sub_401050 proc near ; CODE XREF: _main + 16↑p .text:00401050 var_10 = byte ptr -l0h .text:00401050 arg_0 = dword ptr 8 .text:00401050 push ebp .text:00401051 mov ebp, esp .text:00401053 sub esp, 10h .text:00401056 lea eax, [ebp + var_10] .text:00401059 push eax .text:0040105A call gets .text:0040105F add esp, 4 .text:00401062 mov ecx, [ebp + arg_0] .text:00401065 push ecx ; char .text:00401066 lea edx, [ebp + var_10] .text:00401069 push edx ; char .text:0040106A call _strcmp .text:0040106F add esp, 8 .text:00401072 test eax, eax .text:00401074 jnz short loc_40107A .text:00401076 xor eax, eax .text:00401078 jmp short loc_40107F .text:0040107A loc_40107A: ; CODE XREF: sub_401050 + 24↑j .text:0040107A mov eax, 1 .text:0040107F loc_40107F: ; CODE XREF: sub_401050 + 28↑j .text:0040107F mov esp, ebp .text:00401081 pop ebp .text:00401082 retn .text:00401082 sub_401050 endp
To start this study, first consider the call to the sub_401050 function, which is nothing but the designation of the getpassword function. The sequence of the mov eax, dword _409040/push eax commands corresponds to sending the pointer containing the reference password into the stack. The dword_409040 global variable contains the address of this string; in other words, it is the pointer variable. Thus, the parameter is sent to the stack, then the call command places there the return address (in this case, this is the 0040101Bh value).
Then pay attention to the test eax, eax command and the conditional jump that follows it: jnz short loc_401031. This is a normal conditional construct, and the test command corresponds to the if(!) operator. You have encountered the structure of the main function many times.
First, note that 16 bytes are allocated for storing local variables in this program. This is because all data in the stack are aligned by the 4-byte boundary. So, if you want to overflow the stack, bear in mind the actual size of this buffer.
The lea eax, [ebp + var_10]/push eax sequence of commands simply places the address of this buffer into the stack. It is assumed that the password will be loaded into that buffer. This is the key issue. As you should understand (see Fig. 3.3), this buffer is followed by the content of the EBP register, which, in turn, is followed by the desired return address.
After the call to the gets library function, the strcmp function for comparing strings is called. This function receives the addresses of two strings from the stack. After the execution of this function, there is an ordinary conditional construct (test, then jnz). By the way, note that the strcmp function, like many other string functions, controls the string lengths only by the terminating null; consequently, buffer overflow cannot influence their operation.
Thus, having carefully considered Listings 3.58 and 3.59, it is possible to proceed further. What is the main goal? First, try to modify the return address so that the jump by retn from the getpassword function would pass control to the printf command in the beginning of the main function. From Listing 3.58, it follows that the jump address is equal to 00401003. Recall that in the memory the number is written according to the principle "the most significant byte gets the higher address," and you'll find that the following sequence of bytes must be sent to the buffer: 03 10 40 00. However, it is necessary to fill first the 16-byte buffer then 4 more bytes where the EBP value is stored. Because it is difficult to enter the characters with codes 10h and 03h from the command line, it is recommended that you use the following technique. Prepare the text file with the required string, then use input redirection. Thus, assuming that the program name is progl.exe and the text file is named pasw.txt, it is necessary to issue the following command: progl < pasw.txt. For entering characters with codes smaller than 32, it is possible to use the hiew.exe program (see Section 2.1.3). Well, let the string appear as follows:
qqqqqqqqqqqqqqqqqqqq♥▶@
Exactly 20 bytes (16 bytes is the size of the string buffer, and 4 bytes are for the contents of EBP) are filled with arbitrary characters (for simplicity, I suggested simply the q character. Then there follow the characters with the codes 03h 10h 40h. And where is the character with code 0? Actually, you do not need it. After all, in the address that you will change, it is present in the position where it is required. Thus, prepare and execute the progl < pasw.txt command. The result will be as follows:
Input password: Input password: You are wrong!
Thus, the problem has been solved. After the execution of the getpassword function, the jump passes control to the address specified in the buffer. However, after these strings are displayed, the dialog box warning you that there was an exception will appear (Fig. 3.4). This is natural and can be easily explained. During the second pass, different data fall into the buffer and stack is already corrupted; thus, the application cannot terminate correctly.
Figure 3.4: The exception reported by Windows XP after an artificially-created buffer overflow
Note | The attacker has serious limitations when choosing what bytes are sent into the buffer. For example, the hacker would fail to send characters with codes such as 26 or, say, 0 into the buffer. However, these limitations are not insurmountable for the following two reasons:
|
Recall that it is possible to run executable code in the stack. What if you place the program code into the stack and pass control to that code? This sounds promising! After all, in the primitive program the buffer size is only 16 bytes. Assume that you are dealing with a buffer that is 16 KB. It is possible to place a program into that buffer that would do anything you like and on behalf of a program that probably has a high privilege level in the system. This security hole has been exploited by crackers for more than 10 years.
What can be placed into the buffer in this particular case? For example, consider the code in Listing 3.60.
Listing 3.60: Executable code that can be placed into the vulnerable buffer (Listing 3.57)
MOV EAX, 0 RETN
Is this code suitable? I have intentionally made an error here. Recall that the RETN command has already been executed for jumping to this fragment; so instead of RETN it is necessary to use some kind of JMP command.
If you succeed in achieving this goal, the program will be cracked, because it will always report any supplied password as correct.
From Listing 3.58 it becomes clear that the jump address must be equal to 0040101B. The stack content will not be damaged, and no critical errors will occur.
Disappointingly, the sequence of commands such as MOV EAX, 0/JMP 0040101B is not suitable for passing it into the stack as a string. This is because of the following:
The first command contains zeros. The code o0f the MOV EAX, O command is equal to B8 00000000.
In the second command, the jump address is counted in relation to the command that follows the JMP command. If the starting address of the stack is changed, then this code fragment won't operate correctly.
Thus, to achieve the formulated goal, the two commands from Listing 3.60 must be transformed into a sequence of six commands (Listing 3.61).
Listing 3.61: Executable code that would operate correctly in the vulnerable buffer
XOR EAX, EAX ; 33 CO XOR ECX, ECX ; 33 C9 MOV CL, 40H ; B1 40 SHL ECX, 1OH ; C1 E1 10 MOV CX, 101BH ; 66 B9 1B 10 JMP ECX ; FF E1
The codes of the commands in Listing 3.61 are specified as comments. The best approach to obtaining the correct codes of all commands is to use some debugger, such as OllyDbg. Thus, you'll require 15 bytes out of 20 allocated bytes (recall that 16 bytes are allocated for the string and 4 bytes are allocated for storing EBP). The remaining 5 bytes of the allocated 20 bytes can contain any information. The return address, which is next, must contain the starting address of the buffer. The buffer address can be determined using the same debugger. In this particular example, it turns out to be equal to 0012FEC8. Thus, it is necessary to add 3 more bytes for the 20-byte string that has already been formed: C8 FE 12.
Here is the content of the pasw.txt file: . To avoid errors, fill the text file using hiew.exe, not using a standard text editor. An attempt at passing some characters through the clipboard inevitably corrupts the code, although the character might remain the same. Thus, having prepared the text file, issue the progl < pasw.txt command. The result will be wonderful:
You are registered!
Thus, the shellcode has been inserted into the program, making the program register the user.
Now, only one issue needs clarification. As you have seen, there are problems with the input of characters that can be sent to the program as a string. The input is not always carried out using the console procedure that treats some characters selectively. If you deal with such an input, then there is no problem.
However, return to the case of console input. What is the solution to this problem? The answer is straightforward. It is necessary to encode the sequence of bytes in such a way as to ensure that the codes incorrectly treated by console input are missing from the string. I won't consider different methods of encoding here. I prefer the following approach (which might require additional memory). The main goal of this approach is as follows: All "invalid" bytes must be encoded (for example, by the XOR command). Each of these bytes must be preceded by a byte specifying that the next byte that follows it is encoded. For this purpose, it is natural to use the NOP command that has the 90H code. The entire fragment must start with the decoder for the remainder of the code. Decoding consists of removing the NOP bytes and decoding the bytes that follow them. Because the bytes that console input functions consider invalid or interpret incorrectly are not numerous, the number of NOP instructions must not be too large.
When programming in high-level languages such as C and Pascal, most programmers have become used to completed conditional constructs (if-else) and logical operators (AND, OR, etc.). However, there was a time when such possibilities were not available. For instance, consider Fortran or early versions of Basic. For these, unconditional jump operators, such as goto, were helpful, despite strong dislike by ardent supporters of high style in programming. However, the machine language is entirely built on the basis of conditional and unconditional jumps. Like it or not, it is impossible to do without them if you need to check some condition.
Consider the simple conditional construct shown in Listing 3.62.
Listing 3.62: Simple conditional construct
#include <stdio.h> void main() { int a, b; scanf("%d", &a); scanf("%d", &b); if (a >= b) printf("a >= b\n"); else printf("a < b\n"); }
After compiling the program presented in Listing 3.62 using Microsoft Visual Studio and loading it into the IDA Pro disassembler, you'll obtain the code shown in Listing 3.63.
Listing 3.63: Disassembled code of the program shown in Listing 3.62
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p .text:00401000 var_8 = dword ptr -8 .text:00401000 var_4 = dword ptr -4 .text:00401000 push ebp .text:00401001 mov ebp, esp .text:00401003 sub esp, 8 .text:00401006 lea eax, [ebp + var_4] .text:00401009 push eax .text:0040100A push offset unk_4080FC .text:0040100F call _scanf .text:00401014 add esp, 8 .text:00401017 lea ecx, [ebp + var_8] .text:0040101A push ecx .text:0040101B push offset unk_408100 .text:00401020 call _scanf .text:00401025 add esp, 8 .text:00401028 mov edx, [ebp + var_4] .text:0040102B cmp edx, [ebp + var_8] .text:0040102E jl short loc_40103F .text:00401030 push offset aAB ; "a >= b\n" .text:00401035 call _printf .text:0040103A add esp, 4 .text:0040103D jmp short loc_40104C .text:0040103F loc_40103F: ; CODE XREF: _main + 2E↑j .text:0040103F push offset aAB_0 ; "a<b\n" .text:00401044 call printf .text:00401049 add esp, 4 .text:0040104C loc_40104C: ; CODE XREF: _main + 3D↑|j .text:0040104C xor eax, eax .text:0040104E mov esp, ebp .text:00401050 pop ebp .text:00401051 retn .text:00401051 _main endp
Pay special attention to the way, in which the scanf function is called. Because its argument requires a pointer to a variable, lea eax, [ebp + var_4] /push eax sends the pointer to the var_4 variable into the stack. Note that Microsoft's compiler treats the var_8 variable in the same way.
The second important issue is how the complete conditional construct is implemented in the executable code. Schematically, this can be represented as shown in Listing 3.64.
Listing 3.64: Implementation of a complete conditional structure in the executable code
j1 l1 // a >= b ... jmp 12 l1: // a < b ... 12:
As you can see, to implement a complete conditional construct one conditional jump and one unconditional jump are required. Note that the conditional jump command corresponds to the condition that is the negation of the one in the original program text. If you remove the else branch (the incomplete conditional construct) from the program, then the unconditional jump command (jmp 12) will be missing from the executable code. Finally, if you replace the a >= b condition with a > b, then the executable code will contain the jle (jump less or equal) command instead of the jl command. If the source program uses the less-or-equal condition, for example, a =< b, then the executable code will contain jg or jge commands (for the less condition). The fact that instead of a direct condition its negation is checked in the executable code isn't an axiom. Another approach is possible (Listing 3.65).
Listing 3.65: Alternative implementation of complete conditional constructs
jge 11 // a < b ... jmp 12 11: // a >= b ... 12:
As you can see, there isn't anything abnormal in this approach. You can encounter it when analyzing the code created by some compilers. However, I'd like to point out again that decompilation isn't your main goal. You must try to understand the program's operating logic. For this purpose, you do not need to know exactly, which source code was used to generate a specific fragment of the disassembled code.
If variables used within the program being studied are unsigned, then instead of jl(jle) the jb(jbe) commands are used, and the jg(jge) commands are replaced with ja(jae). In case of the check for equality (==) or inequality (!=), the jnz and jz commands are used, respectively. Note that in the previously considered example, the CMP command was used in the executable code for checking the a > b condition. This is obvious. This command is also used for checking other conditions: <, <=, >=, ==, and !=. Everything depends on the conditional check that you'll use later — in other words, on the flag or group of flags that you are checking. If you check for equality (or inequality) to zero, the TEST command is often used instead of CMP. Recall that in many programming languages the false value corresponds to 0, and true corresponds to some nonzero value (1, for example). In this relation, it is instructive to consider a construct typical for the C++ language (Listing 3.66).
Listing 3.66: Typical C++ construct
if(k = (a = b))) { } else { }
It is clear that the k variable will be assigned one of the following two values: 1 (if a is equal to b) or 0 (if a is not equal to b). Here is a fragment of the executable code compiled using the Microsoft Visual C++ compiler (Listing 3.67).
Listing 3.67: Executable code generated using Microsoft Visual C++ (Listing 3.66)
; Load the a variable into the EAX register. mov eax, [ebp + var_4] ; Compute the difference a - b; the variables remain unchanged. sub eax, [ebp + var_8] ; Sign inversion is needed to check whether the EAX register contains 0. neg eax ; Subtraction takes into account the sign. ; If the EAX register contains a nonzero value, ; then the subtraction result in EAX will be -1; ; otherwise, the EAX register would contain 0. sbb eax, eax ; If the EAX register contained -1, the inc command ; will produce 0 (false); otherwise, the result will be 1 (true). inc eax ; The value is assigned to the k variable. mov [ebp + var_C], eax ; Jump depends on the result in the EAX register. jz short loc_401058 ... jmp short loc_401065 loc_401058: ... loc_401065: ...
The algorithm of obtaining the value of the k variable is a notable one. As you can see, the CMP command is not used in this case. Note that the conditional jump is carried out depending on the value contained in the EAX register after the INC EAX operation. The result of this operation, contained in the EAX register, is either 0 (false) or 1 (true).
Comparison of real numbers deserves special attention (Listing 3.68).
Listing 3.68: Sample program illustrating comparison of real numbers
#include <stdio.h> void main() { double a, b; scanf("%Lf", &a); scanf("%Lf", &b); if (a >= b) printf("%Lf\n", a) ; else printf(""-Lf\n", b); }
Listing 3.68 presents a simple program for comparing two real numbers of the double type. From the language syntax point of view, the difference between this program and a similar one that compares integer variables is minor and relates only to the format of the scanf and printf functions. However, comparison of real variables must be principally different from comparison of integer numbers at the level of executable code. Consider Listing 3.69, produced by IDA Pro.
Listing 3.69: Disassembled code of the program in Listing 3.68 created by IDA Pro
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p .text:00401000 var_18 = qword ptr -18h .text:00401000 var_10 = qword ptr -10h .text:00401000 var_8 = qword ptr -8 .text:00401000 push ebp .text:00401001 mov ebp, esp .text:00401003 sub esp, 10h .text:00401006 lea eax, [ebp + var_8] .text:00401009 push eax .text:0040100A push offset unk_4090FC .text:0040100F call _scanf .text:00401014 add esp, 8 .text:00401017 lea ecx, [ebp + var_10] .text:0040101A push ecx .text:0040101B push offset unk_409100 .text:00401020 call _scanf .text:00401025 add esp, 8 .text:00401028 fld [ebp + var_8] .text:0040102B fcomp [ebp + var_10] .text:0040102E fnstsw ax .text:00401030 test ah, 1 .text:00401033 jnz short loc_40104D .text:00401035 fld [ebp + var_8] .text:00401038 sub esp, 8 .text:0040103B fstp [esp + 18h + var_18] .text:0040103E push offset aLf ; "%Lf\n" .text:00401043 call _printf .text:00401048 add esp, 0Ch .text:0040104B jmp short loc_401063 .text:0040104D loc_40104D: ; CODE XREF: _main + 33↑j .text:0040104D fld [ebp + var_10] .text:00401050 sub esp, 8 .text:00401053 fstp [esp + 18h + var_18] .text:00401056 push offset aLf_0 ; "%Lf\n" .text:0040105B call _printf .text:00401060 add esp, 0Ch .text:00401063 loc_401063: ; CODE XREF: _main + 4B↑j .text:00401063 xor eax, eax .text:00401065 mov esp, ebp .text:00401067 pop ebp .text:00401068 retn .text:00401068 _main endp
For the moment, you shouldn't see anything unusual in allocating memory for stack variables. The only issue that deserves special mention is that now variables have the double type and, consequently, take 8 bytes. Arguments of the scanf function are pointers to variables rather than actual variables. Hence, the lea (lea eax, [ebp + var_8]/push eax) commands are used. The use of two registers (EAX and ECX) as temporary variables for storing the pointers to variables for further pushing into the stack has no special meaning. It would be possible to do with only one register.
Starting from the 00401028 address, the really interesting commands are encountered. It is necessary to compare two 8-byte floating-point numbers. Here is the sequence of commands that carries out this comparison (Listing 3.70).
Listing 3.70: Sequence of commands that compares two 8-byte floating-point numbers
; Load the a variable into ST(0). fld [ebp + var_8] ; Compare the content of ST(0) to var_10 (the b variable). fcomp [ebp + var_10] ; Save the status word (SW) in the AX register. fnstsw ax ; Check bit 0 of the AH register. test ah, 1 ; Jump if this bit is set. jnz short loc_40104D ... jmp short loc_401063 loc_40104D: ... loc_401063: ...
Although the preceding fragment is supplied with comments, some notes still have to be made. The fcomp command compares two operands, and the result of this comparison is reflected by the C0, C2, and C3 flags that correspond to bits 8, 9, and 10 of the status word (see Section 1.2.3). Table 3.3 outlines the flag values for different situations of comparison.
Checked condition | C3 flag | C2flag | CO flag |
---|---|---|---|
ST(0) > src | 0 | 0 | 0 |
ST(0) < src | 0 | 0 | 1 |
ST(0) = src | 1 | 0 | 0 |
Operands are incomparable | 1 | 1 | 1 |
From this table, it follows that if C0 == 0, this corresponds to the >= condition. Hence, the JNZ (jump if nonzero) command corresponds to printing the message that the b variable is the greatest. Some compilers use different techniques. For instance, a complier may copy the coprocessor flags into the flags register using the SAHF command. In this case, the C0 flag is copied into the CF flag, C2 is copied into PF, and C3 goes into ZF. Later, it is possible to use conditional jumps. In the case under consideration, this will be the JZ command (instead of JNZ). What should you do when checking a strict inequality, such as a > b? According to the data provided in Table 3.3, the checking command would appear as follows: TEST AX, 41H.
In real-world programming, conditional constructs might contain nested logical constructs (Listing 3.71). Consider how nesting might influence the executable code.
Listing 3.71: Example program containing nested conditional constructs
// Searching for the maximum number out of three numbers #include <stdio.h> void main() { int a, b, c; scanf("%d", &a); scanf("%d", &b); scanf("%d", &c); if (a > b) { if(a > c) printf("%d\n", a) ; else print f("%d\n", c); } else { if(b > c) printf("%d\n", b) ; else printf("%d\n", c); } }
The compiler builds the structure of nested conditional constructs according to the same method as the one shown in Listing 3.63. This method is presented in Listing 3.72.
Listing 3.72: Method of nested conditional structures built by the compiler
jl 11 // if (1) [9] // The branch corresponding to a > b jl 14 // if (2) // The branch corresponding to a > c // Output the a value. ... jmp 12 14: // else (2) // The a > c condition is not observed, while the a > b condition is true. // Output the c value. ... 12: // The end of the if operator of the first nesting level jmp 13 // The start of the else operator of the first nesting level 11: // else (1) // The a > b condition hasn't been observed. jl 15 // if (2) // The b > c condition is true, while the a > b condition is false. // Output the b value. ... jmp 13 15: // else (2) // The b > c is condition false, and the a > b condition is false. // Output the c value. ... 13: // End of the nested construct
Carefully consider the method presented in Listing 3.72. As you can see, it is clearly mapped to the method present in the source program (see Listing 3.71). At the same time, any complete conditional construct can be easily converted into incomplete conditional construct by simply discarding an appropriate jmp command.
Contemporary programming languages use OR and AND logical operators instead of large numbers of nested conditional constructs. Using these logical operators, combined conditions are built. In real-world programming, such combined conditions are sophisticated. Kris Kaspersky, in his excellent book Hacker Debugging Uncovered, suggests the diagrams technique for analyzing executable code resulting from complex conditional constructs. This approach can be helpful when reconstructing a complex conditional construct. In my opinion, however, it is not always necessary to reconstruct the source code to understand the conditions implemented in the executable code. After all, programmers often use logical operators more for making the code compact than for making their programs readable. This doesn't improve understanding of their programs. Furthermore, most programmers use both logical operators and nested conditional constructs, which makes it problematic to recover source constructs.
Consider the example in Listing 3.73.
Listing 3.73: Example illustrating recognition of logical operators
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p .text:00401000 var_C = dword ptr -0Ch .text:00401000 var_8 = dword ptr -8 .text:00401000 var_4 = dword ptr -4 .text:00401000 push ebp .text:00401001 mov ebp, esp .text:00401003 sub esp, 0Ch .text:00401006 lea eax, [ebp + var_4] .text:00401009 push eax .text:0040100A push offset unk_4080FC .text:0040100F call scanf .text:00401014 add esp, 8 .text:00401017 lea ecx, [ebp + var_8] .text:0040101A push ecx .text:0040101B push offset unk_408100 .text:00401020 call scanf .text:00401025 add esp, 8 .text:00401028 lea edx, [ebp + var_C] .text:0040102B push edx .text:0040102C push offset unk_408104 .text:00401031 call scanf .text:00401036 add esp, 8 .text:00401039 mov eax, [ebp + var_4] .text:0040103C cmp eax, [ebp + var_8] .text:0040103F jle short loc_401047 .text:00401041 cmp [ebp + var_8], 0 .text:00401045 jg short loc_401055 .text:00401047 loc_401047: ; CODE XREF: _main + 3F↓j .text:00401047 mov ecx, [ebp + var_4] .text:0040104A cmp ecx, [ebp + var_C] .text:0040104D jz short loc_401055 .text:0040104F cmp [ebp + var_C], 0 .text:00401053 jnz short loc_401064 .text:00401055 loc_401055: ; CODE XREF: _main + 45↑j .text:00401055 push offset aYes ; "Yes!\n" .text:0040105A call _printf .text:0040105F add esp, 4 .text:00401062 jmp short loc_401071 .text:00401064 loc_401064: ; CODE XREF: _main + 53↑j .text:00401064 push offset aNo ; "No!\n" .text:00401069 call _printf .text:0040106E add esp, 4 .text:00401071 loc_401071: ; CODE XREF: _main + 62↑j .text:00401071 xor eax, eax .text:00401073 mov esp, ebp .text:00401075 pop ebp .text:00401076 retn .text:00401076 _main endp
Lots of similar listings have already been provided in this book. Thus, you'll easily determine that three stack variables of the integer type are used here (12 bytes are reserved, and three variables of the same type are used). The input of their values using the scanf library function also isn't anything new. The most interesting are the conditional jumps. First, consider the code in general. What would you see? First, all conditional jumps can produce one of the two possible results: printing the "Yes" string (the 00401055 address) or the "No" string (the 00401064 address) using the printf function. Consequently, to all appearances you are dealing with the if-else conditional construct. Combine all conditions that produce the first result and, accordingly, combine all conditions that lead to the second result. Thus, the jump to the 00401055 address takes place when var_4 > var_8 and var_8 > 0. This condition is an obvious candidate for the role of the following combined condition: a > b && b > 0. Let it be designated as condition 1. Because the same result is produced when all conditions except for condition 1 are satisfied, it is possible to assume that you are dealing with the OR logical operator (although, it is not necessary to take this into account to gain a sound understanding of the process). The same result is obtained if the var_4 = var_c condition is satisfied. Let it be condition 2. Finally, the same result will be produced if var_c = 0. This will be condition 3. In all other cases, the code fragment starting from the 00401064 address is executed. After completing all of these considerations, it is possible to write a conditional construct using the AND and OR logical operators. However, this won't improve your understanding of the code, because you have already understood the operating logic of this fragment.
Thus, what conclusion can be drawn on the basis of these considerations? This conclusion is as follows: Conditions related to one another using logical AND can be easily recognized. Considering them as a combined condition, it is also possible to understand the operating logic of all other conditions (related to the first one using logical OR).
It is necessary to mention that conditional and unconditional jumps reset the command queue, which slows down the program execution. Bear this in mind when writing a program in Assembly. Avoid using jumps whenever it is possible to do without them. Instead, it is possible to use the following sets of commands: SETcc r/m (conditional setting of the first bit) and CMOVX (conditional data sending). These commands are described in Chapter 1 (see Table 1.2). Advanced compilers, such as Visual C++, are aware of this capability. Unfortunately, I failed to make this compiler use any of the conditional move commands. On the other hand, it turned out that Visual C++ actively uses conditional bit setting commands.
The idea of using the commands for conditionally setting the first bit of the byte is trivial. Let some register (for example, EAX) contain the 0 value in the beginning. Then, after using the CMP command, it is possible to use one of the conditional bit setting commands, for instance, the SETLE command (set if lower). After that, execute DEC EAX. Now, if the condition has been satisfied, EAX contains 0; otherwise, the value contained in EAX is equal to -1 (FFFFFFFFH). If the operand is equal to 0, then the AND instruction with any value of the second operand won't change its content. For instance, consider the fragment in Listing 3.74.
Listing 3.74: Code fragment illustrating the use of the conditional bit setting commands
.text:00401013 xor eax, eax .text:00401015 cmp edx, 0Ah .text:00401018 setle al .text:0040101B dec eax .text:0040101C and eax, OFFFFF200h .text:00401021 add eax, 1000h
As can be easily seen, if the EDX <= 0 condition is satisfied, the EAX register contains the 1000H value; otherwise, it will contain the 200H value. This executable code is equivalent to the if (a > 10) b = 0x200; else b = 0x1000; operator. This code is created by Microsoft's compiler if the "create fast code" option is set.
As a rule, a long sequence of incomplete conditional operators is replaced with the choice operator. Listing 3.75 provides a typical example of using the choice operator. Consider what Microsoft's compiler would do to this code (Listing 3.76).
Listing 3.75: Typical example of using the choice operator
#include <stdio.h> void main() { char a; scanf("%c", &a); switch(a) { case 'A': printf("A\n"); break; case 'B': printf("B\n"); break; default: printf("?\n"); } }
Listing 3.76: Disassembled listing (Listing 3.74) compiled using Microsoft Visual C++
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p .text:00401000 var_8 = byte ptr -8 .text:00401000 var_l = byte ptr -1 .text:00401000 push ebp .text:00401001 mov ebp, esp .text:00401003 sub esp, 8 .text:00401006 lea eax, [ebp - 1] .text:00401009 push eax .text:0040100A push offset unk_4080FC .text:0040100F call _scanf .text:00401014 add esp, 8 .text:00401017 mov cl, [ebp + var_1] .text:0040101A mov [ebp + var_8], cl .text:0040101D cmp [ebp + var_8], 41h .text:00401021 jz short loc_40102B .text:00401023 cmp [ebp + var_8], 42h .text:00401027 jz short loc_40103A .text:00401029 jmp short loc_401049 .text:0040102B loc_40102B: ; CODE XREF: _main + 21↑j .text:0040102B push offset byte_408100 .text:00401030 call _printf .text:00401035 add esp, 4 .text:00401038 jmp short loc_401056 .text:0040103A loc_40103A: ; CODE XREF: _main + 27 ↑j .text:0040103A push offset unk_408104 .text:0040103F call _printf .text:00401044 add esp, 4 .text:00401047 jmp short loc_401056 .text:00401049 loc_401049: ; CODE XREF: _main + 29↑j .text:00401049 push offset unk_408108 .text:0040104E call printf .text:00401053 add esp, 4 .text:00401056 loc_401056: ; CODE XREF: _main + 38↑j .text:00401056 ; main + 47 ↑j .text:00401056 xor eax, eax .text:00401058 mov esp, ebp .text:0040105A pop ebp .text:0040105B retn .text:0040105B _main endp
This code is interesting because of a certain redundancy. Two stack variables are defined in the main procedure. The var_l variable apparently is intended for storing the a variable defined in the program source code (see Listing 3.75), because it is used with the scanf function. The var_8 variable is an auxiliary one. It is used for comparison in commands such as cmp [ebp + var_8], 42h. It is possible to use a single variable, var_1. By the way, note that two single-bit variables are stored in two adjacent 4-byte blocks. This is because of the requirement to meet the data alignment by the 4-byte boundary.
The method for checking conditions is bulky. This is because the check tests whether the condition is satisfied. The method in Listing 3.77 would be more elegant.
Listing 3.77: Alternative and more elegant method for checking conditions
cmp [ebp + var_8], 41h jnz 11 ... jmp _break 11: cmp [ebp + var_8], 42h jnz 12: ... jmp _break 12: //default ... _break:
Finally, you can encounter the approach shown in Listing 3.78. In particular, Borland C++ 5.0, as well as the Delphi compiler, processes the choice operator in this way. The Microsoft C++ compiler also behaves in this way, provided that the "create fast code" option is set.
Listing 3.78: Processing the typical choice operator
mov dl, [ebp + var_l] sub dl, 41h jz 11 dec dl jz 12 jmp 13 11: ... jmp 13 12: ... 13: ...
On the basis of Listing 3.78, the principle of this approach is easily understandable. The parameter of the switch operator is placed into some temporary variable, which might also be a register. Assume that the values for equality, to which the variable will be checked, are a1, a2, ..., an (in ascending order). The check is carried out as follows: First, the a1 value is subtracted from the temporary variable and the variable is tested for equality to zero. Later, the values a2 - a1, a3 - a2, etc., are subtracted from the variable. After each subtraction, the variable is tested for equality to zero. This approach is especially efficient when all differences (a2 - al, a3 - a2, etc.) are equal to one. In this case, it is convenient to use the DEC processor command.
Looping algorithms are a kind of branching in which, depending on the condition, a specific program fragment is executed multiple times. At the level of Assembly language, a conditional or unconditional backward jump is assumed to the instructions with smaller addresses. Contemporary disassemblers and debuggers trace such jumps and mark them in the disassembled text.
Consider all possible variants of loop organization in Assembly language. This would allow you to easily understand how contemporary compilers treat loops. A typical loop structure is shown in Listing 3.79.
Listing 3.79: Typical loop structure
... ; Quite often, the JMP L1 instruction is encountered here. _beg: ; In particular, the commands changing the loop parameter ; appear here. ... CMP EAX, EBX ; Or, perhaps, check for some other condition. JZ _end ; Or, perhaps, some other conditional jump ; beyond the loop limits. ; The loop body starts here. L1: ... ; The loop body - any number of commands. JMP _beg _end:
Listing 3.79 demonstrates a typical loop structure that you can encounter when studying the code generated by various compilers. Note that the jump to the start of the loop body might be present (JMP L1). If such a jump is present, then the loop body will be executed at least once. This corresponds to the postcondition loop ideology. If there is no jump to the loop body, then in general the loop body does not execute even once. This corresponds to the precondition loop ideology.
A typical example of a postcondition loop is presented in Listing 3.80.
Listing 3.80: Typical example of a postcondition loop
... _beg: ; Start of the loop body ... CMP EAX, EBX ; Or check for some other condition. JZ _beg ; Or some other conditional jump to the start of the loop ...
In this case, you are dealing with a typical example of the postcondition loop. However, it would be an error to consider that the loop type specified in the high-level programming language would automatically correspond to the loop type in the executable code. Contemporary compilers do not simply transform operators of high-level programming languages to machine commands but sometimes inventively reconstruct them (for some cases, the word inventively should be enclosed in quotation marks). Thus, if you use a precondition loop in your program but variable values are guaranteed to ensure that the loop will be executed at least once, the compiler can use the postcondition loop in the machine code.
The FOR, loops present in the main algorithmic languages are variants of precondition loops. One or more parameters simply participate in formulating the condition. Parameters of such loops, when executing the loop body (or before the next iteration) obtain some constant increment (which might be either positive or negative). The presence of a variable, which is incremented or decremented by a constant value at each iteration, is an important indication that you are dealing with the FOR, loop.
After this brief introduction, it is natural to consider several practical examples. For instance, Listing 3.81 provides the simplest example of using the FOR loop.
Listing 3.81: Simple example of using the FOR loop
#include <stdio.h> void main() { int b = 10; for(int i = 0; i < 100; i++) printf("%d\n", b); }
Consider how the Microsoft Visual C++ compiler translates this program, provided that the "no optimization" option was chosen for compilation (Listing 3.82).
Listing 3.82: Disassembled text compiled using Microsoft Visual C++ without optimization
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p .text:00401000 var_8 = dword ptr -8 .text:00401000 var_4 = dword ptr -4 .text:00401000 push ebp .text:00401001 mov ebp, esp .text:00401003 sub esp, 8 .text:00401006 mov [ebp + var_4], 0Ah .text:0040100D mov [ebp + var_8], 0 .text:00401014 jmp short loc_40101F .text:00401016 loc_401016: ; CODE XREF: _main + 36↓j .text:00401016 mov eax, [ebp + var_8] .text:00401019 add eax, 1 .text:0040101C mov [ebp + var_8], eax .text:0040101F loc_40101F: ; CODE XREF: _main + 14↑j .text:0040101F cmp [ebp + var_8], 64h .text:00401023 jge short loc_401038 .text:00401025 mov ecx, [ebp + var_4] .text:00401028 push ecx .text:00401029 push offset unk_4060FC .text:0040102E call _printf .text:00401033 add esp, 8 .text:00401036 jmp short loc_401016 .text:00401038 loc_401038: ; CODE XREF: _main + 23↑j .text:00401038 xor eax, eax .text:0040103A mov esp, ebp .text:0040103C pop ebp .text:0040103D retn .text:0040103D _main endp
First, I'd like to draw your attention to local variables. Obviously, var_4 stands for the b variable. As relates to var_8, it is nothing but the loop parameter. Also, note the mov eax, [ebp + var_8] /add ax, 1/mov [ebp + var_8], eax commands. They immediately attract attention and clearly indicate the FOR loop.
Note that Listing 3.82 corresponds exactly to the program operating method provided in Listing 3.81. The jmp short loc_40101F jump reflects that the initial value of the loop parameter must be 0, and the exit condition of the loop is the equality of the loop parameter to 100. When implementing such an algorithm in Assembly, it would be possible to do without this jump by simply assigning the initial value of-1 to the loop parameter.
From Listing 3.82, it is clear that you are dealing with the precondition loop. However, consider what the result would be if you instructed the compiler to produce compact executable code (Listing 3.83).
Listing 3.83: Disassembled text (Listing 3.81) compiled using the "compact code" option
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p .text:00401000 push esi .text:00401001 push 64h .text:00401003 pop esi .text:00401004 loc_401004: ; CODE XREF: _main + 13↓j .text:00401004 push 0Ah .text:00401006 push offset unk_4060FC .text:0040100B call _printf .text:00401010 dec esi .text:00401011 pop ecx .text:00401012 pop ecx .text:00401013 jnz short loc_401004 .text:00401015 xor eax, eax .text:00401017 pop esi .text:00401018 retn .text:00401018 _main endp
Listing 3.83 provides a typical example of the postcondition loop. Note that the role of the loop parameter is delegated to the ESI register. At the same time, instead of incrementing the parameter this code decrements it to make it possible to use the condition of comparing this parameter to zero. This technique is often used by the Microsoft Visual C++ compiler to turn a precondition loop to a postcondition loop. In this case, the parameter increment is replaced with the decrement. An interesting issue is that if the for loop is replaced with the while or do-while loop and the "create compact code" optimization option is preserved, the result will be exactly the same as the one shown in Listing 3.82.
Consider how the program shown in Listing 3.81 is treated by the Borland C++ compiler (Listing 3.84).
Listing 3.84: Disassembled text (Listing 3.81) compiled using the Borland C++ compiler
.text:00401108 _main proc near ; DATA XREF: .data:0040AOB8↓o .text:00401108 argc = dword ptr 8 .text:00401108 argv = dword ptr 0Ch .text:00401108 envp = dword ptr l0h .text:00401108 push ebp .text:00401109 mov ebp, esp .text:0040110B push ebx .text:0040110C push esi .text:0040110D mov esi, 0Ah .text:00401112 xor ebx, ebx .text:00401114 loc_401114: ; CODE XREF: _main + lE↓j .text:00401114 push esi .text:00401115 push offset format ; Format .text:0040111A call _printf .text:0040111F add esp, 8 .text:00401122 inc ebx .text:00401123 cmp ebx, 64h .text:00401126 jl short loc_401114 .text:00401128 pop esi .text:00401129 pop ebx .text:0040112A pop ebp .text:0040112B retn .text:0040112B _main endp
The most interesting issue is that the Borland C++ v. 5.0 compiler interprets the same program slightly differently: The precondition loop is transformed to the postcondition loop; however, the loop parameter continues to be incremented (see Listing 3.84). Moreover, note that Microsoft's compiler is more accurate when working with registers: It uses only one register (ESI) that must be stored in the stack.
One type of loop optimization was covered in the previous section: conversion of the precondition loop to the postcondition loop. In addition, it was shown that the Microsoft Visual C++ compiler can replace the loop parameter increment with its decrement so that the equality of this parameter to zero is the loop termination condition. However, there are even more efficient methods of loop optimization used by advanced compilers. Here, I'll cover the two most important techniques.
The programmer often doesn't notice that the result obtained in the course of executing a looping algorithm is self-evident. Advanced compilers, including Microsoft Visual C++, recognize such situations and replace looping algorithms with ready-to-use results. Consider the simple example provided in Listing 3.85.
Listing 3.85: Simple demonstration of computation at compile time optimization
#include <stdio.h> void main() { int i = 0, s=0, k = 5; for(i = 0; i < k; i++) S = S + i; printf("%d\n", s); }
Principally, it is not difficult to evaluate the result of computing the s value: It will be ten. The Microsoft Visual C++ compiler easily solves this problem. Consider the code that it will produce as the result of compiling the program in Listing 3.85 using the "create fast code" option (Listing 3.86).
Listing 3.86: Disassembled text of the program in Listing 3.85
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p .text:00401000 push 0Ah .text:00401002 push offset unk_4060FC .text:00401007 call printf .text:0040100C add esp, 8 .text:0040100F xor eax, eax .text:00401011 retn .text:00401011 _main endp
In Listing 3.86, there are no loops. This reconfirms that the compilation process is generally irreversible. However, this is not related to understanding the program operating logic.
Loop unwinding (also known as loop unrolling) is a technique based on the principles of optimizing memory access.
Listing 3.87 presents a test program demonstrating the loop unwinding technique.
Listing 3.87: Test program demonstrating the loop unwinding technique
#include <stdio.h> void main() { int a[100]; int i = 0, s = 0; for(i = 0; i < 100; i++) a[i] = i; for(i = 0; i < 100; i++) s += a[i]; printf("%d\n", s); }
Listing 3.87 presents a program using the a array. Compile this program using the Microsoft Visual C++ compiler with the "create fast code" option specified. The disassembled text of the executable code of this program is shown in Listing 3.88.
Listing 3.88: Disassembled text of the executable code of the program in Listing 3.87
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p .text:00401000 var_194 = dword ptr -194h .text:00401000 var_190 = dword ptr -190h .text:00401000 var_18C = dword ptr -18Ch .text:00401000 var_188 = dword ptr -188h .text:00401000 var_184 = dword ptr -184h .text:00401000 var_180 = dword ptr -180h .text:00401000 sub esp, 190h .text:00401006 xor ecx, ecx .text:00401008 xor eax, eax .text:0040100A lea ebx, [ebx + 0] .text:00401010 loc_401010: ; CODE XREF: _main + 17 ↓j .text:00401010 mov [esp + eax*4 + 190h + var_190], eax .text:00401013 inc eax .text:00401014 cmp eax, 64h .text:00401017 jl short loc_401010 .text:00401019 xor eax, eax .text:0040101B push esi .text:0040101C lea esp, [esp+0] .text:00401020 loc_401020: ; CODE XREF: _main + 3E↓j .text:00401020 mov esi, [esp + eax*4 + 194h + var_184] .text:00401024 mov edx, [esp + eax*4 + 194h + var_180] .text:00401028 add edx, esi .text:0040102A add edx, [esp + eax*4 + 194h + var_188] .text:0040102E add edx, [esp + eax*4 + 194h + var_18C] .text:00401032 add edx, [esp + eax*4 + 194h + var_190] .text:00401036 add eax, 5 .text:00401039 add ecx, edx .text:0040103B cmp eax, 64h .text:0040103E jl short loc_401020 .text:00401040 push ecx .text:00401041 push offset unk_4060FC .text:00401046 call _printf .text:0040104B add esp, 8 .text:0040104E xor eax, eax .text:00401050 pop esi .text:00401051 add esp, 190h .text:00401057 retn .text:00401057 main endp
For storing a local array, 190h bytes are allocated (which is equal to 400) — in other words, 4 bytes per element.
I'd like to immediately draw your attention to two commands: lea ebx, [ebx] and lea esp, [esp]. Neither command changes the contents of the registers; the compiler has inserted them to optimize the execution speed. Earlier in this chapter, I mentioned the optimization technique of command pairing (see the comments that follow Listing 3.2). In this case, the compiler adds commands that do not mean anything but correctly divide the commands by pairs.
The loop, in which the values of array elements are specified, is simple. It is located in the listing from addresses 00401010 to 00401017. Note that addressing of local variables in the stack is not done using the EBP register for the reasons of optimization. On the contrary, the ESP register is used for addressing of local variables. The EAX register plays the role of the i variable (this is an example of using a register variable). Note that in this loop the compiler doesn't use the technique of replacing the loop parameter increment with the decrement, which was described earlier. This is because the loop parameter also plays the role of the array index.
Now, consider the next loop, located from the 00401020 to the 0040103E address. The goal of this loop is to sum all array elements. The sum must be accumulated in the s variable (see Listing 3.87). As can be easily seen, the role of the s variable is delegated to the ECX register (for instance, this can be noticed by the call to the printf function). The most important detail is that the loop is preceded by the PUSH ESI command. The reason for the presence of this command is obvious: The ESI register is further used as a temporary variable, and this register must not be changed after execution of the main function. However, do not forget that addressing is relative to the ESP register and the compiler later corrects the addressing of the array elements. Thus, when computing the array index, it is necessary to subtract one (or to subtract four when computing the element address). The result will be as shown in Listing 3.89.
Listing 3.89: Computing array elements
[esp + eax*4 + 194h + var_184] = [esp + eax*4 + 4*4], whichcorreqponds to a[i + 3] [esp + eax*4 + 194h + var_180] = [esp + eax*4 + 5*4], whichcorreqponds to a[i + 4] [esp + eax*4 + 194h + var_188] = [esp + eax*4 + 3*4], whichcorreqponds to a[i + 2] [esp + eax*4 + 194h + var_18C] = [esp + eax*4 + 2*4], whichcorreqponds to a[i + 1] [esp + eax*4 + 194h + var_190] = [esp + eax*4 + 1*4], whichcorresponds to a[i]
In the beginning, the a[i + 3] and a[i + 4] elements are added, and the result is loaded into the EDX register. Then the a[i + 2], a[i + 1] and a[i] elements are added to the sum. The resulting sum is added to the ECX register, which, as you know, is the s variable. Finally, the index and the loop parameter are increased by five instead of one. Thus, it is possible to state that the compiler has implemented the loop in Listing 3.90.
Listing 3.90: Loop implemented by the compiler
for(i = 0; i < 100; i += 5) { s = s + a[i]; s = s + a[i + 1]; s = s + a[i + 2]; s = s + a[i + 3]; s = s + a[i + 4]; }
From the algorithmic point of view, this loop is the full equivalent of the loop in Listing 3.87. From the optimization point of view, it implements loop unwinding, which allows you to considerably speed up program execution.
In the previous section, I considered executable code structures corresponding to the simplest loops. However, looping algorithms might be complicated by the following factors:
Loop nesting
Complex exit condition
Auxiliary loop control operators, such as break and continue
Consider the following two examples. The first example contains simultaneously complex exit condition and auxiliary loop control operators (Listing 3.91). I intentionally did not provide the source code of the C++ program here, because your main goal is to understand the code operating logic, not to discover how the compiler translated the program source code into the executable code.
Listing 3.91: Simultaneous complex loop exit condition and auxiliary loop control operators
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p .text:00401000 var_C = dword ptr -0Ch .text:00401000 var_8 = dword ptr -8 .text:00401000 var_4 = dword ptr -4 .text:00401000 push ebp .text:00401001 mov ebp, esp .text:00401003 sub esp, 0Ch .text:00401006 mov [ebp + var_4], 0 .text:0040100D mov [ebp + var_C], 0 .text:00401014 mov [ebp + var_8], 0 .text:0040101B jmp short loc_401026 .text:0040101D loc_40101D: ; CODE XREF: _main + 51↓j .text:0040101D ; main + 74↓j .text:0040101D mov eax, [ebp + var_8] .text:00401020 add eax, 1 .text:00401023 mov [ebp + var_8], eax .text:00401026 loc_401026: ; CODE XREF: _main + 1B↑j .text:00401026 cmp [ebp + var_8], 2710h .text:0040102D jge short loc_401076 .text:0040102F cmp [ebp+var_4], OC350h .text:00401036 jge short loc_401076 .text:00401038 mov ecx, [ebp + var_4] .text:0040103B add ecx, [ebp + var_C] .text:0040103E add ecx, [ebp + var_8] .text:00401041 mov [ebp + var_4], ecx .text:00401044 mov [ebp + var_4], lEh .text:0040104B cmp [ebp + var_4], 0 .text:0040104F jz short loc_401053 .text:00401051 jmp short loc_40101D .text:00401053 loc_401053: ; CODE XREF: _main + 4F↑j .text:00401053 mov eax, [ebp + var_4] .text:00401056 cdq .text:00401057 idiv [ebp + var_8] .text:0040105A mov [ebp + var_C], eax .text:0040105D cmp [ebp + var_C], 64h .text:00401061 jnz short loc_401065 .text:00401063 jmp short loc_401076 .text:00401065 loc_401065: ; CODE XREF: _main + 61↑J .text:00401065 mov edx, [ebp + var_C] .text:00401068 mov [ebp + var_C], edx .text:0040106B mov eax, [ebp + var_C] .text:0040106E add eax, 1 .text:00401071 mov [ebp + var_C], eax .text:00401074 jmp short loc_40101D .text:00401076 loc_401076: ; CODE XREF: _main + 2D↑j .text:00401076 ; _main + 36↑j ... .text:00401076 xor eax, eax .text:00401078 mov esp, ebp .text:0040107A pop ebp .text:0040107B retn .text:0040107B _main endp
First, consider the fragment located from the 0040101D to the 00401023 address. This is a typical loop header, or, to be precise, the part of the loop that increments the value of the loop parameter. Accordingly, the entry point into the loop is below this fragment. Entry into the loop is executed by the jmp short loc_40l026 command. You have already encountered such fragments. Thus, the start of the loop is identified clearly: This is the loc_40101D label. When viewing the lower part of the listing, pay attention to the jmp short loc_40l0lD command. Because below it there are no commands that would jump to labels between the 0040101D and the 00401074 addresses, it is possible to assume that this is the last command of the loop. Note that principally the loop under consideration could be nested into another loop; however, the current analysis doesn't relate to this issue. Thus, the loop boundaries have been identified.
Another question arises: What are the loop exit conditions? In essence, it is not important whether this condition is written in the loop header or the exit is carried out by the break operator. The issue that does matter is that the exit from the loop is by the 00401076 address. Thus, you'll see the commands in Listing 3.92.
Listing 3.92: Commands for loop exit
.text:0040102D jge short loc_401076 ... .text:00401036 jge short loc_401076 ... .text:00401063 jmp short loc_401076
The last jump is slightly below the first two and takes place if the var_c variable is equal to 64h — in other words, to 100. To all appearances, this is the exit by the break operator. The first two jumps correspond to the loop header and to the logical AND condition. Thus, without any trouble it is possible to determine that the loop execution condition can be written as var_8 < 2710h && var_4 < c350h. Note that the var_8 variable is the loop parameter (see addresses 0040101D-00401023 in Listing 3.91). In essence, only the role of the commands shown in Listing 3.93 needs to be clarified.
Listing 3.93: Commands whose role in the loop structure needs to be clarified
.text:0040104B cmp [ebp + var_4], 0 .text:0040104F jz short loc_401053 .text:00401051 jmp short loc_40101D
If the var_4 variable has a nonzero value, then there will be a jump to the start of the loop. This cannot be anything other than the continue operator.
Now, consider an example of nested loops. Manipulations of multidimensional arrays are typical examples of nested loops. Listing 3.94 presents the disassembled code that fills a two-dimensional array. It is necessary to mention that this program was compiled using the Microsoft Visual C++ compiler with the "no optimization" option.
Listing 3.94: Disassembled code that fills a two-dimensional array
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p .text:00401000 var_198 = dword ptr -198h .text:00401000 var_194 = dword ptr -194h .text:00401000 var_190 = dword ptr -190h .text:00401000 push ebp .text:00401001 mov ebp, esp .text:00401003 sub esp, 198h .text:00401009 mov [ebp + var_194], 0 .text:00401013 jmp short loc_401024 .text:00401015 loc_401015: ; CODE XREF: _main:loc_401078↓j .text:00401015 mov eax, [ebp + var_194] .text:0040101B add eax, 1 .text:0040101E mov [ebp + var_194], eax .text:00401024 loc_401024: ; CODE XREF: _main + 13↑j .text:00401024 cmp [ebp + var_194], 0Ah .text:0040102B jge short loc_40107A .text:0040102D mov [ebp + var_198], 0 .text:00401037 jmp short loc_401048 .text:00401039 loc_401039: ; CODE XREF: _main + 76↓j .text:00401039 mov ecx, [ebp + var_198] .text:0040103F add ecx, 1 .text:00401042 mov [ebp + var_198], ecx .text:00401048 loc_401048: ; CODE XREF: _main + 37↑j .text:00401048 cmp [ebp + var_198], 0Ah .text:0040104F jge short loc_401078 .text:00401051 mov edx, [ebp + var_194] .text:00401057 add edx, [ebp + var_198] .text:0040105D mov eax, [ebp + var_194] .text:00401063 imul eax, 28h .text:00401066 lea ecx, [ebp + eax + var_190] .text:0040106D mov eax, [ebp + var_198] .text:00401073 mov [ecx + eax*4], edx .text:00401076 jmp short loc_401039 .text:00401078 loc_401078: ; CODE XREF: _main + 4F↑j .text:00401078 jmp short loc_401015 .text:0040107A loc_40107A: ; CODE XREF: _main + 2B↑j .text:0040107A xor eax, eax .text:0040107C mov esp, ebp .text:0040107E pop ebp .text:0040107F retn .text:0040107F _main endp
It is not difficult to locate nested loops in this listing. The starting point of the nesting loop at the 00401015 address and the starting point of the nested loop at the 00401039 address can be noticed easily. I'd like to draw your attention to the jmp short loc_401024 and jmp short loc_401048 instructions that carry out initial entries into the nesting and into the nested loops, respectively. Thus, the structure of the nested loops in this case is easy and doesn't require any additional comments.
The most interesting issue here is how the executable code implements the algorithm assigning the value to the two-dimensional array. The parameter of the nesting loop is stored in the var_198 variable. The remaining var_190 variable points to the starting point of the two-dimensional array. Thus, after the mov edx, [ebp + var_194] and add edx, [ebp + var_198] commands, the EDX register will contain the sum of the two index values. If you run a few steps forward to the mov [ecx + eax* 4], edx command, which appears like an operator that assigns the value to the array element, it becomes obvious that array elements are assigned values equal to the sum of indexes (the values of the parameters of the nesting and the nested loops). However, look back several commands. What is the meaning of the mov eax, [ebp + var_194] /imul eax, 28h commands? The index value (consider it to be the first) is multiplied by the number of bytes in the row of a two-dimensional array (4*10 = 40 = 28h). Then, by executing the lea ecx, [ebp + eax + var_190] command, the address of the starting point of the current row is loaded into the ECX register. Finally, ecx + eax*4 is the address of the current element of the two-dimensional array. Thus, the algorithm in Listing 3.94 can be clarified easily.
To complete the description of loops, it is necessary to mention that if the same program operating over a two-dimensional array is compiled using the Microsoft Visual C++ "create fast code" option, the result would be interesting. Namely, the nested loop would disappear, because the compiler would unroll it (see Listing 3.93 and the comments that follow it). The nesting loop will be transformed from the precondition loop to the postcondition loop.
Identification of objects and everything related to them is a more complicated problem than the ones solved earlier. However, you have already encountered most of the material provided here. After all, methods are functions, and object properties are variables. [10] It is time to consider this problem in more detail and in due order.
First, consider a simple example intended to demonstrate typical program structures that serve objects. The program in Listing 3.95 has only one class, on the basis of which a single object is created. Note that this is a global static object. In other words, the compiler must take care to allocate memory for storing that object. The most important issue in object-oriented programming is deciding whether the created properties and methods relate to a single object or to several objects simultaneously. It is necessary to answer the following question: If, for example, a method (in essence, representing some function) relates to several objects simultaneously, how does it "know", in relation to which object it has been called from a specific program location?
Listing 3.95: Simple program demonstrating typical program structures for serving objects
#include <stdio.h> class A { public: int b; int a; int geta(){b = 0; return a;}; void seta(int); }; void A::seta(int al) { a = al; b = 1; }; A Al; void main() { Al.seta(10); int c = Al.geta(); printf("%d\n", c) ; }
Listing 3.96 provides the executable code of the main function from Listing 3.95, disassembled using IDA Pro. The program was compiled using Microsoft Visual C++ with the "no optimization" option.
Listing 3.96: Disassembled code of the main function from Listing 3.95
.text:00401020 _main proc near ; CODE XREF: start + 16E↓p .text:00401020 var_4 = dword ptr -4 .text:00401020 push ebp .text:00401021 mov ebp, esp .text:00401023 push ecx .text:00401024 push 0Ah .text:00401026 mov ecx, offset unk_4086CO .text:0040102B call sub_401000 .text:00401030 mov ecx, offset unk_4086CO .text:00401035 call sub_401060 .text:0040103A mov [ebp + var_4], eax .text:0040103D mov eax, [ebp + var_4] .text:00401040 push eax .text:00401041 push offset unk_4060FC .text:00401046 call _printf .text:0040104B add esp, 8 .text:0040104E xor eax, eax .text:00401050 mov esp, ebp .text:00401052 pop ebp .text:00401053 retn .text:00401053 _main endp
The text presented in Listing 3.96 contains three function calls. The first call is to the printf library function, so it doesn't make sense to consider it in detail. The sub_401000 and sub_40l060 functions deserve special attention. In the comments following Listing 3.97, the code of these functions will be described in more detail. Because the source code of the program is available (see Listing 3.95), it is possible to tell for certain that these are the calls to the seta and geta methods, respectively. Note that before calling the method, in both cases, some memory address is sent into the ECX register: offset unk_4086C0. Click the reference, and you will find that this memory area is made up of 8 bytes (at least, IDA Pro has decided so). This must make you vigilant, because the object contains two int properties, which makes exactly 8 bytes. Thus, already on the basis of preliminary considerations it is possible to conclude that when calling a method, one parameter is the address of the object, in relation to which the given method is called (for the Microsoft Visual C++ compiler, the parameter is passed through the ECX register).
Listing 3.97: Disassembled code (Listing 3.96) corresponding to seta/geta calls (Listing 3.95)
.text:00401000 sub_401000 proc near ; CODE XREF: _main + B↓p .text:00401000 var_4 = dword ptr -4 .text:00401000 arg_0 = dword ptr 8 .text:00401000 push ebp .text:00401001 mov ebp, esp .text:00401003 push ecx .text:00401004 mov [ebp + var_4], ecx .text:00401007 mov eax, [ebp + var_4] .text:0040100A mov ecx, [ebp + arg_0] .text:0040100D mov [eax + 4], ecx .text:00401010 mov edx, [ebp + var_4] .text:00401013 mov dword ptr [edx], 1 .text:00401019 mov esp, ebp .text:0040101B pop ebp .text:0040101C retn 4 .text:0040101C sub_401000 endp .text:00401060 sub_401060 proc near ; CODE XREF: _main + 15↑p .text:00401060 var_4 = dword ptr -4 .text:00401060 push ebp .text:00401061 mov ebp, esp .text:00401063 push ecx .text:00401064 mov [ebp + var_4], ecx .text:00401067 mov eax, [ebp + var_4] .text:0040106A mov dword ptr [eax], 0 .text:00401070 mov ecx, [ebp + var_4] .text:00401073 mov eax, [ecx + 4] .text:00401076 mov esp, ebp .text:00401078 pop ebp .text:00401079 retn .text:00401079 sub_401060 endp
Note | The pointer to an object used in a method has a special name in the C++ language: this. Using the this pointer, it is possible to access the class members, including the closed ones. |
Consider the text of the sub_401000 function. As already mentioned, to all appearances this is the seta function. Note immediately that this function has only one stack variable and one parameter. The stack is released by the RETN 4 command. The sequence of the push ecx/mov [ebp + var_4], ecx commands immediately draws attention. This is a nuisance. The push command here is used to reserve memory for the local variable. At the same time, the variable is assigned the value stored in the ECX register. The command that follows the PUSH command again assigns the same value to the variable. Well, the compiler can be excused for this incident, because, after all, it was not instructed to use optimization. Further operations are the matter of technique: The value considered the object address is loaded into the EAX register. The mov [eax + 4], ecx command follows (and ECX now contains the parameter value). In other words, the a object property is located at the offset of 4 bytes in relation to the start of the object. Then there is the sequence of the mov edx, [ebp + var_4] /mov dword ptr [edx], 1 commands, which corresponds to assignment of the value 1 to the b property.
The sub_401060 function doesn't contain anything new. The mov eax, [ecx + 4] command simply returns the a property of the geta function.
Thus, having considered the example of compilation using the Microsoft Visual C++ compiler, it is possible to note that when calling a method, it is implicitly passed the pointer to the object, in the context of which it is called. In this example, a global object was created. However, there will be no difference when creating an object locally in the stack. Try to conduct this investigation on your own.
If the program presented in Listing 3.95 is compiled using the Borland C++ v. 5.0 compiler, there won't be any significant difference. Again, the method will be informed about the object address by the passing of an additional parameter. This additional parameter is the last to be passed through the stack in relation to all other parameters. I won't provide any additional listings here because they won't teach you anything new.
In real-world programming, new objects are typically created on the fly. The new operator is intended specially for this purpose. Consider the program from Listing 3.95 rewritten in terms of dynamically-created objects (Listing 3.98).
Listing 3.98: Example illustrating the use of dynamic objects
#include <stdio.h> class A { public: int b; int a; int geta(){b = 0; return a;}; void seta(int); }; void A::seta(int al) { a = a1; b = 1; }; void main() { A * A1 = new (A) ; Al -> seta(10); int c = A1 -> geta(); printf("%d\n", c) ; delete Al; }
Listing 3.99 presents the disassembled text of the executable code created by the Microsoft Visual C++ compiler with the "no optimization" option.
Listing 3.99: Disassembled text of the program shown in Listing 3.98
.text:00401020 _main proc near ; CODE XREF: start + 16E↓p .text:00401020 var_10 = dword ptr -10h .text:00401020 var_C = dword ptr -OCh .text:00401020 var_8 = dword ptr -8 .text:00401020 var_4 = dword ptr -4 .text:00401020 push ebp .text:00401021 mov ebp, esp .text:00401023 sub esp, 10h .text:00401026 push 8 .text:00401028 call ??2@YAPAXI@Z ; Operator new(uint) .text:0040102D add esp, 4 .text:00401030 mov [ebp + var_C], eax .text:00401033 mov eax, [ebp + var_C] .text:00401036 mov [ebp + var_8], eax .text:00401039 push OAh .text:0040103B mov ecx, [ebp + var_8] .text:0040103E call sub_401000 .text:00401043 mov ecx, [ebp + var_8] .text:00401046 call sub_401080 .text:0040104B mov [ebp + var_4], eax .text:0040104E mov ecx, [ebp + var_4] .text:00401051 push ecx .text:00401052 push offset unk_4060FC .text:00401057 call _printf .text:0040105C add esp, 8 .text:0040105F mov edx, [ebp + var_8] .text:00401062 mov [ebp + var_10], edx .text:00401065 mov eax, [ebp + var_10] .text:00401068 push eax .text:00401069 call j__free .text:0040106E add esp, 4 .text:00401071 xor eax, eax .text:00401073 mov esp, ebp .text:00401075 pop ebp .text:00401076 retn .text:00401076 _main endp
The abundance of redundant code and redundant stack variables in this code fragment is surprising. In this case, however, this is of little or no importance. It is important here that after creation of an object all further actions have principally no differences from the ones encountered in Listing 3.96. Note that the new and delete operators (the call to the j__free procedure) are recognized by the IDA Pro disassembler, which makes further analysis a simple matter.
Virtual functions are a kind of payment for the beauty and ease of object-oriented programming theory. Consider the program shown in Listing 3.100, demonstrating a typical example of inheritance.
Listing 3.100: Example program demonstrating a typical example of inheritance
#include <stdio.h> class A { public: int a; int seta(int al){a = al; return a;}; void pa(){printf("%d\n", a);} }; class B:public A { public: int seta (int al){a = al + 1; return a;}; }; void main() { A* Al; Al = new(B); Al -> seta(10); Al -> pa(); delete Al; };
The B class inherits properties and methods of the A class. In this case, the B class has the seta method that by its name and parameters matches the similar method in class A. If, for example, you create an object on the basis of the B class, then when the seta method is called, the method from the B class will be called. This is a well-known inheritance property. The situation will be slightly different if you first create the pointer to the object of the base class A and then create a new object using the new operator on the base of the B class template (see Listing 3.100). This time, the compiler will orient toward the pointer type. In this case, A1 -> seta (10) will mean the call to the method of the base class. Listing 3.101 shows the disassembled code created by the Microsoft Visual C++ compiler using the "no optimization" option.
Listing 3.101: Disassembled code of the program compiled by Microsoft Visual C++ without optimization
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p .text:00401000 var_C = dword ptr -OCh .text:00401000 var_8 = dword ptr -8 .text:00401000 var_4 = dword ptr -4 .text:00401000 push ebp .text:00401001 mov ebp, esp .text:00401003 sub esp, OCh .text:00401006 push 4 .text:00401008 call ??2@YAPAXI@Z ; Operator new(uint) .text:0040100D add esp, 4 .text:00401010 mov [ebp + var_8], eax .text:00401013 mov eax, [ebp + var_8] .text:00401016 mov [ebp + var_4], eax .text:00401019 push OAh .text:0040101B mov ecx, [ebp + var_4] .text:0040101E call sub_401050 .text:00401023 mov ecx, [ebp + var_4] .text:00401026 call sub_401070 .text:0040102B mov ecx, [ebp + var_4] .text:0040102E mov [ebp + var_C], ecx .text:00401031 mov edx, [ebp + var_C] .text:00401034 push edx .text:00401035 call j__free .text:0040103A add esp, 4 .text:0040103D xor eax, eax .text:0040103F mov esp, ebp .text:00401041 pop ebp .text:00401042 retn .text:00401042 _main endp .text:00401042 .text:00401050 .text:00401050 sub_401050 proc near ; CODE XREF: _main + 1E↑p .text:00401050 var_4 = dword ptr -4 .text:00401050 arg_0 = dword ptr 8 .text:00401050 push ebp .text:00401051 mov ebp, esp .text:00401053 push ecx .text:00401054 mov [ebp + var_4], ecx .text:00401057 mov eax, [ebp + var_4] .text:0040105A mov ecx, [ebp + arg_0] .text:0040105D mov [eax], ecx .text:0040105F mov edx, [ebp + var_4] .text:00401062 mov eax, [edx] .text:00401064 mov esp, ebp .text:00401066 pop ebp .text:00401067 retn 4 .text:00401067 sub_401050 endp .text:00401067 .text:00401070 .text:00401070 sub_401070 proc near ; CODE XREF: _main + 26↑p .text:00401070 var_4 = dword ptr -4 .text:00401070 push ebp .text:00401071 mov ebp, esp .text:00401073 push ecx .text:00401074 mov [ebp + var_4], ecx .text:00401077 mov eax, [ebp + var_4] .text:0040107A mov ecx, [eax] .text:0040107C push ecx .text:0040107D push offset unk_4060FC .text:00401082 call _printf .text:00401087 add esp, 8 .text:0040108A mov esp, ebp .text:0040108C pop ebp .text:0040108D retn .text:0040108D sub_401070 endp
This listing doesn't contain anything principally new from the issues investigated in the previous section (see Listing 3.99). Here, 4 bytes are allocated for creating a new object, and the pointer to the newly-created object is passed through the ECX register (see the calls to the sub_401050 and sub_401070 functions). In the functions, the pointer to object is used for accessing the object property. For example, in sub_401050 the code shown in Listing 3.102 can be located.
Listing 3.102: Fragment of the sub_401050 function (Listing 3.101)
mov [ebp + var_4], ecx ; The object address is loaded ; into the stack variable. mov eax, [ebp + var_4] ; The address is loaded into the EAX register. mov ecx, [ebp + arg_0] ; The parameter value is loaded ; into the ECX register. mov [eax], ecx ; The parameter value is assigned to ; the object property.
It is time to consider the concepts of virtual functions and polymorphism. For this purpose, consider the program in Listing 3.103. In comparison to Listing 3.100, another method, seta1, has been added to the base class, and the seta and seta1 methods have been made virtual. If you are acquainted with object-oriented programming, you will immediately guess that in the A1 -> seta (10) and A1 -> seta1(10) calls, the methods of the derived class (the B class) will be used.
Listing 3.103: Program for studying the concepts of virtual functions and polymorphism
#include <stdio.h> class A { public: int a; virtual int seta(int a1){a = al; return a;}; virtual int setal(int a1){a = 2*al; return a;}; void pa(){printf("%d\n", a);} }; class B:public A { public: int seta(int a1) {a = al + 1; return a;}; int setal(int al){a = 2*al + 1; return a;}; }; void main() { A* A1; A1 = new(B); Al -> seta(10); Al -> pa{); Al -> setal(10); A1 -> pa(); delete A1; };
Listing 3.104 shows the disassembled executable code of the main function from Listing 3.103. As usual, the program was compiled using the Microsoft Visual C++ compiler with the "no optimization" option.
Listing 3.104: Disassembled code of the main function from Listing 3.103
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p .text:00401000 var_10 = dword ptr -10h .text:00401000 var_C = dword ptr -0Ch .text:00401000 var_8 = dword ptr -8 .text:00401000 var_4 = dword ptr -4 .text:00401000 push ebp .text:00401001 mov ebp, esp .text:00401003 sub esp, 10h .text:00401006 push 8 .text:00401008 call ??2@YAPAXI@Z ; Operator new(uint) .text:0040100D add esp, 4 .text:00401010 mov [ebp + var_8], eax .text:00401013 cmp [ebp + var_8], 0 .text:00401017 jz short loc_401026 .text:00401019 mov ecx, [ebp + var_8] .text:0040101C call sub_4010A0 .text:00401021 mov [ebp + var_10], eax .text:00401024 jmp short loc_40102D .text:00401026 loc_401026: ; CODE XREF: _main + 17↑j .text:00401026 mov [ebp + var_10], 0 .text:0040102D loc_40102D: ; CODE XREF: _main + 24↑j .text:0040102D mov eax, [ebp + var_10] .text:00401030 mov [ebp + var_4], eax .text:00401033 push 0Ah .text:00401035 mov ecx, [ebp + var_4] .text:00401038 mov edx, [ecx] .text:0040103A mov ecx, [ebp + var_4] .text:0040103D call dword ptr [edx] .text:0040103F mov ecx, [ebp + var_4] .text:00401042 call sub_401080 .text:00401047 push OAh .text:00401049 mov eax, [ebp + var_4] .text:0040104C mov edx, [eax] .text:0040104E mov ecx, [ebp + var_4] .text:00401051 call dword ptr [edx + 4] .text:00401054 mov ecx, [ebp + var_4] .text:00401057 call sub_401080 .text:0040105C mov eax, [ebp + var_4] .text:0040105F mov [ebp + var_C], eax .text:00401062 mov ecx, [ebp + var_C] .text:00401065 push ecx .text:00401066 call j_free .text:0040106B add esp, 4 .text:0040106E xor eax, eax .text:00401070 mov esp, ebp .text:00401072 pop ebp .text:00401073 retn .text:00401073 _main endp
Note that when creating an object, 8 bytes are allocated (push 8) instead of the 4 bytes allocated earlier. Running some steps ahead, I'd like to mention that the additional 4 bytes located in the beginning of the object are allocated for the address of the table of virtual functions.
In the text, you can see that the check is carried out, ensuring that the new operator has allocated memory for the object to be created. If the memory allocation function returns zero (which means that an error has occurred), then the var_10 variable is assigned the value of zero, which finally must cause an exception (the mov edx, [ecx] command if the ECX register contains zero).
Pay special attention to the sub_4010A0 function (see Listing 3.106). This is a special function, which is not present in the program code. Its intention is to place the address of the virtual functions table into the start of the area allocated for the object to be created. The function will return the object address — but with the address of the virtual functions table (in the first 4 bytes). Later, the mechanism in Listing 3.105 will be used.
Listing 3.105: Mechanism used for forming the virtual function address
mov ecx, [ebp + var_4] ; The object's address mov edx, [ecx] ; The contents of the start of the area ; The object is loaded into the EDX register. ; The address of the virtual functions table mov ecx, [ebp + var_4] call dword ptr [edx] ; The call to the virtual function
Thus, you can see that the virtual function address, in contrast to a normal function, is formed dynamically. Any indirect call (no matter what the compiler might be) must make you vigilant, suspecting a call to a virtual function. In the program being studied, virtual functions are called twice: call dword ptr [edx] (the call to seta) and call dword ptr [edx + 4] (the call to setal). The EDX register points to the start of the virtual functions table. The table must contain the addresses of two virtual functions.
Consider Listing 3.106, where the disassembled text of the sub_4010A0 function is shown. The main goal of this function is to place the address of the virtual functions table into the start of the area that stores the object.
Listing 3.106: Disassembled code of the sub_4010A0 function
.text:004010A0 sub_4010A0 proc near ; CODE XREF: _main + 1C↑p .text:004010A0 var_4 = dword ptr -4 .text:004010A0 push ebp .text:004010A1 mov ebp, esp .text:004010A3 push ecx .text:004010A4 mov [ebp + var_4], ecx .text:004010A7 mov ecx, [ebp + var_4] .text:004010AA call ??Oios_base@std@@IAE@XZ .text:004010AF mov eax, [ebp + var_4] .text:004010B2 mov dword ptr [eax], offset off_407100 .text:004010B8 mov eax, [ebp + var_4] .text:004010BB mov esp, ebp .text:004010BD pop ebp .text:004010BE retn .text:004010BE sub_4010A0 endp
The function presented in the listing has an interesting organization. First, the address of the virtual functions table is defined and placed into the start of the area where the object is stored. These commands are in Listing 3.107.
Listing 3.107: Defining and placing the address of the virtual functions table
... mov [ebp + var_4], ecx ... mov eax, [ebp + var_4] mov dword ptr [eax], offset off_407100 ...
The memory area with the off_407100 address stores the virtual functions table. What about the call to the ??Oios_base@std@@IAE@xz function? Here, it is necessary to understand that the class hierarchy in the text program has only two levels: the base class, A, and the derived class, B. Tables of virtual functions are created for each class. The ??Oios_base@std@@IAE@xz function places the address of the virtual functions table of the base class into the object. In this case, the address is overwritten with the address of the virtual functions table of class B. Now assume that another class, c, is added to this hierarchy. This class is the descendant of class B. Create an object on the basis of the C class: A1=new(C). How would the compiler treat the tables of virtual functions in this case? Consider Listing 3.108.
Listing 3.108: Virtual functions table in a sophisticated class hierarchy
; The main function _main proc near ... call prod ; Now the object contains the address of the ; virtual functions table of class C. ... _main endp ... prod proc near ... call proc2 ; Now the object contains the address of the ; virtual functions table of class B. ... ; Now the object contains the address of the ; virtual functions table of class C. ... retn prod endp ... Proc2 proc near ... call ??Oios_base@std@@IAE@XZ ... ; Now the object contains the address of the ; virtual functions table of the base class. ... ; Now the object contains the address of the ; virtual functions table of class B. ... retn prod endp
Carefully consider the method presented in Listing 3.108. From this method, it must be clear how the addresses of virtual functions are written for the hierarchy of objects with an arbitrary number of members. In particular, from this listing it follows that the greater the number of members, the more time required to complete such an operation, because the number of nested procedures equals the number of classes, including the base class. Also note that the Microsoft Visual C++ compiler places the virtual functions tables of the parent and child classes one after another and that the base class has the greatest address (addresses grow from bottom to top). The functions are placed in the table from bottom to top, according to their declarations in the program text.
Thus, knowing the address of the virtual functions table, you'll easily locate the text of that function. For example, Listing 3.109 shows the code of the seta virtual function defined in class B.
Listing 3.109: Code of the seta virtual function defined in class B
.text:004010C0 sub_4010CO proc near ; DATA XREF: .rdata:off_407100↓o .text:004010C0 var_4 = dword ptr -4 .text:004010C0 arg_0 = dword ptr 8 .text:004010C0 push ebp .text:004010C1 mov ebp, esp .text:004010C3 push ecx .text:004010C4 mov [ebp + var_4], ecx .text:004010C7 mov eax, [ebp + arg_0] .text:004010CA add eax, 1 .text:004010CD mov ecx, [ebp + var_4] .text:004010DO mov [ecx + 4], eax .text:004010D3 mov edx, [ebp + var_4] .text:004010D6 mov eax, [edx + 4] .text:004010D9 mov esp, ebp .text:004010DB pop ebp .text:004010DC retn 4 .text:004010DC sub_4010C0 endp
The text of the seta procedure is clear. First, the object address, as before, is contained in the ECX register. The arg_0 parameter is the value that must be assigned to the a property (see Listing 3.103). Finally, when assigning the value, it must be increased by one (Listing 3.110).
Listing 3.110: When assigning the value, it must be incremented by one (Listing 3.109)
add eax, 1 mov ecx, [ebp + var_4] mov [ecx + 4], eax
From Listings 3.109 and 3.110, it is clear that the a property is located at the offset 4 bytes from the object start. This is correct, because the first 4 bytes contain the address of the virtual functions table. If optimization options are specified at compile time, the compiler reduces the code. For example, the procedures, where the virtual function address is written to the object, disappear because all operations are carried out directly in the main function (in the example under consideration). In addition, the address of the last class is immediately written into the object.
If you use the Borland C++ v. 5.0 compiler, you won't notice any significant differences. The address of the virtual functions table also is placed in the start of the object. In the course of initialization, it also is overwritten starting from the base class (or, to be precise, starting from the class, in which the virtual keyword appeared first) and ending with the current derived class.
The need for constructors and destructors is the logical consequence of the concept of object-oriented programming, especially as related to visual programming for the Windows operating system. There is an urgent need for automating some actions that must be carried out when objects are created and when they are destroyed. In particular, this relates to initial values of the object properties that cannot be initialized when they are declared, as was the general practice with the C programming language. On the other hand, calling the initialization procedure every time is bad programming style. [11] Until now, I have not used constructors or destructors in the examples. However, this cannot be said about the compiler. For example, determining the address of the virtual functions table is exactly the operation that must be carried out automatically. This means that you have already encountered constructors in this chapter! You first encountered a constructor in Listing 3.104. This is the sub_4010A0 procedure that was separately considered in Listing 3.106. The only intention of this procedure is writing the address of the correct virtual functions table into the start of the object. However, if you define the constructor explicitly and carry out specific actions there, these actions will be "in the same company" as the actions for defining the virtual functions table.
Consider the practical example presented in Listing 3.111. In this program, a single class A is defined, in which there is one variable, one virtual method, and special methods: constructor and destructor.
Listing 3.111: Practical example illustrating the use of constructors and destructors
#include <stdio.h> class A { public: int a; virtual void pa(){printf("%d\n", a);} A(){a = 1; printf("Constructor A\n");}; ~A(){printf("Destructor A\n");}; }; void main() { A* A1; A1 = new(A); Al -> pa () ; delete A1; };
The disassembled code of the main function of this program is presented in Listing 3.112. As before, this code was created using the Microsoft Visual C++ compiler with the "no optimization" option.
Listing 3.112: Disassembled code of the main function from Listing 3.111
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p .text:00401000 var_18 = dword ptr -18h .text:00401000 var_14 = dword ptr -14h .text:00401000 var_10 = dword ptr -10h .text:00401000 var_C = dword ptr -OCh .text:00401000 var_8 = dword ptr -8 .text:00401000 var_4 = dword ptr -4 .text:00401000 push ebp .text:00401001 mov ebp, esp .text:00401003 sub esp, 18h .text:00401006 push 8 .text:00401008 call ??2@YAPAXI@Z ; Operator new(uint) .text:0040100D add esp, 4 .text:00401010 mov [ebp + var_8], eax .text:00401013 cmp [ebp + var_8], 0 .text:00401017 jz short loc_401026 .text:00401019 mov ecx, [ebp + var_8] .text:0040101C call sub_401070 .text:00401021 mov [ebp + var_14], eax .text:00401024 jmp short loc_40102D .text:00401026 loc_401026: ; CODE XREF: _main + 17↑j .text:00401026 mov [ebp + var_14], 0 .text:0040102D loc_40102D: ; CODE XREF: _main + 24↑j .text:0040102D mov eax, [ebp + var_14] .text:00401030 mov [ebp + var_4], eax .text:00401033 mov ecx, [ebp + var_4] .text:00401036 mov edx, [ecx] .text:00401038 mov ecx, [ebp + var_4] .text:0040103B call dword ptr [edx] .text:0040103D mov eax, [ebp + var_4] .text:00401040 mov [ebp + var_10], eax .text:00401043 mov ecx, [ebp + var_10] .text:00401046 mov [ebp + var_C], ecx .text:00401049 cmp [ebp + var_C], 0 .text:0040104D jz short loc_40105E .text:0040104F push 1 .text:00401051 mov ecx, [ebp + var_C] .text:00401054 call sub_4010CO .text:00401059 mov [ebp + var_18], eax .text:0040105C jmp short loc_401065 .text:0040105E loc_40105E: ; CODE XREF: _main + 4D↑j .text:0040105E mov [ebp + var_18], 0 .text:00401065 loc_401065: ; CODE XREF: _main + 5C↑j .text:00401065 xor eax, eax .text:00401067 mov esp, ebp .text:00401069 pop ebp .text:0040106A retn .text:0040106A _main endp
The structure of this listing must already be well known to you. However, I'd like to draw your attention to certain issues that have not been covered in sufficient detail yet. On the basis of all previous material, it is clear that the sub_401070 function is the constructor. This is confirmed first by the constructor's proximity to the new operator. Also, important is that after execution of the new operator the check is carried out if the memory has actually been allocated. Consider all listings provided in Section 3.2.4 more closely. You'll see that such checks appear only with the introduction of virtual functions. However, you already know that if virtual functions are present, the compiler creates a constructor even if it wasn't present in the source code of the program. This probably is the main indication of the constructor call, because the compiler won't allow a constructor to execute if the object hasn't been created.
Consider the end of the function. The call to sub_4010C0 immediately attracts attention. Earlier, in similar programs, there was the call to the j__free function, which was associated with the delete operator (see Listing 3.99 and the comments that follow it). In this case, you are dealing with the destructor, which will be considered in detail in Listing 3.116. Note that the compiler again won't allow the destructor to execute if the object hasn't been created.
As relates to the remaining code, hopefully you'll easily recognize the call dword ptr [edx] command as a call to the A1 -> pa() virtual function. This mustn't cause you any problems after you consider all previously provided examples.
Now, consider the contents of the constructor (Listing 3.113).
Listing 3.113: Disassembled code of the constructor (Listing 3.111)
.text:00401070 sub_401070 proc near ; CODE XREF: _main + 1C↑p .text:00401070 var_4 = dword ptr -4 .text:00401070 push ebp .text:00401071 mov ebp, esp .text:00401073 push ecx .text:00401074 mov [ebp + var_4], ecx .text:00401077 mov eax, [ebp + var_4] .text:0040107A mov dword ptr [eax], offset off_40710C .text:00401080 mov ecx, [ebp + var_4] .text:00401083 mov dword ptr [ecx + 4], 1 .text:0040108A push offset aConstructorA ; "Constructor A\n" .text:0040108F call _printf .text:00401094 add esp, 4 .text:00401097 mov eax, [ebp + var_4] .text:0040109A mov esp, ebp .text:0040109C pop ebp .text:0040109D retn .text:0040109D sub_401070 endp
You must rejoice when viewing this listing. Consider the well-known code fragment (Listing 3.114).
Listing 3.114: Writing the virtual functions table address into the new object instance
mov [ebp + var_4], ecx mov eax, [ebp + var 4] mov dword ptr [eax], offset off_40710C
This is nothing other than writing the address of the virtual functions table into the newly-created object instance. Then there is another easily recognizable pattern (Listing 3.115).
Listing 3.115: Code fragment demonstrating the code of the constructor
; Set the property value (a = 1). mov dword ptr [ecx + 4], 1 ; The call to the printf function push offset aConstructorA ; "Constructor A\n" call _printf
This is the code that was placed into the constructor (see Listing 3.111). There isn't anything to add here because the entire pattern is clear.
Listing 3.116 presents the disassembled code of the destructor (or, to be precise, the procedure, from which the true destructor will be called) from the program shown in Listing 3.111.
Listing 3.116: Disassembled code of the procedure calling the destructor (Listing 3.111)
.text:004010C0 sub 4010C0 proc near ; CODE XREF: _main + 54↑p .text:004010C0 var_4 = dword ptr -4 .text:004010C0 arg_0 = dword ptr 8 .text:004010C0 push ebp .text:004010C1 mov ebp, esp .text:004010C3 push ecx .text:004010C4 mov [ebp + var_4], ecx .text:004010C7 mov ecx, [ebp + var_4] .text:004010CA call sub_4010F00 .text:004010CF mov eax, [ebp + arg_0] .text:00401OD2 and eax, 1 .text:00401OD5 jz short loc_4010E3 .text:004010D7 mov ecx, [ebp + var_4] .text:004010DA push ecx .text:004010DB call j_free .text:004010EO add esp, 4 .text:004010E3 loc_4010E3: ; CODE XREF: sub_4010CO + 15↑j .text:004010E3 mov eax, [ebp + var_4] .text:004010E6 mov esp, ebp .text:004010E8 pop ebp .text:004010E9 retn 4 .text:004010E9 sub_4010CO endp
First, pay attention to the following two calls in Listing 3.116: sub_4010F0 and j__free. View sub_4010F0, and you'll immediately notice that it simply contains the text that was placed into the destructor (see Listing 3.111). Thus, the sub_4010C0 function is intended for calling procedures required to destroy an object. You have already encountered the j__free function, and you already know that it is simply an implementation of the delete operator. Finally, the value 1 sent to the sub_4010C0 function as a parameter is simply an indication that it is necessary to call the delete operator.
You have encountered mathematical computations many times. Compilers often use different techniques, such as computation at compile time, to reduce the code size or speed up program execution. You also already know about the possibility of using FPU commands. In this section, several other techniques will be covered. First, consider the example program shown in Listing 3.117.
Listing 3.117: Sample program containing a simple numeric computation
#include <stdio.h> #include <windows.h> void main() { DWORD a, b, c; scanf("%d", &a); scanf("%d", &b); c=((a + b)/8)*(3*a); printf("%d\n", c); };
Listing 3.117 provides a program that contains a simple arithmetic computation. Consider how the Microsoft Visual C++ compiler treats this program when the "create fast code" option was specified at compile time. The disassembled code of the compiled program is shown in Listing 3.118.
Listing 3.118: Disassembled code of the program shown in Listing 3.117
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p .text:00401000 var_8 = dword ptr -8 .text:00401000 var_4 = dword ptr -4 .text:00401000 sub esp, 8 .text:00401003 lea eax, [esp + 8 + var_8] .text:00401006 push eax .text:00401007 push offset unk_408100 .text:0040100C call _scanf .text:00401011 lea ecx, [esp + 10h + var_4] .text:00401015 push ecx .text:00401016 push offset unk_408100 .text:0040101B call _scanf .text:00401020 mov ecx, [esp + 18h + var_8] .text:00401024 mov edx, [esp + 18h + var_4] .text:00401028 lea eax, [edx + ecx] .text:00401025 shr eax, 3 .text:0040102E imul eax, ecx .text:00401031 lea eax, [eax + eax*2] .text:00401034 push eax .text:00401035 push offset unk_4080FC ,text:0040103A call _printf .text:0040103F xor eax, eax .text:00401041 add esp, 20h .text:00401044 retn .text:00401044 _main endp
Note that stack variables in this fragment are addressed using the ESP register. Thus, the var_4 and var_8 stack variables correspond to the a and b variables (see Listing 3.117). Consider the sequence of commands in Listing 3.119.
Listing 3.119: Code pattern using lea for optimization of mathematical computations
mov ecx, [esp + 18h + var_8] mov edx, [esp + 18h + var_4] lea eax, [edx + ecx]
The main issue here is the use of the lea command. Uncommon properties of this command were already covered in Chapter 1 (see Table 1.2). Now you'll see this command in action. You'll usually encounter this command when it is necessary to optimize arithmetic operations.
Then there is the shr eax, 3 command. You already know that it is simply an integer divided by eight (two raised to the power of three). This command is executed much faster than IDIV. Later, there is the imul eax, ecx command that multiplies the contents of EAX by the contents of ECX. Finally, you encounter the lea command again: lea eax, [eax + eax*2]. This command stands for multiplication of the contents of the EAX register by three.
At this point, the section on mathematical computations has been completed. I hope that you'll easily master on your own other methods of optimizing computations, in particular, using bitwise commands when investigating source code.
The mechanisms of exception handling at the level of executable code are complicated. Furthermore, details of their implementation might considerably differ for different compilers. A detailed investigation of these mechanisms goes beyond the goals of this book. My goals are more modest: They are to make you acquainted with some basic principles of exception handling and to study how these mechanisms might be reflected in executable code.
Exception handling generated by compilers is based on so-called structured exception handling (SEH). The SEH mechanism is supported at the level of the Windows operating system. The foundation for exception handling is the following: When the operating system of the Windows family runs on an Intel processor, the Fs segment register plays a special role. It points to the thread environment block (TEB).[12] This block, in turn, contains several substructures, one of which is the thread information block (TIB). TIB is stored in the beginning of TEB. Finally, the 4-byte value located in the beginning of the TIB structure represents the address of some structure that, first, contains the exception handler address and, second, contains the address of the previous structure of the same type. In reality, the problem is the linked list. This topic will be covered in detail later in this chapter (see the comments that follow Listing 3.121).
The following conclusions can be drawn on the basis of the preceding information:
Each thread has an individual exception handler.
Knowing the address, by which the pointer to the exception handler is located, the program can set its own handler procedure.
In essence, these two facts are the foundation for the exception handling mechanisms used by compilers creating executable code intended to run under Windows. Although these are easy to understand, the exception-handling mechanism might be rather complicated.
When describing implementations of the exception handling mechanisms in different compilers, I mean the __try/__except pair of operators implemented in C++.[13] Consider the example program presented in Listing 3.120. This program is trivial. The __try block is used to avoid division by zero when computing the quotient resulting from division of a by b.
Listing 3.120: Sample program illustrating the exception handling mechanism
#include <stdio.h> #include <windows.h> int main() { int a, b; scanf("%d", &a); scanf("%d", &b); __try { a = a/b; printf("%d\n", a); } __except(0) { printf("Error 1! \n"); }; return 0; }
The code of this program disassembled by IDA Pro is presented in Listing 3.121. The program was compiled using Microsoft Visual C++ with the "create fast code" option.
Listing 3.121: Disassembled code of the program presented in Listing 3.119
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p .text:00401000 var_20 = dword ptr -20h .text:00401000 var_lC = dword ptr -1Ch .text:00401000 var_18 = dword ptr -18h .text:00401000 var_10 = dword ptr -1Oh .text:00401000 var_4 = dword ptr -4 .text:00401000 push ebp .text:00401001 mov ebp, esp .text:00401003 push OFFFFFFFFh .text:00401005 push offset byte_408110 .text:0040100A push offset __except_handler3 .text:0040100F mov eax, large fs:0 .text:00401015 push eax .text:00401016 mov large fs:0, esp .text:0040101D sub esp, l0h .text:00401020 push ebx .text:00401021 push esi .text:00401022 push edi .text:00401023 mov [ebp + var_18], esp .text:00401026 lea eax, [ebp + var 1C] .text:00401029 push eax .text:0040102A push offset aD_0 ; "%d" .text:0040102F call _scanf .text:00401034 lea ecx, [ebp + var_20] .text:00401037 push ecx .text:00401038 push offset aD_0 ; "%d" .text:0040103D call _scanf .text:00401042 add esp, 10h .text:00401045 mov [ebp + var_4], 0 .text:0040104C mov eax, [ebp + var_1C] .text:0040104F cdq .text:00401050 idiv [ebp + var_20] .text:00401053 mov [ebp + var_1C], eax .text:00401056 push eax .text:00401057 push offset aD ; "%d\n" .text:0040105C call _printf .text:00401061 add esp, 8 .text:00401064 jmp short loc_401079 .text:00401066 ;----------------------------------------------------- .text:00401066 xor eax, eax .text:00401068 retn .text:00401069 ;----------------------------------------------------- .text:00401069 mov esp, [ebp - 18h] .text:0040106C push offset aErrorl ; "Error 1! \n" .text:00401071 call _printf .text:00401076 add esp, 4 .text:00401079 loc_401079: ; CODE XREF: _main + 64|j .text:00401079 mov [ebp+var_4], OFFFFFFFFh .text:00401080 xor eax, eax .text:00401082 mov ecx, [ebp + var_10] .text:00401085 mov large fs:0, ecx .text:0040108C pop edi .text:0040108D pop esi .text:0040108E pop ebx .text:0040108F mov esp, ebp .text:00401091 pop ebp .text:00401092 retn .text:00401092 _main endp
Pay attention to the presence of the standard function prologue, although in normal conditions standard prologues and epilogues are discarded if compilation was carried out with optimization options. Note that IDA Pro specifies five stack variables even though there were only two of them in the source code of the program. Briefly viewing the code, it is easy to determine (for example, by the calls to scanf) that var_1c corresponds to the a variable and var_20 corresponds to the b variable. Although only part of the function code can be enclosed by the __try block, this block influences the entire function.
The standard prologue is followed by an interesting sequence of commands (Listing 3.122). This code fragment deserves special comments.
Listing 3.122: __try block nesting level and exception handling structures
push OFFFFFFFFh ; var_4 push offset byte_408110 ; push offset __except_handler3 ; Address of the new exception handler mov eax, large fs:0 ; Address of the previous ; handler structure push eax ; var_10 mov large fs:0, esp ; Address of the new handler
First, some arbitrary constant is sent into the stack. Later, this value will be changed. This value will determine the nesting level of the __try block. Next, the address of some global memory cell is placed into the stack. The third element is the address of the exception handler formed by the compiler. The last three commands are interesting. The address of the old exception handling structure is placed into the stack, then the address of the entire group composed of 4 double words (this is the new exception handling structure) in the stack is placed into fs: 0. This operation adds a new record to a linked list. This list might contain the entire chain of exception handlers. The last structure in the list must contain the value OFFFFFFFFH in the exception handler address. Also note that the stack contains the old EBP value.
Note the mov [ebp + var_4], 0 instruction. It directly precedes the command that carries out the division operation. This command specifies the nesting level of the __try block. Note that the nesting level is determined by the number of __try blocks in the function, whether one block is nested within another block or not. Trace the content of the var_4 variable because it is the key to the structure of the function's __try blocks.
The value of the ESP register is saved when the stack of the function is formed completely: mov [ebp + var_18], esp. This value is used for restoring the stack in the __except block (see Listing 3.123).
Listing 3.123: Customary sequence that precedes the start of the _except block
jmp short loc_401079 xor eax, eax retn
Without diverting to consider standard Assembly commands and calls to the scanf and printf functions, proceed with considering the __except block. You won't encounter any difficulties locating the starting and ending points of this block, because the starting point of this block is always preceded by the customary sequence of commands shown in Listing 3.123.
These commands will always be present, no matter which optimization variant of the Microsoft Visual C++ compiler is used. The end of the __except block is defined by the jmp command.
If you resort to the Borland C++ v. 5.0 compiler, the external manifestation of exceptions also will be clear and easily recognizable, even though a slightly different exception handling mechanism is used. The first indication is the presence of the exception setting in the beginning of the block procedure (replacement of the stored value with the FS:0 address). The second indication is the presence of a typical fragment in the center of the procedure. There is no jump to that fragment, and the fragment is bypassed by the JMP command. Finally, in the end of the procedure the command restoring the old FS:0 value must be present.
Earlier in this chapter, the main function — the function, from which the program starts — was treated as if there were no problems with determining its address. IDA Pro doesn't have any trouble with determining the address of the main function in the executable code of a program compiled using any C++ compiler. However, situations are possible, in which IDA Pro is not close at hand. Furthermore, there are other programming languages, for which the situation is not as simple as for C++.
Any program written in any algorithmic programming language is made up of a certain set of commands. For C++, these are the main or winMain functions. However, it would be an error to consider that the executable code of the program starts execution exactly from this set of commands. As a rule, the compiler inserts some start-up code before it, which contains the calls to library and API functions. This code carries out some preparations: For instance, it requests the memory from the system, defines the command-line parameters, and obtains the identifier of the executable module. Only after completing this is control passed to the initial commands described before. As I have already mentioned, disassemblers (and debuggers) are not always capable of correctly determining where these commands start.
Note | When speaking about the Microsoft Visual C++ compiler, for a console application the start-up codes begin execution from the mainCRTStartup function, which passes control to the main function. For GUI applications, program execution starts from the WinMainCRTStartup function. After execution, the WinMainCRTStartup function passes control to the winMain function. There also is a technique allowing you to create executable modules that start execution directly from the main or from the winMain function. This allows considerable reduction of the executable module size (the actual size of such a module becomes smaller than the size of standard libraries). However, this is not a common practice, because in this case the programmer would intentionally reject the possibility of using standard C libraries. |
To find the start of the user part of the executable code, it is necessary to know how this call is executed. First, consider a console application written in the C++ programming language. In general, the prototype of the main function appears as shown in Listing 3.124.
Listing 3.124: Standard prototype of the main function of a console application in C++
int main( int argc[ , char *argv[ ] [, char *envp[ ] ] ] );
In general, the main function has three input parameters. This is an important indication. For example, consider a typical call to the main function from the executable code created by the Microsoft Visual C++ compiler (Listing 3.125).
Listing 3.125: Typical main function call from executable code from Microsoft Visual C++
mov eax, dword_40A724 mov dword_40A728, eax push eax push dword_40A71C push dword_40A718 call _main add esp, OCh
Note that all three parameters turned out to be global variables. This issue is important. Now, it is necessary to study, which library and API functions are called before and after the call to the main function. Armed with this knowledge, you'll be able to easily find the required location within a program. The call to the GetCorrmandLine API function deserves special mention. Using this function, it is possible to obtain the command-line parameters. These parameters are then passed in the second parameter of the main function. What would happen if the main function doesn't have any parameters or if it has one or two parameters? The answer to this question is as follows: Nothing would be changed, and the call will be carried out in the same way. Thus, search criteria also do not change. However, when dealing with the Borland C++ compiler, the situation is more complicated. The call to the main function is carried out by an indirect call appearing like CALL [ESI + N]. If the disassembler doesn't locate and identify the main function automatically, the code investigator has to use the debugger. However, all previously-mentioned considerations related to the preliminary call to library and API functions (for example, GetCommandILine) are applicable even in this case. Knowing these functions, you'll be able to locate the required code section and then find CALL [ESI + N] or a similar call. In this case, it will also be necessary to use the debugger and breakpoints, because manual analysis of the code aimed at locating the address, by which the call is carried out is a tedious procedure.
Consider the winMain function. The prototype of this function is provided in Listing 3.126.
Listing 3.126: Prototype of the WinMain function
int WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR 1pCmdLine, int nCmdShow ); _____________________________________________
As you can see, this function has four parameters. This indication is an important search criterion for searching for the start of the program. Listing 3.127 presents a typical fragment demonstrating the call of the winMain function from the executable module created by the Microsoft Visual C++ compiler.
Listing 3.127: Typical WinMain call from the executable module from Microsoft Visual C++
Push eax push dword ptr [ebp - 20h] push esi push esi ; 1pModuleName call edi ; GetMcduleHandleA push eax call _WinMain@16 ; WinMain(x, x, x, x) mov edi, eax mov [ebp - 2Ch], edi cmp [ebp - 1Ch], esi jnz short loc_401508 push edi ; int call _exit
The fragment presented in Listing 3.127 is so typical that it is worth memorizing. This search criterion allows you to find the entry point to the executable code of the program without failure. The call to winMain function in a module created by the Borland C++ compiler is executed using the CALL [ESI + N] command, as was the case for the main function. Also, the GetModuleHandle API function is called before this call.
In Delphi, the execution starts from the main module of the program (BEGIN...END.); however, before this module there are calls to one or more procedures that initialize start-up. In particular, one such procedure calls the GetModuleHandle API function.
[7]In high-level programming languages, the commonly-adopted practice is to distinguish between procedures and functions. From the standpoint of the disassembled text, there is no difference between these two concepts.
[8]One parameter of the double type must be interpreted as two 32-bit parameters.
[9]The number enclosed in parentheses specifies the nesting level.
[10]Both methods and variables are called class members.
[11]When creating a new object, the programmer might forget to call the initialization procedure or might call it more than once.
[12]Recall that in the protected mode, segment registers store selectors instead of addresses. Selectors are numbers in the descriptors table that define addresses.
[13]This is a standard for the C++ programming language; therefore, all C++ compilers support these operators, although implementations might be different.