3.2. Identifying Program Structures

Understanding the program structure of the executable module is often more important than recognizing variables, because it allows you to understand the program's operating logic.

3.2.1. Procedures and Functions

You have already encountered procedures and functions ^[7] many times. The main goal of this section is to generalize accumulated experience and investigate new features.

Passing Parameters

Until now, it was silently assumed that data are passed to the procedure through the stack. This mechanism, which will be considered in the next section, is common. However, this approach is not the only available one.

For the moment, abstract from compilers and simply consider how and in which way it is possible to pass parameters to the procedure. If you are working in Assembly, you'll be able to add all of these mechanisms to your arsenal. Furthermore, nothing can prevent you from combining several such mechanisms simultaneously. However, when working with compilers created for high-level programming languages, it is necessary to account for generally-adopted conventions, which will be covered in the next few sections.

Passing Parameters through the Stack

Passing parameters through the stack is the most common and widely used mechanism. This approach allows you to create recursive procedures, but the use of other approaches makes recursion problematic. As a rule, parameters are placed into the stack using the PUSH commands. However, another method is possible, which you have encountered multiple times. It is possible to manually change the value of the stack pointer and then use normal MOV commands to place the parameters into the allocated region. For example, if two parameters are loaded into the EAX and EBX registers, respectively, then it is possible to place them into the stack using the following sequence of commands: SUB ESP, 8/MOV [ESP], EAX/MOV [ESP], EBX. This is equivalent to the two PUSH EAX/PUSH EBX commands (recall that the stack grows upward in the direction of smaller addresses).

When passing parameters, the most important issue is the order, in which the parameters appear in the stack. When receiving parameters from the stack, the called procedure follows a strictly defined order, which must be observed when calling that procedure. However, this is only one problem. The second problem is clearing the stack. After the called procedure has executed all required operations and returned control into the calling program fragment, the parameters passed to the procedure remain in the stack. If the procedure is called multiple times, this, in the long run, might crash the program. There are two practical approaches to solving this problem. The first method is used only in the C++ programming language. Using this method, the stack is released after the return from the called procedure. It is convenient because it is possible to use procedures with a variable number of parameters.

Note

The printf standard C library function is an example of such a procedure. The first parameter of this function is always a string that might contain special substrings (called format specifiers). Format specifiers start with the % character. The number of such specifiers is equal to the number of additional parameters of the printf function.

As a rule, the stack is restored using the ADD ESP, 4*N command, where N is the number of 32-bit parameters. ^[8] However, alternative ways, such as using SUB ESP, -4*N or even PoP commands, are possible. It is important to understand their meaning. Sometimes the compiler, for economy, restores the stack after calling several procedures.

The second method of stack recovery consists of using the RETN 4*N command immediately after exiting the procedure. Again, N specifies the number of 32-bit parameters. This approach was initially used in Pascal compilers. This approach is slightly faster. However, it makes it problematic to call a procedure with a variable number of parameters.

Passing Parameters through the Data Segment

The use of global variables for passing information into a procedure suggests itself. However, this approach is a persistent source of headaches. To avoid errors, you'll have to allocate an individual set of global parameters for every procedure, which requires additional memory resources. The use of global variables also makes recursive calls problematic, because you won't be able to use the same variables if they are already in use. However, this drawback doesn't mean that this approach is not used. Nothing prevents you from using it when writing a program in C++ or Delphi.

The preceding approach can be improved by using a specially organized memory block for passing parameters. You'll probably have to organize such a block individually for each procedure, although, in theory, it is possible to create a structure of universal buffer for passing parameters to all called procedures. The structure of such a buffer can be organized to make it possible to use recursive procedure calls.

Passing Parameters Through Program Code

Passing parameters through program code looks somewhat exotic. However, it is a realistic method, provided that you use Assembly language. For example, consider the algorithm in Listing 3.45.

Listing 3.45: Algorithm for passing parameters through the program code

 ... CALL PROC1       DB "This parameter is passed through the program code", 0 ; The PROC1 procedure will return control here. ... PROC1 PROC ; Pop the return address from the stack. ; Define the parameter addresses and length. ; Modify the return address in the stack. ; Process. Return from the procedure.       RETN PROC1 ENDP

As you can see, this method doesn't contain anything too difficult or impossible. However, its implementation in a high-level programming language requires additional effort.

Passing Parameters Through Registers

The method of passing parameters through registers is fast. However, it has certain limitations, because registers are few. This approach is mainly used with other mechanisms, such as passing parameters through the stack. When using this combined approach, the first parameters are usually passed through registers and the remaining parameters are passed through the stack. Later in this chapter, this approach will be covered in more detail.

Conventions for Passing Parameters

Consider compilers of high-level programming languages. As you would expect, they mainly pass parameters through the stack. The main calling conventions used by contemporary compilers are listed in Table 3.2.

Table 3.2: Standard calling conventions used by contemporary compilers
Calling convention	Order of parameters	Stack-clearing method	Comment
C convention (__cdecl)	From right to left	By the calling program	The compiler automatically inserts the underscore character (_) before the function name.
Standard calling convention (__stdcall)	From right to left	By the called procedure	The compiler automatically inserts the underscore character (_) before the function name. The function name is terminated by the @ suffix followed by the number specifying the total length of all parameters (in bytes).
Pascal calling convention (PASCAL)	From left to right	By the called procedure	This calling convention is used in Pascal and Delphi.
Fast calling convention, also known as register call (__fastcall)	From left to right	By the called procedure	Microsoft's C++ compiler employs two registers (ECX and EDX). If this is not enough for passing all parameters, then the remaining parameters are passed through the stack. The Borland C++ compiler uses three registers (EAX, EDX, and ECX).

Note

The calling conventions listed in Table 3.2 are not the only available ones. In different programming languages, there are language-specific conventions. For example, Delphi supports the safecall convention, and Basic has its individual calling convention. Some calling conventions have been gradually moved out of use. For example, the Pascal (__pascal) calling convention is no longer supported in Microsoft Visual C++.

When writing programs in C++, the most common calling conventions are __cdecl (when working with normal and library functions) and __stdcall (when calling most API functions).

As an illustration of the use of register calling conventions (fast function calls), consider the simple program shown in Listing 3.46.

Listing 3.46: Simple program illustrating the use of the__fastcall calling convention

 #include <stdio.h> int __fastcall add(int, int, int); void main() {         int i = 10, j = 20, k = 30;         printf("%d\n", add(i, j, k)); }; int__fastcall add(int a, int b, int c) {         return a + b + c; };

As you can see, the program in Listing 3.46 contains a function declared as __fastcall. First, consider the disassembled text of the executable code of this program produced by the Microsoft Visual C++ compiler (Listing 3.47).

Listing 3.47: Disassembled text (Listing 3.46) produced by Microsoft Visual C++

 .text:00401000 _main        proc near          ; CODE XREF: start + 16E↓p .text:00401000       var_C  = dword ptr -OCh .text:00401000       var_8  = dword ptr -8 .text:00401000       var_4  = dword ptr -4 .text:00401000              push    ebp .text:00401001              mov     ebp, esp .text:00401003              sub     esp, 0Ch .text:00401006              mov     [ebp + var_4], 0Ah .text:0040100D              mov     [ebp + var_C], 14h .text:00401014              mov     [ebp + var_8], lEh .text:0040101B              mov     eax, [ebp + var_8] .text:0040101E              push    eax .text:0040101F              mov     edx, [ebp + var_C] .text:00401022              mov     ecx, [ebp + var_4] .text:00401025              call    sub_401040 .text:0040102A              push    eax .text:0040102B              push    offset unk_4060FC .text:00401030              call    _printf .text:00401035              add     esp, 8 .text:00401038              xor     eax, eax .text:0040103A              mov     esp, ebp .text:0040103C              pop     ebp .text:0040103D              retn .text:0040103D_main         endp

The code presented in this listing is well known. However, there is one issue that you did not encounter earlier. According to the program (see Listing 3.46), the add function must have three parameters. Obviously, sub_401040 corresponds to the add function. Later, the mov eax, [ebp + var_8] /push eax commands send the last variable into the stack (this is the k variable). The values of the i and j variables are placed into the ECX and EDX registers, respectively. This corresponds to the fastcall calling convention typical for the Microsoft Visual C++ compiler. The documentation supplied with the compiler states that it uses the __fastcall calling convention whenever possible. As you can see, this is true. If the number of parameters is increased, then the compiler will pass the remaining parameters in a normal way, namely, through the stack. This can be easily explained because the procedure that will be called also needs registers. So, as the number of parameters is increased, the number of general-purpose registers will not be enough and it will be necessary to create local stack variables.

Listing 3.48 presents the disassembled code of the same program compiled using the Borland C++ compiler.

Listing 3.48: Disassembled code (Listing 3.46) compiled using the Borland C++ compiler

 .text:00401108 _main        proc near       ; DATA XREF: .data:0040A0B8;↓o .text:00401108        argc  = dword ptr l0h .text:00401108        argv  = dword ptr 14h .text:00401108        envp  = dword ptr 18h .text:00401108                push    ebx .text:00401109                push    esi .text:0040110A                push    edi .text:0040110B                mov     ebx, OAh .text:00401110                mov     esi, 14h .text:00401115                mov     edi, 1Eh .text:0040111A                mov     ecx, edi .text:0040111C                mov     edx, esi .text:0040111E                mov     eax, ebx .text:00401120                call    sub_401138 .text:00401125                push    eax .text:00401126                push    offset format   ; Format .text:0040112B                call    _printf .text:00401130                add     esp, 8 .text:00401133                pop     edi .text:00401134                pop     esi .text:00401135                pop     ebx .text:00401136                retn  .text:00401136 _main          endp

As you can see from Listing 3.48, the Borland C++ compiler sends parameters sequentially into the EAX, EDX, and ECX registers. Note that the Borland's compiler uses register variables in the EBX, EST, and EDI registers instead of stack variables. In contrast to the Microsoft Visual C++ compiler, the Borland C++ compiler is serious about the __fastcall modifier and doesn't neglect the instruction for using registers as the number of parameters increases.

Stack Structures

Throughout this chapter, I have provided lots of different listings, in which I try to draw your attention to the locations of the return address, parameters, and local and temporary variables within the stack. The main goal of this section is to generalize accumulated experience and supply new information.

The standard stack structure in the course of a procedure call is shown in Fig. 3.2. This illustration shows the stages that the stack undergoes. The process of stack modification starts from the procedure call (stages 1–3), during which the parameters are placed into the stack and the procedure is called. During stages 4–5, memory is allocated and the registers that will be used within the procedure, whose values must not be changed after the call, are saved into the stack.

image from book
Figure 3.2: Standard stack structure in the course of a procedure call

Consider the stages shown in Fig. 3.2 in more detail:

Usually, parameters are placed into the stack using reg32 or PUSH DWORD PTR mem commands, where reg32 is a 32-bit register and mem is the address of the memory area (direct or indirect). However, another method of placing parameters into the stack is possible. First, the area for parameters is allocated in the stack. This can be carried out, for example, as follows: SUB ESP, N. Here, Nn is the number of bytes required for storing parameters, aligned by the 4-byte boundary. Then, the parameters are loaded into the stack using standard MOV commands. For example, this task can be carried out as follows: MOV DWORD PTR [ESP], EAX/MOV DWORD PTR [ESP + 41, EBX, etc. When dealing with double operands (which are 8 bytes in size), the ESTP command is used to place them into the stack, for example: FSTP DWORD PTR [ESP]. Thus, 8 bytes from the ST(0) FPU register will be sent to the stack (see Listing 3.12 and the comments that follow it).
The CALL command places the return address into the stack directly after parameters (if there are any). To correctly return from the procedure, this address must be located on the top of the stack. In addition, the CALL command jumps to the address specified to it. Now all work related to stack modification is delegated to the procedure. As a rule, the procedure starts with the PUSH EBP command. This command immediately assumes further use of EBP, and this register probably will be used for addressing the stack variable and parameters. The presence of the Mov EBP, ESP command confirms this assumption. For what purpose is this necessary? The ESP register is bound to the PUSH and POP commands that change it automatically. Consequently, if the parameter nearest to the stack top was located in the start of the procedure at the [ESP + 4] address, then after the execution of the PUSH command it will be located at the [ESP + 8] address. Thus, the EBP register is used to fix the reference point, from which locations of the parameters and stack variables are counted.
The next step in the procedure of forming the stack structure is allocation of the memory area for storing local variables. Note that if the use of local variables is not presumed, then the compiler skips this step. As a rule, stack allocation is carried out by the SUB ESP, N command, where N stands for the number of allocated bytes, aligned by the 4-byte boundary. In some cases, however, it is possible to use the ADD ESP, -N command or several PUSH commands. The use of the PUSH command is convenient, because within the same command it is possible to combine stack allocation and variable initialization (see Listing 3.36 and the comments that follow it). The sequence of the PUSH EBP/Mov EBP, ESP/SUB ESP, N commands can be replaced with a single ENTER N command, which, however, is rarely used by the compilers because of its slowness.
Finally, if it is presumed that the EBX, EST, and EDT registers are used in the procedure, they also must be saved in the stack.
In the end of the procedure, the stack must be returned to the state, in which the address of return from the procedure was located on its top. In addition, it is necessary to restore the EBP, EBX, EST, and EDT registers (provided that they were modified). The most common is the sequence of Mov ESP, EBP/POP EBP commands, which the compiler often replaces with a single leave command.
If the preceding method was strictly observed, then there will be no problems with recognizing the procedure in the course of disassembling, even if the procedure was called using indirect call commands (CALL reg32, CALL [reg32], and CALL[mem]). However, contemporary compilers, because of optimization, abandon the use of the EBP register for addressing stack variables and parameters (see Listing 3.38 and the comments that follow it).

In my opinion, the most interesting issue is the one related to nested procedures. In C++, nested functions are not possible. Pascal, in contrast, allows such constructs (Listing 3.49).

Listing 3.49: Pascal program with nested procedures

 program Projectl; var a:integer; procedure procl(al:integer); var  b, g, d, e:integer;   procedure proc2(al:integer);   var c:integer;   begin     c := 30;     writeln(al, b, c, d, e, g);   end; begin    b := 20; g  30; d:= 40; e  := 50;    proc2(al); end; begin   a := 10;   procl(a); end.

Listing 3.50 provides the disassembled starting (main) part of the program (see Listing 3.49) compiled using Delphi.

Listing 3.50: Disassembled code of the program shown in Listing 3.49, compiled using Delphi

 CODE:004039B4                public start CODE:004039B4     start: CODE:004039B4                push    ebp CODE:004039B5                mov     ebp, esp CODE:004039B7                add     esp, OFFFFFFFOh CODE:004039BA                mov     eax, ds:off_4040A8 CODE:004039BF                mov     byte ptr [eax], 1 CODE:004039C2                mov     eax, offset dword_403994 CODE:004039C7                call    sub_403860 CODE:004039CC                mov     ds:dword_40565C, OAh CODE:004039D6                mov     eax, ds:dword_40565C CODE:004039DB                call    sub_403938 CODE:004039EO                call    sub_403394

Listing 3.50 shows the starting part of the program (see Listing 3.49). Of the three procedure calls shown in this listing, one is the call to the procedure directly present in the application program (procl). This is the sub_403938 procedure. The other two procedures are system procedures executed when starting (start-up initialization) and when exiting the program. The sub_403938 procedure obtains its only parameter through the EAX register. The __fascall calling convention is "flourishing" in Delphi, although it wasn't declared in the program. I have even declined optimization when compiling this program. However, as you can see, Delphi made an independent decision. The dword_40565C name corresponds to the a variable in the program source code, and it is the one passed to the procedure through the register. Also, pay attention to the add esp, 0FFFFFFF0h command. I hope that you without trouble can guess that this is the add esp, -16 command, which is equivalent to sub esp, 16. In other words, 16 bytes are reserved.

Listing 3.51 provides the disassembled text of the compiled procl (sub_403938) procedure.

Listing 3.51: Disassembled text of the compiled proc1 procedure

 CODE=00403938  sub_403938    proc near       ; CODE XREF: CODE:004039DB↓p CODE:00403938        var_14  = dword ptr -14h CODE:00403938        var_10  = dword ptr -10h CODE:00403938        var_C   = dword ptr -0Ch CODE:00403938        var_8   = dword ptr -8 CODE:00403938        var_4   = dword ptr -4 CODE=00403938               push    ebp CODE:0040393E               mov     [ebp + var_14], eax CODE:00403941               mov     [ebp + var_4], 14h CODE:00403948               mov     [ebp + var_10], 1Eh CODE:0040394F               mov     [ebp + var_8], 28h CODE:00403956               mov     [ebp + var_C], 32h CODE:0040395D               push    ebp CODE:0040395E               mov     eax, [ebp + var_14] CODE:00403961               call    sub_4038DC CODE:00403966               pop     ecx CODE:00403967               mov     esp, ebp CODE:00403969               pop     ebp CODE:0040396A               retn CODE:0040396A  sub_403938   endp

Note that four local variables are defined in the procl procedure. However, as you can see, five local variables are defined in the executable code. The var_14 variable is allocated for storing the parameter passed to the procedure (mov [ebp + var_14], eax); in other words, it is a temporary variable. The add esp, 0FFFFFFECh command is equivalent to add esp, -20. Everything is correct here (there are live variables, and 20 = 4*5).

There are even more interesting issues. For instance, consider the call to the proc2 procedure, to which the call sub_4038DC command in Listing 3.51 corresponds. Note that this time the parameter also is passed to the procedure through the EAX register. However, what does the push ebp command mean? Is it another parameter? In the source code of the program, there were no additional parameters. Furthermore, this doesn't correspond to the __fascall convention. Recall that the proc2 procedure is nested, and it must have access to the local variables of the procl procedure. This is why the EBP register is secretly passed to the proc2 procedure through this value. This is necessary to provide the nested procedure with access to local variables of the procl procedure. Also, note that the pop ecx command that follows the procedure call simply releases the stack from this "illegal" parameter.

Listing 3.52 provides the disassembled code of the proc2 procedure (see Listing 3.49).

Listing 3.52: Disassembled code of the proc2 procedure from Listing 3.49

 CODE:004038DC  sub_4038DC    proc near     ; CODE XREF: sub_403938 + 29↓p CODE:004038DC        var_8  = dword ptr -8 CODE:004038DC        var_4  = dword ptr -4 CODE:004038DC        arg_0  = dword ptr  8 CODE:004038DC               push    ebp CODE:004038DD               mov     ebp, esp CODE:004038DF               add     esp, OFFFFFFF8h CODE:004038E2               mov     [ebp + var_4], eax CODE:004038E5               mov     [ebp + var_8], IEh CODE:004038EC               mov     eax, ds:off_4040A4 CODE:004038F1               mov     edx, [ebp + var_4] CODE:004038F4               call    sub_402B78 CODE:004038F9               mov     edx, [ebp + arg_0] CODE:004038FC               mov     edx, [edx - 4] CODE:004038FF               call    sub_402B78 CODE:00403904               mov     edx, [ebp + var_8] CODE=00403907               call    sub_402B78 CODE:0040390C               mov     edx, [ebp + arg_0] CODE:0040390F               mov     edx, [edx - 8] CODE=00403912               call    sub_402B78 CODE:00403917               mov     edx, [ebp + arg_0] CODE:0040391A               mov     edx, [edx - 0Ch] CODE:0040391D               call    sub_402B78 CODE:00403922               mov     edx, [ebp + arg_0] CODE=00403925               mov     edx, [edx - 10h] CODE=00403928               call    sub_402B78 CODE:0040392D               call    sub_402BA8 CODE=00403932               pop     ecx CODE=00403933               pop     ecx CODE=00403934               pop     ebp CODE=00403935               retn CODE=00403935 sub_4038DC    endp

The abundance of procedure calls immediately attracts attention. But you know that in the source code (see Listing 3.49), there is only the writeln function. However, writeln is not a function but an operator. The compiler transforms this operator into two procedure calls. The first procedure (sub_402B78) forms some resulting string, which will be printed. The number of calls to this procedure matches the number of parameters in the writeln operator. When the resulting string is formed, the sub_402BA8 procedure is called, which outputs the string to the console.

Pay special attention to the add esp, OFFFFFFF8h command. The memory for two stack variables is reserved. The parameter passed to the procedure is placed into the var_4 variable. The var_8 variable corresponds to the c local variable, which is assigned the value of 30 (lEh).

In addition to the two local variables, the procedure has the arg_0 parameter, which is nothing but the EBP value passed from the procl procedure, using which it is possible to access local variables of the procl procedure.

If you view the source code of the program, you'll immediately note that the proc2 procedure prints the values of al (the values passed from procl as a parameter), c (local variable of the procl procedure), and the values of four variables defined in procl.

The previously-considered arg_0 parameter is used for obtaining the values of variables defined in procl. For example, consider how the value of the b variable is retrieved: mov edx, [ebp + arg_0]/mov edx, [edx - 4]. Again, parameters are passed through registers. As relates to the EAX register, some ds:off_4040A4 parameter is placed there, the value of which is unknown. As you probably can guess, this parameter is required for the operation of the sub_402B78 procedure.

Identifying Procedures and Functions

To identify a specific procedure, you need to determine the addresses of its start and its end. Second, it is necessary to determine the number and type (or at least the size) of the passed parameters, the stack variables used by this procedure, and the type of its return value. Consider the possibilities are available for completing this task:

The procedure call can be used. The CALL addr command explicitly specifies that some procedure is located at the addr address. However, there are two possible problems:
- An indirect procedure call, such as CALL [EAX], causes difficulties for disassemblers. It is necessary to either resort to using a debugger or analyze the disassembled text manually. Furthermore, if the value of the EAX register is subject to change depending on the values of some other parameters, it becomes problematic to locate all procedures called using this method.
- From Section 1.6.1, you know that there are lots of methods of calling procedures. This can be achieved even using the RET command. If you are dealing with a program written in Assembly or containing Assembly inserts, and the program's author aims to confuse potential code investigators, there are lots of possibilities of achieving this goal. However, if nonstandard procedure calls are used, this might be disclosed by the presence of commands like ADD ESP, N, SUB ESP, -N, or one or more POP instructions. If you encounter such patterns, this must inspire you to investigate the code more closely.
Functions can be identified by locating the standard function prologue. As a rule, the standard function prologue is made up of three commands directly following each other: PUSH EBP, Mov EBP, ESP, and SUB ESP, N. The final command might be different, for example, ADD ESP, -N or simply one or more sequential PUSH commands. It also is possible to do without stack allocation for local and temporary variables. This is the case when there are no such variables or if registers are used for passing such variables. Besides this, the procedure might start with the commands for saving the values of the EBX, EST, and EDT registers. In the course of optimization, the compiler might do without the standard prologues and address all stack variables and parameters using the ESP register. Finally, instead of the standard prologue it is possible to use the ENTER N command. The end of the function is easier to locate when the starting point of that function is known. However, in some cases the end of the function is the first to be located.
The procedure end can be easily located if a standard epilogue is present: Mov ESP, EBP/POP EBP. Sometimes, this sequence of commands is replaced with the LEAVE command. The epilogue is followed by the RETN command. In general, any RETN command (especially RETN N) must make the code investigator vigilant. Having encountered it, you should always check whether this is the procedure end. This is a good criterion; however, it is not always applicable. In particular, if the function contains more than one __try/__except blocks, the Microsoft Visual C++ compiler might generate several standard epilogues (for optimization). Thus, even the IDA Pro disassembler might be easily confused in such situations.
As mentioned earlier, it is easier to find the procedure end if the procedure start has been determined. This usually will be the first RETN command encountered. However, it is possible to exit in the middle of a procedure. In this case, some unconditional jump must precede the RETN command, which passes control to some location beyond the RETN command. For example, the pattern might appear as shown in Listing 3.53.

Listing 3.53: Sequence of commands typical for exiting in the middle of a procedure
```
    CMP    EAX,  1    JNZ    L1      RETN L1: 
```

Thus, it is possible to search for the procedure end starting from the Ll (label). If the procedure has to return something when terminating its execution (when it is a function), then the command setting the value of the EAX register must be present near its end. Such commands might appear as XOR EAX, EAX (return false) or MOV EAX, 1 (return true), or these might be some commands that modify the EAX value (MOV, ADD, SUB, etc.). If the type of the return value is 8 bytes is size, then this value is returned in the EDX:EAX pair of registers. Finally, values of the double type are returned in the ST(0) FPU register.

Note

If the return value is a structure, then the pointer to that structure, instead of the structure, is returned in the EAX register. The structure itself is created in the calling function. When the function of the "structure" type is called, the pointer to that structure is passed in the EAX register. Thus, the function will work with the structure that has already been created, and after completion it will return the pointer to the same structure.

Most procedures and functions have either variables defined in the stack or parameters passed through the stack. This is an important indication because you will certainly encounter commands with addressing through the EBP or ESP registers. By carefully viewing the code above and below the encountered command, it is possible to determine the procedure start.
When stack variables are addressed in a standard way (through the EBP register), it won't be difficult to determine the amount of stack memory allocated to them (SUB ESP, N or any similar command). As relates to the parameters passed to the procedure, here the situation is slightly more complicated, because it is not known beforehand how much memory has been allocated for them. The easiest way of solving this problem is to find the call to this procedure, because all parameters are usually loaded into the stack using PUSH commands or another obvious method (for instance, see Listing 3.13 and the comments that follow it). If the location, from which the procedure under consideration was called, is not known beforehand, then it will be necessary to analyze its code. First, it will be necessary to find the maximum offset in relation to the EBP value in the direction of higher addresses. Because the return address and old EBP value were loaded into the stack after parameters, the first parameter (the one with the minimum address) will be located at the [EBP + 8] address (see Fig. 3.2). Thus, if the maximum offset using the [EBP + N] addressing is equal to max_off, then the number of bytes allocated for parameters will be equal to max_off-4. Assuming that all parameters are 32 bits in size and have a simple data type (these are not arrays or structures), an approximate number of parameters will be equal to (max_off-4) /4.

After these theoretical considerations and computations, consider a simple example program written in C++ (Listing 3.54).

Listing 3.54: C++ program illustrating the procedure of identifying function start and end

 #include <stdio.h> #include <windows.h> double myfunc(double, __int64, int, BYTE); void main() {         double ff = 10.45;         __int64 ii = 1000;         int jj = 200;         BYTE bb = 50;         double ss = myfunc(ff, ii, jj, bb);         printf("%f\n", ff); }; double myfunc(double f, __int64 i, int j, BYTE b) {         double s;         s = f + i + j + b;         printf("%f\n", s);         return s; };

The disassembled text of the main function from this program is shown in Listing 3.55.

Listing 3.55: Disassembled code of the main function from Listing 3.54

 .text:00401000 _main        proc near        ; CODE XREF: start + 16E↓p .text:00401000      var_40  = qword ptr -40h .text:00401000      var_30  = qword ptr -30h .text:00401000      var_28  = qword ptr -28h .text:00401000      var_1C  = dword ptr -1Ch .text:00401000      var_18  = qword ptr -18h .text:00401000      var_10  = dword ptr -l0h .text:00401000      var_C   = dword ptr -0Ch .text:00401000      var_l  = byte ptr -1 .text:00401000             push    ebp .text:00401001             mov     ebp, esp .text:00401003             sub     esp, 28h .text:00401006             fld     ds:dbl_408108 .text:0040100C             fstp    [ebp + var_28] .text:0040100F             mov     [ebp + var_10], 3E8h .text:00401016             mov     [ebp + var_C], 0 .text:0040101D             mov     [ebp + var_1C], 0C8h .text:00401024             mov     [ebp + var_1], 32h .text:00401028             mov     al, [ebp + var_l] .text:0040102B             push    eax .text:0040102C             mov     ecx, [ebp + var 1C] .text:0040102F             push    ecx .text:00401030             mov     edx, [ebp + var_C] .text:00401033             push    edx .text:00401034             mov     eax, [ebp + var_10] .text:00401037             push    eax .text:00401038             fld     [ebp + var_28] .text:0040103B             sub     esp, 8 .text:0040103E             fstp    [esp +  40h + var_40] .text:00401041             call    sub_401070 .text:00401046             add     esp,   18h .text:00401049             fstp    [ebp + var_18] .text:0040104C             fid     [ebp + var_28] .text:0040104F             sub     esp, 8 .text:00401052             fstp    [esp + 30h + var_30] .text:00401055             push    offset unk_4080FC .text:0040105A             call    _printf .text:0040105F             add     esp, 0Ch .text:00401062             xor     eax, eax .text:00401064             mov     esp, ebp .text:00401066             pop     ebp .text:00401067             retn .text:00401067 _main       endp

First, identify four local variables defined in the main function. Discard the var_30 and var_40 names, because these are not variables. These are identifiers used by IDA Pro. For local variables, 40 bytes are allocated. This is too much for five variables. Thus, it is necessary to investigate these variables in more detail and in due order. The var_28 variable stands for the ff variable of the double type. Here, everything is clear: The initial value is loaded from the dbl_408108 constant using the fld/fstp commands. The mov [ebp + var_10], 3E8h/mov [ebp + var_C], 0 commands load the 1000 (3E8h) value into the ii variable. The disassembler didn't understand that this was a single 64-bit variable, and it interpreted this data item as two different variables. The var_1C variable designates the jj variable. Then, there is the var_l single-byte variable designating bb. Note that although this is a single-byte variable, it takes 4 bytes. This variable is followed by 4 more free bytes, and only after this interval does the var_c variable start. Thus, the compiler has aligned the data by the 8-byte boundary. This alone appears suspicious and causes the code investigator to assume that instead of two 4-byte variables there is one 8-byte variable. Now only the ss variable remains. Note that after the call to the myfunc function that has the double type, there is the fstp [ebp + var_18] command. This means that the value from the ST(0) FPU register is loaded into the var_18 variable. However, double variables are returned in the ST(0) register. Thus, it is possible to conclude that var_18 stands for the ss variable. Therefore, everything is OK. All variables have been identified, and the extra reservation was caused by data alignment.

Another interesting sequence of commands is as follows: mov al, [ebp + var_l] / push eax. At first glance, it appears that everything is all right here because, although the variable is 1 byte, it is necessary to load a 4-byte value into the stack. However, the most significant bytes of the EAX register were not cleared. Furthermore, the entire double word is sent to the stack as a parameter. This is possible only if the function strictly accounts for the parameter being 1 byte in size. By the way, pay special attention to the order, in which parameters are sent to the stack (from right to left). It is the calling function that clears the stack. This corresponds to the __cdecl calling convention (see Table 3.2). Then all other parameters are sent into the stack. The ii(var_10, var_c) variable is sent into the stack as two independent 4-byte variables. The ff variable is sent into the stack using the fstp command, as required. Then there is the call to the printf function. This isn't anything unusual. However, I would still like to draw your attention to the following issue: The first parameter of this function is the format string, which specifies all other parameters of the function. This specification is often helpful for determining the variable type and size. This is even truer because the C++ library provides several other functions similar to printf and operating over the format string.

The disassembled code of the myfunc function is provided in Listing 3.56.

Listing 3.56: Disassembled code of the myfunc function from Listing 3.54

 .text:00401070       sub_401070    proc near        ; CODE XREF: _main + 41↑p .text:00401070       var_14  = qword ptr -14h .text:00401070       var_C   = dword ptr -0Ch .text:00401070       var_8   = qword ptr -8 .text:00401070       arg_0   = qword ptr  8 .text:00401070       arg_8   = qword ptr  l0h .text:00401070       arg_10  = dword ptr  18h .text:00401070       arg_14  = byte ptr 1Ch .text:00401070               push    ebp .text:00401071               mov     ebp, esp .text:00401073               sub     esp, 0Ch .text:00401076               fild    [ebp + arg_8] .text:00401079               fadd    [ebp + arg_0] .text:0040107C               fiadd   [ebp + arg_10] .text:0040107F               movzx   eax, [ebp + arg_14] .text:00401083               mov     [ebp + var_C], eax .text:00401086               fild    [ebp + var_C] .text:00401089               faddp   st(1), st .text:0040108B               fst     [ebp + var_8] .text:0040108E               sub     esp, 8 .text:00401091               fstp    [esp + 14h + var_14] .text:00401094               push    offset byte_408100 .text:00401099               call    _printf .text:0040109E               add     esp, 0Ch .text:004010A1               fld     [ebp + var_8] .text:004010A4               mov     esp, ebp .text:004010A6               pop     ebp .text:004010A7               retn .text:004010A7  sub_401070   endp

Start the analysis by considering stack variables. There are only two such variables (the var_14 variable is not taken into account): var_8 and var_c. The var_8 variable takes 8 bytes, which leads to the conclusion that this is nothing but the s variable. This assumption will be further confirmed. No other variables were declared in the myfunc function. Consequently, the 4-byte var_c variable is simply a temporary variable.

It is time to consider the function parameters. Strangely, there are only four parameters. Recall that although there are four parameters, when disassembling the main function IDA Pro considered there to be five variables, which later are used as parameters. Nevertheless, there isn't anything difficult here. When the disassembler processed the main function, it didn't have groundwork for considering var_10 and var_c as a single variable. When processing the myfunc function, the disassembler has well-grounded reasons for considering arg_8 as a single 8-byte parameter or the __int64 number (see the fild command).

Now, consider the algorithm used for computing the value of the f + i + j + b expression. The fild command loads a long integer number (the i number) onto the top of the FPU stack (namely, into the ST(0) register). The next command, fadd, adds this number to a real number, f. The result is then loaded into ST(0) and interpreted as real. The next command, fiadd, adds the real number stored in ST(0) to the 32-bit integer number j. Again, the result is placed into ST(0). Then the movzx eax, [ebp + arg_14] command places 1 byte into the EAX register and clears the most significant bytes of the register. This issue has already been mentioned in comments that follow Listing 3.55. The byte is sent into the stack as part of a double word, and the calling party doesn't clear the most significant bits, while the called procedure does clear them; otherwise, error would be inevitable. Later, the var_c variable is used. The b number is loaded into it (mov [ebp + var_C], eax), after which the var_c variable is loaded into the ST(0) register and the old value of ST(0) is moved into the ST(1) register. Finally, there is the faddp st(1), st command, and the result of computing the f + i + j + b expression is placed into var_8 (the s variable) by the fst [ebp + var_8] command. The stack is not popped, and the result is still contained in ST(0). Thus, the sequence of the sub esp, 8/ fstp [esp + 14h + var_14] commands places this result into the stack for output using the printf function. The final stroke is the fld[ebp + var_8] command — the value returned by the function. Here, the compiler has made minor error. It wasn't necessary to use the fstp command, and without it the latter command also wouldn't be needed.

Buffer Overflow

Buffer overflow is one of the methods often used by hackers for correcting the software at run time. By skillfully manipulating the input data, the hacker causes buffer overflow and passes control to the shellcode expertly inserted into the program. Here, only one type of overflow error will be considered, namely, stack overflow. Usually, stack overflow manifests in programs written in C++. It consists of intrusion into the executable program code through the program stack.

The stack overflow technique is mainly used for remote attacks. If you need to intrude the program that runs on the local computer, there are more powerful tools for achieving this goal. Besides, cracking some system running on a remote computer requires the intruder to carry out some preliminary investigations. This is why I decided to include this material in this book, even though it mainly relates to remote attacks.

Essence of the Problem

When working with external devices, the program allocates buffers for storing sent and received data. The received data fill the allocated buffers either completely or partially. The program must ensure that the received data do not go beyond the buffer limits. If this happens, other data items of even the executable code might become damaged. If the data being loaded exceed the boundaries of the buffer allocated to them, this might result in a program malfunction or even in a total crash. A specific feature of such errors is that they are exceedingly difficult to detect. In some cases, programmers might gain a false impression that the errors are arbitrary and are not related to specific actions. Exceeding the array boundaries is a typical example of such errors.

Most contemporary compilers are capable of generating the code that can check whether the boundaries of allocated buffers have been exceeded. For example, the Microsoft C++ compiler provides the /GS command-line options. If this option is used, additional code is generated by the compiler, which checks all operations to find out whether the buffer limits have been exceeded. However, only buffers defined in the stack are checked, and this check is superficial. It doesn't guarantee that the data do not fall into another buffer after exceeding limits of the allocated one. The essence of the idea of checking the stack for overflow is to place certain predefined bytes at the boundaries of the allocated buffer. After any operations that write into the allocated buffer, a special procedure that checks these bytes must be called. If these bytes are modified, this is evidence that there was a buffer overflow error. This approach requires additional memory. Furthermore, it considerably slows program execution.

A natural question arises: How can the buffer overflow be exploited by those who try to crack the program or a system? There are several techniques of penetration:

If the buffer is located in the stack, then the most obvious mechanism of intrusion from the outside is modification of the return address from the function. The return address can be modified in such a way that it jumps to another function or to the address that also is located in the stack but contains malicious code instead of normal data. This method is the one that will be covered later in this section.
Another approach is to modify some pointers (including to pointers to functions, the jump table, etc.) at run time in such a way as to point to code other than that initially intended by the program developer. This code was previously inserted by the intruder and will run according to the intruder's plans.
Another approach is to modify data addresses in the course of overflow. The next portion of data will be supplied into a different program location, which would allow the intruder to insert an external code into the program.

Thus, it is possible to state that software is cracked by buffer overflow in the following two stages: It is necessary first to insert malicious code and then to pass control to it. In theory, the first stage might not be needed if the procedure that the hacker requires is part of the program being investigated. In this case, the only thing that the hacker needs is to pass control to that procedure at the required moment.

Why do I pay such attention to stack overflow? Windows protects the executable code from writing there (the topic of self-modification was covered in detail in Section 1.6.2). Also, Windows protects the data section from executing some code there. The stack is the only location where it is possible to both write data and execute commands. This fundamental property of the stack is universally applicable to most operating systems. In other words, it is possible to fill the stack buffer with executable code and then make the processor execute it.

Practical Example

As an example, consider the simple program provided in Listing 3.57. The getpassword function checks whether the supplied password is correct and, depending on the password's correctness, returns either false or true.

Listing 3.57: Example of a simple program vulnerable to stack overflow

 #include <stdio.h> #include <string.h> int getpassword(char *); char *passw = "privet"; int main() {        printf("Input password:\n");        if(!getpassword(passw))printf("You are registered!\n");        else printf("You are wrong!\n");        return 0; }; int getpassword(char *ss) {        char s[13];        gets(s) ;        if (!strcmp(s, ss))return 0;        else return 1; }

Fig. 3.3 shows the design of the stack of the program presented in Listing 3.57. This is a standard arrangement with prologue and epilogue. As you can see, the general stack structure is divided into the stack structure of the main function and the stack structure of the getpassword function. Note that addresses decrease from bottom to top. The local variable shown in this illustration is the s variable. And, although the size of this parameter was specified to equal 13 bytes, the compiler aligns it by the 4-byte boundary so that the buffer size equals 16 bytes. Data are inserted into the s variable from lower addresses to higher addresses. Thus, the data that overflow the stack overwrite everything located below. The EBP value is the first to be overwritten. After it, there is the address of return from the getpassword function (which is the result longed for by the attacker). If the data go beyond the buffer limits and change the value of the return address, then the function would return control to a different location. But where to? This is the most interesting question. The attacker can place into the stack the address of some function present within the program so that in case of buffer overflow the program would execute according to the different method (which the programmer didn't expect).

image from book
Figure 3.3: The stack structure, with addresses decreasing from bottom to top

It is time to conduct an experiment by changing the return address from the getpassword function. First, it is necessary to carefully study the disassembled code of the program. Listings 3.58 and 3.59 show the disassembled texts of the main and getpassword functions, respectively.

Listing 3.58: Disassembled code of the main function from Listing 3.57

 .text:00401000  _main       proc near        ; CODE XREF: start + 16E↓p .text:00401000              push    ebp .text:00401001              mov     ebp, esp .text:00401003              push    offset aInputPassword .text:00401008              call    _printf .text:0040100D              add     esp, 4 .text:00401010              mov     eax, dword_409040 .text:00401015              push    eax .text:00401016              call    sub_401050 .text:0040101B              add     esp, 4 .text:0040101E              test    eax, eax .text:00401020              jnz     short loc_401031 .text:00401022              push    offset aYouAreRegister .text:00401027              call    _printf .text:0040102C              add     esp, 4 .text:0040102F              jmp     short loc_40103E .text:00401031  loc_401031:                  ; CODE XREF: _main + 20↑j .text:00401031              push    offset aYouAreWrong .text:00401036              call    _printf .text:0040103B              add     esp, 4 .text:0040103E  loc_40103E:                  ;CODE XREF: _main + 2F↑j .text:0040103E               xor    eax, eax .text:00401040               pop    ebp .text:00401041               retn .text:00401041  _main        endp

Listing 3.59: Disassembled code of the getpassword function (Listing 3.57)

 .text:00401050   sub_401050  proc near        ; CODE XREF: _main + 16↑p .text:00401050        var_10 = byte ptr -l0h .text:00401050        arg_0  = dword ptr  8 .text:00401050               push    ebp .text:00401051               mov     ebp, esp .text:00401053               sub     esp, 10h .text:00401056               lea     eax, [ebp + var_10] .text:00401059               push    eax .text:0040105A               call    gets .text:0040105F               add     esp, 4 .text:00401062               mov     ecx, [ebp + arg_0] .text:00401065               push    ecx ; char .text:00401066               lea     edx, [ebp + var_10] .text:00401069               push    edx ; char .text:0040106A               call    _strcmp .text:0040106F               add     esp, 8 .text:00401072               test    eax, eax .text:00401074               jnz     short loc_40107A .text:00401076               xor     eax, eax .text:00401078               jmp     short loc_40107F .text:0040107A  loc_40107A:                 ; CODE XREF: sub_401050 + 24↑j .text:0040107A                mov    eax, 1 .text:0040107F  loc_40107F:                 ; CODE XREF: sub_401050 + 28↑j .text:0040107F                mov    esp, ebp .text:00401081                pop    ebp .text:00401082                retn .text:00401082  sub_401050    endp

To start this study, first consider the call to the sub_401050 function, which is nothing but the designation of the getpassword function. The sequence of the mov eax, dword _409040/push eax commands corresponds to sending the pointer containing the reference password into the stack. The dword_409040 global variable contains the address of this string; in other words, it is the pointer variable. Thus, the parameter is sent to the stack, then the call command places there the return address (in this case, this is the 0040101Bh value).

Then pay attention to the test eax, eax command and the conditional jump that follows it: jnz short loc_401031. This is a normal conditional construct, and the test command corresponds to the if(!) operator. You have encountered the structure of the main function many times.

First, note that 16 bytes are allocated for storing local variables in this program. This is because all data in the stack are aligned by the 4-byte boundary. So, if you want to overflow the stack, bear in mind the actual size of this buffer.

The lea eax, [ebp + var_10]/push eax sequence of commands simply places the address of this buffer into the stack. It is assumed that the password will be loaded into that buffer. This is the key issue. As you should understand (see Fig. 3.3), this buffer is followed by the content of the EBP register, which, in turn, is followed by the desired return address.

After the call to the gets library function, the strcmp function for comparing strings is called. This function receives the addresses of two strings from the stack. After the execution of this function, there is an ordinary conditional construct (test, then jnz). By the way, note that the strcmp function, like many other string functions, controls the string lengths only by the terminating null; consequently, buffer overflow cannot influence their operation.

Thus, having carefully considered Listings 3.58 and 3.59, it is possible to proceed further. What is the main goal? First, try to modify the return address so that the jump by retn from the getpassword function would pass control to the printf command in the beginning of the main function. From Listing 3.58, it follows that the jump address is equal to 00401003. Recall that in the memory the number is written according to the principle "the most significant byte gets the higher address," and you'll find that the following sequence of bytes must be sent to the buffer: 03 10 40 00. However, it is necessary to fill first the 16-byte buffer then 4 more bytes where the EBP value is stored. Because it is difficult to enter the characters with codes 10h and 03h from the command line, it is recommended that you use the following technique. Prepare the text file with the required string, then use input redirection. Thus, assuming that the program name is progl.exe and the text file is named pasw.txt, it is necessary to issue the following command: progl < pasw.txt. For entering characters with codes smaller than 32, it is possible to use the hiew.exe program (see Section 2.1.3). Well, let the string appear as follows:

    qqqqqqqqqqqqqqqqqqqq♥▶@

Exactly 20 bytes (16 bytes is the size of the string buffer, and 4 bytes are for the contents of EBP) are filled with arbitrary characters (for simplicity, I suggested simply the q character. Then there follow the characters with the codes 03h 10h 40h. And where is the character with code 0? Actually, you do not need it. After all, in the address that you will change, it is present in the position where it is required. Thus, prepare and execute the progl < pasw.txt command. The result will be as follows:

    Input password:    Input password:    You are wrong!

Thus, the problem has been solved. After the execution of the getpassword function, the jump passes control to the address specified in the buffer. However, after these strings are displayed, the dialog box warning you that there was an exception will appear (Fig. 3.4). This is natural and can be easily explained. During the second pass, different data fall into the buffer and stack is already corrupted; thus, the application cannot terminate correctly.

image from book
Figure 3.4: The exception reported by Windows XP after an artificially-created buffer overflow

Note

The attacker has serious limitations when choosing what bytes are sent into the buffer. For example, the hacker would fail to send characters with codes such as 26 or, say, 0 into the buffer. However, these limitations are not insurmountable for the following two reasons:

Only a particular case is considered in this example, namely, input using the gets console function. In general, the buffer might be intended for arbitrary binary information, in which case the attacker can freely choose the information that will be passed for the input.
The information that will be sent into the buffer can be encoded so that it won't contain "dangerous" codes. This issue will be described later in this chapter.

Recall that it is possible to run executable code in the stack. What if you place the program code into the stack and pass control to that code? This sounds promising! After all, in the primitive program the buffer size is only 16 bytes. Assume that you are dealing with a buffer that is 16 KB. It is possible to place a program into that buffer that would do anything you like and on behalf of a program that probably has a high privilege level in the system. This security hole has been exploited by crackers for more than 10 years.

What can be placed into the buffer in this particular case? For example, consider the code in Listing 3.60.

Listing 3.60: Executable code that can be placed into the vulnerable buffer (Listing 3.57)

 MOV  EAX, 0 RETN

Is this code suitable? I have intentionally made an error here. Recall that the RETN command has already been executed for jumping to this fragment; so instead of RETN it is necessary to use some kind of JMP command.

If you succeed in achieving this goal, the program will be cracked, because it will always report any supplied password as correct.

From Listing 3.58 it becomes clear that the jump address must be equal to 0040101B. The stack content will not be damaged, and no critical errors will occur.

Disappointingly, the sequence of commands such as MOV EAX, 0/JMP 0040101B is not suitable for passing it into the stack as a string. This is because of the following:

The first command contains zeros. The code o0f the MOV EAX, O command is equal to B8 00000000.
In the second command, the jump address is counted in relation to the command that follows the JMP command. If the starting address of the stack is changed, then this code fragment won't operate correctly.

Thus, to achieve the formulated goal, the two commands from Listing 3.60 must be transformed into a sequence of six commands (Listing 3.61).

Listing 3.61: Executable code that would operate correctly in the vulnerable buffer

 XOR EAX, EAX         ; 33 CO XOR ECX, ECX         ; 33 C9 MOV CL, 40H          ; B1 40 SHL ECX, 1OH         ; C1 E1 10 MOV CX, 101BH        ; 66 B9 1B 10 JMP ECX              ; FF E1

The codes of the commands in Listing 3.61 are specified as comments. The best approach to obtaining the correct codes of all commands is to use some debugger, such as OllyDbg. Thus, you'll require 15 bytes out of 20 allocated bytes (recall that 16 bytes are allocated for the string and 4 bytes are allocated for storing EBP). The remaining 5 bytes of the allocated 20 bytes can contain any information. The return address, which is next, must contain the starting address of the buffer. The buffer address can be determined using the same debugger. In this particular example, it turns out to be equal to 0012FEC8. Thus, it is necessary to add 3 more bytes for the 20-byte string that has already been formed: C8 FE 12.

Here is the content of the pasw.txt file: . To avoid errors, fill the text file using hiew.exe, not using a standard text editor. An attempt at passing some characters through the clipboard inevitably corrupts the code, although the character might remain the same. Thus, having prepared the text file, issue the progl < pasw.txt command. The result will be wonderful:

    You are registered!

Thus, the shellcode has been inserted into the program, making the program register the user.

Now, only one issue needs clarification. As you have seen, there are problems with the input of characters that can be sent to the program as a string. The input is not always carried out using the console procedure that treats some characters selectively. If you deal with such an input, then there is no problem.

However, return to the case of console input. What is the solution to this problem? The answer is straightforward. It is necessary to encode the sequence of bytes in such a way as to ensure that the codes incorrectly treated by console input are missing from the string. I won't consider different methods of encoding here. I prefer the following approach (which might require additional memory). The main goal of this approach is as follows: All "invalid" bytes must be encoded (for example, by the XOR command). Each of these bytes must be preceded by a byte specifying that the next byte that follows it is encoded. For this purpose, it is natural to use the NOP command that has the 90H code. The entire fragment must start with the decoder for the remainder of the code. Decoding consists of removing the NOP bytes and decoding the bytes that follow them. Because the bytes that console input functions consider invalid or interpret incorrectly are not numerous, the number of NOP instructions must not be too large.

3.2.2. Conditional Constructs and Logical Operators

When programming in high-level languages such as C and Pascal, most programmers have become used to completed conditional constructs (if-else) and logical operators (AND, OR, etc.). However, there was a time when such possibilities were not available. For instance, consider Fortran or early versions of Basic. For these, unconditional jump operators, such as goto, were helpful, despite strong dislike by ardent supporters of high style in programming. However, the machine language is entirely built on the basis of conditional and unconditional jumps. Like it or not, it is impossible to do without them if you need to check some condition.

Simple Constructs

Consider the simple conditional construct shown in Listing 3.62.

Listing 3.62: Simple conditional construct

 #include <stdio.h> void main() {         int a, b;          scanf("%d", &a);          scanf("%d", &b);          if (a >= b)                   printf("a >= b\n");          else                   printf("a < b\n"); }

After compiling the program presented in Listing 3.62 using Microsoft Visual Studio and loading it into the IDA Pro disassembler, you'll obtain the code shown in Listing 3.63.

Listing 3.63: Disassembled code of the program shown in Listing 3.62

 .text:00401000  _main        proc near         ; CODE XREF: start + 16E↓p .text:00401000         var_8 = dword ptr -8 .text:00401000         var_4 = dword ptr -4 .text:00401000              push    ebp .text:00401001              mov     ebp, esp .text:00401003              sub     esp, 8 .text:00401006              lea     eax, [ebp + var_4] .text:00401009              push    eax .text:0040100A              push    offset unk_4080FC .text:0040100F              call    _scanf .text:00401014              add     esp, 8 .text:00401017              lea     ecx, [ebp + var_8] .text:0040101A              push    ecx .text:0040101B              push    offset unk_408100 .text:00401020              call    _scanf .text:00401025              add     esp, 8 .text:00401028              mov     edx, [ebp + var_4] .text:0040102B              cmp     edx, [ebp + var_8] .text:0040102E              jl      short loc_40103F .text:00401030              push    offset aAB   ; "a >= b\n" .text:00401035              call    _printf .text:0040103A              add     esp, 4 .text:0040103D              jmp     short loc_40104C .text:0040103F  loc_40103F:                    ; CODE XREF:  _main +  2E↑j .text:0040103F              push    offset aAB_0 ; "a<b\n" .text:00401044              call    printf .text:00401049              add     esp, 4 .text:0040104C  loc_40104C:                    ; CODE XREF:  _main + 3D↑|j .text:0040104C              xor     eax, eax .text:0040104E              mov     esp, ebp .text:00401050              pop     ebp .text:00401051              retn .text:00401051  _main       endp

Pay special attention to the way, in which the scanf function is called. Because its argument requires a pointer to a variable, lea eax, [ebp + var_4] /push eax sends the pointer to the var_4 variable into the stack. Note that Microsoft's compiler treats the var_8 variable in the same way.

The second important issue is how the complete conditional construct is implemented in the executable code. Schematically, this can be represented as shown in Listing 3.64.

Listing 3.64: Implementation of a complete conditional structure in the executable code

 j1   l1 // a >= b ... jmp 12 l1: // a < b ... 12:

As you can see, to implement a complete conditional construct one conditional jump and one unconditional jump are required. Note that the conditional jump command corresponds to the condition that is the negation of the one in the original program text. If you remove the else branch (the incomplete conditional construct) from the program, then the unconditional jump command (jmp 12) will be missing from the executable code. Finally, if you replace the a >= b condition with a > b, then the executable code will contain the jle (jump less or equal) command instead of the jl command. If the source program uses the less-or-equal condition, for example, a =< b, then the executable code will contain jg or jge commands (for the less condition). The fact that instead of a direct condition its negation is checked in the executable code isn't an axiom. Another approach is possible (Listing 3.65).

Listing 3.65: Alternative implementation of complete conditional constructs

 jge  11 // a < b ... jmp 12 11: // a >= b ... 12:

As you can see, there isn't anything abnormal in this approach. You can encounter it when analyzing the code created by some compilers. However, I'd like to point out again that decompilation isn't your main goal. You must try to understand the program's operating logic. For this purpose, you do not need to know exactly, which source code was used to generate a specific fragment of the disassembled code.

If variables used within the program being studied are unsigned, then instead of jl(jle) the jb(jbe) commands are used, and the jg(jge) commands are replaced with ja(jae). In case of the check for equality (==) or inequality (!=), the jnz and jz commands are used, respectively. Note that in the previously considered example, the CMP command was used in the executable code for checking the a > b condition. This is obvious. This command is also used for checking other conditions: <, <=, >=, ==, and !=. Everything depends on the conditional check that you'll use later — in other words, on the flag or group of flags that you are checking. If you check for equality (or inequality) to zero, the TEST command is often used instead of CMP. Recall that in many programming languages the false value corresponds to 0, and true corresponds to some nonzero value (1, for example). In this relation, it is instructive to consider a construct typical for the C++ language (Listing 3.66).

Listing 3.66: Typical C++ construct

 if(k =  (a = b))) { } else { }

It is clear that the k variable will be assigned one of the following two values: 1 (if a is equal to b) or 0 (if a is not equal to b). Here is a fragment of the executable code compiled using the Microsoft Visual C++ compiler (Listing 3.67).

Listing 3.67: Executable code generated using Microsoft Visual C++ (Listing 3.66)

 ; Load the a variable into the EAX register. mov     eax, [ebp + var_4] ; Compute the difference a - b; the variables remain unchanged. sub     eax, [ebp + var_8] ; Sign inversion is needed to check whether the EAX register contains 0. neg     eax ; Subtraction takes into account the sign. ; If the EAX register contains a nonzero value, ; then the subtraction result in EAX will be -1; ; otherwise, the EAX register would contain 0. sbb     eax, eax ; If the EAX register contained -1, the inc command ; will produce 0 (false); otherwise, the result will be 1 (true). inc     eax ; The value is assigned to the k variable. mov     [ebp + var_C], eax ; Jump depends on the result in the EAX register. jz      short loc_401058 ... jmp     short loc_401065 loc_401058: ... loc_401065: ...

The algorithm of obtaining the value of the k variable is a notable one. As you can see, the CMP command is not used in this case. Note that the conditional jump is carried out depending on the value contained in the EAX register after the INC EAX operation. The result of this operation, contained in the EAX register, is either 0 (false) or 1 (true).

Comparison of real numbers deserves special attention (Listing 3.68).

Listing 3.68: Sample program illustrating comparison of real numbers

 #include <stdio.h> void main() {        double a, b;        scanf("%Lf", &a);        scanf("%Lf", &b);        if (a >= b)                 printf("%Lf\n", a) ;        else                 printf(""-Lf\n", b); }

Listing 3.68 presents a simple program for comparing two real numbers of the double type. From the language syntax point of view, the difference between this program and a similar one that compares integer variables is minor and relates only to the format of the scanf and printf functions. However, comparison of real variables must be principally different from comparison of integer numbers at the level of executable code. Consider Listing 3.69, produced by IDA Pro.

Listing 3.69: Disassembled code of the program in Listing 3.68 created by IDA Pro

 .text:00401000  _main       proc near         ; CODE XREF: start + 16E↓p .text:00401000       var_18 = qword ptr -18h .text:00401000       var_10 = qword ptr -10h .text:00401000       var_8  = qword ptr -8 .text:00401000              push    ebp .text:00401001              mov     ebp,  esp .text:00401003              sub     esp,  10h .text:00401006              lea     eax, [ebp + var_8] .text:00401009              push    eax .text:0040100A              push    offset unk_4090FC .text:0040100F              call    _scanf .text:00401014              add     esp, 8 .text:00401017              lea     ecx, [ebp + var_10] .text:0040101A              push    ecx .text:0040101B              push    offset unk_409100 .text:00401020              call    _scanf .text:00401025              add     esp, 8 .text:00401028              fld     [ebp + var_8] .text:0040102B              fcomp   [ebp + var_10] .text:0040102E              fnstsw  ax .text:00401030              test    ah, 1 .text:00401033              jnz     short loc_40104D .text:00401035              fld     [ebp + var_8] .text:00401038              sub     esp, 8 .text:0040103B              fstp    [esp + 18h + var_18] .text:0040103E              push    offset aLf  ; "%Lf\n" .text:00401043              call    _printf .text:00401048              add     esp, 0Ch .text:0040104B              jmp     short loc_401063 .text:0040104D  loc_40104D:                    ; CODE XREF: _main + 33↑j .text:0040104D              fld     [ebp + var_10] .text:00401050              sub     esp, 8 .text:00401053              fstp    [esp + 18h + var_18] .text:00401056              push    offset aLf_0 ; "%Lf\n" .text:0040105B              call    _printf .text:00401060              add     esp, 0Ch .text:00401063  loc_401063:                    ; CODE XREF:  _main +  4B↑j .text:00401063              xor     eax, eax .text:00401065              mov     esp, ebp .text:00401067              pop     ebp .text:00401068              retn .text:00401068 _main        endp

For the moment, you shouldn't see anything unusual in allocating memory for stack variables. The only issue that deserves special mention is that now variables have the double type and, consequently, take 8 bytes. Arguments of the scanf function are pointers to variables rather than actual variables. Hence, the lea (lea eax, [ebp + var_8]/push eax) commands are used. The use of two registers (EAX and ECX) as temporary variables for storing the pointers to variables for further pushing into the stack has no special meaning. It would be possible to do with only one register.

Starting from the 00401028 address, the really interesting commands are encountered. It is necessary to compare two 8-byte floating-point numbers. Here is the sequence of commands that carries out this comparison (Listing 3.70).

Listing 3.70: Sequence of commands that compares two 8-byte floating-point numbers

 ; Load the a variable into ST(0). fld     [ebp + var_8] ; Compare the content of ST(0) to var_10 (the b variable). fcomp   [ebp + var_10] ; Save the status word (SW) in the AX register. fnstsw  ax ; Check bit 0 of the AH register. test    ah, 1 ; Jump if this bit is set. jnz     short loc_40104D ... jmp     short loc_401063 loc_40104D: ... loc_401063: ...

Although the preceding fragment is supplied with comments, some notes still have to be made. The fcomp command compares two operands, and the result of this comparison is reflected by the C0, C2, and C3 flags that correspond to bits 8, 9, and 10 of the status word (see Section 1.2.3). Table 3.3 outlines the flag values for different situations of comparison.

Table 3.3: Flag values for different situations of comparison
Checked condition	C3 flag	C2flag	CO flag
ST(0) > src	0	0	0
ST(0) < src	0	0	1
ST(0) = src	1	0	0
Operands are incomparable	1	1	1

From this table, it follows that if C0 == 0, this corresponds to the >= condition. Hence, the JNZ (jump if nonzero) command corresponds to printing the message that the b variable is the greatest. Some compilers use different techniques. For instance, a complier may copy the coprocessor flags into the flags register using the SAHF command. In this case, the C0 flag is copied into the CF flag, C2 is copied into PF, and C3 goes into ZF. Later, it is possible to use conditional jumps. In the case under consideration, this will be the JZ command (instead of JNZ). What should you do when checking a strict inequality, such as a > b? According to the data provided in Table 3.3, the checking command would appear as follows: TEST AX, 41H.

Nested Constructs and Logical Operators

In real-world programming, conditional constructs might contain nested logical constructs (Listing 3.71). Consider how nesting might influence the executable code.

Listing 3.71: Example program containing nested conditional constructs

 // Searching for the maximum number out of three numbers #include <stdio.h> void main() {         int a, b, c;         scanf("%d", &a);         scanf("%d", &b);         scanf("%d", &c);         if (a > b)         {                  if(a > c) printf("%d\n", a) ;                  else print f("%d\n", c);         } else {                  if(b > c) printf("%d\n", b) ;                  else printf("%d\n", c);         } }

The compiler builds the structure of nested conditional constructs according to the same method as the one shown in Listing 3.63. This method is presented in Listing 3.72.

Listing 3.72: Method of nested conditional structures built by the compiler

 jl 11 //  if (1) ^[9] // The branch corresponding to a > b jl 14 // if (2) // The branch corresponding to a > c // Output the a value. ... jmp 12 14: // else (2) // The a > c condition is not observed, while the a > b condition is true. // Output the c value. ... 12: // The end of the if operator of the first nesting level jmp 13 // The start of the else operator of the first nesting level 11: // else (1) // The a > b condition hasn't been observed. jl 15 // if (2) // The b > c condition is true, while the a > b condition is false. // Output the b value. ... jmp 13 15: // else (2) // The b > c is condition false, and the a > b condition is false. // Output the c value. ... 13: // End of the nested construct

Carefully consider the method presented in Listing 3.72. As you can see, it is clearly mapped to the method present in the source program (see Listing 3.71). At the same time, any complete conditional construct can be easily converted into incomplete conditional construct by simply discarding an appropriate jmp command.

Contemporary programming languages use OR and AND logical operators instead of large numbers of nested conditional constructs. Using these logical operators, combined conditions are built. In real-world programming, such combined conditions are sophisticated. Kris Kaspersky, in his excellent book Hacker Debugging Uncovered, suggests the diagrams technique for analyzing executable code resulting from complex conditional constructs. This approach can be helpful when reconstructing a complex conditional construct. In my opinion, however, it is not always necessary to reconstruct the source code to understand the conditions implemented in the executable code. After all, programmers often use logical operators more for making the code compact than for making their programs readable. This doesn't improve understanding of their programs. Furthermore, most programmers use both logical operators and nested conditional constructs, which makes it problematic to recover source constructs.

Consider the example in Listing 3.73.

Listing 3.73: Example illustrating recognition of logical operators

 .text:00401000  _main       proc near          ; CODE XREF: start + 16E↓p .text:00401000       var_C = dword ptr -0Ch .text:00401000       var_8 = dword ptr -8 .text:00401000       var_4 = dword ptr -4 .text:00401000             push    ebp .text:00401001             mov     ebp, esp .text:00401003             sub     esp, 0Ch .text:00401006             lea     eax, [ebp + var_4] .text:00401009             push    eax .text:0040100A             push    offset unk_4080FC .text:0040100F             call    scanf .text:00401014             add     esp, 8 .text:00401017             lea     ecx, [ebp + var_8] .text:0040101A             push    ecx .text:0040101B             push    offset unk_408100 .text:00401020             call    scanf .text:00401025             add     esp, 8 .text:00401028             lea     edx, [ebp + var_C] .text:0040102B             push    edx .text:0040102C             push    offset unk_408104 .text:00401031             call    scanf .text:00401036             add     esp, 8 .text:00401039             mov     eax, [ebp + var_4] .text:0040103C             cmp     eax, [ebp + var_8] .text:0040103F             jle     short loc_401047 .text:00401041             cmp     [ebp + var_8], 0 .text:00401045             jg      short loc_401055 .text:00401047  loc_401047:                    ; CODE XREF: _main + 3F↓j .text:00401047             mov     ecx, [ebp + var_4] .text:0040104A             cmp     ecx, [ebp + var_C] .text:0040104D             jz      short loc_401055 .text:0040104F             cmp     [ebp + var_C], 0 .text:00401053             jnz     short loc_401064 .text:00401055  loc_401055:                    ; CODE XREF: _main + 45↑j .text:00401055             push    offset aYes ; "Yes!\n" .text:0040105A             call    _printf .text:0040105F             add     esp, 4 .text:00401062             jmp     short loc_401071 .text:00401064  loc_401064:                    ; CODE XREF: _main + 53↑j .text:00401064             push    offset aNo ; "No!\n" .text:00401069             call    _printf .text:0040106E             add     esp, 4 .text:00401071  loc_401071:                    ; CODE XREF: _main + 62↑j .text:00401071             xor     eax, eax .text:00401073             mov     esp, ebp .text:00401075             pop     ebp .text:00401076             retn .text:00401076 _main       endp

Lots of similar listings have already been provided in this book. Thus, you'll easily determine that three stack variables of the integer type are used here (12 bytes are reserved, and three variables of the same type are used). The input of their values using the scanf library function also isn't anything new. The most interesting are the conditional jumps. First, consider the code in general. What would you see? First, all conditional jumps can produce one of the two possible results: printing the "Yes" string (the 00401055 address) or the "No" string (the 00401064 address) using the printf function. Consequently, to all appearances you are dealing with the if-else conditional construct. Combine all conditions that produce the first result and, accordingly, combine all conditions that lead to the second result. Thus, the jump to the 00401055 address takes place when var_4 > var_8 and var_8 > 0. This condition is an obvious candidate for the role of the following combined condition: a > b && b > 0. Let it be designated as condition 1. Because the same result is produced when all conditions except for condition 1 are satisfied, it is possible to assume that you are dealing with the OR logical operator (although, it is not necessary to take this into account to gain a sound understanding of the process). The same result is obtained if the var_4 = var_c condition is satisfied. Let it be condition 2. Finally, the same result will be produced if var_c = 0. This will be condition 3. In all other cases, the code fragment starting from the 00401064 address is executed. After completing all of these considerations, it is possible to write a conditional construct using the AND and OR logical operators. However, this won't improve your understanding of the code, because you have already understood the operating logic of this fragment.

Thus, what conclusion can be drawn on the basis of these considerations? This conclusion is as follows: Conditions related to one another using logical AND can be easily recognized. Considering them as a combined condition, it is also possible to understand the operating logic of all other conditions (related to the first one using logical OR).

Conditional Constructs without Jumps

It is necessary to mention that conditional and unconditional jumps reset the command queue, which slows down the program execution. Bear this in mind when writing a program in Assembly. Avoid using jumps whenever it is possible to do without them. Instead, it is possible to use the following sets of commands: SETcc r/m (conditional setting of the first bit) and CMOVX (conditional data sending). These commands are described in Chapter 1 (see Table 1.2). Advanced compilers, such as Visual C++, are aware of this capability. Unfortunately, I failed to make this compiler use any of the conditional move commands. On the other hand, it turned out that Visual C++ actively uses conditional bit setting commands.

The idea of using the commands for conditionally setting the first bit of the byte is trivial. Let some register (for example, EAX) contain the 0 value in the beginning. Then, after using the CMP command, it is possible to use one of the conditional bit setting commands, for instance, the SETLE command (set if lower). After that, execute DEC EAX. Now, if the condition has been satisfied, EAX contains 0; otherwise, the value contained in EAX is equal to -1 (FFFFFFFFH). If the operand is equal to 0, then the AND instruction with any value of the second operand won't change its content. For instance, consider the fragment in Listing 3.74.

Listing 3.74: Code fragment illustrating the use of the conditional bit setting commands

 .text:00401013       xor       eax, eax .text:00401015       cmp       edx, 0Ah .text:00401018       setle     al .text:0040101B       dec       eax .text:0040101C       and       eax, OFFFFF200h .text:00401021       add       eax, 1000h

As can be easily seen, if the EDX <= 0 condition is satisfied, the EAX register contains the 1000H value; otherwise, it will contain the 200H value. This executable code is equivalent to the if (a > 10) b = 0x200; else b = 0x1000; operator. This code is created by Microsoft's compiler if the "create fast code" option is set.

Choice Operators

As a rule, a long sequence of incomplete conditional operators is replaced with the choice operator. Listing 3.75 provides a typical example of using the choice operator. Consider what Microsoft's compiler would do to this code (Listing 3.76).

Listing 3.75: Typical example of using the choice operator

 #include <stdio.h> void main() {         char a;         scanf("%c", &a);         switch(a)         {                 case 'A':                         printf("A\n");                         break;                 case 'B':                         printf("B\n");                         break;                 default:                         printf("?\n");         } }

Listing 3.76: Disassembled listing (Listing 3.74) compiled using Microsoft Visual C++

 .text:00401000  _main      proc near        ; CODE XREF: start + 16E↓p .text:00401000       var_8 = byte ptr -8 .text:00401000       var_l = byte ptr -1 .text:00401000             push    ebp .text:00401001             mov     ebp, esp .text:00401003             sub     esp, 8 .text:00401006             lea     eax, [ebp - 1] .text:00401009             push    eax .text:0040100A             push    offset unk_4080FC .text:0040100F             call    _scanf .text:00401014             add     esp, 8 .text:00401017             mov     cl, [ebp + var_1] .text:0040101A             mov     [ebp + var_8], cl .text:0040101D             cmp     [ebp + var_8], 41h .text:00401021             jz      short loc_40102B .text:00401023             cmp     [ebp + var_8], 42h .text:00401027             jz      short loc_40103A .text:00401029             jmp     short loc_401049 .text:0040102B  loc_40102B:                 ; CODE XREF: _main + 21↑j .text:0040102B             push    offset byte_408100 .text:00401030             call    _printf .text:00401035             add     esp, 4 .text:00401038             jmp     short loc_401056 .text:0040103A  loc_40103A:                 ; CODE XREF: _main + 27 ↑j .text:0040103A             push    offset unk_408104 .text:0040103F             call    _printf .text:00401044             add     esp, 4 .text:00401047             jmp     short loc_401056 .text:00401049  loc_401049:                 ; CODE XREF: _main + 29↑j .text:00401049             push    offset unk_408108 .text:0040104E             call    printf .text:00401053             add     esp, 4 .text:00401056  loc_401056:                 ; CODE XREF: _main + 38↑j .text:00401056             ; main + 47 ↑j .text:00401056             xor    eax,  eax .text:00401058             mov    esp,  ebp .text:0040105A             pop    ebp .text:0040105B             retn .text:0040105B _main       endp

This code is interesting because of a certain redundancy. Two stack variables are defined in the main procedure. The var_l variable apparently is intended for storing the a variable defined in the program source code (see Listing 3.75), because it is used with the scanf function. The var_8 variable is an auxiliary one. It is used for comparison in commands such as cmp [ebp + var_8], 42h. It is possible to use a single variable, var_1. By the way, note that two single-bit variables are stored in two adjacent 4-byte blocks. This is because of the requirement to meet the data alignment by the 4-byte boundary.

The method for checking conditions is bulky. This is because the check tests whether the condition is satisfied. The method in Listing 3.77 would be more elegant.

Listing 3.77: Alternative and more elegant method for checking conditions

 cmp    [ebp + var_8], 41h jnz      11 ... jmp    _break 11: cmp    [ebp + var_8], 42h jnz      12: ... jmp    _break 12: //default ... _break:

Finally, you can encounter the approach shown in Listing 3.78. In particular, Borland C++ 5.0, as well as the Delphi compiler, processes the choice operator in this way. The Microsoft C++ compiler also behaves in this way, provided that the "create fast code" option is set.

Listing 3.78: Processing the typical choice operator

 mov       dl, [ebp + var_l] sub       dl, 41h jz        11 dec       dl jz        12 jmp       13 11: ... jmp       13 12: ... 13: ...

On the basis of Listing 3.78, the principle of this approach is easily understandable. The parameter of the switch operator is placed into some temporary variable, which might also be a register. Assume that the values for equality, to which the variable will be checked, are a1, a2, ..., an (in ascending order). The check is carried out as follows: First, the a1 value is subtracted from the temporary variable and the variable is tested for equality to zero. Later, the values a2 - a1, a3 - a2, etc., are subtracted from the variable. After each subtraction, the variable is tested for equality to zero. This approach is especially efficient when all differences (a2 - al, a3 - a2, etc.) are equal to one. In this case, it is convenient to use the DEC processor command.

3.2.3. Loops

Looping algorithms are a kind of branching in which, depending on the condition, a specific program fragment is executed multiple times. At the level of Assembly language, a conditional or unconditional backward jump is assumed to the instructions with smaller addresses. Contemporary disassemblers and debuggers trace such jumps and mark them in the disassembled text.

Simple Loops

Consider all possible variants of loop organization in Assembly language. This would allow you to easily understand how contemporary compilers treat loops. A typical loop structure is shown in Listing 3.79.

Listing 3.79: Typical loop structure

 ... ; Quite often, the JMP L1 instruction is encountered here. _beg: ; In particular, the commands changing the loop parameter ; appear here. ... CMP EAX, EBX ; Or, perhaps, check for some other condition. JZ _end      ; Or, perhaps, some other conditional jump              ; beyond the loop limits. ; The loop body starts here. L1: ... ; The loop body - any number of commands. JMP _beg _end:

Listing 3.79 demonstrates a typical loop structure that you can encounter when studying the code generated by various compilers. Note that the jump to the start of the loop body might be present (JMP L1). If such a jump is present, then the loop body will be executed at least once. This corresponds to the postcondition loop ideology. If there is no jump to the loop body, then in general the loop body does not execute even once. This corresponds to the precondition loop ideology.

A typical example of a postcondition loop is presented in Listing 3.80.

Listing 3.80: Typical example of a postcondition loop

 ... _beg: ; Start of the loop body ... CMP EAX, EBX  ; Or check for some other condition. JZ _beg       ; Or some other conditional jump to the start of the loop ...

In this case, you are dealing with a typical example of the postcondition loop. However, it would be an error to consider that the loop type specified in the high-level programming language would automatically correspond to the loop type in the executable code. Contemporary compilers do not simply transform operators of high-level programming languages to machine commands but sometimes inventively reconstruct them (for some cases, the word inventively should be enclosed in quotation marks). Thus, if you use a precondition loop in your program but variable values are guaranteed to ensure that the loop will be executed at least once, the compiler can use the postcondition loop in the machine code.

The FOR, loops present in the main algorithmic languages are variants of precondition loops. One or more parameters simply participate in formulating the condition. Parameters of such loops, when executing the loop body (or before the next iteration) obtain some constant increment (which might be either positive or negative). The presence of a variable, which is incremented or decremented by a constant value at each iteration, is an important indication that you are dealing with the FOR, loop.

After this brief introduction, it is natural to consider several practical examples. For instance, Listing 3.81 provides the simplest example of using the FOR loop.

Listing 3.81: Simple example of using the FOR loop

 #include <stdio.h> void main() {        int b = 10;        for(int i = 0; i < 100; i++)                 printf("%d\n", b); }

Consider how the Microsoft Visual C++ compiler translates this program, provided that the "no optimization" option was chosen for compilation (Listing 3.82).

Listing 3.82: Disassembled text compiled using Microsoft Visual C++ without optimization

 .text:00401000  _main     proc near        ; CODE XREF: start + 16E↓p .text:00401000      var_8 = dword ptr -8 .text:00401000      var_4 = dword ptr -4 .text:00401000            push    ebp .text:00401001            mov     ebp, esp .text:00401003            sub     esp, 8 .text:00401006            mov     [ebp + var_4], 0Ah .text:0040100D            mov     [ebp + var_8], 0 .text:00401014            jmp     short loc_40101F .text:00401016  loc_401016:                ; CODE XREF: _main + 36↓j .text:00401016            mov     eax, [ebp + var_8] .text:00401019            add     eax, 1 .text:0040101C            mov     [ebp + var_8], eax .text:0040101F  loc_40101F:                ; CODE XREF: _main + 14↑j .text:0040101F            cmp     [ebp + var_8], 64h .text:00401023            jge     short loc_401038 .text:00401025            mov     ecx, [ebp + var_4] .text:00401028            push    ecx .text:00401029            push    offset unk_4060FC .text:0040102E            call    _printf .text:00401033            add     esp, 8 .text:00401036            jmp     short loc_401016 .text:00401038  loc_401038:                ; CODE XREF:  _main + 23↑j .text:00401038            xor     eax, eax .text:0040103A            mov     esp, ebp .text:0040103C            pop     ebp .text:0040103D            retn .text:0040103D _main      endp

First, I'd like to draw your attention to local variables. Obviously, var_4 stands for the b variable. As relates to var_8, it is nothing but the loop parameter. Also, note the mov eax, [ebp + var_8] /add ax, 1/mov [ebp + var_8], eax commands. They immediately attract attention and clearly indicate the FOR loop.

Note that Listing 3.82 corresponds exactly to the program operating method provided in Listing 3.81. The jmp short loc_40101F jump reflects that the initial value of the loop parameter must be 0, and the exit condition of the loop is the equality of the loop parameter to 100. When implementing such an algorithm in Assembly, it would be possible to do without this jump by simply assigning the initial value of-1 to the loop parameter.

From Listing 3.82, it is clear that you are dealing with the precondition loop. However, consider what the result would be if you instructed the compiler to produce compact executable code (Listing 3.83).

Listing 3.83: Disassembled text (Listing 3.81) compiled using the "compact code" option

 .text:00401000  _main      proc near        ; CODE XREF: start + 16E↓p .text:00401000             push    esi .text:00401001             push    64h .text:00401003             pop     esi .text:00401004  loc_401004:                 ; CODE XREF: _main + 13↓j .text:00401004             push    0Ah .text:00401006             push    offset unk_4060FC .text:0040100B             call    _printf .text:00401010             dec     esi .text:00401011             pop     ecx .text:00401012             pop     ecx .text:00401013             jnz     short loc_401004 .text:00401015             xor     eax, eax .text:00401017             pop     esi .text:00401018             retn .text:00401018 _main       endp

Listing 3.83 provides a typical example of the postcondition loop. Note that the role of the loop parameter is delegated to the ESI register. At the same time, instead of incrementing the parameter this code decrements it to make it possible to use the condition of comparing this parameter to zero. This technique is often used by the Microsoft Visual C++ compiler to turn a precondition loop to a postcondition loop. In this case, the parameter increment is replaced with the decrement. An interesting issue is that if the for loop is replaced with the while or do-while loop and the "create compact code" optimization option is preserved, the result will be exactly the same as the one shown in Listing 3.82.

Consider how the program shown in Listing 3.81 is treated by the Borland C++ compiler (Listing 3.84).

Listing 3.84: Disassembled text (Listing 3.81) compiled using the Borland C++ compiler

 .text:00401108  _main      proc near       ; DATA XREF: .data:0040AOB8↓o .text:00401108       argc  = dword ptr 8 .text:00401108       argv  = dword ptr 0Ch .text:00401108       envp  = dword ptr l0h .text:00401108             push    ebp .text:00401109             mov     ebp, esp .text:0040110B             push    ebx .text:0040110C             push    esi .text:0040110D             mov     esi, 0Ah .text:00401112             xor     ebx, ebx .text:00401114  loc_401114:                ; CODE XREF: _main + lE↓j .text:00401114             push    esi .text:00401115             push    offset format   ; Format .text:0040111A             call    _printf .text:0040111F             add     esp, 8 .text:00401122             inc     ebx .text:00401123             cmp     ebx, 64h .text:00401126             jl      short loc_401114 .text:00401128             pop     esi .text:00401129             pop     ebx .text:0040112A             pop     ebp .text:0040112B             retn .text:0040112B _main      endp

The most interesting issue is that the Borland C++ v. 5.0 compiler interprets the same program slightly differently: The precondition loop is transformed to the postcondition loop; however, the loop parameter continues to be incremented (see Listing 3.84). Moreover, note that Microsoft's compiler is more accurate when working with registers: It uses only one register (ESI) that must be stored in the stack.

Loop Optimization

One type of loop optimization was covered in the previous section: conversion of the precondition loop to the postcondition loop. In addition, it was shown that the Microsoft Visual C++ compiler can replace the loop parameter increment with its decrement so that the equality of this parameter to zero is the loop termination condition. However, there are even more efficient methods of loop optimization used by advanced compilers. Here, I'll cover the two most important techniques.

Computation at Compile Time

The programmer often doesn't notice that the result obtained in the course of executing a looping algorithm is self-evident. Advanced compilers, including Microsoft Visual C++, recognize such situations and replace looping algorithms with ready-to-use results. Consider the simple example provided in Listing 3.85.

Listing 3.85: Simple demonstration of computation at compile time optimization

 #include <stdio.h> void main() {        int i = 0, s=0, k = 5;        for(i = 0; i < k; i++) S = S + i;        printf("%d\n", s); }

Principally, it is not difficult to evaluate the result of computing the s value: It will be ten. The Microsoft Visual C++ compiler easily solves this problem. Consider the code that it will produce as the result of compiling the program in Listing 3.85 using the "create fast code" option (Listing 3.86).

Listing 3.86: Disassembled text of the program in Listing 3.85

 .text:00401000  _main    proc near        ; CODE XREF: start + 16E↓p .text:00401000           push    0Ah .text:00401002           push    offset unk_4060FC .text:00401007           call    printf .text:0040100C           add     esp, 8 .text:0040100F           xor     eax, eax .text:00401011           retn .text:00401011 _main     endp

In Listing 3.86, there are no loops. This reconfirms that the compilation process is generally irreversible. However, this is not related to understanding the program operating logic.

Loop Unwinding

Loop unwinding (also known as loop unrolling) is a technique based on the principles of optimizing memory access.

Listing 3.87 presents a test program demonstrating the loop unwinding technique.

Listing 3.87: Test program demonstrating the loop unwinding technique

 #include <stdio.h> void main() {        int a[100];        int i =  0, s = 0;        for(i = 0; i < 100; i++) a[i]   = i;        for(i = 0; i < 100; i++) s += a[i];        printf("%d\n", s); }

Listing 3.87 presents a program using the a array. Compile this program using the Microsoft Visual C++ compiler with the "create fast code" option specified. The disassembled text of the executable code of this program is shown in Listing 3.88.

Listing 3.88: Disassembled text of the executable code of the program in Listing 3.87

 .text:00401000  _main        proc near         ; CODE XREF: start + 16E↓p .text:00401000       var_194 = dword ptr -194h .text:00401000       var_190 = dword ptr -190h .text:00401000       var_18C = dword ptr -18Ch .text:00401000       var_188 = dword ptr -188h .text:00401000       var_184 = dword  ptr -184h .text:00401000       var_180 = dword ptr -180h .text:00401000               sub     esp, 190h .text:00401006               xor     ecx, ecx .text:00401008               xor     eax, eax .text:0040100A               lea     ebx, [ebx + 0] .text:00401010  loc_401010:                   ; CODE XREF: _main + 17 ↓j .text:00401010               mov     [esp + eax*4 + 190h + var_190], eax .text:00401013               inc     eax .text:00401014               cmp     eax, 64h .text:00401017               jl      short loc_401010 .text:00401019               xor     eax, eax .text:0040101B               push    esi .text:0040101C               lea     esp, [esp+0] .text:00401020  loc_401020:                   ; CODE XREF: _main + 3E↓j .text:00401020               mov     esi, [esp + eax*4 + 194h + var_184] .text:00401024               mov     edx, [esp + eax*4 + 194h + var_180] .text:00401028               add     edx, esi .text:0040102A               add     edx, [esp + eax*4 + 194h + var_188] .text:0040102E               add     edx, [esp + eax*4 + 194h + var_18C] .text:00401032               add     edx, [esp + eax*4 + 194h + var_190] .text:00401036               add     eax, 5 .text:00401039               add     ecx, edx .text:0040103B               cmp     eax, 64h .text:0040103E               jl      short loc_401020 .text:00401040               push    ecx .text:00401041               push    offset unk_4060FC .text:00401046               call    _printf .text:0040104B               add     esp, 8 .text:0040104E               xor     eax, eax .text:00401050               pop     esi .text:00401051               add     esp, 190h .text:00401057               retn .text:00401057 main          endp

For storing a local array, 190h bytes are allocated (which is equal to 400) — in other words, 4 bytes per element.

I'd like to immediately draw your attention to two commands: lea ebx, [ebx] and lea esp, [esp]. Neither command changes the contents of the registers; the compiler has inserted them to optimize the execution speed. Earlier in this chapter, I mentioned the optimization technique of command pairing (see the comments that follow Listing 3.2). In this case, the compiler adds commands that do not mean anything but correctly divide the commands by pairs.

The loop, in which the values of array elements are specified, is simple. It is located in the listing from addresses 00401010 to 00401017. Note that addressing of local variables in the stack is not done using the EBP register for the reasons of optimization. On the contrary, the ESP register is used for addressing of local variables. The EAX register plays the role of the i variable (this is an example of using a register variable). Note that in this loop the compiler doesn't use the technique of replacing the loop parameter increment with the decrement, which was described earlier. This is because the loop parameter also plays the role of the array index.

Now, consider the next loop, located from the 00401020 to the 0040103E address. The goal of this loop is to sum all array elements. The sum must be accumulated in the s variable (see Listing 3.87). As can be easily seen, the role of the s variable is delegated to the ECX register (for instance, this can be noticed by the call to the printf function). The most important detail is that the loop is preceded by the PUSH ESI command. The reason for the presence of this command is obvious: The ESI register is further used as a temporary variable, and this register must not be changed after execution of the main function. However, do not forget that addressing is relative to the ESP register and the compiler later corrects the addressing of the array elements. Thus, when computing the array index, it is necessary to subtract one (or to subtract four when computing the element address). The result will be as shown in Listing 3.89.

Listing 3.89: Computing array elements

 [esp + eax*4 + 194h + var_184] = [esp + eax*4 + 4*4], whichcorreqponds to a[i + 3] [esp + eax*4 + 194h + var_180] = [esp + eax*4 + 5*4], whichcorreqponds to a[i + 4] [esp + eax*4 + 194h + var_188] = [esp + eax*4 + 3*4], whichcorreqponds to a[i + 2] [esp + eax*4 + 194h + var_18C] = [esp + eax*4 + 2*4], whichcorreqponds to a[i + 1] [esp + eax*4 + 194h + var_190] = [esp + eax*4 + 1*4], whichcorresponds to a[i]

In the beginning, the a[i + 3] and a[i + 4] elements are added, and the result is loaded into the EDX register. Then the a[i + 2], a[i + 1] and a[i] elements are added to the sum. The resulting sum is added to the ECX register, which, as you know, is the s variable. Finally, the index and the loop parameter are increased by five instead of one. Thus, it is possible to state that the compiler has implemented the loop in Listing 3.90.

Listing 3.90: Loop implemented by the compiler

 for(i = 0; i < 100; i += 5) {        s = s + a[i];        s = s + a[i + 1];        s = s + a[i + 2];        s = s + a[i + 3];        s = s + a[i + 4]; }

From the algorithmic point of view, this loop is the full equivalent of the loop in Listing 3.87. From the optimization point of view, it implements loop unwinding, which allows you to considerably speed up program execution.

Nested Loops and Loops with Complex Exit Conditions

In the previous section, I considered executable code structures corresponding to the simplest loops. However, looping algorithms might be complicated by the following factors:

Loop nesting
Complex exit condition
Auxiliary loop control operators, such as break and continue

Consider the following two examples. The first example contains simultaneously complex exit condition and auxiliary loop control operators (Listing 3.91). I intentionally did not provide the source code of the C++ program here, because your main goal is to understand the code operating logic, not to discover how the compiler translated the program source code into the executable code.

Listing 3.91: Simultaneous complex loop exit condition and auxiliary loop control operators

 .text:00401000  _main      proc near       ; CODE XREF: start + 16E↓p .text:00401000       var_C = dword ptr -0Ch .text:00401000       var_8 = dword ptr -8 .text:00401000       var_4 = dword ptr -4 .text:00401000             push    ebp .text:00401001             mov     ebp, esp .text:00401003             sub     esp, 0Ch .text:00401006             mov     [ebp + var_4], 0 .text:0040100D             mov     [ebp + var_C], 0 .text:00401014             mov     [ebp + var_8], 0 .text:0040101B             jmp     short loc_401026 .text:0040101D  loc_40101D:                 ; CODE XREF: _main + 51↓j .text:0040101D                              ; main + 74↓j .text:0040101D             mov     eax, [ebp + var_8] .text:00401020             add     eax, 1 .text:00401023             mov     [ebp + var_8], eax .text:00401026  loc_401026:                 ; CODE XREF: _main + 1B↑j .text:00401026             cmp     [ebp + var_8], 2710h .text:0040102D             jge     short loc_401076 .text:0040102F             cmp     [ebp+var_4], OC350h .text:00401036             jge     short loc_401076 .text:00401038             mov     ecx, [ebp + var_4] .text:0040103B             add     ecx, [ebp + var_C] .text:0040103E             add     ecx, [ebp + var_8] .text:00401041             mov     [ebp + var_4], ecx .text:00401044             mov     [ebp + var_4], lEh .text:0040104B             cmp     [ebp + var_4], 0 .text:0040104F             jz      short loc_401053 .text:00401051             jmp     short loc_40101D .text:00401053  loc_401053:                 ; CODE XREF: _main + 4F↑j .text:00401053             mov     eax, [ebp + var_4] .text:00401056             cdq .text:00401057             idiv    [ebp + var_8] .text:0040105A             mov     [ebp + var_C], eax .text:0040105D             cmp     [ebp + var_C], 64h .text:00401061             jnz     short loc_401065 .text:00401063             jmp     short loc_401076 .text:00401065  loc_401065:                 ; CODE XREF: _main + 61↑J .text:00401065             mov     edx, [ebp + var_C] .text:00401068             mov     [ebp + var_C], edx .text:0040106B             mov     eax, [ebp + var_C] .text:0040106E             add     eax, 1 .text:00401071             mov     [ebp + var_C], eax .text:00401074             jmp     short loc_40101D .text:00401076  loc_401076:                 ; CODE XREF: _main + 2D↑j .text:00401076                              ; _main + 36↑j ... .text:00401076             xor     eax, eax .text:00401078             mov     esp, ebp .text:0040107A             pop     ebp .text:0040107B             retn .text:0040107B _main       endp

First, consider the fragment located from the 0040101D to the 00401023 address. This is a typical loop header, or, to be precise, the part of the loop that increments the value of the loop parameter. Accordingly, the entry point into the loop is below this fragment. Entry into the loop is executed by the jmp short loc_40l026 command. You have already encountered such fragments. Thus, the start of the loop is identified clearly: This is the loc_40101D label. When viewing the lower part of the listing, pay attention to the jmp short loc_40l0lD command. Because below it there are no commands that would jump to labels between the 0040101D and the 00401074 addresses, it is possible to assume that this is the last command of the loop. Note that principally the loop under consideration could be nested into another loop; however, the current analysis doesn't relate to this issue. Thus, the loop boundaries have been identified.

Another question arises: What are the loop exit conditions? In essence, it is not important whether this condition is written in the loop header or the exit is carried out by the break operator. The issue that does matter is that the exit from the loop is by the 00401076 address. Thus, you'll see the commands in Listing 3.92.

Listing 3.92: Commands for loop exit

 .text:0040102D              jge       short loc_401076 ... .text:00401036              jge       short loc_401076 ... .text:00401063              jmp       short loc_401076

The last jump is slightly below the first two and takes place if the var_c variable is equal to 64h — in other words, to 100. To all appearances, this is the exit by the break operator. The first two jumps correspond to the loop header and to the logical AND condition. Thus, without any trouble it is possible to determine that the loop execution condition can be written as var_8 < 2710h && var_4 < c350h. Note that the var_8 variable is the loop parameter (see addresses 0040101D-00401023 in Listing 3.91). In essence, only the role of the commands shown in Listing 3.93 needs to be clarified.

Listing 3.93: Commands whose role in the loop structure needs to be clarified

 .text:0040104B             cmp    [ebp + var_4], 0 .text:0040104F             jz     short loc_401053 .text:00401051             jmp    short loc_40101D

If the var_4 variable has a nonzero value, then there will be a jump to the start of the loop. This cannot be anything other than the continue operator.

Now, consider an example of nested loops. Manipulations of multidimensional arrays are typical examples of nested loops. Listing 3.94 presents the disassembled code that fills a two-dimensional array. It is necessary to mention that this program was compiled using the Microsoft Visual C++ compiler with the "no optimization" option.

Listing 3.94: Disassembled code that fills a two-dimensional array

 .text:00401000  _main         proc near   ; CODE XREF: start + 16E↓p .text:00401000       var_198  = dword ptr -198h .text:00401000       var_194  = dword ptr -194h .text:00401000       var_190  = dword ptr -190h .text:00401000              push    ebp .text:00401001              mov     ebp, esp .text:00401003              sub     esp, 198h .text:00401009              mov     [ebp + var_194], 0 .text:00401013              jmp     short loc_401024 .text:00401015  loc_401015:               ; CODE XREF: _main:loc_401078↓j .text:00401015              mov     eax, [ebp + var_194] .text:0040101B              add     eax, 1 .text:0040101E              mov     [ebp + var_194], eax .text:00401024  loc_401024:               ; CODE XREF: _main + 13↑j .text:00401024              cmp     [ebp + var_194], 0Ah .text:0040102B              jge     short loc_40107A .text:0040102D              mov     [ebp + var_198], 0 .text:00401037              jmp     short loc_401048 .text:00401039  loc_401039:               ; CODE XREF: _main + 76↓j .text:00401039              mov     ecx, [ebp + var_198] .text:0040103F              add     ecx, 1 .text:00401042              mov     [ebp + var_198], ecx .text:00401048  loc_401048:               ; CODE XREF: _main + 37↑j .text:00401048              cmp     [ebp + var_198], 0Ah .text:0040104F              jge     short loc_401078 .text:00401051              mov     edx, [ebp + var_194] .text:00401057              add     edx, [ebp + var_198] .text:0040105D              mov     eax, [ebp + var_194] .text:00401063              imul    eax, 28h .text:00401066              lea     ecx, [ebp + eax + var_190] .text:0040106D              mov     eax, [ebp + var_198] .text:00401073              mov     [ecx + eax*4], edx .text:00401076              jmp     short loc_401039 .text:00401078  loc_401078:               ; CODE XREF: _main + 4F↑j .text:00401078              jmp     short loc_401015 .text:0040107A  loc_40107A:               ; CODE XREF: _main + 2B↑j .text:0040107A              xor     eax, eax .text:0040107C              mov     esp, ebp .text:0040107E              pop     ebp .text:0040107F              retn .text:0040107F _main        endp

It is not difficult to locate nested loops in this listing. The starting point of the nesting loop at the 00401015 address and the starting point of the nested loop at the 00401039 address can be noticed easily. I'd like to draw your attention to the jmp short loc_401024 and jmp short loc_401048 instructions that carry out initial entries into the nesting and into the nested loops, respectively. Thus, the structure of the nested loops in this case is easy and doesn't require any additional comments.

The most interesting issue here is how the executable code implements the algorithm assigning the value to the two-dimensional array. The parameter of the nesting loop is stored in the var_198 variable. The remaining var_190 variable points to the starting point of the two-dimensional array. Thus, after the mov edx, [ebp + var_194] and add edx, [ebp + var_198] commands, the EDX register will contain the sum of the two index values. If you run a few steps forward to the mov [ecx + eax* 4], edx command, which appears like an operator that assigns the value to the array element, it becomes obvious that array elements are assigned values equal to the sum of indexes (the values of the parameters of the nesting and the nested loops). However, look back several commands. What is the meaning of the mov eax, [ebp + var_194] /imul eax, 28h commands? The index value (consider it to be the first) is multiplied by the number of bytes in the row of a two-dimensional array (4*10 = 40 = 28h). Then, by executing the lea ecx, [ebp + eax + var_190] command, the address of the starting point of the current row is loaded into the ECX register. Finally, ecx + eax*4 is the address of the current element of the two-dimensional array. Thus, the algorithm in Listing 3.94 can be clarified easily.

To complete the description of loops, it is necessary to mention that if the same program operating over a two-dimensional array is compiled using the Microsoft Visual C++ "create fast code" option, the result would be interesting. Namely, the nested loop would disappear, because the compiler would unroll it (see Listing 3.93 and the comments that follow it). The nesting loop will be transformed from the precondition loop to the postcondition loop.

3.2.4. Objects

Identification of objects and everything related to them is a more complicated problem than the ones solved earlier. However, you have already encountered most of the material provided here. After all, methods are functions, and object properties are variables. ^[10] It is time to consider this problem in more detail and in due order.

Identifying Objects

Static Objects

First, consider a simple example intended to demonstrate typical program structures that serve objects. The program in Listing 3.95 has only one class, on the basis of which a single object is created. Note that this is a global static object. In other words, the compiler must take care to allocate memory for storing that object. The most important issue in object-oriented programming is deciding whether the created properties and methods relate to a single object or to several objects simultaneously. It is necessary to answer the following question: If, for example, a method (in essence, representing some function) relates to several objects simultaneously, how does it "know", in relation to which object it has been called from a specific program location?

Listing 3.95: Simple program demonstrating typical program structures for serving objects

 #include <stdio.h> class A { public:        int b;        int a;        int geta(){b = 0; return a;};        void seta(int); }; void A::seta(int al) {        a = al;        b = 1; }; A Al; void main() {        Al.seta(10);        int c = Al.geta();        printf("%d\n", c) ; }

Listing 3.96 provides the executable code of the main function from Listing 3.95, disassembled using IDA Pro. The program was compiled using Microsoft Visual C++ with the "no optimization" option.

Listing 3.96: Disassembled code of the main function from Listing 3.95

 .text:00401020  _main      proc near       ; CODE XREF: start + 16E↓p .text:00401020       var_4 = dword ptr -4 .text:00401020             push    ebp .text:00401021             mov     ebp, esp .text:00401023             push    ecx .text:00401024             push    0Ah .text:00401026             mov     ecx, offset unk_4086CO .text:0040102B             call    sub_401000 .text:00401030             mov     ecx, offset unk_4086CO .text:00401035             call    sub_401060 .text:0040103A             mov     [ebp + var_4], eax .text:0040103D             mov     eax, [ebp + var_4] .text:00401040             push    eax .text:00401041             push    offset unk_4060FC .text:00401046             call    _printf .text:0040104B             add     esp, 8 .text:0040104E             xor     eax, eax .text:00401050             mov     esp, ebp .text:00401052             pop     ebp .text:00401053             retn .text:00401053  _main      endp

The text presented in Listing 3.96 contains three function calls. The first call is to the printf library function, so it doesn't make sense to consider it in detail. The sub_401000 and sub_40l060 functions deserve special attention. In the comments following Listing 3.97, the code of these functions will be described in more detail. Because the source code of the program is available (see Listing 3.95), it is possible to tell for certain that these are the calls to the seta and geta methods, respectively. Note that before calling the method, in both cases, some memory address is sent into the ECX register: offset unk_4086C0. Click the reference, and you will find that this memory area is made up of 8 bytes (at least, IDA Pro has decided so). This must make you vigilant, because the object contains two int properties, which makes exactly 8 bytes. Thus, already on the basis of preliminary considerations it is possible to conclude that when calling a method, one parameter is the address of the object, in relation to which the given method is called (for the Microsoft Visual C++ compiler, the parameter is passed through the ECX register).

Listing 3.97: Disassembled code (Listing 3.96) corresponding to seta/geta calls (Listing 3.95)

 .text:00401000  sub_401000  proc near        ; CODE XREF: _main + B↓p .text:00401000        var_4 = dword ptr -4 .text:00401000        arg_0 = dword ptr  8 .text:00401000              push    ebp .text:00401001              mov     ebp, esp .text:00401003              push    ecx .text:00401004              mov     [ebp + var_4], ecx .text:00401007              mov     eax, [ebp + var_4] .text:0040100A              mov     ecx, [ebp + arg_0] .text:0040100D              mov     [eax + 4], ecx .text:00401010              mov     edx, [ebp + var_4] .text:00401013              mov     dword ptr [edx], 1 .text:00401019              mov     esp, ebp .text:0040101B              pop     ebp .text:0040101C              retn    4 .text:0040101C  sub_401000  endp .text:00401060  sub_401060   proc near       ; CODE XREF: _main + 15↑p .text:00401060         var_4 = dword ptr -4 .text:00401060              push    ebp .text:00401061              mov     ebp, esp .text:00401063              push    ecx .text:00401064              mov     [ebp + var_4], ecx .text:00401067              mov     eax, [ebp + var_4] .text:0040106A              mov     dword ptr [eax], 0 .text:00401070              mov     ecx, [ebp + var_4] .text:00401073              mov     eax, [ecx + 4] .text:00401076              mov     esp, ebp .text:00401078              pop     ebp .text:00401079              retn .text:00401079  sub_401060  endp

Note

The pointer to an object used in a method has a special name in the C++ language: this. Using the this pointer, it is possible to access the class members, including the closed ones.

Consider the text of the sub_401000 function. As already mentioned, to all appearances this is the seta function. Note immediately that this function has only one stack variable and one parameter. The stack is released by the RETN 4 command. The sequence of the push ecx/mov [ebp + var_4], ecx commands immediately draws attention. This is a nuisance. The push command here is used to reserve memory for the local variable. At the same time, the variable is assigned the value stored in the ECX register. The command that follows the PUSH command again assigns the same value to the variable. Well, the compiler can be excused for this incident, because, after all, it was not instructed to use optimization. Further operations are the matter of technique: The value considered the object address is loaded into the EAX register. The mov [eax + 4], ecx command follows (and ECX now contains the parameter value). In other words, the a object property is located at the offset of 4 bytes in relation to the start of the object. Then there is the sequence of the mov edx, [ebp + var_4] /mov dword ptr [edx], 1 commands, which corresponds to assignment of the value 1 to the b property.

The sub_401060 function doesn't contain anything new. The mov eax, [ecx + 4] command simply returns the a property of the geta function.

Thus, having considered the example of compilation using the Microsoft Visual C++ compiler, it is possible to note that when calling a method, it is implicitly passed the pointer to the object, in the context of which it is called. In this example, a global object was created. However, there will be no difference when creating an object locally in the stack. Try to conduct this investigation on your own.

If the program presented in Listing 3.95 is compiled using the Borland C++ v. 5.0 compiler, there won't be any significant difference. Again, the method will be informed about the object address by the passing of an additional parameter. This additional parameter is the last to be passed through the stack in relation to all other parameters. I won't provide any additional listings here because they won't teach you anything new.

Dynamic Objects

In real-world programming, new objects are typically created on the fly. The new operator is intended specially for this purpose. Consider the program from Listing 3.95 rewritten in terms of dynamically-created objects (Listing 3.98).

Listing 3.98: Example illustrating the use of dynamic objects

 #include <stdio.h> class A { public:        int b;        int a;        int geta(){b = 0; return a;};        void seta(int); }; void A::seta(int al) {        a = a1;        b = 1; }; void main() {        A * A1 = new (A) ;        Al -> seta(10);        int c = A1 -> geta();        printf("%d\n", c) ;        delete Al; }

Listing 3.99 presents the disassembled text of the executable code created by the Microsoft Visual C++ compiler with the "no optimization" option.

Listing 3.99: Disassembled text of the program shown in Listing 3.98

 .text:00401020  _main      proc near        ; CODE XREF: start + 16E↓p .text:00401020       var_10 = dword ptr -10h .text:00401020       var_C  = dword ptr -OCh .text:00401020       var_8  = dword ptr -8 .text:00401020       var_4  = dword ptr -4 .text:00401020             push    ebp .text:00401021             mov     ebp, esp .text:00401023             sub     esp, 10h .text:00401026             push    8 .text:00401028             call    ??2@YAPAXI@Z      ; Operator new(uint) .text:0040102D             add     esp,   4 .text:00401030             mov     [ebp + var_C], eax .text:00401033             mov     eax, [ebp + var_C] .text:00401036             mov     [ebp + var_8], eax .text:00401039             push    OAh .text:0040103B             mov     ecx, [ebp + var_8] .text:0040103E             call    sub_401000 .text:00401043             mov     ecx,   [ebp + var_8] .text:00401046             call    sub_401080 .text:0040104B             mov     [ebp + var_4], eax .text:0040104E             mov     ecx, [ebp + var_4] .text:00401051             push    ecx .text:00401052             push    offset unk_4060FC .text:00401057             call    _printf .text:0040105C             add     esp,   8 .text:0040105F             mov     edx,   [ebp + var_8] .text:00401062             mov     [ebp + var_10], edx .text:00401065             mov     eax, [ebp + var_10] .text:00401068             push    eax .text:00401069             call    j__free .text:0040106E             add     esp, 4 .text:00401071             xor     eax, eax .text:00401073             mov     esp, ebp .text:00401075             pop     ebp .text:00401076             retn .text:00401076  _main      endp

The abundance of redundant code and redundant stack variables in this code fragment is surprising. In this case, however, this is of little or no importance. It is important here that after creation of an object all further actions have principally no differences from the ones encountered in Listing 3.96. Note that the new and delete operators (the call to the j__free procedure) are recognized by the IDA Pro disassembler, which makes further analysis a simple matter.

Virtual Functions

Virtual functions are a kind of payment for the beauty and ease of object-oriented programming theory. Consider the program shown in Listing 3.100, demonstrating a typical example of inheritance.

Listing 3.100: Example program demonstrating a typical example of inheritance

 #include <stdio.h>  class A { public:        int a;        int seta(int al){a = al; return a;};        void pa(){printf("%d\n", a);} }; class B:public A { public:        int seta (int al){a = al + 1; return a;}; }; void main() {        A* Al;        Al = new(B);        Al -> seta(10);        Al -> pa();        delete Al; };

The B class inherits properties and methods of the A class. In this case, the B class has the seta method that by its name and parameters matches the similar method in class A. If, for example, you create an object on the basis of the B class, then when the seta method is called, the method from the B class will be called. This is a well-known inheritance property. The situation will be slightly different if you first create the pointer to the object of the base class A and then create a new object using the new operator on the base of the B class template (see Listing 3.100). This time, the compiler will orient toward the pointer type. In this case, A1 -> seta (10) will mean the call to the method of the base class. Listing 3.101 shows the disassembled code created by the Microsoft Visual C++ compiler using the "no optimization" option.

Listing 3.101: Disassembled code of the program compiled by Microsoft Visual C++ without optimization

 .text:00401000  _main      proc near        ; CODE XREF: start + 16E↓p .text:00401000      var_C  = dword ptr -OCh .text:00401000      var_8  = dword ptr -8 .text:00401000      var_4  = dword ptr -4 .text:00401000             push    ebp .text:00401001             mov     ebp, esp .text:00401003             sub     esp, OCh .text:00401006             push    4 .text:00401008             call    ??2@YAPAXI@Z    ; Operator new(uint) .text:0040100D             add     esp, 4 .text:00401010             mov     [ebp + var_8], eax .text:00401013             mov     eax, [ebp + var_8] .text:00401016             mov     [ebp + var_4], eax .text:00401019             push    OAh .text:0040101B             mov     ecx, [ebp + var_4] .text:0040101E             call    sub_401050 .text:00401023             mov     ecx, [ebp + var_4] .text:00401026             call    sub_401070 .text:0040102B             mov     ecx, [ebp + var_4] .text:0040102E             mov     [ebp + var_C], ecx .text:00401031             mov     edx, [ebp + var_C] .text:00401034             push    edx .text:00401035             call    j__free .text:0040103A             add     esp, 4 .text:0040103D             xor     eax, eax .text:0040103F             mov     esp, ebp .text:00401041             pop     ebp .text:00401042             retn .text:00401042 _main       endp .text:00401042 .text:00401050 .text:00401050 sub_401050  proc near        ; CODE XREF: _main + 1E↑p .text:00401050      var_4  = dword ptr -4 .text:00401050      arg_0  = dword ptr  8 .text:00401050             push    ebp .text:00401051             mov     ebp, esp .text:00401053             push    ecx .text:00401054             mov     [ebp + var_4], ecx .text:00401057             mov     eax, [ebp + var_4] .text:0040105A             mov     ecx, [ebp + arg_0] .text:0040105D             mov     [eax], ecx .text:0040105F             mov     edx, [ebp + var_4] .text:00401062             mov     eax, [edx] .text:00401064             mov     esp, ebp .text:00401066             pop     ebp .text:00401067             retn    4 .text:00401067 sub_401050  endp .text:00401067 .text:00401070 .text:00401070 sub_401070  proc near       ; CODE XREF: _main + 26↑p .text:00401070      var_4  = dword ptr -4 .text:00401070             push    ebp .text:00401071             mov     ebp, esp .text:00401073             push    ecx .text:00401074             mov     [ebp + var_4], ecx .text:00401077             mov     eax, [ebp + var_4] .text:0040107A             mov     ecx, [eax] .text:0040107C             push    ecx .text:0040107D             push    offset unk_4060FC .text:00401082             call    _printf .text:00401087             add     esp, 8 .text:0040108A             mov     esp, ebp .text:0040108C             pop     ebp .text:0040108D             retn .text:0040108D sub_401070  endp

This listing doesn't contain anything principally new from the issues investigated in the previous section (see Listing 3.99). Here, 4 bytes are allocated for creating a new object, and the pointer to the newly-created object is passed through the ECX register (see the calls to the sub_401050 and sub_401070 functions). In the functions, the pointer to object is used for accessing the object property. For example, in sub_401050 the code shown in Listing 3.102 can be located.

Listing 3.102: Fragment of the sub_401050 function (Listing 3.101)

 mov    [ebp + var_4], ecx ; The object address is loaded                           ; into the stack variable. mov    eax, [ebp + var_4] ; The address is loaded into the EAX register. mov    ecx, [ebp + arg_0] ; The parameter value is loaded                           ; into the ECX register. mov    [eax], ecx         ; The parameter value is assigned to                           ; the object property.

It is time to consider the concepts of virtual functions and polymorphism. For this purpose, consider the program in Listing 3.103. In comparison to Listing 3.100, another method, seta1, has been added to the base class, and the seta and seta1 methods have been made virtual. If you are acquainted with object-oriented programming, you will immediately guess that in the A1 -> seta (10) and A1 -> seta1(10) calls, the methods of the derived class (the B class) will be used.

Listing 3.103: Program for studying the concepts of virtual functions and polymorphism

 #include <stdio.h> class A { public:        int a;        virtual int seta(int a1){a = al; return a;};        virtual int setal(int a1){a = 2*al; return a;};        void pa(){printf("%d\n", a);} }; class B:public A { public:        int seta(int a1) {a = al + 1; return a;};        int setal(int al){a = 2*al + 1; return a;}; }; void main() {        A* A1;        A1 = new(B);        Al -> seta(10);        Al -> pa{);         Al -> setal(10);        A1 -> pa();        delete A1; };

Listing 3.104 shows the disassembled executable code of the main function from Listing 3.103. As usual, the program was compiled using the Microsoft Visual C++ compiler with the "no optimization" option.

Listing 3.104: Disassembled code of the main function from Listing 3.103

 .text:00401000  _main       proc near       ; CODE XREF: start + 16E↓p .text:00401000       var_10 = dword ptr -10h .text:00401000       var_C  = dword ptr -0Ch .text:00401000       var_8  = dword ptr -8 .text:00401000       var_4  = dword ptr -4 .text:00401000              push    ebp .text:00401001              mov     ebp, esp .text:00401003              sub     esp, 10h .text:00401006              push    8 .text:00401008              call    ??2@YAPAXI@Z   ; Operator new(uint) .text:0040100D              add     esp, 4 .text:00401010              mov     [ebp + var_8], eax .text:00401013              cmp     [ebp + var_8], 0 .text:00401017              jz      short loc_401026 .text:00401019              mov     ecx, [ebp + var_8] .text:0040101C              call    sub_4010A0 .text:00401021              mov     [ebp + var_10], eax .text:00401024              jmp     short loc_40102D .text:00401026  loc_401026:                  ; CODE XREF: _main + 17↑j .text:00401026              mov     [ebp + var_10], 0 .text:0040102D  loc_40102D:                  ; CODE XREF: _main + 24↑j .text:0040102D              mov     eax, [ebp + var_10] .text:00401030              mov     [ebp + var_4], eax .text:00401033              push    0Ah .text:00401035              mov     ecx, [ebp + var_4] .text:00401038              mov     edx, [ecx] .text:0040103A              mov     ecx, [ebp + var_4] .text:0040103D              call    dword ptr [edx] .text:0040103F              mov     ecx, [ebp + var_4] .text:00401042              call    sub_401080 .text:00401047              push    OAh .text:00401049              mov     eax, [ebp + var_4] .text:0040104C              mov     edx, [eax] .text:0040104E              mov     ecx, [ebp + var_4] .text:00401051              call    dword ptr [edx + 4] .text:00401054              mov     ecx, [ebp + var_4] .text:00401057              call    sub_401080 .text:0040105C              mov     eax, [ebp + var_4] .text:0040105F              mov     [ebp + var_C], eax .text:00401062              mov     ecx, [ebp + var_C] .text:00401065              push    ecx .text:00401066              call    j_free .text:0040106B              add     esp, 4 .text:0040106E              xor     eax, eax .text:00401070              mov     esp, ebp .text:00401072              pop     ebp .text:00401073              retn .text:00401073  _main       endp

Note that when creating an object, 8 bytes are allocated (push 8) instead of the 4 bytes allocated earlier. Running some steps ahead, I'd like to mention that the additional 4 bytes located in the beginning of the object are allocated for the address of the table of virtual functions.

In the text, you can see that the check is carried out, ensuring that the new operator has allocated memory for the object to be created. If the memory allocation function returns zero (which means that an error has occurred), then the var_10 variable is assigned the value of zero, which finally must cause an exception (the mov edx, [ecx] command if the ECX register contains zero).

Pay special attention to the sub_4010A0 function (see Listing 3.106). This is a special function, which is not present in the program code. Its intention is to place the address of the virtual functions table into the start of the area allocated for the object to be created. The function will return the object address — but with the address of the virtual functions table (in the first 4 bytes). Later, the mechanism in Listing 3.105 will be used.

Listing 3.105: Mechanism used for forming the virtual function address

 mov    ecx, [ebp + var_4] ; The object's address mov    edx, [ecx]         ; The contents of the start of the area                           ; The object is loaded into the EDX register.                           ; The address of the virtual functions table mov    ecx, [ebp + var_4] call   dword ptr [edx]    ; The call to the virtual function

Thus, you can see that the virtual function address, in contrast to a normal function, is formed dynamically. Any indirect call (no matter what the compiler might be) must make you vigilant, suspecting a call to a virtual function. In the program being studied, virtual functions are called twice: call dword ptr [edx] (the call to seta) and call dword ptr [edx + 4] (the call to setal). The EDX register points to the start of the virtual functions table. The table must contain the addresses of two virtual functions.

Consider Listing 3.106, where the disassembled text of the sub_4010A0 function is shown. The main goal of this function is to place the address of the virtual functions table into the start of the area that stores the object.

Listing 3.106: Disassembled code of the sub_4010A0 function

 .text:004010A0  sub_4010A0    proc near     ; CODE XREF: _main + 1C↑p .text:004010A0        var_4   = dword ptr -4 .text:004010A0             push   ebp .text:004010A1              mov    ebp, esp .text:004010A3              push   ecx .text:004010A4              mov    [ebp + var_4], ecx .text:004010A7              mov    ecx, [ebp + var_4] .text:004010AA              call   ??Oios_base@std@@IAE@XZ .text:004010AF              mov    eax, [ebp + var_4] .text:004010B2              mov    dword ptr [eax], offset off_407100 .text:004010B8              mov    eax, [ebp + var_4] .text:004010BB              mov    esp, ebp .text:004010BD              pop    ebp .text:004010BE              retn .text:004010BE  sub_4010A0  endp

The function presented in the listing has an interesting organization. First, the address of the virtual functions table is defined and placed into the start of the area where the object is stored. These commands are in Listing 3.107.

Listing 3.107: Defining and placing the address of the virtual functions table

 ... mov    [ebp + var_4], ecx ... mov    eax, [ebp + var_4] mov    dword ptr [eax], offset off_407100 ...

The memory area with the off_407100 address stores the virtual functions table. What about the call to the ??Oios_base@std@@IAE@xz function? Here, it is necessary to understand that the class hierarchy in the text program has only two levels: the base class, A, and the derived class, B. Tables of virtual functions are created for each class. The ??Oios_base@std@@IAE@xz function places the address of the virtual functions table of the base class into the object. In this case, the address is overwritten with the address of the virtual functions table of class B. Now assume that another class, c, is added to this hierarchy. This class is the descendant of class B. Create an object on the basis of the C class: A1=new(C). How would the compiler treat the tables of virtual functions in this case? Consider Listing 3.108.

Listing 3.108: Virtual functions table in a sophisticated class hierarchy

 ; The main function _main proc near        ...        call prod ; Now the object contains the address of the ; virtual functions table of class C.         ... _main endp ... prod proc near        ...        call proc2 ; Now the object contains the address of the ; virtual functions table of class B.        ... ; Now the object contains the address of the ; virtual functions table of class C.        ...        retn prod endp        ... Proc2 proc near        ...        call ??Oios_base@std@@IAE@XZ        ... ;  Now the object contains the address of the ;  virtual functions table of the base class.        ... ;  Now the object contains the address of the ;  virtual functions table of class B.        ...        retn prod endp

Carefully consider the method presented in Listing 3.108. From this method, it must be clear how the addresses of virtual functions are written for the hierarchy of objects with an arbitrary number of members. In particular, from this listing it follows that the greater the number of members, the more time required to complete such an operation, because the number of nested procedures equals the number of classes, including the base class. Also note that the Microsoft Visual C++ compiler places the virtual functions tables of the parent and child classes one after another and that the base class has the greatest address (addresses grow from bottom to top). The functions are placed in the table from bottom to top, according to their declarations in the program text.

Thus, knowing the address of the virtual functions table, you'll easily locate the text of that function. For example, Listing 3.109 shows the code of the seta virtual function defined in class B.

Listing 3.109: Code of the seta virtual function defined in class B

 .text:004010C0  sub_4010CO  proc near    ; DATA XREF: .rdata:off_407100↓o .text:004010C0       var_4  = dword ptr -4 .text:004010C0       arg_0  = dword ptr  8 .text:004010C0              push    ebp .text:004010C1              mov     ebp, esp .text:004010C3              push    ecx .text:004010C4              mov     [ebp + var_4], ecx .text:004010C7              mov     eax, [ebp + arg_0] .text:004010CA              add     eax, 1 .text:004010CD              mov     ecx, [ebp + var_4] .text:004010DO              mov     [ecx + 4], eax .text:004010D3              mov     edx, [ebp + var_4] .text:004010D6              mov     eax, [edx + 4] .text:004010D9              mov     esp, ebp .text:004010DB              pop     ebp .text:004010DC              retn    4 .text:004010DC sub_4010C0   endp

The text of the seta procedure is clear. First, the object address, as before, is contained in the ECX register. The arg_0 parameter is the value that must be assigned to the a property (see Listing 3.103). Finally, when assigning the value, it must be increased by one (Listing 3.110).

Listing 3.110: When assigning the value, it must be incremented by one (Listing 3.109)

 add    eax, 1 mov    ecx, [ebp + var_4] mov    [ecx + 4], eax

From Listings 3.109 and 3.110, it is clear that the a property is located at the offset 4 bytes from the object start. This is correct, because the first 4 bytes contain the address of the virtual functions table. If optimization options are specified at compile time, the compiler reduces the code. For example, the procedures, where the virtual function address is written to the object, disappear because all operations are carried out directly in the main function (in the example under consideration). In addition, the address of the last class is immediately written into the object.

If you use the Borland C++ v. 5.0 compiler, you won't notice any significant differences. The address of the virtual functions table also is placed in the start of the object. In the course of initialization, it also is overwritten starting from the base class (or, to be precise, starting from the class, in which the virtual keyword appeared first) and ending with the current derived class.

Constructors and Destructors

The need for constructors and destructors is the logical consequence of the concept of object-oriented programming, especially as related to visual programming for the Windows operating system. There is an urgent need for automating some actions that must be carried out when objects are created and when they are destroyed. In particular, this relates to initial values of the object properties that cannot be initialized when they are declared, as was the general practice with the C programming language. On the other hand, calling the initialization procedure every time is bad programming style. ^[11] Until now, I have not used constructors or destructors in the examples. However, this cannot be said about the compiler. For example, determining the address of the virtual functions table is exactly the operation that must be carried out automatically. This means that you have already encountered constructors in this chapter! You first encountered a constructor in Listing 3.104. This is the sub_4010A0 procedure that was separately considered in Listing 3.106. The only intention of this procedure is writing the address of the correct virtual functions table into the start of the object. However, if you define the constructor explicitly and carry out specific actions there, these actions will be "in the same company" as the actions for defining the virtual functions table.

Consider the practical example presented in Listing 3.111. In this program, a single class A is defined, in which there is one variable, one virtual method, and special methods: constructor and destructor.

Listing 3.111: Practical example illustrating the use of constructors and destructors

 #include <stdio.h> class A { public:        int a;        virtual void pa(){printf("%d\n", a);}        A(){a = 1; printf("Constructor A\n");};        ~A(){printf("Destructor A\n");}; }; void main() {        A* A1;        A1 = new(A);        Al -> pa () ;        delete A1; };

The disassembled code of the main function of this program is presented in Listing 3.112. As before, this code was created using the Microsoft Visual C++ compiler with the "no optimization" option.

Listing 3.112: Disassembled code of the main function from Listing 3.111

 .text:00401000  _main       proc near           ; CODE XREF: start + 16E↓p .text:00401000       var_18  = dword ptr -18h .text:00401000       var_14  = dword ptr -14h .text:00401000       var_10  = dword ptr -10h .text:00401000       var_C   = dword ptr -OCh .text:00401000       var_8   = dword ptr -8 .text:00401000       var_4   = dword ptr -4 .text:00401000              push    ebp .text:00401001              mov     ebp, esp .text:00401003              sub     esp, 18h .text:00401006              push    8 .text:00401008              call    ??2@YAPAXI@Z ; Operator new(uint) .text:0040100D              add     esp, 4 .text:00401010              mov     [ebp + var_8], eax .text:00401013              cmp     [ebp + var_8], 0 .text:00401017              jz      short loc_401026 .text:00401019              mov     ecx, [ebp + var_8] .text:0040101C              call    sub_401070 .text:00401021              mov     [ebp + var_14], eax .text:00401024              jmp     short loc_40102D .text:00401026  loc_401026:                    ; CODE XREF: _main + 17↑j .text:00401026              mov     [ebp + var_14], 0 .text:0040102D  loc_40102D:                    ; CODE XREF: _main + 24↑j .text:0040102D              mov     eax, [ebp + var_14] .text:00401030              mov     [ebp + var_4], eax .text:00401033              mov     ecx, [ebp + var_4] .text:00401036              mov     edx, [ecx] .text:00401038              mov     ecx, [ebp + var_4] .text:0040103B              call    dword ptr [edx] .text:0040103D              mov     eax, [ebp + var_4] .text:00401040              mov     [ebp + var_10], eax .text:00401043              mov     ecx, [ebp + var_10] .text:00401046              mov     [ebp + var_C], ecx .text:00401049              cmp     [ebp + var_C], 0 .text:0040104D              jz      short loc_40105E .text:0040104F              push    1 .text:00401051              mov     ecx, [ebp + var_C] .text:00401054              call    sub_4010CO .text:00401059              mov     [ebp + var_18],   eax .text:0040105C              jmp     short loc_401065 .text:0040105E loc_40105E:                     ; CODE XREF: _main +  4D↑j .text:0040105E              mov     [ebp + var_18],   0 .text:00401065 loc_401065:                     ; CODE XREF: _main + 5C↑j .text:00401065              xor     eax, eax .text:00401067              mov     esp, ebp .text:00401069              pop     ebp .text:0040106A              retn .text:0040106A _main        endp

The structure of this listing must already be well known to you. However, I'd like to draw your attention to certain issues that have not been covered in sufficient detail yet. On the basis of all previous material, it is clear that the sub_401070 function is the constructor. This is confirmed first by the constructor's proximity to the new operator. Also, important is that after execution of the new operator the check is carried out if the memory has actually been allocated. Consider all listings provided in Section 3.2.4 more closely. You'll see that such checks appear only with the introduction of virtual functions. However, you already know that if virtual functions are present, the compiler creates a constructor even if it wasn't present in the source code of the program. This probably is the main indication of the constructor call, because the compiler won't allow a constructor to execute if the object hasn't been created.

Consider the end of the function. The call to sub_4010C0 immediately attracts attention. Earlier, in similar programs, there was the call to the j__free function, which was associated with the delete operator (see Listing 3.99 and the comments that follow it). In this case, you are dealing with the destructor, which will be considered in detail in Listing 3.116. Note that the compiler again won't allow the destructor to execute if the object hasn't been created.

As relates to the remaining code, hopefully you'll easily recognize the call dword ptr [edx] command as a call to the A1 -> pa() virtual function. This mustn't cause you any problems after you consider all previously provided examples.

Now, consider the contents of the constructor (Listing 3.113).

Listing 3.113: Disassembled code of the constructor (Listing 3.111)

 .text:00401070  sub_401070  proc near           ; CODE XREF: _main + 1C↑p .text:00401070        var_4 = dword ptr -4 .text:00401070              push    ebp .text:00401071              mov     ebp, esp .text:00401073              push    ecx .text:00401074              mov     [ebp + var_4], ecx .text:00401077              mov     eax, [ebp + var_4] .text:0040107A              mov     dword ptr [eax], offset off_40710C .text:00401080              mov     ecx, [ebp + var_4] .text:00401083              mov     dword ptr [ecx + 4], 1 .text:0040108A              push offset aConstructorA ; "Constructor A\n" .text:0040108F              call    _printf .text:00401094              add     esp, 4 .text:00401097              mov     eax, [ebp + var_4] .text:0040109A              mov     esp, ebp .text:0040109C              pop     ebp .text:0040109D              retn .text:0040109D  sub_401070  endp

You must rejoice when viewing this listing. Consider the well-known code fragment (Listing 3.114).

Listing 3.114: Writing the virtual functions table address into the new object instance

 mov     [ebp + var_4], ecx mov     eax, [ebp + var 4] mov     dword ptr [eax], offset off_40710C

This is nothing other than writing the address of the virtual functions table into the newly-created object instance. Then there is another easily recognizable pattern (Listing 3.115).

Listing 3.115: Code fragment demonstrating the code of the constructor

 ; Set the property value (a = 1). mov     dword ptr [ecx + 4], 1 ; The call to the printf function push    offset aConstructorA ; "Constructor A\n" call    _printf

This is the code that was placed into the constructor (see Listing 3.111). There isn't anything to add here because the entire pattern is clear.

Listing 3.116 presents the disassembled code of the destructor (or, to be precise, the procedure, from which the true destructor will be called) from the program shown in Listing 3.111.

Listing 3.116: Disassembled code of the procedure calling the destructor (Listing 3.111)

 .text:004010C0  sub 4010C0 proc near        ; CODE XREF: _main + 54↑p .text:004010C0       var_4 = dword ptr -4 .text:004010C0       arg_0 = dword ptr  8 .text:004010C0             push    ebp .text:004010C1             mov     ebp, esp .text:004010C3             push    ecx .text:004010C4             mov     [ebp + var_4], ecx .text:004010C7             mov     ecx, [ebp + var_4] .text:004010CA             call    sub_4010F00 .text:004010CF             mov     eax, [ebp + arg_0] .text:00401OD2             and     eax, 1 .text:00401OD5             jz      short loc_4010E3 .text:004010D7             mov     ecx, [ebp + var_4] .text:004010DA             push    ecx .text:004010DB             call    j_free .text:004010EO             add     esp, 4 .text:004010E3  loc_4010E3:               ; CODE XREF: sub_4010CO + 15↑j .text:004010E3             mov     eax, [ebp + var_4] .text:004010E6             mov     esp, ebp .text:004010E8             pop     ebp .text:004010E9             retn    4 .text:004010E9  sub_4010CO endp

First, pay attention to the following two calls in Listing 3.116: sub_4010F0 and j__free. View sub_4010F0, and you'll immediately notice that it simply contains the text that was placed into the destructor (see Listing 3.111). Thus, the sub_4010C0 function is intended for calling procedures required to destroy an object. You have already encountered the j__free function, and you already know that it is simply an implementation of the delete operator. Finally, the value 1 sent to the sub_4010C0 function as a parameter is simply an indication that it is necessary to call the delete operator.

3.2.5. More About Executable Code Investigation

Mathematical Computations

You have encountered mathematical computations many times. Compilers often use different techniques, such as computation at compile time, to reduce the code size or speed up program execution. You also already know about the possibility of using FPU commands. In this section, several other techniques will be covered. First, consider the example program shown in Listing 3.117.

Listing 3.117: Sample program containing a simple numeric computation

 #include <stdio.h> #include <windows.h> void main() {        DWORD a, b, c;        scanf("%d", &a);        scanf("%d", &b);        c=((a + b)/8)*(3*a);        printf("%d\n", c); };

Listing 3.117 provides a program that contains a simple arithmetic computation. Consider how the Microsoft Visual C++ compiler treats this program when the "create fast code" option was specified at compile time. The disassembled code of the compiled program is shown in Listing 3.118.

Listing 3.118: Disassembled code of the program shown in Listing 3.117

 .text:00401000  _main      proc near        ; CODE XREF: start + 16E↓p .text:00401000       var_8 = dword ptr -8 .text:00401000       var_4 = dword ptr -4 .text:00401000             sub     esp, 8 .text:00401003             lea     eax, [esp + 8 + var_8] .text:00401006             push    eax .text:00401007             push    offset unk_408100 .text:0040100C             call    _scanf .text:00401011             lea     ecx, [esp + 10h + var_4] .text:00401015             push    ecx .text:00401016             push    offset unk_408100 .text:0040101B             call    _scanf .text:00401020             mov     ecx, [esp + 18h + var_8] .text:00401024             mov     edx, [esp + 18h + var_4] .text:00401028             lea     eax, [edx + ecx] .text:00401025             shr     eax, 3 .text:0040102E             imul    eax, ecx .text:00401031             lea     eax, [eax + eax*2] .text:00401034             push    eax .text:00401035             push    offset unk_4080FC ,text:0040103A             call    _printf .text:0040103F             xor     eax, eax .text:00401041             add     esp, 20h .text:00401044             retn .text:00401044 _main      endp

Note that stack variables in this fragment are addressed using the ESP register. Thus, the var_4 and var_8 stack variables correspond to the a and b variables (see Listing 3.117). Consider the sequence of commands in Listing 3.119.

Listing 3.119: Code pattern using lea for optimization of mathematical computations

 mov    ecx, [esp + 18h + var_8] mov    edx, [esp + 18h + var_4] lea    eax, [edx + ecx]

The main issue here is the use of the lea command. Uncommon properties of this command were already covered in Chapter 1 (see Table 1.2). Now you'll see this command in action. You'll usually encounter this command when it is necessary to optimize arithmetic operations.

Then there is the shr eax, 3 command. You already know that it is simply an integer divided by eight (two raised to the power of three). This command is executed much faster than IDIV. Later, there is the imul eax, ecx command that multiplies the contents of EAX by the contents of ECX. Finally, you encounter the lea command again: lea eax, [eax + eax*2]. This command stands for multiplication of the contents of the EAX register by three.

At this point, the section on mathematical computations has been completed. I hope that you'll easily master on your own other methods of optimizing computations, in particular, using bitwise commands when investigating source code.

Other Constructs

Handling Exceptions

The mechanisms of exception handling at the level of executable code are complicated. Furthermore, details of their implementation might considerably differ for different compilers. A detailed investigation of these mechanisms goes beyond the goals of this book. My goals are more modest: They are to make you acquainted with some basic principles of exception handling and to study how these mechanisms might be reflected in executable code.

Exception handling generated by compilers is based on so-called structured exception handling (SEH). The SEH mechanism is supported at the level of the Windows operating system. The foundation for exception handling is the following: When the operating system of the Windows family runs on an Intel processor, the Fs segment register plays a special role. It points to the thread environment block (TEB).^[12] This block, in turn, contains several substructures, one of which is the thread information block (TIB). TIB is stored in the beginning of TEB. Finally, the 4-byte value located in the beginning of the TIB structure represents the address of some structure that, first, contains the exception handler address and, second, contains the address of the previous structure of the same type. In reality, the problem is the linked list. This topic will be covered in detail later in this chapter (see the comments that follow Listing 3.121).

The following conclusions can be drawn on the basis of the preceding information:

Each thread has an individual exception handler.
Knowing the address, by which the pointer to the exception handler is located, the program can set its own handler procedure.

In essence, these two facts are the foundation for the exception handling mechanisms used by compilers creating executable code intended to run under Windows. Although these are easy to understand, the exception-handling mechanism might be rather complicated.

When describing implementations of the exception handling mechanisms in different compilers, I mean the __try/__except pair of operators implemented in C++.^[13] Consider the example program presented in Listing 3.120. This program is trivial. The __try block is used to avoid division by zero when computing the quotient resulting from division of a by b.

Listing 3.120: Sample program illustrating the exception handling mechanism

 #include  <stdio.h> #include  <windows.h> int main() {        int a, b;        scanf("%d", &a);        scanf("%d", &b);        __try {              a = a/b;              printf("%d\n", a);              }        __except(0)        {               printf("Error 1! \n");        };        return 0; }

The code of this program disassembled by IDA Pro is presented in Listing 3.121. The program was compiled using Microsoft Visual C++ with the "create fast code" option.

Listing 3.121: Disassembled code of the program presented in Listing 3.119

 .text:00401000  _main       proc near            ; CODE XREF: start + 16E↓p .text:00401000       var_20 = dword ptr -20h .text:00401000       var_lC = dword ptr -1Ch .text:00401000       var_18 = dword ptr -18h .text:00401000       var_10 = dword ptr -1Oh .text:00401000       var_4  = dword ptr -4 .text:00401000              push    ebp .text:00401001              mov     ebp, esp .text:00401003              push    OFFFFFFFFh .text:00401005              push    offset byte_408110 .text:0040100A              push    offset __except_handler3 .text:0040100F              mov     eax, large fs:0 .text:00401015              push    eax .text:00401016              mov     large fs:0, esp .text:0040101D              sub     esp, l0h .text:00401020              push    ebx .text:00401021              push    esi .text:00401022              push    edi .text:00401023              mov     [ebp + var_18], esp .text:00401026              lea     eax, [ebp + var 1C] .text:00401029              push    eax .text:0040102A              push    offset aD_0     ; "%d" .text:0040102F              call    _scanf .text:00401034              lea     ecx, [ebp + var_20] .text:00401037              push    ecx .text:00401038              push    offset aD_0     ; "%d" .text:0040103D              call    _scanf .text:00401042              add     esp, 10h .text:00401045              mov     [ebp + var_4], 0 .text:0040104C              mov     eax, [ebp + var_1C] .text:0040104F              cdq .text:00401050              idiv    [ebp + var_20] .text:00401053              mov     [ebp + var_1C], eax .text:00401056              push    eax .text:00401057              push    offset aD       ; "%d\n" .text:0040105C             call    _printf .text:00401061              add     esp, 8 .text:00401064              jmp     short loc_401079 .text:00401066 ;----------------------------------------------------- .text:00401066              xor     eax, eax .text:00401068              retn .text:00401069 ;----------------------------------------------------- .text:00401069              mov     esp, [ebp - 18h] .text:0040106C              push    offset aErrorl  ; "Error 1! \n" .text:00401071              call    _printf .text:00401076              add     esp, 4 .text:00401079  loc_401079:                    ; CODE XREF: _main + 64|j .text:00401079              mov     [ebp+var_4], OFFFFFFFFh .text:00401080              xor     eax, eax .text:00401082              mov     ecx, [ebp + var_10] .text:00401085              mov     large fs:0, ecx .text:0040108C              pop     edi .text:0040108D              pop     esi .text:0040108E              pop     ebx .text:0040108F              mov     esp, ebp .text:00401091              pop     ebp .text:00401092              retn .text:00401092  _main       endp

Pay attention to the presence of the standard function prologue, although in normal conditions standard prologues and epilogues are discarded if compilation was carried out with optimization options. Note that IDA Pro specifies five stack variables even though there were only two of them in the source code of the program. Briefly viewing the code, it is easy to determine (for example, by the calls to scanf) that var_1c corresponds to the a variable and var_20 corresponds to the b variable. Although only part of the function code can be enclosed by the __try block, this block influences the entire function.

The standard prologue is followed by an interesting sequence of commands (Listing 3.122). This code fragment deserves special comments.

Listing 3.122: __try block nesting level and exception handling structures

 push     OFFFFFFFFh               ; var_4 push     offset byte_408110       ; push     offset __except_handler3 ; Address of the new exception handler mov      eax, large fs:0          ; Address of the previous                                   ; handler structure push     eax                      ; var_10 mov      large fs:0, esp          ; Address of the new handler

First, some arbitrary constant is sent into the stack. Later, this value will be changed. This value will determine the nesting level of the __try block. Next, the address of some global memory cell is placed into the stack. The third element is the address of the exception handler formed by the compiler. The last three commands are interesting. The address of the old exception handling structure is placed into the stack, then the address of the entire group composed of 4 double words (this is the new exception handling structure) in the stack is placed into fs: 0. This operation adds a new record to a linked list. This list might contain the entire chain of exception handlers. The last structure in the list must contain the value OFFFFFFFFH in the exception handler address. Also note that the stack contains the old EBP value.

Note the mov [ebp + var_4], 0 instruction. It directly precedes the command that carries out the division operation. This command specifies the nesting level of the __try block. Note that the nesting level is determined by the number of __try blocks in the function, whether one block is nested within another block or not. Trace the content of the var_4 variable because it is the key to the structure of the function's __try blocks.

The value of the ESP register is saved when the stack of the function is formed completely: mov [ebp + var_18], esp. This value is used for restoring the stack in the __except block (see Listing 3.123).

Listing 3.123: Customary sequence that precedes the start of the _except block

 jmp      short loc_401079 xor      eax, eax retn

Without diverting to consider standard Assembly commands and calls to the scanf and printf functions, proceed with considering the __except block. You won't encounter any difficulties locating the starting and ending points of this block, because the starting point of this block is always preceded by the customary sequence of commands shown in Listing 3.123.

These commands will always be present, no matter which optimization variant of the Microsoft Visual C++ compiler is used. The end of the __except block is defined by the jmp command.

If you resort to the Borland C++ v. 5.0 compiler, the external manifestation of exceptions also will be clear and easily recognizable, even though a slightly different exception handling mechanism is used. The first indication is the presence of the exception setting in the beginning of the block procedure (replacement of the stored value with the FS:0 address). The second indication is the presence of a typical fragment in the center of the procedure. There is no jump to that fragment, and the fragment is bypassed by the JMP command. Finally, in the end of the procedure the command restoring the old FS:0 value must be present.

Identifying the Main Function and the Start-up Code

Earlier in this chapter, the main function — the function, from which the program starts — was treated as if there were no problems with determining its address. IDA Pro doesn't have any trouble with determining the address of the main function in the executable code of a program compiled using any C++ compiler. However, situations are possible, in which IDA Pro is not close at hand. Furthermore, there are other programming languages, for which the situation is not as simple as for C++.

Any program written in any algorithmic programming language is made up of a certain set of commands. For C++, these are the main or winMain functions. However, it would be an error to consider that the executable code of the program starts execution exactly from this set of commands. As a rule, the compiler inserts some start-up code before it, which contains the calls to library and API functions. This code carries out some preparations: For instance, it requests the memory from the system, defines the command-line parameters, and obtains the identifier of the executable module. Only after completing this is control passed to the initial commands described before. As I have already mentioned, disassemblers (and debuggers) are not always capable of correctly determining where these commands start.

Note

When speaking about the Microsoft Visual C++ compiler, for a console application the start-up codes begin execution from the mainCRTStartup function, which passes control to the main function. For GUI applications, program execution starts from the WinMainCRTStartup function. After execution, the WinMainCRTStartup function passes control to the winMain function. There also is a technique allowing you to create executable modules that start execution directly from the main or from the winMain function. This allows considerable reduction of the executable module size (the actual size of such a module becomes smaller than the size of standard libraries). However, this is not a common practice, because in this case the programmer would intentionally reject the possibility of using standard C libraries.

To find the start of the user part of the executable code, it is necessary to know how this call is executed. First, consider a console application written in the C++ programming language. In general, the prototype of the main function appears as shown in Listing 3.124.

Listing 3.124: Standard prototype of the main function of a console application in C++

 int main( int argc[ , char *argv[ ] [, char *envp[ ] ] ] );

In general, the main function has three input parameters. This is an important indication. For example, consider a typical call to the main function from the executable code created by the Microsoft Visual C++ compiler (Listing 3.125).

Listing 3.125: Typical main function call from executable code from Microsoft Visual C++

 mov      eax, dword_40A724 mov      dword_40A728, eax push     eax push     dword_40A71C push     dword_40A718 call     _main add      esp, OCh

Note that all three parameters turned out to be global variables. This issue is important. Now, it is necessary to study, which library and API functions are called before and after the call to the main function. Armed with this knowledge, you'll be able to easily find the required location within a program. The call to the GetCorrmandLine API function deserves special mention. Using this function, it is possible to obtain the command-line parameters. These parameters are then passed in the second parameter of the main function. What would happen if the main function doesn't have any parameters or if it has one or two parameters? The answer to this question is as follows: Nothing would be changed, and the call will be carried out in the same way. Thus, search criteria also do not change. However, when dealing with the Borland C++ compiler, the situation is more complicated. The call to the main function is carried out by an indirect call appearing like CALL [ESI + N]. If the disassembler doesn't locate and identify the main function automatically, the code investigator has to use the debugger. However, all previously-mentioned considerations related to the preliminary call to library and API functions (for example, GetCommandILine) are applicable even in this case. Knowing these functions, you'll be able to locate the required code section and then find CALL [ESI + N] or a similar call. In this case, it will also be necessary to use the debugger and breakpoints, because manual analysis of the code aimed at locating the address, by which the call is carried out is a tedious procedure.

Consider the winMain function. The prototype of this function is provided in Listing 3.126.

Listing 3.126: Prototype of the WinMain function

 int WinMain(HINSTANCE hInstance,     HINSTANCE hPrevInstance,     LPSTR 1pCmdLine,     int nCmdShow ); _____________________________________________

As you can see, this function has four parameters. This indication is an important search criterion for searching for the start of the program. Listing 3.127 presents a typical fragment demonstrating the call of the winMain function from the executable module created by the Microsoft Visual C++ compiler.

Listing 3.127: Typical WinMain call from the executable module from Microsoft Visual C++

 Push    eax push    dword ptr [ebp - 20h] push    esi push    esi             ; 1pModuleName call    edi             ; GetMcduleHandleA push    eax call    _WinMain@16     ; WinMain(x, x, x, x) mov     edi, eax mov     [ebp - 2Ch], edi cmp     [ebp - 1Ch], esi jnz     short loc_401508 push    edi             ; int call    _exit

The fragment presented in Listing 3.127 is so typical that it is worth memorizing. This search criterion allows you to find the entry point to the executable code of the program without failure. The call to winMain function in a module created by the Borland C++ compiler is executed using the CALL [ESI + N] command, as was the case for the main function. Also, the GetModuleHandle API function is called before this call.

In Delphi, the execution starts from the main module of the program (BEGIN...END.); however, before this module there are calls to one or more procedures that initialize start-up. In particular, one such procedure calls the GetModuleHandle API function.

^[7]In high-level programming languages, the commonly-adopted practice is to distinguish between procedures and functions. From the standpoint of the disassembled text, there is no difference between these two concepts.

^[8]One parameter of the double type must be interpreted as two 32-bit parameters.

^[9]The number enclosed in parentheses specifies the nesting level.

^[10]Both methods and variables are called class members.

^[11]When creating a new object, the programmer might forget to call the initialization procedure or might call it more than once.

^[12]Recall that in the protected mode, segment registers store selectors instead of addresses. Selectors are numbers in the descriptors table that define addresses.

^[13]This is a standard for the C++ programming language; therefore, all C++ compilers support these operators, although implementations might be different.