3.1. Data Identification

Data identification was covered in Section 1.6; however, there I mainly described Assembly language. Analyzing code written in Assembly language is, on one hand, easier and, on the other hand, more difficult than analyzing code written in a high-level programming language. It is easier because the code you write is the same code that will be placed into the compiled program. It is more difficult because Assembly language places practically no limits on the programmer. Thus, everything depends on the programmer's self-discipline and on the tasks formulated. If your goal is to confuse any potential investigator of your code, you won't find a better language than Assembly. When you write a program in a high-level programming language, you can't predict what will result after your source code is compiled. Furthermore, most programmers writing in Visual C++ or Delphi never think about what the compiler will produce from their source code. When analyzing such code, investigators must solve the following problems:

"Grind" the specific features of compiler operation
"Squeeze" their way through the programmer's working style

This section concentrates on the topic of identifying the data used in high-level programming languages.

3.1.1. Global Variables

There is a common opinion that global variables are harmful. Nevertheless, most programmers have used them in the past, use them now, and will continue to use them in the future. Therefore, mastering the technique of recognizing global variables is a must.

Optimization Influence

Optimization by Execution Speed and Code Size

I'll start investigating the influence of optimization with a simple program written in C++.^[1] This program is presented in Listing 3.1. It contains three global variables, one of which is not initialized.

Listing 3.1: Simple C++ program containing three global variables, one uninitialized

 #include <stdio.h>
 int a, b = 20, s = 0;
 void main()
 {
         a = 10;
         s = a + b;
         printf("%d", s);
 };

Consider what the Microsoft Visual C++ (Visual Studio .NET 2003) compiler would produce out of this program. Load the executable module, compiled using the "no optimization" option, into the IDA Pro disassembler. The disassembled code is presented in Listing 3.2. I hope that you won't have any difficulties studying this disassembled text, which I have followed with brief comments.

Listing 3.2: Disassembled code of the program (Listing 3.1) compiled without optimization

 .text:00401000 _main   proc near                ; CODE XREF: start + 16E↓p
 .text:00401000         push    ebp
 .text:00401001         mov     ebp, esp
 .text:00401003         mov     dword_4086E0, 0Ah ; a = 10
 .text:0040100D         mov     eax, dword_4086E0 ; a -> eax
 .text:00401012         add     eax, dword_408040 ; a + b -> eax
 .text:00401018         mov     dword_4086E4, eax ; eax -> s
 .text:0040101D         mov     ecx, dword_4086E4 ; s -> ecx
 .text:00401023         push    ecx
 .text:00401024         push    offset unk_4060FC ; Formatted printf string
 .text:00401029         call    _printf
 .text:0040102E         add     esp, 8
 .text:00401031         xor     eax, eax
 .text:00401033         pop     ebp
 .text:00401034         retn
 .text:00401034 _main   endp

Having carefully analyzed Listing 3.2, you'll immediately note the following interesting issues:

IDA Pro has excellently handled the job of recognizing global variables. This is not surprising. The text contains direct references to global variables (dword_4086E0, dword_4086E4, and dword_408040). Assembly commands directly refer to the variable size. Determining the sizes of variables is an important issue related to disassembling. It is not always possible to determine the variable size exactly. Note that the b (dword_408040) variable is located separately from the other two variables. The compiler treats the a (dword_4086E0) and s (dword_4086E4)^[2] variables as uninitialized. This topic will be covered in more detail later in this section, when discussing variable size and location (see "Variable Size, Location, and Type").
Even a beginner will immediately note that the compiled text is redundant:
- There are the so-called prologue (PUSH EBP/MOV EBP, ESP) and epilogue (POP EBP) of the function. These will be covered in more detail in Section 3.2.1. Both of these elements are redundant in this function, because the EBP register is used for addressing of the stack variables and parameters, which are not present in this program.
- The a variable is initialized, then it is used in the addition operation. Because its value is not printed and is not further used, it is possible to use a simple constant instead of the a variable.
- Unnecessary memory reservation for the s variable immediately attracts attention. Because the result of addition is loaded into the EAX register, it is most logical to use it as the s variable. In other words, it would be expedient to make s a register variable.

Listing 3.3 presents the disassembled code of the same program (see Listing 3.1) compiled with the "create fast code" option. As you can see, now the code doesn't create any function prologue or epilogue. For the moment, this issue is not the main one.

Listing 3.3: Disassembled code (Listing 3.1) compiled with the "create fast code" option

 .text:00401000 _main   proc near              ; CODE XREF: start + 16E↓p
 .text:00401000         mov  eax,  dword_408040   ; b -> eax
 .text:00401005         add  eax, 0Ah             ; The sum is here.
 .text:00401008         push eax
 .text:00401009         push offset unk_4060FC
 .text:0040100E         mov  dword_4086E0,  0Ah   ; 10 -> a
 .text:00401018         mov  dword_4086E4,  eax   ; eax -> s
 .text:0040101D         call _printf
 .text:00401022         add  esp, 8
 .text:00401025         xor  eax, eax
 .text:00401027         retn
 .text:00401027 _main   endp

Consider how the sum (the s variable) is obtained. The summing is carried out by adding the register content and a constant. This operation is carried out much faster than adding the register content and a variable. Pay special attention to the command grouping. First, the values are pushed onto the stack; then, they are followed by two data exchange commands. This approach is based on the Pentium processor properties. It is known as command pairing. Its main idea is that two commands that satisfy specific predefined conditions are executed in parallel, which means that two commands are carried out as a single command. Thus, the compiler has met some of the optimization requirements.

Note

Contemporary Intel-compatible processors (starting with the Pentium) have two pipelines for executing instructions. These are known as the U pipeline and the V pipeline. Under certain circumstances, the processor executes two commands simultaneously in different pipelines. As a result, the execution speed can be practically doubled. There are instructions that can be executed only in the U pipeline, and other instructions that can be executed only in the V pipeline. Finally, there are instructions that can be executed in either pipeline. Knowing this, it is possible to group commands to increase the program execution speed as much as possible. Contemporary compilers "know" this processor feature. So, if you encounter an unusual order of instructions in the executable code, you should recall instruction pairing.

Try to optimize by the code size. Disassembling shows that the change in the code size is minimal (compared with that in Listing 3.3): the ADD ESP, 8 command (which takes 3 bytes) is replaced with the POP ECX/POP ECX pair of commands (each command is 1 byte).

Why did I provide all these examples? My goal wasn't to study optimization techniques (this topic deserves a separate book). I simply wanted to prepare you, and to provide a certain theoretical background, for the fact that the code you will analyze might look quite unusual as the result of optimization. Nevertheless, I'll return to various optimization methods many times later in the book.

Note

The examples provided in this section, among other things, demonstrate that trying to provide better optimization than the compiler does (especially as relates to the execution speed) is not a simple job. The test example (see Listing 3.1) considered in this section is simple. The situation will become more complicated with a real-world Assembly program comprising hundreds of commands. Manual optimization of such programs becomes difficult. Thus, in most cases you'll have to rely on the compiler, especially when dealing with such products as Microsoft Visual C++, long famous for its optimization capabilities.

Evaluating Execution Time

When optimizing program code, evaluating the execution time of a specific program fragment becomes the most important issue. The simplest way of achieving this goal is to use two API functions. The first function is QueryPerformanceCounter. Its only argument is a pointer to a LARGE_INTEGER structure. If the function succeeds, this structure receives the current value of the high-resolution performance counter, expressed in counts (ticks). The second function is QueryPerformanceFrequency. Its argument is also a pointer to a LARGE_INTEGER structure; however, this time the structure receives the counter frequency, in counts per second. Thus, if t1 and t2 stand for the counter values read immediately before and immediately after the program fragment being investigated, and fr is the counter frequency, then the number of milliseconds required for executing the given program fragment can be computed by the following formula: (t2 - t1) * 1000 / fr. This is only a rough evaluation, because in a multitasking environment an exact measurement of the execution time of the chosen program fragment is out of the question.
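The formula just given can be sketched in C as follows. This is only a minimal sketch under stated assumptions, not a definitive implementation: the names elapsed_ms and time_fragment_ms are hypothetical helpers introduced for illustration, and the Windows-specific part is guarded by _WIN32 so that the arithmetic itself can be checked on any platform.

```c
#include <assert.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: converts two counter readings (t1 before, t2 after)
   and the counter frequency fr (counts per second) into milliseconds,
   following the formula (t2 - t1) * 1000 / fr. */
double elapsed_ms(int64_t t1, int64_t t2, int64_t fr)
{
    assert(fr > 0);                       /* a frequency must be positive */
    return (double)(t2 - t1) * 1000.0 / (double)fr;
}

#ifdef _WIN32
/* Hypothetical wrapper: measures how long fragment() runs, in milliseconds.
   Return values of the API calls are ignored here for brevity. */
double time_fragment_ms(void (*fragment)(void))
{
    LARGE_INTEGER fr, t1, t2;

    QueryPerformanceFrequency(&fr);       /* counts per second */
    QueryPerformanceCounter(&t1);         /* reading before the fragment */
    fragment();
    QueryPerformanceCounter(&t2);         /* reading after the fragment */
    return elapsed_ms(t1.QuadPart, t2.QuadPart, fr.QuadPart);
}
#endif
```

Because other tasks may interrupt the fragment, it is reasonable to repeat the measurement several times and to treat the smallest result as the closest estimate.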

Pointers to Global Variables

It is impossible to imagine the C programming language without pointers. Pointers are the quintessence of this programming language and determine its fate. Instead of operating over a variable, it is possible to operate over the pointer to that variable. To operate over pointers, compilers use indirect addressing. This fact, however, is self-evident. If s is some pointer to data, then the MOV EDX, s command allows the data to be accessed through [EDX]: for example, the MOV EAX, [EDX] command moves a 4-byte value from the data area into the EAX register.

Listing 3.4 demonstrates a sample program, in which one of the global variables is defined by a pointer. The disassembled listing of this program is provided in Listing 3.5.

Listing 3.4: Sample program, in which one global variable is defined using a pointer

 #include <stdio.h>
 #include <stdlib.h>
 int a, b = 20;
 int *s;
 void main()
 {
         s = (int*)malloc(4);
         a = 10;
         *s = a + b;
         printf("%d", *s);
         free(s);
 };

Listing 3.5: Disassembled code of the program presented in Listing 3.4

 .text:00401000 _main  proc  near               ; CODE XREF: start + 16E↓p
 .text:00401000        push  ebp
 .text:00401001        mov   ebp, esp
 .text:00401003        push  4                  ; Reserve 4 bytes.
 .text:00401005        call  _malloc
 .text:0040100A        add   esp, 4             ; Clear the stack.
 .text:0040100D        mov   dword_4086C0, eax  ; This variable
                                                ; contains a pointer.
 .text:00401012        mov   dword_4086C4, 0Ah  ; a = 10
 .text:0040101C        mov   eax, dword_4086C4  ; a -> eax
 .text:00401021        add   eax, dword_408040  ; a + b -> eax
 .text:00401027        mov   ecx, dword_4086C0  ; ECX contains the
                                                ; pointer address.
 .text:0040102D        mov   [ecx], eax         ; The sum is located
                                                ; at the address
                                                ; referenced by the
                                                ; pointer.
 .text:0040102F        mov   edx, dword_4086C0  ; Pointer -> edx
 .text:00401035        mov   eax, [edx]         ; Sum -> eax
 .text:00401037        push  eax                ; The sum is pushed
                                                ; into the stack.
 .text:00401038        push  offset unk_4060FC  ; The formatted string
 .text:0040103D        call  _printf
 .text:00401042        add   esp, 8
 .text:00401045        mov   ecx, dword_4086C0  ; Pointer -> ecx.
 .text:0040104B        push  ecx
 .text:0040104C        call  _free              ; Release the pointer.
 .text:00401051        add   esp, 4
 .text:00401054        xor   eax, eax
 .text:00401056        pop   ebp
 .text:00401057        retn
 .text:00401057 _main  endp

Note that Listing 3.5 uses indirect addressing twice (through the ECX and EDX registers). The second case, in which indirect addressing is used through EDX, looks strange because ECX already contains the pointer stored in the s variable. Why use EDX? This is a rhetorical question. After all, I compiled the program having specified that no optimization was needed. Thus, the compiler has simply generated one fragment for writing through the pointer and another fragment for reading through the pointer.

What conclusions can be drawn on the basis of this material? If you notice indirect addressing even when briefly viewing the disassembled code, this means that pointers are most likely present in the program being investigated.

Global Variables and Constants

Consider an intricate issue: How do you distinguish the address of some global variable from a normal constant?

Consider the example program shown in Listing 3.6. Variables a, b, and c are assigned the values of some numeric constants, then the standard printf library function is used to output them to the console. The C program is correct and unambiguous; it cannot be interpreted incorrectly. However, consider how IDA Pro interprets the executable code of this program (Listing 3.7).

Listing 3.6: Distinction between an address of a global variable and a normal constant

 #include <stdio.h>
 int a, b, c;
 void main()
 {
         a = 10;
         b = 20;
         c = 0x4086d0;
         printf("%d %d %d\n", a, b, c);
 };

Listing 3.7: Disassembled code of the program shown in Listing 3.6

 .text:00401000 _main       proc near          ; CODE XREF: start + 16E↓p
 .text:00401000             push    ebp
 .text:00401001             mov     ebp, esp
 .text:00401003             mov     dword_4086C8, 0Ah
 .text:0040100D             mov     dword_4086C0, 14h
 .text:00401017             mov     dword_4086C4, offset unk_4086D0
 .text:00401021             mov     eax, dword_4086C4
 .text:00401026             push    eax
 .text:00401027             mov     ecx, dword_4086C0
 .text:0040102D             push    ecx
 .text:0040102E             mov     edx, dword_4086C8
 .text:00401034             push    edx
 .text:00401035             push    offset aDDD  ; "%d %d %d\n"
 .text:0040103A             call    _printf
 .text:0040103F             add     esp, 10h
 .text:00401042             xor     eax, eax
 .text:00401044             pop     ebp
 .text:00401045             retn
 .text:00401045 _main       endp

Consider Listing 3.7, obtained using IDA Pro, more carefully. The dword_4086C8 label is the a variable, dword_4086C0 corresponds to the b variable, and dword_4086C4 stands for the c variable. Note what happens to c: The address of the unk_4086D0 memory cell is loaded into the dword_4086C4 variable. Why? What is the role of this cell? The number 0x4086D0 is simply a constant! However, IDA Pro considered this number to be an address. Strange! It should be pointed out that the unk_ prefix means that the disassembler has doubts and is not sure what is hidden at that address. However, the disassembler's doubts do not matter! In the course of analysis, you must draw an unambiguous conclusion. In this example, the text is simple; therefore, it is not difficult to make the right decision. You must not have any doubts, even though IDA Pro has some. However strange this might seem at first, in this situation the W32Dasm disassembler has done a good job. This is not because of its outstanding capabilities in the field of recognizing addresses and constants. It is because of its lack of such capabilities, which causes this disassembler to interpret everything (or practically everything) as constants.

What would happen if the c variable is assigned the 0x4086c0 value? I hope that you have already guessed. In this case, IDA Pro will obtain additional confirmation that this is an address of some variable. Instead of the mov dword_4086C4, offset unk_4086D0 command, another command would appear in the listing: mov dword_4086C4, offset dword_4086c0. Thus, the disassembler no longer doubts that it is dealing with a variable. However, you know that this is not so. Furthermore, you will easily draw the right conclusion using the disassembled listing.

However, another problem remains. Any disassembler is a program; therefore, it needs a strict criterion that can be implemented algorithmically. In the case being considered, there are no commands that would confirm (or refute) the assumption that it is dealing with an address except for the range. Falling into this range makes a constant a candidate for being an address. What would this range be in the case in question? Everything is straightforward here. First, there is a range of addresses allocated for the data. IDA Pro considers a constant falling into this range one of the indications of a data address. However, there also is the range of code addresses. For example, if a constant is equal to 0x401000, the disassembler would consider that it deals with the address of the _main function. Note that in this case IDA Pro will not "suspect" that the constant represents an address; it would be sure that this is an address.

What conclusion can be drawn on the basis of these considerations? My conclusion is de omnibus dubitandum. In other words, you can never be certain of anything. If the suspicious constant is treated like an address — for instance, using the LEA command — then it is possible to speak about an address with greater certainty. Furthermore, if you notice that the constant is then used in indirect addressing or as a function parameter (which represents an address by definition), then you shouldn't have any doubts.
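The range criterion discussed above can be sketched in C as follows. This is only an illustration of the idea, not the algorithm actually used by IDA Pro or any other disassembler: the classify_constant function, the const_kind type, and the section boundaries passed to the function are all hypothetical. Note that the result is deliberately named a candidate, because falling into a range proves nothing by itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical verdicts of the range check. */
typedef enum {
    PLAIN_CONSTANT,
    DATA_ADDRESS_CANDIDATE,
    CODE_ADDRESS_CANDIDATE
} const_kind;

/* Classifies a 32-bit constant by the address range it falls into.
   Each range is half-open: [start, end). The ranges are supplied by
   the caller; the values used in any example are assumptions. */
const_kind classify_constant(uint32_t value,
                             uint32_t code_start, uint32_t code_end,
                             uint32_t data_start, uint32_t data_end)
{
    assert(code_start <= code_end && data_start <= data_end);

    if (value >= code_start && value < code_end)
        return CODE_ADDRESS_CANDIDATE;   /* for example, a function address */
    if (value >= data_start && value < data_end)
        return DATA_ADDRESS_CANDIDATE;   /* for example, a global variable */
    return PLAIN_CONSTANT;               /* outside the module: just a number */
}
```

With assumed boundaries of 0x401000 to 0x406000 for code and 0x408000 to 0x40B000 for data, the constant 0x4086D0 from Listing 3.6 is reported as a data address candidate, although in the source program it is only a number. This is exactly why additional evidence, such as later use in indirect addressing, is needed.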

Variable Size, Location, and Type

Long ago, when MS-DOS was dominant, I encountered in a Pascal manual a statement declaring that the use of 1-byte variables instead of 2-byte ones speeds up program operation. I doubted this statement, so I investigated the Assembly code of such a program. It turned out that the statement was far from true. Has the situation changed? How do 32-bit operating systems behave? Is there any practical advantage in using 1- and 2-byte variables instead of 4-byte ones? Where are variables located, and how can disassemblers determine their size? All of these questions will be answered in this section.

To begin the investigation, recall the material provided in Section 1.1.3. Consider the code fragment shown in Listing 3.8 (it is similar to the one shown in Listing 1.2).

Listing 3.8: Fragment of the test C program for studying variable size, location, and type

 BYTE e = 0xab;
 WORD c = 0x1234;
 DWORD b = 0x34567890;

If you view the memory, you will discover that all variables are aligned on a boundary that is a multiple of four. However, it turns out that this alignment is only due to the order in which these variables were declared. For example, consider code in which the variables are declared in a different order (Listing 3.9).

Listing 3.9: Fragment of Listing 3.8, in which variables are declared in a different order

 WORD c = 0x1234;
 BYTE e = 0xab;
 DWORD b = 0x34567890;

In this case, the compiler will place variables in memory so that the first two variables will be located in two neighboring words. The b variable will be aligned by the 4-byte boundary, as in the previous case. There are optimal rules for the alignment of different data types. Table 3.1 outlines information about the alignment of data of different sizes.

Table 3.1: Optimal requirements for alignment of data of different sizes
Data size	Alignment
1 byte	1 (no alignment)
2 bytes	2
4 bytes	4
6 bytes	8
8 bytes	8
10 bytes	16
16 bytes	16
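The rule behind Table 3.1 can be expressed compactly: the optimal alignment is the smallest power of two that is not less than the data size. The following C sketch illustrates this under stated assumptions. The optimal_alignment and align_up names are hypothetical helpers, and the sketch covers only the sizes listed in the table (1 to 16 bytes); it does not claim to reproduce the placement algorithm of any particular compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Alignment suggested by Table 3.1 for a datum of the given size:
   the smallest power of two that is not less than the size. */
size_t optimal_alignment(size_t size)
{
    size_t alignment = 1;

    assert(size >= 1 && size <= 16);     /* only the sizes from the table */
    while (alignment < size)
        alignment *= 2;
    return alignment;
}

/* Rounds an offset up to the nearest multiple of the alignment. */
size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment > 0);
    return (offset + alignment - 1) / alignment * alignment;
}
```

Assuming that variables are placed one after another and that each is aligned according to the table, the declaration order of Listing 3.9 gives offset 0 for the WORD variable, offset 2 for the BYTE variable, and offset align_up(3, 4) = 4 for the DWORD variable.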

Consider another example (Listing 3.10).

Listing 3.10: Simple example illustrating optimal alignment of data of different sizes

 #include <stdio.h>
 #include <windows.h>
 WORD b = 10;
 BYTE a;
 DWORD c;
 void main()
 {
         a = 10;
         c = 30;
         printf("%d %d %d\n", a, b, c);
 };

This is a simple example. However, even here there is a particular feature that is helpful for investigating some patterns of memory allocation for different variables. The a and c variables are not initialized; they are assigned their values directly in the program text. The b variable is initialized. Is there any difference among these variables? As it turns out, there is. Compile the program using the Microsoft Visual C++ compiler, then disassemble the resulting executable module using IDA Pro. Analyze the resulting listing, and you'll find that IDA Pro places all variables in the .data section. However, recall the material provided in Section 1.5.3, where it was explained that initialized variables must be placed into the .data section and uninitialized ones into the .bss section. Curiously, the listing produced by IDA Pro clearly shows that although all variables are located within the same section, they are placed in different parts of that section: First comes the initialized variable, then, after a fairly long interval, come the two uninitialized variables.

To understand the reason behind such behavior, compile the program with the /FAs command-line option. An intermediate Assembly listing will be generated in the course of compiling. View this listing, and you'll discover an interesting phenomenon: Two segments are present in the listing, one with the _data name (containing the initialized variable) and another called _bss (containing the uninitialized variables). Later, these segments must be transformed into the appropriate sections. However, the linker recognizes the _bss and _data segment names and combines them into the single .data section. With all that being so, the data located in the _bss segment always follow the data from the _data segment. To check this statement, write a simple Assembly program containing two data segments (_bss and _data). After linking, only one data section, called .data, will remain.

However, if you slightly change the segment names, for example, replacing _bss with _bss1, then the executable module will have two sections: .data and _bss1 (including the underscore character). After checking this statement for the Microsoft Visual C++ compiler, test the behavior of other compilers. In my experiments, compiling the program from Listing 3.10 using Borland C++ v. 5.00 showed that this compiler behaves similarly. In this case, the Assembly code contained two data segments, with the _data and _bss names.

Consider another issue. How is it possible to determine the variable size in the course of disassembling? The generalized answer to this question is as follows: This goal can be achieved by analyzing the commands that operate over a specific variable. This answer is self-evident because the variable behaves in a specific way depending on the operations that are carried out over it. Recall the material provided in Section 1.4, where the format of the Intel microprocessor commands was described. For instance, consider a simple operation that assigns some integer value to a numeric variable. In C, this operation appears, for example, as follows: b = 10. Accordingly, an Assembly command in general will appear as follows: MOV [mem], 10. However, you know that in Assembly such operations require the variable type to be specified explicitly (for example, byte ptr). This requirement is well-grounded. There is a significant difference between placing the number 10 into a WORD variable and placing it into a DWORD variable. Because there is a significant difference in the mathematics, there also must be a difference in the command format.

Consider the complete command codes for the commands assigning values to variables of three types: BYTE, WORD, and DWORD (Listing 3.11).

Listing 3.11: Complete codes of commands assigning values to three types of variables

 C605 C8864000 14             MOV byte ptr [04086C8], 20
 66 C705 C8864000 0A00        MOV word ptr [04086C8], 10
 C705 C4864000 1E000000       MOV dword ptr [04086C4], 30

Note that the MOD R/M bytes of all three commands are identical (05H). The reason is clear: The first operand is an offset in all three commands. Curiously, the code of the command operating over a WORD operand differs from the code of the command operating over a DWORD operand by the presence of the 66H prefix (and, naturally, by the length of the immediate value). This prefix specifies that the operand has the WORD type, not the DWORD type. The command in which the first operand has the BYTE type has its own opcode (C6H instead of C7H). Thus, it becomes clear how the disassembler obtains information about the variable size: It simply analyzes the program code.
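The way the operand size follows from the command bytes can be sketched in C. This is a deliberately narrow illustration, not a fragment of a real disassembler: the mov_imm_operand_size function is hypothetical, it assumes 32-bit code, and it recognizes only the "MOV memory, immediate" encodings shown in Listing 3.11 (opcodes C6 and C7 with a zero register field in the MOD R/M byte).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the operand size in bytes (1, 2, or 4) of a MOV r/m, imm
   command in 32-bit code, or 0 if the bytes are not such a command. */
int mov_imm_operand_size(const uint8_t *code, size_t len)
{
    size_t i = 0;
    int word_prefix = 0;

    assert(code != NULL);

    if (i < len && code[i] == 0x66) {    /* operand-size prefix */
        word_prefix = 1;
        i++;
    }
    if (i + 1 >= len)                    /* opcode and MOD R/M are required */
        return 0;
    if (((code[i + 1] >> 3) & 7) != 0)   /* register field must be 0 for MOV */
        return 0;

    if (code[i] == 0xC6)                 /* separate opcode for BYTE */
        return 1;
    if (code[i] == 0xC7)                 /* WORD with the prefix, else DWORD */
        return word_prefix ? 2 : 4;
    return 0;
}
```

Applied to the three byte sequences from Listing 3.11, the function returns 1, 2, and 4, respectively.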

Until now, floating-point numbers have not been covered. Now it is time to consider them. Consider the program shown in Listing 3.12.

Listing 3.12: Simple program for investigating the behavior of floating-point variables

 #include <stdio.h>
 #include <windows.h>
 double s, d;
 int i;
 void main()
 {
         s = 0.00;
         d = 1.034;
         for(i = 0; i < 100; i++)
                  s = s + i/d;
         printf("%f\n", s);
 };

As you can see, the program in Listing 3.12 has two double variables. Recall the material provided in Section 1.1.3 — to be precise, in its "Real Numbers" subsection, where floating-point numbers were described. The format of double numbers used in the C++ language corresponds to the format of long floating-point numbers supported by the Intel microprocessor, or, to be precise, by its FPU (see Section 1.2.3). Listing 3.13 contains the disassembled code of the main function from Listing 3.12.

Listing 3.13: Disassembled code of the main function from Listing 3.12

 .text:00401000 _main         proc near   ; CODE XREF: start + 16E↓p
 .text:00401000 var_8         = qword ptr -8
 .text:00401000
 .text:00401000              push    ebp
 .text:00401001              mov     ebp, esp
 .text:00401003              fld     ds:dbl_408108
 .text:00401009              fstp    dbl_40A9D0
 .text:0040100F              fld     ds:dbl_408100
 .text:00401015              fstp    dbl_40A9C0
 .text:0040101B              mov     dword_40A9C8, 0
 .text:00401025              jmp     short loc_401034
 .text:00401027 loc_401027:               ; CODE XREF: _main + 55↓j
 .text:00401027              mov     eax, dword_40A9C8
 .text:0040102C              add     eax, 1
 .text:0040102F              mov     dword_40A9C8, eax
 .text:00401034
 .text:00401034 loc_401034:               ; CODE XREF: _main + 25↑j
 .text:00401034              cmp     dword_40A9C8, 64h
 .text:0040103B              jge     short loc_401057
 .text:0040103D              fild    dword_40A9C8
 .text:00401043              fdiv    dbl_40A9C0
 .text:00401049              fadd    dbl_40A9D0
 .text:0040104F              fstp    dbl_40A9D0
 .text:00401055              jmp     short loc_401027
 .text:00401057 ;-----------------------------------------------------------------
 .text:00401057
 .text:00401057 loc_401057:               ; CODE XREF: _main + 3B↑j
 .text:00401057              fld     dbl_40A9D0
 .text:0040105D              sub     esp, 8
 .text:00401060              fstp    [esp + 8 + var_8]
 .text:00401063              push    offset unk_4080FC
 .text:00401068              call    _printf
 .text:0040106D              add     esp, 0Ch
 .text:00401070              xor     eax, eax
 .text:00401072              pop     ebp
 .text:00401073              retn
 .text:00401073 _main        endp

The disassembled listing created by IDA Pro deserves special comments:

For the moment, skip the strange var_8 variable, which will be considered later. Also, skip the function prologue. The four commands that follow the prologue are interesting. They represent nothing but the assignment of initial values to the s and d variables. For this purpose, the compiler has reserved places for two floating-point constants (dbl_408108 and dbl_408100) beforehand. Using a sequence of two commands (fld and fstp), each constant is loaded into the appropriate variable (these commands can be found in Table 1.19). Both the constants and the variables (dbl_40A9D0 and dbl_40A9C0) take 8 bytes, which is quite natural. The next command, resetting the dword_40A9C8 integer variable to zero, is self-evident. It simply assigns an initial value to the loop counter.
Later, there is a jump into the loop body to the loc_401034 label. Before this label, there are three commands, which are intended to increase the loop counter (i++). Therefore, these commands are skipped the first time. A possible exit from the loop is checked by the cmp dword_40A9C8, 64h and jge short loc_401057 commands. Naturally, 64h corresponds to 100.
Then, there are four commands whose goal can be guessed from the code of the source program. They correspond to s = s + i/d. The algorithm implemented by these commands is as follows: The fild dword_40A9C8 command loads the integer loop counter into the top of the coprocessor stack, ST(0). The next command, fdiv, divides the loop counter by the dbl_40A9C0 variable (this is d). Then, the fadd command adds the dbl_40A9D0 variable, in which the sum is accumulated, to the division result. Finally, because the addition result is located in the coprocessor stack, the fstp command is used to place it into the dbl_40A9D0 variable. The coprocessor stack is popped, which means that, for example, the content of ST(1) is moved to ST(0). Later, an unconditional jump returns control to the start of the loop.
Then, there is the call to the printf function. It is necessary to push a floating-point number into the stack. This is an instructive technique. The fld dbl_40A9D0 command pushes the computed sum into the coprocessor stack. The next command, sub esp, 8, reserves space in the stack for an 8-byte value. In terms of stack space, this command is equivalent to two push commands. Then, the fstp [esp + 8 + var_8] command places the sum from the coprocessor stack into the normal stack. The next push command sends the address of the format string into the stack.
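To make the correspondence between the loop in Listing 3.12 and the command order in Listing 3.13 easier to see, the following C sketch writes the same computation twice: once in structured form and once with goto statements arranged in the order of the disassembled commands. This is only a reading aid under stated assumptions. The function names are hypothetical, and local variables are used instead of the global ones from the original program so that each function is self-contained.

```c
#include <assert.h>

/* Structured form, as in Listing 3.12. */
double sum_structured(void)
{
    double s = 0.00, d = 1.034;
    int i;

    for (i = 0; i < 100; i++)
        s = s + i / d;
    return s;
}

/* The same loop laid out in the order of the commands in Listing 3.13. */
double sum_as_compiled(void)
{
    double s = 0.00, d = 1.034;  /* two fld/fstp pairs */
    int i = 0;                   /* mov dword_40A9C8, 0 */

    goto check;                  /* jmp short loc_401034 */
increment:                       /* loc_401027 */
    i = i + 1;                   /* mov / add / mov */
check:                           /* loc_401034 */
    if (i >= 100)                /* cmp dword_40A9C8, 64h / jge */
        goto done;
    s = (double)i / d + s;       /* fild, fdiv, fadd, fstp */
    goto increment;              /* jmp short loc_401027 */
done:                            /* loc_401057 */
    return s;
}
```

Both functions perform the same operations in the same order, so they should return the same sum, approximately 4787.234 (that is, 4950/1.034).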

The case just considered, in which initial values of floating-point variables are stored in constants and then loaded into variables, is practiced by the Microsoft Visual C++ compiler. The Borland C++ compiler uses another technique, which is less illustrative. The disassembled code of the executable module produced by the Borland C++ compiler is shown in Listing 3.14.

Listing 3.14: Disassembled code of the executable module produced by Borland C++

 .text:0040111B             mov      dword ptr dbl_40C2C4, 95810625h
 .text:00401127             mov      dword ptr dbl_40C2C4 + 4, 3FF08B43h

As you can see, two strange constants are loaded into the memory. From this listing, it is hardly possible to determine that this is a floating-point number and then to determine that number. When analyzing this code, it is impossible to do without the information provided in Section 1.1.3. The same problem is also encountered in Microsoft Visual C++, provided that you operate over float variables. Such variables are short real numbers taking only 32 bits. Therefore, a normal mov command is used for assigning this type of value to a variable. However, for any operations over such variables, coprocessor commands are used. Thus, I strongly recommend that you gain a sound understanding of the structure of real numbers (Section 1.1.3).
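The two "strange" constants can be decoded with a few lines of C. The sketch below is only an illustration, and its function names are hypothetical. It assumes that double is the 64-bit IEEE 754 format and float is the 32-bit one, which holds on Intel-compatible processors. The low double word is the one stored at the lower address, and the high double word is the one stored at the address plus 4.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rebuilds a double from the two 32-bit halves loaded by a pair of
   mov commands (low half at the lower address). */
double double_from_dwords(uint32_t low, uint32_t high)
{
    uint64_t bits = ((uint64_t)high << 32) | low;
    double value;

    assert(sizeof value == sizeof bits); /* 64-bit double is assumed */
    memcpy(&value, &bits, sizeof value); /* reinterpret the bit pattern */
    return value;
}

/* Rebuilds a float from the single 32-bit constant loaded by one mov. */
float float_from_dword(uint32_t bits)
{
    float value;

    assert(sizeof value == sizeof bits); /* 32-bit float is assumed */
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Calling double_from_dwords(0x95810625, 0x3FF08B43) yields 1.034, which is the value assigned to the d variable in Listing 3.12.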

Thus, if you encounter FPU commands, you must immediately understand that it will be necessary to spend time investigating floating-point variables.

When dealing with an integer variable, it is important to discover whether it is signed or unsigned. For example, how would you distinguish int variables from unsigned int (DWORD) ones? The general principle is as follows: Analyze the operations over the variables of interest and, on the basis of this analysis, determine their types. A more specific method of determining the type of integer variables is analysis of the conditional constructs, in which they participate. For example, the JL conditional jump command is used for comparing signed numbers, and the JB command is its analogue used for unsigned numbers.
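The difference between signed and unsigned comparisons is easy to reproduce. In the sketch below (the function names are hypothetical), both functions compare the same bit patterns; a compiler would typically translate the first using signed jumps such as JL and the second using unsigned jumps such as JB, although the exact commands depend on the compiler and its options.

```c
/* Signed comparison: typically compiled with jl/jge. */
int less_signed(int a, int b)
{
    return a < b;
}

/* Unsigned comparison: typically compiled with jb/jae. */
int less_unsigned(unsigned int a, unsigned int b)
{
    return a < b;
}
```

With 32-bit integers, the pattern 0FFFFFFFFh is -1 for less_signed, and therefore less than 1, but it is the largest possible value for less_unsigned.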

It only remains to answer a single question: Will any performance gain be obtained if you use integer variables smaller than 4 bytes? The answer has two sides:

Using shorter variables allows you to economize on data memory.
However, it complicates the compiled code, because the program must work with 32-bit values anyway. The additional commands slow down execution and make the code itself larger.

Complex Data Types

Strings

Programming languages interpret string data types as sequences of encoded characters. As a rule, ASCII encoding is used. When using this type of encoding, 1 byte is allocated for encoding each character. Nowadays, Unicode encoding is gaining popularity. When using this type of encoding, 2 bytes are allocated for encoding a single character.

Strings look much like arrays. The difference between them is that the string structure contains information that can be used to easily determine its length. There are two different approaches to solving this problem.

The end of the string must be marked in some way. Some specific code can be used for this purpose, made up of 1 or more bytes. In C, the zero code (NUL) is traditionally used for this purpose (it should not be confused with the '0' character, whose code is 30h). When using Unicode, strings are terminated by two zero bytes. In addition, some contemporary compilers can terminate strings with an entire sequence of seven 0 bytes, thus adapting strings for processing in double-word blocks. Taking into account growing memory resources, this approach doesn't seem too wasteful. This mechanism is characterized by the following two drawbacks:
- To discover the string length, it is necessary to view the entire string, no matter how long it might be. Furthermore, all string operations must be based on checking for the presence of the string terminating character, which makes these operations somewhat slower.
- When this approach is used, 0 bytes cannot be used directly within a string.
Information about the string length (or about its end) must be stored somewhere within the string. Using the starting bytes of the string for this purpose is a natural approach. For example, this approach is used in Pascal and in Delphi. This might be only a single byte, in which case the string might not be longer than 255 characters. In Delphi, however, it is possible to create strings with a 4-byte length field. In this case, the maximum possible string length is comparable to the amount of the address space allocated to a process under the Windows operating system.

In addition to the two preceding approaches, it is possible to use a combined approach. In this case, the string length is specified before the string but the string terminator marks its end. This approach is convenient for compatibility. However, because of its redundancy, it is a constant source of headaches for programmers. ^[3]
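The two basic approaches can be compared in a short sketch. The function names are hypothetical, and the length-prefixed variant assumes a 1-byte length field, as in a short Pascal string.

```c
#include <stddef.h>

/* Terminator approach: the whole string must be scanned to find its length. */
size_t terminated_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Length-prefix approach: the length is read directly from the first byte,
   so it cannot exceed 255. */
size_t prefixed_length(const unsigned char *s)
{
    return s[0];
}
```

The first function takes time proportional to the string length; the second takes constant time but limits the length to what fits into the prefix.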

Note

Programmers with experience in MS-DOS programming will certainly recall function 9 of the int 21h interrupt, which outputs a character string to the screen. This system procedure used the dollar sign ($) as a terminator. This terminator is inconvenient and fell out of use long ago.

To begin investigation of the string data type, consider a simple example of using Unicode strings (Listing 3.15).

Listing 3.15: Simple example illustrating the use of Unicode strings

 #include <stdio.h>
 wchar_t  s[] = L"Hello, programmer!";
 wchar_t  f[] = L"%s\n";
 void main()
 {
     wprintf(f, s);
 };

Recall that wchar_t is the wide-character type used for Unicode strings, the L prefix marks a string literal as a wide (Unicode) literal, and wprintf is the function for console output of Unicode strings (an analogue of the printf function used for console output of ASCII strings). Note that the format string (f) for the wprintf function also must be in Unicode encoding. Consider how IDA Pro disassembles the call to the wprintf function (Listing 3.16).

Listing 3.16: Disassembled listing of the call to the wprintf function

 .text:00401003       push  offset aHelloProgramme ; "Hello, programmer!"
 .text:00401008       push  offset aS              ; "%s\n"
 .text:0040100D       call  _wprintf

This is great, isn't it? IDA Pro has done an excellent job recognizing a Unicode string. Here are these strings as they appear in the data section (Listing 3.17).

Listing 3.17: Unicode strings from Listing 3.15 as they appear in the data section

 .data:00409040       aHelloProgramme:           ; DATA XREF: _main + 3↑o
 .data:00409040       unicode 0, <Hello, programmer!>, 0

If desired, you can press the <A> key to convert this string into a sequence of ASCII characters. You'll then discover that the codes of ASCII characters belonging to the range from 0 to 127 are converted to Unicode without changes by adding a most significant 0 byte (that is, each byte is extended to a word). Thus, conversion of an English text from ASCII to Unicode is a trivial task.
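The conversion just described amounts to zero-extending every byte. The sketch below uses a hypothetical function name, widen_ascii, and handles only codes from 0 to 127; it is an illustration and not a general encoding converter.

```c
#include <stddef.h>
#include <stdint.h>

/* Widen 7-bit ASCII text to 16-bit code units by zero-extending each byte.
   dst must have room for the text plus one terminating unit.
   Returns the number of characters converted (terminator excluded).
   Bytes above 127 are outside ASCII and are not handled correctly here. */
size_t widen_ascii(const char *src, uint16_t *dst)
{
    size_t i = 0;
    for (; src[i] != '\0'; i++)
        dst[i] = (uint16_t)(unsigned char)src[i];  /* high byte becomes 0 */
    dst[i] = 0;                                    /* 2-byte terminator */
    return i;
}
```

For example, the letter H (48h) becomes the word 0048h, which is stored in memory as the bytes 48h, 00h.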

The next example (Listing 3.18) relates to Delphi. ^[4]

Listing 3.18: Example illustrating the use of Delphi strings

 var
   s1:widestring;
   s2:string; {by default this is an AnsiString}
   s3:shortstring;
 begin
   s1 := 'Hello world!';
   s2 := 'Hello programmers!';
   s3 := 'Hello hackers!';
   writeln(s1);
   writeln(s2);
   writeln(s3);
 end.

The program in Listing 3.18 uses three types of strings available in Delphi. What would you see when analyzing the disassembled code produced by IDA Pro? What could be more interesting than programming, except for investigation of the executable code?

Compile this program, load it into IDA Pro, and analyze it automatically. Then, try to find the strings of interest in the Strings window. Strangely, only the Hello world! string can be found there. Hope remains that other strings are near, so you'd be able to find them quickly. This hope is not vain. Here is the code fragment that you needed (Listing 3.19).

Listing 3.19: Code fragment containing strings from the program in Listing 3.18

 CODE:0044CC4D        align 10h
 CODE:0044CC50        dd 18h
 CODE:0044CC54 aHelloWorld:
 CODE:0044CC54                        ; DATA XREF: sub_44CBAC + 21↑o
 CODE:0044CC54        unicode 0, <Hello world!>, 0
 CODE:0044CC6E        align 10h
 CODE:0044CC70        dd 0FFFFFFFFh, 12h
 CODE:0044CC78 aHelloProgramme db 'Hello programmers!', 0
 CODE:0044CC78                        ; DATA XREF: sub_44CBAC + 30↑o
 CODE:0044CC8B        align 4
 CODE:0044CC8C dword_44CC8C    dd 6C65480Eh, 68206F6Ch, 656B6361h, 217372h
 CODE:0044CC8C                        ; DATA XREF: sub_44CBAC + 3A↑o

Very well! The disassembler has recognized the s2 string (Listing 3.18). It hasn't placed it into the Strings window; however, this is a minor drawback. It would be interesting to find out what is located at the 0044CC8C address, because that block is also referenced from the program code. Move the cursor to that line and press the <A> key (it is also possible to use the Options | Ascii string style menu commands and click the Pascal style button in the dialog box that appears). Then a small miracle happens (Listing 3.20).

Listing 3.20: Fragment of the disassembled test program (Listing 3.18) with the s3 string

 CODE:0044CC8C aHelloHackers       db 14, 'Hello hackers!'
 CODE:0044CC8C                       ; DATA XREF: sub_44CBAC + 3A↑o
 CODE:0044CC9B                     db 0

As you can see, the third string also has been discovered. Why didn't the disassembler find it immediately? To all appearances, the cause lies in the byte with the value 14, to which the reference points. This is the string length byte. The disassembler, when analyzing the reference, assumed that it pointed to the start of a string and that text cannot begin with a character with the code 14. In principle, this assumption was correct; however, the disassembler never guessed that this byte is the string length.

Thus, it becomes possible to draw conclusions. In the case of a short string (shortstring), the reference points to the string length byte. Note also that the string is terminated by a zero byte, which is not counted in the string length (which is correct).
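Reading such a string by hand is straightforward: take the first byte as the length and copy that many characters. Below is a minimal sketch with a hypothetical function name; it relies only on the layout visible in Listing 3.20 (a length byte followed by the text).

```c
#include <stddef.h>
#include <string.h>

/* Copy a short string (length byte followed by the text) into a C buffer
   and terminate it with a zero byte. Returns the length taken from the
   first byte, or 0 if the destination buffer is too small. */
size_t short_string_to_c(const unsigned char *src, char *dst, size_t dst_size)
{
    size_t len = src[0];          /* the byte the reference points to */
    if (dst_size < len + 1)
        return 0;
    memcpy(dst, src + 1, len);    /* the text starts right after the length */
    dst[len] = '\0';
    return len;
}
```

Applied to the bytes at the 0044CC8C address (0Eh, 'H', 'e', 'l', and so on), the function reports a length of 14 and produces the text Hello hackers!.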

Now consider the two other strings. The string located at the 0044CC78 address is also null-terminated, and the reference again points to the start of the text. What about the string length? The string is preceded by two 4-byte values. The second one, 12h, specifies the string length (18 characters), so 4 bytes are allocated for the length. The first one, 0FFFFFFFFh, is the so-called reference count. Thus, for strings of this type the reference points directly to the string contents, and the text itself is preceded by 8 bytes of auxiliary information.

The last string type is the Unicode string (widestring). It starts at the 0044CC54 address. In contrast to the previous case, the string structure includes a 4-byte length field (18h, that is, 24 bytes for 12 two-byte characters), but there is no reference count. As before, the reference from the program code points to the string contents, which is why the disassembler located this string. The string is terminated by two 0 bytes.
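The auxiliary fields of the long string can be read back from a pointer to its text. The sketch below reflects only the layout seen in Listing 3.19 (a reference count 8 bytes before the text and a length 4 bytes before it); the type and function names are hypothetical, and a little-endian machine is assumed.

```c
#include <stdint.h>
#include <string.h>

/* The 8 bytes that precede the text of a long string in Listing 3.19. */
typedef struct {
    int32_t  ref_count;   /* at (text - 8); 0FFFFFFFFh reads as -1 */
    uint32_t length;      /* at (text - 4); number of characters */
} long_string_header;

/* Given a pointer to the first character, read the header before it.
   memcpy is used to avoid unaligned or type-punned reads.
   Assumption: the host is little-endian, like the x86 target. */
long_string_header read_long_string_header(const unsigned char *text)
{
    long_string_header h;
    memcpy(&h.ref_count, text - 8, 4);
    memcpy(&h.length,    text - 4, 4);
    return h;
}
```

For the block starting at 0044CC70, the text pointer is 0044CC78, and the function returns a reference count of -1 and a length of 18.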

To conclude the discussion of strings, consider the simple test program shown in Listing 3.21. Compile this program using Microsoft Visual C++.

Listing 3.21: Simple C program illustrating string operations

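 #include <stdio.h>
 #include <string.h>
 char s[] = "Good-bye!";
 void main()
 {
        /* Caution: s holds exactly 10 bytes, so this strcat writes past
           its end; tolerable only in a throwaway test program. */
        strcat(s," My love!");
        printf("%s\n", s);
 }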

The disassembled code of the program presented in Listing 3.21 is shown in Listing 3.22.

Listing 3.22: Disassembled code of the program shown in Listing 3.21

 .text:00401000 _main        proc near          ; CODE XREF: start + 16E↑p
 .text:00401000              push    ebp
 .text:00401001              mov     ebp, esp
 .text:00401003              push    offset aMyLove  ; char *
 .text:00401008              push    offset aGoodBye ; char *
 .text:0040100D              call    _strcat
 .text:00401012              add     esp, 8
 .text:00401015              push    offset aGoodBye ; "Good-bye!"
 .text:0040101A              push    offset aS       ; "%s\n"
 .text:0040101F              call    _printf
 .text:00401024              add     esp, 8
 .text:00401027              xor     eax, eax
 .text:00401029              pop     ebp
 .text:0040102A              retn
 .text:0040102A _main        endp

Listing 3.22 is easy and is not worth special comments. It should only be mentioned that both strings are excellently recognized by IDA Pro.

Introduce a small modification into the program shown in Listing 3.21. Make the s variable local by moving its definition into the main function. After compiling the program and disassembling its code, you'll obtain an unusual disassembled code (Listing 3.23).

Listing 3.23: Disassembled code of the modified program (Listing 3.21)

 .text:00401000 _main        proc near          ; CODE XREF: start + 16E↑p
 .text:00401000 var_C        = byte ptr -0Ch
 .text:00401000 var_8        = dword ptr -8
 .text:00401000 var_4        = word ptr -4
 .text:00401000
 .text:00401000              push    ebp
 .text:00401001              mov     ebp, esp
 .text:00401003              sub     esp, 0Ch
 .text:00401006              mov     eax, ds:dword_4060FC
 .text:0040100B              mov     dword ptr [ebp + var_C], eax
 .text:0040100E              mov     ecx, ds:dword_406100
 .text:00401014              mov     [ebp + var_8], ecx
 .text:00401017              mov     dx, ds:word_406104
 .text:0040101E              mov     [ebp + var_4], dx
 .text:00401022              push    offset aMyLove               ; char *
 .text:00401027              lea     eax, [ebp + var_C]
 .text:0040102A              push    eax                          ; char *
 .text:0040102B              call    _strcat
 .text:00401030              add     esp, 8
 .text:00401033              lea     ecx, [ebp + var_C]
 .text:00401036              push    ecx
 .text:00401037              push    offset aS                    ; "%s\n"
 .text:0040103C              call    _printf
 .text:00401041              add     esp, 8
 .text:00401044              xor     eax, eax
 .text:00401046              mov     esp, ebp
 .text:00401048              pop     ebp
 .text:00401049              retn
 .text:00401049 _main        endp

Consider Listing 3.23 more carefully. The code is unusual. The disassembler has determined only one string (a literal). However, the first parameter of the strcat function is the address of a string that the disassembler failed to locate. This is beyond doubt, because strcat is a well-known library function. What, then, do the commands from the 00401006 to the 0040101E address mean? They move 10 bytes of data (4 + 4 + 2) into the stack area (recall that the string must now be stored in the stack). The string in question is exactly 10 bytes in size (counting the terminating 0 byte). Thus, this is the compiler's roundabout way of copying the string from the data section to the stack area. Consider the memory address 004060FC, at which the copied block starts. Here is this block (Listing 3.24).

Listing 3.24: Memory block passed to the stack

 .rdata:004060FC dword_4060FC   dd 646F6F47h   ; DATA XREF: _main + 6↑r
 .rdata:00406100 dword_406100   dd 6579622Dh   ; DATA XREF: _main + E↑r
 .rdata:00406104 word_406104    dw 21h         ; DATA XREF: _main + 17↑r

Press the <A> key and convert the block to the ASCII format. After that, the "lost" string will be found. The conclusion is easy and straightforward: The disassembler failed to locate one of the strings because the compiler treated it simply as a block of data.
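The same conversion can be done by hand if you remember that x86 stores the least significant byte first. The following sketch uses a hypothetical helper, unpack_le, that writes the bytes of a value in memory order.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the lowest nbytes bytes of value to out in little-endian order,
   that is, in the order in which x86 keeps them in memory. */
void unpack_le(uint32_t value, size_t nbytes, char *out)
{
    for (size_t i = 0; i < nbytes; i++)
        out[i] = (char)((value >> (8 * i)) & 0xFF);
}
```

Unpacking 646F6F47h and 6579622Dh as 4 bytes each and 21h as 2 bytes, one after another, gives the bytes 'G', 'o', 'o', 'd', '-', 'b', 'y', 'e', '!', 0, which is the Good-bye! string together with its terminator.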

Arrays

As shown in the previous section, although strings have a structure that allows you to determine the data size, even such a powerful disassembler as IDA Pro is not always capable of recognizing a string, to say nothing of arrays. This is because the array size is not explicitly stored anywhere in the array itself. Determining the array size is therefore difficult; the array itself, however, can be clearly identified. Consider a simple example. In the program shown in Listing 3.25, an integer array is filled with integer numbers ranging from zero to nine. After compiling this program using Microsoft Visual Studio and loading the executable code into IDA Pro, the disassembled code shown in Listing 3.26 will be obtained.

Listing 3.25: Simple C program for investigating array identification in the executable code

 #include <stdio.h>
 int a[10];
 void main()
 {
         for(int i = 0; i < 10; i++) a[i] = i;
 };

Listing 3.26: Disassembled code of the program shown in Listing 3.25

 .text:00401000 _main        proc near          ; CODE XREF: start + 16E↑p
 .text:00401000              var_4  = dword ptr -4
 .text:00401000              push    ebp
 .text:00401001              mov     ebp, esp
 .text:00401003              push    ecx
 .text:00401004              mov     [ebp + var_4], 0
 .text:0040100B              jmp     short loc_401016
 .text:0040100D loc_40100D:                     ; CODE XREF: _main + 29↑j
 .text:0040100D              mov     eax, [ebp+var_4]
 .text:00401010              add     eax, 1
 .text:00401013              mov     [ebp + var_4], eax
 .text:00401016 loc_401016:                     ; CODE XREF: _main + B↑j
 .text:00401016              cmp     [ebp + var_4], 0Ah
 .text:0040101A              jge     short loc_40102B
 .text:0040101C              mov     ecx, [ebp + var_4]
 .text:0040101F              mov     edx, [ebp + var_4]
 .text:00401022              mov     dword_4072C0[ecx*4], edx
 .text:00401029              jmp     short loc_40100D
 .text:0040102B loc_40102B:                     ; CODE XREF: _main + 1A↑j
 .text:0040102B              xor     eax, eax
 .text:0040102D              mov     esp, ebp
 .text:0040102F              pop     ebp
 .text:00401030              retn
 .text:00401030 _main endp

You have already encountered this method of loop organization in Listing 3.13. As you have certainly guessed, var_4 is nothing but a stack variable, the loop counter. Pay special attention to the mov dword_4072C0[ecx*4], edx command, which is the key to understanding the operating logic of this program. There is no doubt that this is an array: dword_4072C0 is the start of this array, ecx contains the current index value, and the scaling coefficient equal to four indicates that each element of this array is 4 bytes in size. In this program, the array size can be clearly identified from the loop bound. However, you should not rely on the assumption that the number of array elements is always determined by the number of iterations in the loop that processes this array. The programmer might use different parts of the array in different sections of the program, and these fragments need not begin at the start of the array. Thus, all that can be stated with high probability is that the array size is no less than the value found.
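The address computed by a command with a scaled index can be checked by simple arithmetic: base plus index times scale. The function below is a hypothetical illustration of that formula, not part of any tool.

```c
#include <stdint.h>

/* Effective address of an operand of the form base[index*scale]. */
uint32_t element_address(uint32_t base, uint32_t index, uint32_t scale)
{
    return base + index * scale;
}
```

With the base 004072C0h and the scale 4, the index 0 gives 004072C0h and the index 9 gives 004072E4h, so the loop in Listing 3.26 touches 40 bytes starting at dword_4072C0.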

Some problems might arise when using arrays in functions. The argument accepted by the function is simply a pointer. This pointer might be passed farther through a sequence of functions. Assume that in the last function you see some parameter used as a pointer to an array. To locate that array, you'll have to traverse the entire sequence of functions in the reverse direction, which would require time and patience. In such situations, it is better to use the debugger: set a breakpoint in the function where the pointer behaves like a pointer to an array, and obtain the value of that pointer. Then return to the disassembler, locate the array at the address found with the debugger, and find the cross-references from the program code to that array. After that, it will be possible to continue analysis of the executable code.
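The situation can be modeled by a small sketch in which the array is visible as such only in the innermost function. All names here are hypothetical.

```c
#include <stddef.h>

/* Innermost function: only here is p visibly indexed like an array. */
static int sum_elements(const int *p, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Intermediate function: it merely forwards the pointer. */
static int forward(const int *p, size_t n)
{
    return sum_elements(p, n);
}

/* The array itself is defined far from the place where it is indexed. */
int values[4] = { 1, 2, 3, 4 };

int process(void)
{
    return forward(values, 4);
}
```

In the compiled code of sum_elements, nothing indicates that p points to values; this link can be restored only by following the calls backward or by catching the pointer value in the debugger.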

Structures

A structure is a generalization of an array. In contrast to arrays, which are made up of the elements of the same type, structures can comprise elements of different types. As with arrays, structure elements are accessed on the basis of the base address, which defines the starting point of the structure instance. However, the problem is more complicated than with arrays. Sometimes, it is difficult to make sure that data items of different types belong to the same structure. Consider a C program illustrating the behavior of structures (Listing 3.27).

Listing 3.27: Sample program for investigating the behavior of structures

 #include <stdio.h>
 #include <windows.h>
 struct a {
         char s[10];
         BYTE b;
         int i;
 };
 a a1;
 void main()
 {
        for(int j = 0; j < 10; j++) a1.s[j] = 'A';
        a1.b = 10;
        a1.i = 10000;
 };

Compile this program using the Microsoft Visual C++ compiler, then disassemble the executable code using IDA Pro. The disassembled text of this program is shown in Listing 3.28.

Listing 3.28: Disassembled text of the program shown in Listing 3.27

 .text:00401000 _main        proc near        ; CODE XREF: start + 16E↑p
 .text:00401000              var_4 = dword ptr -4
 .text:00401000              push    ebp
 .text:00401001              mov     ebp, esp
 .text:00401003              push    ecx
 .text:00401004              mov     [ebp + var_4], 0
 .text:0040100B              jmp     short loc_401016
 .text:0040100D loc_40100D:                   ; CODE XREF: _main + 26↑j
 .text:0040100D              mov     eax, [ebp + var_4]
 .text:00401010              add     eax, 1
 .text:00401013              mov     [ebp + var_4], eax
 .text:00401016 loc_401016:                   ; CODE XREF: _main + B↑j
 .text:00401016              cmp     [ebp + var_4], 0Ah
 .text:0040101A              jge     short loc_401028
 .text:0040101C              mov     ecx, [ebp + var_4]
 .text:0040101F              mov     byte_4072C0[ecx], 41h
 .text:00401026              jmp     short loc_40100D
 .text:00401028 loc_401028:                   ; CODE XREF: _main + 1A↑j
 .text:00401028              mov     byte_4072CA, 0Ah
 .text:0040102F              mov     dword_4072CC, 2710h
 .text:00401039              xor     eax, eax
 .text:0040103B              mov     esp, ebp
 .text:0040103D              pop     ebp
 .text:0040103E              retn
 .text:0040103E _main        endp

Carefully consider the text shown in Listing 3.28. In it, you will encounter three data items of different types, identified by the following names: byte_4072C0 (array), byte_4072CA (byte), and dword_4072CC (double word). At the same time, there are no clear indications that these variables belong to the same structure. For the logic of this particular program, that makes no difference. Hence, for a structure to be recognized, the program must contain operations that treat it as an integral entity.
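The three addresses in Listing 3.28 agree with the field offsets that a C compiler assigns to this structure. The sketch below prints them; unsigned char is used instead of BYTE so that windows.h is not needed, and the values named in the comments assume a 4-byte int aligned on a 4-byte boundary, which is typical but not guaranteed.

```c
#include <stddef.h>
#include <stdio.h>

/* The same fields as in Listing 3.27; unsigned char stands in for BYTE. */
struct a {
    char          s[10];
    unsigned char b;
    int           i;
};

/* Print the offset of every field and the total size of the structure.
   With a 4-byte int aligned on 4 bytes, the offsets are 0, 10, and 12,
   and the size is 16. */
void print_layout(void)
{
    printf("s at +%zu, b at +%zu, i at +%zu, size %zu\n",
           offsetof(struct a, s), offsetof(struct a, b),
           offsetof(struct a, i), sizeof(struct a));
}
```

Adding these offsets to the base 004072C0h gives 004072C0h, 004072CAh, and 004072CCh, the addresses that IDA Pro shows as three unrelated variables.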

Consider the program shown in Listing 3.29. As you can see, the a structure is the parameter of the init procedure. Then consider how this situation is reflected in the program's executable code (Listing 3.30). This program is artificial because the structure passed to the function is not used and is not passed back.

Listing 3.29: Behavior of the structure passed to some function as a parameter

 #include <stdio.h>
 #include <windows.h>
 struct a {
         char s[10];
         BYTE b;
         int i;
 };
 a a1;
 void init(a);
 void main()
 {
         init(a1);
 };
 void init(a c)
 {
         for(int j = 0; j < 10; j++) c.s[j] = 'A';
         c.b = 10;
         c.i = 10000;
 };

Listing 3.30: Disassembled text of the main function of the program shown in Listing 3.29

 .text:00401000 _main       proc near         ; CODE XREF: start + 16E↑p
 .text:00401000             push    ebp
 .text:00401001             mov     ebp, esp
 .text:00401003             sub     esp, 10h
 .text:00401006             mov     eax, esp
 .text:00401008             mov     ecx, dword_4072C0
 .text:0040100E             mov     [eax], ecx
 .text:00401010             mov     edx, dword_4072C4
 .text:00401016             mov     [eax + 4], edx
 .text:00401019             mov     ecx, dword_4072C8
 .text:0040101F             mov     [eax + 8], ecx
 .text:00401022             mov     edx, dword_4072CC
 .text:00401028             mov     [eax + 0Ch], edx
 .text:0040102B             call    sub_401040
 .text:00401030             add     esp, 10h
 .text:00401033             xor     eax, eax
 .text:00401035             pop     ebp
 .text:00401036             retn
 .text:00401036 _main endp

Listing 3.30 presents the disassembled code of the main function of the program in Listing 3.29. The sub_401040 procedure, called at the 0040102B address, is the init function. The lines of code preceding this call are of great interest. Pay special attention to the sub esp, 10h command. It is the equivalent of four PUSH commands. Note, however, that the size of the structure under consideration is exactly 16 bytes. The command allocating the stack space is followed by the mov eax, esp command, so the EAX register points to the start of the allocated area. This area is then filled with data. The impression is that you are dealing with 4 double words, and IDA Pro has come to the same conclusion. That 16 bytes are allocated at once should put you on alert (the fields of the structure take 15 bytes, but because the i field is aligned on a 4-byte boundary, the result is 16). Nevertheless, this alone doesn't prove anything. To discover what was passed to the function, it is necessary to analyze the code of that function (Listing 3.31).
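What the compiled code does with the structure can be imitated in C: the structure is copied as four double words without any regard to field boundaries. This is only a sketch with a hypothetical function name; it assumes that the structure occupies 16 bytes, which holds when int takes 4 bytes and is aligned on a 4-byte boundary.

```c
#include <stdint.h>
#include <string.h>

/* The same fields as in Listing 3.29; unsigned char stands in for BYTE. */
struct a {
    char          s[10];
    unsigned char b;
    int           i;
};

/* Copy the structure as four 32-bit words, the way Listing 3.30 does.
   Assumption: sizeof(struct a) == 16. */
void copy_as_dwords(struct a *dst, const struct a *src)
{
    uint32_t word;
    for (size_t k = 0; k < 4; k++) {
        memcpy(&word, (const unsigned char *)src + 4 * k, 4);  /* mov reg, dword_...  */
        memcpy((unsigned char *)dst + 4 * k, &word, 4);        /* mov [eax + 4*k], reg */
    }
}
```

After such a copy, all three fields of the destination match the source, although the copying code never mentions an array, a byte, or an integer.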

Listing 3.31: Disassembled text of the init function (Listing 3.29)

 .text:00401040 sub_401040    proc near     ; CODE XREF: _main + 2B↑p
 .text:00401040       var_4  = dword ptr -4
 .text:00401040       arg_0  = byte  ptr   8
 .text:00401040       arg_A  = byte  ptr   12h
 .text:00401040       arg_C  = dword ptr  14h
 .text:00401040              push    ebp
 .text:00401041              mov     ebp, esp
 .text:00401043              push    ecx
 .text:00401044              mov     [ebp + var_4], 0
 .text:0040104B              jmp     short loc_401056
 .text:0040104D loc_40104D:                 ; CODE XREF: sub_401040 + 24↓j
 .text:0040104D              mov     eax, [ebp + var_4]
 .text:00401050              add     eax, 1
 .text:00401053              mov     [ebp + var_4], eax
 .text:00401056 loc_401056:                 ; CODE XREF: sub_401040 + B↑j
 .text:00401056              cmp     [ebp + var_4], 0Ah
 .text:0040105A              jge     short loc_401066
 .text:0040105C              mov     ecx, [ebp + var_4]
 .text:0040105F              mov     [ebp + ecx + arg_0], 41h
 .text:00401064              jmp     short loc_40104D
 .text:00401066 loc_401066:                 ; CODE XREF: sub_401040 + 1A↑j
 .text:00401066              mov     [ebp + arg_A], 0Ah
 .text:0040106A              mov     [ebp + arg_C], 2710h
 .text:00401071              mov     esp, ebp
 .text:00401073              pop     ebp
 .text:00401074              retn
 .text:00401074 sub_401040   endp

Consider the code of the init function (see Listing 3.31). In principle, this text is similar to that provided in Listing 3.28. However, this time, taking into account the analysis of the code of the main function (see Listing 3.30), it is possible to understand its meaning. Thus, 16 bytes were passed to the function (4 times, 4 bytes at a time). The function first processes a 10-byte array that starts at arg_0, then a 1-byte value (arg_A), and finally a 4-byte value (arg_C). At this point, it is logical to assume that the object you are dealing with is a structure. What allows you to draw such a conclusion? For instance, the first 3 double words sent to the stack, independent at first glance, turn out to hold a 10-byte array and a single byte within the procedure, which supports this assumption.

Thus, it is possible to conclude that the structures can be disclosed when they are passed as parameters. However, it is necessary to admit that these considerations are too heuristic to delegate this task to a disassembler. An interesting point here is that the Borland C++ compiler in a similar situation acts in approximately the same way as Microsoft Visual C++. Compile the program presented in Listing 3.29 using the Borland C++ compiler, then disassemble it using IDA Pro. The disassembled fragment of the executable code responsible for calling the Init function is shown in Listing 3.32.

Listing 3.32: Fragment calling Init (compiled by Borland C++ and disassembled by IDA Pro)

 .text:00401108         mov     al, byte_40C2C6
 .text:0040110E         shl     eax, 10h
 .text:00401111         mov     ax, word_40C2C4
 .text:00401118         push    eax
 .text:00401119         push    dword_40C2C0
 .text:0040111F         push    dword_40C2BC
 .text:00401125         push    dword_40C2B8
 .text:0040112B         call    sub_401134

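This fragment is notable for a strange variable, word_40C2C4. Where could a variable of the WORD type come from? After all, there are no such variables in the program. The first three commands assemble the value for the first push from 3 bytes: the byte at the 0040C2C6 address goes into bits 16 to 23 of EAX, and the word at the 0040C2C4 address goes into its lower half. Thus, the stack again receives 16 bytes, as in the previous case, but only 15 of them are copied from the structure. Is Borland more accurate than Microsoft? This is unlikely.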
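The first three commands of Listing 3.32 can be repeated step by step in C. The function name pack_tail is hypothetical, and the sketch assumes that the upper bits of EAX were zero beforehand; in the real code, bits 24 to 31 hold whatever was left in the register.

```c
#include <stdint.h>

/* Imitation of: mov al, byte / shl eax, 10h / mov ax, word.
   The byte ends up in bits 16..23 and the word in bits 0..15.
   Assumption: EAX was zero before the first command. */
uint32_t pack_tail(uint8_t byte_val, uint16_t word_val)
{
    uint32_t eax = byte_val;                /* mov al, byte_40C2C6 */
    eax <<= 16;                             /* shl eax, 10h        */
    eax = (eax & 0xFFFF0000u) | word_val;   /* mov ax, word_40C2C4 */
    return eax;
}
```

For example, the byte 0ABh and the word 1234h produce the double word 00AB1234h, in which only the 3 lower bytes carry data.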

However, there are situations, in which the disassembler can unambiguously determine that it is dealing with a structure. These are situations, in which structures are used as parameters when calling well-known library or API functions. The code fragment shown in Listing 3.33 demonstrates the call to the RegisterClass API function. I have intentionally provided the code lines preceding this call. These code lines contain commands that fill the WndClass structure, which the disassembler recognizes excellently. It cannot fail to recognize this structure, because its address is the parameter of the well-known API function.

Listing 3.33: Disassembled code showing the call to the RegisterClass API function

 .text:0040104D       mov    [ebp+WndClass.style], 0
 .text:00401054       mov    [ebp+WndClass.lpfnWndProc], offset sub_401140
 .text:0040105B       mov    [ebp+WndClass.cbClsExtra], 0
 .text:00401062       mov    [ebp+WndClass.cbWndExtra], 0
 .text:00401069       mov    edx, [ebp + hInstance]
 .text:0040106C       mov    [ebp + WndClass.hInstance], edx
 .text:0040106F       push   7F00h           ; lpIconName
 .text:00401074       mov    eax, [ebp + hInstance]
 .text:00401077       push   eax             ; hInstance
 .text:00401078       call   ds:LoadIconA
 .text:0040107E       mov    [ebp + WndClass.hIcon], eax
 .text:00401081       push   7F00h           ; lpCursorName
 .text:00401086       push   0               ; hInstance
 .text:00401088       call   ds:LoadCursorA
 .text:0040108E       mov    [ebp + WndClass.hCursor], eax
 .text:00401091       mov    [ebp + WndClass.hbrBackground], 6
 .text:00401098       mov    [ebp + WndClass.lpszMenuName], 0
 .text:0040109F       lea    ecx, [ebp + ClassName]
 .text:004010A2       mov    [ebp + WndClass.lpszClassName], ecx
 .text:004010A5       lea    edx, [ebp + WndClass]
 .text:004010A8       push   edx             ; lpWndClass
 .text:004010A9       call   ds:RegisterClassA

In Listing 3.33, the address of the WndClass structure is counted in relation to the contents of the EBP register, which means that the structure is defined as a stack local variable (see Section 3.1.2). However, the essence of these considerations won't change if you make it a global variable. In this case, the structure is identified because it is used as a parameter.

3.1.2. Local Variables

As a rule, local variables are interpreted as variables defined directly within a procedure or a function. As you know, the stack is used for this purpose. In my opinion, this is only a particular case. I interpret local variables more broadly: not only as variables defined in the stack (they might be called stack variables) but also as temporary variables (local in relation to the program run time) and as variables stored in registers.

Variables Defined in the Stack

Variables defined in the stack (stack variables) were already mentioned several times. The program in Listing 3.34 uses only local variables and two functions: main and add. Note that the add function accepts three arguments and that the first argument is a pointer. The s variable is modified in the add function.

Listing 3.34: Example program illustrating the use of local variables

 #include <stdio.h>
 int add(int *, int, int);
 void main()
 {
         int i = 10, s, j;
         s = 12; j = 20;
         printf("%d\n", add(&s, i, j));
 };
 int add(int *s1, int i1, int j1)
 {
         int n;
         *s1 = *s1 + 10;
         n = *s1 + j1 + i1;
         return n*n;
 };

The disassembled text of the main function from Listing 3.34 is presented in Listing 3.35. Note that when compiling the test program, the option preventing optimization was set.

Listing 3.35: Disassembled text of the main function from Listing 3.34

 .text:00401000 _main       proc near        ; CODE XREF: start + 16E↑p
 .text:00401000        var_C = dword ptr -0Ch
 .text:00401000        var_8 = dword ptr -8
 .text:00401000        var_4 = dword ptr -4
 .text:00401000             push    ebp
 .text:00401001             mov     ebp, esp
 .text:00401003             sub     esp, 0Ch
 .text:00401006             mov     [ebp + var_4], 0Ah
 .text:0040100D             mov     [ebp + var_8], 0Ch
 .text:00401014             mov     [ebp + var_C], 14h
 .text:0040101B             mov     eax, [ebp + var_C]
 .text:0040101E             push    eax
 .text:0040101F             mov     ecx, [ebp + var_4]
 .text:00401022             push    ecx
 .text:00401023             lea     edx, [ebp + var_8]
 .text:00401026             push    edx
 .text:00401027             call    sub_401050
 .text:0040102C             add     esp, 0Ch
 .text:0040102F             push    eax
 .text:00401030             push    offset unk_4060FC
 .text:00401035             call    _printf
 .text:0040103A             add     esp, 8
 .text:0040103D             xor     eax, eax
 .text:0040103F             mov     esp, ebp
 .text:00401041             pop     ebp
 .text:00401042             retn
 .text:00401042 _main       endp

Skip the standard function prologue, and look at the sub esp, 0Ch command. Here, 12 bytes are reserved for local variables: this is the area between the previous value of the stack pointer (to which the EBP register now points) and the new value. This corresponds to three 4-byte variables (see Listing 3.34). IDA Pro names these variables var_4, var_8, and var_C. What do the _4, _8, and _C suffixes mean? They are the hexadecimal offsets of the variables, counted toward lower addresses from the boundary where the area of stack variables starts. The address of this boundary is stored in the EBP register, so var_4 is located at EBP - 4, var_8 at EBP - 8, and var_C at EBP - 0Ch.

Next are the commands for data initialization. Note that there is no difference between variables initialized when declared and variables assigned some values in the program.

Addresses from 0040101B to 00401026 are occupied by the commands that push parameters onto the stack for the call to the add function. Pay special attention to the var_8 variable, which doubtlessly corresponds to the s variable in the program source code. To handle this variable, the lea edx, [ebp + var_8]/push edx commands are used, which means that the address of this variable is pushed onto the stack. This is natural, because the source code explicitly passes a pointer. However, I'd like to warn you against drawing premature conclusions, because compilers often take liberties with pointers. Here, the pointer is passed because the add function modifies the s variable through it. If this were not so (if the s variable were not modified in the add function), the compiler could pass the value of the variable instead. That approach would produce the same result and is simpler. The two other variables, i (var_4) and j (var_C), are passed on the stack by value.

The result of the function call, which, as expected, is returned in the EAX register (nevertheless, see Section 3.2.1), is passed as a parameter to the printf function for console output.

It is time to consider the code of the add function. The disassembled text is shown in Listing 3.36.

Listing 3.36: Disassembled text of the add function (Listing 3.34)

 .text:00401050 sub_401050    proc near        ; CODE XREF: _main + 27↑p
 .text:00401050               var_4  = dword ptr -4
 .text:00401050               arg_0  = dword ptr  8
 .text:00401050               arg_4  = dword ptr  0Ch
 .text:00401050               arg_8  = dword ptr  10h
 .text:00401050               push    ebp
 .text:00401051               mov     ebp, esp
 .text:00401053               push    ecx
 .text:00401054               mov     eax, [ebp + arg_0]
 .text:00401057               mov     ecx, [eax]
 .text:00401059               add     ecx, 0Ah
 .text:0040105C               mov     edx, [ebp + arg_0]
 .text:0040105F               mov     [edx], ecx
 .text:00401061               mov     eax, [ebp + arg_0]
 .text:00401064               mov     ecx, [eax]
 .text:00401066               add     ecx, [ebp + arg_8]
 .text:00401069               add     ecx, [ebp + arg_4]
 .text:0040106C               mov     [ebp + var_4], ecx
 .text:0040106F               mov     eax, [ebp + var_4]
 .text:00401072               imul    eax, [ebp + var_4]
 .text:00401076               mov     esp, ebp
 .text:00401078               pop     ebp
 .text:00401079               retn
 .text:00401079 sub_401050    endp

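IDA Pro assigns function parameters names starting with the arg prefix. Thus, as expected, the function has obtained three parameters: arg_0, arg_4, and arg_8. As in the case of stack variables, these parameters are addressed in relation to the content of the EBP register; however, this time the offsets are counted in the opposite direction, toward higher addresses. The suffixes 0, 4, and 8 are counted from the first parameter, which lies at EBP + 8 because the saved value of EBP and the return address occupy the 8 bytes immediately above the address stored in EBP. This is why the listing defines arg_0, arg_4, and arg_8 as 8, 0Ch, and 10h.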

Note that at first glance, no space is reserved in the stack for the var_4 variable (in the program, the name of this variable is n). This issue is an interesting one. In the main function, the compiler reserved stack space using the sub esp, 0Ch command. Here, the push ecx command is used to reserve the stack space. This can be easily discovered by checking the stack balance at the beginning and at the end of the procedure. To achieve this, count the number of bytes pushed onto the stack at the beginning and released from the stack at the end. The PUSH command is often used for reserving stack space when there is only one stack variable.

It would be interesting to find, among all function parameters, the one that is a pointer to a variable. Here, everything is simple. This parameter was the last to be pushed. Because the stack grows upward, toward lower addresses, this parameter has the smallest offset in the direction of higher addresses. This is arg_0. Here is the sequence of commands that discloses this: mov eax, [ebp + arg_0]/mov ecx, [eax]/add ecx, 0Ah, followed by mov edx, [ebp + arg_0]/mov [edx], ecx, which writes the sum back through the pointer. This corresponds to *s1 = *s1 + 10.

All further computations are self-evident. They correspond to n = *s1 + j1 + i1. The imul instruction stands for the n*n operation.

Again, it is necessary to mention optimization. Optimization can change the program code to such an extent that it becomes unrecognizable. This is especially true for Microsoft Visual C++. For instance, try to compile the program (see Listing 3.34) using the "create compact code" option. Before compiling, insert an output statement into the add function, for example, printf("%d\n", n). Otherwise, the optimizer will do without any function call and replace it with the constant that it computes on its own (yes, this is so ^[5]). Now, consider what would happen to the main function after optimization (Listing 3.37).

Listing 3.37: Disassembled code of the optimized main function

 .text:00401029 _main       proc near        ; CODE XREF: start + 16E↓p
 .text:00401029        var_4 = dword ptr -4
 .text:00401029             push    ebp
 .text:0040102A             mov     ebp, esp
 .text:0040102C             push    ecx
 .text:0040102D             push    14h
 .text:0040102F             lea     eax, [ebp + var_4]
 .text:00401032             push    0Ah
 .text:00401034             push    eax
 .text:00401035             mov     [ebp + var_4], 0Ch
 .text:0040103C             call    sub_401000
 .text:00401041             push    eax
 .text:00401042             push    offset unk_4060FC
 .text:00401047             call    _printf
 .text:0040104C             add     esp, 14h
 .text:0040104F             xor     eax, eax
 .text:00401051             leave
 .text:00401052             retn
 .text:00401052 _main       endp

Listing 3.37 is an instructive one. The main point to notice in the course of analysis is that only one stack variable has been defined. Try to guess which variable this is, even without viewing the listing. This is the s variable. It is this variable whose contents are modified in the add function. In other words, s is a true variable. However, i and j are not variables; rather, they are in essence constants because they are not modified in the course of program execution. The optimizer treats them accordingly. Instead of allocating stack memory for them, it simply passes numeric constants as parameters to the add function. This goal is achieved by the push 14h and push 0Ah commands. The address of the s variable is pushed onto the stack: lea eax, [ebp + var_4]/... /push eax.

Also, it is necessary to pay attention to another issue: Memory for the stack variable is allocated using the push ecx command, which can confuse the code investigator. However, the optimizer's main goal in this case is to make the code as compact as possible, and it does its best to achieve this. This also explains why only a single leave command is used to restore the stack at the end of the procedure.

Thus, the following conclusion can be drawn in relation to stack variables: If the value of a stack variable is not changed in the course of program execution, the optimizer can replace it with a constant. This information is not particularly important for a simple analysis of the program's actions. However, in my opinion, for a sound understanding of the program operating logic this issue is important.

Also, it is possible to obtain useful information if you compile the program shown in Listing 3.34 using the Borland C++ v. 5.0 compiler. The result of disassembling the executable code of the main function is shown in Listing 3.38.

Listing 3.38: Disassembled main function from Listing 3.34 compiled using Borland C++ 5.0

 .text:00401108 _main       proc near       ; DATA XREF: .data:0040A0B8↓o
 .text:00401108       var_4 = dword ptr -4
 .text:00401108       argc  = dword ptr  0Ch
 .text:00401108       argv  = dword ptr  10h
 .text:00401108       envp  = dword ptr  14h
 .text:00401108            push     ebx
 .text:00401109            push     esi
 .text:0040110A            push     ecx
 .text:0040110B            mov      ebx, 0Ah
 .text:00401110            mov      [esp + 4 + var_4], 0Ch
 .text:00401117            mov      esi, 14h
 .text:0040111C            push     esi
 .text:0040111D            push     ebx
 .text:0040111E            lea      eax, [esp + 0Ch + var_4]
 .text:00401122            push     eax
 .text:00401123            call     sub_401140
 .text:00401128            add      esp, 0Ch
 .text:0040112B            push     eax
 .text:0040112C            push     offset format ; Format
 .text:00401131            call     _printf
 .text:00401136            add      esp, 8
 .text:00401139            pop      edx
 .text:0040113A            pop      esi
 .text:0040113B            pop      ebx
 .text:0040113C            retn
 .text:0040113C _main       endp

Different compilers are characterized by different styles. For instance, in contrast to Microsoft's compiler, which, just to be on the safe side, resets the EAX register to zero even when the main function is declared as void, Borland's compiler interprets the void type literally, which means that it doesn't pay attention to the contents of the EAX register. Another specific feature of Borland's compiler is that it actively uses the ESI and EBX registers. Note that according to generally adopted conventions, a function must not change the contents of the EBX, EBP, ESP, ESI, and EDI registers; so, Borland's compiler must insert PUSH EBX/PUSH ESI commands at the beginning of the function and POP ESI/POP EBX commands at the end of the function. I suspect that this is just an inherited legacy. In older Intel processors, the CX and DX registers could not be used for addressing.

Like Microsoft's compiler, Borland's compiler analyzes the text and discovers that the i and j variables are constants in their essence. Therefore, it doesn't reserve the memory in the stack for them and uses constants instead. Stack memory is reserved only for the s variable (var_4). Note that this also is carried out using a single PUSH command (push ecx).

Consider the most interesting issue. Borland's compiler doesn't use the EBP register here; it uses the ESP register instead. This is a well-known optimization technique, so you should know about it. However, you might object: The contents of the ESP register change. You'd be right. But the compiler does not forget about this; it handles this problem excellently by tracking all changes of the ESP register and correcting the addressing as appropriate. Look: first comes the mov [esp + 4 + var_4], 0Ch command, followed by two PUSH commands. The content of ESP is reduced by eight. Therefore, the compiler uses the lea eax, [esp + 0Ch + var_4] command. Everything is correct, because 4 + 8 = 12 = 0Ch. IDA Pro, fortunately, also understands these issues and specifies the var_4 variable in both commands.

Temporary Variables

What are temporary variables? I consider as such the variables used for storing intermediate results of computations. In the course of computations, the processor registers are widely used. Therefore, it is possible to state that the registers are used as temporary variables. Note that you have already encountered such variables. For example, consider Listing 3.13, and recall how the loop was organized there (the 00401027-0040102F addresses). The EAX register plays the role of temporary variable, which for the time of loop execution stores the loop counter. When using real variables for storing intermediate results, the FPU registers are also used. As a rule, these are the first three registers of the coprocessor: ST(0), ST(1), and ST(2). If you recall Listing 3.13, the comments that follow it emphasized the method of start-up initialization of floating-point variables: The floating-point variable is first loaded into the ST(0) coprocessor register using the FLD command. Then, from the ST(0) register the variable is loaded into the memory area allocated for the floating-point variable (using the FSTP command).

How many registers might be needed if the expression to be computed is a complex one? Simple considerations are as follows: Operations over numeric variables are binary operations. Two operands participate in each operation. The result can be placed either into a third operand or into one of the operands participating in the previous operation. The result of execution of any specific operation might be the operand of another binary operation. However, again two operands participate in the binary operation and the result is placed into one of them. These considerations are also applicable if there are parentheses in the expression. Thus, it is possible to conclude that two operands are enough for storing intermediate results. However, what should you do if the operands are 64-bit ones (and you have a 32-bit processor)? The C++ compiler can use library procedures (such as _alldiv), which are provided especially for such cases. Nevertheless, as you'll see later, sometimes the compiler still uses the stack for temporary variables.

It is time to study an instructive example. Some program that carries out numeric computations would be suitable for this purpose. The program in Listing 3.39 provides an example of such a computation, where both integer and floating-point values are used in the expression to be computed.

Listing 3.39: Use of temporary variables on the example of numeric computations

 #include <stdio.h>

 void main()
 {
         double i, j, s;
         int k, d;
         i = 10; j = 20; k = 30; d = 40;
         s = ((k - 1)*(d - 1))*((i - 1)/(j - 1));
         printf("%f\n", s);
 };

The disassembled code of the main function of this program, obtained using the IDA Pro disassembler, is shown in Listing 3.40.

Listing 3.40: Disassembled code of the main function of the program in Listing 3.39

 .text:00401000 _main          proc near       ; CODE XREF: start + 16E↓p
 .text:00401000       var_2C = qword ptr -2Ch
 .text:00401000       var_24 = dword ptr -24h
 .text:00401000       var_20 = qword ptr -20h
 .text:00401000       var_18 = dword ptr -18h
 .text:00401000       var_14 = dword ptr -14h
 .text:00401000       var_10 = qword ptr -10h
 .text:00401000       var_8  = qword ptr -8
 .text:00401000             push    ebp
 .text:00401001             mov     ebp, esp
 .text:00401003             sub     esp, 24h
 .text:00401006             fld     ds:dbl_408110
 .text:0040100C             fstp    [ebp + var_8]
 .text:0040100F             fld     ds:dbl_408108
 .text:00401015             fstp    [ebp + var_20]
 .text:00401018             mov     [ebp + var_14], 1Eh
 .text:0040101F             mov     [ebp + var_18], 28h
 .text:00401026             mov     eax, [ebp + var_14]
 .text:00401029             sub     eax, 1
 .text:0040102C             mov     ecx, [ebp + var_18]
 .text:0040102F             sub     ecx, 1
 .text:00401032             imul    eax, ecx
 .text:00401035             mov     [ebp + var_24], eax
 .text:00401038             fild    [ebp + var_24]
 .text:0040103B             fld     [ebp + var_8]
 .text:0040103E             fsub    ds:dbl_408100
 .text:00401044             fld     [ebp + var_20]
 .text:00401047             fsub    ds:dbl_408100
 .text:0040104D             fdivp   st(1), st
 .text:0040104F             fmulp   st(1), st
 .text:00401051             fst     [ebp + var_10]
 .text:00401054             sub     esp, 8
 .text:00401057             fstp    [esp + 2Ch + var_2C]
 .text:0040105A             push    offset unk_4080FC
 .text:0040105F             call    _printf
 .text:00401064             add     esp, 0Ch
 .text:00401067             xor     eax, eax
 .text:00401069             mov     esp, ebp
 .text:0040106B             pop     ebp
 .text:0040106C             retn
 .text:0040106C _main          endp

For storing local variables, 36 bytes are allocated (sub esp, 24h). This is 4 bytes more than required for five variables (three 8-byte double values and two 4-byte integers occupy 32 bytes). The compiler has allocated the stack memory for storing a temporary variable, although at first glance it might do without it. This is because it is also possible to use a free register (such as EDX) or leave the result in the EAX register (as will be explained later). Microsoft's compiler tries to avoid using the EBX, EDI, and ESI registers for computations because doing so would make it necessary to take steps for restoring these registers at the end of the function.

The start-up initialization commands occupy addresses from 00401006 to 0040101F. As before, for initializing floating-point variables the compiler uses floating-point constants, ^[6] which are stored in the data segment. In this case, the constant is first loaded into the ST(0) FPU register using the fld command and then stored into the appropriate variable (using the fstp command). Integer variables are initialized by directly loading specific values into them using the mov command.

Next, direct computations start. This stage requires more detailed consideration:

Commands from 00401026 to 0040102F load the k and d variables into registers and prepare them for multiplication. The preparation consists of subtracting one from each of them. Thus, the EAX register will contain the k - 1 value, and ECX will contain the d - 1 value. Then it is possible to carry out multiplication. The imul eax, ecx command is executed, and the multiplication result is placed into the EAX register. In other words, the following operation is executed: (k - 1) * (d - 1) -> EAX. Next, it is necessary to decide where the computation result must be stored. The EAX register at first seems suitable because it appears that this register won't be used in later computations. However, there is a small problem here. The resulting integer value must participate in computations with real numbers, and the FPU load commands (fld and fild) take their operands from memory, not from general-purpose registers. Thus, the compiler made a reasonable decision to use a temporary variable for storing the intermediate result (the intermediate result is stored directly in the stack, in var_24).
Consider further computations. The result of computing the (k - 1) * (d - 1) expression is loaded onto the top of the FPU stack (into the ST(0) register) using the fild command, which converts the integer value to floating-point format. Then the fld command moves the current ST(0) value into ST(1) and loads the i variable into ST(0). Next, the fsub ds:dbl_408100 command (located at the 0040103E address) computes the i - 1 expression, leaving the result in the ST(0) register. The next fld command loads the j variable into ST(0). As this happens (pay attention!), the previous value of ST(0) is moved into ST(1) and the previous value of ST(1) is moved into ST(2). Thus, ST(2) plays the role of a temporary variable. The next fsub command computes the j - 1 value. Then the fdivp st(1), st command carries out division and pops the stack. As a result, the quotient goes into ST(0) and the value in ST(2) moves into ST(1). The fmulp st(1), st command carries out multiplication and pops the stack, which means that the final result goes into ST(0). The last stroke is carried out by the fst [ebp + var_10] command, which corresponds to ST(0) -> s. Note that the fst command stores the value into the variable without popping the stack.
To load a floating-point value into the stack, a well-known technique encountered earlier is used: The sub esp, 8 command, equivalent to the two PUSH commands, prepares the space for a floating-point variable. Then the fstp command (popping the coprocessor stack) places the result of computations into the stack for further use with the printf function.

Thus, temporary variables are used by the compiler for computations. The role of temporary variables can be delegated to general-purpose registers, FPU registers, and stack variables.

Temporary variables are often used when the result of execution of one function is used in another function (Listing 3.41).

Listing 3.41: Temporary variables when the result of executing a function is used in another

 #include <stdio.h>

 int add(int, int);
 int sub(int, int);

 void main()
 {
        int i = 10, j = 20;
        printf("%d\n", add(i, sub(i, j)));
 };

 int add(int a, int b)
 {
        return a + b;
 };

 int sub(int a, int b)
 {
        return a - b;
 };

In the program shown in Listing 3.41, the result of the sub function is used in the add function, and the result of the add function, in turn, is used by the printf function. Listing 3.42 shows the fragment of the disassembled code of this program related to temporary variables.

Listing 3.42: Disassembled code of Listing 3.41 for processing intermediate variables

 .text:00401014        mov     eax, [ebp + var_8]
 .text:00401017        push    eax
 .text:00401018        mov     ecx, [ebp + var_4]
 .text:0040101B        push    ecx
 .text:0040101C        call    sub_401060
 .text:00401021        add     esp, 8
 .text:00401024        push    eax
 .text:00401025        mov     edx, [ebp + var_4]
 .text:00401028        push    edx
 .text:00401029        call    sub_401050
 .text:0040102E        add     esp, 8
 .text:00401031        push    eax
 .text:00401032        push    offset unk_4060FC
 .text:00401037        call    _printf
 .text:0040103C        add     esp, 8

The var_4 and var_8 variables correspond to the i and j variables in the program source code. First, the sub_401060 (sub) function is called. As should be expected, the result of this function is loaded into the EAX register. Later, the EAX register is used as a variable, which is then used as a parameter when calling the add function (sub_401050). Similarly, the result of the add function is loaded into the EAX register and used as a parameter when calling the printf function.

Register Variables

The C programming language makes provision for the register type of variables. Initially, it was assumed that variables declared as register must be stored in registers whenever possible. Contemporary compilers ignore this keyword (although it is considered valid for compatibility). Nowadays, compilers act as they consider expedient, according to the specified optimization options. Consider the example program shown in Listing 3.43. Compile this program using the Microsoft Visual C++ compiler, with the "create compact code" option.

Listing 3.43: Example program illustrating the use of register variables

 #include <stdio.h>

 void main()
 {
         int i, j, s;
         i = 0; j = 1; s = 0;
         for(i = 0; i < 100; i++, j++) s = s + j;
         printf("%d %d %d \n", i, j, s);
 };

The disassembled code of this program is shown in Listing 3.44.

Listing 3.44: Disassembled code of the program shown in Listing 3.43

 .text:00401000 _main         proc near          ; CODE XREF: start + 16E↓p
 .text:00401000               xor     eax, eax
 .text:00401002               push    64h
 .text:00401004               inc     eax
 .text:00401005               xor     ecx, ecx
 .text:00401007               pop     edx
 .text:00401008 loc_401008:                      ; CODE XREF: _main + C↓j
 .text:00401008               add     ecx, eax
 .text:0040100A               inc     eax
 .text:0040100B               dec     edx
 .text:0040100C               jnz     short loc_401008
 .text:0040100E               push    ecx
 .text:0040100F               push    eax
 .text:00401010               push    64h
 .text:00401012               push    offset aDDD ; "%d %d %d \n"
 .text:00401017               call    _printf
 .text:0040101C               add     esp, 10h
 .text:0040101F               xor     eax, eax
 .text:00401021               retn
 .text:00401021 _main         endp

Note that although three local variables are defined in the source program, the stack is not used for storing variables in the resulting code. This is exactly the case in which the compiler has used registers for storing variables. Also note that for code size minimization, the compiler didn't insert a prologue and an epilogue into the main function.

Thus, the ECX register is used for storing the s variable (the xor ecx, ecx command corresponds to s = 0). The xor eax, eax/... /inc eax commands relate to the j variable. As for the i variable, the compiler has introduced an interesting modification to reduce the code size. Instead of incrementing a variable and comparing it with 100, a variable is first assigned the value of 100, and after each iteration the value of this variable is decremented and compared with 0. This approach is easier and faster, because the dec command sets the zero flag itself and no separate comparison command is needed. The role of this register variable is delegated to the EDX register.

Finally, because in the end of the loop there is no variable that would contain the value of 100 (as there should be, according to the source code of the program), the number 100 is simply pushed into the stack using the push 64h command.

^[1]The same relates to the C programming language because this program doesn't use any capabilities introduced with the arrival of the C++ language. However, I won't concentrate on these minor details. I always mean the two most widely used C++ compilers, namely, Microsoft Visual C++ and Borland C++.

^[2]Start-up initialization of the s variable doesn't make sense because the initial value of s is not used anywhere.

^[3]A typical complication is deciding what to do if, for example, information about the string length doesn't correspond to the location of the terminator.

^[4]Here and further on, I use the Delphi compiler supplied as part of Borland Delphi 7.0.

^[5]When optimizing a program for maximum operating speed, even this trick won't help!

^[6]In the C++ language, constants stored in the data segment and having, like variables, strictly defined types, are called type safe constants. Constants used only directly in the program code are called literal constants.