Data identification was covered in Section 1.6; however, there I mainly described Assembly language. Analyzing code written in Assembly language is, on one hand, easier and, on the other hand, more difficult than analyzing code written in a high-level programming language. It is easier because the code the programmer writes is the same code that ends up in the compiled program. It is more difficult because Assembly language places practically no limits on the programmer. Thus, everything depends on the programmer's self-discipline and on the task at hand. If your goal is to confuse any potential investigator of your code, you won't find a better language than Assembly. When you write a program in a high-level programming language, you can't predict exactly what will result after your source code is compiled. Furthermore, most programmers writing in Visual C++ or Delphi never think about what the compiler will produce from their source code. When analyzing such code, investigators must solve the following problems:
"Grind" the specific features of compiler operation
"Squeeze" their way through the programmer's working style
This section concentrates on the topic of identifying the data used in high-level programming languages.
There is a common opinion that global variables are harmful. Nevertheless, programmers have always used them, use them now, and will continue to use them in the future. Therefore, mastering the technique of recognizing global variables is a must.
I'll start investigating the influence of optimization with a simple program written in C++.[1] This program is presented in Listing 3.1. There are three global variables in this program, one of which is not initialized.
Listing 3.1: Simple C++ program containing three global variables, one uninitialized
#include <stdio.h>
int a, b = 20, s = 0;
void main()
{
    a = 10;
    s = a + b;
    printf("%d", s);
};
Consider what the Microsoft Visual C++ (Visual Studio .NET 2003) compiler would produce out of this program. Load the executable module, compiled using the "no optimization" option, into the IDA Pro disassembler. The disassembled code is presented in Listing 3.2. I hope that you won't have any difficulties studying this disassembled text, which I have followed with brief comments.
Listing 3.2: Disassembled code of the program (Listing 3.1) compiled without optimization
.text:00401000 _main proc near                ; CODE XREF: start + 16E↓p
.text:00401000     push ebp
.text:00401001     mov  ebp, esp
.text:00401003     mov  dword_4086E0, 0Ah     ; a = 10
.text:0040100D     mov  eax, dword_4086E0     ; a -> eax
.text:00401012     add  eax, dword_408040     ; a + b -> eax
.text:00401018     mov  dword_4086E4, eax     ; eax -> s
.text:0040101D     mov  ecx, dword_4086E4     ; s -> ecx
.text:00401023     push ecx
.text:00401024     push offset unk_4060FC     ; Formatted printf string
.text:00401029     call _printf
.text:0040102E     add  esp, 8
.text:00401031     xor  eax, eax
.text:00401033     pop  ebp
.text:00401034     retn
.text:00401034 _main endp
Having carefully analyzed Listing 3.2, you'll immediately note the following interesting issues:
IDA Pro has excellently handled the job of recognizing global variables. This is not surprising. The text contains direct references to global variables (dword_4086E0, dword_4086E4, and dword_408040). Assembly commands directly refer to the variable size. Determining the sizes of variables is an important issue related to disassembling. It is not always possible to determine the variable size exactly. Note that the b (dword_408040) variable is located separately from the other two variables. The compiler treats the a (dword_4086E0) and s (dword_4086E4) [2] variables as uninitialized. This topic will be covered in more detail later in this section, when discussing variable size and location (see "Variable Size, Location, and Type").
Even a beginner will immediately note that the compiled text is redundant:
There are the so-called prologue (PUSH EBP/MOV EBP, ESP) and epilogue (POP EBP) of the function. These will be covered in more detail in Section 3.2.1. Both of these elements are redundant in this function, because the EBP register is used for addressing of the stack variables and parameters, which are not present in this program.
The a variable is initialized, then it is used in the addition operation. Because its value is not printed and is not further used, it is possible to use a simple constant instead of the a variable.
Unnecessary memory reservation for the s variable immediately attracts attention. Because the result of addition is loaded into the EAX register, it is most logical to use it as the s variable. In other words, it would be expedient to make s a register variable.
Listing 3.3 presents the disassembled code of the same program (see Listing 3.1) compiled with the "create fast code" option. As you can see, now the code doesn't create any function prologue or epilogue. For the moment, this issue is not the main one.
Listing 3.3: Disassembled code (Listing 3.1) compiled with the "create fast code" option
.text:00401000 _main proc near                ; CODE XREF: start + 16E↓p
.text:00401000     mov  eax, dword_408040     ; b -> eax
.text:00401005     add  eax, 0Ah              ; The sum is here.
.text:00401008     push eax
.text:00401009     push offset unk_4060FC
.text:0040100E     mov  dword_4086E0, 0Ah     ; 10 -> a
.text:00401018     mov  dword_4086E4, eax     ; eax -> s
.text:0040101D     call _printf
.text:00401022     add  esp, 8
.text:00401025     xor  eax, eax
.text:00401027     retn
.text:00401027 _main endp
Consider how the sum (the s variable) is obtained. The summing is carried out by adding the register content and a constant. This operation is carried out much faster than adding the register content and a variable. Pay special attention to the command grouping. First, the values are pushed onto the stack; then, they are followed by two data exchange commands. This approach is based on the Pentium processor properties. It is known as command pairing. Its main idea is that two commands that satisfy specific predefined conditions are executed in parallel, which means that two commands are carried out as a single command. Thus, the compiler has met some of the optimization requirements.
Note | Contemporary Intel-compatible processors have two pipelines for executing instructions, known as the U pipeline and the V pipeline. Under certain circumstances, the processor executes two commands simultaneously, one in each pipeline. As a result, the execution speed can be practically doubled. There are instructions that can be executed only in the U pipeline, and other instructions that can be executed only in the V pipeline. Finally, there are instructions that can be executed in both pipelines. Knowing this, it is possible to group commands to increase the program execution speed as much as possible. Contemporary compilers "know" this processor feature. So, if you encounter an unusual order of instructions in the executable code, you should recall instruction pairing. |
Try to optimize by the code size. Disassembling shows that the change in the code size is minimal (compared with that in Listing 3.3): the ADD ESP, 8 command (which takes 3 bytes) is replaced with the POP ECX/POP ECX pair of commands (each command is 1 byte).
Why did I provide all these examples? My goal wasn't to study optimization techniques (this topic deserves a separate book). I simply wanted to prepare you, and provide a certain theoretical background, for the fact that the code you analyze might look quite unusual as a result of optimization. Nevertheless, I'll return to specific optimization methods many times later on.
Note | The examples provided in this section, among other things, demonstrate that trying to provide better optimization than the compiler does (especially as relates to the execution speed) is not a simple job. The test example (see Listing 3.1) considered in this section is simple. The situation will become more complicated with a real-world Assembly program comprising hundreds of commands. Manual optimization of such programs becomes difficult. Thus, in most cases you'll have to rely on the compiler, especially when dealing with such products as Microsoft Visual C++, long famous for its optimization capabilities. |
When optimizing program code, evaluating the execution time of a specific program fragment becomes the most important issue. The simplest way of achieving this goal is to use two API functions. The first function is QueryPerformanceCounter. Its only argument is a pointer to a LARGE_INTEGER structure. If the function succeeds, this structure receives the current value of the high-resolution performance counter, expressed in counts. The second function is QueryPerformanceFrequency. Its argument is also a pointer to a LARGE_INTEGER structure; however, this time the structure receives the counter frequency, in counts per second. Thus, if t1 and t2 stand for the counter values read at the start and at the end of the program fragment being investigated, respectively, and fr is the counter frequency, then the number of milliseconds required for executing the given program fragment can be computed by the following formula: (t2 - t1) * 1000 / fr. This is only a rough evaluation, because in a multitasking environment exact measurement of the execution time of the chosen program fragment is out of the question.
It is impossible to imagine the C programming language without pointers. Pointers are the quintessence of this programming language and have determined its fate. Instead of operating on a variable, it is possible to operate on the pointer to that variable. To operate on pointers, compilers use indirect addressing. This fact, however, is self-evident. If s is some pointer to data, then the MOV EDX, s command allows the data to be accessed through [EDX]: for example, the MOV EAX, [EDX] command moves a 4-byte value from the data area into the EAX register.
Listing 3.4 demonstrates a sample program, in which one of the global variables is defined by a pointer. The disassembled listing of this program is provided in Listing 3.5.
Listing 3.4: Sample program, in which one global variable is defined using a pointer
#include <stdio.h>
#include <stdlib.h>
int a, b = 20;
int *s;
void main()
{
    s = (int*)malloc(4);
    a = 10;
    *s = a + b;
    printf("%d", *s);
    free(s);
};
Listing 3.5: Disassembled code of the program presented in Listing 3.4
.text:00401000 _main proc near                ; CODE XREF: start + 16E↓p
.text:00401000     push ebp
.text:00401001     mov  ebp, esp
.text:00401003     push 4                     ; Reserve 4 bytes.
.text:00401005     call _malloc
.text:0040100A     add  esp, 4                ; Clear the stack.
.text:0040100D     mov  dword_4086C0, eax     ; This variable contains a pointer.
.text:00401012     mov  dword_4086C4, 0Ah     ; a = 10
.text:0040101C     mov  eax, dword_4086C4     ; a -> eax
.text:00401021     add  eax, dword_408040     ; a + b -> eax
.text:00401027     mov  ecx, dword_4086C0     ; ECX contains the pointer address.
.text:0040102D     mov  [ecx], eax            ; The sum is located at the address referenced by the pointer.
.text:0040102F     mov  edx, dword_4086C0     ; Pointer -> edx
.text:00401035     mov  eax, [edx]            ; Sum -> eax
.text:00401037     push eax                   ; The sum is pushed into the stack.
.text:00401038     push offset unk_4060FC     ; The formatted string
.text:0040103D     call _printf
.text:00401042     add  esp, 8
.text:00401045     mov  ecx, dword_4086C0     ; Pointer -> ecx.
.text:0040104B     push ecx
.text:0040104C     call _free                 ; Release the pointer.
.text:00401051     add  esp, 4
.text:00401054     xor  eax, eax
.text:00401056     pop  ebp
.text:00401057     retn
.text:00401057 _main endp
Note that Listing 3.5 uses indirect addressing twice (through the ECX and EDX registers). The second case (through EDX) looks strange because ECX already contains the address stored in the s variable. Why use EDX? This is a rhetorical question. After all, I compiled the program having specified that no optimization was needed. Thus, the compiler has simply generated one fragment for writing through the pointer and another fragment for reading through the pointer.
What conclusions can be drawn on the basis of this material? If you notice indirect addressing when briefly viewing disassembled code, this indicates that pointers are most likely present in the program being investigated.
Consider an intricate issue: How do you distinguish the address of some global variable from a normal constant?
Consider the example program shown in Listing 3.6. Variables a, b, and c are assigned the values of some numeric constants, then the standard printf library function is used to output them to the console. The C program is correct and unambiguous; it cannot be interpreted incorrectly. However, consider how IDA Pro interprets the executable code of this program (Listing 3.7).
Listing 3.6: Distinction between an address of a global variable and a normal constant
#include <stdio.h>
int a, b, c;
void main()
{
    a = 10;
    b = 20;
    c = 0x4086d0;
    printf("%d %d %d\n", a, b, c);
};
Listing 3.7: Disassembled code of the program shown in Listing 3.6
.text:00401000 _main proc near                ; CODE XREF: start + 16E↓p
.text:00401000     push ebp
.text:00401001     mov  ebp, esp
.text:00401003     mov  dword_4086C8, 0Ah
.text:0040100D     mov  dword_4086C0, 14h
.text:00401017     mov  dword_4086C4, offset unk_4086D0
.text:00401021     mov  eax, dword_4086C4
.text:00401026     push eax
.text:00401027     mov  ecx, dword_4086C0
.text:0040102D     push ecx
.text:0040102E     mov  edx, dword_4086C8
.text:00401034     push edx
.text:00401035     push offset aDDD           ; "%d %d %d\n"
.text:0040103A     call _printf
.text:0040103F     add  esp, 10h
.text:00401042     xor  eax, eax
.text:00401044     pop  ebp
.text:00401045     retn
.text:00401045 _main endp
Consider Listing 3.7, obtained using IDA Pro, more carefully. The dword_4086C8 label is the a variable, dword_4086C0 corresponds to the b variable, and dword_4086C4 stands for the c variable. What does this mean? The dword_4086C4 variable is loaded with the address of the unk_4086D0 memory cell. Why? What is the role of this cell? The number 0x4086D0 is simply a constant! However, IDA Pro considered this number to be an address. Strange! It should be pointed out that the unk_ prefix means that the disassembler has doubts and is not sure what is hidden at that address. However, the disassembler's doubts do not matter! In the course of analysis, you must draw an unambiguous conclusion. In this example, the text is simple; therefore, it is not difficult to make the right decision. You must not have any doubts, even though IDA Pro has some. However strange this might seem at first, in this situation the W32Dasm disassembler has done a good job. This is not because of its outstanding capabilities in the field of recognizing addresses and constants. This is because of its lack of such capabilities, which causes this disassembler to interpret everything (or practically everything) as constants.
What would happen if the c variable were assigned the 0x4086C0 value? I hope that you have already guessed. In this case, IDA Pro will obtain additional confirmation that this is an address of some variable. Instead of the mov dword_4086C4, offset unk_4086D0 command, another command would appear in the listing: mov dword_4086C4, offset dword_4086C0. Thus, the disassembler no longer doubts that it is dealing with a variable. However, you know that this is not so. Furthermore, you will easily draw the right conclusion using the disassembled listing.
However, another problem remains. Any disassembler is a program; therefore, it needs a strict criterion that can be implemented algorithmically. In the case being considered, there are no commands that would confirm (or refute) the assumption that it is dealing with an address except for the range. Falling into this range makes a constant a candidate for being an address. What would this range be in the case in question? Everything is straightforward here. First, there is a range of addresses allocated for the data. IDA Pro considers a constant falling into this range one of the indications of a data address. However, there also is the range of code addresses. For example, if a constant is equal to 0x401000, the disassembler would consider that it deals with the address of the _main function. Note that in this case IDA Pro will not "suspect" that the constant represents an address; it would be sure that this is an address.
What conclusion can be drawn on the basis of these considerations? My conclusion is de omnibus dubitandum. In other words, you can never be certain of anything. If the suspicious constant is treated like an address — for instance, using the LEA command — then it is possible to speak about an address with greater certainty. Furthermore, if you notice that the constant is then used in indirect addressing or as a function parameter (which represents an address by definition), then you shouldn't have any doubts.
Long ago, when MS-DOS was dominant, I encountered in a Pascal manual a statement declaring that the use of 1-byte variables instead of 2-byte ones speeds up program operation. I doubted this statement, so I investigated the Assembly code of such a program. It turned out that the statement was far from true. Has the situation changed? How do 32-bit operating systems behave? Is there any practical advantage in using 1- and 2-byte variables instead of 4-byte ones? Where are variables located, and how can disassemblers determine their size? All of these questions will be answered in this section.
To begin the investigation, recall the material provided in Section 1.1.3. Consider the code fragment shown in Listing 3.8 (it is similar to the one shown in Listing 1.2).
Listing 3.8: Fragment of the test C program for studying variable size, location, and type
BYTE e = 0xab;
WORD c = 0x1234;
DWORD b = 0x34567890;
If you view the memory, you will discover that all variables are aligned on a boundary that is a multiple of four. However, it turns out that this alignment is only due to the order in which these variables were declared. For example, consider code in which the variables are declared in a different order (Listing 3.9).
Listing 3.9: Fragment of Listing 3.8, in which variables are declared in a different order
WORD c = 0x1234;
BYTE e = 0xab;
DWORD b = 0x34567890;
In this case, the compiler will place variables in memory so that the first two variables will be located in two neighboring words. The b variable will be aligned by the 4-byte boundary, as in the previous case. There are optimal rules for the alignment of different data types. Table 3.1 outlines information about the alignment of data of different sizes.
Data size | Alignment |
---|---|
1 byte | 1 (no alignment) |
2 bytes | 2 |
4 bytes | 4 |
6 bytes | 8 |
8 bytes | 8 |
10 bytes | 16 |
16 bytes | 16 |
Consider another example (Listing 3.10).
Listing 3.10: Simple example illustrating optimal alignment of data of different sizes
#include <stdio.h>
#include <windows.h>
WORD b = 10;
BYTE a;
DWORD c;
void main()
{
    a = 10;
    c = 30;
    printf("%d %d %d\n", a, b, c);
};
This is a simple example. However, even here there is a particular feature that will be helpful for investigating some patterns of memory allocation for different variables. The a and c variables are not initialized. They are assigned their values directly in the program text. The b variable is initialized. Is there any difference among these variables? As it turns out, there is. Compile the program using the Microsoft Visual C++ compiler, then disassemble the resulting executable module using IDA Pro. Analyze the resulting listings, and you'll find that in IDA Pro all variables are located in the .data section. However, recall the material provided in Section 1.5.3, where it was explained that initialized variables must be placed into the .data section and uninitialized ones must be added to the .bss section. Curiously, listings produced by IDA Pro clearly show that although all variables are located within the same section, they are placed into different parts of that section: First, there is the initialized variable; then, after a long-enough interval, there are the two uninitialized variables.
To understand the reason behind such behavior, compile the program with the /FAs command-line option. An intermediate Assembly listing will be generated in the course of compiling. View this listing, and you'll discover an interesting phenomenon: Two segments are present in the listing, one with the _data name (containing the initialized variable) and another one called _bss (containing the uninitialized variables). Later, these segments must be transformed into the appropriate sections. However, the compiler knows the names of the _bss and _data segments and later combines them into the single .data section. With all that being so, the data located in the _bss segment always follow the data from the _data segment. To check this statement, write a simple Assembly program containing two data segments (_bss and _data). After linking, only one data section called .data will remain.
However, if you slightly change the segment names, for example, replacing _bss with _bss1, then the executable module will have two sections: .data and _bss1 (including the underscore character). After checking this statement for the Microsoft Visual C++ compiler, test the behavior of other compilers. In my experiments, compiling the program from Listing 3.10 using Borland C++ v. 5.00 showed that this compiler behaves similarly. In this case, the Assembly code contained two data segments, with the _data and _bss names.
Consider another issue. How is it possible to determine the variable size in the course of disassembling? The generalized answer to this question is as follows: This goal can be achieved by analyzing the commands that operate over a specific variable. This answer is self-evident because the variable behaves in a specific way depending on the operations that are carried out over it. Recall the material provided in Section 1.4, where the format of the Intel microprocessor commands was described. For instance, consider a simple operation that assigns some integer value to a numeric variable. In C, this operation appears, for example, as follows: b = 10. Accordingly, an Assembly command in general will appear as follows: MOV [mem], 10. However, you know that in Assembly such operations require the variable type to be specified explicitly (for example, byte ptr). This requirement is well-grounded. There is a significant difference between placing the number 10 into a WORD variable and placing it into a DWORD variable. Because there is a significant difference in the mathematics, there also must be a difference in the command format.
Consider the complete command codes for the commands assigning values to variables of three types: BYTE, WORD, and DWORD (Listing 3.11).
Listing 3.11: Complete codes of commands assigning values to three types of variables
C605 C8864000 14            MOV byte ptr [04086C8], 20
66 C705 C8864000 0A00       MOV word ptr [04086C8], 10
C705 C4864000 1E000000      MOV dword ptr [04086C4], 30
Note that MOD R/M bytes for all three commands are identical. The reason is clear: The first operand is an offset for all three commands. Curiously, the code of the command operating over a WORD operand differs by the presence of the 66H prefix from the code of the command operating over the DWORD operand. This prefix specifies that the operand has the WORD type, not the DWORD type. The command, in which the first operand has the BYTE type, has its individual code. Thus, it becomes clear how the disassembler obtains information about the variable size: It simply analyzes the program code.
Until now, floating-point numbers have not been covered. Now it is time to consider them. Consider the program shown in Listing 3.12.
Listing 3.12: Simple program for investigating the behavior of floating-point variables
#include <stdio.h>
#include <windows.h>
double s, d;
int i;
void main()
{
    s = 0.00;
    d = 1.034;
    for(i = 0; i < 100; i++)
        s = s + i/d;
    printf("%f\n", s);
};
As you can see, the program in Listing 3.12 has two double variables. Recall the material provided in Section 1.1.3 — to be precise, in its "Real Numbers" subsection, where floating-point numbers were described. The format of double numbers used in the C++ language corresponds to the format of long floating-point numbers supported by the Intel microprocessor, or, to be precise, by its FPU (see Section 1.2.3). Listing 3.13 contains the disassembled code of the main function from Listing 3.12.
Listing 3.13: Disassembled code of the main function from Listing 3.12
.text:00401000 _main proc near                ; CODE XREF: start + 16E↓p
.text:00401000 var_8 = qword ptr -8
.text:00401000
.text:00401000     push ebp
.text:00401001     mov  ebp, esp
.text:00401003     fld  ds:dbl_408108
.text:00401009     fstp dbl_40A9D0
.text:0040100F     fld  ds:dbl_408100
.text:00401015     fstp dbl_40A9C0
.text:0040101B     mov  dword_40A9C8, 0
.text:00401025     jmp  short loc_401034
.text:00401027 loc_401027:                    ; CODE XREF: _main + 55↓j
.text:00401027     mov  eax, dword_40A9C8
.text:0040102C     add  eax, 1
.text:0040102F     mov  dword_40A9C8, eax
.text:00401034
.text:00401034 loc_401034:                    ; CODE XREF: _main + 25↑j
.text:00401034     cmp  dword_40A9C8, 64h
.text:0040103B     jge  short loc_401057
.text:0040103D     fild dword_40A9C8
.text:00401043     fdiv dbl_40A9C0
.text:00401049     fadd dbl_40A9D0
.text:0040104F     fstp dbl_40A9D0
.text:00401055     jmp  short loc_401027
.text:00401057 ;-----------------------------------------------------------------
.text:00401057
.text:00401057 loc_401057:                    ; CODE XREF: _main + 3B↑j
.text:00401057     fld  dbl_40A9D0
.text:0040105D     sub  esp, 8
.text:00401060     fstp [esp + 8 + var_8]
.text:00401063     push offset unk_4080FC
.text:00401068     call _printf
.text:0040106D     add  esp, 0Ch
.text:00401070     xor  eax, eax
.text:00401072     pop  ebp
.text:00401073     retn
.text:00401073 _main endp
The disassembled listing created by IDA Pro deserves special comments:
For the moment, skip the strange var_8 variable, which will be considered later. Also, skip the function prologue. The four commands that follow the prologue are interesting. They represent nothing but the assignment of initial values to the s and d variables. For this purpose, the compiler has reserved places for two floating-point constants (dbl_408108 and dbl_408100) beforehand. Using a sequence of two commands (fld and fstp), the constant is loaded into an appropriate variable (these commands can be found in Table 1.19). Both constants and variables (dbl_40A9D0 and dbl_40A9C0) take 8 bytes, which is quite natural. The next command, resetting the dword_40A9C8 integer variable to zero, is self-evident. It simply assigns an initial value to the loop counter.
Next, there is a jump into the loop body to the loc_401034 label. Before this label, there are three commands, which are intended to increase the loop counter (i++). Therefore, these commands are skipped on the first pass. A possible exit from the loop is checked by the cmp dword_40A9C8, 64h/jge short loc_401057 commands. Naturally, 64h corresponds to 100.
Then, there are four commands whose goal can be guessed from the code of the source program. They correspond to s = s + i/d. The algorithm implemented by these commands is as follows: The fild dword_40A9C8 command loads the integer loop counter into the top of the coprocessor stack, ST(0). The next command, fdiv, divides the loop counter by the dbl_40A9C0 variable (this is d). Then, the fadd command adds the dbl_40A9D0 variable, where the sum is accumulated, to the division result. Finally, because the addition result is located in the coprocessor stack, the fstp command is used to place it into the dbl_40A9D0 variable. The coprocessor stack is popped, which means that, for example, the content of ST(1) is moved to ST(0). After that, an unconditional jump returns control to the start of the loop.
Then, there is the call to the printf function. It is necessary to push a floating-point number into the stack. This is an instructive technique. The fld dbl_40A9D0 command pushes the computed sum into the coprocessor stack. The next command, sub esp, 8, reserves space in the stack for an 8-byte value. This command is equivalent to two push commands. Then, the fstp [esp + 8 + var_8] command places the sum from the coprocessor stack into the normal stack. The next push command sends the formatted string into the stack.
The case just considered, in which initial values of floating-point variables are stored in constants and then loaded into variables, is practiced by the Microsoft Visual C++ compiler. The Borland C++ compiler uses another technique, which is less illustrative. The disassembled code of the executable module produced by the Borland C++ compiler is shown in Listing 3.14.
Listing 3.14: Disassembled code of the executable module produced by Borland C++
.text:0040111B     mov  dword ptr dbl_40C2C4, 95810625h
.text:00401127     mov  dword ptr dbl_40C2C4 + 4, 3FF08B43h
As you can see, two strange constants are loaded into the memory. From this listing, it is hardly possible to determine that this is a floating-point number and then to determine that number. When analyzing this code, it is impossible to do without the information provided in Section 1.1.3. The same problem is also encountered in Microsoft Visual C++, provided that you operate over float variables. Such variables are short real numbers taking only 32 bits. Therefore, a normal mov command is used for assigning this type of value to a variable. However, for any operations over such variables, coprocessor commands are used. Thus, I strongly recommend that you gain a sound understanding of the structure of real numbers (Section 1.1.3).
Thus, if you encounter FPU commands, you must immediately understand that it will be necessary to spend time investigating floating-point variables.
When dealing with an integer variable, it is important to discover whether it is signed or unsigned. For example, how would you distinguish int variables from unsigned int (DWORD) ones? The general principle is as follows: Analyze the operations over the variables of interest and, on the basis of this analysis, determine their types. A more specific method of determining the type of integer variables is analysis of the conditional constructs, in which they participate. For example, the JL conditional jump command is used for comparing signed numbers, and the JB command is its analogue used for unsigned numbers.
It only remains to answer a single question: Will any performance gain be obtained if you use integer variables smaller than 4 bytes? The answer to this question consists of the following issues:
Using shorter variables allows you to economize on memory.
However, it complicates the algorithm in the compiled code, because 32-bit variables must be used in the program anyway. Complication of the algorithm slows down the execution and results in the growth of the required memory.
Programming languages interpret string data types as sequences of encoded characters. As a rule, ASCII encoding is used. When using this type of encoding, 1 byte is allocated for encoding each character. Nowadays, Unicode encoding is gaining popularity. When using this type of encoding, 2 bytes are allocated for encoding a single character.
Strings look much like arrays. The difference between them is that the string structure contains information that can be used to easily determine its length. There are two different approaches to solving this problem.
The end of the string must be marked in some way. Some specific code can be used for this purpose, made up of 1 or more bytes. In C, the NULL (zero) code is traditionally used for this purpose (it should not be confused with the character for the digit 0). When using Unicode, strings are terminated by two bytes with the zero code. In addition, some contemporary compilers can terminate strings with an entire sequence of seven 0 bytes, thus adapting strings for processing in double-word blocks. Taking into account growing memory resources, this approach doesn't seem too wasteful. This mechanism is characterized by the following two drawbacks:
To discover the string length, it is necessary to view the entire string, no matter how long it might be. Furthermore, all string operations must be based on checking for the presence of the string terminating character, which makes these operations somewhat slower.
When this approach is used, 0 bytes cannot be used directly within a string.
Information about the string length (or about its end) must be stored somewhere within the string. Using the starting bytes of the string for this purpose is a natural approach. For example, this approach is used in Pascal and in Delphi. This might be only a single byte, in which case the string might not be longer than 255 characters. In Delphi, however, it is possible to create strings with a 4-byte length field. In this case, the maximum possible string length is comparable to the amount of the address space allocated to a process under the Windows operating system.
In addition to the two preceding approaches, it is possible to use a combined approach. In this case, the string length is specified before the string but the string terminator marks its end. This approach is convenient for compatibility. However, because of its redundancy, it is a constant source of headaches for programmers. [3]
Note: Programmers with experience in MS-DOS programming will immediately recall function 9 of the int 21h interrupt, which outputs a character string to the screen. This system procedure used the dollar sign ($) as a terminator. This terminator is inconvenient and fell out of use long ago.
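The two basic layouts described above can be compared in a short C sketch. The function names are hypothetical, and the length-prefixed variant assumes the 1-byte count used by short Pascal strings.

```c
#include <assert.h>
#include <stddef.h>

/* Terminator approach: the whole string is scanned until the zero byte. */
size_t cstr_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Length-prefix approach with a 1-byte count (at most 255 characters):
   the length is read directly from the first byte. */
size_t pstr_length(const unsigned char *p)
{
    return p[0];
}

void demo_string_layouts(void)
{
    const char c_style[] = "Hello";   /* 'H','e','l','l','o',0 */
    const unsigned char pascal_style[] = { 5, 'H', 'e', 'l', 'l', 'o' };
    assert(cstr_length(c_style) == 5);
    assert(pstr_length(pascal_style) == 5);
}
```

The first function takes time proportional to the string length, while the second is a single memory read. This is the trade-off discussed above.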
To begin investigation of the string data type, consider a simple example of using Unicode strings (Listing 3.15).
Listing 3.15: Simple example illustrating the use of Unicode strings
#include <stdio.h>
wchar_t s[] = L"Hello, programmer!";
wchar_t f[] = L"%s\n";
void main()
{
  wprintf(f, s);
};
Recall that wchar_t is the wide-character type used for Unicode strings, the L prefix marks a string literal as a wide (Unicode) literal, and wprintf is the function for console output of Unicode strings (an analogue of the printf function used for console output of ASCII strings). Note that the format string (f) for the wprintf function also must be in Unicode encoding. Consider how IDA Pro disassembles the call to the wprintf function (Listing 3.16).
Listing 3.16: Disassembled listing of the call to the wprintf function
.text:00401003 push offset aHelloProgramme ; "Hello, programmer!"
.text:00401008 push offset aS ; "%s\n"
.text:0040100D call _wprintf
This is great, isn't it? IDA Pro has done an excellent job recognizing a Unicode string. Here are these strings as they appear in the data section (Listing 3.17).
Listing 3.17: Unicode strings from Listing 3.15 as they appear in the data section
.data:00409040 aHelloProgramme: ; DATA XREF: _main + 3↑o
.data:00409040 unicode 0, <Hello, programmer!>, 0
If desired, you can press the <A> key to convert this string into the sequence of ASCII characters. You'll then discover that the codes of ASCII characters belonging to the range from 0 to 127 are converted to Unicode without changes by adding a most significant 0 byte (complementing a byte with a word). Thus, conversion of an English text from ASCII to Unicode is a trivial task.
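The conversion rule just described (append a zero high byte to every ASCII code) can be sketched as follows. The function name is illustrative, and the sketch handles only codes below 128.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widens a 7-bit ASCII string into 16-bit code units, terminator included.
   dst must have room for strlen(src) + 1 units.
   Returns the number of characters written, not counting the terminator. */
size_t ascii_to_utf16(const char *src, uint16_t *dst)
{
    size_t i = 0;
    for (; src[i] != '\0'; i++) {
        assert((unsigned char)src[i] < 128);      /* plain ASCII only */
        dst[i] = (uint16_t)(unsigned char)src[i]; /* high byte becomes 0 */
    }
    dst[i] = 0;   /* 16-bit terminator, i.e. two 0 bytes */
    return i;
}
```

For the text "Hi!" the result is the code units 0048h, 0069h, 0021h followed by 0000h.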
The next example (Listing 3.18) relates to Delphi. [4]
Listing 3.18: Example illustrating the use of Delphi strings
var
  s1:widestring;
  s2:string; {by default this is an AnsiString}
  s3:shortstring;
begin
  s1 := 'Hello world!';
  s2 := 'Hello programmers!';
  s3 := 'Hello hackers!';
  writeln(s1);
  writeln(s2);
  writeln(s3);
end.
The program in Listing 3.18 uses three types of strings available in Delphi. What would you see when analyzing the disassembled code produced by IDA Pro? What could be more interesting than programming, except for investigation of the executable code?
Compile this program, load it into IDA Pro, and analyze it automatically. Then, try to find the strings of interest in the Strings window. Strangely, only the Hello world! string can be found there. Hope remains that other strings are near, so you'd be able to find them quickly. This hope is not vain. Here is the code fragment that you needed (Listing 3.19).
Listing 3.19: Code fragment containing strings from the program in Listing 3.18
CODE:0044CC4D align 10h
CODE:0044CC50 dd 18h
CODE:0044CC54 aHelloWorld:
CODE:0044CC54 ; DATA XREF: sub_44CBAC + 21↑o
CODE:0044CC54 unicode 0, <Hello world!>, 0
CODE:0044CC6E align 10h
CODE:0044CC70 dd 0FFFFFFFFh, 12h
CODE:0044CC78 aHelloProgramme db 'Hello programmers!', 0
CODE:0044CC78 ; DATA XREF: sub_44CBAC + 30↑o
CODE:0044CC8B align 4
CODE:0044CC8C dword_44CC8C dd 6C65480Eh, 68206F6Ch, 656B6361h, 217372h
CODE:0044CC8C ; DATA XREF: sub_44CBAC + 3A↑o
Very well! The disassembler has recognized the s2 string (Listing 3.18). It hasn't placed it into the Strings window; however, this is a minor drawback. It would be interesting to find out what is located at the 0044CC8C address, because the program code also refers to that block. Move the cursor to that line and press the <A> key (it is also possible to use the Options | Ascii string style menu commands and click the Pascal style button in the dialog box that appears). Then the wonder happens (Listing 3.20).
Listing 3.20: Fragment of the disassembled test program (Listing 3.18) with the s3 string
CODE:0044CC8C aHelloHackers db 14, 'Hello hackers!'
CODE:0044CC8C ; DATA XREF: sub_44CBAC + 3A↑o
CODE:0044CC9B db 0
As you can see, the third string also has been discovered. Why didn't the disassembler find it immediately? To all appearances, the cause lies in the byte with the value 14, to which the reference points. This is the string length byte. The disassembler, when analyzing the reference, assumed that it points to the start of the text, and text cannot begin with a character whose code is 14. In principle, this assumption was correct; the disassembler simply never guessed that this byte holds the string length.
Thus, it becomes possible to draw conclusions. In case of a short string (shortstring), the reference points to the string length byte. By the way, pay attention that the string is terminated by the NULL character, which is not taken into account when computing the string length (which is correct).
Now consider the two other strings. The string located at the 0044CC78 address is also null-terminated, and the reference again points to the start of the text. What about the string length? This issue is interesting. The string is preceded by two 4-byte values. The second of them, 12h (18 decimal), is the string length, so 4 bytes are allocated for it. The first, 0FFFFFFFFh, is the so-called reference count. Thus, for strings of this type the reference points directly to the string contents, and the text itself is preceded by 8 bytes of auxiliary information.
The last string type is Unicode. The Unicode string starts at the 0044CC54 address. In contrast to the previous case, the string structure includes a 4-byte length but no reference count. Note that the length field holds 18h (24 decimal), which is the size of the 12-character text in bytes, not the number of characters. As before, the reference from the program code points to the string contents, which is why the disassembler located this string. The string is terminated by two 0 bytes.
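The AnsiString layout deduced above can be modeled in C. This is only a sketch of the layout as it appears in Listing 3.19 (reference count, length, text, terminator); the helper names are hypothetical and nothing here is Delphi's own API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value stored `back` bytes before the string contents. */
static int32_t read_before(const char *contents, size_t back)
{
    int32_t v;
    memcpy(&v, contents - back, sizeof v);
    return v;
}

/* Layout assumed from Listing 3.19: [refcount:4][length:4][chars...][0]. */
int32_t ansi_length(const char *contents)   { return read_before(contents, 4); }
int32_t ansi_refcount(const char *contents) { return read_before(contents, 8); }

/* Builds the same layout in a caller-supplied buffer and returns a pointer
   to the contents, which is where a code reference would point.
   buf must hold at least strlen(text) + 9 bytes. */
char *make_ansi(char *buf, const char *text, int32_t refcount)
{
    assert(buf != NULL && text != NULL);
    int32_t len = (int32_t)strlen(text);
    memcpy(buf, &refcount, 4);
    memcpy(buf + 4, &len, 4);
    memcpy(buf + 8, text, (size_t)len + 1);   /* copy the terminator too */
    return buf + 8;
}
```

With the text 'Hello programmers!' and a reference count of -1, the length read back is 18 (12h), matching the two double words that precede the string in the listing.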
To conclude the discussion of strings, consider the simple test program shown in Listing 3.21. Compile this program using Microsoft Visual C++.
Listing 3.21: Simple C program illustrating string operations
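#include <stdio.h>
#include <string.h>
char s[] = "Good-bye!";
void main()
{
  strcat(s," My love!");
  printf("%s\n", s);
}
The disassembled code of the program presented in Listing 3.21 is shown in Listing 3.22. Note that the s array holds only 10 bytes, so the strcat call writes past its end; the program is intended only for studying the generated code.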
Listing 3.22: Disassembled code of the program shown in Listing 3.21
.text:00401000 _main proc near ; CODE XREF: start + 16E↑p
.text:00401000 push ebp
.text:00401001 mov ebp, esp
.text:00401003 push offset aMyLove ; char *
.text:00401008 push offset aGoodBye ; char *
.text:0040100D call _strcat
.text:00401012 add esp, 8
.text:00401015 push offset aGoodBye ; "Good-bye!"
.text:0040101A push offset aS ; "%s\n"
.text:0040101F call _printf
.text:00401024 add esp, 8
.text:00401027 xor eax, eax
.text:00401029 pop ebp
.text:0040102A retn
.text:0040102A _main endp
Listing 3.22 is straightforward and needs no special comments. It should only be mentioned that IDA Pro recognizes both strings without difficulty.
Introduce a small modification into the program shown in Listing 3.21. Make the s variable local by moving its definition into the main function. After compiling the program and disassembling its code, you'll obtain an unusual disassembled code (Listing 3.23).
Listing 3.23: Disassembled code of the modified program (Listing 3.21)
.text:00401000 _main proc near ; CODE XREF: start + 16E↑p
.text:00401000 var_C = byte ptr -0Ch
.text:00401000 var_8 = dword ptr -8
.text:00401000 var_4 = word ptr -4
.text:00401000
.text:00401000 push ebp
.text:00401001 mov ebp, esp
.text:00401003 sub esp, 0Ch
.text:00401006 mov eax, ds:dword_4060FC
.text:0040100B mov dword ptr [ebp + var_C], eax
.text:0040100E mov ecx, ds:dword_406100
.text:00401014 mov [ebp + var_8], ecx
.text:00401017 mov dx, ds:word_406104
.text:0040101E mov [ebp + var_4], dx
.text:00401022 push offset aMyLove ; char *
.text:00401027 lea eax, [ebp + var_C]
.text:0040102A push eax ; char *
.text:0040102B call _strcat
.text:00401030 add esp, 8
.text:00401033 lea ecx, [ebp + var_C]
.text:00401036 push ecx
.text:00401037 push offset aS ; "%s\n"
.text:0040103C call _printf
.text:00401041 add esp, 8
.text:00401044 xor eax, eax
.text:00401046 mov esp, ebp
.text:00401048 pop ebp
.text:00401049 retn
.text:00401049 _main endp
Consider Listing 3.23 more carefully. The code is unusual. The disassembler has identified only one of the two strings (the literal). The first parameter of the strcat function, however, is the address of a string that the disassembler failed to locate. This can be stated with certainty because strcat is a well-known library function. What about the commands at addresses 00401006 through 0040101E? What do they mean? They move 10 bytes of data into the stack area (recall that a local string must be stored in the stack). The string in question is exactly 10 bytes in size (counting the 0 byte). Thus, this is how the compiler copies the initial value of the string from the data section to the stack. Consider the memory address 004060FC, where the copied block starts. Here is this block (Listing 3.24).
Listing 3.24: Memory block passed to the stack
.rdata:004060FC dword_4060FC dd 646F6F47h ; DATA XREF: _main + 6↑r
.rdata:00406100 dword_406100 dd 6579622Dh ; DATA XREF: _main + E↑r
.rdata:00406104 word_406104 dw 21h ; DATA XREF: _main + 17↑r
Press the <A> key and convert the block to the ASCII format. After that, the "lost" string will be found. The conclusion is easy and straightforward: The disassembler failed to locate one of the strings because the compiler treated it simply as a block of data.
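What the <A> key does to this block can be reproduced by hand: x86 stores the least significant byte first, so each value is unpacked starting with its low byte. The sketch below uses the three constants from Listing 3.24; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Writes the low nbytes of value to out, least significant byte first. */
static void put_le(uint32_t value, size_t nbytes, char *out)
{
    assert(nbytes <= 4);
    for (size_t i = 0; i < nbytes; i++)
        out[i] = (char)((value >> (8 * i)) & 0xFF);
}

/* Rebuilds the 10-byte block from Listing 3.24: two dwords and one word. */
void rebuild_block(char out[10])
{
    put_le(0x646F6F47u, 4, out);      /* 47 6F 6F 64 = 'G' 'o' 'o' 'd' */
    put_le(0x6579622Du, 4, out + 4);  /* 2D 62 79 65 = '-' 'b' 'y' 'e' */
    put_le(0x0021u,     2, out + 8);  /* 21 00       = '!' and the 0 byte */
}
```

After the call, the buffer holds the null-terminated text Good-bye!, which is the "lost" string.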
As shown in the previous section, although strings have a structure that allows you to determine the data size, even such a powerful disassembler as IDA Pro is not always capable of recognizing a string, to speak nothing about arrays. This is because the array size is not explicitly specified in the structure. There are difficulties related to determining the array size. However, arrays can be clearly identified. Consider a simple example. In the program shown in Listing 3.25, an integer array is filled with integer numbers ranging from zero to nine. After compiling this program using Microsoft Visual Studio and loading the executable code into IDA Pro, the disassembled code shown in Listing 3.26 will be obtained.
Listing 3.25: Simple C program for investigating array identification in the executable code
#include <stdio.h>
int a[10];
void main()
{
  for(int i = 0; i < 10; i++)
    a[i] = i;
};
Listing 3.26: Disassembled code of the program shown in Listing 3.25
.text:00401000 _main proc near ; CODE XREF: start + 16E↑p
.text:00401000 var_4 = dword ptr -4
.text:00401000 push ebp
.text:00401001 mov ebp, esp
.text:00401003 push ecx
.text:00401004 mov [ebp + var_4], 0
.text:0040100B jmp short loc_401016
.text:0040100D loc_40100D: ; CODE XREF: _main + 29↓j
.text:0040100D mov eax, [ebp+var_4]
.text:00401010 add eax, 1
.text:00401013 mov [ebp + var_4], eax
.text:00401016 loc_401016: ; CODE XREF: _main + B↑j
.text:00401016 cmp [ebp + var_4], 0Ah
.text:0040101A jge short loc_40102B
.text:0040101C mov ecx, [ebp + var_4]
.text:0040101F mov edx, [ebp + var_4]
.text:00401022 mov dword_4072C0[ecx*4], edx
.text:00401029 jmp short loc_40100D
.text:0040102B loc_40102B: ; CODE XREF: _main + 1A↑j
.text:0040102B xor eax, eax
.text:0040102D mov esp, ebp
.text:0040102F pop ebp
.text:00401030 retn
.text:00401030 _main endp
You have already encountered this method of loop organization in Listing 3.13. As you have certainly guessed, var_4 is a stack variable: the loop counter. Pay special attention to the mov dword_4072C0[ecx*4], edx command, which is the key to understanding the operating logic of this program. There is no doubt that this is an array: dword_4072C0 is the start of the array, ecx contains the current index value, and the scaling coefficient of four indicates that each element is 4 bytes in size. The array size in this program can be clearly identified. However, you should not assume that the number of array elements always equals the number of iterations in the loop that processes the array. The programmer might use different parts of the array in different sections of the program, and these fragments need not begin at the starting point of the array. Thus, all that can be stated with high probability is that the array size is no less than the value found.
Some problems might arise when using arrays in functions. The argument accepted by the function is simply a pointer. This pointer might be passed farther through a sequence of functions. Assume that in the last function you see some parameter used as a pointer to an array. To locate that array, you'll have to traverse the entire sequence of functions in the reverse direction, which would require time and patience. In such situations, it is better to use the debugger, set a breakpoint in the function where the pointer behaves like a pointer to an array, and obtain the value of that pointer. Having accomplished this, return to the disassembler, locate the required array at the address determined using the debugger, and find cross-references from the program code to that array. After that, it will be possible to continue analysis of the executable code.
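The address arithmetic behind the scaled-index command can be written out in C. This sketch uses illustrative names and assumes nothing beyond the language rules; on the platform discussed here sizeof(int) is 4, which is the scaling coefficient seen in the listing.

```c
#include <assert.h>
#include <stddef.h>

int a[10];

/* Computes the address of a[index] the way the instruction does:
   base address plus index times the element size. */
int *element_address(int *base, size_t index)
{
    return (int *)((char *)base + index * sizeof(int));
}

/* Fills the array like Listing 3.25, but through the explicit address. */
void fill(void)
{
    for (size_t i = 0; i < 10; i++) {
        assert(element_address(a, i) == &a[i]);
        *element_address(a, i) = (int)i;
    }
}
```

The constant in the instruction (dword_4072C0) plays the role of base, and the register multiplied by four plays the role of index * sizeof(int).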
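The situation described above can be sketched in a few lines of C. All names are hypothetical; the point is that the innermost function sees only an address, and that address need not be the start of the array.

```c
#include <assert.h>
#include <stddef.h>

int table[8];

/* Innermost function: it receives an address and a count chosen by the
   caller. Nothing in the pointer itself says how large the array is. */
static void fill_with(int *p, size_t count, int value)
{
    assert(p != NULL);
    for (size_t i = 0; i < count; i++)
        p[i] = value;
}

/* Intermediate function that only forwards the pointer. */
static void forward(int *p, size_t count, int value)
{
    fill_with(p, count, value);
}

/* Works on the middle of the array, so the pointer seen by fill_with
   is not the array's starting address. */
void fill_middle(void)
{
    forward(table + 2, 4, 7);
}
```

When analyzing fill_with alone, an investigator could conclude only that the array holds at least four elements. Finding table itself requires following the pointer back through forward to fill_middle, or reading its value in a debugger.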
A structure is a generalization of an array. In contrast to arrays, which are made up of the elements of the same type, structures can comprise elements of different types. As with arrays, structure elements are accessed on the basis of the base address, which defines the starting point of the structure instance. However, the problem is more complicated than with arrays. Sometimes, it is difficult to make sure that data items of different types belong to the same structure. Consider a C program illustrating the behavior of structures (Listing 3.27).
Listing 3.27: Sample program for investigating the behavior of structures
#include <stdio.h>
#include <windows.h>
struct a {
  char s[10];
  BYTE b;
  int i;
};
a a1;
void main()
{
  for(int j = 0; j < 10; j++)
    a1.s[j] = 'A';
  a1.b = 10;
  a1.i = 10000;
};
Compile this program using the Microsoft Visual C++ compiler, then disassemble the executable code using IDA Pro. The disassembled text of this program is shown in Listing 3.28.
Listing 3.28: Disassembled text of the program shown in Listing 3.27
.text:00401000 _main proc near ; CODE XREF: start + 16E↑p
.text:00401000 var_4 = dword ptr -4
.text:00401000 push ebp
.text:00401001 mov ebp, esp
.text:00401003 push ecx
.text:00401004 mov [ebp + var_4], 0
.text:0040100B jmp short loc_401016
.text:0040100D loc_40100D: ; CODE XREF: _main + 26↓j
.text:0040100D mov eax, [ebp + var_4]
.text:00401010 add eax, 1
.text:00401013 mov [ebp + var_4], eax
.text:00401016 loc_401016: ; CODE XREF: _main + B↑j
.text:00401016 cmp [ebp + var_4], 0Ah
.text:0040101A jge short loc_401028
.text:0040101C mov ecx, [ebp + var_4]
.text:0040101F mov byte_4072C0[ecx], 41h
.text:00401026 jmp short loc_40100D
.text:00401028 loc_401028: ; CODE XREF: _main + 1A↑j
.text:00401028 mov byte_4072CA, 0Ah
.text:0040102F mov dword_4072CC, 2710h
.text:00401039 xor eax, eax
.text:0040103B mov esp, ebp
.text:0040103D pop ebp
.text:0040103E retn
.text:0040103E _main endp
Carefully consider the text shown in Listing 3.28. You will encounter three different types of data at the following addresses: byte_4072C0 (array), byte_4072CA (byte), and dword_4072CC (double word). At the same time, there are no clear indications that these variables belong to the same structure. In this program that is of no importance, because nothing in it treats the structure as a whole. To reveal a structure as an integral entity, the program must contain operations that act on it as a unit.
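The three addresses in Listing 3.28 line up with the field offsets of the structure, which can be checked with offsetof. The sketch below replaces BYTE with unsigned char so that it does not depend on windows.h. The offset of i and the total size assume a 4-byte int aligned on a 4-byte boundary, as on the platform discussed here.

```c
#include <assert.h>
#include <stddef.h>

struct a {
    char s[10];
    unsigned char b;   /* BYTE in the original listing */
    int i;
};

/* Field offsets from the start of the structure. */
size_t offset_of_b(void) { return offsetof(struct a, b); }
size_t offset_of_i(void) { return offsetof(struct a, i); }

void demo_layout(void)
{
    assert(offset_of_b() == 10);     /* byte_4072CA  = 4072C0 + 0Ah */
    /* The next two hold where int is 4 bytes with 4-byte alignment. */
    assert(offset_of_i() == 12);     /* dword_4072CC = 4072C0 + 0Ch */
    assert(sizeof(struct a) == 16);  /* 15 bytes of fields + 1 padding byte */
}
```

An investigator who notices that several "independent" variables sit at exactly such offsets from a common base has a first hint, though not a proof, that they form one structure.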
Consider the program shown in Listing 3.29. As you can see, the a structure is the parameter of the init procedure. Then consider how this situation is reflected in the program's executable code (Listing 3.30). This program is artificial because the structure passed to the function is not used and is not passed back.
Listing 3.29: Behavior of the structure passed to some function as a parameter
#include <stdio.h>
#include <windows.h>
struct a {
  char s[10];
  BYTE b;
  int i;
};
a a1;
void init(a);
void main()
{
  init(a1);
};
void init(a c)
{
  for(int j = 0; j < 10; j++)
    c.s[j] = 'A';
  c.b = 10;
  c.i = 10000;
};
Listing 3.30: Disassembled text of the main function of the program shown in Listing 3.29
.text:00401000 _main proc near ; CODE XREF: start + 16E↑p
.text:00401000 push ebp
.text:00401001 mov ebp, esp
.text:00401003 sub esp, 10h
.text:00401006 mov eax, esp
.text:00401008 mov ecx, dword_4072C0
.text:0040100E mov [eax], ecx
.text:00401010 mov edx, dword_4072C4
.text:00401016 mov [eax + 4], edx
.text:00401019 mov ecx, dword_4072C8
.text:0040101F mov [eax + 8], ecx
.text:00401022 mov edx, dword_4072CC
.text:00401028 mov [eax + 0Ch], edx
.text:0040102B call sub_401040
.text:00401030 add esp, 10h
.text:00401033 xor eax, eax
.text:00401035 pop ebp
.text:00401036 retn
.text:00401036 _main endp
Listing 3.30 presents the disassembled code of the main function of the program in Listing 3.29. The sub_401040 procedure, called at address 0040102B, is the init function. The lines of code preceding this call are of great interest. Pay special attention to the sub esp, 10h command. It reserves as much stack space as four PUSH commands would. Note that the size of the structure under consideration is exactly 16 bytes. The command allocating the stack space is followed by the mov eax, esp command, so the EAX register points to the start of the reserved stack area. This area is then filled with data. The impression is that you are dealing with 4 double words, and IDA Pro has come to the same conclusion. The fact that 16 bytes are allocated at once (the fields occupy 15 bytes, but because the i field is aligned on a 4-byte boundary, the result is 16) should put you on guard. Nevertheless, this alone doesn't prove anything. To discover what was passed to the function, it is necessary to analyze the code of that function (Listing 3.31).
Listing 3.31: Disassembled text of the init function (Listing 3.29)
.text:00401040 sub_401040 proc near ; CODE XREF: _main + 2B↑p
.text:00401040 var_4 = dword ptr -4
.text:00401040 arg_0 = byte ptr 8
.text:00401040 arg_A = byte ptr 12h
.text:00401040 arg_C = dword ptr 14h
.text:00401040 push ebp
.text:00401041 mov ebp, esp
.text:00401043 push ecx
.text:00401044 mov [ebp + var_4], 0
.text:0040104B jmp short loc_401056
.text:0040104D loc_40104D: ; CODE XREF: sub_401040 + 24↓j
.text:0040104D mov eax, [ebp + var_4]
.text:00401050 add eax, 1
.text:00401053 mov [ebp + var_4], eax
.text:00401056 loc_401056: ; CODE XREF: sub_401040 + B↑j
.text:00401056 cmp [ebp + var_4], 0Ah
.text:0040105A jge short loc_401066
.text:0040105C mov ecx, [ebp + var_4]
.text:0040105F mov [ebp + ecx + arg_0], 41h
.text:00401064 jmp short loc_40104D
.text:00401066 loc_401066: ; CODE XREF: sub_401040 + 1A↑j
.text:00401066 mov [ebp + arg_A], 0Ah
.text:0040106A mov [ebp + arg_C], 2710h
.text:00401071 mov esp, ebp
.text:00401073 pop ebp
.text:00401074 retn
.text:00401074 sub_401040 endp
Consider the code of the init function (see Listing 3.31). In principle, this text is similar to that provided in Listing 3.28. However, this time, taking into account the analysis of the code of the main function (see Listing 3.30), it is possible to understand its meaning. Recall that 16 bytes were passed to the function (4 times, 4 bytes at a time). The function first processes a 10-byte array that starts at arg_0, then a 1-byte value (arg_A), and finally a 4-byte value (arg_C). At this point, it is logical to assume that the object you are dealing with is a structure. What allows you to draw such a conclusion? The double words that seemed independent when they were placed in the stack turn out, inside the procedure, to hold a 10-byte array followed by two separate fields. This supports the assumption.
Thus, it is possible to conclude that structures can be disclosed when they are passed as parameters. However, it is necessary to admit that these considerations are too heuristic to delegate this task to a disassembler. An interesting point here is that the Borland C++ compiler in a similar situation acts in approximately the same way as Microsoft Visual C++. Compile the program presented in Listing 3.29 using the Borland C++ compiler, then disassemble it using IDA Pro. The disassembled fragment of the executable code responsible for calling the init function is shown in Listing 3.32.
Listing 3.32: Fragment calling init (compiled by Borland C++ and disassembled by IDA Pro)
.text:00401108 mov al, byte_40C2C6
.text:0040110E shl eax, 10h
.text:00401111 mov ax, word_40C2C4
.text:00401118 push eax
.text:00401119 push dword_40C2C0
.text:0040111F push dword_40C2BC
.text:00401125 push dword_40C2B8
.text:0040112B call sub_401134
This fragment is notable for a strange variable: word_40C2C4. Where could a variable of the WORD type come from? After all, there are no such variables in the program. Nevertheless, the amount of data passed through the stack is 16 bytes as in the previous case, although, to be precise, only 15 of them are read from the structure. Is Borland more accurate than Microsoft? This is unlikely.
However, there are situations, in which the disassembler can unambiguously determine that it is dealing with a structure. These are situations, in which structures are used as parameters when calling well-known library or API functions. The code fragment shown in Listing 3.33 demonstrates the call to the RegisterClass API function. I have intentionally provided the code lines preceding this call. These code lines contain commands that fill the WndClass structure, which the disassembler recognizes excellently. It cannot fail to recognize this structure, because its address is the parameter of the well-known API function.
Listing 3.33: Disassembled code showing the call to the RegisterClass API function
.text:0040104D mov [ebp+WndClass.style], 0
.text:00401054 mov [ebp+WndClass.lpfnWndProc], offset sub_401140
.text:0040105B mov [ebp+WndClass.cbClsExtra], 0
.text:00401062 mov [ebp+WndClass.cbWndExtra], 0
.text:00401069 mov edx, [ebp + hInstance]
.text:0040106C mov [ebp + WndClass.hInstance], edx
.text:0040106F push 7F00h ; lpIconName
.text:00401074 mov eax, [ebp + hInstance]
.text:00401077 push eax ; hInstance
.text:00401078 call ds:LoadIconA
.text:0040107E mov [ebp + WndClass.hIcon], eax
.text:00401081 push 7F00h ; lpCursorName
.text:00401086 push 0 ; hInstance
.text:00401088 call ds:LoadCursorA
.text:0040108E mov [ebp + WndClass.hCursor], eax
.text:00401091 mov [ebp + WndClass.hbrBackground], 6
.text:00401098 mov [ebp + WndClass.lpszMenuName], 0
.text:0040109F lea ecx, [ebp + ClassName]
.text:004010A2 mov [ebp + WndClass.lpszClassName], ecx
.text:004010A5 lea edx, [ebp + WndClass]
.text:004010A8 push edx ; lpWndClass
.text:004010A9 call ds:RegisterClassA
In Listing 3.33, the address of the WndClass structure is counted in relation to the contents of the EBP register, which means that the structure is defined as a stack local variable (see Section 3.1.2). However, the essence of these considerations won't change if you make it a global variable. In this case, the structure is identified because it is used as a parameter.
As a rule, local variables are interpreted as variables defined directly within a procedure or a function. As you know, the stack is used for this purpose. In my opinion, this is only a particular case. I understand local variables widely, not only as variables defined in the stack (they might be called stack variables) but also as temporary variables (local in relation to the program run time) and as variables stored in registers.
Variables defined in the stack (stack variables) were already mentioned several times. The program in Listing 3.34 uses only local variables and two functions: main and add. Note that the add function accepts three arguments and that the first argument is a pointer. The s variable is modified in the add function.
Listing 3.34: Example program illustrating the use of local variables
#include <stdio.h>
int add(int *, int, int);
void main()
{
  int i = 10, s, j;
  s = 12;
  j = 20;
  printf("%d\n", add(&s, i, j));
};
int add(int *s1, int i1, int j1)
{
  int n;
  *s1 = *s1 + 10;
  n = *s1 + j1 + i1;
  return n*n;
};
The disassembled text of the main function from Listing 3.34 is presented in Listing 3.35. Note that when compiling the test program, the option preventing optimization was set.
Listing 3.35: Disassembled text of the main function from Listing 3.34
.text:00401000 _main proc near ; CODE XREF: start + 16E↑p
.text:00401000 var_C = dword ptr -0Ch
.text:00401000 var_8 = dword ptr -8
.text:00401000 var_4 = dword ptr -4
.text:00401000 push ebp
.text:00401001 mov ebp, esp
.text:00401003 sub esp, 0Ch
.text:00401006 mov [ebp + var_4], 0Ah
.text:0040100D mov [ebp + var_8], 0Ch
.text:00401014 mov [ebp + var_C], 14h
.text:0040101B mov eax, [ebp + var_C]
.text:0040101E push eax
.text:0040101F mov ecx, [ebp + var_4]
.text:00401022 push ecx
.text:00401023 lea edx, [ebp + var_8]
.text:00401026 push edx
.text:00401027 call sub_401050
.text:0040102C add esp, 0Ch
.text:0040102F push eax
.text:00401030 push offset unk_4060FC
.text:00401035 call _printf
.text:0040103A add esp, 8
.text:0040103D xor eax, eax
.text:0040103F mov esp, ebp
.text:00401041 pop ebp
.text:00401042 retn
.text:00401042 _main endp
Skip the standard function prologue, and look at the sub esp, 0Ch command. Here, 12 bytes are reserved for local variables: this is the area between the previous value of the stack pointer (to which the EBP register points) and the new value. This corresponds to the three variables in Listing 3.34. IDA Pro names these variables var_4, var_8, and var_C. What do the _4, _8, and _C suffixes mean? They are the offsets of the variables, counted toward lower addresses, from the boundary at which the area of stack variables starts. The address of this boundary is stored in the EBP register.
Next are the commands for data initialization. Note that there is no difference between variables initialized when declared and variables assigned some values in the program.
Addresses from 0040101B to 00401026 are occupied by the commands that push the parameters onto the stack for the call to the add function. Pay special attention to the var_8 variable, which undoubtedly corresponds to the s variable in the program source code. To handle this variable, the lea edx, [ebp + var_8]/push edx commands are used, which means that the address of this variable is pushed onto the stack. This is natural, because the program explicitly specifies that a pointer is passed. However, I'd like to warn you against drawing premature conclusions. Compilers often take liberties with pointers. Here, the pointer is passed because the add function uses it to modify the s variable. If this were not so (if the s variable were not modified in the add function), the compiler might pass the value of the variable instead. This approach produces the same result, but it is much simpler. The two other variables, i (var_4) and j (var_C), are passed through the stack by value.
The result of the function call, which, as expected, is stored in the EAX register (nevertheless, see Section 3.2.1), is passed to the function as a parameter for console output.
It is time to consider the code of the add function. The disassembled text is shown in Listing 3.36.
Listing 3.36: Disassembled text of the add function (Listing 3.34)
.text:00401050 sub_401050 proc near ; CODE XREF: _main + 27↑p
.text:00401050 var_4 = dword ptr -4
.text:00401050 arg_0 = dword ptr 8
.text:00401050 arg_4 = dword ptr 0Ch
.text:00401050 arg_8 = dword ptr 10h
.text:00401050 push ebp
.text:00401051 mov ebp, esp
.text:00401053 push ecx
.text:00401054 mov eax, [ebp + arg_0]
.text:00401057 mov ecx, [eax]
.text:00401059 add ecx, 0Ah
.text:0040105C mov edx, [ebp + arg_0]
.text:0040105F mov [edx], ecx
.text:00401061 mov eax, [ebp + arg_0]
.text:00401064 mov ecx, [eax]
.text:00401066 add ecx, [ebp + arg_8]
.text:00401069 add ecx, [ebp + arg_4]
.text:0040106C mov [ebp+var_4], ecx
.text:0040106F mov eax, [ebp + var_4]
.text:00401072 imul eax, [ebp + var_4]
.text:00401076 mov esp, ebp
.text:00401078 pop ebp
.text:00401079 retn
.text:00401079 sub_401050 endp
IDA Pro assigns the function parameters names starting with the arg prefix. Thus, as expected, the function has obtained three parameters: arg_0, arg_4, and arg_8. As with the stack variables, the suffixes are offsets. This time, however, they are counted toward higher addresses from the start of the argument area, which begins at EBP + 8, just past the saved EBP value and the return address. This is why the listing defines arg_0 as dword ptr 8.
Note that at first glance, no space is reserved in the stack for the var_4 variable (in the program, the name of this variable is n). This issue is an interesting one. In the main function, the compiler reserved stack space with the sub esp command; where is the corresponding command here? The push ecx command does this job: it moves the stack pointer by 4 bytes, and the value pushed is irrelevant. This can be easily discovered by checking the stack balance at the beginning and at the end of the procedure. To do so, count the number of bytes pushed onto the stack at the beginning and removed from it at the end. The PUSH command is often used for reserving stack space when there is only one stack variable.
It would be interesting to find which of the function parameters is the pointer to a variable. Here, everything is simple. This parameter was the last to be pushed. Because the stack grows toward lower addresses, the parameter pushed last lies closest to EBP and has the smallest offset. This is arg_0. Here is the sequence of commands that discloses this: mov eax, [ebp + arg_0] / mov ecx, [eax] / add ecx, 0Ah, followed by mov edx, [ebp + arg_0] / mov [edx], ecx, which stores the result back. Together, they correspond to *s1 = *s1 + 10.
All further computations are self-evident. They correspond to n = *s1 + j1 + i1. The imul instruction stands for the n*n operation.
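To check the reading of Listing 3.36, the function can be written back in C with each statement matched to the commands that implement it. The add function is the one from Listing 3.34; the demo_add helper is added here only for illustration.

```c
#include <assert.h>

/* Same logic as add in Listing 3.34, annotated with Listing 3.36. */
int add(int *s1, int i1, int j1)
{
    int n;
    *s1 = *s1 + 10;      /* mov ecx, [eax] / add ecx, 0Ah / mov [edx], ecx */
    n = *s1 + j1 + i1;   /* add ecx, [ebp + arg_8] / add ecx, [ebp + arg_4] */
    return n * n;        /* imul eax, [ebp + var_4] */
}

/* Uses the same values as main in Listing 3.34: s = 12, i = 10, j = 20. */
void demo_add(void)
{
    int s = 12;
    int r = add(&s, 10, 20);
    assert(s == 22);     /* 12 + 10, modified through the pointer */
    assert(r == 2704);   /* (22 + 20 + 10) squared = 52 * 52 */
}
```

The value 2704 is the constant an optimizer could substitute for the whole call, since every input is known at compile time.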
Again, it is necessary to mention the optimization. Optimization can change the program code to such an extent that it becomes impossible to recognize it. This is especially true for Microsoft Visual C++. For instance, try to compile the program (see Listing 3.34) using the "create compact code" option. Before compiling, insert some output operator into the add function — for example, printf ("%d\n", n). Otherwise, the optimizer will do without any function call and replace it with the constant that it computes on its own (yes, this is so [5]). Now, consider what would happen to the main function after optimization (Listing 3.37).
Listing 3.37: Disassembled code of the optimized main function
.text:00401029 _main proc near ; CODE XREF: start + 16E↓p
.text:00401029 var_4 = dword ptr -4
.text:00401029 push ebp
.text:0040102A mov ebp, esp
.text:0040102C push ecx
.text:0040102D push 14h
.text:0040102F lea eax, [ebp + var_4]
.text:00401032 push 0Ah
.text:00401034 push eax
.text:00401035 mov [ebp + var_4], 0Ch
.text:0040103C call sub_401000
.text:00401041 push eax
.text:00401042 push offset unk_4060FC
.text:00401047 call _printf
.text:0040104C add esp, 14h
.text:0040104F xor eax, eax
.text:00401051 leave
.text:00401052 retn
.text:00401052 _main endp
Listing 3.37 is an instructive one. The main point to notice in the course of analysis is that only one stack variable has been defined. Try to guess which variable this is, even without viewing the listing. This is the s variable. It is this variable whose contents will be modified in the add function. In other words, s is a true variable. However, i and j are not variables; rather, they are in essence constants because they are not modified in the course of program execution. The optimizer treats them accordingly. Instead of allocating stack memory for them, it simply passes numeric constants as parameters to the add function. This is achieved by the push 14h and push 0Ah commands. The address of the s variable is pushed onto the stack: lea eax, [ebp + var_4] /... / push eax.
Also, it is necessary to pay attention to another issue: Memory for the stack variable is allocated using the push ecx command, which can confuse the code investigator. However, the optimizer's main goal in this case is to make the code as compact as possible, and it does its best to achieve this. This also explains why only a single leave command is used to restore the stack at the end of the procedure.
Thus, the following conclusion can be drawn in relation to stack variables: If the value of a stack variable is not changed in the course of program execution, the optimizer can replace it with a constant. This information is not particularly important for a simple analysis of the program's actions. However, in my opinion, it is important for a sound understanding of the program's operating logic.
Also, it is possible to obtain useful information if you compile the program shown in Listing 3.34 using the Borland C++ v. 5.0 compiler. The result of disassembling the executable code of the main function is shown in Listing 3.38.
Listing 3.38: Disassembled main function from Listing 3.34 compiled using Borland C++ 5.0
.text:00401108 _main proc near ; DATA XREF: .data:0040A0B8↓o
.text:00401108 var_4 = dword ptr -4
.text:00401108 argc = dword ptr 0Ch
.text:00401108 argv = dword ptr 10h
.text:00401108 envp = dword ptr 14h
.text:00401108 push ebx
.text:00401109 push esi
.text:0040110A push ecx
.text:0040110B mov ebx, 0Ah
.text:00401110 mov [esp + 4 + var_4], 0Ch
.text:00401117 mov esi, 14h
.text:0040111C push esi
.text:0040111D push ebx
.text:0040111E lea eax, [esp + 0Ch + var_4]
.text:00401122 push eax
.text:00401123 call sub_401140
.text:00401128 add esp, 0Ch
.text:0040112B push eax
.text:0040112C push offset format ; Format
.text:00401131 call _printf
.text:00401136 add esp, 8
.text:00401139 pop edx
.text:0040113A pop esi
.text:0040113B pop ebx
.text:0040113C retn
.text:0040113C _main endp
Different compilers are characterized by different styles. For instance, in contrast to Microsoft's compiler, which, just to be on the safe side, resets the EAX register to zero even when the main function is declared as void, Borland's compiler interprets the void type literally, which means that it doesn't pay attention to the contents of the EAX register. Another specific feature of Borland's compiler is that it actively uses the ESI and EBX registers. Note that according to generally adopted conventions, a function must not change the contents of the EBX, EBP, ESP, ESI, and EDI registers; so, Borland's compiler must insert PUSH EBX/PUSH ESI commands at the beginning of the function and POP ESI/POP EBX commands at the end of the function. I suspect that this is just an inherited legacy. In older Intel processors, the CX and DX registers could not be used for addressing.
Like Microsoft's compiler, Borland's compiler analyzes the text and discovers that the i and j variables are constants in their essence. Therefore, it doesn't reserve the memory in the stack for them and uses constants instead. Stack memory is reserved only for the s variable (var_4). Note that this also is carried out using a single PUSH command (push ecx).
Now consider the most interesting issue. Borland's compiler doesn't use the EBP register here; it addresses the stack variable through the ESP register instead. This is a well-known optimization technique, so you should know about it. However, you might object: The contents of the ESP register change. You'd be right. But the compiler does not forget about this; it handles the problem by tracking all changes of the ESP register and correcting the addressing as appropriate. At first, there is the mov [esp + 4 + var_4], 0Ch command, followed by two PUSH commands, which reduce the content of ESP by eight. Therefore, the compiler then uses the lea eax, [esp + 0Ch + var_4] command. Everything is correct, because 4 + 8 = 12 = 0Ch. Fortunately, IDA Pro also understands these issues and specifies the var_4 variable in both commands.
What are temporary variables? I consider as such the variables used for storing intermediate results of computations. In the course of computations, the processor registers are widely used. Therefore, it is possible to state that the registers are used as temporary variables. Note that you have already encountered such variables. For example, consider Listing 3.13, and recall how the loop was organized there (the 00401027-0040102F addresses). The EAX register plays the role of a temporary variable, which stores the loop counter for the time of loop execution. When real (floating-point) variables are used for storing intermediate results, the FPU registers are also used. As a rule, these are the first three registers of the coprocessor: ST(0), ST(1), and ST(2). If you recall Listing 3.13, the comments that follow it emphasized the method of start-up initialization of floating-point variables: The floating-point value is first loaded into the ST(0) coprocessor register using the FLD command. Then, from the ST(0) register, the value is stored into the memory area allocated for the floating-point variable (using the FSTP command).
How many registers might be needed if the expression to be computed is a complex one? Simple considerations are as follows: Operations over numeric variables are binary operations. Two operands participate in each operation. The result can be placed either into a third operand or into one of the operands participating in the previous operation. The result of execution of any specific operation might be the operand of another binary operation. However, again two operands participate in the binary operation and the result is placed into one of them. These considerations are also applicable if there are parentheses in the expression. Thus, it is possible to conclude that two operands are enough for storing intermediate results. However, what should you do if the operands are 64-bit ones (and you have a 32-bit processor)? The C++ compiler can use library procedures (such as _alldiv), which are provided especially for such cases. Nevertheless, as you'll see later, sometimes the compiler still uses the stack for temporary variables.
It is time to study an instructive example. Some program that carries out numeric computations would be suitable for this purpose. The program in Listing 3.39 provides an example of such a computation, where both integer and floating-point values are used in the expression to be computed.
Listing 3.39: Use of temporary variables on the example of numeric computations
#include <stdio.h>
void main()
{
    double i, j, s;
    int k, d;
    i = 10; j = 20; k = 30; d = 40;
    s = ((k - 1)*(d - 1))*((i - 1)/(j - 1));
    printf("%f\n", s);
};
The disassembled code of the main function of this program, obtained using the IDA Pro disassembler, is shown in Listing 3.40.
Listing 3.40: Disassembled code of the main function of the program in Listing 3.39
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p
.text:00401000 var_2C = qword ptr -2Ch
.text:00401000 var_24 = dword ptr -24h
.text:00401000 var_20 = qword ptr -20h
.text:00401000 var_18 = dword ptr -18h
.text:00401000 var_14 = dword ptr -14h
.text:00401000 var_10 = qword ptr -10h
.text:00401000 var_8 = qword ptr -8
.text:00401000 push ebp
.text:00401001 mov ebp, esp
.text:00401003 sub esp, 24h
.text:00401006 fld ds:dbl_408110
.text:0040100C fstp [ebp + var_8]
.text:0040100F fld ds:dbl_408108
.text:00401015 fstp [ebp + var_20]
.text:00401018 mov [ebp + var_14], 1Eh
.text:0040101F mov [ebp + var_18], 28h
.text:00401026 mov eax, [ebp + var_14]
.text:00401029 sub eax, 1
.text:0040102C mov ecx, [ebp + var_18]
.text:0040102F sub ecx, 1
.text:00401032 imul eax, ecx
.text:00401035 mov [ebp + var_24], eax
.text:00401038 fild [ebp + var_24]
.text:0040103B fld [ebp + var_8]
.text:0040103E fsub ds:dbl_408100
.text:00401044 fld [ebp + var_20]
.text:00401047 fsub ds:dbl_408100
.text:0040104D fdivp st(1), st
.text:0040104F fmulp st(1), st
.text:00401051 fst [ebp + var_10]
.text:00401054 sub esp, 8
.text:00401057 fstp [esp + 2Ch + var_2C]
.text:0040105A push offset unk_4080FC
.text:0040105F call _printf
.text:00401064 add esp, 0Ch
.text:00401067 xor eax, eax
.text:00401069 mov esp, ebp
.text:0040106B pop ebp
.text:0040106C retn
.text:0040106C _main endp
For storing local variables, 36 bytes are allocated (sub esp, 24h). This is 4 bytes more than required for the five variables (three 8-byte floating-point variables and two 4-byte integers occupy 32 bytes). The compiler has allocated the extra stack memory for storing a temporary variable, although at first glance it might do without it. After all, it could use a spare register (such as EDX) or leave the result in the EAX register (this point is explained later). Microsoft's compiler tries to avoid using the EBX, EDI, and ESI registers for computations because doing so would make it necessary to restore these registers at the end of the function.
The start-up initialization commands occupy addresses from 00401006 to 0040101F. As before, for initializing floating-point variables the compiler uses floating-point constants, [6] which are stored in the data segment. In this case, the constant is first loaded into the ST(0) FPU register using the fld command and then stored into the appropriate variable (using the fstp command). Integer variables are initialized by directly loading specific values into them using the mov command.
Next, direct computations start. This stage requires more detailed consideration:
Commands from 00401026 to 0040102F load the k and d variables into registers and prepare them for multiplication. The preparation consists of subtracting one from each of them. Thus, the EAX register will contain the k - 1 value, and ECX will contain the d - 1 value. Then it is possible to carry out multiplication. Next, the imul eax, ecx command is executed, and the multiplication result is placed into the EAX register. In other words, the following operation is executed: (k - 1) * (d - 1) -> EAX. Later, it is necessary to decide where the computation result must be stored. The EAX register at first seems suitable because it appears that this register won't be used in later computations. However, there is a small problem here. The resulting integer value must participate in computations with real numbers, and the FPU load commands take their values from memory, not from general-purpose registers. Thus, the compiler made a reasonable decision to use a temporary variable for storing the intermediate result (the intermediate result is stored directly in the stack).
Consider further computations. The result of computing the (k - 1) * (d - 1) expression is loaded onto the top of the FPU stack (into the ST(0) register) using the fild command. Then the fld command moves the current ST(0) value into ST(1) and loads the i variable into ST(0). Next, the fsub ds:dbl_408100 command (located at the 0040103E address) computes the i - 1 expression, leaving the result in the ST(0) register. The next fld command loads the j variable into ST(0). As this happens (pay attention!), the previous value of ST(0) is moved into ST(1) and the previous value of ST(1) is moved into ST(2). Thus, ST(2) plays the role of a temporary variable. The next fsub command computes the j - 1 value. Then the fdivp st(1), st command carries out division and pops the stack. As a result, the quotient goes into ST(0) and the value in ST(2) moves into ST(1). The fmulp st(1), st command carries out multiplication and pops the stack, which means that the final result goes into ST(0). The last stroke is carried out by the fst [ebp + var_10] command, which corresponds to ST(0) -> s. Note that the fst command stores the value into the variable without popping the stack.
To pass the floating-point value through the stack, a well-known technique encountered earlier is used: The sub esp, 8 command, equivalent to two PUSH commands, prepares the space for a floating-point value. Then the fstp command (popping the coprocessor stack) places the result of computations into the stack for further use by the printf function.
Thus, temporary variables are used by the compiler for computations. The role of temporary variables can be delegated to general-purpose registers, FPU registers, and stack variables.
Temporary variables are often used when the result of execution of one function is used in another function (Listing 3.41).
Listing 3.41: Temporary variables when the result of executing a function is used in another
#include <stdio.h>
int add(int, int);
int sub(int, int);
void main()
{
    int i = 10, j = 20;
    printf("%d\n", add(i, sub(i, j)));
};
int add(int a, int b)
{
    return a + b;
};
int sub(int a, int b)
{
    return a - b;
};
In the program shown in Listing 3.41, the result of the sub function is used in the add function, and the result of the add function, in turn, is used by the printf function. Listing 3.42 shows the fragment of the disassembled code of this program related to temporary variables.
Listing 3.42: Disassembled code of Listing 3.41 for processing intermediate variables
.text:00401014 mov eax, [ebp + var_8]
.text:00401017 push eax
.text:00401018 mov ecx, [ebp + var_4]
.text:0040101B push ecx
.text:0040101C call sub_401060
.text:00401021 add esp, 8
.text:00401024 push eax
.text:00401025 mov edx, [ebp + var_4]
.text:00401028 push edx
.text:00401029 call sub_401050
.text:0040102E add esp, 8
.text:00401031 push eax
.text:00401032 push offset unk_4060FC
.text:00401037 call _printf
.text:0040103C add esp, 8
The var_4 and var_8 variables correspond to the i and j variables in the program source code. First, the sub_401060 (sub) function is called. As should be expected, the result of this function is loaded into the EAX register. Later, the EAX register is used as a variable, which is then used as a parameter when calling the add function (sub_401050). Similarly, the result of the add function is loaded into the EAX register and used as a parameter when calling the printf function.
The C programming language makes provision for the register type of variables. Initially, it was assumed that variables declared as register must be stored in registers whenever possible. Contemporary compilers ignore this keyword (although it is considered valid for compatibility). Nowadays, compilers act as they consider expedient, according to the specified optimization options. Consider the example program shown in Listing 3.43. Compile this program using the Microsoft Visual C++ compiler, with the "create compact code" option.
Listing 3.43: Example program illustrating the use of register variables
#include <stdio.h>
void main()
{
    int i, j, s;
    i = 0; j = 1; s = 0;
    for(i = 0; i < 100; i++, j++)
        s = s + j;
    printf("%d %d %d \n", i, j, s);
};
The disassembled code of this program is shown in Listing 3.44.
Listing 3.44: Disassembled code of the program shown in Listing 3.43
.text:00401000 _main proc near ; CODE XREF: start + 16E↓p
.text:00401000 xor eax, eax
.text:00401002 push 64h
.text:00401004 inc eax
.text:00401005 xor ecx, ecx
.text:00401007 pop edx
.text:00401008 loc_401008: ; CODE XREF: _main + C↓j
.text:00401008 add ecx, eax
.text:0040100A inc eax
.text:0040100B dec edx
.text:0040100C jnz short loc_401008
.text:0040100E push ecx
.text:0040100F push eax
.text:00401010 push 64h
.text:00401012 push offset aDDD ; "%d %d %d \n"
.text:00401017 call _printf
.text:0040101C add esp, 10h
.text:0040101F xor eax, eax
.text:00401021 retn
.text:00401021 _main endp
Note that although three local variables are defined in the source program, the stack is not used for storing variables in the resulting code. This is exactly the case in which the compiler has used registers for storing variables. Also note that to minimize the code size, the compiler didn't insert a prologue and an epilogue into the main function.
Thus, the ECX register is used for storing the s variable (the xor ecx, ecx command corresponds to s = 0). The xor eax, eax /... / inc eax commands relate to the j variable. As for the i variable, the compiler has introduced an interesting modification to reduce the code size. Instead of incrementing a variable and comparing it with 100, it first assigns the value of 100 to a counter, and after each iteration this counter is decremented and compared with 0. This approach is easier and faster, because the dec command sets the zero flag itself and no separate comparison command is needed. The role of this register variable is delegated to the EDX register.
Finally, because at the end of the loop there is no variable that would contain the value of 100 (as there should be, according to the source code of the program), the number 100 is simply pushed onto the stack using the push 64h command.
[1] The same relates to the C programming language because this program doesn't use any capabilities introduced with the arrival of the C++ language. However, I won't concentrate on these minor details. I always mean the two most widely used C++ compilers, namely, Microsoft Visual C++ and Borland C++.
[2] Start-up initialization of the s variable doesn't make sense because the initial value of s is not used anywhere.
[3] A typical complication is deciding what to do if, for example, information about the string length doesn't correspond to the location of the terminator.
[4] Here and further on, I use the Delphi compiler supplied as part of Borland Delphi 7.0.
[5] When optimizing a program for maximum operating speed, even this trick won't help!
[6] In the C++ language, constants stored in the data segment and having, like variables, strictly defined types, are called type safe constants. Constants used only directly in the program code are called literal constants.