1.5. Structure of the Portable Executable Module

The main goal of this section is to describe the structure of the Portable Executable (PE) module, the format of Windows executable (EXE) modules. Because investigators mostly study executable modules, they need to know this structure. It is all the more important because the same structure is used not only for executable files but also for DLLs and drivers, and object modules (OBJ files) use a closely related format.

1.5.1. General Approach

The PE format is derived from the Common Object File Format (COFF), which originated in the UNIX operating system. Microsoft revised COFF considerably, and the resulting format is now in wide use. As already mentioned, this format is used not only for executable modules but also for DLLs and kernel-mode drivers. Notably, Microsoft's PE specification also covers OBJ files. Your main goal is to master the PE format well enough to understand its structure and apply this knowledge in practice.

The main feature of any PE module is that it is simple to load into memory: the file is laid out much like the image the module will occupy in memory, so little additional processing is needed. In essence, a PE module is a snapshot of a region of main memory.

Fig. 1.7 shows the general layout of the PE format. The first part (shown at the top of Fig. 1.7) deserves the closest attention, because here the developers ensured backward compatibility with the MS-DOS operating system. To understand how the PE format works, it is necessary to consider this part in detail. Every executable module starts with the DOS section, which is needed if the program is started in the MS-DOS environment. The first 2 bytes (MZ) are the signature confirming that you are dealing with an MS-DOS executable module. MZ are the initials of Mark Zbikowski, the Microsoft programmer who designed the structure of MS-DOS executable modules. If you start a PE program under MS-DOS, the loader of this operating system reads the signature, recognizes the module as an ordinary MS-DOS program, and runs it in the normal way. This works because in a correct PE module the MZ signature is followed by the rest of the MS-DOS header and then by a small stub program. The stub usually displays a message informing the user that the program cannot be run under MS-DOS and then terminates. The standard stub is shown in Listing 1.28.

Listing 1.28: The standard MS-DOS stub

        PUSH CS                ; Data register matches the code register.
        POP  DS
        MOV  DX, OFFSET MSG
        MOV  AH, 9             ; Output the MSG text string.
        INT  21H
        MOV  AX, 4C01H         ; Exit the program with code 1.
        INT  21H
    MSG DB 'This program cannot be run in DOS mode$'

Figure 1.7: The PE file structure

The stub code can vary. This hardly matters, however, because pure MS-DOS is practically never encountered nowadays, so the stub never gains control. The most convenient way to parse the MZ header is to use the IMAGE_DOS_HEADER structure defined in the winnt.h ^[7] file. This structure is shown in Listing 1.29.

Listing 1.29: The IMAGE_DOS_HEADER structure

    struct IMAGE_DOS_HEADER {      // DOS EXE header
        WORD   e_magic;            // Magic number
        WORD   e_cblp;             // Bytes on the last page of the file
        WORD   e_cp;               // Pages in the file
        WORD   e_crlc;             // Relocations
        WORD   e_cparhdr;          // Size of the header in paragraphs
        WORD   e_minalloc;         // Minimum extra paragraphs needed
        WORD   e_maxalloc;         // Maximum extra paragraphs needed
        WORD   e_ss;               // Initial (relative) SS value
        WORD   e_sp;               // Initial SP value
        WORD   e_csum;             // Checksum
        WORD   e_ip;               // Initial IP value
        WORD   e_cs;               // Initial (relative) CS value
        WORD   e_lfarlc;           // File address of the relocation table
        WORD   e_ovno;             // Overlay number
        WORD   e_res[4];           // Reserved words
        WORD   e_oemid;            // OEM identifier (for e_oeminfo)
        WORD   e_oeminfo;          // OEM information (e_oemid specific)
        WORD   e_res2[10];         // Reserved words
        LONG   e_lfanew;           // File address of the new EXE header
    };

Only three fields of this structure matter for parsing the MZ header. The e_magic field holds the MZ signature. The e_lfarlc field (at offset 18H from the start of the file) was originally intended for the file offset of the relocation table, which the MS-DOS loader used to adjust segment addresses within a program. If this field contains 40H, the file is a PE module. ^[8] Apparently, however, Windows does not check this field, so a value of 40H should not be treated as a reliable indication of a PE module. Finally, the e_lfanew field (at offset 3CH) contains the file offset at which the PE header starts (see Fig. 1.7). The PE signature must be located at this offset: the characters P and E followed by two zero bytes.
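The checks described above do not require the Windows API. As a minimal portable sketch, assuming the whole file has already been read into a memory buffer, the following C function tests the MZ signature, reads e_lfanew at offset 3CH, and compares the four bytes found there with the PE signature. The names find_pe_header, rd16, and rd32 are hypothetical, not part of winnt.h. The sketch deliberately ignores e_lfarlc, for the reason given above.

```c
#include <stddef.h>
#include <stdint.h>

/* PE fields are little-endian; these helpers read them from a byte buffer. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns e_lfanew if buf starts with "MZ" and e_lfanew points at the
   4-byte signature "PE\0\0"; returns 0 otherwise (a valid PE header can
   never start at offset 0, where "MZ" is located). */
static uint32_t find_pe_header(const uint8_t *buf, size_t size)
{
    if (size < 0x40)                      /* smaller than IMAGE_DOS_HEADER */
        return 0;
    if (rd16(buf) != 0x5A4D)              /* e_magic: 'M' = 4DH, 'Z' = 5AH */
        return 0;
    uint32_t lfanew = rd32(buf + 0x3C);   /* e_lfanew */
    if (lfanew > size - 4)                /* signature would lie past the end */
        return 0;
    if (rd32(buf + lfanew) != 0x00004550) /* 'P', 'E', 0, 0 */
        return 0;
    return lfanew;
}
```

For example, a return value of 80H means that the PE header starts at file offset 80H. The sketch only requires that the signature fit inside the buffer; like Listing 1.30, it does not validate anything beyond the two signatures.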

Listing 1.30 shows a simple program that determines whether a given file is a loadable PE module. The name of the file to check must be specified on the command line.

Listing 1.30: A simple program for determining whether this file is a loadable PE module

    #include <windows.h>
    #include <stdio.h>

    HANDLE openf(const char *);
    HANDLE hf = INVALID_HANDLE_VALUE;
    IMAGE_DOS_HEADER id;
    IMAGE_NT_HEADERS iw;

    // The main function
    int main(int argc, char *argv[])
    {
        DWORD n;
        int er = 0;
        LARGE_INTEGER l;
        l.QuadPart = 0;
        // Check whether parameters are present.
        if (argc < 2) { printf("No parameters!\n"); er = 1; goto _exit; }
        // File name is the first in the list.
        if ((hf = openf(argv[1])) == INVALID_HANDLE_VALUE)
        {
            printf("No file!\n");
            er = 2;
            goto _exit;
        }
        // Determine the file length.
        GetFileSizeEx(hf, &l);
        // Read the MS-DOS header.
        if (!ReadFile(hf, &id, sizeof(id), &n, NULL))
        {
            printf("Read DOS_HEADER error 1!\n");
            er = 3;
            goto _exit;
        }
        if (n < sizeof(id))
        {
            printf("Read DOS_HEADER error 2!\n");
            er = 4;
            goto _exit;
        }
        // Check the MS-DOS signature ('MZ').
        if (id.e_magic != IMAGE_DOS_SIGNATURE)
        {
            printf("No DOS signature!\n");
            er = 5;
            goto _exit;
        }
        printf("DOS signature is OK!\n");
        // e_lfanew must point inside the file.
        if (id.e_lfanew < 0 || id.e_lfanew >= l.QuadPart)
        {
            printf("No NT signature!\n");
            er = 6;
            goto _exit;
        }
        // Move the file pointer to the PE header.
        SetFilePointer(hf, id.e_lfanew, NULL, FILE_BEGIN);
        // Read the NT header.
        if (!ReadFile(hf, &iw, sizeof(iw), &n, NULL))
        {
            printf("Read NT_HEADER error 1!\n");
            er = 7;
            goto _exit;
        }
        if (n < sizeof(iw))
        {
            printf("Read NT_HEADER error 2!\n");
            er = 8;
            goto _exit;
        }
        // Check the NT signature ('PE' followed by two zero bytes).
        if (iw.Signature != IMAGE_NT_SIGNATURE)
        {
            printf("No NT signature!\n");
            er = 9;
            goto _exit;
        }
        printf("NT signature is OK!\n");
    _exit:
        // Close the file handle if the file was opened.
        if (hf != INVALID_HANDLE_VALUE) CloseHandle(hf);
        return er;
    }

    // Opens the file for reading (ANSI file name).
    HANDLE openf(const char *nf)
    {
        return CreateFileA(nf,
                           GENERIC_READ,
                           FILE_SHARE_WRITE | FILE_SHARE_READ,
                           NULL,
                           OPEN_EXISTING,
                           FILE_ATTRIBUTE_NORMAL,
                           NULL);
    }

You have now become acquainted with the IMAGE_DOS_HEADER structure. The IMAGE_NT_HEADERS structure, which represents the PE header, is covered in the next section. Both structures are defined in winnt.h, which is included by windows.h. The same header file defines the IMAGE_DOS_SIGNATURE and IMAGE_NT_SIGNATURE constants for the MZ (5A4Dh) and PE (00004550h) signatures.

Naturally, the program in Listing 1.30 cannot guarantee that the file has a correct PE header. That would require a more detailed analysis of the PE header.

The example program in Appendix 1 analyzes the PE header in more detail. It is based on the example in Listing 1.30 and, in addition to analyzing the file headers, displays the contents of the import, export, and resource sections.

1.5.2. The Portable Executable Header

Now, consider the PE header. As already mentioned, this header is in the form of the IMAGE_NT_HEADERS structure (Listing 1.31).

Listing 1.31: The IMAGE_NT_HEADERS structure

    struct IMAGE_NT_HEADERS {
        DWORD Signature;
        IMAGE_FILE_HEADER FileHeader;
        IMAGE_OPTIONAL_HEADER32 OptionalHeader;
    };

As you can see, this structure consists of the Signature field, which holds the PE signature (IMAGE_NT_SIGNATURE), followed by two parts: IMAGE_FILE_HEADER and IMAGE_OPTIONAL_HEADER32. First, consider the IMAGE_FILE_HEADER structure, also known as the main header (Listing 1.32).

Listing 1.32: The IMAGE_FILE_HEADER structure

    struct IMAGE_FILE_HEADER {
        WORD    Machine;
        WORD    NumberOfSections;
        DWORD   TimeDateStamp;
        DWORD   PointerToSymbolTable;
        DWORD   NumberOfSymbols;
        WORD    SizeOfOptionalHeader;
        WORD    Characteristics;
    };

The fields of this structure are briefly outlined as follows:

Machine — This is the type of processor. For 32-bit Intel x86 (i386 and compatible) processors, this value is 014CH.
NumberOfSections — This shows the number of sections in the PE module.
TimeDateStamp — This gives the date and time the linker created the file, in seconds since 00:00, January 1, 1970 (UTC).
PointerToSymbolTable — This is the file offset of the COFF symbol table, used for debugging. As a rule, its value is zero.
NumberOfSymbols — This is the number of entries in the COFF symbol table, also used for debugging. As a rule, its value is zero.
SizeOfOptionalHeader — This shows the size of the second part of the PE header (see the description of the IMAGE_OPTIONAL_HEADER32 structure). For 32-bit modules, this value is usually 224 (E0H) bytes.
Characteristics — This field contains informational bits (flags). In particular, bit 13 (IMAGE_FILE_DLL, 2000H) is set if the module is a DLL and cleared if it is an EXE module.
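To show how these fields are laid out in the file, here is a minimal sketch that decodes the 20-byte IMAGE_FILE_HEADER, which starts right after the PE signature (at e_lfanew + 4). The structure file_header_info, the functions parse_file_header and is_dll, and the macro names are hypothetical; only the numeric values 014CH, 0002H, and 2000H come from winnt.h (IMAGE_FILE_MACHINE_I386, IMAGE_FILE_EXECUTABLE_IMAGE, IMAGE_FILE_DLL). The byte offsets assume the field order of Listing 1.32.

```c
#include <stdint.h>

/* Selected IMAGE_FILE_HEADER values (same numbers as in winnt.h). */
#define MACHINE_I386    0x014C  /* IMAGE_FILE_MACHINE_I386 */
#define FILE_EXECUTABLE 0x0002  /* IMAGE_FILE_EXECUTABLE_IMAGE */
#define FILE_DLL        0x2000  /* IMAGE_FILE_DLL (bit 13) */

struct file_header_info {
    uint16_t machine;
    uint16_t number_of_sections;
    uint32_t time_date_stamp;         /* seconds since 1970-01-01 00:00 UTC */
    uint16_t size_of_optional_header;
    uint16_t characteristics;
};

/* Decodes the 20-byte IMAGE_FILE_HEADER that starts at p (little-endian). */
static struct file_header_info parse_file_header(const uint8_t *p)
{
    struct file_header_info h;
    h.machine = (uint16_t)(p[0] | (p[1] << 8));
    h.number_of_sections = (uint16_t)(p[2] | (p[3] << 8));
    h.time_date_stamp = (uint32_t)p[4] | ((uint32_t)p[5] << 8) |
                        ((uint32_t)p[6] << 16) | ((uint32_t)p[7] << 24);
    /* Bytes 8..15 hold PointerToSymbolTable and NumberOfSymbols. */
    h.size_of_optional_header = (uint16_t)(p[16] | (p[17] << 8));
    h.characteristics = (uint16_t)(p[18] | (p[19] << 8));
    return h;
}

/* Nonzero if bit 13 (IMAGE_FILE_DLL) is set, that is, the module is a DLL. */
static int is_dll(const struct file_header_info *h)
{
    return (h->characteristics & FILE_DLL) != 0;
}
```

Note that the executable-image flag (0002H) is set for DLLs as well as for EXE modules, so only bit 13 distinguishes the two.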

Now consider the second part of the PE header, the optional header (IMAGE_OPTIONAL_HEADER32). Despite its name, this header is required in executable modules. Its fields are shown in Listing 1.33.

Listing 1.33: The IMAGE_OPTIONAL_HEADER32 structure

    struct IMAGE_OPTIONAL_HEADER32 {
        WORD    Magic;
        BYTE    MajorLinkerVersion;
        BYTE    MinorLinkerVersion;
        DWORD   SizeOfCode;
        DWORD   SizeOfInitializedData;
        DWORD   SizeOfUninitializedData;
        DWORD   AddressOfEntryPoint;
        DWORD   BaseOfCode;
        DWORD   BaseOfData;
        DWORD   ImageBase;
        DWORD   SectionAlignment;
        DWORD   FileAlignment;
        WORD    MajorOperatingSystemVersion;
        WORD    MinorOperatingSystemVersion;
        WORD    MajorImageVersion;
        WORD    MinorImageVersion;
        WORD    MajorSubsystemVersion;
        WORD    MinorSubsystemVersion;
        DWORD   Win32VersionValue;
        DWORD   SizeOfImage;
        DWORD   SizeOfHeaders;
        DWORD   CheckSum;
        WORD    Subsystem;
        WORD    DllCharacteristics;
        DWORD   SizeOfStackReserve;
        DWORD   SizeOfStackCommit;
        DWORD   SizeOfHeapReserve;
        DWORD   SizeOfHeapCommit;
        DWORD   LoaderFlags;
        DWORD   NumberOfRvaAndSizes;
        IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
    };

The fields of this structure are as follows:

Magic — This field identifies the format of the optional header: 010BH for an ordinary 32-bit executable module (PE32) and 020BH for a 64-bit module (PE32+).
MajorLinkerVersion — This is the major version number of the linker used for building this file.
MinorLinkerVersion — This is the minor version number of the linker used to create this file.
SizeOfCode — This field specifies the total size (in bytes) of the code sections contained in the file.
SizeOfInitializedData — This is the size of the initialized data section (or the total size of all such sections).
SizeOfUninitializedData — This is the size of the uninitialized data section (or the total size of all such sections).
AddressOfEntryPoint — This is the relative virtual address (RVA) of the instruction at which program execution starts, that is, its address counted from the start of the module in the virtual address space. For example, if this relative address is 1000H and the module is loaded at address 400000H (see the ImageBase field), execution starts at address 401000H.
BaseOfCode — This gives the relative virtual address of the beginning of the code section.
BaseOfData — This gives the relative virtual address at which the first data section starts. Usually, data sections follow the code sections.
ImageBase — This gives the preferred virtual address (not a relative address) at which the module is loaded. If the loader places the module exactly at this address, it does not need to adjust any addresses and loading is fast. Otherwise, the addresses listed in the relocation table must be adjusted. For executable modules, this value is usually 400000H.
SectionAlignment — This value defines section alignment in memory. All sections in memory must start from values that are multiples of this value.
FileAlignment — This value defines section alignment within a file. All sections in the file must start from the address that is a multiple of this value.
MajorOperatingSystemVersion — This is the major version number of the operating system required to run the program.
MinorOperatingSystemVersion — This is the minor version number of the operating system required to run the program.
MajorImageVersion — This is the major version number of the module (n), specified at link time. For link.exe, the command-line option specifying the version is /version:n.m.
MinorImageVersion — This is the minor version number of the module (m), specified at link time.
MajorSubsystemVersion, MinorSubsystemVersion — These are the major and minor numbers of the minimum subsystem version required to run the module.
Win32VersionValue — Despite its meaningful name, this field is reserved and must be zero.
SizeOfImage — This gives the total size of the image in memory (headers and all sections), rounded up to a multiple of SectionAlignment.
SizeOfHeaders — This gives the combined size of all headers (the MS-DOS header and stub, the PE header, and the sections table), rounded up to a multiple of FileAlignment.
CheckSum — This is the checksum of the file. For ordinary executable modules, this value is usually zero; it is verified for drivers and certain system DLLs.
Subsystem — This field specifies the subsystem the module is intended for: 0000H for an unknown subsystem, 0001H for native (device drivers and native system processes), 0002H for Windows GUI, 0003H for a console application, 0005H for OS/2, and 0007H for POSIX.
DllCharacteristics — The original use of this field was dropped starting with Windows NT 3.5. Newer linkers store other flags here, such as 0040H (IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE, the module can be relocated at load time) and 0100H (IMAGE_DLLCHARACTERISTICS_NX_COMPAT, the module is compatible with data execution prevention).
SizeOfStackReserve — This is the amount of virtual memory reserved for the stack of the initial thread.
SizeOfStackCommit — This is the amount of stack memory committed initially.
SizeOfHeapReserve — This is the amount of virtual memory reserved for the local (default process) heap.
SizeOfHeapCommit — This is the amount of heap memory committed initially.
LoaderFlags — This field has been out of use since Windows NT 3.5 and should be zero.
NumberOfRvaAndSizes — This is the number of elements in the DataDirectory array. As a rule, this field is set to 10H (16).
DataDirectory — This is an array of IMAGE_DATA_DIRECTORY structures (Listing 1.34). Currently, IMAGE_NUMBEROF_DIRECTORY_ENTRIES is 16. Each structure consists of two elements, each 4 bytes in size: the first describes the data location (a relative virtual address), and the second specifies the data size. Entries 0 through 14 are defined, and entry 15 is reserved. The array elements are as follows:
- 0 — Table of exported functions
- 1 — Table of imported functions
- 2 — Resource table
- 3 — Table of exceptions
- 4 — Security table (certificates)
- 5 — Base relocation table
- 6 — Debug table
- 7 — Description strings (reserved in current versions)
- 8 — Global pointer (the value of the GP register on MIPS processors)
- 9 — Thread local storage (TLS)
- 10 — Load configuration table
- 11 — Bound import table
- 12 — Table of import addresses (IAT)
- 13 — Delay-load import table
- 14 — CLR runtime (.NET) header
Listing 1.34: The IMAGE_DATA_DIRECTORY structure
```
struct IMAGE_DATA_DIRECTORY {
    DWORD   VirtualAddress;
    DWORD   Size;
};
```
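The following sketch shows two typical uses of the optional header: computing the virtual address of the entry point (ImageBase + AddressOfEntryPoint, as in the 400000H + 1000H = 401000H example above, assuming the module is loaded at its preferred ImageBase) and reading one element of DataDirectory. It assumes a PE32 module (Magic = 010BH) and that opt points at a buffer holding at least 224 bytes of IMAGE_OPTIONAL_HEADER32. The byte offsets follow from the field sizes in Listing 1.33 (AddressOfEntryPoint at 16, ImageBase at 28, NumberOfRvaAndSizes at 92, DataDirectory at 96). All names (entry_point_va, get_data_dir, struct data_dir, the DIR_* macros) are hypothetical.

```c
#include <stdint.h>

/* Indexes into DataDirectory (same values as IMAGE_DIRECTORY_ENTRY_* in winnt.h). */
#define DIR_EXPORT   0
#define DIR_IMPORT   1
#define DIR_RESOURCE 2

struct data_dir {
    uint32_t rva;   /* VirtualAddress */
    uint32_t size;  /* Size */
};

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Virtual address of the first instruction: ImageBase + AddressOfEntryPoint. */
static uint32_t entry_point_va(const uint8_t *opt)
{
    return rd32(opt + 28) + rd32(opt + 16);
}

/* Returns DataDirectory[index], or {0, 0} if the header is not PE32
   or the entry lies beyond NumberOfRvaAndSizes. */
static struct data_dir get_data_dir(const uint8_t *opt, uint32_t index)
{
    struct data_dir d = {0, 0};
    if (rd16(opt) != 0x010B)                     /* Magic: PE32 only */
        return d;
    if (index >= 16 || index >= rd32(opt + 92))  /* NumberOfRvaAndSizes */
        return d;
    d.rva  = rd32(opt + 96 + 8 * index);         /* VirtualAddress */
    d.size = rd32(opt + 100 + 8 * index);        /* Size */
    return d;
}
```

A zero rva returned by get_data_dir simply means that the module has no such table, which is common for the export table of EXE modules.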

1.5.3. Sections Table

The sections table immediately follows the optional header. Its offset from the start of the file is e_lfanew + 4 + sizeof(IMAGE_FILE_HEADER) + SizeOfOptionalHeader, where 4 is the size of the Signature field. If the value of the SizeOfOptionalHeader field (see the IMAGE_FILE_HEADER structure) equals sizeof(IMAGE_NT_HEADERS) - sizeof(IMAGE_FILE_HEADER) - 4, that is, the size of the optional header declared in IMAGE_NT_HEADERS, the sections table starts at offset e_lfanew + sizeof(IMAGE_NT_HEADERS) from the start of the file.

The sections table is made up of structures, each 40 bytes in size. The number of sections is taken from the NumberOfSections field (see the IMAGE_FILE_HEADER structure from Listing 1.32). Thus, obtaining the list of sections is a trivial task. Listing 1.35 shows the structure that is an element of the sections table.
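The address arithmetic above can be sketched as follows. Using SizeOfOptionalHeader rather than a fixed sizeof value is the more general choice, because it also works when the optional header has a different size (for example, in 64-bit modules). The helper and macro names are hypothetical; the sizes 20 and 40 are those of IMAGE_FILE_HEADER and IMAGE_SECTION_HEADER.

```c
#include <stdint.h>

#define SIZEOF_FILE_HEADER    20u  /* sizeof(IMAGE_FILE_HEADER) */
#define SIZEOF_SECTION_HEADER 40u  /* sizeof(IMAGE_SECTION_HEADER) */

/* File offset of the sections table: the 4-byte Signature, the file header,
   and SizeOfOptionalHeader bytes of optional header follow e_lfanew. */
static uint32_t section_table_offset(uint32_t e_lfanew,
                                     uint16_t size_of_optional_header)
{
    return e_lfanew + 4u + SIZEOF_FILE_HEADER + size_of_optional_header;
}

/* File offset of element 'index' (0 .. NumberOfSections - 1) of the table. */
static uint32_t section_header_offset(uint32_t e_lfanew,
                                      uint16_t size_of_optional_header,
                                      uint16_t index)
{
    return section_table_offset(e_lfanew, size_of_optional_header) +
           SIZEOF_SECTION_HEADER * index;
}
```

For a typical 32-bit module with e_lfanew = 80H and SizeOfOptionalHeader = 224, the table starts at offset 178H, and the header of the third section (index 2) at offset 1C8H.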

Listing 1.35: An elementary structure that makes up a typical element of the sections table

    struct IMAGE_SECTION_HEADER {
        BYTE    Name[IMAGE_SIZEOF_SHORT_NAME];
        union {
            DWORD   PhysicalAddress;
            DWORD   VirtualSize;
        } Misc;
        DWORD   VirtualAddress;
        DWORD   SizeOfRawData;
        DWORD   PointerToRawData;
        DWORD   PointerToRelocations;
        DWORD   PointerToLinenumbers;
        WORD    NumberOfRelocations;
        WORD    NumberOfLinenumbers;
        DWORD   Characteristics;
    };

Consider the fields of this structure:

Name — The section name. The IMAGE_SIZEOF_SHORT_NAME value equals 8. If the name has fewer than 8 characters, the remaining bytes are filled with zeros; a name of exactly 8 characters is not zero-terminated.
VirtualSize — The amount of memory required for the section.
VirtualAddress — The relative virtual address at which the loader must load the section.
SizeOfRawData — The size of the section data in the file, rounded up to a multiple of the FileAlignment value (see the IMAGE_OPTIONAL_HEADER32 structure in Listing 1.33).
PointerToRawData — The offset within a file, at which this section is located.
PointerToRelocations, PointerToLinenumbers, NumberOfRelocations, and NumberOfLinenumbers — Fields used in OBJ files; they won't be considered here.
Characteristics — The flags that characterize this section (Table 1.29).

Table 1.29: Flags that characterize a section
Value	Description
00000020H	This section contains the program code.
00000040H	This section contains initialized data.
00000080H	This section contains uninitialized data.
00000200H	This section contains comments or other information for the linker (OBJ files only).
00000800H	This section will not become part of the image (OBJ files only).
04000000H	This section cannot be cached.
08000000H	This section cannot be paged out.
10000000H	This is a shared section.
20000000H	This section can be executed.
40000000H	This section can be read.
80000000H	This section can be written to.
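A minimal sketch of reading one section header follows, assuming hdr points at a 40-byte element of the sections table (Listing 1.35) in a byte buffer. It copies the name safely (an 8-character name has no terminating zero) and summarizes the three memory-access flags of Table 1.29 as an "rwx"-style string. The function and macro names are hypothetical; the flag values match the table, and Characteristics sits at offset 36 according to the field order of Listing 1.35.

```c
#include <stdint.h>
#include <string.h>

/* Memory-access flags from Table 1.29 (IMAGE_SCN_MEM_* values in winnt.h). */
#define SCN_MEM_EXECUTE 0x20000000u
#define SCN_MEM_READ    0x40000000u
#define SCN_MEM_WRITE   0x80000000u

/* Copies the 8-byte Name field into out and terminates it. */
static void section_name(const uint8_t *hdr, char out[9])
{
    memcpy(out, hdr, 8);
    out[8] = '\0';
}

/* Reads Characteristics (offset 36 in the 40-byte IMAGE_SECTION_HEADER). */
static uint32_t section_flags(const uint8_t *hdr)
{
    return (uint32_t)hdr[36] | ((uint32_t)hdr[37] << 8) |
           ((uint32_t)hdr[38] << 16) | ((uint32_t)hdr[39] << 24);
}

/* Writes an "rwx"-style summary of the memory-access flags into out. */
static void section_protection(uint32_t characteristics, char out[4])
{
    out[0] = (characteristics & SCN_MEM_READ)    ? 'r' : '-';
    out[1] = (characteristics & SCN_MEM_WRITE)   ? 'w' : '-';
    out[2] = (characteristics & SCN_MEM_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
}
```

For example, a code section such as .text typically has Characteristics equal to 60000020H (code, readable, executable), which this sketch reports as r-x.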

The names and purposes of the sections can differ at the compiler's discretion.

Note

You can create custom sections and assign custom names to them. For example, you can write an Assembly program and assign an arbitrary name to the section or segment, in which the executable code would reside. The program would operate as normal; however, some debuggers and disassemblers would be confused because the entry point to the program is located in a section with a name unknown to them.

Here is an incomplete list of sections created by compilers from Microsoft and Borland.

.text — This section contains executable code (Microsoft).
CODE — This section contains executable code (Borland).
.data — This section contains initialized global variables (Microsoft).
DATA — This section contains initialized global variables (Borland).
.bss — All data in this section are uninitialized. The size of the section within the file is zero.
.CRT — This is another section for initialized data (Microsoft).
CRT — This is the data section (Borland).
.rdata — This section contains read-only data (constants and debug information).
.rsrc — This section contains information about resources.
.edata — This section contains information about exported functions.
.idata — This section contains information about imported functions.
.reloc — This section contains the base relocation table. The Windows loader needs this information if, for some reason, it has to load the module at an address other than the one specified in the PE header. The table contains the relative addresses of the memory cells that hold addresses used in the program; the values of these cells must be adjusted during loading. This table is also called the relocation table. More detailed information about investigating the relocation table can be found in Section 2.1.1.
.icode — This section contains the jump stubs for imported functions created by older versions of tlink32.exe.
.debug — This section contains debug information.

Thus, using the sections table, you can compute the position of a section in the file, as well as its size. After that, you can view the information stored in the section, dump its contents, or even try to disassemble its executable code.

Special attention should be paid to the import and export tables and to the section containing resource information. These issues are covered in the next few sections. Before proceeding, however, it is necessary to clarify how the PE image looks in virtual memory, which differs from its copy in the file. The simplified loading algorithm is as follows:

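All headers, including the DOS header, the PE header (IMAGE_NT_HEADERS), and the sections table, are loaded into memory.
The sections are then loaded into memory, each at its relative virtual address (the VirtualAddress field) counted from the base address of the module. These addresses are aligned according to the value of the SectionAlignment field (see the description of the IMAGE_OPTIONAL_HEADER32 structure), whereas in the file the sections are aligned according to FileAlignment, so the offset of a section in the file generally differs from its relative virtual address.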
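Because sections are aligned differently in memory and in the file, the same section usually occupies a different amount of space in each. A minimal sketch of the rounding involved follows, assuming power-of-two alignments, as SectionAlignment and FileAlignment normally are. The names align_up and next_section_rva are hypothetical, and next_section_rva assumes the common layout in which each section starts right after the aligned end of the previous one.

```c
#include <stdint.h>

/* Rounds value up to a multiple of alignment (alignment must be a power of two). */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Lowest RVA at which the next section can start in memory: the section's
   VirtualAddress plus its VirtualSize rounded up to SectionAlignment. */
static uint32_t next_section_rva(uint32_t virtual_address, uint32_t virtual_size,
                                 uint32_t section_alignment)
{
    return virtual_address + align_up(virtual_size, section_alignment);
}
```

For example, a section with VirtualSize = 1234H occupies align_up(1234H, 1000H) = 2000H bytes in memory, but only align_up(1234H, 200H) = 1400H bytes in a file with FileAlignment = 200H. If it starts at RVA 1000H, the next section can start no earlier than RVA 3000H.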

What conclusions can be drawn from this? First, you need to know how to determine the file offset of an object from its relative virtual address. This is important for working with the import and export tables, which refer to their data by relative virtual addresses. In general, the offset is obtained as follows:

Determine, by the relative virtual address, the section in which the object resides.
From the sections table, determine the offset of that section within the PE file (PointerToRawData).
Determine the offset of the object within the section by subtracting the VirtualAddress of the section from the relative virtual address of the object.
Add the section offset within the file and the object offset within the section; the sum is the file offset of the object.

Listing 1.36 presents an example C++ function that determines the offset within the PE file from a relative virtual address. It assumes that the global structure iw (of type IMAGE_NT_HEADERS) has already been read and that the global array ais of IMAGE_SECTION_HEADER structures (see Listing 1.35) has been filled. The vsm input parameter is the relative virtual address of the required object; the function returns the offset of that object within the PE file.

Listing 1.36: The C++ function for determining the object offset in the PE file by its relative virtual address

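    // Returns the offset within the PE file of the object whose relative
    // virtual address is vsm, or 0 if vsm precedes the first section.
    // The sections in ais are assumed to be sorted by VirtualAddress.
    DWORD getoffs(DWORD vsm)
    {
        DWORD fi = 0;
        int n = iw.FileHeader.NumberOfSections;
        if (n == 0 || vsm < ais[0].VirtualAddress) return fi;
        int i;
        // Find the first section that starts above vsm;
        // the object then lies in the preceding section.
        for (i = 1; i < n; i++)
            if (vsm < ais[i].VirtualAddress) break;
        // If no such section exists (i == n), the object is in the last section.
        fi = ais[i - 1].PointerToRawData + (vsm - ais[i - 1].VirtualAddress);
        return fi;
    }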

1.5.4. Import Table

Note that if you try to find the import section by searching for the .idata name in the sections table, you may well fail: linkers (at least some of those supplied by Microsoft) often do not create such a section and place the import data in another section, such as .rdata. Therefore, it is necessary to use the DataDirectory array from the IMAGE_OPTIONAL_HEADER32 structure (see Listing 1.33). For an elementary investigation of executable modules, you can use the program presented in Appendix 1. You will immediately notice that many executable modules have no .idata section, although the import table is present. If the .idata section is present, the import table is usually located there.

Recall that the DataDirectory array has 16 elements. Each element consists of two fields: VirtualAddress for the relative virtual address of the object, and Size for the object size (see Listing 1.34). The import table is described by the second element (index 1). This is the only reliable way to determine the location of the import table, but it is enough: convert the relative virtual address to a file offset as described at the end of the previous section (see Listing 1.36), and the import table is found. It only remains to understand its structure.

The import table begins with an array of the structures shown in Listing 1.37, one for each imported DLL.

Listing 1.37: The array of structures in the beginning of the import table

    struct IMAGE_IMPORT_DESCRIPTOR {
        union {
            DWORD   Characteristics;
            DWORD   OriginalFirstThunk;
        };
        DWORD   TimeDateStamp;
        DWORD   ForwarderChain;
        DWORD   Name;
        DWORD   FirstThunk;
    };

The array is terminated by an element whose fields are all zero. To detect this element, check at least two fields for zero, for example, Characteristics and Name. Now consider the fields in Listing 1.37:

Characteristics (OriginalFirstThunk) — The relative virtual address of an array of IMAGE_THUNK_DATA32 elements that identify the imported functions by name or by ordinal.
TimeDateStamp — Zero until the imports are bound; after binding, the date and time of the DLL creation.
ForwarderChain — Usually 0FFFFFFFFh.
Name — The relative virtual address of the zero-terminated ASCII string containing the name of the import library (a DLL). Thus, every element of the array corresponds to one DLL.
FirstThunk — The relative virtual address of a second array of IMAGE_THUNK_DATA32 elements. In the file, this array is a copy of the array pointed at by the Characteristics field. If the Characteristics field is zero (this is typical for some compilers other than those supplied by Microsoft), use the FirstThunk field, which points at the second copy of the array.
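A minimal sketch of walking the descriptor array follows. It assumes that the module has been laid out in memory by relative virtual address (for example, mapped by the loader), so that image + RVA addresses the data directly; for a raw file, each RVA would first be converted with a function such as getoffs from Listing 1.36. The stop condition checks Characteristics and Name, as recommended above. The names import_dll_name, rd32, and SIZEOF_IMPORT_DESCRIPTOR are hypothetical; the Name field is at offset 12 of the 20-byte descriptor, according to the field order of Listing 1.37.

```c
#include <stddef.h>
#include <stdint.h>

#define SIZEOF_IMPORT_DESCRIPTOR 20u  /* sizeof(IMAGE_IMPORT_DESCRIPTOR) */

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the name of the DLL described by descriptor number 'index', or
   NULL if the terminating descriptor (Characteristics and Name both zero)
   comes first. image + rva addresses the data with that RVA. */
static const char *import_dll_name(const uint8_t *image, uint32_t import_rva,
                                   unsigned index)
{
    const uint8_t *d = image + import_rva;
    for (unsigned i = 0;; i++, d += SIZEOF_IMPORT_DESCRIPTOR) {
        uint32_t characteristics = rd32(d);    /* OriginalFirstThunk */
        uint32_t name = rd32(d + 12);          /* Name */
        if (characteristics == 0 && name == 0)
            return NULL;                       /* end of the array */
        if (i == index)
            return (const char *)(image + name);
    }
}
```

Calling import_dll_name with index 0, 1, 2, and so on until it returns NULL lists every DLL that the module imports from.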

Note

Keep in mind that the import table describes only the DLLs implicitly linked to the executable module, not those loaded at run time by calling the LoadLibrary API function.

Now consider the arrays pointed at by the Characteristics and FirstThunk fields. Note again that these are two different arrays, although their elements point at the same names of imported functions. Both arrays consist of the structures presented in Listing 1.38.

Listing 1.38: The IMAGE_THUNK_DATA32 structure

    struct IMAGE_THUNK_DATA32 {
        union {
            DWORD ForwarderString;
            DWORD Function;
            DWORD Ordinal;
            DWORD AddressOfData;
        } u1;
    };

As you can see, the IMAGE_THUNK_DATA32 structure is, in essence, a single 32-bit field that can be interpreted in four different ways. In the array pointed at by the Characteristics field, this field specifies the relative virtual address of the name of an imported function from the given DLL. If the most significant bit of the field is set (IMAGE_ORDINAL_FLAG32, 80000000H), the function is imported by ordinal, and the least significant word contains its ordinal number. The array is terminated by a double word set to zero.

Finally, consider how the name of an imported function is stored. Without going into details, the name is a simple zero-terminated ASCII string; however, it starts 2 bytes after the address specified by the IMAGE_THUNK_DATA32 structure. The preceding 2 bytes hold a hint: the expected index of the function in the export name table of the DLL, which the loader tries first when searching for the name. Together, the hint and the name form the IMAGE_IMPORT_BY_NAME structure.
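The two cases can be told apart with a single bit test. A minimal sketch of decoding one nonzero element of the array follows, again assuming that the module is laid out by relative virtual address (image + RVA addresses the data). The structure import_ref, the function decode_thunk, and the macro ORDINAL_FLAG32 are hypothetical; the flag value is the same as IMAGE_ORDINAL_FLAG32 in winnt.h.

```c
#include <stddef.h>
#include <stdint.h>

#define ORDINAL_FLAG32 0x80000000u  /* same value as IMAGE_ORDINAL_FLAG32 */

/* What one nonzero IMAGE_THUNK_DATA32 value refers to. */
struct import_ref {
    int         by_ordinal; /* 1: imported by ordinal; 0: by name */
    uint16_t    ordinal;    /* valid if by_ordinal == 1 */
    uint16_t    hint;       /* valid if by_ordinal == 0 */
    const char *name;       /* valid if by_ordinal == 0 */
};

static struct import_ref decode_thunk(const uint8_t *image, uint32_t thunk)
{
    struct import_ref r = {0, 0, 0, NULL};
    if (thunk & ORDINAL_FLAG32) {
        r.by_ordinal = 1;
        r.ordinal = (uint16_t)(thunk & 0xFFFFu);   /* low word: ordinal */
    } else {
        const uint8_t *ibn = image + thunk;        /* IMAGE_IMPORT_BY_NAME */
        r.hint = (uint16_t)(ibn[0] | (ibn[1] << 8));
        r.name = (const char *)(ibn + 2);          /* zero-terminated name */
    }
    return r;
}
```

The caller stops at the first element equal to zero, which terminates the array.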

The array pointed at by the FirstThunk field of the IMAGE_IMPORT_DESCRIPTOR structure (see Listing 1.37) deserves special attention. The instructions that call imported functions refer to the elements of this array either directly (for example, CALL DWORD PTR [address], or MOV ESI, DWORD PTR [address] followed by CALL ESI) or through a stub (JMP DWORD PTR [address]). When the module is loaded, the loader determines the actual addresses of the functions in memory by their names or ordinals and writes these addresses into this array. The array pointed at by the Characteristics field does not change during loading. A detailed example of searching for the name of an imported function is provided in Section 1.6.1.

1.5.5. Export Table

The export table enables applications to correctly call the functions provided by DLLs. As with the import table, to investigate the export table, use the DataDirectory array from the optional header of the IMAGE_NT_HEADERS structure, because the .edata section might be missing from the module. This time, you need the first element of the array (index 0).

The IMAGE_EXPORT_DIRECTORY structure is located at the specified address. This structure contains all information required for investigating exported functions (Listing 1.39).

Listing 1.39: The IMAGE_EXPORT_DIRECTORY structure

    struct IMAGE_EXPORT_DIRECTORY {
        DWORD   Characteristics;
        DWORD   TimeDateStamp;
        WORD    MajorVersion;
        WORD    MinorVersion;
        DWORD   Name;
        DWORD   Base;
        DWORD   NumberOfFunctions;
        DWORD   NumberOfNames;
        DWORD   AddressOfFunctions;
        DWORD   AddressOfNames;
        DWORD   AddressOfNameOrdinals;
    };

Consider the fields of the IMAGE_EXPORT_DIRECTORY structure:

Characteristics — This field is reserved. As far as can be determined, it is always set to zero.
TimeDateStamp — This shows the date and time the export data was created, or zero.
MajorVersion — This gives the major version of the export table. It usually is zero.
MinorVersion — This gives the minor version of the export table. It usually is zero.
Name — This is the relative virtual address of the ASCII string with the name of the exporting module. This name need not match the file name.
Base — This is the starting ordinal number, that is, the ordinal of the first element of the array of function addresses (usually 1). Besides a name, each exported function has an ordinal number, by which it can also be accessed.
NumberOfFunctions — This shows the number of elements in the array of addresses of exported functions.
NumberOfNames — This gives the number of elements in the array of exported function names.
AddressOfFunctions — This is the relative virtual address of the array of relative virtual addresses of exported functions.
AddressOfNames — This is the relative virtual address of the array of relative virtual addresses of exported function names.
AddressOfNameOrdinals — This is the relative virtual address of the array of 16-bit values (the ordinals array) containing indexes into the array of exported functions. To obtain the function ordinal, add the value of the Base field to the index value.

To understand how information about exported functions is obtained, you need to understand the relationships among three arrays: the array of function addresses, the names array, and the ordinals array. The ordinals array links the first two: it has the same number of elements as the names array, and its element at a given position corresponds to the name at the same position. Thus, to obtain the function address by its name, complete the following steps:

Find the function name in the names array.
Note the index at which the name was found, and take the element with the same index in the ordinals array.
The value of that element is an index into the array of function addresses. The element of the array of function addresses at this index holds the relative virtual address of the required function.
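The three steps can be sketched in C as follows. The sketch assumes that the module has already been mapped into memory as a flat buffer, so that an RVA is simply an offset from the start of the buffer, and that export_rva was taken from element 0 of the DataDirectory array (the export directory). The function and helper names are hypothetical; the code does no bounds checking and does not treat forwarded exports specially.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Little-endian readers, so the sketch does not depend on host byte order. */
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Byte offsets of fields within IMAGE_EXPORT_DIRECTORY. */
enum {
    EXP_NUMBER_OF_FUNCTIONS = 20,
    EXP_NUMBER_OF_NAMES = 24,
    EXP_ADDRESS_OF_FUNCTIONS = 28,
    EXP_ADDRESS_OF_NAMES = 32,
    EXP_ADDRESS_OF_NAME_ORDINALS = 36
};

/* Returns the RVA of the function exported under `name`, or 0 if absent. */
static uint32_t find_export_by_name(const uint8_t *image, uint32_t export_rva,
                                    const char *name)
{
    assert(image != NULL && name != NULL);
    const uint8_t *dir   = image + export_rva;
    uint32_t num_funcs   = rd32(dir + EXP_NUMBER_OF_FUNCTIONS);
    uint32_t num_names   = rd32(dir + EXP_NUMBER_OF_NAMES);
    const uint8_t *funcs = image + rd32(dir + EXP_ADDRESS_OF_FUNCTIONS);
    const uint8_t *names = image + rd32(dir + EXP_ADDRESS_OF_NAMES);
    const uint8_t *ords  = image + rd32(dir + EXP_ADDRESS_OF_NAME_ORDINALS);

    for (uint32_t i = 0; i < num_names; i++) {
        /* Step 1: element i of the names array is the RVA of an ASCII name. */
        const char *candidate = (const char *)(image + rd32(names + 4 * (size_t)i));
        if (strcmp(candidate, name) != 0)
            continue;
        /* Step 2: the same index i selects an element of the ordinals array. */
        uint16_t func_index = rd16(ords + 2 * (size_t)i);
        /* Step 3: that value indexes the array of function addresses. */
        if (func_index >= num_funcs)
            return 0;
        return rd32(funcs + 4 * (size_t)func_index);
    }
    return 0;
}
```

The names array is sorted lexically, which lets a real loader use a binary search; the linear scan here only keeps the sketch short.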

Analyze the program presented in Appendix 1 to understand how to work with the export table. Experiment with locating the export table for different programs and DLL files.

1.5.6. Resource Section

As in previous cases, to obtain the resource block, it is necessary to use the DataDirectory array from the IMAGE_NT_HEADERS structure. You'll need the array element with an index equal to two. In contrast to the previously-considered objects of a PE module, the resource section has a hierarchical tree structure. In practice, four levels of this structure are used. In addition to this, all addresses used within the resource section are counted from the start of the resource section (in other words, these are not relative virtual addresses). This is natural, because resources are loaded into the memory as they are accessed, not during the loading of the module.

In essence, to understand the structure of resources, only the two structures shown in Listings 1.40 and 1.41 will be needed.

Listing 1.40: The IMAGE_RESOURCE_DIRECTORY structure

 struct IMAGE_RESOURCE_DIRECTORY {
     DWORD   Characteristics;
     DWORD   TimeDateStamp;
     WORD    MajorVersion;
     WORD    MinorVersion;
     WORD    NumberOfNamedEntries;
     WORD    NumberOfIdEntries;
 };

Listing 1.41: The IMAGE_RESOURCE_DIRECTORY_ENTRY structure

 struct IMAGE_RESOURCE_DIRECTORY_ENTRY {
     ULONG   Name;
     ULONG   OffsetToData;
 };

Consider the fields of the IMAGE_RESOURCE_DIRECTORY structure:

Characteristics — This is the flags field, which, to all appearances, is not used nowadays.
TimeDateStamp — This field specifies the date and time of resource creation.
MajorVersion and MinorVersion — These fields specify the major and minor parts of the resource version. They are practically useless.
NumberOfNamedEntries — This is the total number of named resources.
NumberOfIdEntries — This field gives the total number of resources specified by resource identifiers.

The fields of the IMAGE_RESOURCE_DIRECTORY_ENTRY structure are as follows:

Name — This field might be interpreted differently, depending on the level and on the value of the most significant bit. All of these cases will be considered in the sections that follow.
OffsetToData — This field specifies the address computed in relation to the start of the resource section. The objects that can be pointed at by this address will be considered separately.

Thus, by going to the address specified in the element of the DataDirectory array with index 2, you'll reach the resources. This is where the first hierarchical level starts. Note that a zero address simply means that the module has no resource block.

First Level of the Hierarchy

At the top (first) level of the resource hierarchy resides an IMAGE_RESOURCE_DIRECTORY structure (see Listing 1.40). The field that matters most for investigating the resources is NumberOfIdEntries: at the first level, it contains the number of resource types identified by numbers that are stored in the module. The NumberOfNamedEntries field counts the resource types identified by name rather than by number; it is nonzero only when the module defines custom, string-named resource types.

Listing 1.42: Fragment of the winuser.h file

 #define RT_CURSOR              1
 #define RT_BITMAP              2
 #define RT_ICON                3
 #define RT_MENU                4
 #define RT_DIALOG              5
 #define RT_STRING              6
 #define RT_FONTDIR             7
 #define RT_FONT                8
 #define RT_ACCELERATOR         9
 #define RT_RCDATA              10
 #define RT_MESSAGETABLE        11
 #define RT_GROUP_CURSOR        12
 #define RT_GROUP_ICON          14
 #define RT_VERSION             16
 #define RT_DLGINCLUDE          17
 #define RT_PLUGPLAY            19
 #define RT_VXD                 20
 #define RT_ANICURSOR           21
 #define RT_ANIICON             22
 #define RT_HTML                23
 #define RT_MANIFEST            24

What can you achieve if you know the number of resource types? As it turns out, this is the key field, because the IMAGE_RESOURCE_DIRECTORY structure is directly followed by the array of IMAGE_RESOURCE_DIRECTORY_ENTRY structures (see Listing 1.41). The number of these entries equals NumberOfNamedEntries + NumberOfIdEntries (the named entries come first, followed by the entries with numeric identifiers), so you'll have no problems reading them one by one. At the first level, the Name field of the IMAGE_RESOURCE_DIRECTORY_ENTRY structure contains the resource type identifier. Resource type identifiers can be found in the winuser.h file of the Visual Studio .NET product (Listing 1.42).

Thus, at the first level of the resource hierarchy, it is possible to find out how many types of resources are in the module, and to identify them all.

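The OffsetToData fields of all these elements point to IMAGE_RESOURCE_DIRECTORY structures located at the second level of the hierarchy. In such entries, the most significant bit of OffsetToData is set to one to indicate that the entry points to a subdirectory; the remaining 31 bits give the offset from the start of the resource section.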
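The first-level walk can be sketched in C as shown below. The hypothetical function assumes that rsrc points to the start of the resource section in memory and relies on the layout from Listings 1.40 and 1.41: a 16-byte directory header followed immediately by 8-byte entries. Entries whose Name field has the most significant bit set are named types and are skipped; the type_name table covers only a few of the values from Listing 1.42.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* A subset of the RT_ values from winuser.h (Listing 1.42). */
static const char *type_name(uint32_t id)
{
    switch (id) {
    case 1:  return "RT_CURSOR";
    case 2:  return "RT_BITMAP";
    case 3:  return "RT_ICON";
    case 4:  return "RT_MENU";
    case 5:  return "RT_DIALOG";
    case 6:  return "RT_STRING";
    case 10: return "RT_RCDATA";
    case 14: return "RT_GROUP_ICON";
    case 16: return "RT_VERSION";
    case 24: return "RT_MANIFEST";
    default: return NULL;
    }
}

/* Stores the numeric type IDs of the first level in types[] and returns
   how many were stored (at most max). */
static size_t list_resource_types(const uint8_t *rsrc, uint32_t *types, size_t max)
{
    assert(rsrc != NULL && types != NULL);
    uint32_t count = (uint32_t)rd16(rsrc + 12)   /* NumberOfNamedEntries */
                   + rd16(rsrc + 14);            /* NumberOfIdEntries */
    const uint8_t *entries = rsrc + 16;          /* right after the directory */
    size_t n = 0;
    for (uint32_t i = 0; i < count && n < max; i++) {
        uint32_t name = rd32(entries + 8 * (size_t)i);
        if ((name & 0x80000000u) == 0)           /* numeric ID, not a name */
            types[n++] = name;
    }
    return n;
}
```

Printing type_name(types[i]) for every ID that has a known name gives a readable summary of the first level.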

Second Level of the Hierarchy

The second level of the hierarchy also starts with IMAGE_RESOURCE_DIRECTORY structures, one for each resource type in the module (see the previous section). In these structures, the two most important fields are NumberOfNamedEntries and NumberOfIdEntries. The first gives the number of named resources, and the second gives the number of resources specified by numeric identifiers. At the second level, each IMAGE_RESOURCE_DIRECTORY structure is directly followed by an array of IMAGE_RESOURCE_DIRECTORY_ENTRY structures, and the number of elements in that array equals NumberOfNamedEntries + NumberOfIdEntries. The fields of these entries deserve special attention, because the Name field must now be interpreted differently. If the most significant bit of this field is zero, the field itself is the resource identifier. If the most significant bit is one, the remaining bits give the offset of the resource name relative to the start of the resource section. The name is stored as follows: the first 2 bytes specify the name length in characters (not in bytes), followed by the name itself in Unicode.

Again, consider the OffsetToData field. In each IMAGE_RESOURCE_DIRECTORY_ENTRY structure of the second level, this field points at an IMAGE_RESOURCE_DIRECTORY structure of the third level.
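Decoding a second-level name can be sketched as follows. The hypothetical function assumes rsrc points to the start of the resource section and that the name is a 16-bit length (in characters) followed by that many UTF-16LE code units. It does not rely on a terminating NUL, and it replaces non-ASCII characters with '?' instead of performing a full Unicode conversion, purely to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies the name of a named directory entry into out as ASCII.
   Returns the number of characters written, or -1 if the entry
   carries a numeric identifier instead of a name. */
static int resource_name_ascii(const uint8_t *rsrc, uint32_t name_field,
                               char *out, size_t out_size)
{
    assert(rsrc != NULL && out != NULL && out_size > 0);
    if ((name_field & 0x80000000u) == 0)
        return -1;                               /* plain numeric identifier */
    const uint8_t *s = rsrc + (name_field & 0x7FFFFFFFu);
    uint16_t len = (uint16_t)(s[0] | s[1] << 8); /* length in characters */
    size_t n = 0;
    while (n < len && n + 1 < out_size) {
        /* Each character is a 16-bit little-endian code unit. */
        uint16_t ch = (uint16_t)(s[2 + 2 * n] | s[3 + 2 * n] << 8);
        out[n++] = ch < 0x80 ? (char)ch : '?';
    }
    out[n] = '\0';                               /* n <= out_size - 1 here */
    return (int)n;
}
```

If out is too small, the name is truncated rather than overflowing the buffer.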

Third Level of the Hierarchy

Branching essentially ends at the second level: each second-level entry leads to its own third-level IMAGE_RESOURCE_DIRECTORY structure, which is again followed by an array of IMAGE_RESOURCE_DIRECTORY_ENTRY structures. Consider how the fields of these entries should be interpreted at the third level. The Name field now holds the identifier of the language in which the resource is written. The primary language identifiers are defined in the winnt.h file with the LANG_ prefix (sublanguages use the SUBLANG_ prefix) and won't be listed here. The OffsetToData field, in turn, points at the structure of the fourth level; because that structure is not another directory, the most significant bit of OffsetToData is cleared here.
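The language identifier in the Name field packs two values. The sketch below reproduces the bit layout used by the MAKELANGID, PRIMARYLANGID, and SUBLANGID macros from winnt.h (primary language in the low 10 bits, sublanguage in the upper 6 bits); the function names are hypothetical stand-ins for those macros.

```c
#include <assert.h>
#include <stdint.h>

/* Equivalent of MAKELANGID(primary, sub). */
static uint16_t make_langid(uint16_t primary, uint16_t sub)
{
    assert(primary <= 0x3FF && sub <= 0x3F);
    return (uint16_t)((sub << 10) | primary);
}

/* Equivalent of PRIMARYLANGID: a LANG_ value. */
static uint16_t primary_lang(uint16_t langid)
{
    return (uint16_t)(langid & 0x3FF);
}

/* Equivalent of SUBLANGID: a SUBLANG_ value. */
static uint16_t sub_lang(uint16_t langid)
{
    return (uint16_t)(langid >> 10);
}
```

For instance, the common value 0x409 splits into LANG_ENGLISH (0x09) and SUBLANG_ENGLISH_US (0x01).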

Fourth Level of the Hierarchy

At the fourth level of the hierarchy, the structure is no longer IMAGE_RESOURCE_DIRECTORY_ENTRY but IMAGE_RESOURCE_DATA_ENTRY, which is also defined in the winnt.h file. Its first field, OffsetToData, specifies the address of the memory area where the binary resource data are located, and its second field, Size, specifies the size of this binary image. Note the one exception to the rule stated at the beginning of this section: this OffsetToData value is a relative virtual address, not an offset from the start of the resource section.
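The whole path through the four levels can be sketched as follows. The sketch assumes rsrc points to the start of the resource section, that the leaf is IMAGE_RESOURCE_DATA_ENTRY (OffsetToData as an RVA, then Size), and that a set most significant bit in a directory entry's OffsetToData marks a subdirectory. It looks resources up by numeric identifiers only and simply takes the first language; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SUBDIR_BIT  0x80000000u  /* OffsetToData points to a subdirectory */
#define OFFSET_MASK 0x7FFFFFFFu

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Finds the entry with numeric identifier id in the directory located
   dir_off bytes from the start of the resource section. */
static const uint8_t *find_id_entry(const uint8_t *rsrc, uint32_t dir_off, uint32_t id)
{
    const uint8_t *dir = rsrc + dir_off;
    uint32_t named = rd16(dir + 12);                 /* NumberOfNamedEntries */
    uint32_t ids = rd16(dir + 14);                   /* NumberOfIdEntries */
    const uint8_t *entries = dir + 16;
    for (uint32_t i = named; i < named + ids; i++)   /* named entries come first */
        if (rd32(entries + 8 * (size_t)i) == id)
            return entries + 8 * (size_t)i;
    return NULL;
}

/* Follows type -> identifier -> first language -> data entry.
   Returns 1 and fills *rva and *size on success, 0 otherwise. */
static int find_resource(const uint8_t *rsrc, uint32_t type, uint32_t id,
                         uint32_t *rva, uint32_t *size)
{
    assert(rsrc != NULL && rva != NULL && size != NULL);
    const uint8_t *e = find_id_entry(rsrc, 0, type);             /* level 1 */
    if (e == NULL || !(rd32(e + 4) & SUBDIR_BIT))
        return 0;
    e = find_id_entry(rsrc, rd32(e + 4) & OFFSET_MASK, id);      /* level 2 */
    if (e == NULL || !(rd32(e + 4) & SUBDIR_BIT))
        return 0;
    const uint8_t *lang = rsrc + (rd32(e + 4) & OFFSET_MASK);    /* level 3 */
    if (rd16(lang + 12) + rd16(lang + 14) == 0)
        return 0;
    uint32_t leaf = rd32(lang + 16 + 4);  /* OffsetToData of the first entry */
    if (leaf & SUBDIR_BIT)
        return 0;                         /* expected a leaf, not a directory */
    const uint8_t *data = rsrc + leaf;    /* level 4: IMAGE_RESOURCE_DATA_ENTRY */
    *rva = rd32(data);                    /* OffsetToData: an RVA */
    *size = rd32(data + 4);               /* Size in bytes */
    return 1;
}
```

Because the returned address is an RVA, it must be resolved against the image base (or converted to a file offset through the section table), not against the start of the resource section.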

At this point, the description of resources is complete. It only remains to mention that the program presented in Appendix 1 analyzes only the first two levels of resources. In most cases, this is enough.

1.5.7. About Debug Information

This description of the PE module structure wouldn't be complete without at least a brief description of the debug information. The program presented in Appendix 1 only reports the presence of such information (the symbolic table and debug info) and the addresses (offsets) at which this information is located within the module being investigated.

Symbolic Table

The location of the symbolic table can be determined using the FileHeader field (the IMAGE_FILE_HEADER structure) of IMAGE_NT_HEADERS. Its PointerToSymbolTable field contains the offset of the symbolic table within the file (a file offset, not a relative virtual address). If this field is zero, the symbolic table is missing. What is the symbolic table? The term can be misleading. Here, a symbol must be understood as an identifier of the high-level programming language, such as a variable or a function. The symbolic table contains the following information: the symbolic name (the variable or function name), the value of the symbol (for most symbols, its offset within the section it belongs to), the type of the symbol (variable or function), and its storage class (automatic, register, label, etc.). All this information about an identifier is packed into the IMAGE_SYMBOL structure, the description of which can be found in the winnt.h file.
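Reading a symbol name from the table can be sketched as follows. The sketch assumes that file holds the raw file contents, that PointerToSymbolTable is a file offset, and that each IMAGE_SYMBOL record is 18 bytes long (IMAGE_SIZEOF_SYMBOL in winnt.h). A name of up to 8 bytes is stored inline; otherwise, the first 4 name bytes are zero and the next 4 hold an offset into the string table that follows the symbol table. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIZEOF_SYMBOL 18  /* IMAGE_SIZEOF_SYMBOL in winnt.h */

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns the name of the symbol record at `index`. Short names (up to
   8 bytes, not necessarily NUL-terminated) are copied into short_buf;
   long names are returned as pointers into the string table. */
static const char *symbol_name(const uint8_t *file, uint32_t symtab_off,
                               uint32_t nsyms, uint32_t index, char short_buf[9])
{
    assert(index < nsyms);
    const uint8_t *sym = file + symtab_off + (size_t)SIZEOF_SYMBOL * index;
    /* The string table starts right after the last record; its first
       4 bytes hold the size of the table. */
    const char *strtab =
        (const char *)(file + symtab_off + (size_t)SIZEOF_SYMBOL * nsyms);
    if (rd32(sym) == 0)                  /* long name */
        return strtab + rd32(sym + 4);   /* offset from the table start */
    memcpy(short_buf, sym, 8);           /* short name stored inline */
    short_buf[8] = '\0';
    return short_buf;
}
```

Note that index counts raw 18-byte records, including auxiliary records, which also occupy slots in the table and are included in NumberOfSymbols.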

Debug Information

In essence, the debug info here means information about the source line numbers of a specific program. This information is stored in the PE module in a location other than the symbolic table, and locating it takes some additional effort.

First, it is necessary to locate the IMAGE_DEBUG_DIRECTORY structure. It is pointed at by the element of the DataDirectory array with index 6 in the IMAGE_NT_HEADERS structure. If the PE file contains several types of debug info, there is a separate IMAGE_DEBUG_DIRECTORY structure for each of them, and these structures form an array. The Type field of this structure defines the type of debug info. The types of debug info are defined in the winnt.h file as IMAGE_DEBUG_TYPE_ constants. For example, the value 1 (IMAGE_DEBUG_TYPE_COFF) corresponds to COFF debug info, while the value 9 (IMAGE_DEBUG_TYPE_BORLAND) corresponds to Borland debug info.

If the Type field is set to one, the PointerToRawData field of the IMAGE_DEBUG_DIRECTORY structure contains the file offset of the COFF debug info. The IMAGE_COFF_SYMBOLS_HEADER structure resides at this location, and this is the key point: the structure contains information both about the symbolic table (which was found earlier by a different method; see the previous section) and about the table of line numbers. The NumberOfSymbols field contains the number of identifiers in the symbolic table; this number equals the contents of the NumberOfSymbols field in the IMAGE_FILE_HEADER structure (see Listing 1.32). The LvaToFirstSymbol field contains the offset of the symbolic table counted from the start of the IMAGE_COFF_SYMBOLS_HEADER structure. Thus, you can reach the symbolic table by another, more systematic route. Finally, the LvaToFirstLinenumber field contains the offset of the COFF line numbers table, also counted from the start of the structure.
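The search for COFF debug info can be sketched as follows. The sketch assumes that debug_dir already points to the array of IMAGE_DEBUG_DIRECTORY structures (28 bytes each) in memory, that dir_size is the Size value of the DataDirectory element with index 6, that PointerToRawData is a file offset into file, and that the offsets inside IMAGE_COFF_SYMBOLS_HEADER are counted from the start of that header. The CoffDebugInfo type and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IMAGE_DEBUG_TYPE_COFF 1    /* same value as in winnt.h */
#define SIZEOF_DEBUG_DIRECTORY 28  /* sizeof(IMAGE_DEBUG_DIRECTORY) */

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical summary of IMAGE_COFF_SYMBOLS_HEADER, with the
   header-relative offsets converted to file offsets. */
typedef struct {
    uint32_t number_of_symbols;
    uint32_t symbols_offset;
    uint32_t number_of_linenumbers;
    uint32_t linenumbers_offset;
} CoffDebugInfo;

/* Returns 1 and fills *out if a COFF entry is present, 0 otherwise. */
static int find_coff_debug(const uint8_t *file, const uint8_t *debug_dir,
                           uint32_t dir_size, CoffDebugInfo *out)
{
    assert(file != NULL && debug_dir != NULL && out != NULL);
    for (uint32_t i = 0; i < dir_size / SIZEOF_DEBUG_DIRECTORY; i++) {
        const uint8_t *d = debug_dir + (size_t)i * SIZEOF_DEBUG_DIRECTORY;
        if (rd32(d + 12) != IMAGE_DEBUG_TYPE_COFF)  /* Type field */
            continue;
        uint32_t raw = rd32(d + 24);                /* PointerToRawData */
        const uint8_t *hdr = file + raw;            /* IMAGE_COFF_SYMBOLS_HEADER */
        out->number_of_symbols     = rd32(hdr + 0);        /* NumberOfSymbols */
        out->symbols_offset        = raw + rd32(hdr + 4);  /* LvaToFirstSymbol */
        out->number_of_linenumbers = rd32(hdr + 8);        /* NumberOfLinenumbers */
        out->linenumbers_offset    = raw + rd32(hdr + 12); /* LvaToFirstLinenumber */
        return 1;
    }
    return 0;
}
```

Dividing dir_size by the structure size yields the number of debug directories, which is how a file with several kinds of debug info (for example, Borland and COFF) is handled.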

^[7]All structures used in the PE header are taken from the header files.

^[8]Or this file is an NE module used under Windows 3.1. Such programs are rarely encountered nowadays.