Overview of the PE File Format for CLI Files


The PE (Portable Executable) file format is a typical loader format for an operating system. It happens to be the one used by the Windows operating systems, but it was derived from the DEC VAX/VMS COFF (Common Object File Format) file format. The Microsoft PE file format specification is reprinted in its entirety in the appendix of this book.

Although some operating systems use PE as their native file format, what matters from the point of view of the standard is that it is the portable transfer format for programs. As of this writing, implementations that run on platforms with a different native file format rely on a special program that knows how to read the portable format and execute its contents. An alternative that would work equally well is a translator from the portable format to the native format. At this time, we know of no implementations that have used that strategy.

PE files already had an extension mechanism built in, in the form of directory entries. Managed code is distinguished by having its own directory entry. The directory entries are listed in Partition II, section 24.2.3.3, and the managed code entry, CLI Header, is described in section 24.3.3. If the CLI Header entry is present and is not zero, the file is considered to contain managed code. The CLI header contains, among other things, the location of the metadata. Another important property of a file containing managed code is that it must always reference the file mscoree.dll.
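The check described above can be sketched in a few lines of Python. This is a minimal illustration rather than a complete PE reader: the function name `find_cli_header` is hypothetical, and the offsets it uses (the PE signature offset at 0x3C, the 20-byte COFF header, the optional-header magic values 0x10B and 0x20B, and slot 14 of the data directory table for the CLI header) are assumptions taken from the standard PE layout, not from the text above.

```python
import struct

CLI_HEADER_INDEX = 14  # assumed slot of the CLI header in the data directory table


def find_cli_header(image: bytes):
    """Return (rva, size) of the CLI header directory entry, or None."""
    if len(image) < 0x40 or image[:2] != b"MZ":
        return None
    try:
        # The MS-DOS header stores the offset of the PE signature at 0x3C.
        (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
        if image[pe_offset:pe_offset + 4] != b"PE\0\0":
            return None
        # Skip the 4-byte signature and the 20-byte COFF file header.
        optional = pe_offset + 4 + 20
        (magic,) = struct.unpack_from("<H", image, optional)
        if magic == 0x10B:        # PE32
            count_offset = 92
        elif magic == 0x20B:      # PE32+
            count_offset = 108
        else:
            return None
        # Number of data directory entries, followed by the entries themselves.
        (count,) = struct.unpack_from("<I", image, optional + count_offset)
        if count <= CLI_HEADER_INDEX:
            return None
        entry = optional + count_offset + 4 + 8 * CLI_HEADER_INDEX
        rva, size = struct.unpack_from("<II", image, entry)
    except struct.error:
        return None  # truncated file
    if rva == 0:
        return None  # entry is zero: no managed code
    return rva, size
```

A caller would read the whole file into memory and treat a non-None result as "contains managed code"; the returned RVA still has to be translated to a file offset before the CLI header itself can be read from disk.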

Partition II, section 24, specifies in detail the fields that must be filled in and the values that must be entered. A number of fields have values specified as 0, "to be ignored," and so on. These fields may have meaning to the underlying loader, but they are not relevant to managed code. To ensure portability, the value suggested by the standard should nonetheless be written to the file.

In a PE file with unmanaged code, there will be code sections and data sections, and the code sections will consist of native machine instructions. In a PE file for CLI managed code, however, the code sections will contain CIL (Common Intermediate Language) code. Interspersed with this code, for each method, there will be a few bytes of information about the method. This information is referred to as the method header. Method headers typically contain the number of local variables and the exception handlers for the method.
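To make the idea of a method header concrete, here is a hedged sketch of a decoder. It assumes the two encodings defined in Partition II: a one-byte "tiny" header whose low two bits are 0x2 and whose upper six bits give the code size, and a "fat" header whose low two bits are 0x3 and which carries flags, a maximum stack depth, a code size, and a token for the local variable signature. The names `MethodHeader` and `parse_method_header` are illustrative only.

```python
import struct
from typing import NamedTuple


class MethodHeader(NamedTuple):
    header_size: int          # bytes occupied by the header itself
    code_size: int            # bytes of CIL that follow the header
    max_stack: int            # maximum evaluation stack depth
    local_var_sig_token: int  # 0 when the method declares no locals
    init_locals: bool         # locals are zero-initialized on entry
    more_sections: bool       # extra data (e.g. exception handlers) follows the code


def parse_method_header(data: bytes, offset: int = 0) -> MethodHeader:
    first = data[offset]
    kind = first & 0x3
    if kind == 0x2:
        # Tiny format: one byte, no locals, no handlers, stack depth of 8.
        return MethodHeader(1, first >> 2, 8, 0, False, False)
    if kind == 0x3:
        # Fat format: 12 bits of flags plus 4 bits of header size (in 4-byte words),
        # then max stack (2 bytes), code size (4 bytes), locals token (4 bytes).
        flags_and_size, max_stack, code_size, token = struct.unpack_from(
            "<HHII", data, offset
        )
        flags = flags_and_size & 0x0FFF
        header_size = (flags_and_size >> 12) * 4
        return MethodHeader(
            header_size,
            code_size,
            max_stack,
            token,
            bool(flags & 0x10),  # InitLocals
            bool(flags & 0x08),  # MoreSects
        )
    raise ValueError("not a tiny or fat method header")
```

Given the file offset of a method body, the CIL instructions for that method occupy the `code_size` bytes starting at `offset + header_size`.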

The metadata section stores all of the information needed by compilers and most program analysis tools. It is contained in one contiguous area of memory to make it easy for those tools to find and use it. The information in the method headers is not relevant to such tools but is needed by the runtime. The class loader never needs the method headers; it relies on the metadata instead. Typically only interpreters or JIT compilers need the information in the method headers, and it is convenient for them to find it in the code stream alongside the code to be compiled. The information in the method headers is described in Partition II, section 24.4.

When a PE file is loaded into memory, information in the PE file header describes how it should be modified from its on-disk format to an in-memory format. For each section in the PE file, an entry in the header states where it should be loaded relative to the 0th byte of the file. This is the section's RVA (Relative Virtual Address). The entire file is then loaded into memory starting at some physical address, which becomes the physical address of file address 0. However, nothing in the metadata or managed code ever references physical addresses.

Every item in the file has an on-disk byte offset, and the header provides enough information to determine where that item will be placed when the file is loaded into memory. All memory locations, however, are expressed as RVAs, which are relative to the start of the file as it would be laid out if loaded into memory.
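The relationship between an RVA and an on-disk offset can be illustrated with a small lookup over the section table. This is a sketch under stated assumptions: the `Section` tuple and the function `rva_to_file_offset` are hypothetical names, the fields mirror the usual section header entries (virtual address, virtual size, pointer to raw data, size of raw data), and the section values in the usage lines are invented for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class Section(NamedTuple):
    name: str
    virtual_address: int  # RVA at which the section is placed when loaded
    virtual_size: int     # size of the section in memory
    raw_offset: int       # byte offset of the section's data in the file
    raw_size: int         # number of bytes of section data stored in the file


def rva_to_file_offset(sections: Sequence[Section], rva: int) -> Optional[int]:
    """Translate an RVA to an on-disk byte offset, or None if it has none."""
    for section in sections:
        delta = rva - section.virtual_address
        if 0 <= delta < section.virtual_size:
            if delta >= section.raw_size:
                # Inside the section in memory, but past the data stored on disk.
                return None
            return section.raw_offset + delta
    return None  # the RVA does not fall inside any section


# Hypothetical layout: .text is at file offset 0x200 and is loaded at RVA 0x2000.
sections = [
    Section(".text", 0x2000, 0x300, 0x200, 0x400),
    Section(".reloc", 0x4000, 0x0C, 0x600, 0x200),
]
text_offset = rva_to_file_offset(sections, 0x2008)   # 0x208
unmapped = rva_to_file_offset(sections, 0x1000)      # None
```

The subtraction is the whole of the translation: an item keeps the same distance from the start of its section on disk and in memory, so only the section's two starting points differ.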

At the end of this chapter is an annotated dump of a tiny CLI PE file, associating all of the fields and values with their file offsets and RVAs. For anyone attempting to write a valid PE file, it will be very helpful.



The Common Language Infrastructure Annotated Standard (Microsoft .NET Development Series)
Year: 2002
Pages: 121