Rebasing Modules | Programming Applications for Microsoft Windows (Microsoft Programming Series)


Every executable and DLL module has a preferred base address, which identifies the ideal memory address where the module should get mapped into a process's address space. When you build an executable module, the linker sets the module's preferred base address to 0x00400000. For a DLL module, the linker sets a preferred base address of 0x10000000. Using Visual Studio's DumpBin utility (with the /headers switch), you can see an image's preferred base address. Here is an example of using DumpBin to dump its own header information:

 C:\>DUMPBIN /headers dumpbin.exe

 Microsoft (R) COFF Binary File Dumper Version 6.00.8168
 Copyright (C) Microsoft Corp 1992-1998. All rights reserved.

 Dump of file dumpbin.exe

 PE signature found

 File Type: EXECUTABLE IMAGE

 FILE HEADER VALUES
              14C machine (i386)
                3 number of sections
         3588004A time date stamp Wed Jun 17 10:43:38 1998
                0 file pointer to symbol table
                0 number of symbols
               E0 size of optional header
              10F characteristics
                    Relocations stripped
                    Executable
                    Line numbers stripped
                    Symbols stripped
                    32 bit word machine

 OPTIONAL HEADER VALUES
              10B magic #
             6.00 linker version
             1000 size of code
             2000 size of initialized data
                0 size of uninitialized data
             1320 RVA of entry point
             1000 base of code
             2000 base of data
           400000 image base   <-- Module's preferred base address
             1000 section alignment
             1000 file alignment
             4.00 operating system version
             0.00 image version
             4.00 subsystem version
                0 Win32 version
             4000 size of image
             1000 size of headers
            127E2 checksum
                3 subsystem (Windows CUI)
                0 DLL characteristics
           100000 size of stack reserve
             1000 size of stack commit
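To make the "image base" line less mysterious, here is a minimal sketch of how a tool might pull that field out of a 32-bit image that has already been read into memory. The function name pe32_image_base and its helpers are hypothetical, not part of DumpBin or any Windows API, and the sketch handles only the PE32 layout that matches the 10B magic number in the dump above.

```c
#include <stddef.h>
#include <stdint.h>

/* Read little-endian values from a byte buffer, regardless of host byte order. */
static uint32_t read_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t read_u16le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Hypothetical helper: returns 1 and stores the preferred base address in
 * *image_base if buf holds a PE32 image; returns 0 otherwise.
 */
int pe32_image_base(const unsigned char *buf, size_t len, uint32_t *image_base)
{
    uint32_t pe_off;
    const unsigned char *opt;

    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;                              /* no DOS header */

    pe_off = read_u32le(buf + 0x3C);           /* offset of the PE signature */
    if (pe_off > len || len - pe_off < 4 + 20 + 32)
        return 0;                              /* headers would run off the end */

    if (buf[pe_off] != 'P' || buf[pe_off + 1] != 'E' ||
        buf[pe_off + 2] != 0 || buf[pe_off + 3] != 0)
        return 0;                              /* "PE signature found" check */

    opt = buf + pe_off + 4 + 20;               /* skip signature and file header */
    if (read_u16le(opt) != 0x10B)              /* the "magic #" from the dump */
        return 0;

    *image_base = read_u32le(opt + 28);        /* "image base" field */
    return 1;
}
```

Fed the bytes of dumpbin.exe from the listing above, a routine like this would be expected to report 0x00400000.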

When this executable module is invoked, the operating system loader creates a virtual address space for the new process. Then the loader maps the executable module at memory address 0x00400000 and the DLL module at 0x10000000. Why is this preferred base address so important? Let's look at this code:

 int g_x;

 void Func() {
    g_x = 5;     // This is the important line.
 }

When the compiler processes the Func function, the compiler and linker produce machine code that looks something like this:

 MOV [0x00414540], 5

In other words, the compiler and linker have created machine code in which the address of the g_x variable, 0x00414540, is hard-coded. This address is in the machine code and absolutely identifies the location of the g_x variable in the process's address space. But of course this memory address is correct if and only if the executable module loads at its preferred base address: 0x00400000.

What if we had the exact same code in a DLL module? In that case, the compiler and linker would have generated machine code that looks something like this:

 MOV [0x10014540], 5

Again, notice that the virtual memory address for the DLL's g_x variable is hard-coded in the DLL file's image on the disk drive. And again, this memory address is absolutely correct as long as the DLL does in fact load at its preferred base address.
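The arithmetic behind both hard-coded addresses is simply the preferred base plus the variable's offset within the module. The sketch below is only an illustration: the offset 0x14540 is inferred by subtracting the base from the example addresses in the text, and the name hardcoded_address is made up for this example.

```c
#include <stdint.h>

/* Offset of g_x from the start of its module. This value is inferred from
   the example addresses in the text (0x00414540 - 0x00400000). */
#define G_X_OFFSET 0x00014540u

/* Address the linker bakes into the instruction, on the assumption that the
   module will load at its preferred base address. */
uint32_t hardcoded_address(uint32_t preferred_base, uint32_t offset)
{
    return preferred_base + offset;
}
```

Calling hardcoded_address with 0x00400000 gives the address in the executable's MOV instruction, and calling it with 0x10000000 gives the address in the DLL's.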

OK, now let's say that you're designing an application that requires two DLLs. By default, the linker sets the .exe module's preferred base address to 0x00400000 and the linker sets the preferred base address for both DLLs to 0x10000000. If you attempt to run the .exe, the loader creates the virtual address space and maps the .exe module at the 0x00400000 memory address. Then the loader maps the first DLL to the 0x10000000 memory address. But now, when the loader attempts to map the second DLL into the process's address space, it can't possibly map it at the module's preferred base address. It must relocate the DLL module, placing it somewhere else.
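The collision described above can be pictured as a simple range test. This is a rough model of the decision, not the loader's actual code: the Region type, the function names, and the module sizes used in any call are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t base;  /* address where the module is (or wants to be) mapped */
    uint32_t size;  /* size of the mapped image in bytes                   */
} Region;

/* Nonzero if the two address ranges share at least one byte. The 64-bit
   arithmetic keeps base + size from wrapping around. */
static int regions_overlap(Region a, Region b)
{
    return a.base < (uint64_t)b.base + b.size &&
           b.base < (uint64_t)a.base + a.size;
}

/* Nonzero if `wanted` collides with nothing that is already mapped. */
int can_map_at_preferred_base(const Region *mapped, size_t count, Region wanted)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (regions_overlap(mapped[i], wanted))
            return 0;   /* the range is taken; the module must be relocated */
    }
    return 1;
}
```

With only the .exe mapped at 0x00400000, a DLL asking for 0x10000000 passes the test. Once the first DLL occupies that range, the same request from the second DLL fails.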

Relocating an executable (or DLL) module is an absolutely horrible process, and you should take measures to avoid it. Let's see why. Suppose that the loader relocates the second DLL to address 0x20000000. In that case, the code that changes the g_x variable to 5 should be:

 MOV [0x20014540], 5

But the code in the file's image looks like this:

 MOV [0x10014540], 5

If the code from the file's image is allowed to execute, some 4-byte value in the first DLL module will be overwritten with the value 5. This can't possibly be allowed. The loader must somehow fix this code. When the linker builds your module, it embeds a relocation section in the resulting file. This section contains a list of byte offsets. Each byte offset identifies a memory address used by a machine code instruction. If the loader can map a module at its preferred base address, the module's relocation section is never accessed by the system. This is certainly what we want—you never want the relocation section to be used.

If, on the other hand, the module cannot be mapped at its preferred base address, the loader opens the module's relocation section and iterates through all the entries. For each entry found, the loader goes to the page of storage that contains the machine code instruction to be modified. It then grabs the memory address that the machine instruction is currently using and adds to the address the difference between the module's preferred base address and the address where the module actually got mapped.

So, in the example above, the second DLL was mapped at 0x20000000 but its preferred base address is 0x10000000. This yields a difference of 0x10000000, which is then added to the address in the machine code instruction, giving us this:

 MOV [0x20014540], 5

Now this code in the second DLL will reference its g_x variable correctly.
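The fixup loop just described can be sketched in a few lines. This is a simplified model under stated assumptions: the relocation section is treated as a flat array of byte offsets, as the text describes it, rather than the block-structured format a real image file uses, and the name apply_relocations is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Adds (actual_base - preferred_base) to every 32-bit address named by the
 * relocation list. Each entry is a byte offset into `image`.
 */
void apply_relocations(unsigned char *image,
                       const uint32_t *reloc_offsets, size_t reloc_count,
                       uint32_t preferred_base, uint32_t actual_base)
{
    /* Unsigned subtraction wraps, so this also works when the module
       lands below its preferred base. */
    uint32_t delta = actual_base - preferred_base;
    size_t i;

    for (i = 0; i < reloc_count; i++) {
        uint32_t address;

        /* memcpy avoids unaligned access; addresses embedded in machine
           code are rarely aligned. */
        memcpy(&address, image + reloc_offsets[i], sizeof address);
        address += delta;
        memcpy(image + reloc_offsets[i], &address, sizeof address);
    }
}
```

For the second DLL in the example, the delta is 0x10000000, so the stored address 0x10014540 becomes 0x20014540. Note that every store in this loop dirties a code page, which is the root of the drawbacks discussed next.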

There are two major drawbacks when a module cannot load at its preferred base address:

The loader has to iterate through the relocation section and modify a lot of the module's code. This produces a major performance hit and can really hurt an application's initialization time.

As the loader writes to the module's code pages, the system's copy-on-write mechanism forces these pages to be backed by the system's paging file.

The second point above is truly bad. It means that the module's code pages can no longer be discarded and reloaded from the module's file image on disk. Instead, the pages are swapped to and from the system's paging file as necessary. This hurts performance too. But wait, it gets worse. Since the paging file backs all of the module's code pages, the system has less storage available for all processes running in the system. This restricts the size of users' spreadsheets, word processing documents, CAD drawings, bitmaps, and so on.

By the way, you can create an executable or DLL module that doesn't have a relocation section in it. You do this by passing the /FIXED switch to the linker when you build the module. Using this switch makes the module smaller in bytes but it means that the module cannot be relocated. If the module cannot load at its preferred base address, it cannot load at all. If the loader must relocate a module but no relocation section exists for the module, the loader kills the entire process and displays an "Abnormal Process Termination" message to the user.

For resource-only DLLs, this is a problem. A resource-only DLL contains no code, so linking the DLL using the /FIXED switch makes a lot of sense. However, if the resource-only DLL can't load at its preferred base address, the module can't load at all. This is ridiculous. To solve this problem, the linker allows you to create a module with information embedded in the header indicating that the module contains no relocation information because none is needed. The Windows 2000 loader works with this header information and allows a resource-only DLL to load without incurring any performance or paging file space penalties.

To create an image without any relocations, link the image using the /SUBSYSTEM:WINDOWS,5.0 switch or the /SUBSYSTEM:CONSOLE,5.0 switch and do not specify the /FIXED switch. If the linker determines that nothing in the module is subject to relocation fixups, it omits the relocation section from the module and turns off a special IMAGE_FILE_RELOCS_STRIPPED flag in the header. When Windows 2000 loads the module, it sees that the module can be relocated (because the IMAGE_FILE_RELOCS_STRIPPED flag is off) but that the module has no relocations (since the relocation section doesn't exist). Note that this is a new feature of the Windows 2000 loader, which explains why the /SUBSYSTEM switch requires the 5.0 at the end.
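The flag logic in the last few paragraphs boils down to a small decision. The sketch below models it under stated assumptions: it repeats the winnt.h value of IMAGE_FILE_RELOCS_STRIPPED so that it compiles without the Windows headers, and the LoadResult type and load_outcome function are hypothetical names, not anything the loader exposes.

```c
#include <stdint.h>

/* Same value as in winnt.h; defined here only so the sketch stands alone. */
#ifndef IMAGE_FILE_RELOCS_STRIPPED
#define IMAGE_FILE_RELOCS_STRIPPED 0x0001u
#endif

typedef enum {
    LOAD_OK,     /* mapped at the preferred base, or moved elsewhere        */
    LOAD_FAILS   /* must move, but the header says it cannot be relocated   */
} LoadResult;

/* `characteristics` is the field DumpBin prints under FILE HEADER VALUES. */
LoadResult load_outcome(uint16_t characteristics, int preferred_base_is_free)
{
    if (preferred_base_is_free)
        return LOAD_OK;                     /* relocation info is never consulted */

    if (characteristics & IMAGE_FILE_RELOCS_STRIPPED)
        return LOAD_FAILS;                  /* the /FIXED case */

    return LOAD_OK;                         /* relocatable, possibly with no fixups */
}
```

The characteristics value 10F from the DumpBin listing earlier has the flag set ("Relocations stripped"), so that image falls into the /FIXED case whenever its preferred base is taken.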

You now understand the importance of the preferred base address. So if you have multiple modules that you're loading into a single address space, you must set a different preferred base address for each module. Microsoft Visual Studio's Project Settings dialog box makes this easy. All you do is select the Link tab and then select the Output category. In the Base Address field, which is blank by default, you enter a number. In the following figure, I've set my DLL module's base address to 0x20000000.

[Figure: the Project Settings dialog box, Link tab, Output category, with the Base Address field set to 0x20000000]

By the way, you should always load DLLs from high-memory addresses, working your way down to low-memory addresses to reduce fragmentation of the address space.

NOTE
Preferred base addresses must always start on an allocation-granularity boundary. On all platforms to date, the system's allocation granularity is 64 KB. This could change in the future. Chapter 13 discusses allocation granularity in more detail.
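A quick way to sanity-check a candidate base address is to test it against the granularity. This sketch hard-codes the 64-KB value from the note, which is an assumption; on a real system you would read dwAllocationGranularity from the SYSTEM_INFO structure filled in by GetSystemInfo. Both function names are made up for this example.

```c
#include <stdint.h>

#define ALLOCATION_GRANULARITY 0x10000u   /* 64 KB, assumed per the note above */

/* Nonzero if `address` sits on an allocation-granularity boundary. */
int is_valid_preferred_base(uint32_t address)
{
    return (address % ALLOCATION_GRANULARITY) == 0;
}

/* Rounds `address` up to the next boundary (or leaves it if already on one). */
uint32_t round_up_to_granularity(uint32_t address)
{
    return (address + ALLOCATION_GRANULARITY - 1) & ~(ALLOCATION_GRANULARITY - 1);
}
```

For instance, 0x20000000 is acceptable as a preferred base address, whereas 0x20001000 is not and would round up to 0x20010000.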

OK, so that's all fine and good. But what if you're loading a lot of modules into a single address space? It would be nice if there were some easy way to set good preferred base addresses for all of them. Fortunately, there is.

Visual Studio ships with a utility called Rebase.exe. If you run Rebase without any command-line arguments, you get the following usage information:

 usage: REBASE [switches]
               [-R image-root [-G filename] [-O filename] [-N filename]]
               image-names...

               One of -b and -i switches are mandatory.

               [-a] Used with -x. extract All debug info into .dbg file
               [-b InitialBase] specify initial base address
               [-c coffbase_filename] generate coffbase.txt
                   -C includes filename extensions, -c does not
               [-d] top down rebase
               [-f] Strip relocs after rebasing the image
               [-i coffbase_filename] get base addresses from coffbase_filename
               [-l logFilePath] write image bases to log file.
               [-p] Used with -x. Remove private debug info when extracting
               [-q] minimal output
               [-s] just sum image range
               [-u symbol_dir] Update debug info in .DBG along this path
               [-v] verbose output
               [-x symbol_dir] extract debug info into separate .DBG file first
               [-z] allow system file rebasing
               [-?] display this message

               [-R image_root] set image root for use by -G, -O, -N
               [-G filename] group images together in address space
               [-O filename] overlay images in address space
               [-N filename] leave images at their origional address
                   -G, -O, -N, may occur multiple times. File "filename"
                   contains a list of files (relative to "image-root")

The Rebase utility is described in the Platform SDK documentation, so I won't go into detail here. However, I'll just add that there is nothing magical about this utility. Internally, it simply calls the ReBaseImage function once for each file specified:

 BOOL ReBaseImage(
    PSTR   CurrentImageName,   // Pathname of file to be rebased
    PSTR   SymbolPath,         // Symbol file path so debug info is accurate
    BOOL   fRebase,            // TRUE to actually do the work; FALSE to pretend
    BOOL   fRebaseSysFileOk,   // FALSE to not rebase system images
    BOOL   fGoingDown,         // TRUE to rebase the image below an address
    ULONG  CheckImageSize,     // Maximum size that image can grow to
    ULONG* pOldImageSize,      // Receives original image size
    ULONG* pOldImageBase,      // Receives original image base address
    ULONG* pNewImageSize,      // Receives new image size
    ULONG* pNewImageBase,      // Receives new image base address
    ULONG  TimeStamp);         // New timestamp for image

When you execute Rebase, passing it a set of image file names, it does the following:

It simulates creating a process's address space.

It opens all of the modules that would normally be loaded into this address space. It thus gets the preferred base address and size of each module.

It simulates relocating the modules in the simulated address space so that none of the modules overlap.

For each relocated module, it parses the module's relocation section and modifies the code in the module file on disk.

It updates the header of each relocated module to reflect the new preferred base address.
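The layout part of the steps above (simulating an address space and placing modules so that none overlap) might look something like the following. This is a sketch of the idea, not Rebase's actual algorithm: the Module type and the assign_bases function are hypothetical, the 64-KB granularity is assumed, and the going_down parameter merely mirrors the spirit of the -d switch and the fGoingDown parameter.

```c
#include <stddef.h>
#include <stdint.h>

#define ALLOCATION_GRANULARITY 0x10000u   /* 64 KB, assumed */

typedef struct {
    const char *name;      /* module file name                     */
    uint32_t    size;      /* image size read from the header      */
    uint32_t    new_base;  /* filled in by assign_bases            */
} Module;

/* Round a size up so the next module starts on a granularity boundary. */
static uint32_t round_up(uint32_t n)
{
    return (n + ALLOCATION_GRANULARITY - 1) & ~(ALLOCATION_GRANULARITY - 1);
}

/*
 * Packs the modules back to back. Going up, the first module starts at
 * initial_base; going down, the first module ends just below initial_base.
 */
void assign_bases(Module *modules, size_t count,
                  uint32_t initial_base, int going_down)
{
    uint32_t next = initial_base;
    size_t i;

    for (i = 0; i < count; i++) {
        uint32_t span = round_up(modules[i].size);

        if (going_down) {
            next -= span;
            modules[i].new_base = next;
        } else {
            modules[i].new_base = next;
            next += span;
        }
    }
}
```

Once every module has a new base, the remaining steps are the fixup pass over each file's relocation section and the header update, both performed on the disk file rather than in memory.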

Rebase is an excellent tool, and I strongly encourage you to use it. You should run it toward the end of your build cycle, after all of your application's modules are built. Also, if you use Rebase, you can ignore setting the base address in the Project Settings dialog box. The linker will give the DLL a base of 0x10000000, but Rebase will override that.

By the way, you should never, ever rebase any of the modules that ship with the operating system. Microsoft runs Rebase on all the operating system-supplied files before shipping Windows so that none of the operating system modules overlap if you map them all into a single address space.

I added a special feature to the ProcessInfo.exe application presented in Chapter 4. The tool shows you the list of all modules that are in the process's address space. Under the BaseAddr column, you see the virtual memory address where the module is loaded. Right next to the BaseAddr column is the ImagAddr column. Usually this column is blank, which indicates that the module loaded at its preferred base address. You hope to see this for all modules. However, if another address appears in parentheses, the module did not load at its preferred base address and the number indicates the module's preferred base address as read from header information in the module's disk file.

Here is the ProcessInfo.exe tool looking at the Acrord32.exe process. Notice that some of the modules did load at their preferred base address. Some of them did not. You'll also notice that all of these modules had a preferred base address of 0x10000000, indicating that they are DLLs and that the creator of these modules did not worry about rebasing issues—shame on them.

[Figure: ProcessInfo.exe displaying the modules loaded in the Acrord32.exe process's address space]