Many people have asked me why I keep recommending that everyone create MAP files with their release builds. Simply put, MAP files are the only textual representation of your program's global symbols and source file and line number information. Although using CrashFinder is far easier than deciphering a MAP file, your MAP files can be read anywhere and anytime, without requiring a supporting program and without requiring all your program's binaries to get the same information. Trust me, at some point in the future, you're going to need to figure out where a crash happened on an older version of your software, and the only way you'll be able to find the information is with your MAP file.
MAP files are useful only for release builds because when creating a MAP file, the linker has to turn off incremental linking. Setting up to generate MAP files is much easier in Microsoft Visual C++ .NET than in previous versions. In your project Property Pages dialog box, Linker folder, Debugging property page, you just have to set the Generate Map File, Map Exports, and Map Lines fields to Yes. Doing so turns on the linker /MAP, /MAPINFO:EXPORTS, and /MAPINFO:LINES options. As you can guess, SettingsMaster from Chapter 9, with the default project files, will add these settings automatically.
If you're working on a real-world project, you probably have your binary files going to their own output directory. By default, the linker writes the MAP file to the same directory as the intermediate files, so you need to specify that the MAP file goes to the binary file output directory. In the Map File Name edit box, you can type $(OutDir)/$(ProjectName).map. The $(OutDir) is a built-in macro that the build system will substitute with the real output directory, and $(ProjectName) substitutes the project name. Figure 12-3 shows the completed MAP file settings for the release build of the MapDLL project, which is included with this book's sample files.
Figure 12-3: The MAP file settings in the project Property Pages dialog box
Although you might not need the MAP files in your day-to-day operation, chances are that you'll need them in the future. CrashFinder and your debugger rely on symbol tables and a symbol engine to read them. If the format of the symbol table changes or if you forget to save the Program Database (PDB) files, you're completely out of luck. Forgetting to save the PDB files is your fault, but you have no control over symbol table formats. They change frequently. For example, many people who upgraded from Microsoft Visual Studio 6 to Microsoft Visual Studio .NET noticed that tools such as CrashFinder quit working with programs compiled with Visual Studio .NET. Microsoft changed the symbol table format and does so on a regular basis. MAP files are your only savior at that time.
Even though you, as a developer, might be up to Windows Server 2008 with Visual Studio .NET 2007 Service Pack 6 in five years, I can assure you that you'll still have customers who will be running the software you released back in 2003. When they call you in alarm and give you a crash address, you could spend the next two days trying to find the Visual Studio .NET CDs so that you can read your saved PDB files. Or, if you have the MAP files, you can find the problem in five minutes.
Listing 12-1 shows an example MAP file. The top part of the MAP file contains the module name, the timestamp indicating when LINK.EXE linked the module, and the preferred load address. After the header comes the section information that shows which sections the linker brought in from the various OBJ and LIB files.
Listing 12-1: Example MAP file
 MapDLL

 Timestamp is 3e2b44a3 (Sun Jan 19 19:36:51 2003)

 Preferred load address is 03900000

 Start         Length     Name                   Class
 0001:00000000 00000304H  .text                  CODE
 0002:00000000 00000028H  .idata$5               DATA
 0002:00000030 000000f8H  .rdata                 DATA
 0002:00000128 00000063H  .rdata$debug           DATA
 0002:00000190 00000004H  .rdata$sxdata          DATA
 0002:00000194 00000004H  .rtc$IAA               DATA
 0002:00000198 00000004H  .rtc$IZZ               DATA
 0002:0000019c 00000004H  .rtc$TAA               DATA
 0002:000001a0 00000004H  .rtc$TZZ               DATA
 0002:000001a4 00000014H  .idata$2               DATA
 0002:000001b8 00000014H  .idata$3               DATA
 0002:000001cc 00000028H  .idata$4               DATA
 0002:000001f4 00000082H  .idata$6               DATA
 0002:00000280 0000007bH  .edata                 DATA
 0003:00000000 00000004H  .CRT$XCA               DATA
 0003:00000004 00000004H  .CRT$XCZ               DATA
 0003:00000008 00000004H  .CRT$XIA               DATA
 0003:0000000c 00000004H  .CRT$XIZ               DATA
 0003:00000010 00000004H  .data                  DATA
 0003:00000014 00000014H  .bss                   DATA

 Address        Publics by Value                 Rva+Base   Lib:Object
 0000:00000001  ___safe_se_handler_count         00000001   <absolute>
 0001:00000000  _DllMain@12                      03901000 f MapDLL.obj
 0001:00000006  ?MapDLLFunction@@YAHXZ           03901006 f MapDLL.obj
 0001:00000023  ?MapDLLHappyFunc@@YAPADPAD@Z     03901023 f MapDLL.obj
 0001:0000003c  __CRT_INIT@12                    0390103c f MSVCRT:crtdll.obj
 0001:000000fa  __DllMainCRTStartup@12           039010fa f MSVCRT:crtdll.obj
 0001:000001de  __initterm                       039011de f MSVCRT:MSVCR71.dll
 0001:000001e4  __onexit                         039011e4 f MSVCRT:atonexit.obj
 0001:0000020a  _atexit                          0390120a f MSVCRT:atonexit.obj
 0001:0000021c  __RTC_Initialize                 0390121c f MSVCRT:initsect.obj
 0001:00000260  __RTC_Terminate                  03901260 f MSVCRT:initsect.obj
 0001:000002a4  ___CppXcptFilter                 039012a4 f MSVCRT:MSVCR71.dll
 0001:000002ac  __SEH_prolog                     039012ac f MSVCRT:sehprolg.obj
 0001:000002e7  __SEH_epilog                     039012e7 f MSVCRT:sehprolg.obj
 0001:000002f8  __except_handler3                039012f8 f MSVCRT:MSVCR71.dll
 0001:000002fe  ___dllonexit                     039012fe f MSVCRT:MSVCR71.dll
 0002:00000000  __imp__printf                    03902000   MSVCRT:MSVCR71.dll
 0002:00000004  __imp__free                      03902004   MSVCRT:MSVCR71.dll
 0002:00000008  __imp___initterm                 03902008   MSVCRT:MSVCR71.dll
 0002:0000000c  __imp__malloc                    0390200c   MSVCRT:MSVCR71.dll
 0002:00000010  __imp___adjust_fdiv              03902010   MSVCRT:MSVCR71.dll
 0002:00000014  __imp____CppXcptFilter           03902014   MSVCRT:MSVCR71.dll
 0002:00000018  __imp___except_handler3          03902018   MSVCRT:MSVCR71.dll
 0002:0000001c  __imp____dllonexit               0390201c   MSVCRT:MSVCR71.dll
 0002:00000020  __imp___onexit                   03902020   MSVCRT:MSVCR71.dll
 0002:00000024  \177MSVCR71_NULL_THUNK_DATA      03902024   MSVCRT:MSVCR71.dll
 0002:0000007c  ??_C@_0CE@EBHAJKCA@Whoops?0?5a?5crash?5is?5about?5to?5occu@ 0390207c   MapDLL.obj
 0002:000000a0  ??_C@_0CD@OILENIKO@Hello?5from?5InternalStaticFunctio@ 039020a0   MapDLL.obj
 0002:000000c4  ??_C@_0BM@DFMPKPOD@Hello?5from?5MapDLLFunction?$CB?6?$AA@ 039020c4   MapDLL.obj
 0002:000000e0  __load_config_used               039020e0   MSVCRT:loadcfg.obj
 0002:00000190  ___safe_se_handler_table         03902190   <linker-defined>
 0002:00000194  ___rtc_iaa                       03902194   MSVCRT:initsect.obj
 0002:00000198  ___rtc_izz                       03902198   MSVCRT:initsect.obj
 0002:0000019c  ___rtc_taa                       0390219c   MSVCRT:initsect.obj
 0002:000001a0  ___rtc_tzz                       039021a0   MSVCRT:initsect.obj
 0002:000001a4  __IMPORT_DESCRIPTOR_MSVCR71      039021a4   MSVCRT:MSVCR71.dll
 0002:000001b8  __NULL_IMPORT_DESCRIPTOR         039021b8   MSVCRT:MSVCR71.dll
 0003:00000000  ___xc_a                          03903000   MSVCRT:cinitexe.obj
 0003:00000004  ___xc_z                          03903004   MSVCRT:cinitexe.obj
 0003:00000008  ___xi_a                          03903008   MSVCRT:cinitexe.obj
 0003:0000000c  ___xi_z                          0390300c   MSVCRT:cinitexe.obj
 0003:00000010  ___security_cookie               03903010   MSVCRT:seccook.obj
 0003:00000018  __adjust_fdiv                    03903018   <common>
 0003:0000001c  ___onexitend                     0390301c   <common>
 0003:00000020  ___onexitbegin                   03903020   <common>
 0003:00000024  __pRawDllMain                    03903024   <common>

 entry point at 0001:000000fa

 Static symbols

 0001:00000016  ?InternalStaticFunction@@YAXXZ   03901016 f MapDLL.obj

Line numbers for .\Release\MapDLL.obj(d:\dev\booktwo\disk\chapter examples\chapter 12\mapfile\mapdll\mapdll.cpp) segment .text

    11 0001:00000000    20 0001:00000000    21 0001:00000003    26 0001:00000006
    25 0001:00000006    27 0001:00000012    28 0001:00000015    31 0001:00000016
    32 0001:00000016    33 0001:00000022    37 0001:00000023    36 0001:00000023
    38 0001:00000028    39 0001:00000033    41 0001:0000003b

Line numbers for R:\VSNET2003\Vc7\lib\MSVCRT.lib(f:\vs70builds\2292\vc\crtbld\crt\src\atonexit.c) segment .text

    81 0001:000001e4    76 0001:000001e4    90 0001:00000209    96 0001:0000020a
    95 0001:0000020a    97 0001:0000021b

Line numbers for R:\VSNET2003\Vc7\lib\MSVCRT.lib(f:\vs70builds\2292\vc\crtbld\crt\src\crtdll.c) segment .text

   134 0001:0000003c   129 0001:0000003c   135 0001:00000044   136 0001:0000004c
   158 0001:00000052   163 0001:00000065   168 0001:0000007a   170 0001:0000007e
   172 0001:00000081   178 0001:0000008b   179 0001:00000090   184 0001:0000009a
   189 0001:000000ab   192 0001:000000b2   219 0001:000000b8   220 0001:000000c1
   225 0001:000000c3   226 0001:000000cf   234 0001:000000e5   236 0001:000000ec
   234 0001:000000f3   240 0001:000000f4   241 0001:000000f7   249 0001:000000fa
   250 0001:00000106   252 0001:0000010c   257 0001:00000111   258 0001:0000011e
   260 0001:00000124   262 0001:0000012d   263 0001:00000136   265 0001:00000142
   266 0001:0000014b   268 0001:0000015a   269 0001:0000015c   272 0001:0000015e
   275 0001:0000016e   283 0001:00000177   286 0001:00000181   288 0001:0000018a
   289 0001:00000198   291 0001:0000019b   292 0001:000001a9   298 0001:000001b7
   294 0001:000001bc   295 0001:000001d4   299 0001:000001d6

 Exports

  ordinal    name

        1    ?MapDLLFunction@@YAHXZ (int __cdecl MapDLLFunction(void))
        2    ?MapDLLHappyFunc@@YAPADPAD@Z (char * __cdecl MapDLLHappyFunc(char *))
After the section information, you get to the good stuff, the public function information. Notice the "public" part. If you have static-declared functions, they are placed in a similar table after the public functions table. Fortunately, the line numbers are not separated out and appear together.
The important parts of the public function information are the function names and the information in the Rva+Base column, which is the starting address of the function. The f after some of the Rva+Base addresses indicates that the address is an actual function and not a global variable or imported address of some kind. The line information follows the public function section. The lines are shown as follows:
24 0001:00000006
The first number is the line number, and the second is the offset from the beginning of the code section in which this line occurred. Yes, that sounds confusing, but later I'll show you the calculation you need to convert an address into a source file and line number.
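If you would rather let a script read the line information than squint at it yourself, the entries are regular enough to pick apart with a single pattern. The following is only a minimal sketch in Python; the name parse_line_entries is my own invention for illustration and is not part of any tool that ships with the book.

```python
import re

# Matches one "line-number section:offset" pair, e.g. "24 0001:00000006".
LINE_ENTRY = re.compile(r"(\d+)\s+([0-9a-fA-F]{4}):([0-9a-fA-F]{8})")


def parse_line_entries(text):
    """Return (line, section, offset) tuples for every entry found in text.

    The section and offset are hexadecimal in the MAP file, so they are
    converted to integers here; the source line number is decimal.
    """
    return [
        (int(line), int(section, 16), int(offset, 16))
        for line, section, offset in LINE_ENTRY.findall(text)
    ]


entries = parse_line_entries("24 0001:00000006   27 0001:00000012")
print(entries)  # [(24, 1, 6), (27, 1, 18)]
```

A MAP file usually packs several entries onto each line, which is why the sketch collects every match in the text instead of expecting exactly one.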
If the module contains exported functions, the final section of a MAP file lists the exports. You can get this same information by running DUMPBIN /EXPORTS <modulename>.
The algorithm for extracting the source file, function name, and line number from a MAP file is straightforward, but you need to do a few hexadecimal calculations when using it. As an example, let's say that a crash in MAPDLL.DLL, the module shown in Listing 12-1, occurs at address 0x03901038.
The first step is to look in your project's MAP files for the file that contains the crash address. First look at the preferred load address and the last address in the public function section. If the crash address is between those values, you're looking at the correct MAP file.
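When a project has many MAP files, that range check is easy to script. Here is a rough Python sketch, assuming you have already pulled the preferred load address and the last public address out of each MAP file by hand. The MapDLL values come from Listing 12-1; the OtherDLL entry and the find_map_file name are made up purely for illustration.

```python
# (preferred load address, last address in the public function section)
# for each MAP file. MapDLL's values are from Listing 12-1; OtherDLL is
# a hypothetical second module added only to make the search meaningful.
MAP_RANGES = {
    "MapDLL.map": (0x03900000, 0x03903024),
    "OtherDLL.map": (0x10000000, 0x10004F80),
}


def find_map_file(crash_address, map_ranges):
    """Return the name of the MAP file whose range holds crash_address."""
    for name, (load_address, last_address) in map_ranges.items():
        if load_address <= crash_address <= last_address:
            return name
    return None  # no MAP file covers this address


print(find_map_file(0x03901038, MAP_RANGES))  # MapDLL.map
```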
To find the function, scan down the Rva+Base column until you find the first function address that's greater than the crash address. The preceding entry in the MAP file is the function that had the crash. For example, in Listing 12-1, the first function address greater than the 0x03901038 crash address is 0x0390103C (the entry for __CRT_INIT@12), so the function that crashed is the preceding entry, ?MapDLLHappyFunc@@YAPADPAD@Z, which starts at 0x03901023. Any function name that starts with a question mark is a C++ decorated name.
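The scan down the Rva+Base column amounts to a search in a sorted list. The sketch below, in Python, copies the first few public function entries from Listing 12-1 and looks up the example crash address. The find_function name is hypothetical, and the table is typed in by hand, not parsed from the file.

```python
from bisect import bisect_right

# (Rva+Base, name) pairs copied from the public function table in
# Listing 12-1. The table is already sorted by address.
PUBLICS = [
    (0x03901000, "_DllMain@12"),
    (0x03901006, "?MapDLLFunction@@YAHXZ"),
    (0x03901023, "?MapDLLHappyFunc@@YAPADPAD@Z"),
    (0x0390103C, "__CRT_INIT@12"),
    (0x039010FA, "__DllMainCRTStartup@12"),
]


def find_function(crash_address, publics):
    """Return the name of the last entry whose address is <= crash_address."""
    addresses = [address for address, _ in publics]
    # bisect_right gives the index of the first address greater than the
    # crash address; the entry just before it is the one we want.
    index = bisect_right(addresses, crash_address) - 1
    if index < 0:
        return None  # the crash address is below the first symbol
    return publics[index][1]


print(find_function(0x03901038, PUBLICS))  # ?MapDLLHappyFunc@@YAPADPAD@Z
```

One caution: static functions live in their own table in the MAP file. If you don't merge them into the list before searching, a crash inside ?InternalStaticFunction@@YAXXZ at 0x03901016 would be wrongly blamed on ?MapDLLFunction@@YAHXZ, the public function that precedes it.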
You're probably wondering why I didn't mention the C++ name decoration when I talked about the calling convention name decoration in Chapter 6. Although both serve similar purposes, they come from different places. The calling convention name decoration simply tells the code generator how to generate the parameter pushes and stack cleanup, and it comes from the operating system definitions. C++ name decoration comes as a result of the language. Since you can have overloaded methods, the compiler has to have some way to differentiate them. It "decorates" the name with the return type, calling convention, and parameter information. That way it will know exactly what function you meant to call. To translate the name, pass it as a command-line parameter to the program UNDNAME.EXE, which is included with Visual Studio .NET. In the example, ?MapDLLHappyFunc@@YAPADPAD@Z translates into char * __cdecl MapDLLHappyFunc(char *). You probably could have figured out that MapDLLHappyFunc was the function name just by looking at the decorated name. Other C++ decorated names are harder to decipher, especially when overloaded functions are used.
To find the line number, you get to do a little hexadecimal subtraction by using the following formula:
(crash address) – (preferred load address) – 0x1000
Remember that the addresses in the line information are offsets from the beginning of the first code section, so the formula does that conversion. You can probably guess why you subtract the preferred load address, but you earn extra credit if you know why you still have to subtract 0x1000. The line information is relative to the beginning of the code section, but the code section isn't the first part of the binary. The first part of the binary is the PE (portable executable) header and associated DOS stub, which together occupy the first 0x1000 bytes of the loaded image. Yes, all Win32 binaries still have that MS-DOS heritage in them.
I'm not sure why the linker still generates MAP files that require this odd calculation. The linker team put in the Rva+Base column a while ago, so I don't see why they didn't just fix up the line number at the same time.
Once you've calculated the offset, look through the MAP file line information until you find the closest number that isn't over the calculated value. Keep in mind that during the generation phase the compiler can jiggle the code around so that the source lines aren't in ascending order. With my crash example, I used the following formula:
0x03901038 – 0x03900000 – 0x1000 = 0x38
If you look through the MAP file in Listing 12-1, you'll see that the closest line that isn't over 0x38 is 39 0001:00000033 (Line 39) in MAPDLL.CPP.
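The same arithmetic and search can be written out in a few lines of Python. This is a sketch only: the line table is the MAPDLL.CPP block from Listing 12-1 typed in by hand, the names code_offset and find_line are mine, and the 0x1000 constant is the header size the formula above assumes.

```python
# (source line, offset into the code section) pairs for MAPDLL.CPP,
# copied from Listing 12-1. Note that they are not in ascending line order.
MAPDLL_LINES = [
    (11, 0x00), (20, 0x00), (21, 0x03), (26, 0x06), (25, 0x06),
    (27, 0x12), (28, 0x15), (31, 0x16), (32, 0x16), (33, 0x22),
    (37, 0x23), (36, 0x23), (38, 0x28), (39, 0x33), (41, 0x3B),
]

PE_HEADER_SIZE = 0x1000  # the 0x1000 that the formula subtracts


def code_offset(crash_address, preferred_load_address):
    """Convert a crash address into an offset within the code section."""
    return crash_address - preferred_load_address - PE_HEADER_SIZE


def find_line(offset, lines):
    """Return the (line, offset) entry closest to offset without going over."""
    candidates = [entry for entry in lines if entry[1] <= offset]
    if not candidates:
        return None
    return max(candidates, key=lambda entry: entry[1])


offset = code_offset(0x03901038, 0x03900000)
print(hex(offset))                      # 0x38
print(find_line(offset, MAPDLL_LINES))  # (39, 51)
```

Because the compiler can emit source lines out of order, the sketch scans every entry instead of assuming the table is sorted. The 51 in the result is simply 0x33 in decimal.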
One issue that keeps coming up when I discuss finding crash addresses with other developers is the lament that you've already got code out in the field for which you don't have MAP files. Other eagle-eyed developers have also pointed out that having perfect MAP files means you have to set the base address of all your DLLs as part of the build. If you're working on an existing project that's about to ship, you might not want to destabilize the build by changing a bunch of settings. Additionally, without SettingsMaster from Chapter 9, Visual Studio doesn't make it convenient to make those global project settings changes. That's a primary reason why people simply default to using REBASE.EXE to take care of setting their DLLs' base addresses.
Being one not to let any challenge go unmet, I took a look at the problem. Really all I needed was a way to enumerate functions, source files, and source lines. Given that the DBGHELP.DLL symbol engine already does that, it was a piece of cake to take the next step to generate a MAP file from a PDB file.
The first problem I ran into was that the SymGetLineNext and SymGetLinePrev functions don't return what you would expect. I thought I could get an address in a source file, call SymGetLinePrev until I got to the beginning of the source file, and roll down to the end of the source file with SymGetLineNext. What I forgot to take into account are small things called inline functions. Those functions and source lines can occur in the middle of a function, so the source line information is really stored in ranges. This meant that I had to come up with a scheme to keep track of all the ranges so that I could condense the source and line information. Once I got over that hurdle, the program was pretty easy to develop.
The only other thing that got me had nothing to do with symbol engines—it was the Standard Template Library (STL). I first started out implementing my data structures in STL and quickly found that even a partial implementation of PDB2MAP.EXE was excruciatingly slow. That was mostly my fault because I was using the vector class in a linear search way that was just plain stupid. After fiddling some more, I realized that STL was always going to be much slower as it was doing quite a bit of memory allocation and copying behind the scenes. After much gnashing of teeth trying to make sense of some of the STL implementation details, I figured out I was making the problem much more complicated than it needed to be. I ended up manually coding a simple multiple array system that was blindingly fast and super simple to understand. It also had the added benefit of being much more maintainable than anything I could have created in STL.
The files produced by PDB2MAP are close to actual MAP files. Since the DBGHELP.DLL symbol engine doesn't return static functions, there's no way for me to output that information. As you look at a .P2M file, you'll see that you should have no trouble reading it. I considered using the crazy MAP file line number system for old-times' sake, but instead I brought PDB2MAP into the modern age: its line information is generated using the real addresses that appear in memory.
Your .P2M file also contains one other tidbit of data that you might find interesting. As I mentioned back in Chapter 2, small code is good code. However, other than looking at the total size of the binary, there's no way to see how different compiler switches will affect the size of individual functions. Additionally, there's no way to see what effect inline functions have on a particular function. Since I was doing PDB2MAP, I figured I might as well report symbol sizes because DBGHELP.DLL's symbol engine can report the individual sizes. After the header information in your .P2M file, the function information shows the size of each function between the function address and name, as shown in Listing 12-2, which is an abbreviated .P2M file. Although nearly all functions will have their sizes, DBGHELP.DLL doesn't guarantee that sizes will be returned, so you might see sizes of 0.
Listing 12-2: An abbreviated .P2M file
PDB2MAP Generated Map File

Image: AssertTest
Timestamp is 3E0E7E2A -> Sat Dec 28 23:46:34 2002
Preferred load address is 00400000

Address     Size  Function
0x00401050    36  ??2@YAPAXI@Z
0x00401080   260  ?MyThread@@YGKPAX@Z
0x00401190    38  ?SleepThread@@YGKPAX@Z
0x004011C0   535  ?TestThree@@YAXPAD@Z
0x004013E0   258  ?TestTwo@@YAXXZ
0x004014F0   421  ?TestOne@@YAXPAG@Z
0x004016A0   453  _wWinMain@16
0x00401A5E     6  _InitCommonControls@0
0x00401A64     6  _SuperAssertionW
. . .

Line numbers for d:\dev\booktwo\disk\bugslayerutil\tests\asserttest\asserttest.cpp
  16 : 0x00401080   18 : 0x0040109F   19 : 0x004010CB   20 : 0x004010DE
  21 : 0x004010E5   22 : 0x0040113C   23 : 0x0040113E   26 : 0x00401190
  27 : 0x00401194   28 : 0x004011A8   29 : 0x004011AA   32 : 0x004011C0
  33 : 0x004011D7   39 : 0x004011DE   40 : 0x00401201   41 : 0x00401207
  43 : 0x00401223   44 : 0x0040127A   45 : 0x0040129D   46 : 0x004012B2
  47 : 0x0040131A   48 : 0x0040131F   49 : 0x00401334   50 : 0x0040139C
  53 : 0x004013E0   55 : 0x004013F7   57 : 0x0040140F   59 : 0x00401427
  60 : 0x0040143A   61 : 0x0040143C   62 : 0x0040143E   63 : 0x00401498
  64 : 0x004014A5   67 : 0x004014F0   68 : 0x00401515   70 : 0x00401527
  74 : 0x0040153E   76 : 0x00401548   78 : 0x004015CF   80 : 0x00401632
  81 : 0x00401638   82 : 0x0040163D   90 : 0x004016A0   91 : 0x004016C4
  92 : 0x0040171B   93 : 0x00401772   94 : 0x004017D2   96 : 0x00401829
  97 : 0x00401838   98 : 0x00401845   99 : 0x00401852  100 : 0x00401854

Line numbers for f:\vs70builds\2292\vc\crtbld\crt\src\atonexit.c
  76 : 0x00402810   81 : 0x00402814   90 : 0x0040284B   95 : 0x00402850
  96 : 0x00402853   97 : 0x00402866
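Because a .P2M file records real addresses and function sizes, looking up a crash address needs no subtraction at all. The Python sketch below uses a few rows typed in from Listing 12-2. The crash address 0x004011E0 is a made-up example, and the function names in the sketch are my own, not part of PDB2MAP.

```python
# (address, size, name) rows copied from the function table in Listing 12-2.
FUNCTIONS = [
    (0x00401080, 260, "?MyThread@@YGKPAX@Z"),
    (0x00401190, 38, "?SleepThread@@YGKPAX@Z"),
    (0x004011C0, 535, "?TestThree@@YAXPAD@Z"),
]

# (source line, address) pairs from the asserttest.cpp block of Listing 12-2.
LINES = [
    (32, 0x004011C0), (33, 0x004011D7), (39, 0x004011DE), (40, 0x00401201),
]


def find_function(address, functions):
    """Return the function whose [start, start + size) range holds address.

    A size of 0 means DBGHELP.DLL did not report one, so such an entry
    never matches here and would need the nearest-preceding-address rule.
    """
    for start, size, name in functions:
        if start <= address < start + size:
            return name
    return None


def find_line(address, lines):
    """Return the source line with the largest address not over address."""
    candidates = [entry for entry in lines if entry[1] <= address]
    if not candidates:
        return None
    return max(candidates, key=lambda entry: entry[1])[0]


crash = 0x004011E0  # hypothetical crash address, for illustration only
print(find_function(crash, FUNCTIONS), find_line(crash, LINES))
# ?TestThree@@YAXPAD@Z 39
```

The lookup assumes the module was loaded at its preferred load address. If it was relocated, you would first have to adjust the crash address by the difference between the actual and preferred load addresses.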