Tracing Software Problems to the Source: Using the gdb Debugger

If thinking about the problem, trying to do things as correctly as possible, and examining all the debugging information still leaves you with an application that doesn't run correctly, you have the option of digging around in the code. Thankfully, Apple has provided the GNU debugger, gdb, as part of the development tools. The GNU debugger is to the Unix debugging world what the GNU compiler is to the Unix programming world: a flexible, community-supported, de facto standard for programmer productivity.

The easiest way to explain how to use gdb is to demonstrate its use. The program has copious online help, as well as man pages and an INFO section available through the emacs M-x info command. Before the demonstration, however, Table 14.1 contains a summary of command-line options and common internal commands.
To use gdb, you first need something on which to use it. Type in the little program shown in Listing 14.1, just as it appears here. Alternatively, you can download it from macosxunleashed.com's downloads directory:

    curl -O http://www.macosxunleashed.com/downloads/addme.c

Name the file addme.c.

Listing 14.1. The Source for the addme.c Demo C Program

    /* addme.c A really silly C demo program */
    /* 990325 WCR */
    /* Usage is <progname> <filename> */

    #include <stdio.h>

    int addem(a,b)
    int a, b;
    {
      return a+b;
    }

    void main(argc,argv)
    int argc;
    char *argv[];
    {
      int i;
      char infilename[8];
      int j;
      FILE *infile;
      char number[100];
      char *infilename2=infilename;
      strcpy(infilename2,argv[1]);
      i=0; j=0;
      infile = fopen(infilename2,"r");

      if(infile==NULL)
      {
        printf("couldn't open file %s please try again\n",infilename2);
        exit(1);
      }

      i=0;
      while (fgets(number,90,infile) != '\0')
      {
        sscanf(number,"%d",&j);
        i=addem(i,j);
      }
      printf("Your total is %d\n",i);
      exit(0);
    }

This simple little C program takes a list of integers from a file, one per line, and adds them together. So that you'll have a file to work from, create a file named numbers with the following contents:

    1
    2
    13
    15

Make sure that there are no blank lines above or below the data. Also create a file with a very long name, such as supercalifragilisticzowie, and put the same data in it.

Note that there's a bit of trickery involved in the way this code is written that's specifically there to generate an error. Even though there are a few errors in this code, some systems are sloppy enough with memory management that the program might run intermittently. Also, if you rearrange the definitions of the variables i and j, you decrease the likelihood of a crash. Weird, huh?

So, let's see what we have. Time to compile the program. We don't have a makefile, so we'll have to do it by hand. Issue the command

    cc -g -o addemup addme.c

After a few seconds, your machine should return you to a command line.
The compiler should respond with warnings similar to the following:

    addme.c: In function `main':
    addme.c:16: warning: return type of `main' is not `int'
    addme.c:40: warning: incompatible implicit declaration of built-in function 'exit'

It should then return you to the command line. If it does anything else, for instance, outputs

    addme.c: In function `main':
    addme.c:15: parse error before `char'
    addme.c:23: subscripted value is neither array nor pointer

that means you've typed the program in incorrectly, or have otherwise messed up your downloaded version. Specifically, if you got this error, in all likelihood you forgot the semicolon after the line that says int argc;.

The warnings that are returned when the program is correct are just that: warnings, not errors. The most recent revision of the C programming language has a preference for a particular return type for the main program, and the compiler is just being pedantic and reminding me that having a void return value is archaic. The second warning appears because the program calls exit without including <stdlib.h>, the header that declares it.

After you get the program to compile cleanly with no errors, you're ready for the next step: trying it out. Issue the command ./addemup and see what happens. Note that the command is addemup, not something related to addme. I could actually have named it anything I wanted, simply by changing the -o addemup part of the cc command. If you don't specify any output filename, cc names the output file a.out by default. Also, just so that you know, the -g flag tells the compiler to include debugging information in its output. This makes the executable larger but gives the debugger important information.

    ./addemup
    Bus Error

Well, that doesn't sound good. What could be wrong? You can probably figure it out just by looking at the code at this point, but on a more complicated program, that would be impossible. Instead, let's start the gdb debugger and take a look.

    brezup:software source $ gdb ./addemup
    GNU gdb 5.3-20030128 (Apple version gdb-365) (Sun Oct 24 12:57:07 GMT 2004)
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "powerpc-apple-darwin"...
    Reading symbols for shared libraries .. done
    (gdb)

Okay, we're at a prompt. What do we do? The gdb debugger actually has a complete selection of online help available. To access the help system, simply enter the command help.

    (gdb) help
    List of classes of commands:

    aliases -- Aliases of other commands
    breakpoints -- Making program stop at certain points
    data -- Examining data
    files -- Specifying and examining files
    internals -- Maintenance commands
    obscure -- Obscure features
    running -- Running the program
    stack -- Examining the stack
    status -- Status inquiries
    support -- Support facilities
    tracepoints -- Tracing of program execution without stopping the program
    user-defined -- User-defined commands

    Type "help" followed by a class name for a list of commands in that class.
    Type "help" followed by command name for full documentation.
    Command name abbreviations are allowed if unambiguous.
    (gdb)

I'll leave some of the interesting items here for you to explore, rather than walk you through them. Right now, let's get back to debugging our program. To start the program, simply issue the command r.

    (gdb) r
    Starting program: /Users/software/Documents/source/addemup
    Reading symbols for shared libraries . done

    Program received signal EXC_BAD_ACCESS, Could not access memory.
    Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
    0x978c6168 in strcpy ()
    (gdb)

So, gdb knows something. Not a very intelligible something at this point, but something nonetheless. That address 0x00000000 is most disturbing (above and beyond the errors). It's almost impossibly unlikely that a program would actually have data at location zero.
If it had any other value, I might suspect a logic error, but an address of zero for anything suggests that something somewhere isn't being assigned to what it should be. Let's see whether gdb can be a bit more informative.

    (gdb) where
    #0 0x978c6168 in strcpy ()
    #1 0x00002bb0 in main (argc=1, argv=0xbffffcb6) at addme.c:23
    (gdb)

gdb says the program broke in a procedure named strcpy, which was called from a procedure named main, in line 23 of our file addme.c. Depending on the compiler version and gdb version, you might also see a line or two for start(), which is Mac OS X and gdb initializing and starting the program. Let's take a look at the region of the code in your file (line 23) that gdb indicates was the last place that things were working.

    (gdb) l 23
    18 char infilename[8];
    19 int j;
    20 FILE *infile;
    21 char number[100];
    22 char *infilename2=infilename;
    23 strcpy(infilename2,argv[1]);
    24 i=0; j=0;
    25 infile = fopen(infilename2,"r");
    26
    27 if(infile==NULL)
    (gdb)

Line 23 has a function strcpy on it (this C function copies the contents of one character array [string] variable to another). The debugger seems to be on to something here. Let's set a breakpoint (a place where we want the program to stop running and wait for us) at line 23 and see what happens.
    (gdb) b 23
    Breakpoint 1 at 0x2b9c: file addme.c, line 23.
    (gdb)

So far, so good. Now let's run the program again and see where this takes us.

    (gdb) r
    The program being debugged has been started already.
    Start it from the beginning? (y or n) y
    Starting program: /Users/software/Documents/source/addemup

    Breakpoint 1, main (argc=1, argv=0xbffffcb6) at addme.c:23
    23 strcpy(infilename2,argv[1]);
    (gdb)

Note that gdb asked me whether I wanted to restart from the beginning, and I told it to go ahead. Now it has run up to our breakpoint and is waiting for me to do something. Even if I don't know what strcpy does, there's still something obviously wrong with this line. I know I've got a variable named infilename2 and a funny variable named argv[1]. Let's see what gdb has to say about them.

    (gdb) p infilename2
    $1 = 0xbffffb58 "????"
    (gdb)

The $1 indicates that it's telling us about the first variable we asked about. The 0xbffffb58 is the memory address that infilename2 points to. Don't be surprised if yours is different, or if your version of gdb doesn't show the location without tweaking some customization settings; the default behavior depends on a number of factors outside the scope of this discussion. If you use gdb much, you'll pick up how to set your configuration to display the data you like in the format you want. The built-in help system is, well, helpful here.

The ???? is the current content of this variable, which, other than not being anything you might expect, might appear to be meaningless. Depending on your gdb settings, you might see other gibberish instead, which will probably look something like L\000\000@. Other than the fact that there's nothing interpretable at that memory location, this might not seem to be too informative yet. It will make more sense shortly. (Don't be surprised if yours has something else in whatever memory location shows up on your machine.)

Moving on, what can we tell about this argv[1] variable?
    (gdb) p argv[1]
    $2 = 0x0
    (gdb)

Hmmm… 0x0 is a hexadecimal 0, or NULL in the C world. Examining the code again certainly suggests that something useful should be happening here. It looks as if infilename2 gets used to open a file in just a few lines, and neither that gibberish nor NULL looks promising as a filename. Nulls are used in C, but frequently they're signs of a problem, so let's think about this.

The program is trying to do something with a variable named argv[1]. The only other place this variable (argv) appears is in the main statement, the statement that starts the actual program execution. It certainly looks as if there should be something other than a NULL here. Wait a minute, what did it say in the comments at the top? It said I need to give it a filename at the command line when I run it! I didn't give it a filename, and it's trying to copy something that doesn't exist to get one. Aren't programmers supposed to check for that? Let's see whether I'm right. I'll rerun the program with a filename this time.

    (gdb) r numbers
    The program being debugged has been started already.
    Start it from the beginning? (y or n) y
    Starting program: /Users/software/Documents/source/addemup numbers

    Breakpoint 1, main (argc=2, argv=0xbffffca2) at addme.c:23
    23 strcpy(infilename2,argv[1]);
    (gdb)

I started it over, but I forgot to turn off my breakpoint. Still, this is a good opportunity for me to check whether I was right.

    (gdb) p infilename2
    $3 = 0xbffffb38 "???\236"
    (gdb)

That's different, but just as useless as before.

    (gdb) p argv[1]
    $4 = 0xbffffd3a "numbers"
    (gdb)

Now we're getting somewhere! If we remember to give it a filename, it actually gets one! To continue past the breakpoint, I can enter c.

    (gdb) c
    Continuing.
    Your total is 31

    Program exited normally.
    (gdb)

The program now does exactly what it should. If I want to test it again without stopping at the breakpoint, I can delete the breakpoint and run it again.
    (gdb) d 1
    (gdb) r
    Starting program: /Users/software/Documents/source/addemup numbers
    Your total is 31

    Program exited normally.
    (gdb)

The command d 1 deletes breakpoint 1 (you can have multiple breakpoints if you need them). Note that I didn't have to give it the command-line argument numbers this time when I entered r because gdb conveniently remembers command-line arguments between runs. As you can see, it runs properly to completion. Quitting gdb with the quit command and trying it on the command line produces the same results.

    brezup:software source $ ./addemup numbers
    Your total is 31
    brezup:software source $

Now let's see whether we can demonstrate another type of error. Do you still remember what your very long filename is? Try using that filename instead of numbers and see what happens.

    brezup:software source $ ./addemup supercalifragilisticzowie
    couldn't open file supercal please try again
    brezup:software source $

Huh? I didn't call it supercal. Something happened to my filename. Time to break out gdb again and have another look.

    brezup:software source $ gdb ./addemup
    GNU gdb 5.3-20030128 (Apple version gdb-365) (Sun Oct 24 12:57:07 GMT 2004)
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "powerpc-apple-darwin"...
    Reading symbols for shared libraries .. done
    (gdb) r supercalifragilisticzowie
    Starting program: addemup supercalifragilisticzowie
    Reading symbols for shared libraries .. done
    couldn't open file supercal please try again

    Program exited with code 01.
    (gdb)

Basically, it says the same thing. There must be something more we can find out, though. Let's look at the code and see whether we can figure out where that weird truncation came from.
    (gdb) l
    13 void main(argc,argv)
    14 int argc;
    15 char *argv[];
    16 {
    17 int i;
    18 char infilename[8];
    19 int j;
    20 FILE *infile;
    21 char number[100];
    22 char *infilename2=infilename;
    (gdb)
    23 strcpy(infilename2,argv[1]);
    24 i=0; j=0;
    25 infile = fopen(infilename2,"r");
    26
    27 if(infile==NULL)
    28 {
    29 printf("couldn't open file %s please try again\n",infilename2);
    30 exit(1);
    31 }
    32
    (gdb)

Line 29 seems to be where the error message is coming from. Let's set a breakpoint there and see what happens.

    (gdb) b 29
    Breakpoint 1 at 0x2be4: file addme.c, line 29.
    (gdb) r
    Starting program: addemup supercalifragilisticzowie

    Breakpoint 1, main (argc=2, argv=0xbffffc80) at addme.c:29
    29 printf("couldn't open file %s please try again\n",infilename2);
    (gdb)

We're at our breakpoint. infilename2 is supposed to be supercalifragilisticzowie, and it is…

    (gdb) p infilename2
    $1 = 0xbffffb18 "supercal"
    (gdb)

…not! Something's wrong here! Time to back up to our trusty breakpoint at line 23 and watch what happens from the top down.

    (gdb) b 23
    Breakpoint 2 at 0x2b9c: file addme.c, line 23.
    (gdb) r
    The program being debugged has been started already.
    Start it from the beginning? (y or n) y
    Starting program: addemup supercalifragilisticzowie

    Breakpoint 2, main (argc=2, argv=0xbffffd2c) at addme.c:23
    23 strcpy(infilename2,argv[1]);
    (gdb) p argv[1]
    $2 = 0xbffffd18 "supercalifragilisticzowie"
    (gdb) p infilename
    $3 = "???|\000\000\000\f"

So, the previous culprit isn't a problem here.

    (gdb) p infilename2
    $4 = 0xbffffb18 "???|"
    (gdb)

There's nothing interesting there. Let's see what happens on the next line; use the gdb command n to step to the next line. Before you step to the next line, the line you are currently on executes, so you should expect to see the results of that strcpy on line 23 after stepping forward to line 24.

    (gdb) n
    24 i=0; j=0;
    (gdb) p infilename2
    $6 = 0xbffffb18 "supercalifragilisticzowie"
    (gdb)

As expected, infilename2 contains our atrociously long filename.
Nothing wrong here now, but when we ran all the way through, by the time execution hit line 29, it was broken. Let's step forward again and see what happens.

    (gdb) n
    25 infile = fopen(infilename2,"r");
    (gdb) p infilename2
    $8 = 0xbffffb18 "supercal"
    (gdb)

Wait a minute! Now it's wrong! What happened? All that the program did between those two lines was set the variables i and j to zero, and somehow that affected infilename2. You wouldn't think this could happen, variables just changing their values willy-nilly. In fact, if the program were written properly, this wouldn't happen.

As a nonprogrammer, this is where you usually give up. That isn't to say that the exercise has been useless. With this information, you can more easily explain to the author or online support community what problems you've observed so that they can fix them more easily and quickly. Program authors hate it when they get bug reports that say, "it didn't work." This doesn't mean anything to them because if they could duplicate the problem on their end, they'd probably have found and fixed it already. Instead, you say, "it doesn't work because for some reason, between line 23 and line 25, the infilename2 variable contents get tromped on," and help the author find the bug by localizing it to a very small portion of the code. By taking these extra steps, the information you can provide about the program's problems can mean the difference between a fix that takes a few minutes to appear and a fix that never appears.