

Tracing Software Problems to the Source: Using the gdb Debugger

If thinking about the problem, trying to do things as correctly as possible, and examining all the debugging information yields only an application that doesn't run correctly, you still have the option of digging around in the code. Thankfully, Apple has provided the GNU debugger, gdb, as part of the development tools. The GNU debugger is to the Unix debugging world what the GNU compiler is to the Unix programming world: a flexible, community-supported, de facto standard for programmer productivity.

The easiest way to explain how to use gdb is to demonstrate its use. The program has copious online help, as well as man pages and an INFO section available through the emacs M-x info command. Before the demonstration, however, Table 14.1 contains a summary of command-line options and common internal commands.

Table 14.1. The Command Documentation Table for the gdb Debugger

gdb    GNU debugger

gdb [-help] [-nx] [-q] [-batch] [-cd=<dir>] [-f] [-b <bps>] [-tty=<dev>] [-s <symfile>] [-e <prog>] [-se <prog>] [-c <core>] [-x <cmds>] [-d <dir>] [<prog> [<core> | <procID>]]

gdb can be used to debug programs written in C, C++, and Modula-2.

Arguments other than options specify an executable file and a core file or process ID. The first argument encountered with no associated option flag is equivalent to the -se option; the second, if any, is equivalent to the -c option, if it is a file. Options and command-line arguments are processed in sequential order. The order makes a difference when the -x option is specified.

-help, -h
    Lists all options with brief explanations.

-symbols=<file>, -s <file>
    Reads symbol table from file <file>.

-write
    Enables writing into executable and core files.

-exec=<file>, -e <file>
    Uses <file> as the executable file to execute when appropriate, and for examining pure data in conjunction with a core dump.

-se=<file>
    Reads symbol table from <file> and uses it as the executable file.

-core=<file>, -c <file>
    Uses <file> as a core dump to examine.

-command=<file>, -x <file>
    Executes gdb commands from <file>.

-directory=<directory>, -d <directory>
    Adds <directory> to the path to search for source files.

-nx, -n
    Does not execute commands from any .gdbinit files. Commands in these files are normally executed after all the command options and arguments have been processed.

-quiet, -q
    Quiet mode. Does not print the introductory and copyright messages. Also suppresses them in batch mode.

-batch
    Batch mode. Exits with status 0 after processing all the command files associated with the -x option (and .gdbinit, if not inhibited). Exits with nonzero status if an error occurs while executing the gdb commands in the command files.

-cd=<directory>
    Runs gdb using <directory> as the working directory rather than using the current directory as the working directory.

-fullname, -f
    Outputs information used by emacs-gdb interface.

-b <bps>
    Sets the line speed (baud rate or bits per second) of any serial interface used by gdb for remote debugging.

-tty=<device>
    Runs using <device> for your program's standard input and output.

These are some of the more frequently needed gdb commands:

break [<file>:]<function>
    Sets a breakpoint at <function> (in <file>).

run [<arglist>]
    Starts your program (with <arglist>, if specified).

bt
    Backtrace. Displays the program stack.

print <expr>
    Displays the value of an expression.

c
    Continues running your program (after stopping, such as at a breakpoint).

next
    Executes the next program line (after stopping); steps over any function calls in the line.

step
    Executes the next program line (after stopping); steps into any function calls in the line.

help [<name>]
    Shows information about gdb command <name>, or general information about using gdb.

quit
    Exits gdb.


NOTE

When following this debugging example, an almost overwhelmingly large number of details appear in the output. These all have important meanings to someone studying the inner workings of the program, but for the purpose of just trying to see what might be wrong, and whether you understand enough to fix it, you really only need to follow along with the details discussed in the example.

Don't let the other details intimidate you and convince you to ignore the possibilities the debugger presents. Even accomplished programmers sometimes let the apparent complexity of debugging output sidetrack them into using less effective tools and wasting time. You can learn an incredible amount and get good at cleaning up little software errors by starting from these humble beginnings. All it takes is a willingness to experiment and pay attention to deeper details each time you learn something new.

In light of this, don't consider or expect this example to be a comprehensive discussion of how you use gdb to debug software. It's designed to show you what real errors look like and to demonstrate that if you pay attention, it really is within the grasp of ordinary, everyday users to hunt for, and potentially to fix, software errors.


To use gdb, you first need something on which to use it. Type in the little program shown in Listing 14.1, just as it appears here. Alternatively, you can download it from macosxunleashed.com's downloads directory:

 curl -O http://www.macosxunleashed.com/downloads/addme.c 

Name the file addme.c.

Listing 14.1. The Source for the addme.c Demo C Program
 /* addme.c   A really silly C demo program */
 /* 990325 WCR                              */
 /* Usage is <progname> <filename>          */

 #include <stdio.h>

 int addem(a,b)
 int a, b;
 {
   return a+b;
 }

 void main(argc,argv)
 int argc;
 char *argv[];
 {
   int i;
   char infilename[8];
   int j;
   FILE *infile;
   char number[100];
   char *infilename2=infilename;
   strcpy(infilename2,argv[1]);
   i=0; j=0;
   infile = fopen(infilename2,"r");

   if(infile==NULL)
   {
      printf("couldn't open file %s please try again\n",infilename2);
      exit(1);
   }

   i=0;
   while (fgets(number,90,infile) != '\0')
   {
      sscanf(number,"%d",&j);
      i=addem(i,j);
   }

   printf("Your total is %d\n",i);
   exit(0);
 }

This simple little C program takes a list of integers from a file, one per line, and adds them together. So that you'll have a file to work from, create a file named numbers with the following contents:

 1
 2
 13
 15

Make sure that there are no blank lines above or below the data.
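The warning about blank lines follows from how Listing 14.1 reads its input: when a line contains no number, sscanf leaves j unchanged, so the previous value gets added a second time. As a rough sketch only (this is not part of the book's program, and the name sum_lines is hypothetical), the reading loop could check what sscanf reports and skip such lines:

```c
#include <stdio.h>

/* Hypothetical helper, not part of addme.c: adds up the integers found
   in a file, one per line, ignoring lines that hold no number. */
int sum_lines(FILE *infile)
{
    char line[100];
    int value;
    int total = 0;

    /* fgets returns NULL at end of file (or on a read error). */
    while (fgets(line, sizeof line, infile) != NULL) {
        /* sscanf returns the number of items it converted; anything
           other than 1 means this line had no leading integer. */
        if (sscanf(line, "%d", &value) == 1)
            total += value;
    }
    return total;
}
```

Given the numbers file opened with fopen, this should produce the same total as the loop in the listing, and a stray blank line no longer changes the result.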

Also create a file with a very long name, such as supercalifragilisticzowie, and put the same data in it.

Note there's a bit of trickery involved in the way this code is written that's specifically there to generate an error. Even though there are a few errors in this code, some systems are sloppy enough with memory management that the program might run intermittently. Also, if you rearrange the definition of the variables i and j, you decrease the likelihood of a crash. Weird, huh?

So, let's see what we have. Time to compile the program. We don't have a makefile, so we'll have to do it by hand. Issue the command

 cc -g -o addemup addme.c 

After a few seconds, the compiler should respond with warnings similar to the following:

 addme.c: In function `main':
 addme.c:16: warning: return type of `main' is not `int'
 addme.c:40: warning: incompatible implicit declaration of built-in function 'exit'

It should return you to the command line. If it does anything else, for instance, outputs

 addme.c: In function `main':
 addme.c:15: parse error before `char'
 addme.c:23: subscripted value is neither array nor pointer

that means you've typed the program in incorrectly, or have otherwise messed up your downloaded version. Specifically, if you got this error, in all likelihood you forgot the semicolon at the end of the line that says int argc;.

The warnings that are returned when it's correct are just that: warnings, not errors. The C standard expects the main program to return int, and the compiler is just being pedantic and reminding me that having a void return value is archaic. The second warning appears because the program calls exit without including <stdlib.h>, the header that declares it.

After you get the program to compile cleanly with no errors, you're ready for the next step: trying it out. Issue the command ./addemup and see what happens. Note that the command is addemup, not something related to addme. I could actually have named it anything I wanted, simply by changing the -o addemup part of the cc command. If you don't specify any output filename, cc names the output file a.out by default. Also, just so that you know, the -g flag tells the compiler to include debugging information in the executable. This makes the file larger but gives the debugger important information.

 ./addemup
 Bus Error

Well, that doesn't sound good. What could be wrong? You can probably figure it out just by looking at the code at this point, but on a more complicated program, that would be impossible. Instead, let's start the gdb debugger and take a look.

 brezup:software source $ gdb ./addemup
 GNU gdb 5.3-20030128 (Apple version gdb-365) (Sun Oct 24 12:57:07 GMT 2004)
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "powerpc-apple-darwin"...
 Reading symbols for shared libraries .. done
 (gdb)

Okay, we're at a prompt. What do we do? The gdb debugger actually has a complete selection of online help available. To access the help system, simply enter the command help.

 (gdb) help
 List of classes of commands:
 aliases -- Aliases of other commands
 breakpoints -- Making program stop at certain points
 data -- Examining data
 files -- Specifying and examining files
 internals -- Maintenance commands
 obscure -- Obscure features
 running -- Running the program
 stack -- Examining the stack
 status -- Status inquiries
 support -- Support facilities
 tracepoints -- Tracing of program execution without stopping the program
 user-defined -- User-defined commands
 Type "help" followed by a class name for a list of commands in that class.
 Type "help" followed by command name for full documentation.
 Command name abbreviations are allowed if unambiguous.
 (gdb)

I'll leave some of the interesting items here for you to explore, rather than walk you through them. Right now, let's get back to debugging our program. To start the program, simply issue the command r.

 (gdb) r
 Starting program: /Users/software/Documents/source/addemup
 Reading symbols for shared libraries . done
 Program received signal EXC_BAD_ACCESS, Could not access memory.
 Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
 0x978c6168 in strcpy ()
 (gdb)

So, gdb knows something. Not a very intelligible something at this point, but something nonetheless. That address 0x00000000 is most disturbing (above and beyond the errors). It's almost impossibly unlikely that a program would actually have data at location zero. If it had any other value, I might suspect a logic error, but an address of zero for anything suggests that something somewhere isn't being assigned to what it should be. Let's see whether gdb can be a bit more informative.

 (gdb) where
 #0  0x978c6168 in strcpy ()
 #1  0x00002bb0 in main (argc=1, argv=0xbffffcb6) at addme.c:23
 (gdb)

gdb says the program broke in a procedure named strcpy, which was called from a procedure named main, in line 23 of our file addme.c. Depending on the compiler version and gdb version, you might also see a line or two for start(), which is Mac OS X and gdb initializing and starting the program. Let's take a look at the region of the code in your file (line 23) that gdb indicates was the last place that things were working.

 (gdb) l 23
 18        char infilename[8];
 19        int j;
 20        FILE *infile;
 21        char number[100];
 22        char *infilename2=infilename;
 23        strcpy(infilename2,argv[1]);
 24        i=0; j=0;
 25        infile = fopen(infilename2,"r");
 26
 27        if(infile==NULL)
 (gdb)

Line 23 has a function strcpy on it (this C function copies the contents of one character array [string] variable to another). The debugger seems to be on to something here. Let's set a breakpoint (a place we want the program to stop running and wait for us) at line 23 and see what happens.

TIP

C functions have man page entries too. You can get documentation on most anything you see as a function in a C program like this. If you're not a programmer, the meat of the documentation might not be much use to you, but for something like strcpy, knowing that the function is supposed to copy the contents of one argument into another can be useful when looking at debugging output.
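To make that description concrete, here is a minimal sketch of the kind of loop strcpy performs. It is an illustrative stand-in with a hypothetical name (copy_string), not the library's actual implementation. The detail worth noticing is that nothing in the parameter list says how large the destination is.

```c
/* Illustrative stand-in for strcpy (hypothetical name): copies bytes
   from src to dst, up to and including the terminating '\0'.
   There is no parameter describing how much room dst has, so the
   caller alone is responsible for making sure the copy fits. */
char *copy_string(char *dst, const char *src)
{
    char *out = dst;

    /* Copy one byte at a time; stop after the '\0' has been copied. */
    while ((*out++ = *src++) != '\0')
        ;
    return dst;
}
```

If src is NULL, the very first read in that loop touches address zero, which fits the kind of failure the debugger reported inside strcpy.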


 (gdb) b 23
 Breakpoint 1 at 0x2b9c: file addme.c, line 23.
 (gdb)

So far, so good. Now let's run the program again and see where this takes us.

 (gdb) r
 The program being debugged has been started already.
 Start it from the beginning? (y or n) y
 Starting program: /Users/software/Documents/source/addemup
 Breakpoint 1, main (argc=1, argv=0xbffffcb6) at addme.c:23
 23        strcpy(infilename2,argv[1]);
 (gdb)

Note that gdb asked me whether I wanted to restart from the beginning, and I told it to go ahead. Now it has run up to our breakpoint and is waiting for me to do something. Even if I don't know what strcpy does, there's still something obviously wrong with this line. I know I've got a variable named infilename2 and a funny variable named argv[1]. Let's see what gdb has to say about them.

 (gdb) p infilename2
 $1 = 0xbffffb58 "????"
 (gdb)

The $1 indicates that it's telling us about the first variable we asked about. The 0xbffffb58 is the memory location where it's stored. Don't be surprised if yours is different, or if your version of gdb doesn't show the location without tweaking some customization settings; the default behavior depends on a number of factors outside the scope of this discussion. If you use gdb much, you'll pick up how to set your configuration to display the data you like in the format you want. The built-in help system is, well, helpful here. The ???? is the current content of this variable, which, other than not being anything you might expect, might appear to be meaningless. Depending on your gdb settings, you might see other gibberish instead, which will probably look something like L\000\000@. Other than the fact that there's nothing interpretable at that memory location, this might not seem to be too informative yet. It will make more sense shortly.

Moving on, what can we tell about this argv[1] variable?

 (gdb) p argv[1]
 $2 = 0x0
 (gdb)

Hmmm… 0x0 is a hexadecimal 0, or NULL in the C world. Examining the code again certainly suggests that something useful should be happening here. It looks as if infilename2 gets used to open a file in just a few lines, and neither gibberish such as ???? nor NULL looks promising as a filename. Nulls are used in C, but frequently they're signs of a problem, so let's think about this.

The program is trying to do something with a variable named argv[1]. The only other place this variable (argv) appears is in the main statement, the statement that starts the actual program execution. It certainly looks as if there should be something other than a NULL here. Wait a minute, what did it say in the comments at the top? It said I need to give it a filename at the command line when I run it! I didn't give it a filename, and it's trying to copy something that doesn't exist to get one. Aren't programmers supposed to check for that?
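For readers wondering what such a check might look like, the following is one possible sketch, not the book's code. The helper name filename_from_args is hypothetical; the idea is simply to look at argc before touching argv[1].

```c
#include <stdio.h>

/* Hypothetical helper: returns the filename argument, or NULL after
   printing a usage message when no filename was supplied.
   argc and argv are the values handed to main. */
const char *filename_from_args(int argc, char *argv[])
{
    /* argc counts the program name too, so a filename needs argc >= 2. */
    if (argc < 2 || argv[1] == NULL) {
        fprintf(stderr, "Usage: %s <filename>\n",
                argc > 0 ? argv[0] : "addemup");
        return NULL;
    }
    return argv[1];
}
```

A main function built around this would stop with an error status when the helper returns NULL, rather than passing a nonexistent string to strcpy.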

Let's see whether I'm right. I'll rerun the program with a filename this time.

 (gdb) r numbers
 The program being debugged has been started already.
 Start it from the beginning? (y or n) y
 Starting program: /Users/software/Documents/source/addemup numbers
 Breakpoint 1, main (argc=2, argv=0xbffffca2) at addme.c:23
 23        strcpy(infilename2,argv[1]);
 (gdb)

I started it over, but I forgot to turn off my breakpoint. Still, this is a good opportunity for me to check whether I was right.

 (gdb) p infilename2
 $3 = 0xbffffb38 "???\236"
 (gdb)

That's different, but just as useless as before.

 (gdb) p argv[1]
 $4 = 0xbffffd3a "numbers"
 (gdb)

Now we're getting somewhere! If we remember to give it a filename, it actually gets one! To continue past the breakpoint, I can enter c.

 (gdb) c
 Continuing.
 Your total is 31
 Program exited normally.
 (gdb)

The program now does exactly what it should. If I want to test it again without stopping at the breakpoint, I can delete the breakpoint and run it again.

 (gdb) d 1
 (gdb) r
 Starting program: /Users/software/Documents/source/addemup numbers
 Your total is 31
 Program exited normally.
 (gdb)

The command d 1 deletes breakpoint 1 (you can have multiple breakpoints if you need them). Note that I didn't have to give it the command-line argument numbers this time when I entered r because gdb conveniently remembers command-line arguments between runs. As you can see, it runs properly to completion.

Quitting gdb with the quit command and trying it on the command line produces the same results.

 brezup:software source $ ./addemup numbers
 Your total is 31
 brezup:software source $

Now let's see whether we can demonstrate another type of error. Do you still remember what your very long filename is? Try using that filename instead of numbers and see what happens.

 brezup:software source $ ./addemup supercalifragilisticzowie
 couldn't open file supercal please try again
 brezup:software source $

Huh? I didn't call it supercal. Something happened to my filename. Time to break out gdb again and have another look.

 brezup:software source $ gdb ./addemup
 GNU gdb 5.3-20030128 (Apple version gdb-365) (Sun Oct 24 12:57:07 GMT 2004)
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "powerpc-apple-darwin"...
 Reading symbols for shared libraries .. done
 (gdb) r supercalifragilisticzowie
 Starting program: addemup supercalifragilisticzowie
 Reading symbols for shared libraries .. done
 couldn't open file supercal please try again
 Program exited with code 01.
 (gdb)

Basically, it says the same thing. There must be something more we can find out, though. Let's look at the code and see whether we can figure out where that weird truncation came from.

 (gdb) l
 13      void main(argc,argv)
 14      int argc;
 15      char *argv[];
 16      {
 17        int i;
 18        char infilename[8];
 19        int j;
 20        FILE *infile;
 21        char number[100];
 22        char *infilename2=infilename;
 (gdb)
 23        strcpy(infilename2,argv[1]);
 24        i=0; j=0;
 25        infile = fopen(infilename2,"r");
 26
 27        if(infile==NULL)
 28        {
 29           printf("couldn't open file %s please try again\n",infilename2);
 30           exit(1);
 31        }
 32
 (gdb)

Line 29 seems to be where the error message is coming from. Let's set a breakpoint there and see what happens.

 (gdb) b 29
 Breakpoint 1 at 0x2be4: file addme.c, line 29.
 (gdb) r
 Starting program: addemup supercalifragilisticzowie
 Breakpoint 1, main (argc=2, argv=0xbffffc80) at addme.c:29
 29           printf("couldn't open file %s please try again\n",infilename2);
 (gdb)

We're at our breakpoint. infilename2 is supposed to be supercalifragilisticzowie, and it is…

 (gdb) p infilename2
 $1 = 0xbffffb18 "supercal"
 (gdb)

…not! Something's wrong here! Time to back up to our trusty breakpoint at line 23 and watch what happens from the top down.

 (gdb) b 23
 Breakpoint 2 at 0x2b9c: file addme.c, line 23.
 (gdb) r
 The program being debugged has been started already.
 Start it from the beginning? (y or n) y
 Starting program: addemup supercalifragilisticzowie
 Breakpoint 2, main (argc=2, argv=0xbffffd2c) at addme.c:23
 23        strcpy(infilename2,argv[1]);
 (gdb) p argv[1]
 $2 = 0xbffffd18 "supercalifragilisticzowie"
 (gdb) p infilename
 $3 = "???|\000\000\000\f"

So, the previous culprit isn't a problem here.

 (gdb) p infilename2
 $4 = 0xbffffb18 "???|"
 (gdb)

There's nothing interesting there. Let's see what happens on the next line. Use the gdb command n to step to the next line. Before you step to the next line, the line you are currently on executes, so you should expect to see the results of that strcpy on line 23 after stepping forward to line 24.

 (gdb) n
 24        i=0; j=0;
 (gdb) p infilename2
 $6 = 0xbffffb18 "supercalifragilisticzowie"
 (gdb)

As expected, infilename2 contains our atrociously long filename. Nothing wrong here now, but when we ran all the way through, by the time execution hit line 29, it was broken. Let's step forward again and see what happens.

 (gdb) n
 25        infile = fopen(infilename2,"r");
 (gdb) p infilename2
 $8 = 0xbffffb18 "supercal"
 (gdb)

Wait a minute! Now it's wrong! What happened? All that the program did between those two lines was to assign both the variables i and j to be zero, and somehow it affected infilename2. You wouldn't think this could happen, variables just changing their values willy-nilly.

In fact, if the program were written properly, this wouldn't happen. As a nonprogrammer, this is where you usually give up. That isn't to say that the exercise has been useless. With this information, you can more easily explain to the author or online support community what problems you've observed so that they can fix it more easily and quickly. Program authors hate it when they get bug reports that say, "it didn't work." This doesn't mean anything to them because if they could duplicate the problem on their end, they'd probably have found and fixed it already. Instead, you say, "it doesn't work because for some reason, between line 23 and line 25, the infilename2 variable contents get tromped on," and help the author find the bug by localizing it to a very small portion of the code.

By taking these extra steps, the information you can provide about the program's problems can mean the difference between a fix that takes a few minutes to appear and a fix that never appears.

NOTE

If you're curious, and you keep a C handbook around, fixing this particular error isn't that difficult. The error here is that the variable infilename has been defined to hold only eight characters. infilename2 is essentially an alias to infilename and is needed to fool the debugger into not telling you about the problem immediately. The assignment of the very long filename to infilename2 actually works most of the time, despite the fact that there theoretically isn't enough room to hold it. It works because there's enough slop in the assignment of memory space that the overflow is unlikely to write over anything important, even though the supercalifragilisticzowie value hangs out the end of the array and into unknown memory space. As long as it doesn't write over anything important, the operating system doesn't really care what a program does in its own memory space.
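The truncation to supercal seen in the debugger can be imitated without depending on any particular compiler's memory layout. The sketch below is only a model: it uses one 32-byte array to stand in for infilename plus the memory that follows it, and it assumes (as the truncated name suggests) that an int starts immediately after the eighth byte. The function name simulate_overflow is hypothetical.

```c
#include <string.h>

/* Hypothetical model of the memory around infilename in addme.c:
   bytes 0-7 play the part of the 8-byte array, and the rest stands
   in for whatever the compiler placed after it. Using one large
   array keeps this sketch within defined behavior. */
void simulate_overflow(char memory[32])
{
    /* strcpy writes all 25 characters plus the '\0',
       far beyond the first 8 bytes. */
    strcpy(memory, "supercalifragilisticzowie");

    /* Model "i=0; j=0;": assume one of those ints lives directly
       after the 8-byte array, so zeroing it wipes bytes 8 onward. */
    memset(memory + 8, 0, sizeof(int));
}
```

After the call, printing memory as a string stops at the first zero byte, so only the first eight characters remain visible even though later bytes of the long name are still sitting in memory.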

The thing that actually makes the error show up almost all the time is the placement of the definitions of i and j around the definition of infilename. Most compilers will arrange variables in memory in the same order they were defined in the program. Because the compiler doesn't know you're going to stuff a huge string into infilename, it chooses memory close to infilename for the storage of i and j. With optimization turned off, most compilers will place i and j flanking infilename in memory, and a sufficiently long value in infilename will overlap the memory used by i and j. By assigning both i and j to 0 after assigning infilename, it's almost guaranteed that part of infilename will be damaged and that the program will fail. To fix the program so that this can't occur with any reasonable filename, simply change the definition of infilename to something like char infilename[256]; instead of char infilename[8];.
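Enlarging the array handles any reasonable filename. A complementary approach is to make the copy itself refuse names that don't fit. The following is a minimal sketch under that assumption; copy_filename is a hypothetical helper and is not part of addme.c.

```c
#include <stdio.h>

/* Hypothetical helper: copies name into dst, which has room for
   dst_size bytes including the terminating '\0'.
   Returns 0 on success, or -1 if name would not fit. */
int copy_filename(char *dst, size_t dst_size, const char *name)
{
    /* snprintf never writes more than dst_size bytes and returns the
       length the full string would have needed. */
    int needed = snprintf(dst, dst_size, "%s", name);

    if (needed < 0 || (size_t)needed >= dst_size)
        return -1;   /* too long: the caller can report it and stop */
    return 0;
}
```

With an eight-byte destination, the long filename from this example would be rejected instead of spilling into neighboring variables, while numbers (seven characters plus the terminator) would still fit.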


DEBUGGING IN GORY DETAIL

Although details of kernel process tracing are a subject for a book on advanced system programming, experienced programmers reading here might be interested in knowing that Mac OS X ships with kernel tracing enabled. The ktrace command writes kernel trace logs for processes and the kdump command reads ktrace logs and formats them into human-readable output. For the nonprogrammer interested in seeing just what the operating system is doing when a program is running (as opposed to the program-centric view shown by gdb), run ktrace on a command such as ls (ktrace ls; kdump | less). Working through the meaning of each call used in a program such as ls, by using the man pages to look up functions, is an excellent exercise if you're interested in really understanding how the system works.

This command can also be a real lifesaver when you have a program that fails with some cryptic "can't open file" error message, but doesn't inform you as to what file it was trying to open. Digging through the ktrace output can sometimes give you enough clues to determine, for example, that the program was trying to read a configuration file that has gone missing or write a log file into a directory that doesn't exist. That information is sometimes enough to let you find or reconstruct the data that the program needs to get up and running again.
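Seen from the programmer's side, the cryptic message is usually avoidable. As a hedged illustration (open_or_explain is a hypothetical wrapper, not something from this chapter's program), a program can name both the file and the system's reason when an open fails:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper around fopen: on failure it reports which file
   could not be opened and why, using the error code left in errno. */
FILE *open_or_explain(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);

    if (fp == NULL)
        fprintf(stderr, "couldn't open file %s: %s\n",
                path, strerror(errno));
    return fp;
}
```

A program that reports failures this way rarely needs kernel tracing to reveal which file it was looking for.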




    Mac OS X Tiger Unleashed
    ISBN: 0672327465
    EAN: 2147483647
    Year: 2005
    Pages: 251
