13.1. GNU Debugger (GDB)

If you spend much time developing Linux applications, you will undoubtedly spend many hours getting to know the GNU Debugger. GDB is arguably the most important tool in the developer's toolbox. It has a long history, and its capabilities have blossomed to include low-level hardware-specific debugging support for a wide variety of architectures and microprocessors. The user manual for GDB is nearly as large as this book, so our intention here is only to introduce GDB and get you started. You are encouraged to study the user manual referenced later under Section 13.7.1, "Suggestions for Additional Reading."

Because this is a book about embedded Linux development, we use a version of GDB that has been compiled as a cross-debugger. That is, the debugger itself runs on your development host, but it understands binary executables in the architecture for which it was configured at compile time. In the next few examples, we use GDB compiled for a Red Hat Linux-compatible development host and an XScale (ARM) target processor. Although we use the short name gdb, we are presenting examples based on the XScale-enabled cross-gdb from the MontaVista embedded Linux distribution for ARM XScale. The binary is called xscale_be-gdb. It is still GDB, simply configured for a cross-development environment.

GDB is a complex program with many configuration options during the build process. It is not our intention to provide guidance on building gdb; that has been covered in other literature. For the purposes of this chapter, we assume that you have obtained a working GDB configured for the architecture and host development environment you will be using.

13.1.1. Debugging a Core Dump

One of the most common reasons to drag GDB out of the toolbox is to evaluate a core dump. It is quick and easy, and often leads to immediate identification of the offending code.
A core dump results when an application program generates a fault, such as accessing a memory location that it does not own. Many conditions can trigger a core dump,[1] but SIGSEGV (segmentation fault) is by far the most common. A SIGSEGV is a Linux kernel signal that is generated on illegal memory accesses by a user process. When this signal is generated, the kernel terminates the process. The kernel then dumps a core image, if so enabled.
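Whether the kernel actually writes a core image depends on the process's core-file size limit. As a minimal sketch, assuming a Linux target with the standard getrlimit() and setrlimit() calls, a program could raise its own soft limit as far as its hard limit allows. The helper name enable_core_dumps() is hypothetical and is not part of the demonstration software used in this chapter.

```c
#include <stdio.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (not from the original text): raise this process's
 * soft core-file size limit to its hard limit so that the kernel is
 * permitted to write a core image. Returns 0 on success, -1 on failure.
 */
int enable_core_dumps(void)
{
    struct rlimit rl;

    /* Read the current soft and hard limits on core file size. */
    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }

    /* A process may always raise its soft limit up to its hard limit. */
    rl.rlim_cur = rl.rlim_max;

    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }

    return 0;
}
```

A program would typically call such a helper early in main(), before any code that might fault. If the hard limit is already zero, the call still succeeds but no core file will be produced, so the limit must then be raised by a privileged process or in the shell that launches the program.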
For a core dump to be generated, your process must have resource limits that permit one. You set these limits either from within the program, using the setrlimit() function call, or from a BASH or BusyBox shell command prompt, using ulimit. It is not uncommon to find the following line in the initialization scripts of an embedded system to enable the generation of core dumps on process errors:

    $ ulimit -c unlimited

This BASH built-in command sets the size limit of a core dump. In the previous instance, the size is set to unlimited.

When an application program generates a segmentation fault (for example, by writing to a memory address outside its permissible range), Linux terminates the process and generates a core dump, if so enabled. The core dump is a snapshot of the running process at the time the segmentation fault occurred.

It helps to have debugging symbols enabled in your binary. GDB produces much more useful output when debugging symbols (gcc -g) are enabled during the build. However, it is still possible to determine the sequence of events leading to the segmentation fault even if the binary was compiled without debugging symbols. You will need to do more investigative work, because you must manually correlate virtual addresses to locations within your program.

Listing 13-1 shows the results of a core dump analysis session using GDB. The output has been reformatted slightly to fit the page. We have used some demonstration software to intentionally produce a segmentation fault. Here is the output of the process (called webs) that generated the segmentation fault:

    root@coyote:/workspace/websdemo# ./webs
    Segmentation fault (core dumped)

Listing 13-1. Core Dump Analysis Using GDB
13.1.2. Invoking GDB

The first line of Listing 13-1 shows how GDB was invoked from the command line. Because we are doing cross-debugging, we need the cross-version of GDB that has been compiled for our host and target system. We invoke our version of cross-gdb as shown and pass xscale_be-gdb the name of the binary followed by the name of the core dump file, in this case simply core. After GDB prints several banner lines describing its configuration and other information, it prints the reason for the termination: signal 11, the indication of a segmentation fault.[2] Several lines follow as GDB loads the binary, the libraries it depends on, and the core file.

The last line printed upon GDB startup is the current location of the program when the fault occurred. The line preceded by the #0 string indicates the stack frame (stack frame zero in a function called ClearBlock() at virtual address 0x00012ac4). The line that follows, preceded by 43, is the offending source line from a file called led.c; 43 is its line number. From there, GDB displays its command prompt and waits for input.
To provide some context, we enter the gdb list command, using its abbreviated form l. GDB recognizes command abbreviations where there is no ambiguity. Here the program error begins to present itself. The offending line, according to GDB's analysis of the core dump, is:

    43    *ptr = 0;

Next we issue the gdb print command on the ptr variable, again abbreviated as p. As you can see from Listing 13-1, the value of the pointer ptr is 0. So we conclude that the reason for the segmentation fault is the dereference of a null pointer, a common programming error. From here, we can elect to use the backtrace command to see the call chain leading to this error, which might lead us back to the actual source of the error. Listing 13-2 displays these results.

Listing 13-2. Backtrace Command
The backtrace displays the call chain all the way back to main(), the start of the user's program. A stack frame number precedes each line of the backtrace. You can switch to any given stack frame using the gdb frame command. Listing 13-3 is an example of this. Here we switch to stack frame 2 and display the source code in that frame. As in the previous examples, the lines preceded with (gdb) are the commands we issue to GDB, and the other lines are the GDB output.

Listing 13-3. Moving Around Stack Frames in GDB
As you can see, with a little help from the source code available using the list command, it would not be difficult to trace the code back to the source of the errant null pointer. In fact, the astute reader will notice the source of the segmentation fault we have produced for this example. From Listing 13-3, we see that the check of the return value in the call to malloc() has been commented out. In this example, the malloc() call failed, leading to the operation on a null pointer two frames later in the call chain. Although this example is both contrived and trivial, many crashes of this type are remarkably easy to track down using a similar method with GDB and core dumps. You can also see the null pointer by looking at the parameter values in the function call. This often leads you directly to the frame where the null pointer originated.

13.1.3. Debug Session in GDB

We conclude this introduction to GDB by showing a typical debug session. In the previous demonstration of a program crash, we could have elected to step through the code to narrow down the cause of the failure. Of course, if you get a core dump, you should always start there. However, in other situations, you might want to set breakpoints and step through running code. Listing 13-4 details how we start GDB in preparation for a debug session. Note that the program must have been compiled with the debug flag enabled in the gcc command line for GDB to be useful in this context. Refer back to Figure 12-1 in Chapter 12, "Embedded Development Environment"; this is a cross-debug session with GDB running on your development host, debugging a program running on your target. We cover complete details of remote application debugging in Chapter 15, "Debugging Embedded Linux Applications."

Listing 13-4. Initiating a GDB Debug Session
Following through this simple debug session, first we connect to our target board using the gdb target command. We cover remote debugging in more detail in Chapter 15. When we are connected to our target hardware, we set a breakpoint at main() using the gdb break (abbreviated b) command. Then we issue the gdb continue (abbreviated c) command to resume execution of the program. If we had any program arguments, we could have issued them on the command line when we invoked GDB.

We hit the breakpoint set at main() and set another one at ErrorInHandler(), followed by the continue command, again abbreviated. When this new breakpoint is hit, we begin to step through the code using the next command. There we encounter the call to malloc(). Following the malloc() call, we examine the return value and discover the failure, as indicated by the null return value. Finally, we print the value of the parameter in the malloc() call and see that a very large memory region (100 million bytes) is being requested, which fails.

Although trivial, the GDB examples in this section should enable the newcomer to become immediately productive with GDB. Few of us have really mastered GDB; it is very complex and has many capabilities. Later in Section 13.2, "Data Display Debugger," we introduce a graphical front end to GDB that can ease the transition for those unfamiliar with GDB.

One final note about GDB: No doubt you have noticed the many banner lines GDB displays on the console when it is first invoked, as in Listing 13-1. In these examples, as stated earlier, we used a cross-gdb from the MontaVista embedded Linux distribution. The banner lines contain a vital piece of information that the embedded developer must be aware of: GDB's host and target specifications.
From Listing 13-1, we saw the following output when GDB was invoked:

    This GDB was configured as "--host=i686-pc-linux-gnu --target=armv5teb-montavista-linuxeabi"

In this instance, we were invoking a version of GDB that was compiled to execute from a Linux PC, specifically an i686 running the GNU/Linux operating system. Equally critical, this instance of GDB was compiled to debug ARM binary code generated from the armv5teb big endian toolchain.

One of the most common mistakes made by newcomers to embedded development is to use the wrong GDB while trying to debug target executables. If something isn't working right, you should immediately check your GDB configuration to make sure that it makes sense for your environment. You cannot use your native GDB to debug target code!