14.3. Debugging the Linux Kernel

One of the more common reasons you might find yourself stepping through kernel code is to modify or customize the platform-specific code for your custom board. Let's see how this might be done using the AMCC Yosemite board. We place a breakpoint at the platform-specific architecture setup function and then continue until that breakpoint is encountered. Listing 14-4 shows the sequence.

Listing 14-4. Debugging Architecture-Setup Code

(gdb) b yosemite_setup_arch
Breakpoint 3 at 0xc021a488: file arch/ppc/platforms/4xx/yosemite.c, line 308.
(gdb) c
Continuing.
Can't send signals to this remote system.  SIGILL not sent.
Breakpoint 3, yosemite_setup_arch () at arch/ppc/platforms/4xx/yosemite.c:308
308              yosemite_set_emacdata();
(gdb) l
303     }
304
305     static void __init
306     yosemite_setup_arch(void)
307     {
308              yosemite_set_emacdata();
309
310              ibm440gx_get_clocks(&clocks, YOSEMITE_SYSCLK, 6 * 1843200);
311              ocp_sys_info.opb_bus_freq = clocks.opb;
312
(gdb)

When the breakpoint at yosemite_setup_arch() is encountered, control passes to gdb at line 308 of yosemite.c. The list (l) command displays the source listing centered on the breakpoint at line 308. The warning message displayed by gdb after the continue (c) command can be safely ignored. It is part of gdb's way of testing the capabilities of the remote system. It first sends a remote continue_with_signal command to the target. The KGDB implementation for this target board does not support this command; therefore, it is NAK'd by the target. gdb responds by displaying this informational message and issuing the standard remote continue command instead.

14.3.1. gdb Remote Serial Protocol

gdb includes a debug switch that enables us to observe the remote protocol being used between gdb on your development host and the target. This can be very useful for understanding the underlying protocol, as well as troubleshooting targets that exhibit unusual or errant behavior. To enable this debug mode, issue the following command:

(gdb) set debug remote 1

With remote debugging enabled, it is instructive to observe the continue command in action and the steps taken by gdb. Listing 14-5 illustrates the use of the continue command with remote debugging enabled.

Listing 14-5. `continue` Remote Protocol Example

(gdb) c
Continuing.
Sending packet: $mc0000000,4#80...Ack
Packet received: c022d200
Sending packet: $Mc0000000,4:7d821008#68...Ack
Packet received: OK
Sending packet: $mc0016de8,4#f8...Ack
Packet received: 38600001
Sending packet: $Mc0016de8,4:7d821008#e0...Ack
Packet received: OK
Sending packet: $mc005bd5c,4#23...Ack
Packet received: 38600001
Sending packet: $Mc005bd5c,4:7d821008#0b...Ack
Packet received: OK
Sending packet: $mc021a488,4#c8...Ack
Packet received: 4bfffbad
Sending packet: $Mc021a488,4:7d821008#b0...Ack
Packet received: OK
Sending packet: $c#63...Ack
    <<< program running, gdb waiting for event

Although it might look daunting at first, what is happening here is easily understood. In summary, gdb is restoring all its breakpoints on the target. Recall from Listing 14-3 that we entered two breakpoints, one at panic() and one at sys_sync(). Later in Listing 14-4, we added a third breakpoint at yosemite_setup_arch(). Thus, there are three active user-specified breakpoints. These can be displayed by issuing the gdb info breakpoints command. As usual, we use the abbreviated version.

(gdb) i b
Num Type           Disp Enb Address    What
1   breakpoint     keep y   0xc0016de8 in panic at kernel/panic.c:74
2   breakpoint     keep y   0xc005bd5c in sys_sync at fs/buffer.c:296
3   breakpoint     keep y   0xc021a488 in yosemite_setup_arch at arch/ppc/platforms/4xx/yosemite.c:308
        breakpoint already hit 1 time
(gdb)

Now compare the previous breakpoint addresses with the addresses in the gdb remote $m packets in Listing 14-5. The $m packet is a "read target memory" command, and the $M packet is a "write target memory" command. Once for each breakpoint, the instruction at the breakpoint address is read from target memory, stored locally on the host by gdb (so it can be restored later), and replaced with the PowerPC trap instruction twge r2, r2 (0x7d821008), which results in control passing back to the debugger. Figure 14-4 illustrates this action.

Figure 14-4. `gdb` inserting target memory breakpoints
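The packets in Listing 14-5 all share one framing: a $, the payload, a #, and a two-digit hexadecimal checksum that is the sum of the payload bytes modulo 256. The following Python sketch rebuilds a few of those packets from their parts. The function names are illustrative assumptions, not part of gdb, and the escaping that the protocol requires for binary payloads is omitted because hex-encoded payloads never need it.

```python
def rsp_checksum(payload: str) -> int:
    """Sum of the payload bytes, modulo 256."""
    return sum(payload.encode("ascii")) % 256


def rsp_frame(payload: str) -> str:
    """Wrap a payload as $<payload>#<two-digit hex checksum>."""
    return f"${payload}#{rsp_checksum(payload):02x}"


def read_memory_packet(addr: int, length: int) -> str:
    """Build an 'm' (read target memory) packet."""
    return rsp_frame(f"m{addr:x},{length:x}")


def write_memory_packet(addr: int, data: bytes) -> str:
    """Build an 'M' (write target memory) packet with hex-encoded data."""
    return rsp_frame(f"M{addr:x},{len(data):x}:{data.hex()}")


# The PowerPC trap instruction twge r2,r2 as four big-endian bytes.
TRAP = bytes.fromhex("7d821008")

read_pkt = read_memory_packet(0xC0000000, 4)
write_pkt = write_memory_packet(0xC0000000, TRAP)
continue_pkt = rsp_frame("c")
```

With these definitions, read_pkt, write_pkt, and continue_pkt reproduce the first $m packet, the first $M packet, and the final $c packet of Listing 14-5, checksums included. A mismatch between a computed checksum and the one seen on the wire is a quick indication of a corrupted serial link.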

You might have noticed that gdb is updating four breakpoints, whereas we entered only three. The first one, at target memory location 0xc000_0000, is put there by gdb automatically upon startup. This location is the base address of the linked kernel image from the ELF file (essentially, _start). It is equivalent to a breakpoint at main() for user-space debugging. The other three breakpoints are the ones we entered earlier.
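The save, patch, and restore cycle that gdb performs for these four addresses can be modeled in a few lines. This is only a toy sketch: target memory is a Python dictionary instead of a board reached through $m and $M packets, and the function names are assumptions made for illustration rather than gdb internals. The addresses and instruction words are the ones that appear in Listing 14-5.

```python
TRAP = 0x7D821008  # PowerPC twge r2,r2

# Original instruction words at each breakpoint address (from Listing 14-5).
target_memory = {
    0xC0000000: 0xC022D200,  # automatic breakpoint at the kernel base address
    0xC0016DE8: 0x38600001,  # panic()
    0xC005BD5C: 0x38600001,  # sys_sync()
    0xC021A488: 0x4BFFFBAD,  # yosemite_setup_arch()
}


def insert_breakpoints(memory, addresses):
    """Before resuming: save each original word, then write the trap."""
    saved = {}
    for addr in addresses:
        saved[addr] = memory[addr]  # corresponds to the $m read
        memory[addr] = TRAP         # corresponds to the $M write
    return saved


def remove_breakpoints(memory, saved):
    """After a stop: put every saved original word back."""
    for addr, original in saved.items():
        memory[addr] = original


pristine = dict(target_memory)
shadow = insert_breakpoints(target_memory, list(target_memory))
patched = dict(target_memory)   # what the target executes while running
remove_breakpoints(target_memory, shadow)
```

While the target runs, every breakpoint location holds the trap word (patched); once control returns to gdb, memory again matches the original image (pristine). This is why a memory dump taken from within gdb never shows the trap instructions.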

The same thing happens in reverse when an event occurs that returns control to gdb. Listing 14-6 details the action when our breakpoint at yosemite_setup_arch() is encountered.

Listing 14-6. Remote Protocol: Breakpoint Hit

Packet received: T0440:c021a488;01:c020ff90;
Sending packet: $mc0000000,4#80...Ack  <<< Read memory @c0000000
Packet received: 7d821008
Sending packet: $Mc0000000,4:c022d200#87...Ack  <<< Write memory
Packet received: OK
Sending packet: $mc0016de8,4#f8...Ack
Packet received: 7d821008
Sending packet: $Mc0016de8,4:38600001#a4...Ack
Packet received: OK
Sending packet: $mc005bd5c,4#23...Ack
Packet received: 7d821008
Sending packet: $Mc005bd5c,4:38600001#cf...Ack
Packet received: OK
Sending packet: $mc021a488,4#c8...Ack
Packet received: 7d821008
Sending packet: $Mc021a488,4:4bfffbad#d1...Ack
Packet received: OK
Sending packet: $mc021a484,c#f3...Ack
Packet received: 900100244bfffbad3fa0c022
Breakpoint 3, yosemite_setup_arch () at arch/ppc/platforms/4xx/yosemite.c:308
308              yosemite_set_emacdata();
(gdb)

The $T packet is a gdb Stop Reply packet. It is sent by the target to gdb when a breakpoint is encountered. In our example, the $T packet returned the value of the program counter and register r1.^[4] The rest of the activity is the reverse of that in Listing 14-5. The PowerPC trap breakpoint instructions are removed, and gdb restores the original instructions to their respective memory locations.

^[4] As pointed out earlier, the gdb remote protocol is detailed in the gdb manual cited at the end of this chapter in Section 14.6.1, "Suggestions for Additional Reading."
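The Stop Reply payload in Listing 14-6 can be taken apart by hand: the letter T, a two-digit hexadecimal signal number, and then a series of number:value pairs, each terminated by a semicolon. The Python sketch below does this for the packet shown above. It is a minimal illustration under two assumptions: the function name is invented for this example, and register values are converted directly from hex, which matches a big-endian target such as the PowerPC.

```python
def parse_stop_reply(payload: str):
    """Split a 'T' stop reply into (signal, registers, other_fields)."""
    if not payload.startswith("T"):
        raise ValueError("not a T stop reply packet")
    signal = int(payload[1:3], 16)
    registers = {}
    other = {}
    for field in payload[3:].split(";"):
        if not field:
            continue  # the trailing ';' leaves an empty last field
        key, _, value = field.partition(":")
        try:
            # A hexadecimal key is a register number.
            registers[int(key, 16)] = int(value, 16)
        except ValueError:
            # Anything else is a named field; keep it as text.
            other[key] = value
    return signal, registers, other


signal, registers, other = parse_stop_reply("T0440:c021a488;01:c020ff90;")
```

For this packet the signal number is 4, register 0x40 holds 0xc021a488 (the address of our breakpoint at yosemite_setup_arch(), consistent with it being the program counter), and register 0x01 holds 0xc020ff90.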

14.3.2. Debugging Optimized Kernel Code

At the start of this chapter, we said that one of the challenges identified in debugging kernel code results from compiler optimization. We noted that the Linux kernel is compiled by default with optimization level -O2. In the examples up to this point, we used -O1 optimization to simplify the debugging task. Here we illustrate one of the many ways optimization can complicate debugging.

The related Internet mailing lists are strewn with questions about what appear to be broken tools. Sometimes the poster reports that the debugger is single-stepping backward or that the line numbers do not line up with the source code. Here we present an example to illustrate the complexities that optimizing compilers bring to source-level debugging. In this example, the line numbers that gdb reports when a breakpoint is hit do not match up with the line numbers in our source file due to function inlining.

For this demonstration, we use the same debug code snippet as shown in Listing 14-4. However, for this example, we have compiled the kernel with the compiler optimization flag -O2. This is the default for the Linux kernel. Listing 14-7 shows the results of this debugging session.

Listing 14-7. Optimized Architecture-Setup Code

$ ppc_44x-gdb --silent vmlinux
(gdb) target remote /dev/ttyS0
Remote debugging using /dev/ttyS0
breakinst () at arch/ppc/kernel/ppc-stub.c:825
825     }
(gdb) b panic
Breakpoint 1 at 0xc0016b18: file kernel/panic.c, line 74.
(gdb) b sys_sync
Breakpoint 2 at 0xc005a8c8: file fs/buffer.c, line 296.
(gdb) b yosemite_setup_arch
Breakpoint 3 at 0xc020f438: file arch/ppc/platforms/4xx/yosemite.c, line 116.
(gdb) c
Continuing.
Breakpoint 3, yosemite_setup_arch ()
    at arch/ppc/platforms/4xx/yosemite.c:116
116             def = ocp_get_one_device(OCP_VENDOR_IBM, OCP_FUNC_EMAC, 0);
(gdb) l
111             struct ocp_def *def;
112             struct ocp_func_emac_data *emacdata;
113
114             /* Set mac_addr and phy mode for each EMAC */
115
116             def = ocp_get_one_device(OCP_VENDOR_IBM, OCP_FUNC_EMAC, 0);
117             emacdata = def->additions;
118             memcpy(emacdata->mac_addr, __res.bi_enetaddr, 6);
119             emacdata->phy_mode = PHY_MODE_RMII;
120
(gdb) p yosemite_setup_arch
$1 = {void (void)} 0xc020f41c <yosemite_setup_arch>

Referring back to Listing 14-4, notice that the function yosemite_setup_arch() actually falls on line 306 of the file yosemite.c. Compare that with Listing 14-7. We hit the breakpoint, but gdb reports the breakpoint at file yosemite.c line 116. It appears at first glance to be a mismatch of line numbers between the debugger and the corresponding source code. Is this a gdb bug? First let's confirm what the compiler produced for debug information. Using the readelf^[5] tool described in Chapter 13, "Development Tools," we can examine the debug information for this function produced by the compiler.

^[5] Remember to use your cross-version of readelf (for example, ppc_44x-readelf for the PowerPC 44x architecture).

$ ppc_44x-readelf --debug-dump=info vmlinux | grep -u6 \
  yosemite_setup_arch | tail -n 7
    DW_AT_name        : (indirect string, offset: 0x9c04): yosemite_setup_arch
    DW_AT_decl_file   : 1
    DW_AT_decl_line   : 307
    DW_AT_prototyped  : 1
    DW_AT_low_pc      : 0xc020f41c
    DW_AT_high_pc     : 0xc020f794
    DW_AT_frame_base  : 1 byte block: 51       (DW_OP_reg1)

We don't have to be experts at reading DWARF2 debug records^[6] to recognize that the function in question is reported at line 307 in our source file. We can confirm this using the addr2line utility, also introduced in Chapter 13. Using the address derived from gdb in Listing 14-7:

^[6] A reference for the DWARF debug specification appears at the end of this chapter in Section 14.6.1, "Suggestions for Additional Reading."

$ ppc_44x-addr2line -e vmlinux 0xc020f41c
arch/ppc/platforms/4xx/yosemite.c:307

At this point, gdb is reporting our breakpoint at line 116 of the yosemite.c file. To understand what is happening, we need to look at the assembler output of the function as reported by gdb. Listing 14-8 is the output from gdb after issuing the disassemble command on the yosemite_setup_arch() function.

Listing 14-8. Disassemble Function `yosemite_setup_arch`

(gdb) disassemble yosemite_setup_arch
0xc020f41c <yosemite_setup_arch+0>:     mflr    r0
0xc020f420 <yosemite_setup_arch+4>:     stwu    r1,-48(r1)
0xc020f424 <yosemite_setup_arch+8>:     li      r4,512
0xc020f428 <yosemite_setup_arch+12>:    li      r5,0
0xc020f42c <yosemite_setup_arch+16>:    li      r3,4116
0xc020f430 <yosemite_setup_arch+20>:    stmw    r25,20(r1)
0xc020f434 <yosemite_setup_arch+24>:    stw     r0,52(r1)
0xc020f438 <yosemite_setup_arch+28>:    bl      0xc000d344 <ocp_get_one_device>
0xc020f43c <yosemite_setup_arch+32>:    lwz     r31,32(r3)
0xc020f440 <yosemite_setup_arch+36>:    lis     r4,-16350
0xc020f444 <yosemite_setup_arch+40>:    li      r28,2
0xc020f448 <yosemite_setup_arch+44>:    addi    r4,r4,21460
0xc020f44c <yosemite_setup_arch+48>:    li      r5,6
0xc020f450 <yosemite_setup_arch+52>:    lis     r29,-16350
0xc020f454 <yosemite_setup_arch+56>:    addi    r3,r31,48
0xc020f458 <yosemite_setup_arch+60>:    lis     r25,-16350
0xc020f45c <yosemite_setup_arch+64>:    bl      0xc000c708 <memcpy>
0xc020f460 <yosemite_setup_arch+68>:    stw     r28,44(r31)
0xc020f464 <yosemite_setup_arch+72>:    li      r4,512
0xc020f468 <yosemite_setup_arch+76>:    li      r5,1
0xc020f46c <yosemite_setup_arch+80>:    li      r3,4116
0xc020f470 <yosemite_setup_arch+84>:    addi    r26,r25,15104
0xc020f474 <yosemite_setup_arch+88>:    bl      0xc000d344 <ocp_get_one_device>
0xc020f478 <yosemite_setup_arch+92>:    lis     r4,-16350
0xc020f47c <yosemite_setup_arch+96>:    lwz     r31,32(r3)
0xc020f480 <yosemite_setup_arch+100>:   addi    r4,r4,21534
0xc020f484 <yosemite_setup_arch+104>:   li      r5,6
0xc020f488 <yosemite_setup_arch+108>:   addi    r3,r31,48
0xc020f48c <yosemite_setup_arch+112>:   bl      0xc000c708 <memcpy>
0xc020f490 <yosemite_setup_arch+116>:   lis     r4,1017
0xc020f494 <yosemite_setup_arch+120>:   lis     r5,168
0xc020f498 <yosemite_setup_arch+124>:   stw     r28,44(r31)
0xc020f49c <yosemite_setup_arch+128>:   ori     r4,r4,16554
0xc020f4a0 <yosemite_setup_arch+132>:   ori     r5,r5,49152
0xc020f4a4 <yosemite_setup_arch+136>:   addi    r3,r29,-15380
0xc020f4a8 <yosemite_setup_arch+140>:   addi    r29,r29,-15380
0xc020f4ac <yosemite_setup_arch+144>:   bl      0xc020e338 <ibm440gx_get_clocks>
0xc020f4b0 <yosemite_setup_arch+148>:   li      r0,0
0xc020f4b4 <yosemite_setup_arch+152>:   lis     r11,-16352
0xc020f4b8 <yosemite_setup_arch+156>:   ori     r0,r0,50000
0xc020f4bc <yosemite_setup_arch+160>:   lwz     r10,12(r29)
0xc020f4c0 <yosemite_setup_arch+164>:   lis     r9,-16352
0xc020f4c4 <yosemite_setup_arch+168>:   stw     r0,8068(r11)
0xc020f4c8 <yosemite_setup_arch+172>:   lwz     r0,84(r26)
0xc020f4cc <yosemite_setup_arch+176>:   stw     r10,8136(r9)
0xc020f4d0 <yosemite_setup_arch+180>:   mtctr   r0
0xc020f4d4 <yosemite_setup_arch+184>:   bctrl
0xc020f4d8 <yosemite_setup_arch+188>:   li      r5,64
0xc020f4dc <yosemite_setup_arch+192>:   mr      r31,r3
0xc020f4e0 <yosemite_setup_arch+196>:   lis     r4,-4288
0xc020f4e4 <yosemite_setup_arch+200>:   li      r3,0
0xc020f4e8 <yosemite_setup_arch+204>:   bl      0xc000c0f8 <ioremap64>
End of assembler dump.
(gdb)

Once again, we need not be PowerPC assembly language experts to understand what is happening here. Notice the labels associated with the PowerPC bl instruction. This is a function call in PowerPC mnemonics. The symbolic function labels are the important data points. After a cursory analysis, we see several function calls near the start of this assembler listing:

Address	Function
0xc020f438	ocp_get_one_device()
0xc020f45c	memcpy()
0xc020f474	ocp_get_one_device()
0xc020f48c	memcpy()
0xc020f4ac	ibm440gx_get_clocks()
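The cursory analysis that produced this table is easy to automate when a function is long. The Python sketch below scans saved gdb disassemble output for bl instructions and collects the address of each call and the label it branches to. The regular expression and the function name are assumptions written for the output format shown in Listing 14-8, and the sample text is an excerpt of that listing.

```python
import re

# Matches lines such as:
# 0xc020f438 <yosemite_setup_arch+28>:    bl      0xc000d344 <ocp_get_one_device>
CALL_RE = re.compile(
    r"^(0x[0-9a-f]+)\s+<[^>]+>:\s+bl\s+0x[0-9a-f]+\s+<([^>]+)>"
)


def find_calls(disassembly: str):
    """Return (address, callee) for every bl with a symbolic target."""
    calls = []
    for line in disassembly.splitlines():
        match = CALL_RE.match(line.strip())
        if match:
            calls.append((int(match.group(1), 16), match.group(2)))
    return calls


SAMPLE = """\
0xc020f434 <yosemite_setup_arch+24>:    stw     r0,52(r1)
0xc020f438 <yosemite_setup_arch+28>:    bl      0xc000d344 <ocp_get_one_device>
0xc020f43c <yosemite_setup_arch+32>:    lwz     r31,32(r3)
0xc020f45c <yosemite_setup_arch+64>:    bl      0xc000c708 <memcpy>
0xc020f4ac <yosemite_setup_arch+144>:   bl      0xc020e338 <ibm440gx_get_clocks>
0xc020f4d4 <yosemite_setup_arch+184>:   bctrl
"""

calls = find_calls(SAMPLE)
```

Indirect calls such as the bctrl at offset 184 carry no symbolic label, so this sketch skips them; they must still be resolved by inspecting the registers at run time.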

Listing 14-9 reproduces portions of the source file yosemite.c. Correlating the functions we found in the gdb disassemble output, we see those labels occurring in the function yosemite_set_emacdata(), around the line numbers reported by gdb when the breakpoint at yosemite_setup_arch() was encountered. The key to understanding the anomaly is to notice the subroutine call at the very start of yosemite_setup_arch(). The compiler has inlined the call to yosemite_set_emacdata() instead of generating a function call, as would be expected by simple inspection of the source code. This inlining produced the mismatch in the line numbers when gdb hit the breakpoint. Even though the yosemite_set_emacdata() function was not declared using the inline keyword, GCC inlined the function as a performance optimization.

Listing 14-9. Portions of Source File `yosemite.c`

109 static void __init yosemite_set_emacdata(void)
110 {
111         struct ocp_def *def;
112         struct ocp_func_emac_data *emacdata;
113
114         /* Set mac_addr and phy mode for each EMAC */
115
116         def = ocp_get_one_device(OCP_VENDOR_IBM, OCP_FUNC_EMAC, 0);
117         emacdata = def->additions;
118         memcpy(emacdata->mac_addr, __res.bi_enetaddr, 6);
119         emacdata->phy_mode = PHY_MODE_RMII;
120
121         def = ocp_get_one_device(OCP_VENDOR_IBM, OCP_FUNC_EMAC, 1);
122         emacdata = def->additions;
123         memcpy(emacdata->mac_addr, __res.bi_enet1addr, 6);
124         emacdata->phy_mode = PHY_MODE_RMII;
125 }
126
...
304
305 static void __init
306 yosemite_setup_arch(void)
307 {
308         yosemite_set_emacdata();
309
310         ibm440gx_get_clocks(&clocks, YOSEMITE_SYSCLK, 6 * 1843200);
311         ocp_sys_info.opb_bus_freq = clocks.opb;
312
313         /* init to some ~sane value until calibrate_delay() runs */
314         loops_per_jiffy = 50000000/HZ;
315
316         /* Setup PCI host bridge */
317         yosemite_setup_hose();
318
319 #ifdef CONFIG_BLK_DEV_INITRD
320         if (initrd_start)
321                 ROOT_DEV = Root_RAM0;
322         else
323 #endif
324 #ifdef CONFIG_ROOT_NFS
325                 ROOT_DEV = Root_NFS;
326 #else
327                 ROOT_DEV = Root_HDA1;
328 #endif
329
330         yosemite_early_serial_map();
331
332         /* Identify the system */
333         printk( "AMCC PowerPC " BOARDNAME " Platform\n" );
334 }
335

To summarize the previous discussion:

We entered a breakpoint in gdb at yosemite_setup_arch().
When the breakpoint was hit, we found ourselves at line 116 of the source file, which was far removed from the function where we defined the breakpoint.
We produced a disassembly listing of the code at yosemite_setup_arch() and discovered the labels to which this sequence of code was branching.
Comparing the labels back to our source code, we discovered that the compiler had placed the yosemite_set_emacdata() subroutine inline with the function where we entered a breakpoint, causing potential confusion.

This explains the line numbers reported by gdb when the original breakpoint in yosemite_setup_arch() was hit.

Compilers employ many different kinds of optimization algorithms. This example presented but one: function inlining. Each can confuse a debugger (the human and the machine) in a different way. The challenge is to understand what is happening at the machine level and translate that into what we as developers had intended. You can see now the benefits of using the minimum possible optimization level for debugging.

14.3.3. gdb User-Defined Commands

You might already realize that gdb looks for an initialization file on startup, called .gdbinit. When first invoked, gdb loads this initialization file (usually found in the user's home directory) and acts on the commands within it. One of my favorite combinations is to connect to the target system and set initial breakpoints. In this case, the contents of .gdbinit would look like Listing 14-10.

Listing 14-10. Simple `gdb` Initialization File

$ cat ~/.gdbinit
set history save on
set history filename ~/.gdb_history
set output-radix 16
define connect
#   target remote bdi:2001
    target remote /dev/ttyS0
    b panic
    b sys_sync
end

This simple .gdbinit file enables the storing of command history in a user-specified file and sets the default output radix for printing of values. Then it defines a gdb user-defined command called connect. (User-defined commands are also often called macros.) When issued at the gdb command prompt, gdb connects to the target system via the desired method and sets the system breakpoints at panic() and sys_sync(). One method is commented out; we discuss this method shortly in Section 14.4.

There is no end to the creative use of gdb user-defined commands. When debugging in the kernel, it is often useful to examine global data structures such as task lists and memory maps. Here we present several useful gdb user-defined commands capable of displaying specific kernel data that you might need to access during your kernel debugging.

14.3.4. Useful Kernel gdb Macros

During kernel debugging, it is often useful to view the processes that are running on the system, as well as some common attributes of those processes. The kernel maintains a linked list of tasks described by struct task_struct. The address of the first task in the list is contained in the kernel global variable init_task, which represents the initial task spawned by the kernel during startup. Each task contains a struct list_head, which links the tasks in a circular linked list. These two ubiquitous kernel structures are described in the following header files:

struct task_struct            .../include/linux/sched.h
struct list_head              .../include/linux/list.h

Using gdb macros, we can traverse the task list and display useful information about the tasks. It is easy to modify the macros to extract the data you might be interested in. It is also a very useful tool for learning the details of kernel internals.

The first macro we examine (in Listing 14-11) is a simple one that searches the kernel's linked list of task_struct structures until it finds the given task. If it is found, it displays the name of the task.

Listing 14-11. `gdb find_task` Macro

 1 # Helper function to find a task given a PID or the
 2 # address of a task_struct.
 3 # The result is set into $t
 4 define find_task
 5   # Addresses greater than _end: kernel data...
 6   # ...user passed in an address
 7   if ((unsigned)$arg0 > (unsigned)&_end)
 8     set $t=(struct task_struct *)$arg0
 9   else
10     # User entered a numeric PID
11     # Walk the task list to find it
12     set $t=&init_task
13     if (init_task.pid != (unsigned)$arg0)
14       find_next_task $t
15       while (&init_task!=$t && $t->pid != (unsigned)$arg0)
16         find_next_task $t
17       end
18       if ($t == &init_task)
19         printf "Couldn't find task; using init_task\n"
20       end
21     end
22   end
23   printf "Task \"%s\":\n", $t->comm
24 end

Place this text into your .gdbinit file and restart gdb, or source^[7] it using gdb's source command. (We explain the find_next_task macro later in Listing 14-15.) Invoke it as follows:

^[7] A helpful shortcut for macro development is the gdb source command. This command opens and reads a source file containing macro definitions.

(gdb) find_task 910
Task "syslogd":

(gdb) find_task 0xCFFDE470
Task "bash":

Line 4 defines the macro name. Line 7 decides whether the input argument is a PID (a numeric entry starting at zero and limited to a few million) or a task_struct address, which must be greater than the end of the Linux kernel image itself, defined by the symbol _end.^[8] If it's an address, the only action required is to cast it to the proper type to enable dereferencing the associated task_struct. This is done at line 8. As the comment in line 3 states, this macro returns its result in the gdb convenience variable $t, cast to a pointer to a struct task_struct.

^[8] The symbol _end is defined in the linker script file during the final link.

If the input argument is a numeric PID, the list is traversed to find the matching task_struct. Lines 12 and 13 initialize the loop variables (gdb does not have a for statement in its macro command language), and lines 15 through 17 define the search loop. The find_next_task macro is used to extract the pointer to the next task_struct in the linked list. Finally, if the search fails, a sane return value is set (the address of init_task) so that it can be safely used in other macros.
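The control flow of find_task can be easier to follow outside the constraints of gdb's command language. The Python model below mirrors it step for step on a small circular list built from four rows of the ps output shown later in Listing 14-13. It is a sketch under stated assumptions: KERNEL_END is a made-up stand-in for the address of _end, the Task class is a toy, and a dictionary lookup replaces the pointer cast that the macro performs at line 8.

```python
KERNEL_END = 0xC0300000  # hypothetical value for the address of _end


class Task:
    """Toy task: an address, a PID, a name, and a link to the next task."""

    def __init__(self, addr, pid, comm):
        self.addr, self.pid, self.comm = addr, pid, comm
        self.next = self


def build_list(rows):
    """Create tasks and link them into a circular list, in order."""
    tasks = [Task(*row) for row in rows]
    for task, following in zip(tasks, tasks[1:] + tasks[:1]):
        task.next = following
    return tasks


def find_task(arg, init_task, by_addr):
    if arg > KERNEL_END:
        # Treated as a task_struct address (the macro simply casts it).
        return by_addr[arg]
    # Treated as a PID: walk the list, stopping on a match or on wraparound.
    task = init_task
    if init_task.pid != arg:
        task = task.next
        while task is not init_task and task.pid != arg:
            task = task.next
    return task  # init_task again if the PID was not found


tasks = build_list([
    (0xC01D3750, 0, "swapper"),
    (0xC04ACB10, 1, "init"),
    (0xCFC24090, 910, "syslogd"),
    (0xCFFDE470, 979, "bash"),
])
init_task = tasks[0]
by_addr = {task.addr: task for task in tasks}
```

One consequence of the threshold test is visible in this data: init_task itself lives inside the kernel image, below _end, so its address would be interpreted as a PID. Look it up by PID 0 instead.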

Building on the find_task macro in Listing 14-11, we can easily create a simple ps command that displays useful information about each process running on the system.

Listing 14-12 defines a gdb macro that displays interesting information about each process on the system, extracted from the struct task_struct for that process. It is invoked like any other gdb command, by typing its name. Unlike find_task, this user-defined command takes no arguments.

Listing 14-12. `gdb` Macro: Print Process Information

 1 define ps
 2   # Print column headers
 3   task_struct_header
 4   set $t=&init_task
 5   task_struct_show $t
 6   find_next_task $t
 7   # Walk the list
 8   while &init_task!=$t
 9     # Display useful info about each task
10     task_struct_show $t
11     find_next_task $t
12   end
13 end
14
15 document ps
16 Print points of interest for all tasks
17 end

This ps macro is similar to the find_task macro, except that it requires no input arguments and it adds a macro (task_struct_show) to display the useful information from each task_struct. Line 3 prints a banner line with column headings. Lines 4 through 6 set up the loop and display the first task. Lines 8 through 11 loop through each task, calling the task_struct_show macro for each.

Notice also the inclusion of the gdb document command. This allows the gdb user to get help by issuing the help ps command from the gdb command prompt as follows:

(gdb) help ps
Print points of interest for all tasks

Listing 14-13 displays the output of this macro on a target board running only minimal services.

Listing 14-13. `gdb ps` Macro Output

(gdb) ps
Address      PID State      User_NIP  Kernel-SP  device comm
0xC01D3750     0 Running              0xC0205E90 (none) swapper
0xC04ACB10     1 Sleeping  0x0FF6E85C 0xC04FFCE0 (none) init
0xC04AC770     2 Sleeping             0xC0501E90 (none) ksoftirqd/0
0xC04AC3D0     3 Sleeping             0xC0531E30 (none) events/0
0xC04AC030     4 Sleeping             0xC0533E30 (none) khelper
0xC04CDB30     5 Sleeping             0xC0535E30 (none) kthread
0xC04CD790    23 Sleeping             0xC06FBE30 (none) kblockd/0
0xC04CD3F0    45 Sleeping             0xC06FDE50 (none) pdflush
0xC04CD050    46 Sleeping             0xC06FFE50 (none) pdflush
0xC054B7B0    48 Sleeping             0xC0703E30 (none) aio/0
0xC054BB50    47 Sleeping             0xC0701E20 (none) kswapd0
0xC054B410   629 Sleeping             0xC0781E60 (none) kseriod
0xC054B070   663 Sleeping             0xCFC59E30 (none) rpciod/0
0xCFFDE0D0   675 Sleeping  0x0FF6E85C 0xCF86DCE0 (none) udevd
0xCF95B110   879 Sleeping  0x0FF0BE58 0xCF517D80 (none) portmap
0xCFC24090   910 Sleeping  0x0FF6E85C 0xCF61BCE0 (none) syslogd
0xCF804490   918 Sleeping  0x0FF66C7C 0xCF65DD70 (none) klogd
0xCFE350B0   948 Sleeping  0x0FF0E85C 0xCF67DCE0 (none) rpc.statd
0xCFFDE810   960 Sleeping  0x0FF6E85C 0xCF5C7CE0 (none) inetd
0xCFC24B70   964 Sleeping  0x0FEEBEAC 0xCF64FD80 (none) mvltd
0xCFE35B90   973 Sleeping  0x0FF66C7C 0xCFEF7CE0 ttyS1  getty
0xCFE357F0   974 Sleeping  0x0FF4B85C 0xCF6EBCE0 (none) in.telnetd
0xCFFDE470   979 Sleeping  0x0FEB6950 0xCF675DB0 ttyp0  bash
0xCFFDEBB0   982<Running   0x0FF6EB6C 0xCF7C3870 ttyp0  sync
(gdb)

The bulk of the work done by this ps macro is performed by the task_struct_show macro. As shown in Listing 14-13, the task_struct_show macro displays the following fields from each task_struct:

Address: Address of the task_struct for the process
PID: Process ID
State: Current state of the process
User_NIP: Userspace Next Instruction Pointer
Kernel-SP: Kernel Stack Pointer
device: Device associated with this process
comm: Name of the process (or command)

It is relatively easy to modify the macro to show the items of interest for your particular kernel debugging task. The only complexity is in the simplicity of the macro language. Because function equivalents such as strlen do not exist in gdb's user-defined command language, screen formatting must be done by hand.

Listing 14-14 reproduces the task_struct_show macro that produced the previous listing.

Listing 14-14. `gdb task_struct_show` Macro

 1 define task_struct_show
 2   # task_struct addr and PID
 3   printf "0x%08X %5d", $arg0, $arg0->pid
 4
 5   # Place a '<' marker on the current task
 6   #  if ($arg0 == current)
 7   # For PowerPC, register r2 points to the "current" task
 8   if ($arg0 == $r2)
 9     printf "<"
10   else
11     printf " "
12   end
13
14   # State
15   if ($arg0->state == 0)
16     printf "Running   "
17   else
18     if ($arg0->state == 1)
19       printf "Sleeping  "
20     else
21       if ($arg0->state == 2)
22         printf "Disksleep "
23       else
24         if ($arg0->state == 4)
25           printf "Zombie    "
26         else
27           if ($arg0->state == 8)
28             printf "sTopped   "
29           else
30             if ($arg0->state == 16)
31               printf "Wpaging   "
32             else
33               printf "%2d        ", $arg0->state
34             end
35           end
36         end
37       end
38     end
39   end
40
41   # User NIP
42   if ($arg0->thread.regs)
43     printf "0x%08X ", $arg0->thread.regs->nip
44   else
45     printf "           "
46   end
47
48   # Display the kernel stack pointer
49   printf "0x%08X ", $arg0->thread.ksp
50
51   # device
52   if ($arg0->signal->tty)
53     printf "%s   ", $arg0->signal->tty->name
54   else
55     printf "(none) "
56   end
57
58   # comm
59   printf "%s\n", $arg0->comm
60 end

Line 3 displays the address of the task_struct and the process ID. Lines 8 through 12 mark the current process (the process that was running on this CPU at the time the breakpoint was hit) with a < character.

Lines 14 through 39 decode and display the state of the process. This is followed by the user process next instruction pointer (NIP) and the kernel stack pointer (SP). Finally, the device associated with the process is displayed, followed by the name of the process (stored in the ->comm element of the task_struct).
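When adapting task_struct_show to display other fields, it can help to prototype the column layout in a language with real string formatting before committing it to gdb printf statements. The Python sketch below reproduces the layout of Listing 14-14 field by field. It is an illustration only: the function name and its parameters are assumptions, and the values would in practice come from the task_struct under inspection.

```python
# State values and labels taken from the nested if/else chain in the macro.
STATE_NAMES = {
    0: "Running",
    1: "Sleeping",
    2: "Disksleep",
    4: "Zombie",
    8: "sTopped",
    16: "Wpaging",
}


def format_task_row(addr, pid, state, user_nip, kernel_sp, tty, comm,
                    is_current=False):
    """Build one output row using the same widths as the gdb macro."""
    marker = "<" if is_current else " "
    if state in STATE_NAMES:
        state_col = f"{STATE_NAMES[state]:<10}"
    else:
        state_col = f"{state:2d}" + " " * 8
    # Kernel threads have no saved user registers: pad with 11 spaces.
    nip_col = f"0x{user_nip:08X} " if user_nip is not None else " " * 11
    tty_col = f"{tty}   " if tty else "(none) "
    return (f"0x{addr:08X} {pid:5d}{marker}{state_col}{nip_col}"
            f"0x{kernel_sp:08X} {tty_col}{comm}")


init_row = format_task_row(0xC04ACB10, 1, 1, 0x0FF6E85C, 0xC04FFCE0,
                           None, "init")
kthread_row = format_task_row(0xC04AC770, 2, 1, None, 0xC0501E90,
                              None, "ksoftirqd/0")
```

The two rows built here correspond to the init and ksoftirqd/0 lines of Listing 14-13. A dictionary lookup replaces the six-level nested if/else, which gdb's command language requires because it has no else-if or table construct.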

It is important to note that this macro is architecture dependent, as shown in lines 7 and 8. In general, macros such as these are highly architecture- and version-dependent. Any time a change in the underlying structure is made, macros such as these must be updated. However, if you spend a lot of time debugging the kernel using gdb, the payback is often worth the effort.

For completeness, we present the find_next_task macro. Its implementation is less than obvious and deserves explanation. (It is assumed that you can easily deduce the task_struct_header that completes the series necessary for the ps macro presented in this section. It is nothing more than a single line arranging the column headers with the correct amount of whitespace.) Listing 14-15 presents the find_next_task macro used in our ps and find_task macros.

Listing 14-15. `gdb find_next_task` Macro

define find_next_task
  # Given a task address, find the next task in the linked list
  set $t = (struct task_struct *)$arg0
  set $offset=( (char *)&$t->tasks - (char *)$t)
  set $t=(struct task_struct *)( (char *)$t->tasks.next- (char *)$offset)
end

The function performed by this macro is simple, but its implementation is less than straightforward. The goal is to return the ->next pointer, which points to the next task_struct on the linked list. However, the task_struct structures are linked by the address of the struct list_head member called tasks, as opposed to the common practice of being linked by the starting address of the task_struct itself. Because the ->next pointer points to the address of the tasks member in the next task_struct on the list, we must subtract to get the address of the top of the task_struct itself. The value we subtract from the ->next pointer is the offset of the tasks member from the top of task_struct. First we calculate the offset, and then we use that offset to adjust the ->next pointer to point to the top of task_struct. Figure 14-5 should make this clear.

Figure 14-5. Task structure list linking
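The same pointer arithmetic can be tried on a host machine with Python's ctypes module, which lays out structures the way a C compiler would. In the sketch below, Task is a hypothetical, heavily reduced stand-in for struct task_struct (the real structure has many more members), and the helper names are invented for this example. Only the offset calculation mirrors the macro in Listing 14-15.

```python
import ctypes


class ListHead(ctypes.Structure):
    """Mirror of struct list_head: a next and a prev pointer."""


ListHead._fields_ = [
    ("next", ctypes.POINTER(ListHead)),
    ("prev", ctypes.POINTER(ListHead)),
]


class Task(ctypes.Structure):
    """Hypothetical reduced task_struct; the list node is not at offset 0."""

    _fields_ = [
        ("pid", ctypes.c_int),
        ("tasks", ListHead),
        ("comm", ctypes.c_char * 16),
    ]


def link_circular(tasks):
    """Point each embedded node at the node inside the neighboring tasks."""
    count = len(tasks)
    for i, task in enumerate(tasks):
        task.tasks.next = ctypes.pointer(tasks[(i + 1) % count].tasks)
        task.tasks.prev = ctypes.pointer(tasks[(i - 1) % count].tasks)


def find_next_task(task):
    """Follow tasks.next, then back up to the top of the enclosing Task."""
    offset = Task.tasks.offset  # same as (char *)&t->tasks - (char *)t
    node_addr = ctypes.addressof(task.tasks.next.contents)
    return Task.from_address(node_addr - offset)


def walk(first):
    """Return (pid, comm) for each task until the list wraps around."""
    seen = []
    task = first
    while True:
        seen.append((task.pid, task.comm.decode()))
        task = find_next_task(task)
        if ctypes.addressof(task) == ctypes.addressof(first):
            return seen


TASKS = [
    Task(pid=0, comm=b"swapper"),
    Task(pid=1, comm=b"init"),
    Task(pid=2, comm=b"ksoftirqd/0"),
]
link_circular(TASKS)
visited = walk(TASKS[0])
```

Because the offset is computed from the structure definition instead of being hard-coded, the traversal keeps working if members are added ahead of tasks. The macro in Listing 14-15 gets the same benefit by computing $offset from the debug information each time it runs.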

Now we present one final macro that will be useful in the next section when we discuss debugging loadable modules. Listing 14-16 is a simple macro that displays the kernel's list of currently installed loadable modules.

Listing 14-16. `gdb` List Modules Macro

 1 define lsmod
 2   printf "Address\t\tModule\n"
 3   set $m=(struct list_head *)&modules
 4   set $done=0
 5   while ( !$done )
 6     # list_head is 4-bytes into struct module
 7     set $mp=(struct module *)((char *)$m->next - (char *)4)
 8     printf "0x%08X\t%s\n", $mp, $mp->name
 9     if ( $mp->list->next == &modules)
10       set $done=1
11     end
12     set $m=$m->next
13   end
14 end
15
16 document lsmod
17 List the loaded kernel modules and their start addresses
18 end

This simple loop starts with the kernel's global variable modules. This variable is a struct list_head that marks the start of the linked list of loadable modules. The only complexity is the same as that described for Listing 14-15: we must subtract an offset from the struct list_head pointer to point to the top of the struct module. This is performed in line 7. The macro produces a simple listing of modules containing the address of each struct module and the module's name. Here is an example of its use:

(gdb) lsmod
Address         Module
0xD1012A80      ip_conntrack_tftp
0xD10105A0      ip_conntrack
0xD102F9A0      loop
(gdb) help lsmod
List the loaded kernel modules and their start addresses
(gdb)
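The traversal performed by the lsmod macro can also be sketched in C. This is an illustrative userspace sketch under stated assumptions: struct demo_module is a hypothetical, cut-down record (the real struct module has many more members), and DEMO_MODULE_OF() and demo_lsmod() are names invented here. The sketch uses offsetof in place of the hard-coded offset of 4, so it does not depend on a particular structure layout.

```c
#include <stddef.h>   /* offsetof, size_t */

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical cut-down module record; the real struct module is larger. */
struct demo_module {
    int state;
    struct list_head list;    /* link member embedded in the record */
    char name[32];
};

/* Recover the enclosing demo_module from a pointer to its list member. */
#define DEMO_MODULE_OF(ptr) \
    ((struct demo_module *)((char *)(ptr) - offsetof(struct demo_module, list)))

/*
 * Walk the circular list anchored at head and store up to max module
 * names in names[]. Returns the number of names stored.
 */
size_t demo_lsmod(const struct list_head *head, const char **names, size_t max)
{
    size_t n = 0;
    const struct list_head *p;

    /* Stop when the walk arrives back at the list head. */
    for (p = head->next; p != head && n < max; p = p->next)
        names[n++] = DEMO_MODULE_OF(p)->name;

    return n;
}
```

Unlike the macro, this loop compares against the list head before converting the pointer, so an empty list yields no entries.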

Macros such as the ones presented here are very powerful debugging aids. You can create macros in a similar fashion to display anything in the kernel that lends itself to easy access, especially the major data structures maintained as linked lists. Examples include process memory map information, module information, file system information, timer lists, and so on. The information presented here should get you started.

14.3.5. Debugging Loadable Modules

The most common reason for using KGDB is to debug loadable kernel modules, that is, device drivers. One of the more convenient features of loadable modules is that, under most circumstances, it is not necessary to reboot the kernel for each new debugging session. You can start a debugging session, make some changes, recompile, and reload the module without the hassle and delay of a complete kernel reboot.

The complication associated with debugging loadable modules is in gaining access to the symbolic debug information contained in the module's object file. Because loadable modules are dynamically linked when they are loaded into the kernel, the symbolic information contained in the object file is useless until the symbol table is adjusted.

Recall from our earlier examples how we invoke gdb for a kernel debugging session:

$ ppc_4xx-gdb vmlinux

This launches a gdb debugging session on your host, and reads the symbol information from the Linux kernel ELF file vmlinux. Of course, you will not find symbols for any loadable modules in this file. Loadable modules are separate compilation units and are linked as individual standalone ELF objects. Therefore, if we intend to perform any source-level debugging on a loadable module, we need to load its debug symbols from the ELF file. gdb provides this capability in its add-symbol-file command.

The add-symbol-file command loads symbols from the specified object file, assuming that the module itself has already been loaded. However, we are faced with the chicken-and-egg syndrome. We don't have any symbol information until the loadable module has been loaded into the kernel and the add-symbol-file command is issued to read in the module's symbol information. However, after the module has been loaded, it is too late to set breakpoints and debug the module's *_init and related functions because they have already executed.

The solution to this dilemma is to place a breakpoint in the kernel code that is responsible for loading the module, after it has been linked but before its initialization function has been called. This work is done by .../kernel/module.c. Listing 14-17 reproduces the relevant portions of module.c.

Listing 14-17. `module.c`: Module Initialization

...
1901         down(&notify_mutex);
1902         notifier_call_chain(&module_notify_list, MODULE_STATE_COMING, mod);
1903         up(&notify_mutex);
1904
1905         /* Start the module */
1906         if (mod->init != NULL)
1907                 ret = mod->init();
1908         if (ret < 0) {
1909                 /* Init routine failed: abort.  Try to protect us from
1910                    buggy refcounters. */
1911                 mod->state = MODULE_STATE_GOING;
...

We load the module using the modprobe utility, which was demonstrated in Listing 8-5 in Chapter 8, "Device Driver Basics," and looks like this:

$ modprobe loop

This command issues a special system call that directs the kernel to load the module. The module loading begins at sys_init_module() in module.c. After the module has been loaded into kernel memory and dynamically linked, control is passed to the module's _init function. This is shown in lines 1906 and 1907 of Listing 14-17. We place our breakpoint here. This enables us to add the symbol file to gdb and subsequently set breakpoints in the module. We demonstrate this process using the Linux kernel's loopback driver called loop.ko. This module has no dependencies on other modules and is reasonably easy to demonstrate.

Listing 14-18 shows the gdb commands to initiate this debugging session on loop.ko.

Listing 14-18. Initiate Module Debug Session: `loop.ko`

1 $ ppc-linux-gdb --silent vmlinux
2 (gdb) connect
3 breakinst () at arch/ppc/kernel/ppc-stub.c:825
4 825     }
5 Breakpoint 1 at 0xc0016b18: file kernel/panic.c, line 74.
6 Breakpoint 2 at 0xc005a8c8: file fs/buffer.c, line 296.
7 (gdb) b module.c:1907
8 Breakpoint 3 at 0xc003430c: file kernel/module.c, line 1907.
9 (gdb) c
10 Continuing.
11 >>>> Here we let the kernel finish booting
12      and then load the loop.ko module on the target
13
14 Breakpoint 3, sys_init_module (umod=0x30029000, len=0x2473e,
15     uargs=0x10016338 "") at kernel/module.c:1907
16 1907                    ret = mod->init();
17 (gdb) lsmod
18 Address         Module
19 0xD102F9A0      loop
20 (gdb) set $m=(struct module *)0xD102F9A0
21 (gdb) p $m->module_core
22 $1 = (void *) 0xd102c000
23 (gdb) add-symbol-file ./drivers/block/loop.ko 0xd102c000
24 add symbol table from file "./drivers/block/loop.ko" at
25         .text_addr = 0xd102c000
26 (y or n) y
27 Reading symbols from /home/chris/sandbox/linux-2.6.13-amcc/drivers/block/loop.ko...done.

Starting with line 2, we use the gdb user-defined macro connect created earlier in Listing 14-10 to connect to the target board and set our initial breakpoints. We then add the breakpoint in module.c, as shown in line 7, and we issue the continue command (c). Now the kernel completes the boot process and we establish a telnet session into the target and load the loop.ko module (not shown). When the loopback module is loaded, we immediately hit breakpoint #3. gdb then displays the information shown in lines 14 through 16.

At this point, we need to discover the address where the Linux kernel linked our module's .text section. Linux stores this address in the module information structure struct module in the module_core element. Using the lsmod macro we defined in Listing 14-16, we obtain the address of the struct module associated with our loop.ko module. This is shown in lines 17 through 19. Now we use this structure address to obtain the module's .text address from the module_core structure member. We pass this address to the gdb add-symbol-file command, and gdb uses this address to adjust its internal symbol table to match the actual addresses where the module was linked into the kernel. From there, we can proceed in the usual manner to set breakpoints in the module, step through code, examine data, and so on.

We conclude this section with a demonstration of placing a breakpoint in the loopback module's initialization function so that we can step through the module's initialization code. The complication here is that the kernel loads the module's initialization code into a separately allocated portion of memory so that it can be freed after use. Recall from Chapter 5, "Kernel Initialization," our discussion of the __init macro. This macro expands into a compiler attribute that directs the linker to place the marked portion of code into a specially named ELF section. In essence, any function defined with this attribute is placed in a separate ELF section named .init.text. Its use is similar to the following:

static int __init loop_init(void){...}

This invocation would place the compiled loop_init() function into the .init.text section of the loop.ko object module. When the module is loaded, the kernel allocates a chunk of memory for the main body of the module, which is pointed to by the struct module member named module_core. It then allocates a separate chunk of memory to hold the .init.text section. After the initialization function is called, the kernel frees the memory that contained the initialization function. Because the object module is split like this, we need to inform gdb of this addressing scheme to be able to use symbolic data for debugging the initialization function.^[9] Listing 14-19 demonstrates these steps.

^[9] As of this writing, there is a bug in gdb that prevents this technique from working properly. Hopefully, by the time you read this, it will be fixed.

Listing 14-19. Debugging Module `init` Code

$ ppc_4xx-gdb --silent vmlinux
(gdb) target remote /dev/ttyS0
Remote debugging using /dev/ttyS0
breakinst () at arch/ppc/kernel/ppc-stub.c:825
825     }

<< Place a breakpoint before calling module init >>

(gdb) b module.c:1907
Breakpoint 1 at 0xc0036418: file kernel/module.c, line 1907.
(gdb) c
Continuing.

Breakpoint 1, sys_init_module (umod=0xd102ef40, len=0x23cb3,
    uargs=0x10016338 "") at kernel/module.c:1907
1907                    ret = mod->init();

<< Discover init addressing from struct module >>

(gdb) lsmod
Address         Module
0xD102EF40      loop
(gdb) set $m=(struct module *)0xD102EF40
(gdb) p $m->module_core
$1 = (void *) 0xd102b000
(gdb) p $m->module_init
$2 = (void *) 0xd1031000

<< Now load a symbol file using the core and init addrs >>

(gdb) add-symbol-file ./drivers/block/loop.ko 0xd102b000 -s .init.text 0xd1031000
add symbol table from file "./drivers/block/loop.ko" at
        .text_addr = 0xd102b000
        .init.text_addr = 0xd1031000
(y or n) y
Reading symbols from /home/chris/sandbox/linux-2.6.13-amcc/drivers/block/loop.ko...done.
(gdb) b loop_init
Breakpoint 3 at 0xd1031000: file drivers/block/loop.c, line 1244.
(gdb) c
Continuing.

<< Breakpoint hit, proceed to debug module init function >>

Breakpoint 3, 0xd1031000 in loop_init () file drivers/block/loop.c, line 1244
1244        if (max_loop < 1 || max_loop > 256) {
(gdb)
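The effect of a section attribute such as the one behind __init can be approximated in an ordinary userspace program. The following is a minimal sketch, assuming GCC or Clang with a GNU-style linker on an ELF target. DEMO_INIT, demo_loop_init(), demo_in_init_section(), and the section name demo_init_text are all hypothetical names chosen for illustration; the kernel's own macro and section name differ. The sketch relies on the linker convention of defining __start_ and __stop_ symbols for sections whose names are valid C identifiers.

```c
#include <stdint.h>   /* uintptr_t */

/* Hypothetical userspace stand-in for the kernel's __init marker. */
#define DEMO_INIT __attribute__((section("demo_init_text"), used, noinline))

/* Section bounds supplied by the linker (GNU ld convention, assumed here). */
extern char __start_demo_init_text[];
extern char __stop_demo_init_text[];

/* This function is emitted into demo_init_text rather than .text. */
DEMO_INIT int demo_loop_init(void)
{
    return 0;
}

/* Report whether addr lies inside the demo_init_text section. */
int demo_in_init_section(uintptr_t addr)
{
    return addr >= (uintptr_t)__start_demo_init_text &&
           addr <  (uintptr_t)__stop_demo_init_text;
}
```

Because the marked function lives in its own section, its load address is independent of the address of .text. This is why the add-symbol-file command needs the separate -s .init.text argument before it can resolve symbols in the module's initialization code.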

14.3.6. printk Debugging

Debugging kernel and device driver code using printk is a popular technique, mostly because printk has evolved into a very robust method. You can call printk from almost any context, including from interrupt handlers. printk is the kernel's version of the familiar printf() C library function. printk is defined in .../kernel/printk.c.

It is important to understand the limitations of using printk for debugging. First, printk requires a console device. Moreover, although the console device is configured as early as possible during kernel initialization, there are many calls to printk before the console device has been initialized. We present a method to cope with this limitation later, in Section 14.5, "When It Doesn't Boot."

The printk function allows the addition of a string marker that identifies the level of severity of a given message. The header file .../include/linux/kernel.h defines eight levels:

#define     KERN_EMERG    "<0>" /* system is unusable */
#define     KERN_ALERT    "<1>" /* action must be taken immediately */
#define     KERN_CRIT     "<2>" /* critical conditions */
#define     KERN_ERR      "<3>" /* error conditions */
#define     KERN_WARNING  "<4>" /* warning conditions */
#define     KERN_NOTICE   "<5>" /* normal but significant condition */
#define     KERN_INFO     "<6>" /* informational */
#define     KERN_DEBUG    "<7>" /* debug-level messages */

A simple printk message might look like this:

printk("foo() entered w/ %s\n", arg);

If the severity string is omitted, the kernel assigns a default severity level, which is defined in printk.c. In recent kernels, this is set at severity level 4, KERN_WARNING. Specifying printk with a severity level might look something like this:

printk(KERN_CRIT "vmalloc failed in foo()\n");

By default, all printk messages below a predefined loglevel are displayed on the system console device. The default loglevel is defined in printk.c. In recent Linux kernels, it has the value 7. This means that any printk message that is greater in importance than KERN_DEBUG will be displayed on the console.

You can set the default kernel loglevel in a variety of ways. At boot time, you can specify the default loglevel on your target board by passing the appropriate parameters on the kernel command line. Three kernel command line options defined in main.c affect the default loglevel:

debug Sets the console loglevel to 10
quiet Sets the console loglevel to 4
loglevel= Sets the console loglevel to your choice of value

Using debug effectively displays every printk message. Using quiet displays all printk messages of severity KERN_ERR or higher.
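The filtering rule described above can be modeled in a few lines of C. This is a simplified userspace sketch, not the kernel's implementation: demo_msg_level() and demo_shown_on_console() are hypothetical names, and the default severity of 4 is taken from the text rather than from any particular kernel source. It shows how the "<n>" prefix determines a message's severity and how that severity is compared against the console loglevel.

```c
/* Default severity for messages without a "<n>" prefix (KERN_WARNING, per the text). */
#define DEMO_DEFAULT_MESSAGE_LEVEL 4

/* Extract the severity from a leading "<n>" marker, or fall back to the default. */
int demo_msg_level(const char *msg)
{
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
        return msg[1] - '0';
    return DEMO_DEFAULT_MESSAGE_LEVEL;
}

/* A message reaches the console only if its level is numerically below the loglevel. */
int demo_shown_on_console(const char *msg, int console_loglevel)
{
    return demo_msg_level(msg) < console_loglevel;
}
```

Under this model, a KERN_DEBUG message is suppressed at the default loglevel of 7 but appears when debug raises the loglevel to 10, and a message with no severity marker is suppressed when quiet lowers the loglevel to 4.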

printk messages can be logged to files on your target or via the network. Use klogd (kernel log daemon) and syslogd (system log daemon) to control the logging behavior of printk messages. These popular utilities are described in man pages and many Linux references, and are not described here.

14.3.7. Magic SysReq Key

This useful debugging aid is invoked through a series of special predefined key sequences that send messages directly to the kernel. For many target architectures and boards, you use a simple terminal emulator on a serial port as a system console. For these architectures, the Magic SysReq key is defined as a break character followed by a command character. Consult the documentation on the terminal emulator you use for how to send a break character. Many Linux developers use the minicom terminal emulator. For minicom, the break character is sent by typing Ctl-A F. After sending the break in this manner, you have 5 seconds to enter the command character before the command times out.

This useful kernel tool can be very helpful for development and debugging, but it can also cause data loss and system corruption. Indeed, the b command immediately reboots your system without any notification or preparation. Open files are not closed, disks are not synced, and file systems are not unmounted. When the reboot (b) command is issued, control is immediately passed to the reset vector of your architecture in a most abrupt and stunning manner. Use this powerful tool at your own peril!

This feature is well documented in the Linux kernel documentation subdirectory in a file called sysrq.txt. There you find the details for many architectures and the description of available commands.

For example, another way to set the kernel loglevel just discussed is to use the Magic SysReq key. The command is a number from 0 through 9, which results in the default loglevel being set to the number of the command. From minicom, press Ctl-A F followed by a number, such as 9. Here is how it looks on the terminal:

$ SysRq : Changing Loglevel
Loglevel set to 9

Commands can be used to dump registers, shut down your system, reboot your system, dump a list of processes, dump current memory information to your console, and more. See the documentation file in any recent Linux kernel for the details.

This feature is most commonly used when something causes your system to lock up. Often the Magic SysReq key provides a way to learn something from an otherwise dead system.
