14.5. When It Doesn't Boot

One of the most frequently asked questions on the various mailing lists that serve embedded Linux goes something like this:

I am trying to boot Linux on my board, and I get stuck after this message prints to my console: "Uncompressing Kernel Image . . . OK."

Thus starts the long and sometimes frustrating learning curve of embedded Linux! Many things that can go wrong could lead to this common failure. With some knowledge and a JTAG debugger, there are ways to determine what went awry.

14.5.1. Early Serial Debug Output

The first tool you might have available is CONFIG_SERIAL_TEXT_DEBUG. This Linux kernel-configuration option adds support for debug messages very early in the boot process. At the present time, this feature is limited to the PowerPC architecture, but nothing prevents you from duplicating the functionality in other architectures. Listing 14-22 provides an example of this feature in use on a PowerPC target using the U-Boot bootloader.

Listing 14-22. Early Serial Text Debug
Using this feature, you can often tell where your board is getting stuck during the boot process. Of course, you can add your own early debug messages in other places in the kernel. Here is an example of its usage found in .../arch/ppc/mm/init.c:

    /* Map in all of RAM starting at KERNELBASE */
    if (ppc_md.progress)
        ppc_md.progress("MMU:mapin", 0x301);
    mapin_ram();

The AMCC Yosemite platform is an excellent example of this infrastructure. Consult the following files in the Linux source tree[11] for details of how this debugging system is implemented:
14.5.2. Dumping the printk Log Buffer

When we discussed printk debugging in Section 14.3.6, we pointed out some of the limitations of this method. printk itself is a very robust implementation. One of its shortcomings is that you can't see any printk messages until later in the boot sequence, when the console device has been initialized. Very often, when your board hangs on boot, quite a few messages are stuck in the printk buffer. If you know where to find them, you can often pinpoint the exact problem that is causing the boot to hang. Indeed, many times you will discover that the kernel has encountered an error that led to a call to panic(). The output from panic() has likely been dumped into the printk buffer, and you can often pinpoint the exact line of offending code.

This is best accomplished with a JTAG debugger, but it is still possible to use a bootloader and its memory dump capability to display the contents of the printk buffer after a reset. Some corruption of memory contents might occur as a result of the reset, but log buffer text is usually very readable.

The actual buffer where printk stores its message text is declared in the printk source file .../kernel/printk.c.

    static char __log_buf[__LOG_BUF_LEN];

We can easily determine the linked location of this buffer from the Linux kernel map file System.map.

    $ grep __log_buf System.map
    c022e5a4 b __log_buf

Now if the system happens to hang upon booting, right after displaying the "Uncompressing Kernel Image . . . OK" message, reboot and use the bootloader to examine the buffer. Because the relationship between kernel virtual memory and physical memory is fixed and constant on a given architecture, we can do a simple conversion. The address of __log_buf shown earlier is a kernel virtual address; we must convert it to a physical address. On this particular PowerPC architecture, that conversion is a simple subtraction of the constant KERNELBASE address, 0xc0000000:

    0xc022e5a4 - 0xc0000000 = 0x0022e5a4
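The lookup and subtraction described above can also be scripted on the development host. The following is a minimal sketch, not part of the kernel or of U-Boot: the helper names find_symbol and virt_to_phys are hypothetical, and the code assumes the fixed linear mapping at KERNELBASE = 0xc0000000 described for this PowerPC target. Other architectures or kernel configurations may need a different offset.

```python
# Host-side sketch: find a symbol in System.map and convert its kernel
# virtual address to the physical address to probe from the bootloader.
# Assumption: a fixed linear mapping offset by KERNELBASE, as on the
# PowerPC target discussed in the text. Helper names are hypothetical.

KERNELBASE = 0xC0000000  # value given in the text for this platform


def find_symbol(system_map_text, symbol):
    """Return the address of `symbol` from System.map-style text, or None."""
    for line in system_map_text.splitlines():
        fields = line.split()
        # Each System.map line has the form: <hex address> <type> <name>
        if len(fields) == 3 and fields[2] == symbol:
            return int(fields[0], 16)
    return None


def virt_to_phys(vaddr, kernelbase=KERNELBASE):
    """Subtract the kernel base; valid only for the linear kernel mapping."""
    if vaddr < kernelbase:
        raise ValueError("address is below KERNELBASE")
    return vaddr - kernelbase


if __name__ == "__main__":
    sample = "c022e5a4 b __log_buf\n"  # the grep output shown in the text
    vaddr = find_symbol(sample, "__log_buf")
    paddr = virt_to_phys(vaddr)
    # Prints: __log_buf: virtual 0xc022e5a4 -> physical 0x0022e5a4
    print(f"__log_buf: virtual {vaddr:#010x} -> physical {paddr:#010x}")
```

In practice you would read the real System.map that matches the kernel image on the board rather than the one-line sample, because the address of __log_buf changes from build to build.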
The resulting physical address is where we probe memory to read the contents, if any, of the printk log buffer. Listing 14-23 shows an example, as displayed by the U-Boot memory dump command.

Listing 14-23. Dump of Raw printk Log Buffer
It's not very pretty to read, but the data is there. We can see in this particular example that the kernel crashed someplace after initializing the PID hash table entries. With some additional use of printk messages, we can begin to close in on the actual source of the crash.

As shown in this example, this is a technique that can be used with no additional tools. You can see the importance of some kind of early serial port output during boot if you are working on a new board port.

14.5.3. KGDB on Panic

If KGDB is enabled, the kernel attempts to pass control back to KGDB upon error exceptions. In some cases, the error itself will be readily apparent. To use this feature, a connection must already be established between KGDB and gdb. When the exception condition occurs, KGDB emits a Stop Reply packet to gdb, indicating the reason for the trap into the debug handler, as well as the address where the trap condition occurred. Listing 14-24 illustrates the sequence.

Listing 14-24. Trapping Crash on Panic Using KGDB
The crash in this example was contrived by a simple write to an invalid memory location (all ones). We first establish a connection from gdb to KGDB and allow the kernel to continue to boot. Notice that we didn't even bother to set breakpoints. When the crash occurs, we see the line of offending code and get a nice backtrace to help us determine its cause.