Summary


When you encounter a kernel hang, panic, oops, or MCA (IA-64 only), remember that troubleshooting each condition involves a few key steps. For an OS hang, first determine whether the hang is interruptible; the answer decides how you proceed. The goal with any hang, panic, oops, or MCA is to obtain a stack trace early in the process, because it points you toward the source of the problem. The following list recaps the key points of each scenario and the steps for troubleshooting it.

  • Interruptible hangs

     

    1. Use the Magic SysRq keys to gather stack traces of the processors and the offending processes.
    2. Check the registers: Alt+SysRq+p.
    3. Gather process stacks: Alt+SysRq+t.
    4. On an SMP kernel, gather the stacks of all processors: Alt+SysRq+w.
    5. Synchronize filesystems: Alt+SysRq+s.
    6. Reboot (soft reset) the system to clear the hang: Alt+SysRq+b.
    7. After the system has booted, review all the logs.
    8. Note that a serial console may be required to capture the output of the SysRq keys.

  • Non-interruptible hangs

     

    1. Set up a dump utility in case a panic is taking place. Recommended dump utilities include diskdump and lkcd. On an IPF system, you can obtain a dump by issuing a TOC, which forces a hardware INIT. The System Abstraction Layer then sees that the OS has an INIT handler. If the functionality is in place, the kernel handles the INIT and calls panic(), which uses the aforementioned dump utilities to create a dump. (SUSE Linux Enterprise Server 9 (ia64), kernel 2.6.5-7.97, uses lkcd and has this feature enabled by default.)
    2. On IA-32 (x86) systems, the nmi_watchdog timer can be helpful in troubleshooting a hang. See linux/Documentation/nmi_watchdog.txt.
    3. As with interruptible hangs, review the system logs.

  • Panics

     

    1. Collect the panic string.
    2. Review the hardware and software logs.
    3. If the problem cannot be identified from the console output, the dump utilities must be enabled.

  • Oops

     

    1. Review the stack trace of the oops with ksymoops (no longer needed with the latest klogd and kernel releases).
    2. Locate the line that states "Unable to handle kernel NULL pointer dereference".
    3. Locate the line showing the instruction pointer (IP).
    4. Use gdb to look at the surrounding code.

  • MCA

    At the EFI shell (IA-64 only), collect the CPU registers by performing the following steps:

    1. shell> errdump mca > mca.out
    2. shell> errdump init > init.out
    3. Send the output files to the hardware vendor for review.
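The interruptible-hang sequence above can also be rehearsed from a shell that still responds (for example, over a remote login) by writing the same keys to /proc/sysrq-trigger. The following is a minimal sketch, not a definitive tool: the key letters and their meanings are copied from the list above and can differ between kernel versions, so check the SysRq documentation for your kernel first. The names SYSRQ_SEQUENCE, plan, and send are illustrative, and the sketch assumes root privileges and that Magic SysRq is enabled. It defaults to a dry run and leaves out the reboot key unless asked.

```python
"""Sketch: replay the interruptible-hang SysRq sequence via /proc/sysrq-trigger."""

# Keys and descriptions follow the recap list, in the same order.
# Assumption: these key meanings match the running kernel.
SYSRQ_SEQUENCE = [
    ("p", "show registers"),
    ("t", "show process stacks"),
    ("w", "show all processor stacks (SMP)"),
    ("s", "synchronize filesystems"),
    ("b", "reboot (soft reset)"),
]

# Writing here requires root and assumes Magic SysRq is enabled.
TRIGGER_PATH = "/proc/sysrq-trigger"


def plan(include_reboot=False):
    """Return the keys to send, in order; the reboot key is opt-in."""
    return [key for key, _ in SYSRQ_SEQUENCE if include_reboot or key != "b"]


def send(keys, trigger_path=TRIGGER_PATH, dry_run=True):
    """Write each key to the trigger file, or only print what would be sent."""
    for key in keys:
        if dry_run:
            print(f"would write {key!r} to {trigger_path}")
        else:
            with open(trigger_path, "w") as trigger:
                trigger.write(key)


if __name__ == "__main__":
    send(plan())  # dry run by default; nothing is written
```

On a system that is truly hung at the console, the keyboard combinations remain the practical route; a script such as this is mainly useful for confirming ahead of time that the SysRq output reaches the logs or the serial console.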

Although these are the key conditions and steps to remember, every troubleshooting process is unique and should be evaluated individually to determine the ideal troubleshooting path. Of course, before consuming vast resources troubleshooting a problem, confirm that you are running on the latest supported kernel and that all software/hardware combinations are in their vendor-supported configurations.
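As one illustration of the oops steps recapped above, the fault line and the instruction-pointer line can be pulled out of a saved log with a short script. This is a sketch under stated assumptions: the label on the instruction-pointer line varies by architecture, so the pattern below accepts a few common spellings, and the sample text is a hand-written, hypothetical excerpt in which my_driver_read and my_driver are invented names, not real kernel symbols.

```python
import re

# Fault line named in the oops steps; the kernel prints more text after it.
FAULT_RE = re.compile(r"Unable to handle kernel NULL pointer.*")
# Instruction-pointer line; the label (EIP, RIP, ip) depends on the architecture.
IP_RE = re.compile(r"\b(?:EIP|RIP|IP) is at (\S+)", re.IGNORECASE)


def summarize_oops(log_text):
    """Return (fault_line, ip_symbol); either value is None if not found."""
    fault = FAULT_RE.search(log_text)
    ip = IP_RE.search(log_text)
    return (
        fault.group(0).strip() if fault else None,
        ip.group(1) if ip else None,
    )


# Hypothetical, hand-written excerpt; my_driver_read is an invented symbol.
SAMPLE = """\
Unable to handle kernel NULL pointer dereference at virtual address 00000000
EIP is at my_driver_read+0x1c/0x80 [my_driver]
"""

if __name__ == "__main__":
    fault_line, ip_symbol = summarize_oops(SAMPLE)
    print(fault_line)
    print(ip_symbol)
```

The extracted symbol and offset are what you would then hand to gdb, for example with a command of the form list *(my_driver_read+0x1c), assuming a kernel image or module object built with debugging information is available.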



Linux Troubleshooting for System Administrators and Power Users