The second step in remedying a device failure is to locate the error. Many tools can help with this, including dmesg, lspci, lsmod, syslog/messages, and the /proc filesystem, among others. This section covers dmesg and syslog/messages in detail. The remaining tools are discussed in more detail later in this chapter.
dmesg is often helpful, so it is a good place to begin. The dmesg command reads the kernel ring buffer, which holds the most recent kernel messages, including the errors the kernel has detected with respect to the hardware or applications. It provides a fast and easy way to capture the latest errors from the kernel. The man page for dmesg is reproduced here for quick reference:
DMESG(8)                                                      DMESG(8)

NAME
       dmesg - print or control the kernel ring buffer

SYNOPSIS
       dmesg [ -c ] [ -n level ] [ -s bufsize ]

DESCRIPTION
       dmesg is used to examine or control the kernel ring buffer.

       The program helps users to print out their bootup messages.
       Instead of copying the messages by hand, the user need only:
              dmesg > boot.messages
       and mail the boot.messages file to whoever can debug their
       problem.
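The redirect shown in the man page can also be scripted. The following is a minimal sketch, assuming Python 3.7 or later and a dmesg binary on the PATH. The function names and the keyword list are illustrative choices, not part of dmesg itself.

```python
import subprocess

# Illustrative keywords; adjust them to the failure being investigated.
SUSPECT_KEYWORDS = ("error", "fail", "disabled")


def capture_dmesg(path="boot.messages"):
    """Save the kernel ring buffer to a file, like `dmesg > boot.messages`."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    with open(path, "w") as handle:
        handle.write(result.stdout)
    return result.stdout


def find_suspect_lines(text, keywords=SUSPECT_KEYWORDS):
    """Return the lines that contain any keyword, ignoring case."""
    return [
        line for line in text.splitlines()
        if any(word in line.lower() for word in keywords)
    ]


# Three lines taken from the sample dmesg output in this section.
sample = """ACPI: PM-Timer IO Port: 0x1008
ACPI: local apic disabled
Built 1 zonelists"""

for line in find_suspect_lines(sample):
    print(line)
```

On a live system, find_suspect_lines(capture_dmesg()) would apply the same filter to the current ring buffer. Reading the ring buffer may require root privileges on some systems, and a keyword match is only a hint about where to look, not proof of a fault.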
Although the dmesg command is simple to use, its extensive reports are critical to finding errors promptly. To assist in your understanding of dmesg, we now walk you through an example of standard dmesg output from a booted Linux machine.
greg@nc6000:/tmp> dmesg
Linux version 2.6.8-24.11-default (geeko@buildhost) (gcc version 3.3.4 (pre 3.3.5 20040809)) #1 Fri Jan 14 13:01:26 UTC 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001ffd0000 (usable)
 BIOS-e820: 000000001ffd0000 - 000000001fff0c00 (reserved)
 BIOS-e820: 000000001fff0c00 - 000000001fffc000 (ACPI NVS)
 BIOS-e820: 000000001fffc000 - 0000000020000000 (reserved)
As shown in this example, the first line of the dmesg output provides information about the running kernel version, including who built the kernel, what compiler was used, and when the kernel was compiled. Therefore, if you are compiling source and have a GCC failure, this is a place to start looking. Next, we see the following:
502MB vmalloc/ioremap area available.
vmalloc is defined in arch/i386/kernel/setup.c for IA32 machines. Similarly, vmalloc for IA64 machines is defined in arch/ia64/kernel/perfmon.c. Complete details on memory structure are outside the scope of this chapter, but knowing where to look for details and documentation is a critical starting point. Also note that ioremap.c defines the space for kernel access. In the following output, we see the basic high and low memory limits detected at boot through the communication between the BIOS and the kernel.
0MB HIGHMEM available.
511MB LOWMEM available.
On node 0 totalpages: 131024
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 126928 pages, LIFO batch:16
  HighMem zone: 0 pages, LIFO batch:1
DMI 2.3 present.
ACPI: RSDP (v000 COMPAQ ) @ 0x000f6f80
ACPI: RSDT (v001 HP HP0890 0x23070420 CPQ 0x00000001) @ 0x1fff0c84
ACPI: FADT (v002 HP HP0890 0x00000002 CPQ 0x00000001) @ 0x1fff0c00
ACPI: DSDT (v001 HP nc6000 0x00010000 MSFT 0x0100000e) @ 0x00000000
ACPI: PM-Timer IO Port: 0x1008
ACPI: local apic disabled
Built 1 zonelists
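The zone figures in this output can be cross-checked with simple arithmetic. The sketch below assumes 4 KB pages, the usual size on IA32 but not something the output itself states, and the helper name parse_zones is hypothetical. It confirms that the three zones add up to totalpages and that the DMA and Normal zones together account for the 511MB of low memory reported.

```python
import re

PAGE_SIZE_KB = 4  # assumption: 4 KB pages, the usual size on IA32

# Zone report lines copied from the dmesg output above.
boot_lines = """On node 0 totalpages: 131024
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 126928 pages, LIFO batch:16
  HighMem zone: 0 pages, LIFO batch:1"""


def parse_zones(text):
    """Return (totalpages, {zone name: pages}) from the zone report lines."""
    total = int(re.search(r"totalpages:\s*(\d+)", text).group(1))
    zones = {
        name: int(pages)
        for name, pages in re.findall(r"(\w+) zone:\s*(\d+) pages", text)
    }
    return total, zones


total, zones = parse_zones(boot_lines)
zone_sum = sum(zones.values())
total_kb = total * PAGE_SIZE_KB
lowmem_mb = (zones["DMA"] + zones["Normal"]) * PAGE_SIZE_KB // 1024

print("zones add up to totalpages:", zone_sum == total)
print("total memory in KB:", total_kb)
print("low memory in MB:", lowmem_mb)
```

Under the 4 KB assumption, 131024 pages correspond to 524096 KB, which rounds down to the 511MB LOWMEM figure.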
The following is the command line that the bootloader passed to the kernel. Whether you use GRUB or LILO, the bootloader hands the kernel its boot options in a form similar to the line shown next.
Kernel command line: root=/dev/hda1 vga=0x317 selinux=0 resume=/dev/hda5 desktop elevator=as splash=silent PROFILE=Home
bootsplash: silent mode.
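To make the recorded boot options easier to inspect, the command-line entry can be split into key=value options and bare flags. This is a minimal sketch with a hypothetical helper, parse_cmdline. It splits on whitespace only, so a quoted value containing spaces would not be handled correctly.

```python
def parse_cmdline(line):
    """Split a 'Kernel command line:' entry into key=value options and bare flags."""
    prefix = "Kernel command line:"
    if line.startswith(prefix):
        line = line[len(prefix):]
    options, flags = {}, []
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            options[key] = value   # for example root=/dev/hda1
        else:
            flags.append(token)    # for example desktop
    return options, flags


# The entry from the dmesg output above.
entry = ("Kernel command line: root=/dev/hda1 vga=0x317 selinux=0 "
         "resume=/dev/hda5 desktop elevator=as splash=silent PROFILE=Home")
options, flags = parse_cmdline(entry)
print(options["root"], options["elevator"], flags)
```

The same helper should work on the contents of /proc/cmdline, which holds the command line of the running kernel without the "Kernel command line:" prefix.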
If a user forgets the options supplied at boot time, dmesg or syslog will have the values recorded. Note that Chapter 6, "Disk Partitions and Filesystems," discusses the Master Boot Record (MBR) in great detail, offering information about how the BIOS uses a bootloader such as GRUB or LILO. Continuing with the dmesg output, we see in the following that the processor and video console are detected:
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 32768 bytes)
Detected 1694.763 MHz processor.
Using pmtmr for high-res timesource
Console: colour dummy device 80x25
The next entries report directory entry and inode cache allocation.
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
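The numbers in these two entries are consistent with each other, as the following sketch shows. It rests on three assumptions that the output does not state: pages are 4 KB, "order" is the power-of-two page count allocated for the table, and each hash bucket holds one 4-byte pointer on this 32-bit kernel. The helper names are hypothetical.

```python
PAGE_SIZE = 4096   # assumption: 4 KB pages on IA32
BUCKET_SIZE = 4    # assumption: one 4-byte pointer per bucket on a 32-bit kernel


def bytes_from_order(order):
    """Size of an allocation of 2**order pages."""
    return (1 << order) * PAGE_SIZE


def bytes_from_entries(entries):
    """Size of a table holding the given number of buckets."""
    return entries * BUCKET_SIZE


# (name, entries, order, reported bytes) taken from the two entries above.
tables = [
    ("Dentry cache", 131072, 7, 524288),
    ("Inode-cache", 65536, 6, 262144),
]

for name, entries, order, reported in tables:
    consistent = bytes_from_order(order) == bytes_from_entries(entries) == reported
    print(name, "consistent:", consistent)
```

The PID hash table entry earlier in the output does not follow the same pattern, so this check should not be applied to every line that reports an order.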
The kernel source document /usr/src/linux/Documentation/filesystems/vfs.txt describes the dentry and inode caches in great detail. Following the dentry and inode cache entries, the amount of system memory is detected and displayed:
Memory: 513740k/524096k available (2076k kernel code, 9744k reserved, 780k data, 212k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Now, let us wrap up this short demonstration of dmesg with some information about BogoMIPS. In linux/arch/i386/kdb/kdba_io.c, the "Kernel Debugger Architecture Dependent Console I/O handler" defines the BogoMIPS. In simplest terms, BogoMIPS is a rough figure that is useful at most for comparing similar CPUs. It is not a meaningful benchmark, which is the reason for its name ("bogus"). The kernel uses the value internally to calibrate its delay loops, but the end user will find no large benefit in it. More details on BogoMIPS can be found at http://www.tldp.org/HOWTO/BogoMips/.
Calibrating delay loop... 3358.72 BogoMIPS (lpj=1679360)
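The BogoMIPS figure follows directly from the lpj (loops per jiffy) value printed beside it. The sketch below reproduces the arithmetic under the assumption that this kernel was built with a timer frequency (HZ) of 1000, which the output does not state. The function name is hypothetical.

```python
import re

HZ = 1000  # assumption: timer frequency of this kernel build


def bogomips_from_lpj(lpj, hz=HZ):
    """Format loops-per-jiffy as a BogoMIPS string with two decimal places."""
    whole = lpj // (500000 // hz)
    hundredths = (lpj // (5000 // hz)) % 100
    return f"{whole}.{hundredths:02d}"


# The calibration entry from the dmesg output above.
entry = "Calibrating delay loop... 3358.72 BogoMIPS (lpj=1679360)"
lpj = int(re.search(r"lpj=(\d+)", entry).group(1))
print(bogomips_from_lpj(lpj))  # prints 3358.72 when HZ is 1000
```

With HZ set to 1000, the computed value matches the 3358.72 reported by the kernel, which suggests the assumption holds for this machine.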
In addition to dmesg, there are other places to look for hardware errors. The kernel messages that dmesg displays are also passed to the syslog daemon, which by default records them in the log file /var/log/messages. Although other tools exist for reporting hardware errors, dmesg and syslog are the most prominent. Other tools such as lspci are used in conjunction with dmesg/syslog later in this chapter to locate a failed hardware component.