q4 | HP-UX 11i Internals

q4 is a tool that was designed for analyzing kernel core dumps. However, like adb, it can also be used on a running system. Be aware, though, that most kernel data structures change constantly. If you're looking at a live kernel rather than a dump, the data you see is only a snapshot and may already be out of date.

Like adb, q4 takes a kernel file and a memory image or core dump directory as parameters. If the parameters are omitted, it looks for vmunix and a core dump in the current directory. So, if you're looking at a core dump, change to the directory that has the dump in it and just type q4. If you want to run q4 on the running kernel, type q4 /stand/vmunix /dev/mem.

A very useful option to q4 is -p, which tells q4 to start Perl and set up an interface with the Perl process. This allows you, from within q4, to run Perl scripts that interact with q4. Many Perl scripts are packaged with q4 to perform common tasks. You may need to tell q4 where Perl is by setting the environment variable Q4_PERL_PATH. q4 also needs a Perl startup script, which you can point to with the environment variable Q4_STARTUP_SCRIPT. If your files are in the usual places for release 11i, the following starts q4 in Perl mode on the running kernel:

 export Q4_PERL_PATH=/usr/contrib/bin/perl
 export Q4_STARTUP_SCRIPT=/usr/contrib/lib/Q4/sample.q4rc.pl
 /usr/contrib/bin/q4 -p /stand/vmunix /dev/mem

Examining Data and Code

q4 has an examine command that is similar to the data display commands in adb. The format is

 examine <address> [for <count>] using <format>

For example, to display the value of nproc, type

 examine &nproc using D

Note the ampersand before the symbol nproc. q4 assumes that the symbol you are giving it is a pointer. This is true for most kernel values, since most kernel space is allocated dynamically. But for something like nproc, which is a static global value, you need to use the address-of operator, &.

If you want to look at assembly code for a function, you use the code command. This is simply code <function>. For example, to get the assembly code for realmain, you would type

 code realmain

q4 doesn't do the disassembly itself; it relies on adb to do that. But it does figure out the length of the function for you, and the syntax is easier to remember than the adb command.

Getting Data Type Information

q4's strength lies in its awareness of data structures. Unlike adb, which requires working with raw data in memory, q4 can display complex kernel data structures in a format that is easy to read.

We can look at what is in a kernel data structure with the fields command. This command takes a struct, union, or type as an argument and displays the members of the structure. For example, to see what the fields are in a spinlock structure, we would type

 q4> fields struct lock
 OFFSET        SIZE       FLAVOR  NAME
 bytes +bits bytes +bits
     0     0     8     0 u_long  sl_lock
     8     0     8     0 u_long  sl_owner
    16     0     8     0 *       sl_name_ptr
    24     0     2     0 u_short sl_flag
    26     0     2     0 u_short sl_next_cpu
    28     0     4     0 char[4] sl_pad

q4 tells us the offset of each member in the structure, the member's size, its type, and its name. Notice that q4 knows about pointers such as sl_name_ptr, but it doesn't know what type they point to.
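To make the offset and size columns concrete, the following Python sketch rebuilds a structure with the same member layout using ctypes and reports the offset, size, and name of each member. The class name LockLayout and the fixed-width types are assumptions made for illustration; the real definition lives in the kernel headers, and the pointer member is modeled here as a plain 64-bit integer.

```python
import ctypes

# Hypothetical stand-in for the layout shown by "fields struct lock".
# Fixed-width types are assumed so the layout does not depend on the host.
class LockLayout(ctypes.Structure):
    _fields_ = [
        ("sl_lock", ctypes.c_uint64),
        ("sl_owner", ctypes.c_uint64),
        ("sl_name_ptr", ctypes.c_uint64),  # pointer, modeled as a 64-bit integer
        ("sl_flag", ctypes.c_uint16),
        ("sl_next_cpu", ctypes.c_uint16),
        ("sl_pad", ctypes.c_char * 4),
    ]

def describe(struct_type):
    """Return (offset, size, name) for each member, in declaration order."""
    rows = []
    for name, _ctype in struct_type._fields_:
        field = getattr(struct_type, name)  # ctypes field descriptor
        rows.append((field.offset, field.size, name))
    return rows

for offset, size, name in describe(LockLayout):
    print(f"{offset:6d} {size:6d}  {name}")
```

The offsets and sizes this reports line up with the bytes columns in the fields output above. It does not reproduce the FLAVOR column.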

When you specify a type in q4, you must include the struct or union keywords if the type is a structure or a union. In the above example we used struct lock to indicate that it was a structure. If the type has a typedef, you can use that instead, so we could also have said fields lock_t.

Loading a Structure into a Pile

The basic unit that q4 works with is called a pile. A pile is simply an array of structures. You create a pile with q4's load command. The load command needs to know what kind of structure you want to examine, how many of them to load, and how to find them. The simplest form of the load command is load <type> from <address>. The type can be any publicly visible kernel structure or typedef. If you're using a structure, you must include the struct keyword. The address can either be a numeric value or a symbol. For example, to load the first mpinfo struct from the mpproc_info array, we would type

 q4> load struct mpinfo from mpproc_info

Because struct mpinfo is also typedefed to mpinfo_t, we could also say

 q4> load mpinfo_t from mpproc_info

Displaying the Contents of a Pile

Once you've got a pile with a structure in it, you need to be able to look at the data. To print that pile, we use the print command. The print command with no arguments prints out the entire structure. If you're going to print a lot of elements, you probably want to add the -t flag to the print command. This option tells the command to print one element per line with the names on the left rather than print the data in columns with the names across the top. The mpinfo structure is very large (it has over 2,000 elements in it), so even with the -t option, you'll get over 2,000 lines of output. Fortunately, you don't have to see it all.

You can pipe q4 commands through shell commands. So, after you've got your mpinfo pile loaded, you might want to do print -tx | more to see it a page at a time. You could find out how many items there are in an mpinfo structure by doing print -tx | wc -l. You can also tell q4 exactly which items to print. If you type print -tx procindex prochpa, you'll get something like the following:

 procindex  0
 prochpa  0xfffffffffed21000

Using Skip to Load Structures

So far, we've loaded the first mpinfo structure into a pile. But mpproc_info is an array; what if you want to look at the mpinfo structures for other processors? The skip keyword tells q4 how many array elements to skip over. If you want to look at the second entry (the entry for processor 1), for example, type

 q4> load struct mpinfo from mpproc_info skip 1

Now, if we look at procindex and prochpa, we get

 procindex  0x1
 prochpa  0xfffffffffed61000

Loading Multiple Structures

Now we know how to load a structure and examine its contents. But a pile can hold more than one structure; you can load a whole array of structures. We do this with the max keyword to the load command, which tells q4 the maximum number of structures to load; the default is one. Continuing with our mpinfo example, we can load the mpinfo structures for all of the processors by taking advantage of the global symbol nmpinfo, which tells us how many there are:

 q4> load struct mpinfo from mpproc_info max nmpinfo
 loaded 2 struct mpinfos as an array (stopped by max count)

q4 tells us how many were loaded and what made it stop.
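The arithmetic behind skip and max on an array can be sketched in a few lines of Python. This is only a model under stated assumptions: the function name array_addresses, the base address, and the element size are hypothetical, and the sketch assumes that max counts elements starting from the first one that is not skipped.

```python
def array_addresses(base, elem_size, skip=0, max_count=1):
    """Addresses of the elements a skip/max style array load would cover.

    base      -- address of element 0 (hypothetical value in the example)
    elem_size -- size of one structure in bytes (hypothetical value)
    skip      -- number of leading elements to pass over
    max_count -- maximum number of elements to load; the default is one
    """
    first = base + skip * elem_size
    return [first + i * elem_size for i in range(max_count)]

# Default load: just the first element.
print([hex(a) for a in array_addresses(0x1000, 0x40)])
# Skip one element, then load two.
print([hex(a) for a in array_addresses(0x1000, 0x40, skip=1, max_count=2)])
```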

Loading Linked Lists

The linked list is such a common structure in the kernel that q4 has knowledge of linked lists built in. To load a list of structures into a pile, you have to tell q4 which element in the structure points to the next structure in the list. You do this with the next keyword.

For an example, let's look at the process table. Prior to HP-UX release 11.11, the process table was a statically allocated table of entries. Beginning with HP-UX 11.11, the process table is a linked list. The head of the list is pointed to by the symbol proc_list, and each entry in the list has a member called p_factp, which points to the next entry. To load up all of the current process table entries, we type

 q4> load struct proc from proc_list next p_factp max 10000
 loaded 155 struct procs as a linked list (stopped by null pointer)

q4 again tells us how many structures were loaded and why it stopped. In this case, it stopped because it encountered a null pointer. q4 detects the end of a linked list either when it finds a null pointer or when it finds a pointer that points back to the first entry in the list.

Notice the large value for max. In this case, we wanted all of the proc structures, so we picked a max value large enough that it wouldn't be exceeded. Without specifying max at all, we would get only one structure loaded.
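The three stopping conditions (null pointer, a pointer back to the first entry, and the max count) can be modeled with a short Python sketch. The function load_list and the dictionary that stands in for kernel memory are assumptions for illustration only; the order in which q4 itself checks the conditions is not documented here, so the order below is a guess.

```python
def load_list(nodes, head, next_field, max_count=1):
    """Follow next_field from head; return (addresses, stop_reason).

    nodes maps an address to a dict of member values; address 0 is null.
    """
    loaded = []
    addr = head
    while True:
        if addr == 0:
            return loaded, "null pointer"
        if loaded and addr == loaded[0]:
            return loaded, "loop"          # pointer back to the first entry
        if len(loaded) >= max_count:
            return loaded, "max count"
        loaded.append(addr)
        addr = nodes[addr][next_field]

# Toy "memory": three entries chained through a member named p_factp.
memory = {
    0x10: {"p_factp": 0x20},
    0x20: {"p_factp": 0x30},
    0x30: {"p_factp": 0},
}
print(load_list(memory, 0x10, "p_factp", max_count=10000))
print(load_list(memory, 0x10, "p_factp"))  # default max of one
```

With the default max of one, the walk stops after the first entry, which mirrors the behavior described above when max is omitted.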

Using Values from a Pile

In addition to using global symbols, we can also specify the name of a member in the current pile as an address. Since there are so many cases where one data structure points to another, it makes sense to be able to use the member names rather than have to type in addresses. For example, in the proc structure there is a member named p_firstthreadp that points to the first thread in that process. Once you've got a proc structure on the pile, you can load the first thread for that process by typing

 q4> load struct kthread from p_firstthreadp

Manipulating Piles

Each time you execute a load command, a new pile is created. Previous piles are pushed onto a stack, so that at any time you will have access to all piles you've created. The history command lists the piles in the stack. For example,

 q4> history
 HIST NAME   LAYOUT COUNT TYPE          COMMENTS
    1 <none>  array     1 struct mpinfo stopped by max count
    2 <none>  array     2 struct mpinfo stopped by max count
    3 <none>  array     2 struct mpinfo stopped by max count
    4 <none>   list   155 struct proc   stopped by null pointer

The highest numbered pile is the current pile. You can recall a previous pile with the recall command. The argument to the recall command is a pile number from the HIST column in the history display. A copy of the specified pile is pushed onto the stack and becomes the new current pile. Given the history in the previous example, we can recall the first mpinfo pile:

 q4> recall 1
 copied 1 item
 q4> history
 HIST NAME   LAYOUT COUNT TYPE          COMMENTS
    1 <none>  array     1 struct mpinfo stopped by max count
    2 <none>  array     2 struct mpinfo stopped by max count
    3 <none>  array     2 struct mpinfo stopped by max count
    4 <none>   list   155 struct proc   stopped by null pointer
    5 <none>  array     1 struct mpinfo copy of 1

A pile can also be given a descriptive name that will be displayed in the history. The name is applied to the current pile with the name command. The syntax is just name it <name>. Continuing the above example, if we want to assign the name proc1 to the pile we just recalled, we type

 q4> name it proc1
 so named
 q4> history
 HIST NAME   LAYOUT COUNT TYPE          COMMENTS
    1 <none>  array     1 struct mpinfo stopped by max count
    2 <none>  array     2 struct mpinfo stopped by max count
    3 <none>  array     2 struct mpinfo stopped by max count
    4 <none>   list   155 struct proc   stopped by null pointer
    5 proc1   array     1 struct mpinfo copy of 1

Now we can use proc1 instead of the pile number if we want to recall the pile. This can be really handy if you have a lot of piles that are all the same data type, because they can be hard to tell apart.
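The bookkeeping behind history, recall, and name can be pictured as a simple stack in which a recall pushes a copy rather than moving the original. The Python class below is a toy model, not q4's implementation; PileStack and its method names are hypothetical.

```python
class PileStack:
    """Toy model of the pile history: numbered entries, newest is current."""

    def __init__(self):
        self.entries = []  # each entry: {"name", "items", "comment"}

    def push(self, items, comment):
        self.entries.append({"name": None, "items": list(items), "comment": comment})
        return len(self.entries)  # HIST number of the new current pile

    def _index(self, ref):
        # ref is either a 1-based HIST number or a name assigned earlier
        if isinstance(ref, int):
            return ref - 1
        for i, entry in enumerate(self.entries):
            if entry["name"] == ref:
                return i
        raise KeyError(ref)

    def recall(self, ref):
        # Push a copy of the referenced pile; the original stays in place.
        i = self._index(ref)
        return self.push(self.entries[i]["items"], f"copy of {i + 1}")

    def name_it(self, name):
        # Names always apply to the current (most recent) pile.
        self.entries[-1]["name"] = name

stack = PileStack()
stack.push(["mpinfo0"], "stopped by max count")
stack.push(["proc_a", "proc_b"], "stopped by null pointer")
stack.recall(1)          # becomes entry 3, "copy of 1"
stack.name_it("proc1")
for number, entry in enumerate(stack.entries, start=1):
    print(number, entry["name"], len(entry["items"]), entry["comment"])
```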

Another handy pair of commands is keep and discard. For both of these commands, the argument is an expression. A new pile is created that contains either only those structures for which the expression is true (keep) or only those for which the expression is false (discard). As an example, let's say we want to look at the process table entry for PID 1. We can recall the list of all processes we have in our stack, then keep just the one with p_pid equal to one:

 q4> recall 4
 copied 155 items
 q4> keep p_pid == 1
 kept 1 of 155 struct proc's, discarded 154
 q4> history
 HIST NAME   LAYOUT COUNT TYPE          COMMENTS
    1 <none>  array     1 struct mpinfo stopped by max count
    2 <none>  array     2 struct mpinfo stopped by max count
    3 <none>  array     2 struct mpinfo stopped by max count
    4 <none>   list   155 struct proc   stopped by null pointer
    5 proc1   array     1 struct mpinfo copy of 1
    6 <none>   list   155 struct proc   copy of 4
    7 <none> mixed?     1 struct proc   subset of 6

We now have a pile that contains the process table entry for PID 1.
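Conceptually, keep and discard are complementary filters that each produce a new pile and leave the original alone. A minimal Python sketch follows, with piles modeled as lists of dictionaries; the function names and the sample data are illustrative assumptions, not q4 internals.

```python
def keep(pile, predicate):
    """New pile with only the items for which predicate is true."""
    return [item for item in pile if predicate(item)]

def discard(pile, predicate):
    """New pile with only the items for which predicate is false."""
    return [item for item in pile if not predicate(item)]

# Hypothetical pile of process entries, reduced to the one member we test.
procs = [{"p_pid": 0}, {"p_pid": 1}, {"p_pid": 2003}]

init_only = keep(procs, lambda p: p["p_pid"] == 1)
everything_else = discard(procs, lambda p: p["p_pid"] == 1)
print(len(init_only), len(everything_else), len(procs))
```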

An Example: Getting the `uarea` of a Process

Let's look at an example that goes through several of these steps. In this example, we get the uarea of a process, PID 2003. Because of the way structures are interconnected, there are many ways to do this, and some of them are quicker and easier than this example, but the idea here is to show a variety of q4 commands. Following are the steps we go through:

Load all processes.
Keep process 2003.
Get the VAS structure for the process from the p_vas pointer.
Using the VAS region list, load all regions for the process.
Keep the region that has a type of PT_UAREA.
Print the space and offset of that region.
Load a struct user from that space and offset.

 q4> load struct proc from proc_list next p_factp max 10000
 loaded 150 struct procs as a linked list (stopped by null pointer)
 q4> keep p_pid == 2003
 kept 1 of 150 struct proc's, discarded 149
 q4> load vas_t from p_vas
 loaded 1 vas_t as an array (stopped by max count)
 q4> load preg_t from va_ll.lle_prev next p_ll.lle_prev max 100
 loaded 21 preg_ts as a linked list (stopped by loop)
 q4> keep p_type == PT_UAREA
 kept 1 of 21 preg_t's, discarded 20
 q4> print -x p_space p_vaddr
   p_space            p_vaddr
 0x6546c00 0x400003ffffff0000
 q4> load struct user from 0x6546c00.0x400003ffffff0000
 loaded 1 struct user as an array (stopped by max count)

Tracing Stacks

To get a stack trace, you use the trace command. q4 can trace four kinds of things: processes, kthreads, processors, and crash events. Note that trace will not show you function calls in user processes; what it shows is the kernel stack.

If you are looking at a crash dump rather than the running kernel, you can get a stack trace of crash events using the trace event command. Given a crash event number, this shows you the stack trace from the time the process entered kernel space to the time the system panicked. Crash event 0 is generally the most useful, although there are times when you need to see other crash events as well. For example, here is the stack trace from a data page fault panic:

 q4> trace event 0
 stack trace for event 0
 crash event was a panic
 panic+0x14
 report_trap_or_int_and_panic+0x84
 trap+0xd9c
 nokgdb+0x8
 getnewbuf+0x1cc
 ogetblk+0x110
 getblk1+0x260
 vx_getblk+0x50
 vx_write_default+0x564
 vx_write1+0x4d8
 vx_rdwr+0x164
 vno_rw+0x84
 write+0x104
 syscall+0x28c
 $syscallrtn+0x0

Here we see that the process entered the kernel by doing a write() system call. The kernel progressed through vno_rw, vx_rdwr, and so on until it was in getnewbuf. At 0x1cc bytes past the beginning of getnewbuf, it executed an instruction that caused a trap, in this case a Data Page Fault. The trap handler, trap, decided it couldn't handle the trap and called report_trap_or_int_and_panic to cause the system to panic.
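If you save a trace to a file, the symbol+offset frames are easy to pick apart mechanically. The Python sketch below assumes each frame looks like the lines in the trace above (a symbol, a plus sign, and a hexadecimal offset). The helper names parse_frame and syscall_entry are hypothetical, and the sample lines are copied from that trace.

```python
import re

# One frame per line: "<symbol>+0x<hex offset>", as in the trace output.
FRAME = re.compile(r"^(?P<symbol>\S+)\+0x(?P<offset>[0-9a-fA-F]+)$")

def parse_frame(line):
    """Split 'symbol+0xNNN' into (symbol, offset); return None for other lines."""
    match = FRAME.match(line.strip())
    if match is None:
        return None
    return match.group("symbol"), int(match.group("offset"), 16)

def syscall_entry(frames):
    """Name of the frame listed just before 'syscall' (innermost frame first)."""
    names = [symbol for symbol, _offset in frames]
    if "syscall" not in names:
        return None
    i = names.index("syscall")
    return names[i - 1] if i > 0 else None

lines = [
    "crash event was a panic",
    "getnewbuf+0x1cc",
    "vno_rw+0x84",
    "write+0x104",
    "syscall+0x28c",
    "$syscallrtn+0x0",
]
frames = [f for f in (parse_frame(line) for line in lines) if f is not None]
print(frames[0])              # faulting function and byte offset
print(syscall_entry(frames))  # system call that entered the kernel
```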

The trace processor command takes a single argument, the number of the processor you wish traced. q4 prints a stack trace for whatever was running on that processor.

You can use trace process at or trace thread at to get a kernel stack trace for a particular process or thread. The value following at should be the address of a struct proc or a struct kthread. Tracing a process will print a stack trace of all the threads for that process.

You can also use trace pile to get a trace of whatever is in the current pile. The structures in the pile must be of type struct proc, struct kthread, struct mpinfo, or struct crash_event_table_struct. Here's an example of using trace pile to get the stack trace for a particular process:

 q4> load struct proc from proc_list next p_factp max 10000
 loaded 150 struct procs as a linked list (stopped by null pointer)
 q4> keep p_pid == 2003
 kept 1 of 150 struct proc's, discarded 149
 q4> trace pile
 stack trace for process at 0x0'42846040 (pid 2003), thread at 0x0'42847040 (tid 2104)
 process was not running on any processor
 _swtch+0xc4
 _sleep+0x318
 fifo_rdwr+0x2f8
 vno_rw+0x1ac
 read+0x10c
 syscall+0x204
 $syscallrtn+0x0

This shows us that this particular process has done a read() system call on a fifo. The fifo is empty, and the process has been put to sleep and will be woken up when data is available to read.

Using Perl Scripts in q4

q4 ships with a directory full of Perl scripts that can automate some common operations. Because these were designed for Hewlett-Packard internal use, they are not well documented and some have very limited, specific uses.

The Perl files are usually located in /usr/contrib/lib/Q4. Each script is a file with a .pl suffix. To use these files, you first have to use the include command to have them read into q4. For example, to include the whathappened.pl script, type

 q4> include whathappened.pl

Once the file is included, you run it with the run command:

 q4> run WhatHappened

The argument to the run command is the name of a subprogram within the Perl file. In most cases, this is the same as the file but in mixed case, as in the example for WhatHappened. In other cases, you may have to look at the Perl file to find out how it works. Table 16-2 lists some of the more useful Perl scripts.

Table 16-2. q4 Perl Scripts
File Name	Command	Purpose
`bucketwalk.pl`	`BucketWalk`	Checks the memory buckets in the bucket allocator for corruption.
`callout.pl`	`callout`	Reports information related to callouts.
`ioscan.pl`	`Ioscan`	Re-creates `ioscan`-like information from a dump.
`lvm.pl`	`LVM`	Prints summary of LVM configuration.
`netinfo.pl`	`Netinfo`	Simulates `lanscan` and `netstat`.
`openfiles.pl`	`OpenFiles`	Displays all files open by all processes.
`sleep_queue.pl`	`(various)`	Various commands to check sleep queues.
`whathappened.pl`	`WhatHappened`	General information used in analyzing dumps.
`wsioscsi.pl`	`WsioScsi`	Information about the state of the WSIO SCSI subsystem.