Stack tracebacks for every process


Now let's get a stack traceback for each process on the system to see what it was doing. The output is very long, so we will look at it in smaller chunks. (Some of the output has been deleted for brevity.)

    zatch: adb -k vmunix.2 vmcore.2
    $<traceall
    pid 0
    sw_bad(?)
    _swtch(0x11901ae3,0xf8180c60,0xf814fc00,0x0,0x3,0x0) + 80
    _sleep(0xf8180c60,0xa,0x165e000,0x0,0xa,0xf826e34c) + 1a0
    _ufs_getpage(0xfd84c2d0,0x165d000,0x1000,0x0,0xf814edc0,0x1000) + b4
    _ufs_l_getpage(0xfd84c2d0,0x165d000,0x1000,0x0,0xf814edc0,0x1000) + c4
    _segu_softload(0xfd803678,0xf84a5000,0x1,0x5b,0x1,0x1c7000) + b4
    _segu_fault(0xfd803678,0xf84a5000,0x4000,0x2,0x5b,0xfd8069f4) + 104
    _swapin(0xf827182c,0x166,0x0,0xfffffffe,0x4000,0x2) + 50
    _sched(0xf8184e70,0xf827182c,0xf8184e40,0x1f,0x2,0x4) + 33c
    _main(0x114000e7,0xffffffff,0x2a9b6f81,0xfffe,0xf814f000,0xf826e34c) + 374

Time out! This looks a bit unusual. We can't tell everything from a stack traceback without source code to look at, but we can notice a few things. Reading from the bottom up, it looks like process 0, the swapper, was doing a swap-in. Going up a couple of lines (the segu functions deal with a process's U-area), we see two getpage functions whose names start with ufs, which refers to the UNIX File System. This means we were trying to swap in from a regular file system, not from a raw partition.
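The bottom-up reading is mechanical enough to sketch in code. The C fragment below is a minimal sketch, not a feature of adb: it assumes each frame has already been split onto its own line in the `name(args) + offset` form that $<traceall prints, and the names struct frame, parse_frame(), and swapin_via_ufs() are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* One parsed frame: the name as adb prints it (C functions carry a
   leading underscore) and the hex offset into that function. */
struct frame {
    char name[64];
    unsigned long offset;
};

/* Parse a line such as "_sleep(0xf8180c60,0xa) + 1a0".
   Returns 0 on success, -1 if the line does not look like a frame. */
static int parse_frame(const char *line, struct frame *f)
{
    const char *paren = strchr(line, '(');
    const char *plus;
    size_t len;

    if (paren == NULL)
        return -1;
    len = (size_t)(paren - line);
    if (len == 0 || len >= sizeof f->name)
        return -1;
    memcpy(f->name, line, len);
    f->name[len] = '\0';

    /* Frames such as "sw_bad(?)" have no " + offset" part. */
    plus = strstr(paren, ") + ");
    f->offset = plus ? strtoul(plus + 4, NULL, 16) : 0;
    return 0;
}

/* Scan from the bottom (oldest) frame upward and report whether a
   swap-in led down into the UFS getpage routine, as in pid 0. */
static int swapin_via_ufs(const char *const lines[], int n)
{
    int seen_swapin = 0;

    for (int i = n - 1; i >= 0; i--) {
        struct frame f;

        if (parse_frame(lines[i], &f) != 0)
            continue;
        if (strcmp(f.name, "_swapin") == 0)
            seen_swapin = 1;
        else if (seen_swapin && strcmp(f.name, "_ufs_getpage") == 0)
            return 1;
    }
    return 0;
}
```

Given the frame lines for pid 0 in the order adb prints them (sw_bad first, _main last), swapin_via_ufs() returns 1; for a trace that never reaches _swapin, such as pid 1's, it returns 0.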

To continue:

    pid 1
    sw_bad(?)
    _swtch(0x11800aa5,0xf8180cb0,0xf814fc00,0x0,0x3,0xf8272ef0) + 80
    _sleep(0xf8180cb0,0x1e,0xf826e63c,0x0,0x1e,0xf826e408) + 1a0
    _wait4(0xf826e63c,0x1,0xf815b640,0xf826e408,0xf8272ef0,0xf8272ef0) + 4a0
    _syscall(0xf82e1000) + 3b4

According to the ps output, process 1, init, was in the I state. In other words, init was idle: it had been sleeping for more than about 20 seconds. The entry point into the kernel (the last line of the stack traceback) is syscall(), which is normally how a user process enters the kernel to request a service (page faults and other traps, which we will see later, are the other way in). The next line up usually identifies the system call that was made: in this case, wait4(), a derivative of wait() that stops until a child process changes state. The three lines above that are sleep(), swtch(), and sw_bad(). sleep() is the kernel function that gives up control and allows a context switch to some other process. swtch() is the actual context switcher. The top line, sw_bad, is an assembly language label (its name does not start with an underscore, which the bundled Solaris 1 C compiler automatically adds to every C function name), so although it looks alarming, it's not significant. It's down in very low-level code in the context switch procedure.
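The rules of thumb in the last paragraph (a leading underscore marks a compiled C function, the bottom frame is the kernel entry point, and the frame just above syscall() usually names the system call) can be expressed as a small C sketch. The helper names are hypothetical, and the frame names are assumed to be listed top first, exactly as adb prints them.

```c
#include <stddef.h>
#include <string.h>

/* The compiler prefixes every C function name with '_', so a name
   without it (such as sw_bad) is an assembly-language label. */
static int is_c_function(const char *adb_name)
{
    return adb_name[0] == '_';
}

/* The C-level name: drop the compiler's leading underscore, if any. */
static const char *c_name(const char *adb_name)
{
    return is_c_function(adb_name) ? adb_name + 1 : adb_name;
}

/* names[] is listed top (most recent) first. If the process entered
   the kernel through syscall(), the frame just above it usually names
   the system call; otherwise return NULL. */
static const char *system_call_name(const char *const names[], size_t n)
{
    if (n < 2 || strcmp(c_name(names[n - 1]), "syscall") != 0)
        return NULL;
    return c_name(names[n - 2]);
}
```

Applied to pid 1, system_call_name() yields "wait4"; for pid 0, whose bottom frame is _main rather than _syscall, it returns NULL.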

    pid 2
    sw_bad(?)
    _swtch(0x11901ae4,0xf82713c4,0x114010e4,0x0,0x3,0xf82b31ac) + 80
    _sleep(0xf8180ce0,0x1,0x154,0xa00,0x1,0xf826e4c4) + 1a0
    _pageout(?)
    _yield_child(0xf8184c00,0x1,0x1,0xf82a8374,0xf82cd1c4,0x0) + cc

    pid 128

    pid 55

    pid 5408

    pid 60
    data address not found

    pid 5346
    data address not found

After pid 2, we have a bunch of "blanks": nothing useful gets printed. If you refer back to the ps output, these processes are all in the IW or RW state, meaning that they have been swapped out. No stack traceback information is available in memory for these particular processes.
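The test for "swapped out" is just the second letter of the ps state. A trivial sketch, assuming the SunOS 4 (BSD-style) ps STAT field, in which a W in the second position marks a swapped-out process; is_swapped_out() is a hypothetical name:

```c
#include <string.h>

/* SunOS 4 ps STAT field: the first character is the run state
   (R, S, I, D, ...); a 'W' in the second position means the process
   is swapped out, so its U-area and kernel stack are not in memory
   and there is no traceback to print for it. */
static int is_swapped_out(const char *stat)
{
    return strlen(stat) >= 2 && stat[1] == 'W';
}
```

is_swapped_out("IW") and is_swapped_out("RW") are true; for "D" or "S" the result is false.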

    pid 69
    sw_bad(?)
    _swtch(0x11800aa7,0xf8272798,0x114000a7,0x0,0x3,0x118000a6) + 80
    _sleep(0xf8180c10,0x1a,0x400000,0xa00,0x1a,0xf826e92c) + 1a0
    _select(0xffbfffff,0x88001,0xf8303fe0,0x0,0xf8303fe0,0xf8304000) + 4cc
    _syscall(0xf8304000) + 3b4

This type of process (pid 69) is another common sight. Many processes that wait for I/O use the select() system call, which returns when one or more of the file descriptors they are watching become ready (for example, have data available).
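For reference, this is roughly what such a process looks like from user level. The sketch below uses the POSIX select() interface on a modern system (it is not code from the dumped machine); while the call blocks, the process's kernel stack would show select() above sleep(), as in pid 69. The helper name wait_readable() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Sleep in select() until fd is readable or timeout_sec expires.
   Returns 1 if fd is readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, long timeout_sec)
{
    fd_set readfds;
    struct timeval tv;
    int n;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return n > 0 && FD_ISSET(fd, &readfds);
}
```

Called on an empty pipe with a zero timeout, it returns 0 immediately; once a byte has been written into the pipe, it returns 1.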

    pid 72
    sw_bad(?)
    _swtch(0x11800ae0,0xf826eaa4,0xf814fc00,0x0,0x3,0xf82d92bc) + 80
    _sleep(0xf8180d70,0x1a,0xf8189c00,0x0,0x1a,0xf826e9e8) + 1a0
    _async_daemon(0xfd8a85c4,0xfd8a85b8,0xf815b640,0xf815bb40,0xf8309000,0xf815bb40) + 260
    _syscall(0xf8309000) + 3b4

This process (pid 72) is one of the biod processes. It's in the async_daemon() system call and is sleeping, probably waiting for an NFS I/O request to handle. All the biods look the same.

    sw_bad(?)
    _swtch(0x11800ae2,0xf826f664,0xf814fc00,0x0,0x3,0xf8027c60) + 80
    _sleep(0xf8180c78,0x1a,0xfd844af8,0x100,0x1a,0xf826efc8) + 1a0
    _sbwait(0xff64ab38,0xfd8dc688,0x826,0x0,0xffffffff,0x2) + 14
    _svc_run(0xfd844af8,0x186a3,0x2,0xf8027d94,0x0,0x110010e7) + 28
    _nfs_svc(0xf8330fe0,0x4d8,0xf815b640,0xf815bb18,0xf8331000,0xf815bb18) + 260
    _syscall(0xf8331000) + 3b4

And here we have an NFS daemon. These all look the same.

Next, we see one of the processes (pid 132) in D state.

    pid 132
    sw_bad(?)
    _swtch(0x11901ae6,0xf8180cd0,0xf814fc00,0x0,0x3,0xf8280c64) + 80
    _sleep(0xf8180cd0,0x1,0xf814fc00,0x800,0x1,0xf826f898) + 1a0
    _page_cv_wait(0xf82bec8c,0x114010e0,0xf814fc00,0x0,0x1,0x1) + 14
    _page_wait(0xf82bec8c,0xf82a4bfc,0x0,0x0,0x16d7000,0xfd84c2d0) + a4
    _pvn_getdirty(0xf82bec8c,0xf8326c0c,0x0,0x0,0x16d7000,0xfd84c2d0) + e4
    _pvn_vplist_dirty(0xfd84c2d0,0x0,0x0,0x0,0xf8298b2c,0xf82bec8c) + 110
    _ufs_putpage(0xfd84c2d0,0xfd84c2c8,0x0,0x0,0x0,0xf826f898) + 350
    _ufs_l_putpage(0xfd84c2d0,0x0,0x0,0x0,0x0,0x0) + 30
    _syncip(0xfd84c2c8,0x0,0x0,0x2,0x1101,0x0) + 124
    _update(0xf8187130,0x104e,0xfd84c2c8,0xfd8a5eb8,0x0,0xfd84c2c8) + 2f0
    _ufs_sync(0x0,0xf815dca0,0xf81663c0,0xf8185090,0xf815dca0,0xfd852b98) + 4
    _sync(0xf8326fe0,0x120,0xf815b640,0xf815b760,0xf8327000,0xf815dc60) + 3c
    _syscall(0xf8327000) + 3b4

Process 132 is update. The normal function of update is to do a sync every 30 seconds, and this crash appears to have caught it in the middle of one of those sync operations. The system call is sync(), and it seems to be going through UFS operations (putting a page out, in this case) and then waiting in page_wait() for some sort of resource to be freed. This may or may not be abnormal. Let's go on and see what other processes there are.
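From user level, update's job is very simple. The sketch below assumes only the behavior described above (call sync() every 30 seconds); it is not the actual update source, and update_loop() with its iteration limit is hypothetical, added only so the sketch terminates. On the dumped system, each sync() call went down the _sync(), _ufs_sync(), _update() path seen in pid 132.

```c
#include <unistd.h>

/* Flush dirty file-system data with sync() at a fixed interval
   (30 seconds for the real daemon, which loops forever).
   Returns the number of sync() calls made. */
static int update_loop(unsigned interval_sec, int iterations)
{
    int done = 0;

    while (done < iterations) {
        sync();
        done++;
        if (interval_sec > 0 && done < iterations)
            sleep(interval_sec);
    }
    return done;
}
```

A faithful update would call update_loop(30, ...) with no upper bound on the iterations.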

    pid 5525
    sw_bad(?)
    _swtch(0x11801aa7,0xf8271190,0x114010a7,0x0,0x3,0x9a) + 80
    _sleep(0xf8180c10,0x1a,0x64,0xa00,0x1a,0xf826fff0) + 1a0
    _select(0xffbfffff,0x20088001,0xf847ffe0,0xf7fff1d0,0xf847fdd0,0xf8480000) + 4cc
    _syscall(0xf8480000) + 3b4

Process 5525 is one of the ones marked in the ps output as runnable. It was in a select() system call, which means that if the system were still up, this process would have continued, with select() returning an indication that input was available. This change of state (from sleeping to runnable) could have happened at any time.

    pid 5363
    sw_bad(?)
    _swtch(0x11901ae7,0xf827124c,0xf814fc00,0x0,0x3,0xf8188c00) + 80
    _sleep(0xf8180c60,0xa,0xe7000,0x0,0xa,0xf827068c) + 1a0
    _ufs_getpage(0xfd84c2d0,0xe6000,0x1000,0xf838fdbc,0xf838fd58,0x1000) + b4
    _ufs_l_getpage(0xfd84c2d0,0xe6000,0x1000,0xf838fdbc,0xf838fd58,0x1000) + c4
    _anon_getpage(0x0,0xf838fdbc,0xf838fd58,0x1000,0xfd8487c8,0x324000) + 170
    _segvn_faultpage(0xfd8487c8,0x324000,0x64000,0xfd9093c0,0x0,0xf838fdec) + 148
    _segvn_fault(0xfd8487c8,0x1000,0x0,0x0,0x324000,0xfd84c9e0) + 5d4
    _as_fault(0x324000,0xfd8487c8,0x1000,0x0,0x1,0x324000) + c0
    _pagefault(0x324144,0x0,0x1,0x0,0xf827068c,0xf8188c00) + 1f0
    _trap(0x10009) + 5dc

Process 5363 is another D-state process. According to the ps output, this process is running /usr/openwin/bin/xnews, part of OpenWindows. This process entered the kernel through a trap: a page fault. Note that this stack trace goes through anon_getpage(), which is used to get anonymous memory, usually another name for swap space, and the next routines are going through UFS code again. Hmmm. This is starting to look like a trend.

    pid 5524
    sw_bad(?)
    _swtch(0x11901ae1,0xf826e34c,0xf814fc00,0x0,0x3,0xf8188c00) + 80
    _sleep(0xf8180c60,0xa,0x13ec000,0x0,0xa,0xf8270804) + 1a0
    _ufs_getpage(0xfd84c2d0,0x13eb000,0x1000,0xf847adbc,0xf847ad58,0x1000) + b4
    _ufs_l_getpage(0xfd84c2d0,0x13eb000,0x1000,0xf847adbc,0xf847ad58,0x1000) + c4
    _anon_getpage(0x0,0xf847adbc,0xf847ad58,0x1000,0xfd8b2298,0x1f1000) + 170
    _segvn_faultpage(0xfd8b2298,0x1f1000,0x45000,0xfd8e70b4,0x0,0xf847adec) + 148
    _segvn_fault(0xfd8b2298,0x1000,0x0,0x0,0x1f1000,0xfd84a550) + 5d4
    _as_fault(0x1f1000,0xfd8b2298,0x1000,0x0,0x1,0x1f1000) + c0
    _pagefault(0x1f14b8,0x0,0x1,0x0,0xf8270804,0xf8188c00) + 1f0
    _trap(0x10009) + 5dc

Another page fault. Another reference to anonymous memory that goes into the UFS routines and blocks.

    pid 5529
    sw_bad(?)
    _swtch(0x11900ae2,0xf8270804,0xf814fc00,0x0,0x3,0xf8188c00) + 80
    _sleep(0xf8180c60,0xa,0x8b9000,0x0,0xa,0xf827124c) + 1a0
    _ufs_getpage(0xfd84c2d0,0x8b8000,0x1000,0xf8493dbc,0xf8493d58,0x1000) + b4
    _ufs_l_getpage(0xfd84c2d0,0x8b8000,0x1000,0xf8493dbc,0xf8493d58,0x1000) + c4
    _anon_getpage(0x0,0xf8493dbc,0xf8493d58,0x1000,0xfd8b32c0,0xe000) + 170
    _segvn_faultpage(0xfd8b32c0,0xe000,0x2000,0xfd908548,0x0,0xf8493dec) + 148
    _segvn_fault(0xfd8b32c0,0x1000,0x0,0x0,0xe000,0xfd8b5b28) + 5d4
    _as_fault(0xe000,0xfd8b32c0,0x1000,0x0,0x1,0xe000) + c0
    _pagefault(0xeca4,0x0,0x1,0x0,0xf827124c,0xf8188c00) + 1f0
    _trap(0x10009) + 5dc

Yes, it looks like we're on to something. Another identical sequence in process 5529.
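When the same pattern keeps turning up, it helps to check for it systematically rather than by eye. The sketch below, with hypothetical names, assumes each traceback has already been reduced to its list of function names (top frame first) and flags those in which _anon_getpage() sits below, that is, called down into, _ufs_getpage(), as in pids 5363, 5524, and 5529.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of one traceback: frame names listed top
   (most recent) first, as $<traceall prints them. */
struct trace {
    int pid;
    const char *const *names;
    size_t n;
};

/* Index of the first frame called `name`, or -1 if absent. */
static long find_frame(const struct trace *t, const char *name)
{
    for (size_t k = 0; k < t->n; k++)
        if (strcmp(t->names[k], name) == 0)
            return (long)k;
    return -1;
}

/* True when _anon_getpage() sits below _ufs_getpage(), i.e. an
   anonymous-memory (swap) request went down into UFS code. */
static int anon_fault_into_ufs(const struct trace *t)
{
    long anon = find_frame(t, "_anon_getpage");
    long ufs = find_frame(t, "_ufs_getpage");

    return anon >= 0 && ufs >= 0 && ufs < anon;
}

/* How many of the n traces show the pattern. */
static int count_anon_ufs(const struct trace *ts, size_t n)
{
    int count = 0;

    for (size_t k = 0; k < n; k++)
        count += anon_fault_into_ufs(&ts[k]);
    return count;
}
```

Run over the three page-fault traces above, it would count all three; the select() trace of pid 69 does not match.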

    sw_bad(?)
    _swtch(0x11900ae7,0xf826e4c4,0xf814fc00,0x0,0x3,0xfd8c7288) + 80
    _sleep(0xf8180ce0,0x1,0xfd803698,0x800,0x1,0xf82713c4) + 1a0
    _page_cv_wait(0xf82bfccc,0x114000e1,0xf814fc00,0x0,0x258e0be,0x2) + 14
    _page_wait(0xf82bfccc,0x1651000,0x2,0x0,0x1651000,0xfd84c2d0) + a4
    _anon_decref(0xfd8803b0,0xf84acb38,0xfd8803b0,0x0,0x0,0xf82bfccc) + 54
    _anon_free(0xfd925620,0x3,0x0,0x3,0x0,0x4) + 2c
    _segvn_free(0xfd8c6db8,0xfd925620,0x0,0xf81804d8,0xfd8c7288,0xfd8b37bc) + 100
    _seg_free(0xfd8c6db8,0xf8219060,0xff847000,0x3,0xff847000,0xf81801f0) + 80
    _segvn_unmap(0xfd8c6db8,0xff843000,0x4000,0xfd805c80,0xff847000,0xfd8c7288) + 9c
    _as_unmap(0xf81801f0,0xfd8c6db8,0xfd803698,0xff847000,0x4000,0xff843000) + b8
    _args_free(0x0,0x20088001,0xf84ad000,0xf8267624,0xf84ad000,0xff843000) + 20
    _execve(0xff843000,0x0,0xf7fffffa,0xffffffff,0xfffffffc,0xffffffff) + a5c
    _au_execve(0xf84acfe0,0x1d8,0xf815b640,0xf815b818,0xf84ad000,0x0) + 20
    _syscall(0xf84ad000) + 3b4

This trace appears to be a system call to run a new program: an exec(). As part of an exec() call, the system normally discards the current process's address space and overlays it with a new image. We appear to be going through args_free(), which sounds a lot like it's releasing argument space, then passing through anon_free() (swap space again) on the way to a page_wait() and a sleep().
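The user-level view of the same operation is the familiar fork()/execve() pair. In the POSIX sketch below (run_program() is a hypothetical helper, not code from the dumped system), fork() gives the child a copy of the parent's address space, execve() throws that copy away and loads the new program image, which is the kind of teardown this trace appears to have caught in progress, and the parent then waits for the child to change state, much as init was doing in wait4().

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at `path` in a child process and return its exit
   status, or -1 on error. */
static int run_program(const char *path, char *const argv[])
{
    char *const envp[] = { NULL };
    pid_t pid;
    int status;

    pid = fork();                  /* child gets a copy of our address space */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, envp);  /* discard the copy, load the new image */
        _exit(127);                /* reached only if execve() failed */
    }

    /* Wait for the child to change state (here: to exit). */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, running /bin/sh with the arguments "sh", "-c", "exit 3" returns 3.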

The next two traces are not unusual; both are often seen in stack tracebacks. Process 5500 is in sigpause(), which indicates a process that is waiting for a signal (often an alarm clock). Process 4435 is doing a read() system call that goes through soreceive(), which is socket (network) code.
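A process parked in sigpause() is simply waiting for a signal to arrive. sigpause() is the old BSD interface; the sketch below uses its POSIX replacement, sigsuspend(), together with alarm() to wait for SIGALRM, the "alarm clock" case mentioned above. It is a modern illustration with hypothetical names (got_alarm, wait_for_alarm()), not code from the dumped system.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

static volatile sig_atomic_t got_alarm = 0;

static void on_alarm(int sig)
{
    (void)sig;
    got_alarm = 1;
}

/* Sleep until SIGALRM arrives `seconds` from now. Returns 1 once the
   alarm has been delivered, or -1 on error. */
static int wait_for_alarm(unsigned seconds)
{
    struct sigaction sa;
    sigset_t block, old, waitmask;

    if (seconds == 0)
        return -1;               /* alarm(0) would cancel, not schedule */

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;

    /* Block SIGALRM first so it cannot slip in before we wait. */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &old) < 0)
        return -1;

    got_alarm = 0;
    alarm(seconds);

    /* Atomically unblock SIGALRM and sleep until a signal arrives. */
    waitmask = old;
    sigdelset(&waitmask, SIGALRM);
    while (!got_alarm)
        sigsuspend(&waitmask);

    sigprocmask(SIG_SETMASK, &old, NULL);
    return 1;
}
```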

    pid 5500
    sw_bad(?)
    _swtch(0x11800aa3,0xf826fb88,0xf814fc00,0x0,0x3,0x2f) + 80
    _sleep(0xf8180bb0,0x28,0xf8272274,0x0,0x28,0xf8272274) + 1a0
    _sigpause(0xf841bfe0,0x378,0xf815b640,0xf815b9b8,0xf841c000,0xf815b9b8) + 4c
    _syscall(0xf841c000) + 3b4

    pid 4435
    sw_bad(?)
    _swtch(0x11800ae3,0xf826ed94,0xf814fc00,0x0,0x3,0x35) + 80
    _sleep(0xf8180bb8,0x1a,0x48,0x100,0x1a,0xf82724a8) + 1a0
    _sbwait(0xff65d838,0x3,0x0,0x0,0x1,0xc) + 14
    _soreceive(0xff65d80c,0x0,0xf8407ea4,0x0,0x4000,0x1000) + 28c
    _soo_rw(0xf82660d0,0xf80609a8,0xf8407ea4,0xf8407ec0,0x1,0xf8407ea4) + 30
    _rwuio(0xf82660d0,0xf8407ea4,0xf8407eb8,0xc,0xc,0xf8407ea4) + 2b0
    _read(0xf8407fe0,0x18,0xf815b640,0xf815b658,0xf8408000,0xf815b658) + 34
    _syscall(0xf8408000) + 3b4

With the next trace, we're back to another blocked process, pid 27863. If you look at the ps output, this process is listed as <exiting>. In other words, it's trying to go away. This involves releasing all its memory, and we can see it going through relvm() and anon_free() on its way to sleep().

    pid 27863
    sw_bad(?)
    _swtch(0x11901ae3,0xf8272c00,0xf814fc00,0x0,0x3,0x0) + 80
    _sleep(0xf8180c78,0x1,0x0,0x800,0x1,0xf8272ef0) + 1a0
    _page_cv_wait(0xf82b932c,0x114010e5,0xf814fc00,0x0,0x258e0d6,0x1) + 14
    _page_wait(0xf82b932c,0x1714000,0x1,0x0,0x1714000,0xfd84c2d0) + a4
    _anon_decref(0xfd8809c8,0x11001ae7,0xfd8809c8,0x0,0xf81970d0,0xf82b932c) + 54
    _anon_free(0xfd925638,0x1,0x0,0x3,0x0,0x2) + 2c
    _segvn_free(0xfd8ce4d0,0xfd925638,0x0,0xfd8ab6b8,0xfd8d74c8,0xfd8a8ab4) + 100
    _seg_free(0xfd8ce4d0,0xffffffff,0x12,0x2e8,0xf7795d44,0xfd8c20c0) + 80
    _as_free(0xfd8c20c0,0x1000,0xfd8c20c0,0x0,0xf810521c,0x0) + 1c
    _relvm(0xf8272ef0,0xf8272ef0,0x0,0x114010a3,0xf8280cf4,0x0) + 2c
    _exit(0xff67ff80,0x270,0xf7803860,0xf8272ef0,0xf7795eb8,0x80) + a8
    _rexit(0xf84b1fe0,0x8,0xf815b640,0xf815b648,0xf84b2000,0xf815b648) + 18
    _syscall(0xf84b2000) + 3b4

The fact that we have several processes apparently stuck while trying to access swap seems to be significant. As a final check, let's examine the stack traceback of the process that was running at the time of the halt and see if the system was perhaps stuck in some kernel loop. This is also a good example of how a stack traceback looks for a system that has been manually stopped.

First, the heading is displayed. We will be seeing eight local registers (%l0 through %l7) followed by eight input registers (%i0 through %i7).

    <sp$<stacktrace
            l0              l1              l2              l3
            l4              l5              l6              l7
            i0              i1              i2              i3
            i4              i5              i6              i7

After the heading, each stack frame is printed twice: first as raw hexadecimal values, then with the values translated to symbolic names where possible. Note that the address on the left side is the same twice in a row.

    intstack+0x2ba8:
            f826fb88        f8182378        f8421000        63
            f8052420        63              f8148ba8        63
            f8173367        f8173367        0               0
            0               0               f8148c08        f81200f8
    intstack+0x2ba8:
            0xf826fb88      _panic_regs     0xf8421000      0x63
            _panic+0x6c     0x63            intstack+0x2ba8  0x63
            _va_cache_valid+0x367  _va_cache_valid+0x367  0  0
            0               0               intstack+0x2c08  _vx_handler+0x50

These top stack frames are all on the interrupt stack (intstack) and are up in the PROM code.

    intstack+0x2c08:
            114036c4        ffe96300        ffe96304        f8005f28
            a               f8005c00        28              f8148be0
            ffefb6c8        f3              0               0
            0               f8172ff8        f8148c88        ffe9d2bc
    intstack+0x2c08:
            0x114036c4      0xffe96300      0xffe96304      level10
            0xa             int_rtt         0x28            intstack+0x2be0
            0xffefb6c8      0xf3            0               0
            0               _mon_clock14_vec+0x10  intstack+0x2c88  0xffe9d2bc
    intstack+0x2c88:
            ffea0d24        5               ffe94f08        ffef0040
            f81200a8        ffeaee58        ffefefc8        ffefebdc
            ffea7b14        119006c6        f814fc00        1
            0               f8280a54        f8148ce8        f814343c
    intstack+0x2c88:
            0xffea0d24      5               0xffe94f08      0xffef0040
            _vx_handler     0xffeaee58      0xffefefc8      0xffefebdc
            0xffea7b14      0x119006c6      intu+0x47c      1
            0               0xf8280a54      intstack+0x2ce8  _prom_enter_mon+0xc
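Notice how the frames link together: the %i6 (frame pointer) saved in each frame is the stack address of the next frame printed. The frame at intstack+0x2ba8 holds intstack+0x2c08 (raw f8148c08), which in turn holds intstack+0x2c88, and so on, while %i7 holds the return address (strictly, the address of the call instruction; execution resumes 8 bytes later). That appears to be how $<stacktrace walks the stack. A minimal C sketch of the walk, assuming the SPARC V8 register-window save area layout (%l0 to %l7, then %i0 to %i7, stored at the frame's stack pointer) and a hypothetical in-memory table, struct saved_frame, standing in for the crash dump:

```c
#include <stddef.h>
#include <stdint.h>

/* SPARC V8 register-window save area: 16 words stored at the frame's
   stack pointer, locals first, in the order $<stacktrace prints them. */
struct sparc_frame {
    uint32_t l[8];      /* %l0-%l7 */
    uint32_t i[8];      /* %i0-%i7; i[6] = %fp, i[7] = return address */
};

/* Hypothetical stand-in for the crash dump: a frame saved at `addr`. */
struct saved_frame {
    uint32_t addr;
    struct sparc_frame regs;
};

static const struct sparc_frame *
read_frame(const struct saved_frame *mem, size_t nmem, uint32_t addr)
{
    for (size_t k = 0; k < nmem; k++)
        if (mem[k].addr == addr)
            return &mem[k].regs;
    return NULL;
}

/* Follow the %i6 chain from `sp`, storing each frame's %i7 in ret[].
   Stops at a zero frame pointer, an address not in the dump, or after
   `max` frames. Returns the number of frames visited. */
static size_t walk_frames(uint32_t sp, const struct saved_frame *mem,
                          size_t nmem, uint32_t *ret, size_t max)
{
    size_t n = 0;

    while (sp != 0 && n < max) {
        const struct sparc_frame *f = read_frame(mem, nmem, sp);

        if (f == NULL)
            break;
        ret[n++] = f->i[7];
        sp = f->i[6];
    }
    return n;
}
```

The max parameter matters: a corrupted frame pointer could otherwise send the walk around in a loop.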

The return address for this next frame (the last value printed, or %i7) is in kbdinput(), which looks like a keyboard input function.

    intstack+0x2ce8:
            119006c6        f810b670        f810b674        f8005f28
            a               f8005c00        28              f8148ca0
            0               f8128410        4d              13
            13              f8172c00        f8148d48        f8128378
    intstack+0x2ce8:
            0x119006c6      _usec_delay+0x14  _usec_delay+0x18  level10
            0xa             int_rtt         0x28            intstack+0x2ca0
            0               _kbdinput+0x608  0x4d           0x13
            0x13            _monthsec       intstack+0x2d48  _kbdinput+0x570
    intstack+0x2d48:
            f8181c00        f8181c00        f804734c        80
            0               fd83b920        f8175afc        fd83b920
            f8175afc        4d              4d              0
            11f33d          1               f8148dc0        f8127dc8
    intstack+0x2d48:
            _strevent+0xe38  _strevent+0xe38  _xdballoc+4   0x80
            0               0xfd83b920      _keyindex_s4    0xfd83b920
            _keyindex_s4    0x4d            0x4d            0
            0x11f33d        1               intstack+0x2dc0  _kbdrput+0x168
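The second, symbolic copy of each frame comes from looking each value up in the kernel symbol table and printing it as the nearest preceding symbol plus an offset. The offsets in the dump let us work backward to symbol addresses: 0xf8128378 is printed as _kbdinput+0x570, so _kbdinput is at 0xf8127e08, and 0xf8127dc8 is _kbdrput+0x168, putting _kbdrput at 0xf8127c60. A C sketch of the lookup follows; the names are hypothetical, the symbol table is assumed to be sorted by address, and max_off is an assumed cutoff (adb evidently leaves values such as 0xfd83b920 in plain hex, but its exact rule is not shown here).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct symbol {
    uint32_t addr;
    const char *name;
};

/* Find the symbol at or below addr in syms[] (sorted by address).
   Returns NULL if there is none, or if addr lies more than max_off
   bytes past it; otherwise stores the offset in *off. */
static const struct symbol *
nearest_symbol(uint32_t addr, const struct symbol *syms, size_t n,
               uint32_t max_off, uint32_t *off)
{
    const struct symbol *best = NULL;

    for (size_t k = 0; k < n && syms[k].addr <= addr; k++)
        best = &syms[k];
    if (best == NULL || addr - best->addr > max_off)
        return NULL;
    *off = addr - best->addr;
    return best;
}

/* Format addr as "name", "name+0xoff", or plain hex when no symbol
   fits. Returns the snprintf() result (length of the full string). */
static int symbolize(uint32_t addr, const struct symbol *syms, size_t n,
                     uint32_t max_off, char *buf, size_t len)
{
    uint32_t off = 0;
    const struct symbol *s = nearest_symbol(addr, syms, n, max_off, &off);

    if (s == NULL)
        return snprintf(buf, len, "0x%lx", (unsigned long)addr);
    if (off == 0)
        return snprintf(buf, len, "%s", s->name);
    return snprintf(buf, len, "%s+0x%lx", s->name, (unsigned long)off);
}
```

With a table holding _kbdrput at 0xf8127c60 and _kbdinput at 0xf8127e08, symbolize(0xf8128410, ...) produces _kbdinput+0x608, matching the dump.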

The following frame returns to zsa_process() . The ZS (Zilog Serial) chip is the hardware that handles the input characters from the keyboard.

    intstack+0x2dc0:
            13              0               72              1
            0               f8189400        0               1
            fd818f68        fd8195e0        0               fd83b900
            fd8195e0        118006e0        f8148e20        f812ebac
    intstack+0x2dc0:
            0x13            0               0x72            1
            0               _intrcnt+0xc0   0               1
            0xfd818f68      0xfd8195e0      0               0xfd83b900
            0xfd8195e0      0x118006e0      intstack+0x2e20  _zsa_process+0x228
    intstack+0x2e20:
            1               fd818f69        3f7a0           4d
            6ce             fd817050        fd801920        1
            fd8018d0        1               fd8195e0        fd801b70
            1               0               f8148e80        f812e8e4
    intstack+0x2e20:
            1               0xfd818f69      0x3f7a0         0x4d
            0x6ce           0xfd817050      0xfd801920      1
            0xfd8018d0      1               0xfd8195e0      0xfd801b70
            1               0               intstack+0x2e80  _zspoll+0x58

Keyboard input is actually interpreted on every clock tick, as the return addresses in the next two frames, in softclock() and hardclock(), show.

    intstack+0x2e80:
            1               885             4               bc
            14              c               f8189000        14
            0               0               f814fc00        110001e2
            0               fd8018d0        f8148ee0        f803c658
    intstack+0x2e80:
            1               0x885           4               0xbc
            0x14            0xc             _ktextseg+8     0x14
            0               0               intu+0x47c      0x110001e2
            0               0xfd8018d0      intstack+0x2ee0  _softclock+0x80
    intstack+0x2ee0:
            f8180400        20              f8180400        20
            80000000        40              0               a
            114000c5        f812e88c        0               0
            114001e3        f8280c64        f8148f40        f803c4dc
    intstack+0x2ee0:
            _rtable+0xa0    0x20            _rtable+0xa0    0x20
            0x80000000      0x4             0               0xa
            0x114000c5      _zspoll         0               0
            0x114001e3      0xf8280c64      intstack+0x2f40  _hardclock+0x564

The return address in the next frame is to an assembly language function, level10() , corresponding to the actual interrupt handler for the clock tick.

    intstack+0x2f40:
            f8177ab0        2710            57e40           57e40
            f4240           f8189400        0               1
            f810b938        114000c5        f7662914        1
            f7662914        f8280c64        f8148fa0        f8005f80
    intstack+0x2f40:
            _time           0x2710          0x57e40         0x57e40
            0xf4240         _intrcnt+0xc0   0               1
            _idle+0x34      0x114000c5      0xf7662914      1
            0xf7662914      0xf8280c64      intstack+0x2fa0  level10+0x58

This next frame is the first one that gets put on the interrupt stack.


    intstack+0x2fa0:
            11400ac5        f810b938        f810b93c        f8005f28
            a               f8005c00        28              f8420cf8
            11800aa6        3e28            0               0
            0               38              f8420da0        f8043c88
    intstack+0x2fa0:
            0x11400ac5      _idle+0x34      _idle+0x38      level10
            0xa             int_rtt         0x28            0xf8420cf8
            0x11800aa6      0x3e28          0               0
            0               0x38            0xf8420da0      _setsigvec+0x1c8

The return address for the next frame is in sleep() , on the regular stack. It looks like this process was giving up the CPU, if it was, in fact, active at all.

    0xf8420da0:
            118000a6        0               0               100
            f8421000        0               ffffdfff        0
            11800aa7        f82720fc        f814fc00        0
            3               1b              f8420e00        f8045c6c
    0xf8420da0:
            0x118000a6      0               0               0x100
            0xf8421000      0               0xffffdfff      0
            0x11800aa7      0xf82720fc      intu+0x47c      0
            3               0x1b            0xf8420e00      _sleep+0x1a0
    0xf8420e00:
            3d              f8177400        f8180bb0        0
            1               f8421000        60d27           0
            f8180bb0        28              f826fb88        0
            28              f826fb88        f8420e60        f8043dc4
    0xf8420e00:
            0x3d            _utsname+0x38   _slpque         0
            1               0xf8421000      0x60d27         0
            _slpque         0x28            0xf826fb88      0
            0x28            0xf826fb88      0xf8420e60      _sigpause+0x4c
    0xf8420e60:
            0               fffefeff        f8421000        f826fb88
            0               f826fb88        0               f76e2d14
            f8420fe0        378             f815b640        f815b9b8
            f8421000        f815b9b8        f8420ec0        f8124410
    0xf8420e60:
            0               0xfffefeff      0xf8421000      0xf826fb88
            0               0xf826fb88      0               0xf76e2d14
            0xf8420fe0      0x378           _sysent         _sysent+0x378
            0xf8421000      _sysent+0x378   0xf8420ec0      _syscall+0x3b4
    0xf8420ec0:
            f8421000        f8421000        f826fb88        f815b9b8
            f8421000        f826fb88        0               f76e721c
            f8421000        f8420fb4        f8420fe0        f8421000
            f8421000        f8420fb4        f8420f58        f8005a54
    0xf8420ec0:
            0xf8421000      0xf8421000      0xf826fb88      _sysent+0x378
            0xf8421000      0xf826fb88      0               0xf76e721c
            0xf8421000      0xf8420fb4      0xf8420fe0      0xf8421000
            0xf8421000      0xf8420fb4      0xf8420f58      syscall+8
    0xf8420f58:
            11400082        f76e2894        f76e2898        20
            80              4               7               f8420f58
            0               f7fff380        0               0
            f774f788        0               f7fff2f8        f76f965c
    0xf8420f58:
            0x11400082      0xf76e2894      0xf76e2898      0x20
            0x80            4               7               0xf8420f58
            0               0xf7fff380      0               0
            0xf774f788      0               0xf7fff2f8      0xf76f965c

The next frame and those following appear to be user-level stack frames. Note that the stack address is now in user space, and the return address is quite small: a valid user code address, but certainly not an address in kernel space.

    0xf7fff2f8:
            ffffdfff        f76629cc        3               c
            1               fd801c4c        ff003000        f774c05c
            0               0               f7662914        0
            f7662914        0               f7fff390        29a4
    0xf7fff2f8:
            0xffffdfff      0xf76629cc      3               0xc
            1               0xfd801c4c      0xff003000      0xf774c05c
            0               0               0xf7662914      0
            0xf7662914      0               0xf7fff390      0x29a4
    0xf7fff390:
            82bac           1               2               1
            f75e0000        f766572c        1               f800
            f75e0000        82914           f7662914        f800
            f7662914        14              f7fff430        22b4
    0xf7fff390:
            0x82bac         1               2               1
            0xf75e0000      0xf766572c      1               0xf800
            0xf75e0000      0x82914         0xf7662914      0xf800
            0xf7662914      0x14            0xf7fff430      0x22b4
    0xf7fff430:
            11400080        f773a0d8        f773a0dc        0
            4               4               7               f8420f58
            3               f7fff4f4        f7fff504        c000
            0               0               f7fff490        2064
    0xf7fff430:
            0x11400080      0xf773a0d8      0xf773a0dc      0
            4               4               7               0xf8420f58
            3               0xf7fff4f4      0xf7fff504      0xc000
            0               0               0xf7fff490      0x2064
    0xf7fff490:
            0               0               0               0
            0               0               0               0
            0               0               0               0
            0               0               0               0
    0xf7fff490:
            0               0               0               0
            0               0               0               0
            0               0               0               0
            0               0               0               0

And finally the stack runs out with no more data on it.
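The kernel/user distinction drawn above comes down to comparing addresses against the start of kernel space. In the sketch below, the boundary of 0xf8000000 is an assumption inferred from this dump (every kernel stack, text, and data address shown is at or above it, and the user stack frames from 0xf7fff2f8 up sit just below it), not a documented constant; the names are hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: kernel space begins at 0xf8000000 on this machine,
   inferred from the addresses in this dump rather than from a header. */
#define ASSUMED_KERNELBASE 0xf8000000u

static int is_kernel_address(uint32_t addr)
{
    return addr >= ASSUMED_KERNELBASE;
}

/* A frame is a user-level frame when both its stack address and its
   saved return address (%i7) lie below kernel space. */
static int is_user_frame(uint32_t frame_addr, uint32_t return_addr)
{
    return !is_kernel_address(frame_addr) && !is_kernel_address(return_addr);
}
```

By this test the frames at 0xf7fff2f8 (return address 0x29a4) and 0xf7fff390 (0x22b4) are user frames, while the frame at 0xf8420f58 is not: it is on the kernel stack even though its %i7 points into user code.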

Take a look at the frame that was the first one on the interrupt stack. This should be a trap frame, a special frame used for traps (including system calls) and interrupts. It contains the address of the interrupted instruction in register %l1, the second number in the frame. Examining that frame, you see:

 intstack+0x2fa0:         0x11400ac5       _idle+0x34      _idle+0x38 

The actual address, which indicates where the CPU was executing when the clock ticked, appears to be in the kernel function called idle(). That is not exactly an indication that the system was working very hard. So even though there are a number of runnable processes in the ps output, it appears that at the moment of the L1-A keystroke nothing was actually happening. This implies that the D-state processes are in fact locked up, waiting for a resource that isn't available.
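Pulling the interrupted PC out of a trap frame is straightforward. On SPARC, taking a trap writes the PC and the next PC (nPC) into %l1 and %l2 of the new register window, which is why _idle+0x34 and _idle+0x38 appear side by side; nPC is normally PC + 4 unless the trapped instruction sat in the delay slot of a taken branch. The offsets also place _idle at 0xf810b904. A C sketch with hypothetical names; the locals are copied from the frame at intstack+0x2fa0, and the end address of idle(), which the dump does not show, must be supplied by the caller.

```c
#include <stdint.h>

/* Locals of a SPARC trap frame. On a trap the hardware writes the
   interrupted PC into %l1 and the next PC (nPC) into %l2. */
struct trap_locals {
    uint32_t l[8];
};

static uint32_t interrupted_pc(const struct trap_locals *t)  { return t->l[1]; }
static uint32_t interrupted_npc(const struct trap_locals *t) { return t->l[2]; }

/* nPC is PC + 4 for ordinary sequential execution; it differs when the
   trapped instruction sat in the delay slot of a taken branch. */
static int was_sequential(const struct trap_locals *t)
{
    return interrupted_npc(t) == interrupted_pc(t) + 4;
}

/* Is the interrupted PC within [func_start, func_end)? The bounds
   would come from the symbol table; here the caller supplies them. */
static int pc_in_function(const struct trap_locals *t,
                          uint32_t func_start, uint32_t func_end)
{
    uint32_t pc = interrupted_pc(t);

    return pc >= func_start && pc < func_end;
}
```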

All the blocked processes were waiting for anonymous (swap) memory, which appeared to be related to or obtained from UFS disk blocks. One feature of Solaris is the ability to define a swap file (a regular UNIX file) that can be used as additional swap space. It seems that the problem this system encountered is related to this feature.



PANIC. UNIX System Crash Dump Analysis Handbook
ISBN: 0131493868
Year: 1994
Pages: 289
Authors: Chris Drake
