Process status | PANIC! UNIX System Crash Dump Analysis Handbook (Bk/CD-ROM)

Now that we've seen that the core dump appears to match the circumstances, we can continue and look at things with some confidence. Since this is a hang, one of the first steps should be to look at the processes on the machine and see if any are stuck.

With Solaris 1 systems, we can use the ps command to look at a core file, then check for processes in a "D" state ("disk-wait"). If the system is locked up due to a lack of resources or some sort of deadlock, the D-state processes are likely candidates.

Some of the processes in the following ps and adb traceall output have been deleted for space reasons, so this listing is a little short. Yours may well look a lot longer, especially if the system was busy at the time.

    zatch 13:  ps -axk vmunix.2 vmcore.2
      PID TT STAT  TIME COMMAND
        0 ?  D     8:03 swapper
        1 ?  I     4:59 /sbin/init -
        2 ?  D     0:20 pagedaemon
       55 ?  RW    2:46 portmap
       60 ?  IW    0:00 keyserv
       69 ?  I     1:13 in.routed
       72 ?  I     0:03  (biod)
       73 ?  I     0:03  (biod)
       74 ?  I     0:03  (biod)
       75 ?  I     0:03  (biod)
       86 ?  RW    0:01 syslogd
       98 ?  RW    0:08 /usr/lib/sendmail -bd -q1h
      104 ?  IW    0:01 rpc.mountd -n
      106 ?  I     0:00  (nfsd)
      107 ?  I     0:00  (nfsd)
      108 ?  I     0:00  (nfsd)
      109 ?  IW    0:01 rarpd -a
      110 ?  IW    0:00 rarpd -a
      111 ?  I     0:00  (nfsd)
      113 ?  I     0:00  (nfsd)
      114 ?  I     0:00  (nfsd)
      115 ?  I     0:00  (nfsd)
      116 ?  I     0:00  (nfsd)
      117 ?  IW    0:00 rpc.bootparamd
      120 ?  IW    0:00 rpc.statd
      122 ?  IW    0:00 rpc.lockd
      128 ?  RW   18:25 automount -m -f /etc/auto.master
      132 ?  D   532:35 update
      135 ?  RW    0:02 cron
      141 ?  RW    0:54 inetd
      145 ?  IW    0:00 /usr/lib/lpd
     5346 co IW    0:00 -csh (csh)
     5357 co IW    0:00 sh /home/sat/.openwin
     5358 co IW    0:00 /bin/sh /usr/openwin/bin/openwin -dev /dev/fb
     5362 co IW    0:00 /usr/openwin/bin/xinit -- /usr/openwin/bin/xnews :0
     5363 co D    67:05 /usr/openwin/bin/xnews :0 -dev /dev/fb
     5370 co IW    0:00 sh /home/sat/.xinitrc
     5382 co IW    0:04 mwm -multiscreen -display :0.0 :0.1 :0.2
     5385 co IW    0:00 calctool -display :0.1 -title CALC -scale medium
     5387 co IW    0:00 sv_xv_sel_svc
     5388 co IW    0:00 vkbd -nopopup
     5389 co IW    0:00 dsdm
     5391 co IW    0:00 calctool -display :0.2 -title CALC -scale medium
     5393 co IW    0:00 calctool -display :0.0 -title CALC -scale medium
     5394 co IW    0:00 xterm -C -name SystemConsole -title System Console -sb
     5408 co IW    1:36 xterm -sb -font -misc-*-medium-r-normal-*-15-120-*-*-*-*-*
      148 ?  IW    0:00 - std.9600 ttya (getty)
     5398 ?  IW    0:00 -sh (csh)
     5409 ?  IW    0:00 -sh (csh)
          ?  RW    0:00 sleep 30
     4434 ?  R     0:25 gendis second
     4435 ?  I     0:30 recdis second
     4436 ?  R     0:53 gendis first
     4437 ?  I     1:04 recdis first
     5436 ?  IW    0:00 csh go
     5438 ?  IW    0:00 /bin/csh go
     5440 ?  I     3:19 ranger/trcque go.out
     5487 ?  IW    0:30 /bin/sh ps_repeat -p mwm -d 30
     5489 ?  R     0:02 wsini
     5497 ?  IW    0:00 wprntfhost .0
     5498 ?  IW    0:00 wprntfhost .1
     5499 ?  IW    0:00 wprntfhost .2
     5500 ?  R     0:04 csupv primary FICC_PRIMARY_MMF
     5502 ?  RW    0:01 resync
     5503 ?  Z     0:00 <defunct>
     5504 ?  R     0:01 icon32
     5505 ?  R     0:05 colorhost .0
     5507 ?  R     0:05 colorhost .1
     5508 ?  R     0:05 colorhost .2
     5513 ?  IW    0:00 wprntfhost .1
     5514 ?  IW    0:00 wprntfhost .2
     5515 ?  IW    0:00 wprntfhost .0
     5516 ?  IW    0:00 colorhost .0
     5517 ?  IW    0:00 colorhost .2
     5518 ?  I     0:30 colorhost .0
     5519 ?  I     0:33 colorhost .2
     5520 ?  IW    0:00 colorhost .1
     5521 ?  I     0:33 colorhost .1
     5522 ?  I    27:11 keys .0
     5523 ?  R     0:06 navwin .0
     5524 ?  D    23:48 crtout .0 -sync
     5525 ?  R     0:05 navwin .1
     5526 ?  R     0:14 crtout .1 -sync
     5527 ?  R     0:05 navwin .2
     5528 ?  R     0:14 crtout .2 -sync
     5529 ?  D     0:14 taskmon
          ?  RW    0:00 sleep 30
          ?  D     0:00 play -v 25 dialtone.au
          ?  D     0:00 <exiting>

Let's note a few things about this ps output.

First of all, there are a few processes in D state, but not a lot. However, one of them is the OpenWindows server (xnews, PID 5363), which might account for the console freezing up. We also have the swapper and pagedaemon in D state. Although this is fairly normal, it may be worth looking at the stack traceback to see why they are stopped.
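When the listing is long, picking out the D-state rows by eye gets tedious. The following is a minimal sketch, not part of the original toolset, that assumes the ps output has been saved as text in the column layout shown above. The names `PS_OUTPUT`, `parse_ps`, and `d_state` are made up for this illustration, and the sample is only an excerpt of the listing.

```python
# Excerpt of the ps listing, saved as text. The last row has no PID
# because the PID is missing in the listing itself.
PS_OUTPUT = """\
  PID TT STAT  TIME COMMAND
    0 ?  D     8:03 swapper
    1 ?  I     4:59 /sbin/init -
    2 ?  D     0:20 pagedaemon
  132 ?  D   532:35 update
 5363 co D    67:05 /usr/openwin/bin/xnews :0 -dev /dev/fb
 5382 co IW    0:04 mwm -multiscreen -display :0.0 :0.1 :0.2
 5524 ?  D    23:48 crtout .0 -sync
 5526 ?  R     0:14 crtout .1 -sync
 5529 ?  D     0:14 taskmon
      ?  D     0:00 <exiting>
"""


def parse_ps(text):
    """Return (pid, tt, stat, time, command) tuples for rows after the header."""
    rows = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        pid = None
        if fields and fields[0].isdigit():
            pid = int(fields[0])
            fields = fields[1:]
        if len(fields) < 4:
            continue  # not a process row
        tt, stat, cpu_time = fields[:3]
        rows.append((pid, tt, stat, cpu_time, " ".join(fields[3:])))
    return rows


# Assumed filter: any STAT beginning with "D" (this would also catch "DW").
d_state = [row for row in parse_ps(PS_OUTPUT) if row[2].startswith("D")]

for pid, tt, stat, cpu_time, command in d_state:
    label = "?" if pid is None else str(pid)
    print(f"{label:>5} {stat:<4} {cpu_time:>6} {command}")
```

Rows without a PID are kept rather than dropped, since a process caught in the middle of exiting may be exactly the one we care about.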

There is nothing at all in DW state, which could mean that the system froze up, and before it had a chance to swap everything out, the alert user at the console decided to kill the system and get a crash. It might also mean that the freeze was related to swapping, and nothing could get swapped in or out.

There are also quite a few processes in R or RW state. These are runnable programs. The relatively high number makes it look like the system was fairly active at the time of the freeze. However, we have no way of knowing how long those programs were in this state. If a program is waiting, say, for input from the network and the machine crashes, the network does not stop. Even though the panic function is writing out the contents of memory onto disk, a chunk of data may arrive from the Ethernet and cause a process to be "awakened," or put back on the run queue. Interrupts are not disabled, so some of these runnable processes may have become runnable during the time the system was trying to shut itself down.
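To put a number on "quite a few," we can tally the STAT column. This is a rough sketch under the same assumption as before, namely that the ps output has been saved as text; `PS_OUTPUT`, `stat_of`, `counts`, and `runnable` are names invented for the example, and the sample is only an excerpt of the listing.

```python
from collections import Counter

# Excerpt of the ps listing; the last row has no PID in the listing itself.
PS_OUTPUT = """\
  PID TT STAT  TIME COMMAND
   55 ?  RW    2:46 portmap
   60 ?  IW    0:00 keyserv
   86 ?  RW    0:01 syslogd
  132 ?  D   532:35 update
 4434 ?  R     0:25 gendis second
 4435 ?  I     0:30 recdis second
 4436 ?  R     0:53 gendis first
 5503 ?  Z     0:00 <defunct>
      ?  RW    0:00 sleep 30
"""


def stat_of(line):
    """Return the STAT field of one ps row, or None if the row is malformed."""
    fields = line.split()
    if fields and fields[0].isdigit():
        fields = fields[1:]  # drop the PID when it is present
    # Remaining layout: TT, STAT, TIME, COMMAND...
    return fields[1] if len(fields) >= 4 else None


stats = (stat_of(line) for line in PS_OUTPUT.splitlines()[1:])
counts = Counter(s for s in stats if s)

# R and RW are both runnable; the W only marks a swapped-out process.
runnable = sum(n for s, n in counts.items() if s.startswith("R"))

for stat, n in sorted(counts.items()):
    print(f"{stat:<4} {n}")
print("runnable:", runnable, " DW:", counts["DW"])
```

Keep in mind that the tally is a snapshot of the dump, so it inherits the caveat above: some of the runnable processes may have been awakened while the system was already on its way down.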

We should also note that this is obviously a networked system: we have four biod processes running (so this is an NFS client) and eight nfsds running (so this is an NFS server). None of these is in IW or D state, which suggests that the network has been fairly busy, since all of these processes were active recently and none of them seems to be stuck. To be sure, though, we should check the stack traces for these.
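The same check on the NFS daemons can be done mechanically. Here is a small sketch, again assuming the ps output is available as text; the names `PS_OUTPUT`, `daemons`, and `suspect` are hypothetical, and the sample holds only the relevant rows from the listing.

```python
# Rows for the NFS daemons (plus one unrelated row) from the ps listing.
PS_OUTPUT = """\
  PID TT STAT  TIME COMMAND
   72 ?  I     0:03  (biod)
   73 ?  I     0:03  (biod)
   74 ?  I     0:03  (biod)
   75 ?  I     0:03  (biod)
  104 ?  IW    0:01 rpc.mountd -n
  106 ?  I     0:00  (nfsd)
  107 ?  I     0:00  (nfsd)
  108 ?  I     0:00  (nfsd)
  111 ?  I     0:00  (nfsd)
  113 ?  I     0:00  (nfsd)
  114 ?  I     0:00  (nfsd)
  115 ?  I     0:00  (nfsd)
  116 ?  I     0:00  (nfsd)
"""

daemons = {}  # daemon name -> list of (pid, stat)
for line in PS_OUTPUT.splitlines()[1:]:
    fields = line.split()
    if len(fields) < 5 or not fields[0].isdigit():
        continue  # skip rows without a PID or a command
    pid, stat, command = int(fields[0]), fields[2], fields[4]
    if command in ("(biod)", "(nfsd)"):
        daemons.setdefault(command.strip("()"), []).append((pid, stat))

# Flag daemons that are swapped out while idle (IW) or in disk wait (D, DW).
suspect = [
    (name, pid, stat)
    for name, procs in daemons.items()
    for pid, stat in procs
    if stat == "IW" or stat.startswith("D")
]

for name, procs in daemons.items():
    print(f"{name}: {len(procs)} processes")
print("suspect:", suspect)
```

An empty `suspect` list only tells us what ps tells us; the stack traces remain the real test of whether a daemon is stuck.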