Get initial information


Since we have four crashes, our approach this time is to gather some initial information from each crash dump and see whether we are dealing with one type of problem or several.

    Hiya on zen... cd /var/crash/zen
    Hiya on zen... ls -l
    total 45777
    -rw-rw-rw-  1 root            3 Oct  2 03:26 bounds
    -rw-r--r--  1 root     10248192 Jul 25  1994 vmcore.11
    -rw-r--r--  1 root     10436608 Aug 10 08:43 vmcore.12
    -rw-r--r--  1 root      9052160 Sep 26 07:49 vmcore.13
    -rw-r--r--  1 root      9732096 Oct  2 03:26 vmcore.14
    -rw-r--r--  1 root      1832133 Jul 25  1994 vmunix.11
    -rw-r--r--  1 root      1832133 Aug 10 08:43 vmunix.12
    -rw-r--r--  1 root      1832133 Sep 26 07:49 vmunix.13
    -rw-r--r--  1 root      1832133 Oct  2 03:26 vmunix.14
    Hiya on zen...

The first thing we can assume, based on the time stamps on the crash dump files, is that the problem doesn't appear to occur on a regular basis.

Let's look at crash 11.

    Hiya on zen... adb -k vmunix.11 vmcore.11
    physmem 1ff8
    hostname/s
    _hostname:
    _hostname:      zen
    *boottime=Y
                    1994 May 31 11:13:37
    *time=Y
                    1994 Jul 23 03:31:14
    $c
    _panic(0xf8174abb,0xf817e784,0xf817e798,0x283,0xa000,0xf84c6b98) + 6c
    _assfail(0xf817e784,0xf817e798,0x283,0xfcfdbdb0,0x1,0xf83817a8) + 3c
    _pvn_vplist_dirty(0xfcfdbdb0,0x0,0x0,0xf83817a8,0xf8359758,0xfcfdbdb0) + 250
    _ufs_putpage(0xfcfdbdb0,0xfcfdbda8,0x0,0x0,0x0,0xf82f43d8) + 33c
    _ufs_l_putpage(0xfcfdbdb0,0x0,0x0,0x0,0x0,0x0) + 30
    _syncip(0xfcfdbda8,0x0,0x0,0x7,0x1101,0x0) + 124
    _update(0xf81a7900,0x104e,0xfcfdbda8,0xfcf1e808,0x0,0xfcfdbda8) + 2f0
    _ufs_sync(0x0,0xf8175ac8,0xf817d820,0xf81a5c48,0xf8175ac8,0xfce792c4) + 4
    _sync(0xf84c6fe0,0x120,0xf8173468,0xf8173588,0xf84c7000,0xf8175a88) + 3c
    _syscall(0xf84c7000) + 3b4
    *panicstr/s
    _prtmsgbuflines+0x87:           assertion failed
    $<msgbuf
    0xf8002000:     magic           size            bufx            bufr
                    63062           1bf0            af5             8e5
    0xf80028f5:     id010e: block 134678(134678 abs): read: Uncorrectable Data Check
                    assertion failed: pp->p_vnode == NULL, file: ../../vm/vm_pvn.c,
                    line: 643
                    panic: assertion failed
                    syncing file systems... done
                    00353 low-memory static kernel pages
                    00558 additional static and sysmap kernel pages
                    00000 dynamic kernel data pages
                    00223 additional user structure pages
                    00000 segmap kernel pages
                    00000 segvn kernel pages
                    00002 current user process pages
                    00112 user stack pages
                    01248 total pages (1248 chunks)
                    dumping to vp fce25f4c, offset 239514
    $q
    Hiya on zen...

The system had been up nearly two months before crashing.
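The uptime figure can be checked directly from the two `=Y` values that adb printed. The following Python is a minimal sketch and not part of the original session: `parse_adb_date` and `uptime` are hypothetical helper names, and the parser assumes the `=Y` output always has the form `YYYY Mon DD HH:MM:SS` seen above.

```python
from datetime import datetime

# Month abbreviations as they appear in the adb =Y output.
MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}


def parse_adb_date(text):
    """Parse a date such as '1994 May 31 11:13:37' (hypothetical helper)."""
    # split() with no argument also copes with the padded day in 'Sep  7'.
    year, month, day, clock = text.split()
    hour, minute, second = (int(part) for part in clock.split(":"))
    return datetime(int(year), MONTHS[month], int(day), hour, minute, second)


def uptime(boottime, crashtime):
    """Return how long the system ran between boot and panic."""
    return parse_adb_date(crashtime) - parse_adb_date(boottime)


# Values copied from the crash 11 session.
crash11_uptime = uptime("1994 May 31 11:13:37", "1994 Jul 23 03:31:14")
print(crash11_uptime)
```

For crash 11 the result is a little under 53 days, which agrees with "nearly two months."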

Notice the panic messages in the message buffer. These are a good example of verbose panic messages that include information about where to find the call to panic() in the source code. They may be useless to some of us, but to those with the source code, they are a helpful aid in problem diagnosis.
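One detail in the stack trace lines up with the message buffer: the third argument to `_assfail` is `0x283`, and the assertion message reports line 643. The short Python check below is an illustration, not something from the original session, and shows that the two are the same number. That the argument really carries the source line number is an inference from this match, not something the dump itself states.

```python
# Third argument to _assfail in the $c output, and the line number
# printed in the "assertion failed" message.
ASSFAIL_THIRD_ARG = "0x283"
REPORTED_LINE = 643

# int() with base 16 accepts the "0x" prefix.
arg_as_decimal = int(ASSFAIL_THIRD_ARG, 16)

print(arg_as_decimal)
print(arg_as_decimal == REPORTED_LINE)
```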

Let's look at crash 12.

    Hiya on zen... adb -k vmunix.12 vmcore.12
    physmem 1ff8
    hostname/s
    _hostname:
    _hostname:      zen
    *boottime=Y
                    1994 Jul 25 08:07:30
    *time=Y
                    1994 Aug 10 03:23:05
    $c
    _panic(0xf8174abb,0xf817e784,0xf817e798,0x283,0xa000,0xf84ccb98) + 6c
    _assfail(0xf817e784,0xf817e798,0x283,0xfd0cd5a8,0x1,0xf8385088) + 3c
    _pvn_vplist_dirty(0xfd0cd5a8,0x0,0x0,0xf8385088,0xf8386b40,0xfd0cd5a8) + 250
    _ufs_putpage(0xfd0cd5a8,0xfd0cd5a0,0x0,0x0,0x0,0xf82f607c) + 33c
    _ufs_l_putpage(0xfd0cd5a8,0x0,0x0,0x0,0x0,0x0) + 30
    _syncip(0xfd0cd5a0,0x0,0x0,0x4e,0x1101,0x0) + 124
    _update(0xf81a7900,0x104e,0xfd0cd5a0,0xfcfaf218,0x0,0xfd0cd5a0) + 2f0
    _ufs_sync(0x0,0xf8175ac8,0xf817d820,0xf81a5c48,0xf8175ac8,0xfcea1d44) + 4
    _sync(0xf84ccfe0,0x120,0xf8173468,0xf8173588,0xf84cd000,0xf8175a88) + 3c
    _syscall(0xf84cd000) + 3b4
    *panicstr/s
    _prtmsgbuflines+0x87:           assertion failed
    $<msgbuf
    0xf8002000:     magic           size            bufx            bufr
                    63062           1bf0            241             31
    0xf8002041:     id010e: block 134678(134678 abs): read: Uncorrectable Data Check
                    assertion failed: pp->p_vnode == NULL, file: ../../vm/vm_pvn.c,
                    line: 643
                    panic: assertion failed
                    syncing file systems... done
    (Remainder of output trimmed)

We see that this is very much the same as crash 11.

Take a look at crash 11's crash time and crash 12's boot time. What does this tell us? The system was probably unable to reboot itself successfully without operator intervention. It appears that Zen was down until someone came into the office on Monday morning, July 25th.
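The weekend reading of these two dates is easy to confirm with a calendar calculation. The Python below is a small sketch written for this purpose (`day_name` is a hypothetical helper); it only looks up the weekday of the crash 11 panic and the boot recorded in crash 12.

```python
from datetime import date

# date.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_name(year, month, day):
    """Return the English weekday name for a calendar date."""
    return DAY_NAMES[date(year, month, day).weekday()]


crash11_day = day_name(1994, 7, 23)   # *time=Y from crash 11
boot12_day = day_name(1994, 7, 25)    # *boottime=Y from crash 12
print(crash11_day, "->", boot12_day)
```

The panic fell on a Saturday and the next recorded boot on the following Monday.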

So, we now have two crashes. Both involve id010e. What is id010e? That is one of the IPI disk drives. Both crashes specifically report problems with block 134678 of id010e.

What will crash 13 tell us? Let's see.

    Hiya on zen... adb -k vmunix.13 vmcore.13
    physmem 1ff8
    hostname/s
    _hostname:
    _hostname:      zen
    *boottime/Y
    data address not found
    *boottime=Y
                    1994 Sep  7 11:58:24
    *time=Y
                    1994 Sep 26 03:21:01
    $c
    _panic(0xf8174abb,0xf817e784,0xf817e798,0x283,0xa000,0xf84c6b98) + 6c
    _assfail(0xf817e784,0xf817e798,0x283,0xfce7af78,0x1,0xf8355bf8) + 3c
    _pvn_vplist_dirty(0xfce7af78,0x0,0x0,0xf8355bf8,0xf8355c20,0xfce7af78) + 250
    _ufs_putpage(0xfce7af78,0xfce7af70,0x0,0x0,0x0,0xf82f5b58) + 33c
    _ufs_l_putpage(0xfce7af78,0x0,0x0,0x0,0x0,0x0) + 30
    _syncip(0xfce7af70,0x0,0x0,0x7,0x1101,0x0) + 124
    _update(0xf81a7900,0x104e,0xfce7af70,0xfcf9c6a8,0x0,0xfce7af70) + 2f0
    _ufs_sync(0x0,0xf8175ac8,0xf817d820,0xf81a5c48,0xf8175ac8,0xfceb206c) + 4
    _sync(0xf84c6fe0,0x120,0xf8173468,0xf8173588,0xf84c7000,0xf8175a88) + 3c
    _syscall(0xf84c7000) + 3b4
    $<msgbuf
    0xf8002000:     magic           size            bufx            bufr
                    63062           1bf0            297             87
    0xf8002097:     id010e: block 134678(134678 abs): read: Uncorrectable Data Check
                    assertion failed: pp->p_vnode == NULL, file: ../../vm/vm_pvn.c,
                    line: 643
                    panic: assertion failed
    (Remainder of output trimmed)

This certainly looks like the other crashes. Another problem with block 134678 on drive id010e.
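When there are more than a handful of dumps, comparing the disk error lines by eye gets tedious. The sketch below, in Python, pulls the drive name and block number out of each error line and collects the distinct pairs. The regular expression is inferred only from the lines seen in these message buffers, so it is an assumption that may not match other drivers' messages, and `failing_blocks` is a hypothetical helper name.

```python
import re

# Disk error lines copied from the message buffers of crashes 11, 12 and 13.
MSGBUF_LINES = [
    "id010e: block 134678(134678 abs): read: Uncorrectable Data Check",
    "id010e: block 134678(134678 abs): read: Uncorrectable Data Check",
    "id010e: block 134678(134678 abs): read: Uncorrectable Data Check",
]

# Pattern inferred from the lines above (an assumption, not a documented format).
DISK_ERROR = re.compile(
    r"^(?P<disk>\w+): block (?P<block>\d+)\s*\((?P<abs>\d+) abs\): "
    r"(?P<op>\w+): (?P<error>.+)$"
)


def failing_blocks(lines):
    """Return the set of (disk, block) pairs named in disk error lines."""
    found = set()
    for line in lines:
        match = DISK_ERROR.match(line.strip())
        if match:
            found.add((match.group("disk"), int(match.group("block"))))
    return found


print(failing_blocks(MSGBUF_LINES))
```

A single pair in the result means every dump examined so far points at the same block on the same drive.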

Sure! By now you're screaming "Fix the disk!" However, with Zen being a semi-retired system, you have to remember that no one may be taking much notice of the crashes.

By the way, did you notice the boot time on this crash? Did you compare it to the time of the previous crash? What was the system doing between August 10th and September 7th? Chances are savecore had been disabled, then re-enabled. Or maybe the system was actually shut down for one reason or another: backups, hardware maintenance, or maybe even system relocation.
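To put numbers on these gaps, the time from each panic to the boot recorded in the next dump can be computed from the `*boottime=Y` and `*time=Y` values. The Python below is a sketch under that assumption; `gaps_between_dumps` is a hypothetical helper, and the dates are typed in from the three sessions shown so far.

```python
from datetime import datetime

# (*boottime=Y, *time=Y) for each dump examined so far.
DUMPS = {
    11: (datetime(1994, 5, 31, 11, 13, 37), datetime(1994, 7, 23, 3, 31, 14)),
    12: (datetime(1994, 7, 25, 8, 7, 30), datetime(1994, 8, 10, 3, 23, 5)),
    13: (datetime(1994, 9, 7, 11, 58, 24), datetime(1994, 9, 26, 3, 21, 1)),
}


def gaps_between_dumps(dumps):
    """Map (earlier, later) dump numbers to the time from panic to next recorded boot."""
    numbers = sorted(dumps)
    gaps = {}
    for earlier, later in zip(numbers, numbers[1:]):
        crash_time = dumps[earlier][1]
        next_boot = dumps[later][0]
        gaps[(earlier, later)] = next_boot - crash_time
    return gaps


gaps = gaps_between_dumps(DUMPS)
for pair, gap in gaps.items():
    print(pair, gap)
```

The gap only measures the time to the boot that preceded the next saved dump. A clean reboot in between leaves no dump behind, so it would not show up in this calculation.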

Finally, let's take a look at the 14th crash. Want to guess what we're going to see there?

    Hiya on zen... adb -k vmunix.14 vmcore.14
    physmem 1ff8
    *boottime=Y
                    1994 Sep 26 03:21:50
    *time=Y
                    1994 Oct  2 03:24:47
    $c
    _panic(0xf8174abb,0xf817e784,0xf817e798,0x283,0xa000,0xf84deb98) + 6c
    _assfail(0xf817e784,0xf817e798,0x283,0xfce4a2a8,0x1,0xf834a500) + 3c
    _pvn_vplist_dirty(0xfce4a2a8,0x0,0x0,0xf834a500,0xf8381e88,0xfce4a2a8) + 250
    _ufs_putpage(0xfce4a2a8,0xfce4a2a0,0x0,0x0,0x0,0xf82f5fc0) + 33c
    _ufs_l_putpage(0xfce4a2a8,0x0,0x0,0x0,0x0,0x0) + 30
    _syncip(0xfce4a2a0,0x0,0x0,0x7,0x1101,0x0) + 124
    _update(0xf81a7900,0x104e,0xfce4a2a0,0xfce779d8,0x0,0xfce4a2a0) + 2f0
    _ufs_sync(0x0,0xf8175ac8,0xf817d820,0xf81a5c48,0xf8175ac8,0xfce71dbc) + 4
    _sync(0xf84defe0,0x120,0xf8173468,0xf8173588,0xf84df000,0xf8175a88) + 3c
    _syscall(0xf84df000) + 3b4
    $<msgbuf
    0xf8002000:     magic           size            bufx            bufr
                    63062           1bf0            1b7d            196d
    0xf800397d:     id010e: block 134678 (134678 abs): read: Uncorrectable Data Check
                    assertion failed: pp->p_vnode == NULL, file: ../../vm/vm_pvn.c,
                    line: 643
                    panic: assertion failed
    (Remainder of output trimmed)

No surprises there.
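"No surprises" can be made concrete by comparing the call chains rather than the raw traces, since the argument values differ from dump to dump while the functions and return offsets do not. The Python sketch below does this for the first and last dumps only, to keep it short. The frame pattern is inferred from the `$c` output shown in this section, and `call_chain` is a hypothetical helper name.

```python
import re

# $c output copied from the crash 11 and crash 14 sessions.
TRACE_11 = """\
_panic(0xf8174abb,0xf817e784,0xf817e798,0x283,0xa000,0xf84c6b98) + 6c
_assfail(0xf817e784,0xf817e798,0x283,0xfcfdbdb0,0x1,0xf83817a8) + 3c
_pvn_vplist_dirty(0xfcfdbdb0,0x0,0x0,0xf83817a8,0xf8359758,0xfcfdbdb0) + 250
_ufs_putpage(0xfcfdbdb0,0xfcfdbda8,0x0,0x0,0x0,0xf82f43d8) + 33c
_ufs_l_putpage(0xfcfdbdb0,0x0,0x0,0x0,0x0,0x0) + 30
_syncip(0xfcfdbda8,0x0,0x0,0x7,0x1101,0x0) + 124
_update(0xf81a7900,0x104e,0xfcfdbda8,0xfcf1e808,0x0,0xfcfdbda8) + 2f0
_ufs_sync(0x0,0xf8175ac8,0xf817d820,0xf81a5c48,0xf8175ac8,0xfce792c4) + 4
_sync(0xf84c6fe0,0x120,0xf8173468,0xf8173588,0xf84c7000,0xf8175a88) + 3c
_syscall(0xf84c7000) + 3b4
"""

TRACE_14 = """\
_panic(0xf8174abb,0xf817e784,0xf817e798,0x283,0xa000,0xf84deb98) + 6c
_assfail(0xf817e784,0xf817e798,0x283,0xfce4a2a8,0x1,0xf834a500) + 3c
_pvn_vplist_dirty(0xfce4a2a8,0x0,0x0,0xf834a500,0xf8381e88,0xfce4a2a8) + 250
_ufs_putpage(0xfce4a2a8,0xfce4a2a0,0x0,0x0,0x0,0xf82f5fc0) + 33c
_ufs_l_putpage(0xfce4a2a8,0x0,0x0,0x0,0x0,0x0) + 30
_syncip(0xfce4a2a0,0x0,0x0,0x7,0x1101,0x0) + 124
_update(0xf81a7900,0x104e,0xfce4a2a0,0xfce779d8,0x0,0xfce4a2a0) + 2f0
_ufs_sync(0x0,0xf8175ac8,0xf817d820,0xf81a5c48,0xf8175ac8,0xfce71dbc) + 4
_sync(0xf84defe0,0x120,0xf8173468,0xf8173588,0xf84df000,0xf8175a88) + 3c
_syscall(0xf84df000) + 3b4
"""

# One frame: function name, argument list, then "+ offset" in hex.
FRAME = re.compile(r"^(_\w+)\(.*\)\s*\+\s*([0-9a-f]+)$")


def call_chain(trace):
    """Return the (function, offset) pairs of a $c trace, ignoring arguments."""
    frames = []
    for line in trace.splitlines():
        match = FRAME.match(line.strip())
        if match:
            frames.append((match.group(1), match.group(2)))
    return frames


print(call_chain(TRACE_11) == call_chain(TRACE_14))
```

The traces from crashes 12 and 13 can be added and compared in the same way.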



PANIC! UNIX System Crash Dump Analysis Handbook (Bk/CD-ROM)
ISBN: 0131493868
Year: 1994
Pages: 289
Authors: Chris Drake
