Since we have four crashes, the approach we will take this time is to gather some initial information from each crash dump and see if we are going to be working with one type of problem or more.

Hiya on zen... cd /var/crash/zen
Hiya on zen... ls -l
total 45777
-rw-rw-rw-  1 root            3 Oct  2 03:26 bounds
-rw-r--r--  1 root     10248192 Jul 25  1994 vmcore.11
-rw-r--r--  1 root     10436608 Aug 10 08:43 vmcore.12
-rw-r--r--  1 root      9052160 Sep 26 07:49 vmcore.13
-rw-r--r--  1 root      9732096 Oct  2 03:26 vmcore.14
-rw-r--r--  1 root      1832133 Jul 25  1994 vmunix.11
-rw-r--r--  1 root      1832133 Aug 10 08:43 vmunix.12
-rw-r--r--  1 root      1832133 Sep 26 07:49 vmunix.13
-rw-r--r--  1 root      1832133 Oct  2 03:26 vmunix.14
Hiya on zen...

The first thing we can assume, based on the time stamps on the crash dump files, is that the problem doesn't appear to occur on a regular basis. Let's look at crash 11.

Hiya on zen... adb -k vmunix.11 vmcore.11
physmem 1ff8
hostname/s
_hostname:
_hostname:      zen
*boottime=Y
                1994 May 31 11:13:37
*time=Y
                1994 Jul 23 03:31:14
$c
_panic(0xf8174abb,0xf817e784,0xf817e798,0x283,0xa000,0xf84c6b98) + 6c
_assfail(0xf817e784,0xf817e798,0x283,0xfcfdbdb0,0x1,0xf83817a8) + 3c
_pvn_vplist_dirty(0xfcfdbdb0,0x0,0x0,0xf83817a8,0xf8359758,0xfcfdbdb0) + 250
_ufs_putpage(0xfcfdbdb0,0xfcfdbda8,0x0,0x0,0x0,0xf82f43d8) + 33c
_ufs_l_putpage(0xfcfdbdb0,0x0,0x0,0x0,0x0,0x0) + 30
_syncip(0xfcfdbda8,0x0,0x0,0x7,0x1101,0x0) + 124
_update(0xf81a7900,0x104e,0xfcfdbda8,0xfcf1e808,0x0,0xfcfdbda8) + 2f0
_ufs_sync(0x0,0xf8175ac8,0xf817d820,0xf81a5c48,0xf8175ac8,0xfce792c4) + 4
_sync(0xf84c6fe0,0x120,0xf8173468,0xf8173588,0xf84c7000,0xf8175a88) + 3c
_syscall(0xf84c7000) + 3b4
*panicstr/s
_prtmsgbuflines+0x87:   assertion failed
$<msgbuf
0xf8002000:     magic           size            bufx            bufr
                63062           1bf0            af5             8e5
0xf80028f5:     id010e: block 134678(134678 abs): read: Uncorrectable Data Check
assertion failed: pp->p_vnode == NULL, file: ../../vm/vm_pvn.c, line: 643
panic: assertion failed
syncing file systems... done
00353 low-memory static kernel pages
00558 additional static and sysmap kernel pages
00000 dynamic kernel data pages
00223 additional user structure pages
00000 segmap kernel pages
00000 segvn kernel pages
00002 current user process pages
00112 user stack pages
01248 total pages (1248 chunks)
dumping to vp fce25f4c, offset 239514
$q
Hiya on zen...

The system had been up nearly two months before crashing. Notice the panic messages in the message buffer. These are a good example of verbose panic messages that include information about where to find the call to panic() in the source code. They may be useless to some of us, but to those with the source code, they are a helpful aid in problem diagnosis. Let's look at crash 12.

Hiya on zen... adb -k vmunix.12 vmcore.12
physmem 1ff8
hostname/s
_hostname:
_hostname:      zen
*boottime=Y
                1994 Jul 25 08:07:30
*time=Y
                1994 Aug 10 03:23:05
$c
_panic(0xf8174abb,0xf817e784,0xf817e798,0x283,0xa000,0xf84ccb98) + 6c
_assfail(0xf817e784,0xf817e798,0x283,0xfd0cd5a8,0x1,0xf8385088) + 3c
_pvn_vplist_dirty(0xfd0cd5a8,0x0,0x0,0xf8385088,0xf8386b40,0xfd0cd5a8) + 250
_ufs_putpage(0xfd0cd5a8,0xfd0cd5a0,0x0,0x0,0x0,0xf82f607c) + 33c
_ufs_l_putpage(0xfd0cd5a8,0x0,0x0,0x0,0x0,0x0) + 30
_syncip(0xfd0cd5a0,0x0,0x0,0x4e,0x1101,0x0) + 124
_update(0xf81a7900,0x104e,0xfd0cd5a0,0xfcfaf218,0x0,0xfd0cd5a0) + 2f0
_ufs_sync(0x0,0xf8175ac8,0xf817d820,0xf81a5c48,0xf8175ac8,0xfcea1d44) + 4
_sync(0xf84ccfe0,0x120,0xf8173468,0xf8173588,0xf84cd000,0xf8175a88) + 3c
_syscall(0xf84cd000) + 3b4
*panicstr/s
_prtmsgbuflines+0x87:   assertion failed
$<msgbuf
0xf8002000:     magic           size            bufx            bufr
                63062           1bf0            241             31
0xf8002041:     id010e: block 134678(134678 abs): read: Uncorrectable Data Check
assertion failed: pp->p_vnode == NULL, file: ../../vm/vm_pvn.c, line: 643
panic: assertion failed
syncing file systems... done
(Remainder of output trimmed)

We see that this is very much the same as crash 11. Take a look at crash 11's crash time and crash 12's boot time.
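The comparison is plain date arithmetic and can be done by hand. For readers who prefer to check it mechanically, here is a minimal sketch in Python (not part of the original adb session). The three timestamps are copied from the `*boottime=Y` and `*time=Y` output of crashes 11 and 12; the helper name `parse_adb_time` and the format string are assumptions made for illustration.

```python
from datetime import datetime

# Assumed layout of the timestamps adb prints with the Y format,
# e.g. "1994 Jul 23 03:31:14".
ADB_TIME_FORMAT = "%Y %b %d %H:%M:%S"

def parse_adb_time(text):
    """Hypothetical helper: turn an adb Y-format timestamp into a datetime."""
    # Collapse any column padding before parsing.
    return datetime.strptime(" ".join(text.split()), ADB_TIME_FORMAT)

# Values copied from the crash 11 and crash 12 sessions.
crash11_boot = parse_adb_time("1994 May 31 11:13:37")
crash11_time = parse_adb_time("1994 Jul 23 03:31:14")
crash12_boot = parse_adb_time("1994 Jul 25 08:07:30")

uptime = crash11_time - crash11_boot      # how long the system ran before crash 11
downtime = crash12_boot - crash11_time    # how long it stayed down afterwards

print("uptime before crash 11:", uptime)
print("crash 11 occurred on a", crash11_time.strftime("%A"))
print("next boot occurred on a", crash12_boot.strftime("%A"))
print("time between crash and next boot:", downtime)
```

The weekday names come from the default C locale; under another locale they may print in a different language.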
What does this tell us? The system was probably unable to reboot itself successfully without operator intervention. It appears that Zen was down until someone came into the office on Monday morning, July 25th.

So, we now have two crashes. Both involve id010e. What is id010e? That is one of the IPI disk drives. Both crashes specifically report problems with block 134678 of id010e. What will crash 13 tell us? Let's see.

Hiya on zen... adb -k vmunix.13 vmcore.13
physmem 1ff8
hostname/s
_hostname:
_hostname:      zen
*boottime/Y
data address not found
*boottime=Y
                1994 Sep  7 11:58:24
*time=Y
                1994 Sep 26 03:21:01
$c
_panic(0xf8174abb,0xf817e784,0xf817e798,0x283,0xa000,0xf84c6b98) + 6c
_assfail(0xf817e784,0xf817e798,0x283,0xfce7af78,0x1,0xf8355bf8) + 3c
_pvn_vplist_dirty(0xfce7af78,0x0,0x0,0xf8355bf8,0xf8355c20,0xfce7af78) + 250
_ufs_putpage(0xfce7af78,0xfce7af70,0x0,0x0,0x0,0xf82f5b58) + 33c
_ufs_l_putpage(0xfce7af78,0x0,0x0,0x0,0x0,0x0) + 30
_syncip(0xfce7af70,0x0,0x0,0x7,0x1101,0x0) + 124
_update(0xf81a7900,0x104e,0xfce7af70,0xfcf9c6a8,0x0,0xfce7af70) + 2f0
_ufs_sync(0x0,0xf8175ac8,0xf817d820,0xf81a5c48,0xf8175ac8,0xfceb206c) + 4
_sync(0xf84c6fe0,0x120,0xf8173468,0xf8173588,0xf84c7000,0xf8175a88) + 3c
_syscall(0xf84c7000) + 3b4
$<msgbuf
0xf8002000:     magic           size            bufx            bufr
                63062           1bf0            297             87
0xf8002097:     id010e: block 134678(134678 abs): read: Uncorrectable Data Check
assertion failed: pp->p_vnode == NULL, file: ../../vm/vm_pvn.c, line: 643
panic: assertion failed
(Remainder of output trimmed)

This certainly looks like the other crashes. Another problem with block 134678 on drive id010e. Sure! By now you're screaming "Fix the disk!" However, with Zen being a semi-retired system, you have to remember that no one may be taking much notice of the crashes.

By the way, did you notice the boot time on this crash? Did you compare it to the time of the last crash? What was the system doing between August 10th and September 7th? Chances are savecore had been disabled, then re-enabled.
Or, maybe the system was actually shut down for one reason or another: backups, hardware maintenance, or maybe even system relocation.

Finally, let's take a look at the 14th crash. Want to guess what we're going to see there?

Hiya on zen... adb -k vmunix.14 vmcore.14
physmem 1ff8
*boottime=Y
                1994 Sep 26 03:21:50
*time=Y
                1994 Oct  2 03:24:47
$c
_panic(0xf8174abb,0xf817e784,0xf817e798,0x283,0xa000,0xf84deb98) + 6c
_assfail(0xf817e784,0xf817e798,0x283,0xfce4a2a8,0x1,0xf834a500) + 3c
_pvn_vplist_dirty(0xfce4a2a8,0x0,0x0,0xf834a500,0xf8381e88,0xfce4a2a8) + 250
_ufs_putpage(0xfce4a2a8,0xfce4a2a0,0x0,0x0,0x0,0xf82f5fc0) + 33c
_ufs_l_putpage(0xfce4a2a8,0x0,0x0,0x0,0x0,0x0) + 30
_syncip(0xfce4a2a0,0x0,0x0,0x7,0x1101,0x0) + 124
_update(0xf81a7900,0x104e,0xfce4a2a0,0xfce779d8,0x0,0xfce4a2a0) + 2f0
_ufs_sync(0x0,0xf8175ac8,0xf817d820,0xf81a5c48,0xf8175ac8,0xfce71dbc) + 4
_sync(0xf84defe0,0x120,0xf8173468,0xf8173588,0xf84df000,0xf8175a88) + 3c
_syscall(0xf84df000) + 3b4
$<msgbuf
0xf8002000:     magic           size            bufx            bufr
                63062           1bf0            1b7d            196d
0xf800397d:     id010e: block 134678 (134678 abs): read: Uncorrectable Data Check
assertion failed: pp->p_vnode == NULL, file: ../../vm/vm_pvn.c, line: 643
panic: assertion failed
(Remainder of output trimmed)

No surprises there.
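The claim that all four dumps show the same problem can also be checked mechanically. The following is a minimal sketch in Python, not part of the original analysis. It assumes the disk error line and the `_assfail` frame from each dump are pasted in as strings; the names `DUMPS`, `DISK_ERROR`, `ASSERT_LINE`, and `signature` are hypothetical. Reading the third `_assfail` argument as the source line number is an inference from the output (0x283 is 643 in decimal, matching "line: 643"), not something the dumps state.

```python
import re

# Disk error line from $<msgbuf and _assfail frame from $c, copied per dump.
DUMPS = {
    11: {"disk": "id010e: block 134678(134678 abs): read: Uncorrectable Data Check",
         "assfail": "_assfail(0xf817e784,0xf817e798,0x283,0xfcfdbdb0,0x1,0xf83817a8) + 3c"},
    12: {"disk": "id010e: block 134678(134678 abs): read: Uncorrectable Data Check",
         "assfail": "_assfail(0xf817e784,0xf817e798,0x283,0xfd0cd5a8,0x1,0xf8385088) + 3c"},
    13: {"disk": "id010e: block 134678(134678 abs): read: Uncorrectable Data Check",
         "assfail": "_assfail(0xf817e784,0xf817e798,0x283,0xfce7af78,0x1,0xf8355bf8) + 3c"},
    14: {"disk": "id010e: block 134678 (134678 abs): read: Uncorrectable Data Check",
         "assfail": "_assfail(0xf817e784,0xf817e798,0x283,0xfce4a2a8,0x1,0xf834a500) + 3c"},
}

# The assertion message is identical in all four message buffers.
ASSERT_MSG = "assertion failed: pp->p_vnode == NULL, file: ../../vm/vm_pvn.c, line: 643"

# Pattern for the disk driver's error line; the space before "(" is optional.
DISK_ERROR = re.compile(
    r"^(?P<dev>\w+): block (?P<block>\d+)\s*\((?P<abs>\d+) abs\): "
    r"(?P<op>\w+): (?P<msg>.+)$"
)
ASSERT_LINE = re.compile(r"file: (?P<file>\S+), line: (?P<line>\d+)$")

def signature(dump):
    """Reduce one dump to the fields that should match if the cause is the same."""
    disk = DISK_ERROR.match(dump["disk"])
    # Arguments of the _assfail frame, still as hex strings.
    args = dump["assfail"].split("(")[1].split(")")[0].split(",")
    # args[0] and args[1] are kernel addresses shared by every dump;
    # args[2] is assumed to be the source line number, printed in hex.
    return (disk["dev"], int(disk["block"]), disk["msg"],
            args[0], args[1], int(args[2], 16))

signatures = {number: signature(dump) for number, dump in DUMPS.items()}
distinct = set(signatures.values())
assert_line = int(ASSERT_LINE.search(ASSERT_MSG)["line"])

print("distinct signatures:", len(distinct))
for sig in distinct:
    print(sig)
print("line number in assertion message:", assert_line)
```

Per-crash values such as the vnode and page addresses are left out of the signature on purpose, since they are expected to differ from boot to boot.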