First, let's take a look at the strings output.

    Hiya... strings vmcore.0 | more
    Generic_101318-45
    Data fault
    P+ PRLR v@ ( X,DD
    XBAD TRAP: type=9 rp=f05d04c4 addr=3 mmu_fsr=3a6 rw=2
    fm_flb: Data fault
    kernel write fault at addr=0x3, pme=0x0
    MMU sfsr=3a6: Invalid Address on supv data store at level 3
    pid=496, pc=0xf0048918, sp=0xf05d0510, psr=0x404000c7, context=408
    g1-g7: 0, 0, ffffff00, 0, f05d09e0, 1, fc45c600
    Begin traceback... sp = f05d0510
    Called from f009e214, fp=f05d0680, args=f05d06e4 0 f05d06ec 0 0 fc01dd14
    Called from f00e4790, fp=f05d06f0, args=0 0 1 0 f05d07f4 0
    Called from f00e14c4, fp=f05d0848, args=3cad8 0 3 b40 f05d08ac 0
    Called from f007095c, fp=f05d08b8, args=3cad8 3 b48 f05d0920 3cad8 f0156628
    Called from f0041aa0, fp=f05d0938, args=f0160f3c f05d0eb4 0 f05d0e90 fffffffc ffffffff
    Called from ef75de2c, fp=efffea08, args=3cad8 2 efffeb48 1 0 f016b620
    End traceback...
    panic: Data fault
    { We quit out of more at this point }
    Hiya...

The system panicked due to a data fault, which is bad trap type 9. The output shown above is the same as the messages that appeared on the system console during the panic.

Since this was our experimental crash, we already know which system it came from and which OS was running. For this panic, the authors used a Sun SPARCstation 20 running Solaris 2.3. So, using the same system, let's jump into adb and start examining the crash dump. Two items are of particular interest: the second argument to trap (0xf05d04c4) and the first argument to mutex_enter (0x0).

    Hiya... adb -k unix.0 vmcore.0
    physmem 1e05
    $c
    complete_panic(0xf0049460,0xf05d03ac,0xf05d0238,0x3,0x0,0x1) + 10c
    do_panic(?) + 1c
    vcmn_err(0xf015f7a8,0xf05d03ac,0xf05d03ac,0x3cad8,0x2,0x3)
    cmn_err(0x3,0xf015f7a8,0x0,0x18,0x18,0xf0152400) + 1c
    die(0x9,0xf05d04c4,0x3,0x3a6,0x2,0xf015f7a8) + 78
    trap(0x9,0xf05d04c4,0xf01822d8,0x3a6,0x2,0x0) + 598
    fault(?) + 84
    mutex_enter(0x0,0xd,0x64,0x1,0xd,0xf05d06ec)
    lookuppn(0xf05d06e4,0x0,0xf05d06ec,0x0,0x0,0xfc01dd14) + 148
    lookupname(0x0,0x0,0x1,0x0,0xf05d07f4,0x0) + 28
    vn_open(0x3cad8,0x0,0x3,0xb40,0xf05d08ac,0x0) + a4
    copen(0x3cad8,0x3,0xb48,0xf05d0920,0x3cad8,0xf0156628) + 70
    syscall(0xf0160f3c) + 3e4
    0xf05d04c4$<regs
    0xf05d04c4:     psr             pc              npc
                    404000c7        f0048918        f004891c
    0xf05d04d0:     y               g1              g2              g3
                    20000000        0               0               ffffff00
    0xf05d04e0:     g4              g5              g6              g7
                    0               f05d09e0        1               fc45c600
    0xf05d04f0:     o0              o1              o2              o3
                    0               d               64              1
    0xf05d0500:     o4              o5              o6              o7
                    d               f05d06ec        f05d0510        f009e378
    f0048918/i
    mutex_enter+4:  ldstub  [%o0 + 0x3], %g6

First, we get the stack traceback with $c. Then, using the second parameter to the trap routine, we collect the trap registers with the regs macro. The trap registers give us the contents of the PC, and disassembling at that address shows the actual instruction that caused the trap and panic.

The instruction that failed, ldstub, is a very special SPARC instruction because it reads and writes memory atomically. In other words, the ldstub instruction cannot be interrupted during its work. ldstub reads an unsigned byte from memory into a register, then writes all ones (hexadecimal 0xff) to that byte of memory.

Specifically, the ldstub instruction that triggered the trap was trying to work with the memory location represented by [%o0 + 0x3]. Since %o0 contained 0 according to the output from the regs macro, the ldstub instruction was trying to access location 0x3. On SPARC, %o0 also carries the first argument of a call, which matches the 0x0 shown as the first argument to mutex_enter in the traceback.

The first page of memory is always off limits. Attempts to read or write to page zero result in data faults. In this case, we were trying to write to the page. If you look back at the messages, you will see that this was well reported in the diagnostic output from the trap() routine: "kernel write fault at addr=0x3".
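To make the reasoning above concrete, here is a minimal Python sketch of what the failing instruction was asked to do. It is only a model, not kernel code: the names effective_address, ldstub, DataFault and the dictionary standing in for memory are all hypothetical, and the 4 KB page size is an assumption about this class of machine. The atomicity of the real instruction is not modeled; the sketch only shows the address arithmetic and the read-then-write-0xff data flow.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages on this class of SPARC system


class DataFault(Exception):
    """Hypothetical stand-in for the trap taken on a page-zero access."""


def effective_address(base_reg: int, offset: int) -> int:
    """Compute [%reg + imm] as a 32-bit address."""
    return (base_reg + offset) & 0xFFFFFFFF


def ldstub(memory: dict, addr: int) -> int:
    """Model ldstub: return the old byte at addr, then store 0xff there.

    Real hardware does both steps as one uninterruptible operation;
    this single-threaded model only illustrates the data flow.
    """
    if addr < PAGE_SIZE:
        # Page zero is off limits, so the access faults before any data moves.
        raise DataFault(f"kernel write fault at addr={addr:#x}")
    old = memory.get(addr, 0)  # unsigned byte read (0 if never written)
    memory[addr] = 0xFF        # all ones written back
    return old


if __name__ == "__main__":
    o0 = 0x0  # value of %o0 taken from the regs output
    try:
        ldstub({}, effective_address(o0, 0x3))
    except DataFault as fault:
        print(fault)
```

With %o0 equal to 0, the effective address is 0x3, which falls inside page zero, so the model faults in the same way the console messages describe. Called with an address outside page zero, the first ldstub returns the old byte and every later call returns 0xff until something clears it.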