What's still useful in the stack traceback?

Looking at the stack traceback and the expected arguments as shown above, it becomes fairly obvious that the first argument in the call to lookupname() is not a pointer to a filename. So, there are two questions that need to be asked.
The lookupname() routine

Let's look at the assembly code for lookupname() and see if we can answer the questions.

    lookupname,12/ia
    lookupname:         save %sp, -0x70, %sp
    lookupname+4:       add %fp, -0xc, %o2
    lookupname+8:       mov %i1, %o1
    lookupname+0xc:     call pn_get
    lookupname+0x10:    mov %i0, %o0
    lookupname+0x14:    orcc %g0, %o0, %i0
    lookupname+0x18:    bne lookupname + 0x3c
    lookupname+0x1c:    mov %i4, %o3
    lookupname+0x20:    mov %i3, %o2
    lookupname+0x24:    mov %i2, %o1
    lookupname+0x28:    call lookuppn
    lookupname+0x2c:    add %fp, -0xc, %o0
    lookupname+0x30:    mov %o0, %i0
    lookupname+0x34:    call pn_free
    lookupname+0x38:    add %fp, -0xc, %o0
    lookupname+0x3c:    ret
    lookupname+0x40:    restore
    lookuppn:           save %sp, -0x170, %sp
    lookuppn+4:

Right at the start, the lookupname() routine calls pn_get(). Upon returning, we OR the contents of %o0 with %g0 and store the result in %i0, thus overwriting the first argument to lookupname(). Since we are ORing %o0 with the /dev/null of registers, %g0, and placing the result in %i0, we are really copying the contents of %o0 to %i0. However, because it's an orcc instruction, condition bits are being set in the Processor Status Register, PSR. These bits are tested by the next instruction, bne, branch if not equal. In effect, this code tests to see if %o0 is equal to zero. If not, we exit lookupname() immediately.

What does this sequence suggest? It suggests that pn_get() sends back a return code of zero or non-zero to the caller.

The pn_get() routine

What did the call to pn_get() do? If for no other reason than pure curiosity, we should take a look at it. Here it is.
    pn_get,20/ia
    pn_get:         save %sp, -0x60, %sp
    pn_get+4:       call pn_alloc
    pn_get+8:       mov %i2, %o0
    pn_get+0xc:     orcc %g0, %i1, %g0
    pn_get+0x10:    bne pn_get + 0x30
    pn_get+0x14:    mov 0x400, %o2
    pn_get+0x18:    mov %i0, %o0
    pn_get+0x1c:    ld [%i2 + 0x4], %o1
    pn_get+0x20:    call copyinstr
    pn_get+0x24:    add %i2, 0x8, %o3
    pn_get+0x28:    ba pn_get + 0x44
    pn_get+0x2c:    ld [%i2 + 0x8], %l6
    pn_get+0x30:    mov %i0, %o0
    pn_get+0x34:    ld [%i2 + 0x4], %o1
    pn_get+0x38:    call copystr
    pn_get+0x3c:    add %i2, 0x8, %o3
    pn_get+0x40:    ld [%i2 + 0x8], %l6
    pn_get+0x44:    orcc %g0, %o0, %i0
    pn_get+0x48:    sub %l6, 0x1, %l6
    pn_get+0x4c:    be pn_get + 0x5c
    pn_get+0x50:    st %l6, [%i2 + 0x8]
    pn_get+0x54:    call pn_free
    pn_get+0x58:    mov %i2, %o0
    pn_get+0x5c:    ret
    pn_get+0x60:    restore
    pn_set:         save %sp, -0x60, %sp
    (output trimmed)

The pn_get() routine, based on the value of input register %i1, calls either copyinstr() or copystr(). The copy routines copy a string from a source to a destination. One is used for copying from user space to kernel space; the other works solely within kernel space.

The argument in %i1 is listed as seg. What might this be? Well, this wouldn't be an easy one to guess at, so we'll tell you. It is an enumerated flag that shows what segment is being referenced. Look in /usr/include/sys/uio.h, and you'll find:

    /*
     * Segment flag values.
     */
    typedef enum uio_seg { UIO_USERSPACE, UIO_SYSSPACE, UIO_USERISPACE } uio_seg_t;

So, knowing this, we can see that if %i1 is zero (seg is set to UIO_USERSPACE), then we call copyinstr(). Otherwise, we use copystr() to copy the string.

When pn_get() has completed the copy, it modifies %i0. If we were to follow this further, we would find that the copy routines also modify the calling parameters.

The call to pn_get() in lookupname() is nearly immediately followed by a call to lookuppn(), so let's find out what it does next.

The lookuppn() routine

The lookuppn() routine, according to the stack traceback, does appear to have a valid memory address as the first argument.
This is listed as being a pnp, a pointer to a pn, probably a pathname. Let's see what we find there and if it looks valid. We will be looking at address 0xf05d06e4, the value of the first calling parameter to lookuppn(), as seen in the $c output we got earlier.

    0xf05d06e4/X
    0xf05d06e4:             fc26a000
    fc26a000/X
    ledmadelay+0x404:       2f646576
    ./s
    ledmadelay+0x404:       /dev/ticotsord

The value 2f646576 looked like it was possibly an ASCII value, so we redisplayed it as a string. Sure enough, we now have the name of the file that fm_flb was trying to open!

Now, we get to a trouble spot. According to $c, we called mutex_enter() from location lookuppn+0x148. That's fairly deep down into the program, making variable-chasing a lot of work. Let's look at some of the assembly code first.

    lookuppn,20/ia
    lookuppn:           save %sp, -0x170, %sp
    lookuppn+4:         st %i2, [%fp + 0x4c]
    lookuppn+8:         sethi %hi(0xf0165400), %o0
    lookuppn+0xc:       st %i3, [%fp + 0x50]
    lookuppn+0x10:      add %o0, 0x264, %l2
    lookuppn+0x14:      ldsh [%g7 + 0x1a], %o3
    lookuppn+0x18:      sethi %hi(0xf00f1400), %o1
    lookuppn+0x1c:      add %o3, 0x1, %o3
    lookuppn+0x20:      sth %o3, [%g7 + 0x1a]
    lookuppn+0x24:      ld [%g7 + 0x58], %o5
    lookuppn+0x28:      add %o1, 0x370, %l5
    lookuppn+0x2c:      ld [%o5 + 0xf4], %o0
    lookuppn+0x30:      add %o0, 0x1, %o0
    lookuppn+0x34:      st %o0, [%o5 + 0xf4]
    lookuppn+0x38:      ldsh [%g7 + 0x1a], %o2
    lookuppn+0x3c:      sub %o2, 0x1, %o2
    lookuppn+0x40:      sll %o2, 0x10, %o2
    lookuppn+0x44:      sra %o2, 0x10, %o2
    lookuppn+0x48:      sth %o2, [%g7 + 0x1a]
    lookuppn+0x4c:      orcc %g0, %o2, %g0
    lookuppn+0x50:      bne,a lookuppn + 0x78
    lookuppn+0x54:      ld [%fp + 0x4c], %l1
    lookuppn+0x58:      ld [%g7 + 0x58], %o0
    lookuppn+0x5c:      ldsb [%o0 + 0x4d], %o0
    lookuppn+0x60:      orcc %g0, %o0, %g0
    lookuppn+0x64:      be,a lookuppn + 0x78
    lookuppn+0x68:      ld [%fp + 0x4c], %l1
    lookuppn+0x6c:      call kpreempt
    lookuppn+0x70:      clr %o0
    lookuppn+0x74:      ld [%fp + 0x4c], %l1
    lookuppn+0x78:      clr %i4
    lookuppn+0x7c:      orcc %g0, %l1, %g0
    lookuppn+0x80:
    +,20
    lookuppn+0x80:      be lookuppn + 0x90
    lookuppn+0x84:      clr %l4
    lookuppn+0x88:      ba lookuppn + 0x94
    lookuppn+0x8c:      mov 0x1, %l7
    lookuppn+0x90:      clr %l7
    lookuppn+0x94:      ld [%g7 + 0xa0], %i5
    lookuppn+0x98:      ld [%i5 + 0x2d0], %i5
    lookuppn+0x9c:      call mutex_enter
    lookuppn+0xa0:      mov %i5, %o0
    lookuppn+0xa4:      ld [%i5 + 0xc], %o1
    lookuppn+0xa8:      mov %i5, %o0
    lookuppn+0xac:      add %o1, 0x1, %o1
    lookuppn+0xb0:      call mutex_exit
    lookuppn+0xb4:      st %o1, [%i5 + 0xc]
    lookuppn+0xb8:      cmp %i1, 0x1
    lookuppn+0xbc:      bne,a lookuppn + 0xc8
    lookuppn+0xc0:      clr %o2
    lookuppn+0xc4:      mov 0x1, %o2
    lookuppn+0xc8:      ld [%fp + 0x50], %l0
    lookuppn+0xcc:      add %i0, 0x8, %i2
    lookuppn+0xd0:      add %i0, 0x4, %l3
    lookuppn+0xd4:      clr %i3
    lookuppn+0xd8:      mov %o2, %l6
    lookuppn+0xdc:      ld [%i2], %o5
    lookuppn+0xe0:      orcc %g0, %o5, %g0
    lookuppn+0xe4:      bne,a lookuppn + 0xf4
    lookuppn+0xe8:      ld [%i2], %o7
    lookuppn+0xec:      ba lookuppn + 0x684
    lookuppn+0xf0:      mov 0x2, %i1
    lookuppn+0xf4:      orcc %g0, %o7, %g0
    lookuppn+0xf8:      be,a lookuppn + 0x108
    lookuppn+0xfc:      clr %i1
    lookuppn+0x100:
    +,20
    lookuppn+0x100:     ld [%l3], %i1
    lookuppn+0x104:     ldsb [%i1], %i1
    lookuppn+0x108:     cmp %i1, 0x2f
    lookuppn+0x10c:     bne,a lookuppn + 0x184
    lookuppn+0x110:     sethi %hi(0xf0181800), %o1
    lookuppn+0x114:     call vn_rele
    lookuppn+0x118:     mov %i5, %o0
    lookuppn+0x11c:     call pn_skipslash
    lookuppn+0x120:     mov %i0, %o0
    lookuppn+0x124:     ld [%g7 + 0xa0], %o0
    lookuppn+0x128:     ld [%o0 + 0x2d4], %o0
    lookuppn+0x12c:     orcc %g0, %o0, %g0
    lookuppn+0x130:     be,a lookuppn + 0x144
    lookuppn+0x134:     sethi %hi(0xf0178c00), %i1
    lookuppn+0x138:     ld [%g7 + 0xa0], %i1            ! vph_mutex + 0x7f4c
    lookuppn+0x13c:     ba lookuppn + 0x148
    lookuppn+0x140:     ld [%i1 + 0x2d4], %i1
    lookuppn+0x144:     ld [%i1 + 0x1d0], %i1
    lookuppn+0x148:     call mutex_enter
    lookuppn+0x14c:     mov %i1, %o0
    lookuppn+0x150:     ld [%i1 + 0xc], %o7
    lookuppn+0x154:     mov %i1, %o0
    lookuppn+0x158:     add %o7, 0x1, %o7
    lookuppn+0x15c:     st %o7, [%i1 + 0xc]
    lookuppn+0x160:     call mutex_exit
    lookuppn+0x164:     mov %i1, %i5
    lookuppn+0x168:     sethi %hi(0xf0181800), %o0
    lookuppn+0x16c:     ld [%o0 + 0xe8], %o0            ! audit_active
    lookuppn+0x170:     orcc %g0, %o0, %g0
    lookuppn+0x174:     be lookuppn + 0x1a0
    lookuppn+0x178:     mov 0x1, %o1
    lookuppn+0x17c:     ba lookuppn + 0x198
    lookuppn+0x180:

We can take two approaches at this point.
Working with source code, it's fairly easy to take the first approach. However, without source, we are making educated guesses about what the code might be doing. We may get it right, or we may end up way off track. Without source code, it is probably best to start at the point of failure and work backwards, which we will do shortly.

The magic of %g7 in Solaris 2.3 & 2.4

The user program, fm_flb, made a system call to open /dev/ticotsord. System calls are actually software traps (one of the good traps as compared to bad traps), and it is the system call software trap that brought us into kernel mode. On this particular hardware with this particular OS, since we trapped, %g7 has special meaning: %g7 contains the address of the thread that trapped.

Note: At this point, the trap code is down deep in the guts of the UNIX system. The source code involved is locore.s, which is pure assembly code. From vendor to vendor, release to release, architecture to architecture, do not expect to see or count on register %g7 being used in the same manner. For Sun's Solaris 2.3 and 2.4 on SPARC architectures, the trap software in locore.s does use %g7 this way.

In lookuppn(), %g7 is used a lot. To help find the source of the panic, we are going to make use of %g7. First, let's take another quick look at the area of code we are going to closely examine.

    lookuppn+0x124:     ld [%g7 + 0xa0], %o0
    lookuppn+0x128:     ld [%o0 + 0x2d4], %o0
    lookuppn+0x12c:     orcc %g0, %o0, %g0
    lookuppn+0x130:     be,a lookuppn + 0x144
    lookuppn+0x134:     sethi %hi(0xf0178c00), %i1
    lookuppn+0x138:     ld [%g7 + 0xa0], %i1            ! vph_mutex + 0x7f4c
    lookuppn+0x13c:     ba lookuppn + 0x148
    lookuppn+0x140:     ld [%i1 + 0x2d4], %i1
    lookuppn+0x144:     ld [%i1 + 0x1d0], %i1
    lookuppn+0x148:     call mutex_enter
    lookuppn+0x14c:     mov %i1, %o0

As you'll recall, it is the calling parameter to mutex_enter() that caused trouble. So, we need to see if we can figure out what happens to %o0, as this is where we will find the source of the trouble.
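One detail of the listing above is worth working through by hand: a sethi followed by a load with a small offset is the usual SPARC way of reaching a fixed 32-bit address. The following is a minimal Python sketch of that arithmetic, not kernel code. The helper names sethi and effective_address are invented for illustration, and the model assumes 32-bit registers with sethi filling the upper 22 bits and clearing the low 10.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide on this machine


def sethi(value):
    """Model `sethi %hi(value), reg`: keep the upper 22 bits, zero the low 10.

    Hypothetical helper, named after the instruction it imitates.
    """
    return value & 0xFFFFFC00 & MASK32


def effective_address(base, offset):
    """Model the address used by `ld [base + offset], reg`."""
    return (base + offset) & MASK32


# lookuppn+0x134: sethi %hi(0xf0178c00), %i1
i1 = sethi(0xf0178c00)

# lookuppn+0x144: ld [%i1 + 0x1d0], %i1 reads the word at this address
addr = effective_address(i1, 0x1d0)
print(hex(addr))  # 0xf0178dd0
```

The same arithmetic applies to the other sethi pairs in lookuppn(), such as the one annotated audit_active.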
At lookuppn+0x124, we set %o0 to the address of the proc structure for the running thread. Using that, we set %o0 again to the contents of the proc structure + 0x2d4. We will call this keyvalue. If you have the time and patience, you might try to figure out what variable is stored at offset 0x2d4 in the proc structure, although that isn't really so important at this stage. You will find that it is variable u_rdir, which is part of the u or user structure within the proc structure. Refer to /usr/include/sys/user.h and /usr/include/sys/proc.h for full details.

According to the lookuppn() code, if keyvalue is zero, we branch to lookuppn+0x144. The delay instruction, sethi, and the ld instruction at lookuppn+0x144 work together to load the value stored at address 0xf0178dd0 into %i1. We then call mutex_enter(), copying that value into the output register %o0 as we go.

If keyvalue is non-zero, we skip the sethi instruction. We do not execute the sethi in the delay slot by virtue of the ,a annul flag in the be instruction. The annul bit says that the sethi instruction is executed only when we branch. We execute the instructions at offsets 0x138, 0x13c, and 0x140. In effect, we are reloading keyvalue (the process's u_rdir value), this time putting it into register %i1. We call mutex_enter() next, copying the value over to %o0. (Yes, this is a bit redundant, but that's what the compiler came up with for final executable code.)

Now, let's see which path was taken by the lookuppn() routine. First, we have to find out what u_rdir is set to. We take the address of the saved registers, 0xf05d04c4, from the second argument to trap() and display it with the regs macro to get %g7.

    $c
    complete_panic(0xf0049460,0xf05d03ac,0xf05d0238,0x3,0x0,0x1) + 10c
    do_panic(?) + 1c
    vcmn_err(0xf015f7a8,0xf05d03ac,0xf05d03ac,0x3cad8,0x2,0x3)
    cmn_err(0x3,0xf015f7a8,0x0,0x18,0x18,0xf0152400) + 1c
    die(0x9,0xf05d04c4,0x3,0x3a6,0x2,0xf015f7a8) + 78
    trap(0x9,0xf05d04c4,0xf01822d8,0x3a6,0x2,0x0) + 598
    fault(?) + 84
    mutex_enter(0x0,0xd,0x64,0x1,0xd,0xf05d06ec)
    lookuppn(0xf05d06e4,0x0,0xf05d06ec,0x0,0x0,0xfc01dd14) + 148
    lookupname(0x0,0x0,0x1,0x0,0xf05d07f4,0x0) + 28
    vn_open(0x3cad8,0x0,0x3,0xb40,0xf05d08ac,0x0) + a4
    copen(0x3cad8,0x3,0xb48,0xf05d0920,0x3cad8,0xf0156628) + 70
    syscall(0xf0160f3c) + 3e4

    0xf05d04c4$<regs
    0xf05d04c4:     psr             pc              npc
                    404000c7        f0048918        f004891c
    0xf05d04d0:     y               g1              g2              g3
                    20000000        0               0               ffffff00
    0xf05d04e0:     g4              g5              g6              g7
                    0               f05d09e0        1               fc45c600
    0xf05d04f0:     o0              o1              o2              o3
                    0               d               64              1
    0xf05d0500:     o4              o5              o6              o7
                    d               f05d06ec        f05d0510        f009e378

    fc45c600+a0/X
    0xfc45c6a0:             fc44f800
    fc44f800+2d4/X
    tmp_mdevmap+0x5cfc:     0

It appears that u_rdir is set to zero. Therefore, lookuppn() would have used the other value instead as the argument to mutex_enter(). Let's go look at it now.

    f0178dd0/X
    rootdir:
    rootdir:        0

Aha! We've found our problem at last! The rootdir variable was set to zero. What should have been there? According to /usr/include/sys/systm.h, rootdir is a pointer to the vnode of the root directory.
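To tie the pieces together, here is a small Python model of the instructions from lookuppn+0x124 through lookuppn+0x14c. It is only a sketch of the control flow as described above, not the kernel's source: the MEMORY table holds just the three words displayed in the adb session, and the names load and mutex_enter_argument are made up for this illustration.

```python
# Words observed in the adb session above (address -> 32-bit contents).
MEMORY = {
    0xfc45c600 + 0xa0: 0xfc44f800,  # thread + 0xa0: pointer to the proc structure
    0xfc44f800 + 0x2d4: 0x0,        # proc + 0x2d4: u_rdir ("keyvalue")
    0xf0178dd0: 0x0,                # rootdir
}


def load(addr):
    """Model a SPARC `ld` from one of the addresses we have on record."""
    return MEMORY[addr]


def mutex_enter_argument(g7):
    """Return (path taken, value copied into %o0 for mutex_enter)."""
    proc = load(g7 + 0xa0)         # lookuppn+0x124
    keyvalue = load(proc + 0x2d4)  # lookuppn+0x128
    if keyvalue == 0:
        # be,a is taken, so the sethi in its delay slot executes.
        i1 = 0xf0178c00            # lookuppn+0x134
        i1 = load(i1 + 0x1d0)      # lookuppn+0x144: contents of rootdir
        return "rootdir", i1
    # be,a is not taken, so the sethi is annulled.
    i1 = load(g7 + 0xa0)           # lookuppn+0x138
    i1 = load(i1 + 0x2d4)          # lookuppn+0x140: u_rdir again
    return "u_rdir", i1


path, arg = mutex_enter_argument(0xfc45c600)  # %g7 from the saved registers
print(path, hex(arg))  # rootdir 0x0
```

With the values from this crash dump, the model takes the rootdir path and hands mutex_enter() a zero, which matches the first argument shown for mutex_enter() in the stack traceback.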