
What's still useful in the stack traceback?

Looking at the stack traceback and the expected arguments shown above, it becomes fairly obvious that the first argument in the call to lookupname() is not a pointer to a filename. So, there are two questions that need to be asked.

  1. Is fnamep used only upon return from lookupname(), and could its value therefore be useless by the time the crash occurred?

  2. Was the value of %i0, which was stored in the frame, overwritten after it was used?

The lookupname() routine

Let's look at the assembly code for lookupname() and see if we can answer the questions.

  lookupname,12/ia
  lookupname:       save    %sp, -0x70, %sp
  lookupname+4:     add     %fp, -0xc, %o2
  lookupname+8:     mov     %i1, %o1
  lookupname+0xc:   call    pn_get
  lookupname+0x10:  mov     %i0, %o0
  lookupname+0x14:  orcc    %g0, %o0, %i0
  lookupname+0x18:  bne     lookupname + 0x3c
  lookupname+0x1c:  mov     %i4, %o3
  lookupname+0x20:  mov     %i3, %o2
  lookupname+0x24:  mov     %i2, %o1
  lookupname+0x28:  call    lookuppn
  lookupname+0x2c:  add     %fp, -0xc, %o0
  lookupname+0x30:  mov     %o0, %i0
  lookupname+0x34:  call    pn_free
  lookupname+0x38:  add     %fp, -0xc, %o0
  lookupname+0x3c:  ret
  lookupname+0x40:  restore
  lookuppn:         save    %sp, -0x170, %sp
  lookuppn+4:

Right at the start, the lookupname() routine calls pn_get(). Upon returning, we OR the contents of %o0 with %g0 and store the result in %i0, thus overwriting the first argument to lookupname().

Since we are ORing %o0 with the /dev/null of registers, %g0, and placing the result in %i0, we are really copying the contents of %o0 to %i0. However, because it's an orcc instruction, condition bits are being set in the Processor Status Register, PSR. These bits are tested by the next instruction, bne, branch if not equal. In effect, this code tests to see if %o0 is equal to zero. If not, we exit lookupname() immediately.

What does this sequence suggest? It suggests that pn_get() sends back a return code of zero or non-zero to the caller.
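Read back into C, the lookupname() listing has roughly the shape sketched below. This is a reconstruction for illustration, not Sun's source: the parameter names after fnamep and seg, the int types, and the simplified struct pathname are assumptions, and the stub routines only count calls so the control flow can be exercised. What the sketch does take from the listing is that a non-zero pn_get() result skips lookuppn() entirely, and that pn_get()'s result replaces fnamep in %i0. A successful pn_get() therefore leaves 0 in %i0, which fits the 0x0 reported as lookupname()'s first argument in the traceback.

```c
#include <stddef.h>

struct pathname {                      /* simplified stand-in for the kernel type */
    char  *pn_buf;
    char  *pn_path;
    size_t pn_pathlen;
};

/* Hypothetical stubs so the control flow can be exercised outside a kernel. */
static int stub_pn_get_error;          /* value pn_get() will return */
static int lookuppn_calls, pn_free_calls;

static int pn_get(const char *fnamep, int seg, struct pathname *pnp)
{
    (void)fnamep; (void)seg; (void)pnp;
    return stub_pn_get_error;
}

static int lookuppn(struct pathname *pnp, int followlink,
                    void **dirvpp, void **compvpp)
{
    (void)pnp; (void)followlink; (void)dirvpp; (void)compvpp;
    lookuppn_calls++;
    return 0;
}

static void pn_free(struct pathname *pnp)
{
    (void)pnp;
    pn_free_calls++;
}

/* One possible C reading of lookupname+0x0 .. +0x40. */
int lookupname(const char *fnamep, int seg, int followlink,
               void **dirvpp, void **compvpp)
{
    struct pathname lookpn;            /* the 12-byte area at %fp - 0xc */
    int error;

    /* In the assembly, orcc %g0, %o0, %i0 stores this result in %i0,
     * the register that held fnamep. */
    error = pn_get(fnamep, seg, &lookpn);
    if (error != 0)                    /* bne lookupname + 0x3c */
        return error;

    error = lookuppn(&lookpn, followlink, dirvpp, compvpp);
    pn_free(&lookpn);                  /* call pn_free */
    return error;                      /* mov %o0, %i0 ; ret */
}
```

With a failing stub, lookuppn() is never reached; with a succeeding one, lookuppn() and pn_free() each run once.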

The pn_get() routine

What did the call to pn_get() do? If for no other reason than pure curiosity, we should take a look at it. Here it is.

  pn_get,20/ia
  pn_get:         save    %sp, -0x60, %sp
  pn_get+4:       call    pn_alloc
  pn_get+8:       mov     %i2, %o0
  pn_get+0xc:     orcc    %g0, %i1, %g0
  pn_get+0x10:    bne     pn_get + 0x30
  pn_get+0x14:    mov     0x400, %o2
  pn_get+0x18:    mov     %i0, %o0
  pn_get+0x1c:    ld      [%i2 + 0x4], %o1
  pn_get+0x20:    call    copyinstr
  pn_get+0x24:    add     %i2, 0x8, %o3
  pn_get+0x28:    ba      pn_get + 0x44
  pn_get+0x2c:    ld      [%i2 + 0x8], %l6
  pn_get+0x30:    mov     %i0, %o0
  pn_get+0x34:    ld      [%i2 + 0x4], %o1
  pn_get+0x38:    call    copystr
  pn_get+0x3c:    add     %i2, 0x8, %o3
  pn_get+0x40:    ld      [%i2 + 0x8], %l6
  pn_get+0x44:    orcc    %g0, %o0, %i0
  pn_get+0x48:    sub     %l6, 0x1, %l6
  pn_get+0x4c:    be      pn_get + 0x5c
  pn_get+0x50:    st      %l6, [%i2 + 0x8]
  pn_get+0x54:    call    pn_free
  pn_get+0x58:    mov     %i2, %o0
  pn_get+0x5c:    ret
  pn_get+0x60:    restore
  pn_set:         save    %sp, -0x60, %sp
  (output trimmed)

The pn_get() routine, based on the value of input register %i1, calls either copyinstr() or copystr(). Both copy a string from a source to a destination: copyinstr() copies from user space into kernel space, while copystr() works solely within kernel space.

The argument in %i1 is listed as seg. What might this be? Well, this wouldn't be an easy one to guess, so we'll tell you. It is an enumerated flag that shows which segment is being referenced. Look in /usr/include/sys/uio.h, and you'll find:

  /*
   * Segment flag values.
   */
  typedef enum uio_seg { UIO_USERSPACE, UIO_SYSSPACE, UIO_USERISPACE } uio_seg_t;

So, knowing this, we can see that if %i1 is zero (seg is set to UIO_USERSPACE), then we call copyinstr(). Otherwise, we use copystr() to copy the string.
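Combining the listing with the enumeration, pn_get() behaves roughly like the C below. It is a sketch, not Sun's source: the pathname field names (pn_buf, pn_path, pn_pathlen) are borrowed from later Solaris headers and are an assumption, since only the offsets 0x4 and 0x8 appear in the listing. MAXPATHLEN is inferred from the 0x400 loaded at pn_get+0x14, and copyinstr(), copystr(), pn_alloc(), and pn_free() are user-space mocks.

```c
#include <errno.h>
#include <stdlib.h>

#define MAXPATHLEN 0x400      /* the constant loaded into %o2 at pn_get+0x14 */

typedef enum uio_seg { UIO_USERSPACE, UIO_SYSSPACE, UIO_USERISPACE } uio_seg_t;

struct pathname {             /* field names assumed */
    char  *pn_buf;            /* +0x0 on 32-bit SPARC */
    char  *pn_path;           /* +0x4: ld [%i2 + 0x4], %o1 */
    size_t pn_pathlen;        /* +0x8: add %i2, 0x8, %o3 */
};

/* Mock copystr(): copies up to maxlen bytes including the terminating
 * NUL and reports the count (NUL included) in *lencopied. */
static int copystr(const char *from, char *to, size_t maxlen, size_t *lencopied)
{
    size_t n = 0;

    while (n < maxlen) {
        char c = from[n];
        to[n++] = c;
        if (c == '\0') {
            *lencopied = n;
            return 0;
        }
    }
    *lencopied = n;
    return ENAMETOOLONG;
}

/* Mock: a real copyinstr() reads from a user address and can also fail
 * with EFAULT; here both "address spaces" are the same process. */
static int copyinstr(const char *uaddr, char *kaddr, size_t maxlen, size_t *lencopied)
{
    return copystr(uaddr, kaddr, maxlen, lencopied);
}

static void pn_alloc(struct pathname *pnp)
{
    pnp->pn_buf = malloc(MAXPATHLEN);
    if (pnp->pn_buf == NULL)
        abort();              /* the kernel's pn_alloc() is assumed not to fail */
    pnp->pn_path = pnp->pn_buf;
    pnp->pn_pathlen = 0;
}

static void pn_free(struct pathname *pnp)
{
    free(pnp->pn_buf);
    pnp->pn_buf = pnp->pn_path = NULL;
}

int pn_get(const char *str, uio_seg_t seg, struct pathname *pnp)
{
    int error;

    pn_alloc(pnp);                                   /* call pn_alloc */
    if (seg == UIO_USERSPACE)                        /* orcc %g0, %i1, %g0 ; bne */
        error = copyinstr(str, pnp->pn_path, MAXPATHLEN, &pnp->pn_pathlen);
    else
        error = copystr(str, pnp->pn_path, MAXPATHLEN, &pnp->pn_pathlen);
    pnp->pn_pathlen--;                               /* sub %l6, 0x1: drop the NUL */
    if (error != 0)                                  /* be pn_get + 0x5c skips this */
        pn_free(pnp);
    return error;                                    /* orcc %g0, %o0, %i0 */
}
```

The decrement sits in the delay slot of the be at pn_get+0x4c, which has no annul flag, so it runs on both the success and the failure path; the sketch reflects that by doing it unconditionally.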

When pn_get() has completed the copy, it overwrites %i0 with the copy routine's return code (orcc %g0, %o0, %i0 at pn_get+0x44). If we were to follow this further, we would find that the copy routines also modify the calling parameters.

The call to pn_get() in lookupname() is followed almost immediately by a call to lookuppn(), so let's find out what that routine does next.

The lookuppn() routine

The lookuppn() routine, according to the stack traceback, does appear to have a valid memory address as its first argument. This is listed as pnp, a pointer to a pn, probably a pathname. Let's see what we find there and whether it looks valid.

We will be looking at address 0xf05d06e4, the value of the first calling parameter to lookuppn(), as seen in the $c output we got earlier.

  0xf05d06e4/X
  0xf05d06e4:     fc26a000
  fc26a000/X
  ledmadelay+0x404:               2f646576
  ./s
  ledmadelay+0x404:               /dev/ticotsord

The value 2f646576 looked as if it might be ASCII, so we redisplayed it as a string. Sure enough, we now have the name of the file that fm_flb was trying to open!
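The guess that 2f646576 is text can be checked by hand. SPARC is big-endian, so the most significant byte of a word is stored first, and the bytes 2f 64 65 76 read in memory order as '/', 'd', 'e', 'v'. The helper below, which is hypothetical and not part of adb, performs the same check; it prints a dot for non-printable bytes, a common hex-dump convention.

```c
#include <stdint.h>

/* Render a 32-bit word as four characters, most significant byte first.
 * On big-endian SPARC that is also the order of the bytes in memory,
 * i.e. the order in which a string stored there would be read. */
void word_to_ascii(uint32_t word, char out[5])
{
    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)(word >> (24 - 8 * i));
        out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';  /* '.' marks non-printable bytes */
    }
    out[4] = '\0';
}
```

Applied to 2f646576 it yields "/dev", the start of the string that ./s printed. Applied to the pointer fc26a000 it yields mostly dots, which is what you would expect from an address rather than text.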

Now, we get to a trouble spot. According to $c, we called mutex_enter() from location lookuppn+0x148. That's fairly deep into the routine, making variable-chasing a lot of work.

Let's look at some of the assembly code first.

  lookuppn,20/ia
  lookuppn:       save    %sp, -0x170, %sp
  lookuppn+4:     st      %i2, [%fp + 0x4c]
  lookuppn+8:     sethi   %hi(0xf0165400), %o0
  lookuppn+0xc:   st      %i3, [%fp + 0x50]
  lookuppn+0x10:  add     %o0, 0x264, %l2
  lookuppn+0x14:  ldsh    [%g7 + 0x1a], %o3
  lookuppn+0x18:  sethi   %hi(0xf00f1400), %o1
  lookuppn+0x1c:  add     %o3, 0x1, %o3
  lookuppn+0x20:  sth     %o3, [%g7 + 0x1a]
  lookuppn+0x24:  ld      [%g7 + 0x58], %o5
  lookuppn+0x28:  add     %o1, 0x370, %l5
  lookuppn+0x2c:  ld      [%o5 + 0xf4], %o0
  lookuppn+0x30:  add     %o0, 0x1, %o0
  lookuppn+0x34:  st      %o0, [%o5 + 0xf4]
  lookuppn+0x38:  ldsh    [%g7 + 0x1a], %o2
  lookuppn+0x3c:  sub     %o2, 0x1, %o2
  lookuppn+0x40:  sll     %o2, 0x10, %o2
  lookuppn+0x44:  sra     %o2, 0x10, %o2
  lookuppn+0x48:  sth     %o2, [%g7 + 0x1a]
  lookuppn+0x4c:  orcc    %g0, %o2, %g0
  lookuppn+0x50:  bne,a   lookuppn + 0x78
  lookuppn+0x54:  ld      [%fp + 0x4c], %l1
  lookuppn+0x58:  ld      [%g7 + 0x58], %o0
  lookuppn+0x5c:  ldsb    [%o0 + 0x4d], %o0
  lookuppn+0x60:  orcc    %g0, %o0, %g0
  lookuppn+0x64:  be,a    lookuppn + 0x78
  lookuppn+0x68:  ld      [%fp + 0x4c], %l1
  lookuppn+0x6c:  call    kpreempt
  lookuppn+0x70:  clr     %o0
  lookuppn+0x74:  ld      [%fp + 0x4c], %l1
  lookuppn+0x78:  clr     %i4
  lookuppn+0x7c:  orcc    %g0, %l1, %g0
  lookuppn+0x80:  +,20
  lookuppn+0x80:  be      lookuppn + 0x90
  lookuppn+0x84:  clr     %l4
  lookuppn+0x88:  ba      lookuppn + 0x94
  lookuppn+0x8c:  mov     0x1, %l7
  lookuppn+0x90:  clr     %l7
  lookuppn+0x94:  ld      [%g7 + 0xa0], %i5
  lookuppn+0x98:  ld      [%i5 + 0x2d0], %i5
  lookuppn+0x9c:  call    mutex_enter
  lookuppn+0xa0:  mov     %i5, %o0
  lookuppn+0xa4:  ld      [%i5 + 0xc], %o1
  lookuppn+0xa8:  mov     %i5, %o0
  lookuppn+0xac:  add     %o1, 0x1, %o1
  lookuppn+0xb0:  call    mutex_exit
  lookuppn+0xb4:  st      %o1, [%i5 + 0xc]
  lookuppn+0xb8:  cmp     %i1, 0x1
  lookuppn+0xbc:  bne,a   lookuppn + 0xc8
  lookuppn+0xc0:  clr     %o2
  lookuppn+0xc4:  mov     0x1, %o2
  lookuppn+0xc8:  ld      [%fp + 0x50], %l0
  lookuppn+0xcc:  add     %i0, 0x8, %i2
  lookuppn+0xd0:  add     %i0, 0x4, %l3
  lookuppn+0xd4:  clr     %i3
  lookuppn+0xd8:  mov     %o2, %l6
  lookuppn+0xdc:  ld      [%i2], %o5
  lookuppn+0xe0:  orcc    %g0, %o5, %g0
  lookuppn+0xe4:  bne,a   lookuppn + 0xf4
  lookuppn+0xe8:  ld      [%i2], %o7
  lookuppn+0xec:  ba      lookuppn + 0x684
  lookuppn+0xf0:  mov     0x2, %i1
  lookuppn+0xf4:  orcc    %g0, %o7, %g0
  lookuppn+0xf8:  be,a    lookuppn + 0x108
  lookuppn+0xfc:  clr     %i1
  lookuppn+0x100: +,20
  lookuppn+0x100: ld      [%l3], %i1
  lookuppn+0x104: ldsb    [%i1], %i1
  lookuppn+0x108: cmp     %i1, 0x2f
  lookuppn+0x10c: bne,a   lookuppn + 0x184
  lookuppn+0x110: sethi   %hi(0xf0181800), %o1
  lookuppn+0x114: call    vn_rele
  lookuppn+0x118: mov     %i5, %o0
  lookuppn+0x11c: call    pn_skipslash
  lookuppn+0x120: mov     %i0, %o0
  lookuppn+0x124: ld      [%g7 + 0xa0], %o0
  lookuppn+0x128: ld      [%o0 + 0x2d4], %o0
  lookuppn+0x12c: orcc    %g0, %o0, %g0
  lookuppn+0x130: be,a    lookuppn + 0x144
  lookuppn+0x134: sethi   %hi(0xf0178c00), %i1
  lookuppn+0x138: ld      [%g7 + 0xa0], %i1      ! vph_mutex + 0x7f4c
  lookuppn+0x13c: ba      lookuppn + 0x148
  lookuppn+0x140: ld      [%i1 + 0x2d4], %i1
  lookuppn+0x144: ld      [%i1 + 0x1d0], %i1
  lookuppn+0x148: call    mutex_enter
  lookuppn+0x14c: mov     %i1, %o0
  lookuppn+0x150: ld      [%i1 + 0xc], %o7
  lookuppn+0x154: mov     %i1, %o0
  lookuppn+0x158: add     %o7, 0x1, %o7
  lookuppn+0x15c: st      %o7, [%i1 + 0xc]
  lookuppn+0x160: call    mutex_exit
  lookuppn+0x164: mov     %i1, %i5
  lookuppn+0x168: sethi   %hi(0xf0181800), %o0
  lookuppn+0x16c: ld      [%o0 + 0xe8], %o0      ! audit_active
  lookuppn+0x170: orcc    %g0, %o0, %g0
  lookuppn+0x174: be      lookuppn + 0x1a0
  lookuppn+0x178: mov     0x1, %o1
  lookuppn+0x17c: ba      lookuppn + 0x198
  lookuppn+0x180:

We can take two approaches at this point.

  1. Start at the beginning, working toward the failure point.

  2. Start at the failure point, working back toward the beginning.

Working with source code, it's fairly easy to take the first approach. Without source, however, we are making educated guesses about what the code might be doing: we may get it right, or we may end up way off track. So without source code, it is probably best to start at the point of failure and work backwards, which we will do shortly.
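As an illustration of what the first approach involves, here is one possible C reading of lookuppn+0x94 through lookuppn+0x164 from the listing above. Treat it as an educated guess, not Sun's source: the name lookuppn_start(), the simplified vnode and pathname structures, the mock locking, and the reading of the word at proc + 0x2d0 as the current directory (u_cdir) are all assumptions. The sketch records the null-vnode case in a flag where the real kernel would take a fault.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct vnode {
    int      v_lock;    /* stands in for the mutex at offset 0x0: mutex_enter(vp) */
    unsigned v_count;   /* hold count, at offset 0xc in the real layout */
};

struct pathname {
    char  *pn_buf;
    char  *pn_path;     /* [pnp + 0x4] in the listing */
    size_t pn_pathlen;  /* [pnp + 0x8] in the listing */
};

static int simulated_fault;   /* set where the real kernel would trap */

/* The inlined hold: mutex_enter(), v_count++, mutex_exit(). */
static void vn_hold(struct vnode *vp)
{
    if (vp == NULL) {         /* mutex_enter(0x0): a data fault in the kernel */
        simulated_fault = 1;
        return;
    }
    vp->v_lock = 1;           /* mutex_enter */
    vp->v_count++;
    vp->v_lock = 0;           /* mutex_exit */
}

static void vn_rele(struct vnode *vp)
{
    vp->v_count--;            /* simplified: no inactive processing */
}

static void pn_skipslash(struct pathname *pnp)
{
    while (pnp->pn_pathlen > 0 && *pnp->pn_path == '/') {
        pnp->pn_path++;
        pnp->pn_pathlen--;
    }
}

/* Hypothetical name: the starting-directory logic of lookuppn(). */
int lookuppn_start(struct pathname *pnp, struct vnode *u_cdir,
                   struct vnode *u_rdir, struct vnode *rootdir,
                   struct vnode **startvpp)
{
    struct vnode *vp = u_cdir;             /* ld [%i5 + 0x2d0], %i5 */
    vn_hold(vp);                           /* lookuppn+0x9c .. +0xb4 */

    if (pnp->pn_pathlen == 0) {            /* ld [%i2], %o5 ; orcc ; bne,a */
        vn_rele(vp);                       /* assumed: +0x684 is past the listing */
        return ENOENT;                     /* mov 0x2, %i1 (ENOENT is 2) */
    }
    if (pnp->pn_path[0] == '/') {          /* ldsb [%i1], %i1 ; cmp %i1, 0x2f */
        vn_rele(vp);                       /* call vn_rele */
        pn_skipslash(pnp);                 /* call pn_skipslash */
        vp = (u_rdir != NULL) ? u_rdir : rootdir;   /* lookuppn+0x124 .. +0x144 */
        vn_hold(vp);                       /* mutex_enter called from +0x148 */
    }
    *startvpp = vp;
    return 0;
}
```

If this reading is right, an absolute path such as /dev/ticotsord is what sends lookuppn() to the u_rdir/rootdir code near lookuppn+0x148. A relative path would branch to lookuppn+0x184 instead, which is beyond the listing.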

The magic of %g7 in Solaris 2.3 & 2.4

The user program, fm_flb, made a system call to open /dev/ticotsord. System calls are actually software traps (one of the good traps, as opposed to bad traps), and it is the system call software trap that brought us into kernel mode.

On this particular hardware with this particular OS, because we entered the kernel through a trap, %g7 has a special meaning: it contains the address of the thread that trapped.

Note

At this point, the trap code is deep in the guts of the UNIX system. The source code involved is locore.s, which is pure assembly code. From vendor to vendor, release to release, and architecture to architecture, do not expect to see, or count on, register %g7 being used in the same manner. For Sun's Solaris 2.3 and 2.4 on SPARC architectures, the trap software in locore.s does use %g7 this way.


In lookuppn(), %g7 is used a lot. To help find the source of the panic, we are going to make use of %g7. First, let's take another quick look at the area of code we are going to examine closely.

  lookuppn+0x124: ld      [%g7 + 0xa0], %o0
  lookuppn+0x128: ld      [%o0 + 0x2d4], %o0
  lookuppn+0x12c: orcc    %g0, %o0, %g0
  lookuppn+0x130: be,a    lookuppn + 0x144
  lookuppn+0x134: sethi   %hi(0xf0178c00), %i1
  lookuppn+0x138: ld      [%g7 + 0xa0], %i1     ! vph_mutex + 0x7f4c
  lookuppn+0x13c: ba      lookuppn + 0x148
  lookuppn+0x140: ld      [%i1 + 0x2d4], %i1
  lookuppn+0x144: ld      [%i1 + 0x1d0], %i1
  lookuppn+0x148: call    mutex_enter
  lookuppn+0x14c: mov     %i1, %o0

As you'll recall, it is the calling parameter to mutex_enter() that caused the trouble. So, we need to figure out what happens to %o0, as this is where we will find the source of the trouble.

At lookuppn+0x124, we set %o0 to the address of the proc structure for the running thread, which is stored at offset 0xa0 from %g7. Using that, we set %o0 again, this time to the contents of the proc structure + 0x2d4. We will call this keyvalue. If you have the time and patience, you might try to figure out what variable is stored at offset 0x2d4 in the proc structure, although that isn't really so important at this stage. You will find that it is the variable u_rdir, which is part of the u, or user, structure within the proc structure. Refer to /usr/include/sys/user.h and /usr/include/sys/proc.h for full details.
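The two loads at lookuppn+0x124 and lookuppn+0x128 are ordinary structure-member accesses. The sketch below models them with padded structures so that offsetof() reproduces the offsets in the listing. Kernel addresses are held in uint32_t because the Solaris 2.x SPARC kernel used 32-bit pointers. The structure names, the field name t_procp, and the reading of offset 0x2d0 as u_cdir are assumptions for illustration. Only u_rdir at 0x2d4 comes from the text; the real definitions are in the headers named above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit layouts: only the offsets used in the listing are
 * modelled, and everything before them is padding. */
struct kthread_sketch {
    unsigned char pad[0xa0];
    uint32_t t_procp;      /* assumed name; ld [%g7 + 0xa0] */
};

struct proc_sketch {
    unsigned char pad[0x2d0];
    uint32_t u_cdir;       /* assumed: ld [%i5 + 0x2d0] at lookuppn+0x98 */
    uint32_t u_rdir;       /* ld [%o0 + 0x2d4] at lookuppn+0x128 */
};

_Static_assert(offsetof(struct kthread_sketch, t_procp) == 0xa0, "t_procp at 0xa0");
_Static_assert(offsetof(struct proc_sketch, u_rdir) == 0x2d4, "u_rdir at 0x2d4");

/* Address of a field, given the 32-bit base address of its structure. */
uint32_t field_addr(uint32_t base, size_t offset)
{
    return base + (uint32_t)offset;
}
```

This is the arithmetic behind adb commands such as fc45c600+a0/X: a base address taken from a register or an earlier load, plus a member offset.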

According to the lookuppn() code, if keyvalue is zero, we branch to lookuppn+0x144. The sethi in the delay slot and the ld instruction at lookuppn+0x144 work together to load the word stored at address 0xf0178dd0 into %i1: sethi puts 0xf0178c00 into %i1, and the ld adds 0x1d0 to it to form the address. We then call mutex_enter(), copying %i1 into %o0 in the call's delay slot.

If keyvalue is non-zero, we skip the sethi instruction. The sethi in the delay slot is not executed because of the ,a (annul) flag on the be instruction: with the annul bit set, the delay-slot instruction is executed only when the branch is taken. Instead, we execute the instructions at offsets 0x138, 0x13c, and 0x140. In effect, we reload keyvalue (the process's u_rdir value), this time into register %i1. We then call mutex_enter(), copying the value into %o0. (Yes, this is a bit redundant, but that's what the compiler came up with for the final executable code.)
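Both mechanisms in these two paragraphs can be checked with a little arithmetic. The sketch below encodes how sethi and a following ld build the address 0xf0178dd0, and the SPARC V8 rule for whether a branch's delay-slot instruction runs. The helper names are hypothetical, and the branch-never (bn) case is left out for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* sethi %hi(x), rd sets bits 31..10 of rd and clears bits 9..0;
 * %lo(x) is the low 10 bits, supplied by a following add or ld. */
uint32_t sparc_hi(uint32_t x) { return x & ~UINT32_C(0x3ff); }
uint32_t sparc_lo(uint32_t x) { return x & UINT32_C(0x3ff); }

/* Effective address of "ld [rs1 + simm13], rd". */
uint32_t ld_address(uint32_t rs1, int32_t simm13)
{
    return rs1 + (uint32_t)simm13;
}

/* SPARC V8 delay-slot rule for Bicc branches (bn omitted here):
 *   annul clear            -> the delay-slot instruction always runs
 *   annul set, conditional -> it runs only if the branch is taken
 *   annul set, ba          -> it never runs                        */
bool delay_slot_executes(bool unconditional, bool annul, bool taken)
{
    if (!annul)
        return true;
    if (unconditional)
        return false;
    return taken;
}
```

The offset 0x1d0 is exactly %lo(0xf0178dd0), so the compiler split the address of the variable into a sethi and an ld offset. The be,a at lookuppn+0x130 runs its sethi only when taken (keyvalue zero). The ba at lookuppn+0x13c has no annul flag, so the ld at lookuppn+0x140 always runs on the non-zero path.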

Now, let's see which path was taken by the lookuppn() routine. First, we have to find out what u_rdir is set to.

  $c
  complete_panic(0xf0049460,0xf05d03ac,0xf05d0238,0x3,0x0,0x1) + 10c
  do_panic(?) + 1c
  vcmn_err(0xf015f7a8,0xf05d03ac,0xf05d03ac,0x3cad8,0x2,0x3)
  cmn_err(0x3,0xf015f7a8,0x0,0x18,0x18,0xf0152400) + 1c
  die(0x9,0xf05d04c4,0x3,0x3a6,0x2,0xf015f7a8) + 78
  trap(0x9,0xf05d04c4,0xf01822d8,0x3a6,0x2,0x0) + 598
  fault(?) + 84
  mutex_enter(0x0,0xd,0x64,0x1,0xd,0xf05d06ec)
  lookuppn(0xf05d06e4,0x0,0xf05d06ec,0x0,0x0,0xfc01dd14) + 148
  lookupname(0x0,0x0,0x1,0x0,0xf05d07f4,0x0) + 28
  vn_open(0x3cad8,0x0,0x3,0xb40,0xf05d08ac,0x0) + a4
  copen(0x3cad8,0x3,0xb48,0xf05d0920,0x3cad8,0xf0156628) + 70
  syscall(0xf0160f3c) + 3e4
  0xf05d04c4$<regs
  0xf05d04c4:     psr             pc              npc
                  404000c7        f0048918        f004891c
  0xf05d04d0:     y               g1              g2              g3
                  20000000        0               0               ffffff00
  0xf05d04e0:     g4              g5              g6              g7
                  0               f05d09e0        1               fc45c600
  0xf05d04f0:     o0              o1              o2              o3
                  0               d               64              1
  0xf05d0500:     o4              o5              o6              o7
                  d               f05d06ec        f05d0510        f009e378
  fc45c600+a0/X
  0xfc45c6a0:     fc44f800
  fc44f800+2d4/X
  tmp_mdevmap+0x5cfc:             0

It appears that u_rdir is set to zero. Therefore, lookuppn() would have taken the branch to lookuppn+0x144 and passed the word stored at 0xf0178dd0 to mutex_enter() instead. Let's go look at it now.

  f0178dd0/X
  rootdir:
  rootdir:        0

Aha! We've found our problem at last! The rootdir variable was set to zero, which is why mutex_enter() was called with 0x0 as its argument. What should have been there? According to /usr/include/sys/systm.h, rootdir is a pointer to the vnode of the root directory.
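The whole pointer chase can be replayed mechanically. The sketch below puts the three words read with adb into a small lookup table and follows the same loads lookuppn() performs between lookuppn+0x124 and lookuppn+0x14c. The table, peek(), and replay_mutex_enter_arg() are hypothetical helpers, not part of adb or the kernel. Starting from the %g7 value saved in the trap frame, fc45c600, the replay produces 0, which matches the mutex_enter(0x0,...) frame in the $c output.

```c
#include <stddef.h>
#include <stdint.h>

/* Words read from the crash dump with adb in the session above. */
struct dump_word {
    uint32_t addr;
    uint32_t value;
};

static const struct dump_word dump[] = {
    { 0xfc45c6a0, 0xfc44f800 },  /* fc45c600+a0/X  : %g7 + 0xa0 -> proc    */
    { 0xfc44fad4, 0x00000000 },  /* fc44f800+2d4/X : proc + 0x2d4 (u_rdir) */
    { 0xf0178dd0, 0x00000000 },  /* f0178dd0/X     : rootdir               */
};

/* Look up one 32-bit word; returns 0 if that address was never examined. */
static int peek(uint32_t addr, uint32_t *out)
{
    for (size_t i = 0; i < sizeof dump / sizeof dump[0]; i++) {
        if (dump[i].addr == addr) {
            *out = dump[i].value;
            return 1;
        }
    }
    return 0;
}

/* Replay lookuppn+0x124 .. +0x14c to recover mutex_enter()'s argument. */
int replay_mutex_enter_arg(uint32_t g7, uint32_t *arg)
{
    uint32_t procp, u_rdir;

    if (!peek(g7 + 0xa0, &procp) || !peek(procp + 0x2d4, &u_rdir))
        return 0;
    if (u_rdir != 0) {                       /* be,a not taken */
        *arg = u_rdir;
        return 1;
    }
    return peek(0xf0178c00 + 0x1d0, arg);    /* sethi %hi(0xf0178c00) ; ld [+0x1d0] */
}
```

Returning 0 for an unexamined address, rather than guessing a value, keeps the replay honest: it can only reach conclusions supported by words actually read from the dump.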



PANIC! UNIX System Crash Dump Analysis Handbook (Bk/CD-ROM)
ISBN: 0131493868
EAN: 2147483647
Year: 1994
Pages: 289
Authors: Chris Drake
