The msgbuf and msgbuf.wrap macros | PANIC! UNIX System Crash Dump Analysis Handbook (Bk/CD-ROM)

The msgbuf and msgbuf.wrap macros

Let's tackle one of the more interesting adb macros now, one that shows some of the power that macros offer. We'll start with the msgbuf macros that display the message ring buffer we talked about earlier. These macros happen to be the same for both Solaris 1.1 and 2.3 systems. Here are the two macros used to display the msgbuf ring.

Example 13-1 The msgbuf macro

 msgbuf/"magic"16t"size"16t"bufx"16t"bufr"n4X
 +,(*(msgbuf+0t8)-*(msgbuf+0t12))&80000000$<msgbuf.wrap
 .+*(msgbuf+0t12),(*(msgbuf+0t8)-*(msgbuf+0t12))/c

Example 13-2 The msgbuf.wrap macro

 .+*(msgbuf+0t12),(*(msgbuf+0t4)-*(msgbuf+0t12))/c
 msgbuf+0t16+0,*(msgbuf+0t8)/c

Yes, they look a bit hairy, but you'll fully understand them shortly!

Below is a portion of the /usr/include/sys/msgbuf.h header file that, as you might guess, describes the msgbuf structure. This is from a Solaris 2.3 system. The Solaris 1.1 version is quite similar, but does differ a bit.

Example 13-3 Excerpts from /usr/include/sys/msgbuf.h

 #define MSG_MAGIC       0x8724786
 struct  msgbuf {
         struct  msgbuf_hd {
                 long    msgh_magic;
                 long    msgh_size;
                 long    msgh_bufx;
                 long    msgh_bufr;
                 u_longlong_t    msgh_map;
         } msg_hd;
 }

As we saw in an earlier chapter, we use the msgbuf macro to print out the ring buffer known as msgbuf. The msgbuf structure maintains four values at the head of the buffer that are used to guide manipulation of the buffer.

The first word of the msgbuf structure, msgh_magic, contains a magic number, or in simple terms, a unique identifier number. The next word, msgh_size, contains the size of the message buffer in bytes. This size does not include the four words at the head of the buffer.

You'll remember from an earlier discussion that the message buffer is a data area of fixed length and is used as a ring, or rotating, buffer. As such, there are a few ways that the buffer could be managed by the operating system.

The method used by the Solaris systems is quite clever. Two pointers are maintained. One, msgh_bufr, points to the start of the most recently added message, and the other, msgh_bufx, points to the end of that message, which is where the next message will be written. These variables are byte offsets into the message buffer, with zero referring to msgh_map.

The messages we were looking for in msgbuf actually don't start until the fifth word, represented in the msgbuf structure by msgh_map. Don't let the fact that msgh_map is not a character array bother you. The Solaris 2 kernel magically makes sure that the structure is correctly allocated space in memory. In the Solaris 1 msgbuf.h header file, you will see a character array of fixed size in the msgbuf structure.
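As a rough illustration of this layout, the following Python sketch splits a raw copy of the msgbuf structure into its four header words and the message area that follows them. It is only a sketch under stated assumptions: the function names are invented here, the words are assumed to be 32-bit and big-endian (as on SPARC), and the raw bytes are assumed to have been copied out of a dump by some other means. The byte offsets 4, 8, and 12 are the same ones the macros write as 0t4, 0t8, and 0t12.

```python
import struct

# Assumption: four 32-bit big-endian words (SPARC); ">" selects standard sizes.
MSG_HDR_FORMAT = ">4l"
MSG_HDR_SIZE = struct.calcsize(MSG_HDR_FORMAT)  # 16 bytes, i.e. four words


def parse_msgbuf_header(raw):
    """Return the four header words of a raw msgbuf image as a dict."""
    magic, size, bufx, bufr = struct.unpack_from(MSG_HDR_FORMAT, raw, 0)
    return {"magic": magic, "size": size, "bufx": bufx, "bufr": bufr}


def message_area(raw):
    """Return the ring itself: msgh_size bytes starting at the fifth word."""
    size = parse_msgbuf_header(raw)["size"]
    return raw[MSG_HDR_SIZE:MSG_HDR_SIZE + size]
```

Because msgh_size excludes the header, the message area is taken as exactly msgh_size bytes beginning at byte offset 16.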

Since the message buffer is a ring buffer, sometimes msgh_bufr will be more than msgh_bufx, and sometimes it will be less. In other words, the beginning of a message does not necessarily always come before the end of the message in the ring. The programmer who wrote the msgbuf and msgbuf.wrap macros had to take this into account. We will see how he tackled this in a minute.
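The bookkeeping described above can be modeled in a few lines of Python. This is not the kernel's code: the MessageRing class, its add method, and the tiny eight-byte buffer are invented for illustration, and the model simply follows the description given here, with bufr marking the start of the newest message and bufx marking its end.

```python
class MessageRing:
    """Toy model of the message ring (hypothetical, not kernel code)."""

    def __init__(self, size):
        self.data = bytearray(size)  # stands in for the msgh_map area
        self.bufr = 0                # offset where the newest message starts
        self.bufx = 0                # offset where the next byte will go

    def add(self, message):
        """Append a message, wrapping to offset 0 at the end of the ring."""
        self.bufr = self.bufx
        for byte in message:
            self.data[self.bufx] = byte
            self.bufx = (self.bufx + 1) % len(self.data)
```

Adding b"abcde" and then b"wxyz" to an eight-byte ring leaves bufr at 5 and bufx at 1, so the start of the newest message sits at a higher offset than its end.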

Here is a diagram of the msgbuf structure that shows it first with one message in it, and then again after a second message has been added. As each message is added to the msgh_map area, the pointers to the beginning and end of the most recent message, msgh_bufr and msgh_bufx, are adjusted accordingly.

Figure 13-1. The message buffer with one, then two messages in it


Calling another macro

New to our discussion on macros are two very powerful adb features used in the two msgbuf macros. These are:

Calling another macro
Command counts

As you explore the adb macro library on your system, you will find that both of these features are used quite heavily.

Let's start with the ability of a macro to call another macro. This can be done in one of two ways.

$<macro2    Execute macro2, but do not return to the calling macro
$<<macro2   Execute macro2, then return to the calling macro
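To make the difference between the two calling methods concrete, here is a toy model in Python. It is not adb and makes no attempt to parse real adb commands: each macro is just a list of lines, the macro names in the usage note are made up, and run_macro is a hypothetical helper that records which lines would run.

```python
def run_macro(macros, name, trace=None):
    """Record the lines that run when macro `name` is executed.

    macros maps a macro name to its list of lines. A line of the form
    "$<<other" runs `other` and then continues with the caller; a line
    of the form "$<other" runs `other` and abandons the rest of the caller.
    """
    trace = [] if trace is None else trace
    for line in macros[name]:
        if line.startswith("$<<"):
            run_macro(macros, line[3:], trace)   # call, then come back
        elif line.startswith("$<"):
            run_macro(macros, line[2:], trace)   # switch, never come back
            return trace
        else:
            trace.append(line)
    return trace
```

With macros = {"a": ["a1", "$<b", "a2"], "b": ["b1"], "c": ["c1", "$<<b", "c2"]}, running "a" records a1 and b1 only, while running "c" records c1, b1, and c2.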

In the msgbuf macros we will see the first calling method used. We will look at the second method in a short while.

Note

If, at any point, a macro generates an adb error, such as "symbol not found" or "data address not found," all current macro execution immediately terminates. This is true even if we are more than one level deep in macros and plan to return. Unfortunately, there is no way to capture or trap for these sorts of adb conditions. However, sometimes the clever macro programmer can work around this "feature" of adb by using the command count and combinations of unary and binary operators. As we walk through the msgbuf macro file and other complex macros, you'll see how this is done.

Let's take another look at the macros.

The msgbuf macro

 msgbuf/"magic"16t"size"16t"bufx"16t"bufr"n4X
 +,(*(msgbuf+0t8)-*(msgbuf+0t12))&80000000$<msgbuf.wrap
 .+*(msgbuf+0t12),(*(msgbuf+0t8)-*(msgbuf+0t12))/c

The msgbuf.wrap macro

 .+*(msgbuf+0t12),(*(msgbuf+0t4)-*(msgbuf+0t12))/c
 msgbuf+0t16+0,*(msgbuf+0t8)/c

In msgbuf, the first line prints out the magic number, the buffer size, the offset to where the next message goes, and the offset to where the most recent message started.

Command count

You'll remember from Chapter 8, "adb: The Gory Details," that commands can be given a count saying to do something X number of times. As a reminder, here is the general syntax for adb data display commands.

  address, count   command   formatting_information

The second line of msgbuf contains a command count (in this case, a formula that includes a bitwise AND) and a call to another macro, msgbuf.wrap. Note that we won't be returning from the msgbuf.wrap macro, by virtue of the $< calling method, and yet we have a third command line in this macro.

Q:	How would we execute the third line of the msgbuf macro?
A:	By not making the call to msgbuf.wrap at all.

The ,(*(msgbuf+0t8)-*(msgbuf+0t12))&80000000 translates into a count of how many times to call msgbuf.wrap. Rather silly, you might say, since we know that the $< calling method means we don't return. So, really, what we are looking for is whether to call msgbuf.wrap once or not at all. Here is where the macro programmer had to think about how to work with messages that had wrapped from the bottom to the top of the msgbuf ring buffer.

*(msgbuf+0t8) gives us the msgh_bufx offset: the ending point for the most recent message. *(msgbuf+0t12) gives us the msgh_bufr offset: the starting point. If the start has a higher offset, then we know the message has wrapped around the bottom of the message buffer, which means special processing is needed to print out the most recent message because it is in two pieces. We compare the two by subtracting the start from the end. If the result is negative, we have a wrapped message.

Q:	How can we test for a negative value within adb?
A:	Negative values always have the high-order bit set, so we test that bit by using a bitwise AND.

The &80000000 tests whether we got a negative value and thus have a wrapped message. If the high-order bit is set in the result of the subtraction, the final count will be nonzero (80000000 hex) and we will call the msgbuf.wrap macro; because $< never returns, the call happens only once. If the high-order bit is not set, meaning that the end of the message is somewhere after the beginning, the final count will be 0 and we will not call the msgbuf.wrap macro.

adb does not offer conditional tests, such as "if" statements. However, through clever use of the features adb does have, the formula for the command count in effect performs this test:

 if (start > end) msgbuf.wrap;
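The same arithmetic can be checked outside adb. The Python sketch below mirrors the count formula; wrap_count and is_wrapped are names invented for this illustration, and the sketch assumes the 32-bit words implied by the 80000000 mask.

```python
SIGN_BIT = 0x80000000  # high-order bit of a 32-bit word


def wrap_count(bufx, bufr):
    """Mirror (*(msgbuf+0t8)-*(msgbuf+0t12))&80000000 from the macro."""
    return (bufx - bufr) & SIGN_BIT


def is_wrapped(bufx, bufr):
    """True when the newest message starts at a higher offset than it ends."""
    return wrap_count(bufx, bufr) != 0
```

For the two crashes shown later in this section, the bufx and bufr pairs are 1adb and 1519, and 979 and 38c. In both cases the end lies after the start, so the count is 0 and msgbuf.wrap would not be called.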

If we don't call msgbuf.wrap, we move on to the third line.

 .+*(msgbuf+0t12),(*(msgbuf+0t8)-*(msgbuf+0t12))/c

In this line, .+*(msgbuf+0t12) is the new current address to work with, ,(*(msgbuf+0t8)-*(msgbuf+0t12)) is the count, and /c is the command. Can you figure out what this is going to do?

Starting at the beginning of the most recent message, msgh_bufr , we calculate the number of characters in the most recent message and print each one.
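As a worked example of that calculation, the Python sketch below computes the starting address and character count for the unwrapped case. The function name is made up, and the sketch assumes that the current address sits at the start of the message area (msgbuf+0t16) when the third line runs, which is consistent with the addresses printed in the two crash dumps later in this section.

```python
HEADER_BYTES = 16  # four 4-byte header words precede the message area


def recent_message_span(msgbuf_addr, bufx, bufr):
    """Return (start address, character count) for an unwrapped message."""
    start = msgbuf_addr + HEADER_BYTES + bufr   # .+*(msgbuf+0t12)
    count = bufx - bufr                         # *(msgbuf+0t8)-*(msgbuf+0t12)
    return start, count
```

With msgbuf at 0xf0002000, bufx of 0x1adb, and bufr of 0x1519, the start address works out to 0xf0003529 and the count to 0x5c2 characters.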

Take a look again at the msgbuf.wrap macro.

 .+*(msgbuf+0t12),(*(msgbuf+0t4)-*(msgbuf+0t12))/c
 msgbuf+0t16+0,*(msgbuf+0t8)/c

Unless we've lost you, you should now be able to figure out how this macro works. Remember that msgbuf+0t4 contains the ring buffer size and msgbuf+0t16 is where the messages actually begin. What we do in this macro is calculate the number of characters from the start of the most recent message to the actual end of the ring buffer and, starting at the beginning of that most recent message, print that many characters. This, in effect, prints the bottom end of the ring buffer.

The next line prints the top end of the ring buffer, starting at the first byte of the message area and printing all the way to the end of the message.
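The two-piece read can be sketched in Python as follows. This is an illustration of the logic only: read_recent_message is a hypothetical helper, and ring stands for the message area alone (the bytes after the four header words), so its length plays the role of msgh_size.

```python
def read_recent_message(ring, bufx, bufr):
    """Return the newest message from the ring, joining the two pieces
    when the message wraps past the end of the buffer."""
    if bufr > bufx:
        bottom = bytes(ring[bufr:])   # from the message start to the ring's end
        top = bytes(ring[:bufx])      # from offset 0 up to the message end
        return bottom + top
    return bytes(ring[bufr:bufx])     # unwrapped: one contiguous piece
```

For an eight-byte ring holding b"zbcdewxy" with bufr at 5 and bufx at 1, the bottom piece is b"wxy", the top piece is b"z", and the joined message is b"wxyz".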

The msgbuf macro in use

Let's take a look at a couple of system crashes. Both are from a SPARCstation 20 running Solaris 1.1.1. Both crashes were forced by modifying rootdir. The crashes were done within minutes of each other.

A unique feature of the message ring buffer is that it is given a fixed location within the memory of a system and is not initialized or cleaned out during reboots. Only a power-down or a reset of the system will result in clearing the message buffer. This means that in the case of back-to-back crashes, you may see more than one of them recorded in the message buffer. This works to our advantage when looking for some history about the system.

When looking at these crashes, note the msgbuf offset pointers. You'll notice that a gap seems to exist between the crash messages, which is quite true. What the msgbuf macros show us are the most recent messages. The messages that appeared during the boot-up are not the most recent. The crash messages are the most recent. However, if we look around, we will see that the boot-time messages are still in the buffer.

Figure 13-2 Viewing the message buffer via two methods while in adb

 Hiya...  adb -k vmunix.0 vmcore.0
 physmem 1f8c
 $<msgbuf
 0xf0002000:     magic           size            bufx            bufr
                 63062           1ff0            1adb            1519
 0xf0003529:     BAD TRAP: cpu=0 type=9 rp=f048bb1c addr=2 mmu_fsr=326 rw=1
                 MMU sfsr=326: Invalid Address on supv data fetch at level 3
                 regs at f048bb1c:
                         psr=404000c2 pc=f006c154 npc=f006c158
                         y: 3b000000 g1: 0 g2: 8000000 g3: ffffff00
                         g4: 0 g5: f048c000 g6: 0 g7: 0
                         o0: 0 o1: 4000 o2: d20 o3: 0
                         o4: 7 o5: 6 sp: f048bb68 ra: 0
                 pid 326, `sh': Data access exception
                 kernel read fault at addr=0x2, pme=0x0
                 MMU sfsr=326: Invalid Address on supv data fetch at level 3
                 rp=0xf048bb1c, pc=0xf006c154, sp=0xf048bb68, psr=0x404000c2, context=0x9e
                 g1-g7: 0, 8000000, ffffff00, 0, f048c000, 0, 0
                 Begin traceback... sp = f048bb68
                 Called from f006c068, fp=f048bcd8, args=f048be0c 1 0 2f 0 0
                 Called from f0040118, fp=f048bd38, args=f048be0c 1 0 f048be18 f048c000 0
                 Called from f0035f88, fp=f048be40, args=0 0 f048beb4 0 f048be2014c00
                 Called from f013fae8, fp=f048bec0, args=f048bfe0 1d8 f01ba1d8 f01ba3b0 00
                 Called from f0005cd0, fp=f048bf58, args=f048c000 f048bfb4 f048bfe0 0 0 0
                 Called from 886c, fp=effff778, args=1ee54 1e724 1e83c 1a400 1ee5c 0
                 End traceback...
                 panic on cpu 0: Data access exception
                 syncing file systems... done
                 01018 low-memory static kernel pages
                 00432 additional static and sysmap kernel pages
                 00000 dynamic kernel data pages
                 00218 additional user structure pages
                 00000 segmap kernel pages
                 00000 segvn kernel pages
                 00038 current user process pages
                 00150 user stack pages
                 01856 total pages (928 chunks)
 msgbuf+10/s
 0xf0002010:     SuperSPARC: PAC ENABLED
 SunOS Release 4.1.3_U1 (GENERIC) #2: Thu Jan 20 15:58:03 PST 1994
 Copyright (c) 1983-1993, Sun Microsystems, Inc.
 cpu = SUNW,SPARCstation-20
 mod0 = TI,TMS390Z50 (mid = 8)
 mem = 32304K (0x1f8c000)
 avail mem = 28893184
 cpu0 at Mbus 0x8 0x224000
 entering uniprocessor mode
 Ethernet address = 8:0:20:1f:d9:aa
 espdma0 at SBus slot f 0x400000
 esp0 at SBus slot f 0x800000 pri 4 (onboard)
 sd0 at esp0 target 3 lun 0
 sd0: <SUN1.05 cyl 2036 alt 2 hd 14 sec 72>
 sr0 at esp0 target 6 lun 0
 ledma0 at SBus slot f 0x400010
 le0 at SBus slot f 0xc00000 pri 6 (onboard)
 SUNW,bpp0 at SBus slot f 0x4800000 pri 3 (sbus level 2)
 SUNW,DBRIe0 at SBus slot e 0x10000 pri 9 (sbus level 5)
 cgsix0 at SBus slot 2 0x0 pri 9 (sbus level 5)
 cgsix0: screen 1152x900, single buffered, 1M mappable, rev 11
 zs0 at obio 0x100000 pri 12 (onboard)
 zs1 at obio 0x0 pri 12 (onboard)
 SUNW,fdtwo0 at obio 0x700000 pri 11 (onboard)
 MMCODEC: manufacturer id 1, rev 2
 root on sd0a fstype 4.2
 swap on sd0b fstype spec size 98784K
 dump on sd0b fstype spec size 98772K
 le0: Twisted Pair Ethernet
 SuperSPARC: PAC ENABLED
 SunOS Release 4.1.3_U1 (GENERIC) #2: Thu Jan 20 15:58:03 PST 1994
 Copyright (c) 1983-1993, Sun Microsystems, Inc.
 cpu = SUNW,SPARCstation-20
 mod0 = TI,TMS390Z50 (mid = 8)
 mem = 32304K (0x1f8c000)
 avail mem = 28893184
 cpu0 at Mbus 0x8 0x224000
 entering uniprocessor mode
 Ethernet address = 8:0:20:1f:d9:aa
 espdma0 at SBus slot f 0x400000
 esp0 at SBus slot f 0x800000 pri 4 (onboard)
 sd0 at esp0 target 3 lun 0
 sd0: <SUN1.05 cyl 2036 alt 2 hd 14 sec 72>
 sr0 at esp0 target 6 lun 0
 ledma0 at SBus slot f 0x400010
 le0 at SBus slot f 0xc00000 pri 6 (onboard)
 SUNW,bpp0 at SBus slot f 0x4800000 pri 3 (sbus level 2)
 SUNW,DBRIe0 at SBus slot e 0x10000 pri 9 (sbus level 5)
 cgsix0 at SBus slot 2 0x0 pri 9 (sbus level 5)
 cgsix0: screen 1152x900, single buffered, 1M mappable, rev 11
 zs0 at obio 0x100000 pri 12 (onboard)
 zs1 at obio 0x0 pri 12 (onboard)
 SUNW,fdtwo0 at obio 0x700000 pri 11 (onboard)
 MMCODEC: manufacturer id 1, rev 2
 root on sd0a fstype 4.2
 swap on sd0b fstype spec size 98784K
 dump on sd0b fstype spec size 98772K
 le0: Twisted Pair Ethernet

Here we look at the second crash. Note that the offsets are both less than the offsets we saw in the first crash. This is because the boot messages wrapped around the bottom of msgbuf.

Figure 13-3 Viewing the message buffer of a subsequent crash

 Hiya...  adb -k vmunix.1 vmcore.1
 physmem 1f8c
 $<msgbuf
 0xf0002000:     magic           size            bufx            bufr
                 63062           1ff0            979             38c
 0xf000239c:     BAD TRAP: cpu=0 type=9 rp=f0413b2c addr=2 mmu_fsr=326 rw=1
                 MMU sfsr=326: Invalid Address on supv data fetch at level 3
                 regs at f0413b2c:
                         psr=404000c5 pc=f006c154 npc=f006c158
                         y: b3000000 g1: 40900ae4 g2: c1a0 g3: ffffff00
                         g4: 0 g5: f0414000 g6: 0 g7: 0
                         o0: 7 o1: ff1238d0 o2: 0 o3: f0047c08
                         o4: f0047c0c o5: f01efbc8 sp: f0413b78 ra: 1
                 pid 243, `sh': Data access exception
                 kernel read fault at addr=0x2, pme=0x0
                 MMU sfsr=326: Invalid Address on supv data fetch at level 3
                 rp=0xf0413b2c, pc=0xf006c154, sp=0xf0413b78, psr=0x404000c5, context=0xeb
                 g1-g7: 40900ae4, c1a0, ffffff00, 0, f0414000, 0, 0
                 Begin traceback... sp = f0413b78
                 Called from f006c038, fp=f0413ce8, args=f0413d4c 1 0 2f f0413eb4 0
                 Called from f006ccbc, fp=f0413d58, args=0 0 1 0 f0413e3c f0413eb4
                 Called from f0036688, fp=f0413e40, args=1bfb0 ffffffff f0414a30 f0413eb4 0 0
                 Called from f013fae8, fp=f0413ec0, args=f0413fe0 60 f01ba1d8 f01ba238 0 0
                 Called from f0005cd0, fp=f0413f58, args=f0414000 f0413fb4 f0413fe0 0 0 0
                 Called from 7154, fp=effff9e0, args=1bfb0 effffd2a 18800 0 effffd2c 1bfb2
                 End traceback...
                 panic on cpu 0: Data access exception
                 syncing file systems... done
                 01018 low-memory static kernel pages
                 00430 additional static and sysmap kernel pages
                 00000 dynamic kernel data pages
                 00100 additional user structure pages
                 00000 segmap kernel pages
                 00000 segvn kernel pages
                 00038 current user process pages
                 00040 user stack pages
                 01626 total pages (813 chunksvail mem = 28893184
                 cpu0 at Mbus 0x8 0x22
 .=X
 f0002988

Phew! Well, that's probably enough about msgbuf and the msgbuf macros!

How are you doing so far? If you want to take a break from reading and have some fun with macros, make backup copies of the macros we've talked about so far and try modifying them. For example, see if you can redesign the msgbuf macros so that the whole buffer is displayed in the correct order! It will help warm you up for when you'll be writing your own macros from scratch later on.