The message buffer, msgbuf

When the system is up and running, it maintains a ring buffer known as the message buffer. This buffer contains the information that you often see on the system console and later in the /var/adm/messages files. When the system panics or crashes, the panic-related messages appear on the console. However, they may not get written out to the /var/adm/messages file. Again, we can use the strings command to read this ring buffer. Before doing so, let's make sure you fully understand the nature of the ring buffer.

A ring buffer, as in the case of the message buffer, is a data area of fixed length. The message buffer on Solaris systems is known as msgbuf. Since msgbuf is a ring, the starting and ending points of the data within the ring are constantly rotating as the buffer is updated. When we dump the strings of a core image, we don't know where the most recent messages start.

Since the ring buffer is of fixed length, only the most recent information will be found there. If the system was booted months ago, you are not likely to find the boot messages in msgbuf. Instead, you will only see the more recent messages, including the panic messages.
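To make the wraparound behavior concrete, here is a minimal sketch of a fixed-length ring buffer in Python. It is only an illustration of the idea: the class name, the 16-byte size, and the sample messages are all made up for this example, and the code does not reflect the actual layout of msgbuf in the kernel.

```python
class RingBuffer:
    """Fixed-size byte ring (hypothetical; not the real msgbuf layout)."""

    def __init__(self, size):
        self.size = size
        self.data = bytearray(size)   # fixed-length storage, never grows
        self.next = 0                 # index where the next byte will land
        self.written = 0              # total number of bytes ever written

    def write(self, text):
        for byte in text.encode("ascii"):
            self.data[self.next] = byte
            self.next = (self.next + 1) % self.size   # wrap at the end
            self.written += 1

    def raw(self):
        """Bytes in memory order, as a dump of the area would show them."""
        return bytes(self.data)

    def in_order(self):
        """Bytes from oldest to newest, using the write index."""
        if self.written < self.size:
            return bytes(self.data[:self.next])
        return bytes(self.data[self.next:] + self.data[:self.next])


buf = RingBuffer(16)
buf.write("boot ok\n")    # 8 bytes
buf.write("su root\n")    # 8 bytes, the buffer is now exactly full
buf.write("panic!\n")     # 7 bytes, overwrites the oldest data

print(buf.raw())
print(buf.in_order())
```

In this sketch the raw view shows the newest message, "panic!", ahead of the older "su root" message, and the "boot ok" message is gone entirely. That is the same effect we have to keep in mind when reading msgbuf out of a core image: the data is all there, but the starting point is somewhere in the middle.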

To look at the msgbuf of a system crash dump, use the following as an example. This strings output is from the vmcore file produced by a bad trap (data fault) panic, which was triggered by following the instructions in Chapter 5. Note that in this example, we are piping the output of strings into the UNIX more command so that only one page of output is displayed at a time.

Figure 6-2 Using the strings command to view the message buffer, msgbuf

    Hiya... strings vmcore.1 | more
    Generic
    Data fault
    arc/
    esac
    done
    shift `expr $OPTIND - 1`
    if [ $# -gt 1 ] ; then
    echo $USAG
    stealing page f00d92a0 pfnum c4 for prom      (The message buffer starts here)
    stealing page f00e1478 pfnum 2ee for prom
    mem = 28672K (0x1c00000)
    avail mem = 26963968
    Ethernet address = 8:0:20:9:85:7e
    root nexus = Sun 4_65
    sbus0 at root: obio 0xf8000000
    dma0 at sbus0: SBus slot 0 0x400000
    esp0 at sbus0: SBus slot 0 0x800000 SBus level 3 sparc ipl 3
    sd3 at esp0: target 3 lun 0
    sd3 is /sbus@1,f8000000/esp@0,800000/sd@3,0
    <SUN1.05 cyl 2036 alt 2 hd 14 sec 72>
    sd6 at esp0: target 6 lun 0
    sd6 is /sbus@1,f8000000/esp@0,800000/sd@6,0
    Unable to install/attach driver 'isp'
    root on /sbus@1,f8000000/esp@0,800000/sd@3,0:a fstype ufs
    zs0 at root: obio 0xf1000000 sparc ipl 12
    zs0 is /zs@1,f1000000
    zs1 at root: obio 0xf0000000 sparc ipl 12
    zs1 is /zs@1,f0000000
    cgthree0 at sbus0: SBus slot 3 0x0 SBus level 5 sparc ipl 7
    cgthree0 is /sbus@1,f8000000/cgthree@3,0
    cgthree0: resolution 1152 x 900
    Unable to install/attach driver 'stc'
    dump on /dev/dsk/c0t3d0s1 size 66012K
    Feb 27 16:47:07 su: 'su sys' succeeded for root on /dev/console
    Feb 27 16:48:01 sendmail[178]: network daemon starting
    pseudo-device: vol0
    vol0 is /pseudo/vol@0
    audio0 at root: obio 0xf7201000 sparc ipl 13
    audio0 is /audio@1,f7201000
    Feb 27 16:50:37 su: 'su root' succeeded for kbrown on /dev/pts/2
    Feb 27 17:07:30 su: 'su root' succeeded for kbrown on /dev/pts/3
    Feb 27 17:09:51 su: 'su root' succeeded for kbrown on /dev/pts/1
    Feb 27 18:15:43 su: 'su root' succeeded for kbrown on /dev/pts/3
    BAD TRAP
    sh: Data fault
    kernel read fault at addr=0x0, pme=0x0
    Sync Error Reg 80<INVALID>
    pid=556, pc=0xf000aaa8, sp=0xf0331670, psr=0x4000c4, context=3
    g1-g7: 0, 0, ffffff80, 0, f03319e0, 1, ff467800
    Begin traceback... sp = f0331670
    Called from f0050668, fp=f03317e0, args=f0331844 0 f033184c 0 0 ff35be08
    Called from f0093b68, fp=f0331850, args=0 0 1 0 f03318b4 f00c5b70
    Called from f00245e4, fp=f03318b8, args=f0331e94 f0331920 0 0 4f074 f00b5218
    Called from f0005acc, fp=f0331938, args=f00bc334 f0331eb4 0 f0331e90 fffffffc ffffffff
    Called from 13c24, fp=effff678, args=4f074 effff6d8 3a 2f 1 4dc00
    End traceback...
    panic: Data fault
    syncing file systems... done
     static and sysmap kernel pages
        56 dynamic kernel data pages
       168 kernel-pageable pages
         0 segkmap kernel pages
         0 segvn kernel pages
        51 current user process pages
     total pages (1892 chunks)
    dumping to vp ff1e9d84, offset 116888
    ?PbM   p-p
    /opt/SUNWspro/ma
    /Xinitrc
    ?PbM
    /sbin/sh
    -8.y
    (We quit out of "more" at this point)
    Hiya...

Since the system had just recently booted, the boot-time messages were still in msgbuf, including a very detailed description of the system's hardware configuration. Also, you can see the syslog() messages from the UNIX su command showing when user kbrown became root. These messages were also displayed on the system console.

This procedure works because the message buffer area is kept in low memory and will appear close to the beginning of the core file. When you use strings to read the core file, you'll see the messages (remember that they may not appear to be in the right order, because the "beginning" of the buffer output may really be in the middle), followed by fairly random garbage once you're past the end.
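If you are curious why the output is a mix of readable messages and garbage, it helps to see what a strings-style scan does. The following Python sketch is not the real strings implementation. It is a rough approximation that reports every run of at least four printable ASCII characters (four being the usual default minimum for strings). The function name and the sample bytes are invented for illustration.

```python
import re


def printable_strings(data, min_len=4):
    """Yield runs of at least min_len printable ASCII characters (or tabs)."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


# Made-up bytes standing in for a small piece of a core file:
# binary junk with a few readable messages embedded in it.
sample = (b"\x00\x01BAD TRAP\x00\x7f\x02sh: Data fault\x00ab\x00"
          b"panic: Data fault\n\xff\xfe")

for text in printable_strings(sample):
    print(text)
```

Notice that the scan knows nothing about msgbuf. It simply reports anything that looks like text, in the order it is found in the file. Runs that are too short (such as "ab" in the sample) are skipped, and anything long enough is printed, whether it is a kernel message or a scrap of some user's shell script.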

Once you've passed the msgbuf area, watching the strings output is usually boring and useless. However, there are times when you might need to read more than just the messages data. Read on!

Strings and the case of the unknown customer

Recently, a customer mailed an unlabeled tape to a support engineer in Sun's UK Answer Centre. No note was sent with the tape. No return address was on the envelope. It seemed there was no way to figure out where the tape had come from, so we would have to wait for the customer to call and claim it.

Ah, but maybe the customer had been clever and put helpful information on the tape, something like a README-FIRST file. So, we read the tape. The only files it contained were unix.0 and vmcore.0. At least now we knew, by virtue of the file names, that the files were from a Solaris 2 system. Had unix.0 been named vmunix.0, we would have known that the files were from a Solaris 1 system.

Using the techniques described above, we quickly learned that the system was a large SPARCcenter 2000 server running Solaris 2.3. However, we still didn't know whose SC2000 the files came from. The engineer involved knew many SC2000 customers. In hopes of finding more clues, we watched the strings output for a long time. Fortunately, we didn't have to read all 48,613 lines (over 850 printed pages) of it!

In reading the strings output, it became apparent that the SC2000 was used for inventory control of some sort. There were a lot of drugstore items with pricing and stocking data. This alone narrowed down our list.

Then, just when we were pretty sure which customer might have sent the tape, the customer's company name (the real name has been changed) appeared on our screen in banner-sized letters!

       #                                 ######
      # #     ####   #    #  ######      #     #  #####   #    #   ####    ####
     #   #   #    #  ##  ##  #           #     #  #    #  #    #  #    #  #
    #     #  #       # ## #  #####       #     #  #    #  #    #  #        ####
    #######  #       #    #  #           #     #  #####   #    #  #  ###       #
    #     #  #    #  #    #  #           #     #  #   #   #    #  #    #  #    #
    #     #   ####   #    #  ######      ######   #    #   ####    ####    ####

As it turned out, the system's /etc/motd or message of the day file contained this banner, some welcoming messages, and the latest system news for folks to read when they log in. Just prior to the crash, someone must have logged in, as the /etc/motd file was still in memory at the time of the crash.

Without the strings command, we would have had to wait for the customer to contact us.