Solaris 2: How to set up savecore | PANIC! UNIX System Crash Dump Analysis Handbook (Bk/CD-ROM)

Here is the method for enabling savecore in Solaris 2 systems. Note the differences from Solaris 1 as we point them out.

Customizing /etc/rc2.d/S20sysetup

On Solaris 2 systems, the savecore command is called by the run-level-2 script /etc/rc2.d/S20sysetup, which is hard-linked to /etc/init.d/sysetup. By default, the savecore lines are commented out, which disables savecore during the transition to run level 2, as shown in this portion of the script.

Example 3-4 Savecore commented out in /etc/rc2.d/S20sysetup

 ##
 ## Default is to not do a savecore
 ##
 #if [ ! -d /var/crash/`uname -n` ]
 #then mkdir -p /var/crash/`uname -n`
 #fi
 #                echo 'checking for crash dump...\c '
 #savecore /var/crash/`uname -n`
 #                echo ''

To enable the savecore command, uncomment this area of the script, as shown.

Example 3-5 Savecore enabled in /etc/rc2.d/S20sysetup

 #
 # Default is to not do a savecore
 #
 if [ ! -d /var/crash/`uname -n` ]
 then mkdir -p /var/crash/`uname -n`
 fi
                 echo 'checking for crash dump...\c '
 savecore /var/crash/`uname -n`
                 echo ''

Unlike the Solaris 1 /etc/rc.local script, this script first tests for the existence of the savecore directory and, if the directory is not found, calls mkdir to create it. This is done by an if-then-fi Bourne shell command sequence. Be careful to uncomment (or re-comment) every line of this sequence together, or the script will fail.

Another difference you may note is that the UNIX command uname -n is being used. This command is the Solaris 2 equivalent of the Solaris 1 hostname command.

Again, if you want to use a different directory for your savecore files, change the if, mkdir, and savecore lines accordingly.
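One way to keep those three lines in step is to name the directory once. The following is only a sketch: the variable CRASHDIR and the function ensure_crash_dir are names made up for this illustration and do not appear in the stock S20sysetup script, and the $( ) and ${ } forms assume a POSIX-style shell.

```shell
# CRASHDIR is a hypothetical variable for this sketch; the stock script
# spells out /var/crash/`uname -n` on each line instead.
CRASHDIR=${CRASHDIR:-/var/crash/`uname -n`}

# Create the directory only if it is missing, mirroring the if-then-fi test.
ensure_crash_dir() {
    if [ ! -d "$1" ]
    then mkdir -p "$1"
    fi
}

# In the rc script, the enabled section would then read:
#   ensure_crash_dir "$CRASHDIR"
#   echo 'checking for crash dump...\c '
#   savecore "$CRASHDIR"
#   echo ''
```

With this arrangement, moving the crash directory means changing a single assignment rather than three separate lines.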

Configuring a special dump device

Solaris 2 supports much larger systems than does Solaris 1, allowing for up to 20 CPU modules and massive amounts of memory. In Solaris 2, we also have newer, more advanced swapping techniques. You'll read more about this in the advanced chapters later on.

The Solaris 1 informal and rather crude rule of thumb of having twice as much swap as memory doesn't apply to Solaris 2 systems. Indeed, some of the larger Solaris 2 systems run well with nearly no swap space defined at all!

Solaris 2 systems that have a minimal amount of swap space will need to have some sort of dump device at hand when system crashes occur. As with Solaris 1, you can specify a dump device other than your primary swap device. On releases of Solaris 2 up to and including Solaris 2.4, this is not quite as easy to do as it was in Solaris 1. However, we will tackle this tricky subject anyway!

Both the panic() routine and the savecore program need to know where the dump device is located. Therefore, we need to define this before either executes. We cannot predict when panic() will run; however, we do know when savecore is executed. We need to redefine the name of the dumpfile, which is how the kernel refers to the dump device in Solaris 2, before /etc/rc2.d/S20sysetup is run. To do this, we will create our own script, /etc/rc2.d/S19dumpfile. The "S" or "Start" run-command scripts are executed in alphabetical order by init during run level transitions. Because this is so, we know our S19dumpfile script will be run before the S20sysetup script, as S19 comes before S20 alphabetically.
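The ordering argument can be checked with a short sketch. It assumes that "alphabetical" means a plain character-by-character comparison of the script names; the function name rc_order and the script name S100example are invented for this illustration.

```shell
# Print the given script names in character-by-character sorted order.
rc_order() {
    # LC_ALL=C forces a plain byte comparison, independent of locale.
    printf '%s\n' "$@" | LC_ALL=C sort
}

# S19dumpfile is listed ahead of S20sysetup.
rc_order S20sysetup S19dumpfile
```

Under the same assumption, the comparison is not numeric: a hypothetical S100example would sort ahead of S19dumpfile, because "0" comes before "9" in the third character position. Keeping to two-digit sequence numbers avoids that surprise.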

Before we can write this script, we need to know where to locate the current dumpfile name in the running kernel. Jumping ahead of ourselves, we are going to take a quick peek at the kernel by using the UNIX adb command. By the end of this book, you'll be a wizard when it comes to using adb, so don't get too worried if this seems a bit scary at first.

You need to be the superuser, root, to view and modify the kernel by using adb.

Example 3-6 Displaying the dumpfile kernel variable via adb

 # adb -k /dev/ksyms /dev/mem
 physmem 1b24
 dumpfile/20X
 dumpfile:
 dumpfile:       0           0           0           0
                 2f646576    2f64736b    2f633074    33643073
                 31000000    0           0           0
                 0           0           0           0
                 0           0           0           0
 dumpfile+10/X
 dumpfile+0x10:  2f646576
 dumpfile+10/s
 dumpfile+0x10:  /dev/dsk/c0t3d0s1
 $q
 #

A full 32-bit word of memory can store 4 characters of a string, because a character requires only one byte (8 bits) of storage and there are 4 bytes per full 32-bit word. Each byte has a unique address in memory; however, using adb, we write to memory in full words and half-words. Throughout this book, we usually reference full-word hexadecimal addresses, which end in 0, 4, 8, or c.

In the adb session above, we start at the kernel symbol or variable name dumpfile and display 20 full words of memory in hexadecimal. The first 4 words contain zero. The fifth word contains 2f646576. This is actually the first 4 bytes or characters of the null-terminated string "/dev/dsk/c0t3d0s1".

The dumpfile string starts at address dumpfile+0x10. The kernel string that we need to modify is actually stored in memory this way:

 Full word
 address         Characters
 ------------------------
 dumpfile+10  =  "/dev"
 dumpfile+14  =  "/dsk"
 dumpfile+18  =  "/c0t"
 dumpfile+1c  =  "3d0s"
 dumpfile+20  =  "1"  (The last character is followed by three nulls or zeros)
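The word-to-character mapping in this layout can be reproduced outside adb. The following is a minimal sketch, assuming a POSIX-style shell with printf and cut; decode_words is a helper name invented for this illustration, and it expects each word written as eight hexadecimal digits.

```shell
# Decode full words (8 hex digits each) into the characters they hold,
# stopping at the first null byte.
decode_words() {
    for word in "$@"; do
        while [ -n "$word" ]; do
            byte=$(printf '%s' "$word" | cut -c1-2)   # next two hex digits
            word=$(printf '%s' "$word" | cut -c3-)    # remainder of the word
            # A null byte terminates the string.
            if [ "$byte" = "00" ]; then
                return 0
            fi
            # Convert the hex byte to octal, then have printf emit that character.
            printf "\\$(printf '%o' "0x$byte")"
        done
    done
}

# The five words starting at dumpfile+10 in the adb session.
decode_words 2f646576 2f64736b 2f633074 33643073 31000000
echo
```

Each word contributes up to four characters, read left to right, which is why the string spans the five addresses dumpfile+10 through dumpfile+20.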

Of these addresses, only the last three might require changing. The first two, representing the /dev/dsk portion of the device name, will not need to be changed.

Note

We do not specify a raw disk partition name; however, please remember that, in effect, the dump device is treated as such!

The next important thing for you to know is the hexadecimal values for the ASCII characters you might need to use to identify which dump device you want. You can refer to the ascii(5) man page to view the complete ASCII chart.

 Character    0  1  2  3  4  5  6  7  8  9
 Hex value   30 31 32 33 34 35 36 37 38 39

 Character    a  b  c  d  e  f  k  s  t  v  /
 Hex value   61 62 63 64 65 66 6b 73 74 76 2f

Let's have our S19dumpfile script change the dump device to /dev/dsk/c1t2d3s4. As you learn more about adb in later chapters, this will all become clear. Here's our script:

Example 3-7 S19dumpfile script

 :  Automatically executed by the Bourne shell
 #
 #  S19dumpfile - Change dumpfile name
 #
 #
 echo
 echo "Changing dumpfile name to /dev/dsk/c1t2d3s4."
 adb -k -w /dev/ksyms /dev/mem << END
 dumpfile+18/W 2f633174
 dumpfile+1c/W 32643373
 dumpfile+20/W 34000000
 dumpfile+10/s
 END
 echo "Done changing dumpfile name."
 echo
 #
 #  end of S19dumpfile
 #

When transitioning into run level 2, /etc/rc2.d/S19dumpfile will generate output similar to the following. Your physmem size may differ.

 Changing dumpfile name to /dev/dsk/c1t2d3s4.
 physmem 1b24
 dumpfile+0x18: 0x2f633174 = 0x2f633174
 dumpfile+0x1c: 0x32643373 = 0x32643373
 dumpfile+0x20: 0x34000000 = 0x34000000
 dumpfile+0x10: /dev/dsk/c1t2d3s4
 Done changing dumpfile name.

Alternatively, you may have your S19dumpfile script write to locations dumpfile+18 and dumpfile+1c by using commands such as the following.

 dumpfile+18/W '/c1t'
 dumpfile+1c/W '2d3s'

However, take care not to use this method for location dumpfile+20, since a null is required at the end of the string. Replace dumpfile+20 with a hexadecimal value instead, as shown in the earlier example.
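The hexadecimal words for a given device name can also be derived mechanically instead of being looked up character by character. The following is a sketch only, assuming a POSIX-style shell with od, tr, and cut, and assuming the string starts at dumpfile+0x10 as in the session shown earlier (check this on your own release first). The name dumpfile_words is invented for this illustration.

```shell
# Print one adb write command per full word of the given device name.
dumpfile_words() {
    # Hex-encode the name: two hex digits per character, no separators.
    hex=$(printf '%s' "$1" | od -An -tx1 | tr -cd '0-9a-f')
    # Append the terminating null, then pad with nulls to a 4-byte boundary.
    hex="${hex}00"
    while [ $(( ${#hex} % 8 )) -ne 0 ]; do
        hex="${hex}00"
    done
    offset=16    # 0x10: assumed start of the string within dumpfile
    while [ -n "$hex" ]; do
        word=$(printf '%s' "$hex" | cut -c1-8)   # next full word
        hex=$(printf '%s' "$hex" | cut -c9-)     # remaining digits
        printf 'dumpfile+%x/W %s\n' "$offset" "$word"
        offset=$(( offset + 4 ))
    done
}

dumpfile_words /dev/dsk/c1t2d3s4
```

For /dev/dsk/c1t2d3s4, the last three lines printed are the writes to dumpfile+18, dumpfile+1c, and dumpfile+20 with the values 2f633174, 32643373, and 34000000. Because the null padding is always added, the final word carries the required terminator.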

In future releases of Solaris 2 and other UNIX systems, the starting location of the dumpfile string may differ. Always check before you modify the string. Also, in future releases of Solaris 2, it may become possible to simply set the dumpfile string or something similar via the /etc/system configuration and tuning file, by specifying, for example:

 set dumpfilename = "/dev/dsk/c1t2d3s4"

and then rebooting the system. However, as of Solaris 2.4, this is not possible.

Shouldn't I copy the kernel first?

This is a good question! However, as you'll come to understand later when we talk about adb in greater detail, the S19dumpfile script modifies only the contents of /dev/mem.

The kernel variable dumpfile is initially set to all zeros. During the booting process, the name of the dump device is stored in dumpfile in memory (/dev/mem). Therefore, we use adb to modify /dev/mem after dumpfile is set.

Swapless systems

Finally, here's one last note for those of you who are administering swapless systems running Solaris 2.0 up through 2.3. Due to a bug, savecore will not work unless you have at least a minimal swap space set up. Create a swap partition of at least 8K in size and a custom dumpfile, make the swap space available to the system so that it is accessible to savecore, and all will work!

Our system is now ready to capture a system crash dump image. Let's move on.