Certification Objective 12.05: Working with Crash Dumps | Sun Certified System Administrator for Solaris 10 Study Guide Exams 310-XXX & 310-XXX

Certification Objective 12.05—Working with Crash Dumps

Exam Objective 2.2: Manage crash dumps and core file behaviors.

The bad news is that the applications running on the systems do crash, and so do the systems. The good news is that the crash information is saved so that you can investigate the crash and take appropriate action to fix the problem. The information is stored in core files and crash dump files. Some folks confuse these two kinds of files with each other. So, let's make the distinction clear before we dive into the topic.

The core files are those files that are created when an application crashes.
The crash dump files are those files that are created when the system crashes.

Let's explore how to manage both of these file types.

Managing Core Files

Core files are generated when a process or an application running on the system terminates abnormally. An obvious question is: Where are the core files saved, and how are they named? By default, they are saved in the current working directory of the process. However, you may want to configure this location so that all the core files are saved to one central location. You will learn how to do that later in this section.

The names of the core files can be more sophisticated than the names of the crash dump files that you will see in the next section. However, the default name of the core file is very simple: core. The following two file paths are available, and you can configure them independently of each other:

Process-based core file path. This file path, also called the per-process file path, is enabled by default, and its default value is core. If enabled, it causes a core file to be produced when a process terminates abnormally. This path is inherited by a child process from the parent process. The owner of the process owns the process-based core file with read/write permission, and no other user can view this file.
Global core file path. This file path is disabled by default, and its default value is also core. If enabled, this file will be created in addition to the process-based file (if that file path is enabled) and will contain the same content. However, the owner of this file is the superuser with read/write permissions, and no other user can view this file.

So, by default, a core file in the current directory of the process will be created if the process terminates abnormally. If the global core file path is enabled, a second core file will also be created in the global core file location. If more than one process is executing from the same directory and they terminate abnormally, there will be a name conflict for the file name core. The solution is to configure the expanded names for the core files by using the coreadm command, which, in general, is used to manage the core files. For example, consider the following command:

    coreadm -i /var/core/core.%f.%p

This command sets the default process-based core file path, which applies to all processes that have not overridden the default path. The pattern %f expands to the name of the executable file, and the pattern %p expands to the process ID. Assume that a sendmail process with process ID 101420 terminates abnormally. The core file that the system will now produce is the following:

    /var/core/core.sendmail.101420

A list of patterns that can be used in configuring the core file paths is presented in Table 12-5.

Table 12-5: Patterns that can be used to configure the core file paths
Pattern	Description
%d	The directory name for the executable file (the process file)
%f	The name of the executable file
%g	Effective group ID for the process
%m	Machine name from the output of the uname -m command
%n	System node name from the output of the uname -n command
%p	Process ID
%t	Decimal value of time
%u	Effective user ID associated with the process
%z	Name of the zone in which the process is executed (Zones are discussed in Chapter 15.)
%%	The literal %
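
To get a feel for how these patterns expand, the following sketch imitates the substitution in plain shell. It is only an illustration under stated assumptions: the function name expand_core_pattern and its argument order are hypothetical, it handles only %f, %p, %u, %n, and %%, and on a real system the expansion is done by Solaris itself when the core file is written.

```shell
# Hypothetical helper: expand a subset of the Table 12-5 patterns.
# $1 = pattern, $2 = executable name, $3 = process ID,
# $4 = effective user ID (default 0), $5 = node name (default localhost)
expand_core_pattern() (
    rest=$1
    out=
    while [ -n "$rest" ]; do
        case $rest in
            %f*) out="${out}$2";              rest=${rest#??} ;;
            %p*) out="${out}$3";              rest=${rest#??} ;;
            %u*) out="${out}${4:-0}";         rest=${rest#??} ;;
            %n*) out="${out}${5:-localhost}"; rest=${rest#??} ;;
            %%*) out="${out}%";               rest=${rest#??} ;;
            # Anything else, including patterns this sketch does not
            # handle, is copied through one character at a time.
            *)   out="${out}${rest%"${rest#?}"}"; rest=${rest#?} ;;
        esac
    done
    printf '%s\n' "$out"
)

# The sendmail example from the text:
expand_core_pattern '/var/core/core.%f.%p' sendmail 101420
```

Including %p in the pattern is what avoids the name conflict described earlier: two processes dumping core into the same directory end up with different file names.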

On the Job

By default a setuid process does not create any core file—neither a process-based core file nor a global-path core file.

The main functionality of the coreadm command is to specify the name and location of core files produced by the abnormally terminating processes.

The coreadm command has the following syntax:

 coreadm [-g <pattern>] [-i <pattern>] [-d <options>] [-e <options>]

The options are described here:

-d <options>. Disable the options specified by <options>. The possible values are the same as for the -e option.
-e <options>. Enable the options specified by <options>, which could be one or more of the following:
- global. Allow core dumps that use the global core pattern.
- global-setid. Allow setid core dumps that use the global core pattern.
- log. Generate a syslog message when there is an attempt to create a global core file.
- process. Allow core dumps that use a per-process (process-based) core pattern.
- proc-setid. Allow setid core dumps that use a per-process (process-based) core pattern.
-g <pattern>. Set the global core file path to <pattern>, which can include the patterns listed in Table 12-5.
-i <pattern>. Set the per-process (process-based) core file path to <pattern>, which can include the patterns listed in Table 12-5.
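
As a rough illustration of how these options combine, the sketch below prints the commands you might issue to collect global core files in one directory. The run wrapper, the DRY_RUN variable, the setup_global_cores function, and the /var/core location are assumptions made for this example; by default the script only prints the commands, so it can be read and tried on any system without changing the configuration.

```shell
# Hypothetical dry-run wrapper: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: enable global core files under the directory in $1.
setup_global_cores() (
    core_dir=${1:-/var/core}
    # Assumption: the target directory should exist before cores are written.
    run mkdir -p "$core_dir"
    # Set the global core file path, enable global cores, and log attempts.
    run coreadm -g "$core_dir/core.%f.%p" -e global -e log
    # With no arguments, coreadm reports the current configuration.
    run coreadm
)

setup_global_cores /var/core
```

On a Solaris 10 system you would run the same script as superuser with DRY_RUN=0 to apply the settings.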

The configuration values set by the coreadm command are saved in the /etc/coreadm.conf file; hence, they survive across system reboots. Whereas the core files are created when a process crashes (terminates abnormally), the crash dump files are created when the system crashes.

Managing Crash Dumps

A system can crash for any of a number of reasons, including hardware malfunctions, I/O problems, and software errors. When the Solaris system crashes, it will do the following in order to help you:

Display an error message on the console.
Write a copy of its physical memory to the dump device.
Reboot automatically.
Execute the savecore command to retrieve the data from the dump device and write the saved crash dump to the following two files:
- unix.<n>. Contains the kernel's name list.
- vmcore.<n>. Contains the crash dump data.

<n> specifies the dump sequence number. The files are saved in a predetermined directory; by default this is /var/crash/<hostname>, which you can change by reconfiguring, as you will see further on.

On the Job

In previous Solaris versions, existing crash dump files were automatically overwritten when the system rebooted unless you had manually enabled the system to save them. In Solaris 10, the saving of crash dump files is enabled by default.

The saved crash dump files provide useful information for diagnosing the problem. To manage the crash dump information, you can use the dumpadm command, which has the following syntax:

 /usr/sbin/dumpadm [-nuy] [-c <contentType>] [-d <dumpDevice>] [-m <n><unit>] [-s <savecoreDir>] [-r <rootDir>]

The options are described here:

-c <contentType>. Modify the content options for the dump—that is, what the dump should contain. The <contentType> can specify one of the following values:
- all. All memory pages.
- curproc. Kernel memory pages plus the memory pages of the process that was executing when the dump was initiated.
- kernel. Only the kernel memory pages.
-d <dumpDevice>. Specify the dump device. The <dumpDevice> can specify one of the following values:
- <devicePath>. A specific device with an absolute path, such as /dev/dsk/c0t2d0s2.
- swap. Select the most appropriate active swap entry to be used as dump device.
-m <n><unit>. Specify the minimum free space that savecore must maintain in the file system that contains the savecore directory. The parameter <n> specifies a number, and the parameter <unit> specifies the unit, which can be k for KB, m for MB, or % to indicate a percentage of the total file system size.
-n. Do not run savecore automatically on reboot. This is not recommended, because you may lose the crash information.
-r <rootDir>. Specify an alternative root directory relative to which the dumpadm command should create files. The default is /.
-s <savecoreDir>. Specify the directory in which to store the files written by the savecore command. The default is /var/crash/<hostname>. The value of <hostname> is the name by which the system is known to the network; it can be retrieved by issuing the following command: uname -n.
-u. Update the dump configuration based on the /etc/dumpadm.conf file. If the /etc/dumpadm.conf file is missing, it is created and synchronized with the current dump configuration.
-y. Automatically execute the savecore command on reboot. This is the default.
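
The following sketch shows one way the -c, -d, -s, and -m options might be combined into a single command line. The function name build_dumpadm_cmd and its argument order are hypothetical, and the device path is only the example used above; the function checks the values against the rules described in the list and prints the resulting command rather than running it.

```shell
# Hypothetical helper: validate the arguments and print a dumpadm command.
# $1 = content type, $2 = dump device, $3 = savecore directory, $4 = minimum free space
build_dumpadm_cmd() (
    content=$1 device=$2 savedir=$3 minfree=$4

    # -c accepts only kernel, all, or curproc.
    case $content in
        kernel|all|curproc) ;;
        *) echo "invalid content type: $content" >&2; exit 1 ;;
    esac

    # -m expects a number followed by k, m, or %.
    case $minfree in
        *k|*m|*%) number=${minfree%?} ;;
        *) echo "minimum free space needs a k, m, or % suffix: $minfree" >&2; exit 1 ;;
    esac
    case $number in
        ''|*[!0-9]*) echo "minimum free space must start with a number: $minfree" >&2; exit 1 ;;
    esac

    printf 'dumpadm -c %s -d %s -s %s -m %s\n' "$content" "$device" "$savedir" "$minfree"
)

# Kernel pages only, a specific device, the default directory, 10% kept free.
build_dumpadm_cmd kernel /dev/dsk/c0t2d0s2 "/var/crash/$(uname -n)" 10%
```

Printing the command first gives you a chance to review it before issuing it as superuser on the target system.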

Note that the system crash dump service is managed by SMF under the identifier:

 svc:/system/dumpadm:default

Therefore, it is automatically started at system reboot. In addition, dumpadm is the name of the command that you can use to manage the configuration of the crash dump facility as described previously.

Now you know that a crash dump is just a snapshot of the memory when a fatal error occurred. At times, even when the system did not crash, you may want to investigate the memory content. Solaris 10 lets you take a memory snapshot of a live system by using the following command:

    savecore -L

You can troubleshoot a running system by taking a snapshot of memory during some troubled state of the system, such as a transient performance problem or service outage. Before issuing the savecore command, make sure you have configured a dedicated dump device (by using the dumpadm command) to which the savecore command will save the information. Immediately after dumping to the dump device, the savecore utility writes out the crash dump files to your savecore directory.
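
Because a live dump depends on a dedicated dump device, it can help to check the current configuration first. The sketch below does that by scanning a dumpadm report for a dump device marked as dedicated. The function name has_dedicated_dump_device is hypothetical, and the sample report is modeled on typical dumpadm output (the host name host1 is made up), so treat the exact layout as an assumption and compare it with what your system prints.

```shell
# Hypothetical helper: succeed if the dumpadm report on standard input
# shows a dump device marked "(dedicated)".
has_dedicated_dump_device() {
    awk '/Dump device:/ && /\(dedicated\)/ { found = 1 } END { exit !found }'
}

# Sample report modeled on typical dumpadm output (an assumption).
sample_report='      Dump content: kernel pages
       Dump device: /dev/dsk/c0t2d0s2 (dedicated)
Savecore directory: /var/crash/host1
  Savecore enabled: yes'

if printf '%s\n' "$sample_report" | has_dedicated_dump_device; then
    echo "dedicated dump device found; savecore -L can be attempted"
else
    echo "configure a dedicated dump device with dumpadm -d first"
fi
```

On a live Solaris 10 system, the same check could be fed from the real command, for example: dumpadm | has_dedicated_dump_device && savecore -L.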

Now that you know how to manage crash dump information, here are some practical scenarios and their solutions.

SCENARIO & SOLUTION
You want to use the disk slice c0t1d0s2 as the dedicated dump device. What command would you issue to make that happen?	dumpadm -d /dev/dsk/c0t1d0s2
You want the crash dump files to be saved to the /var/dump directory instead of the /var/crash/<hostname> directory. What command would you issue?	dumpadm -s /var/dump
You discover that the crash dump service is not running. What command would you issue to start it without rebooting the system?	svcadm enable svc:/system/dumpadm:default (Remember that the crash dump facility is an SMF service.)

The three most important takeaways from this chapter are the following:

The file system SWAPFS is used by the kernel for swapping (using disk space for physical memory). TMPFS is used to improve the performance of applications by using physical memory for file reads and writes and is the default file system for the /tmp directory.
Network file system (NFS) service is managed by the SMF under the identifier network/nfs/server.
Core files are created when a process terminates abnormally, whereas crash dump files are created when the system crashes.