Solaris Processes


Team-Fly

	Solaris™ Operating Environment Boot Camp By David Rhodes, Dominic Butler

	Chapter 2. Booting and Halting the System

So far, we have looked at the first two processes to be created when the system boots up. These are sched, which has process ID 0, and init, which has process ID 1. They are different because the sched process is actually part of the Solaris kernel, whereas init is a normal program that is stored in a file system (in the directory /etc) just like other Solaris commands we have looked at. The difference between init and commands such as ls or who is that init runs all the time, not just when its name is typed. Processes that run all the time are called "daemon" processes; at any time there will be many daemon processes running.

In this section we will look in more detail at how processes are handled by Solaris and how they can be managed by the system administrator. Before we look at Solaris processes in general, the most common Solaris daemon processes are listed and described in Table 2.4. Where a particular daemon is dealt with in more detail in another chapter, that chapter is listed.

Table 2.4. Solaris Daemon Processes
Process	RC Script (in /etc/init.d)	Description
sched	n/a	This is the first Solaris process, created during the boot process. It is hand-crafted, whereas most other processes are created from a copy of their parent process.
init	n/a	This is the parent of all standard Solaris processes. It runs the rc scripts that start and stop other daemons, and it clears up after all processes when they complete.
pageout	n/a	Like the sched process, this is not a regular daemon process but part of the Solaris kernel. It is responsible for paging memory out to disk when free memory runs low.
fsflush	n/a	This is part of the Solaris kernel and is responsible for writing file changes from memory back to the filesystem on the disk.
in.routed	inetinit	This daemon process manages the network routing tables. See Chapter 11, "Connecting to the Local Area Network."
sac	n/a	This is the Service Access Controller daemon. It controls the port monitor processes (such as ttymon), which are administered using the sacadm and pmadm commands. See Chapter 14, "Connecting Serial Devices."
devfseventd	devfsadm	The job of this process is to inform the devfsadmd daemon when devices are added to or removed from the device tree.
devfsadmd	devfsadm	This daemon process manages the /dev and /devices namespaces. See Chapter 21, "Kernels and All About Them."
statd	nfs.client	This daemon is part of the NFS subsystem. It works with lockd to provide crash and recovery functions for networked filesystems. See Chapter 18, "NFS, DFS, and Autofs."
keyserv	rpc	This daemon stores users' private encryption keys for access to secure network services, such as secure NFS and NIS+.
rpcbind	rpc	This daemon acts as a server to convert Remote Procedure Call (RPC) program numbers to universal addresses. When a client wishes to make an RPC call to a program on this server, it makes a call to rpcbind to find out what address to send its RPC request to. See Chapter 18, "NFS, DFS, and Autofs."
lockd	nfs.client	The lockd daemon handles locking within the NFS subsystem. It will send requests to lock files on a remote host and it will lock local files upon receiving a lock request. See Chapter 18, "NFS, DFS, and Autofs."
syslogd	syslog	The syslogd daemon forwards system messages to the log files or users specified in the file /etc/syslog.conf.
automountd	autofs	This daemon is actually an RPC server. It answers mount and unmount requests from the autofs filesystem, automatically mounting directories (such as NFS-shared home directories) when they are accessed and unmounting them after a period of inactivity. See Chapter 18, "NFS, DFS, and Autofs."
powerd	power	This daemon is responsible for shutting down Solaris if the power is getting low or the system has been configured to shut down after a specific idle period.
inetd	inetsvc	The inet daemon (inetd) is the server process for Internet standard services, and it can also handle RPC services. It is inetd that spawns the in.telnetd processes when a telnet connection is received.
cron	cron	This daemon process is responsible for running commands at specific dates and times. It looks in configuration files held under /var/spool/cron/crontabs when it first starts up to see what it needs to run and when.
lpsched	lp	This daemon manages the local printer services.
nscd	nscd	This is the Name Service Cache Daemon. It provides caching for various databases (e.g., passwd, group, and hosts) based on the settings in its configuration file /etc/nscd.conf.
vold	volmgt	This daemon performs automatic mounting of CDs and floppy disks based on the configuration file /etc/vold.conf.
in.telnetd	n/a	This daemon is not started by an rc script, but by the inetd daemon process when a connection is initiated via telnet. There will be one occurrence of this daemon running for each telnet session. Other Transmission Control Protocol/Internet Protocol (TCP/IP) programs work in a similar way (e.g., ftp and rlogin).
utmpd	utmpd	This daemon monitors the /var/adm/utmp file and all currently running processes to ensure that when a process terminates, its entry in /var/adm/utmp gets removed. The same applies for the file /var/adm/utmpx. Utmp is being phased out in favor of utmpx.
snmpdx	init.snmpdx	This daemon process is the Sun Solstice Enterprise Master Agent. It handles Simple Network Management Protocol (SNMP) requests that enable you to do such things as provide alerts of critical events (such as a filesystem filling up) to a central node or alert monitoring tool.
dmispd	init.dmi	This daemon is also part of Sun's Solstice Enterprise Agents software and is the Desktop Management Interface (DMI) service provider.
snmpXdmid	init.dmi	This is also part of the Solstice Enterprise Agents software. This particular daemon maps SNMP requests from snmpdx into DMI requests and vice versa.
mibiisa	init.snmpdx	This daemon is the Sun SNMP agent. It passes SNMP traps to the snmpdx daemon to be dealt with.
ttymon	n/a	This daemon sets up and monitors ports (e.g., the console) and will spawn a program such as login, as configured by the sacadm command. See Chapter 14, "Connecting Serial Devices."
sendmail	sendmail	The sendmail daemon (as its name suggests) is responsible for sending mail. It will determine if the recipient is local or not and ensure the message is directed to the correct network or host. See Chapter 20, "Setting Up the Mail System."

Processes can be viewed using the ps command. The following ps listing shows all the processes that can be found running on a sample Solaris system. There are no third-party applications running, so all the processes we can see are Solaris processes:

 hydrogen# ps -ef
     UID   PID  PPID  C    STIME TTY      TIME CMD
    root     0     0  0 15:46:55 ?        0:01 sched
    root     1     0  0 15:46:58 ?        0:01 /etc/init -
    root     2     0  0 15:46:58 ?        0:00 pageout
    root     3     0  0 15:46:58 ?        0:22 fsflush
    root   112     1  0 15:47:52 ?        0:00 /usr/sbin/in.routed -q
    root   251     1  0 15:48:33 ?        0:00 /usr/lib/saf/sac -t 300
    root    49     1  0 15:47:18 ?        0:00 /usr/lib/devfsadm/devfseventd
    root    51     1  0 15:47:20 ?        0:00 /usr/lib/devfsadm/devfsadmd
  daemon   149     1  0 15:47:58 ?        0:00 /usr/lib/nfs/statd
    root   118     1  0 15:47:54 ?        0:00 /usr/sbin/keyserv
    root   116     1  0 15:47:53 ?        0:00 /usr/sbin/rpcbind
    root   150     1  0 15:47:58 ?        0:00 /usr/lib/nfs/lockd
    root   172     1  0 15:48:06 ?        0:01 /usr/sbin/syslogd
    root   162     1  0 15:48:03 ?        0:00 /usr/lib/autofs/automountd
    root   208     1  0 15:48:15 ?        0:00 /usr/lib/power/powerd
    root   147     1  0 15:47:58 ?        0:01 /usr/sbin/inetd -s
    root   174     1  0 15:48:07 ?        0:00 /usr/sbin/cron
    root   195     1  0 15:48:11 ?        0:00 /usr/lib/lpsched
    root   190     1  0 15:48:10 ?        0:00 /usr/sbin/nscd
    root   217     1  0 15:48:16 ?        0:02 /usr/sbin/vold
    root   544   147  0 17:28:02 ?        0:00 in.telnetd
    root   221     1  0 15:48:17 ?        0:00 /usr/lib/utmpd
    root   546   544  0 17:28:02 pts/0    0:00 -sh
    root   252     1  0 15:48:33 console  0:00 /usr/lib/saf/ttymon -g -h -p hydrogen console login:  -T sun -d /dev/console -l
    root   236     1  0 15:48:26 ?        0:00 /usr/lib/snmp/snmpdx -y -c /etc/snmp/conf
    root   602   597  2 17:37:10 pts/1    0:00 ps -ef
    root   244     1  0 15:48:31 ?        0:02 /usr/lib/dmi/dmispd
    root   247     1  0 15:48:32 ?        0:01 /usr/lib/dmi/snmpXdmid -s hydrogen
    root   253   236  0 15:48:33 ?        0:12 mibiisa -r -p 32784
    root   255   251  0 15:48:36 ?        0:00 /usr/lib/saf/ttymon
    root   272     1  0 15:49:19 ?        0:00 /usr/lib/sendmail -bd -q15m
    root   595   147  1 17:37:02 ?        0:00 in.telnetd
    root   597   595  2 17:37:03 pts/1    0:00 sh
 hydrogen#

Table 2.5 describes the column headings that ps presents us with.

Table 2.5. Column Headings from ps -ef
Column Heading	Description
UID	The first column lists the user that owns this process (normally the user that initiated it). With the -f option, ps shows the login name rather than the numeric UID.
PID	This column lists the process ID of this process.
PPID	This column lists the process ID of the parent of this process. Daemon processes will usually have a PPID of 1 (init); processes initiated by a user will usually have a PPID equal to the PID of the user's shell.
C	This column is actually obsolete now, but used to contain the processor utilization of this process and was used for scheduling purposes.
STIME	This is the time (or date) that the process started running. If the process has been running for less than 24 hours, you will see the time it started; otherwise you will see the date.
TTY	This is the teletype (TTY) device that the process was run from (its controlling terminal). If the process has no controlling terminal, which is normally the case when it was not initiated by a logged-in user, this column will contain a question mark (?).
TIME	This is the total time the process has spent running. You will notice that this does not relate to the actual time that has passed since the process was started, but is literally the time it has spent running on a processor (which is generally not very long).
CMD	This is the full command that was run, including all its arguments (up to a limit of 80 characters).
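To see the PPID column at work, the following sketch walks up the chain of parents from a given process until it reaches init (PID 1). It is a minimal example that assumes a ps supporting the standard -o pid, ppid, and comm output keywords (the "=" after each keyword suppresses the header line); show_ancestry is a hypothetical helper name, not a Solaris command.

```sh
#!/bin/sh
# show_ancestry: a hypothetical helper that follows the PPID chain from a
# process (default: the current shell) up to init, printing one line per
# process: PID, PPID and command name.
show_ancestry() {
    pid=${1:-$$}
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        # The "=" after each keyword suppresses the ps header line
        line=$(ps -o pid= -o ppid= -o comm= -p "$pid") || break
        echo "$line"
        # PID 1 is init, the root of the tree, so stop there
        if [ "$pid" -eq 1 ]; then
            break
        fi
        # The second field of the line is the parent's PID
        pid=$(echo "$line" | awk '{ print $2 }')
    done
}

show_ancestry "$$"
```

Applied to the listing above, the ps -ef command (PID 602) would lead back through sh (597), in.telnetd (595), and inetd (147) to init (1).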

The TTY column shows which terminal (or TTY device) each process is attached to (or was executed from). We can see that when the above ps command was run there were two users logged in (both as root): one on device pts/0 and the other on pts/1. The ps -ef command itself (PID 602) was run from pts/1 by the shell with PID 597. There is a process attached to the console, but it does not mean someone has logged in on the console; it is actually the console login prompt.

The processes that have a question mark in the TTY column are not associated with any terminal; these are daemon processes. Daemon processes are named after the Daemons of Greek mythology who ran around the Underworld doing lots of jobs; this is pretty much what daemon processes do. They may run constantly, but they are not necessarily active the whole time. They may spend most of their time asleep and just wake up every now and again to perform an action, or they may spend their time waiting for a certain event or condition to be met and then spring into action.
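Because daemons have no controlling terminal, a quick way to list them is to select the processes whose TTY is "?". This is a sketch rather than a definitive test, since any process that has detached from its terminal will also show up; list_daemons is a hypothetical function name, and tty, pid, and comm are the standard ps -o keywords.

```sh
#!/bin/sh
# list_daemons: a hypothetical helper that prints the PID and command name
# of every process with no controlling terminal (TTY shown as "?").
list_daemons() {
    # "=" after each keyword suppresses the header line
    ps -e -o tty= -o pid= -o comm= | awk '$1 == "?" { print $2, $3 }'
}

list_daemons
```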

Most Solaris daemons are started at boot-time when the rc scripts run, though some, such as the ttymon process that runs on the console, are started and managed directly by init and have their own entries in the /etc/inittab file. If a problem arises that causes a daemon process to fall over and the process is defined in the inittab file, then init may automatically restart it, depending on the value in the action field (see above). If, however, the daemon process was started by an rc script, then it is usually up to the system administrator to get it going again. Usually the safest thing to do, after locating the correct rc script, is to first run the script with the parameter "stop" and then run it again with the parameter "start." This will ensure that you are performing a clean shutdown of that application or subsystem before starting it again.
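The stop-then-start approach can be wrapped in a small function. This is a sketch that assumes the rc script accepts "stop" and "start" arguments, as the scripts in /etc/init.d listed in Table 2.4 do; restart_rc is a hypothetical name, and carrying on with "start" even if "stop" fails is a deliberate choice, because the daemon may already have died.

```sh
#!/bin/sh
# restart_rc SCRIPT: a hypothetical helper that restarts a subsystem
# by running its rc script with "stop" and then "start".
restart_rc() {
    script=$1
    if [ ! -x "$script" ]; then
        echo "restart_rc: $script is not an executable rc script" >&2
        return 1
    fi
    # Carry on even if "stop" fails: the daemon may already have died
    "$script" stop || echo "restart_rc: stop failed, starting anyway" >&2
    "$script" start
}
```

For example, running restart_rc /etc/init.d/cron as root should stop the cron daemon and then start it again.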

Killing Processes

If you need to terminate a process for any reason, you can use the kill command. The name of this command is rather misleading: although it is normally used to kill processes, that is not its only purpose. What kill actually does is ask the kernel to notify a process that a certain event has occurred, by sending it a specific signal. The kernel will only send the signal if the user is either the owner of the process or root. A signal is basically a number that means something to the receiving process (see Table 2.6). By default the kill command asks the kernel to send signal number 15 to the specified process. This indicates that we don't want the process to run any longer, and it should cause the process to terminate. However, the program may have been written to ignore signal 15 and simply carry on running, or it may perform some action (such as removing a temporary file or tidying up after itself) before it terminates. First we will look at how the kill command works:

 hydrogen# sleep 500 &
 339
 hydrogen# ps
    PID TTY      TIME CMD
    283 pts/1    0:01 sh
    340 pts/1    0:00 ps
    339 pts/1    0:00 sleep
 hydrogen# kill 339
 339 Terminated
 hydrogen# ps
    PID TTY      TIME CMD
    283 pts/1    0:01 sh
    341 pts/1    0:00 ps
 hydrogen#

In the above example, the kill command requested the Solaris kernel to send signal number 15 to the sleep process. We could also have typed the following command to achieve the same effect:

 hydrogen# kill -15 <pid>
 hydrogen#

Table 2.6 describes the most useful of the many signals. The full list is described in the signal man page (man -s 5 signal).

Table 2.6. Solaris Signals
No.	Signal	Description
1	SIGHUP	This is the hangup signal. It will be sent to any background processes you have started but have not finished when you log out. If you run a background process with nohup, it will ignore the SIGHUP and carry on running. This signal will cause some daemon processes to reread their configuration files rather than terminate.
2	SIGINT	This is sent to your currently running program when you type the interrupt key (usually <control c>). The default action is to cause the program to terminate, but it won't if the process "traps" the signal (see below).
3	SIGQUIT	This signal causes the receiving process to terminate and produce a core dump. It can usually be sent to the current process by typing <control \> (control backslash).
4	SIGILL	This signal is normally sent to a process by the kernel to signify that it has performed an illegal instruction.
6	SIGABRT	This signal will cause a program to abort and produce a core dump.
8	SIGFPE	The kernel will send this signal to a program if it causes an arithmetic exception, such as an integer divide by zero.
9	SIGKILL	This signal causes a process to terminate, but unlike some of the other signals, it cannot be trapped.
10	SIGBUS	This signal tells the process a bus error has occurred.
11	SIGSEGV	This signal is sent by the kernel to any process that tries to access any part of the system memory outside of the part that has been allocated to it. This will cause a program to end with the error "segmentation violation." This is almost always caused by an error in the program.
14	SIGALRM	This is the alarm clock signal. It is sent when a timer set by the process (for example, with the alarm() function) expires, and is typically used to wake up a sleeping process.
15	SIGTERM	This is the default signal sent by the kill command. It will cause a process to terminate unless it traps this signal.
19	SIGPWR	This is used to signal a loss of power or system restart. If the system has a soft power button, this signal will be sent to all processes when it is pressed. Most processes will probably ignore it, but init will respond by running the "powerfail" entry in /etc/inittab.
23	SIGSTOP	This will cause a process to move into a suspended state. It will no longer run, but its state is preserved and it can be made to carry on from the point it was suspended by sending SIGCONT. Like SIGKILL, this signal cannot be trapped or ignored.
25	SIGCONT	This will cause a suspended process to carry on running from the point at which it was suspended.
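The SIGHUP entry mentions that some daemons reread their configuration files when they receive signal 1. The following toy daemon loop is a sketch of how a shell script could do the same with the trap command (covered later in this section), assuming a hypothetical configuration file, $TOYD_CONF, containing a line such as interval=30. It sleeps in the background and uses wait, because a shell only runs a trap after the current foreground command finishes, whereas the wait builtin is interrupted as soon as a trapped signal arrives.

```sh
#!/bin/sh
# toyd: a toy daemon loop that rereads its configuration file on
# signal 1 (SIGHUP) and exits cleanly on signal 15 (SIGTERM).
# TOYD_CONF is a hypothetical file containing a line such as "interval=30".
TOYD_CONF=${TOYD_CONF:-/tmp/toyd.conf}

load_config() {
    # Pick up "interval=N", defaulting to 5 seconds if it is missing
    interval=$(sed -n 's/^interval=//p' "$TOYD_CONF" 2>/dev/null)
    interval=${interval:-5}
    echo "toyd: loaded config, interval=$interval"
}

toyd_main() {
    # On SIGHUP: stop the current sleep and reread the configuration
    trap 'kill $! 2>/dev/null; load_config' 1
    # On SIGTERM: stop the current sleep, report, and exit
    trap 'kill $! 2>/dev/null; echo "toyd: exiting"; exit 0' 15
    load_config
    while :; do
        # Sleep in the background and wait for it, so that a trapped
        # signal interrupts the wait immediately rather than after the sleep
        sleep "$interval" &
        wait $!
    done
}
```

To try it, run toyd_main in the background, edit the file, and send the process signal 1 with kill -1 <pid>; a plain kill (signal 15) makes it exit.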

In practice, the signals that are usually sent using the kill command are 15 (the default) and 9. Signal 9 requests a process to terminate much as signal 15 does, but in this case, the process cannot ignore it (or perform any alternative action). This means that if the program needs to tidy up after itself, for example, a different signal should be sent, and signal 9 should only be used as a last resort.
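A common way to follow this advice in a script is to send signal 15 first, give the process a few seconds to tidy up, and only then fall back to signal 9. The sketch below relies on kill -0, which sends no signal and simply reports whether the process exists; stop_process and its 10-second default grace period are hypothetical choices. Note that kill -0 also succeeds for a defunct process that its parent has not yet cleaned up, so in that case the function waits out the full grace period.

```sh
#!/bin/sh
# stop_process PID [SECONDS]: a hypothetical helper that sends signal 15
# (SIGTERM), waits up to SECONDS (default 10) for the process to go away,
# and only then sends signal 9 (SIGKILL) as a last resort.
stop_process() {
    target=$1
    grace=${2:-10}
    # Fails if there is no such process or we are not allowed to signal it
    kill -15 "$target" 2>/dev/null || return 1
    while [ "$grace" -gt 0 ]; do
        # "kill -0" sends no signal; it only checks that the PID still exists
        kill -0 "$target" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done
    echo "stop_process: $target is still running, sending signal 9" >&2
    kill -9 "$target" 2>/dev/null
}
```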

The majority of the Solaris signals are only sent by the kernel if one of two situations arises:

To let a process know that an external event has occurred that has some impact on it (for example, signal 1 is sent to a process if the process's TTY connection is closed)
To let the process know that it has done something it shouldn't have, such as performing a divide by 0 (signal 8) or trying to update a memory address that is outside of its allocated address space (signal 11)

If you want to see the effect that a particular signal has on a process, you can test it with the kill command. Here we send signal 11 (SIGSEGV) to a sleep process:

 hydrogen# sleep 500 &
 335
 hydrogen# ps
    PID TTY      TIME CMD
    283 pts/1    0:01 sh
    336 pts/1    0:00 ps
    335 pts/1    0:00 sleep
 hydrogen# kill -11 335
 335 Segmentation Fault - core dumped
 hydrogen# ps
    PID TTY      TIME CMD
    283 pts/1    0:01 sh
    337 pts/1    0:00 ps
 hydrogen#

Trapping Signals

It is possible to make any shell scripts you write handle signals by using the trap command. Using trap will enable your scripts to either ignore certain signals (though signals 9 and 23, SIGKILL and SIGSTOP, cannot be trapped or ignored) or perform specific actions if a certain signal is received by the script. Two common uses of trap are to prevent users from being able to break out of a script by hitting <control c> (which will send signal 2 to the script) or to make sure that if a process is killed for any reason it clears out any temporary files it may have created.

We will look at the former first, as it is the simpler of the two. We simply need to put the following line at the start of any script that we want to prevent users from breaking out of:

 trap '' 2 3 15

The format of the trap command is to say what action we want to perform, inside quotes, followed by a list of the signals that will trigger that action when they are received. In the above example, there is no action, so if the script receives signals 2, 3, or 15 it will do nothing and then carry on with whatever it was doing when the signal was received. In other words, it will ignore signals 2, 3, and 15. We see the trap command used in this way in Chapter 5, "Shells," to prevent users from breaking out of the system profile (/etc/profile) while logging in.
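To see the effect of that trap line, the following sketch starts a child shell that ignores signals 2, 3, and 15, sends it signal 15, checks that it is still running, and then removes it with signal 9, which cannot be trapped. demo_ignore is a hypothetical function name; the one-second pauses simply give the child time to set its trap.

```sh
#!/bin/sh
# demo_ignore: a hypothetical demonstration of "trap '' 2 3 15".
# The child shell ignores signals 2, 3 and 15, so "kill -15" has no
# effect on it, but signal 9 cannot be trapped or ignored.
demo_ignore() {
    sh -c "trap '' 2 3 15; while :; do sleep 1; done" >/dev/null 2>&1 &
    child=$!
    sleep 1                    # give the child time to install its trap
    kill -15 "$child"          # ignored by the child
    sleep 1
    if kill -0 "$child" 2>/dev/null; then
        echo "survived signal 15"
    fi
    kill -9 "$child"           # SIGKILL cannot be trapped or ignored
    wait "$child"
    echo "killed by signal 9, wait status $?"
}

demo_ignore
```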

The following code segment can be used in any script to tidy up after itself if it receives any of the common signals:

 #!/bin/ksh
 # generate a unique temporary filename
 tempFile=/tmp/bigfile.$$
 # delete the temp file before exiting upon receipt of
 # signal 2, 3, or 15
 trap "rm -f ${tempFile}; exit 1" 2 3 15
 <put the rest of your code here>

In this example we make sure that the temporary file is deleted if the script is ever terminated by one of the signals listed at the end of the trap line. We have used the "-f" option with rm to ensure that we do not get an error if the temporary file has not yet been created. We have put the whole command in double quotes, so ${tempFile} is expanded when the trap is set; single quotes would also work here, because the variable would then be expanded when the signal arrives, and tempFile does not change in between. If we did not specifically put the exit command in the line, the script would actually carry on running after deleting the temporary file, which is not what we want.
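The script above only tidies up when it is interrupted. A common extension, sketched below, is to also trap 0, which the shell treats as "on exit", so the file is removed however the script ends; the signal trap then only needs to call exit. job is a hypothetical stand-in for a real script, taking the temporary file name and a number of seconds of placeholder work. Because traps belong to the shell that sets them, run it as its own script, in a subshell, or in the background.

```sh
#!/bin/sh
# job FILE SECONDS: a hypothetical stand-in for a script that uses a
# temporary file. Trap 0 (exit) removes the file however the shell ends;
# the trap for signals 2, 3 and 15 just exits, which in turn runs trap 0.
job() {
    tempFile=$1
    # Single quotes: ${tempFile} is expanded when the trap runs
    trap 'rm -f "${tempFile}"' 0
    trap 'kill $! 2>/dev/null; exit 1' 2 3 15
    echo "working..." > "${tempFile}"
    # Placeholder for the real work; sleeping in the background and
    # waiting lets a trapped signal interrupt the wait straight away
    sleep "$2" &
    wait $!
}
```

For example, ( job /tmp/bigfile.$$ 5 ) leaves no file behind when it finishes, and neither does a background job /tmp/bigfile.$$ 60 & that is sent signal 15.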

Additional Process Management Tools

Traditionally, if we had wanted to include code to kill a specific process in a shell script, we would use a combination of grep and awk as follows:

 pid=$(ps -ef | grep 'proc_name$' | grep -v grep | awk '{print $2}')
 kill ${pid}

Here we use grep to find the line of the ps -ef output for process proc_name and awk to obtain the process ID of that process. If we forget to include "grep -v grep" we can have a problem because there are now two processes running that contain the string proc_name (the process itself and also the grep process) as the following example demonstrates:

 hydrogen# ps -ef | grep "sleep"
     root   398   375  0 11:49:41 pts/4    0:00 sleep 500
     root   410   404  1 11:54:57 pts/5    0:00 grep sleep
 hydrogen#
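Another way round this problem is to avoid grep altogether and compare whole command lines. The sketch below, pids_of (a hypothetical helper), prints the PIDs of processes whose argument list exactly matches the string given; awk's own command line never equals that string, so it cannot match itself. It assumes the standard ps -o pid and args keywords, and on Solaris it may only see a truncated command line for very long commands, just as ps -ef does.

```sh
#!/bin/sh
# pids_of STRING: a hypothetical helper that prints the PIDs of processes
# whose full argument list is exactly STRING.
pids_of() {
    ps -e -o pid= -o args= | awk -v want="$1" '{
        pid = $1
        line = $0
        sub(/^[ \t]*[0-9]+[ \t]+/, "", line)    # drop the leading PID
        if (line == want) print pid             # exact match only
    }'
}
```

For example, pids_of "sleep 500" would report the sleep process from the listing above (PID 398) without also matching a grep or awk process.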

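Solaris 8 provides us with two new utilities that save us from having to go to such lengths to either get the process ID of a process or kill a process. These are pgrep and pkill. The first of these, pgrep, prints the process IDs of all processes that match the pattern supplied (which is what we needed grep and awk to do before). By default the pattern is matched against the command name only; the -f option matches it against the full command line, including arguments, as shown in the CMD column of ps -ef: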

 hydrogen# pgrep -f "sleep 500"
 398
 hydrogen#

The second utility, pkill, takes the same kind of pattern, but instead of listing the matching process IDs, it sends each matching process a signal (signal 15 by default), which will normally kill it:

 hydrogen# pkill -f "sleep 500"
 hydrogen# pgrep -f "sleep 500"
 hydrogen#

The use of constructs such as "$()" and the pipe symbol is described in more detail in Chapter 5, "Shells."
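The following sketch puts pgrep and pkill together in a script. It uses the -f option so that the pattern is matched against the full command line rather than just the command name, and anchors the pattern with ^ and $ so that it cannot also match, say, a shell whose own command line happens to contain "sleep 500". pkill sends signal 15 unless another signal is specified.

```sh
#!/bin/sh
# Start a background sleep, find it with pgrep, then terminate it with pkill.
# -f matches against the full command line; ^ and $ anchor the pattern so
# that only a process whose whole command line is "sleep 500" matches.
sleep 500 &
bgpid=$!
sleep 1                                   # give the job time to start
found=$(pgrep -f '^sleep 500$')
echo "started PID $bgpid, pgrep found: $found"
pkill -f '^sleep 500$'                    # sends signal 15 by default
wait "$bgpid"                             # collect the terminated job
pgrep -f '^sleep 500$' || echo "no 'sleep 500' processes left"
```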

Defunct Processes

Once init has started running, all processes are created by an existing process. The existing process will be the parent of the process it created (which will be called the child process). When the child process finishes running, the kernel informs the parent by means of signal 18 (SIGCLD, also known as SIGCHLD), and the parent should then wait for the child, which allows the child process to be removed from the process table and cleaned up. However, if the parent process is still running but never waits for the child, the exited child cannot be cleaned up. When this happens the exited child process becomes a defunct (or zombie) process. (If the parent exits first, init inherits the child and cleans it up when it finishes.) These tend to hang around and, although the odd one won't cause too much of a problem, if a number of processes become defunct they can use up system resources, such as slots in the process table.

A defunct process will show up in a ps listing as follows:

 hydrogen# ps -ef
 <lines removed for clarity>
     root   489   404  0                0:00 <defunct>
 hydrogen#

You will notice that we don't even get to see what the process was before it became defunct.
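A defunct process is easy to create for experimentation. In the sketch below, the inner shell starts sleep 1 in the background and then uses exec to become sleep 10, so sleep 10 is the parent of sleep 1 but never waits for it. When sleep 1 exits, it remains in the process table as a zombie until its parent goes away. The ps -o s keyword (the one-letter process state, where Z means zombie) is assumed here, and these particular commands are just one convenient way to produce a zombie.

```sh
#!/bin/sh
# Produce a short-lived zombie: the inner shell starts "sleep 1" in the
# background, then execs "sleep 10", which becomes its parent but never
# waits for it.
sh -c 'sleep 1 & exec sleep 10' &
parent=$!
sleep 3                                   # "sleep 1" has exited by now
# Show the children of $parent; the s column is the one-letter state,
# and Z means zombie
ps -e -o pid= -o ppid= -o s= -o comm= | awk -v p="$parent" '$2 == p'
zstate=$(ps -e -o ppid= -o s= | awk -v p="$parent" '$1 == p { print $2 }')
echo "state of the exited child: $zstate"
# Killing the parent lets init inherit the zombie and clean it up
kill "$parent"
wait "$parent"
```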

If you are running Solaris 8 or below, there is little you can do about these processes other than ignore them, kill their parent process (init then inherits the zombies and cleans them up), or reboot the system. However, Solaris 9 has introduced a new command, preap, that can be used to remove a zombie process. The usage is as follows:

 hydrogen# preap PID
 hydrogen#

It can only be used to clear defunct processes, so if you try it on a normal running process you will get the following error:

 hydrogen# preap 422
 preap: process not defunct: 422
 hydrogen#

