Managing Processes

Now that you have examined what processes are, you can look at some special features of processes as implemented in Solaris. One of the most innovative characteristics of processes under Solaris is the process file system (PROCFS), which is mounted as the /proc file system. Images of all currently active processes are stored in the /proc file system, indexed by PID.

Here's an example. First, a process is identified; in this example, it is the current Korn shell for the user pwatters:

 # ps -eaf | grep pwatters
 pwatters   310   291  0   Mar 20 ?        0:04 /usr/openwin/bin/Xsun
 pwatters 11959 11934  0 09:21:42 pts/1    0:00 grep pwatters
 pwatters 11934 11932  1 09:20:50 pts/1    0:00 -ksh

Now that you have a target PID (11934), you can change to the /proc/11934 directory and you will be able to view the image of this process:

 # cd /proc/11934
 # ls -l
 total 3497
 -rw-------   1 pwatters     other    1769472 Mar 30 09:20 as
 -r--------   1 pwatters     other        152 Mar 30 09:20 auxv
 -r--------   1 pwatters     other         32 Mar 30 09:20 cred
 --w-------   1 pwatters     other          0 Mar 30 09:20 ctl
 lr-x------   1 pwatters     other          0 Mar 30 09:20 cwd ->
 dr-x------   2 pwatters     other       1184 Mar 30 09:20 fd
 -r--r--r--   1 pwatters     other        120 Mar 30 09:20 lpsinfo
 -r--------   1 pwatters     other        912 Mar 30 09:20 lstatus
 -r--r--r--   1 pwatters     other        536 Mar 30 09:20 lusage
 dr-xr-xr-x   3 pwatters     other         48 Mar 30 09:20 lwp
 -r--------   1 pwatters     other       2016 Mar 30 09:20 map
 dr-x------   2 pwatters     other        544 Mar 30 09:20 object
 -r--------   1 pwatters     other       2552 Mar 30 09:20 pagedata
 -r--r--r--   1 pwatters     other        336 Mar 30 09:20 psinfo
 -r--------   1 pwatters     other       2016 Mar 30 09:20 rmap
 lr-x------   1 pwatters     other          0 Mar 30 09:20 root ->
 -r--------   1 pwatters     other       1440 Mar 30 09:20 sigact
 -r--------   1 pwatters     other       1232 Mar 30 09:20 status
 -r--r--r--   1 pwatters     other        256 Mar 30 09:20 usage
 -r--------   1 pwatters     other          0 Mar 30 09:20 watch
 -r--------   1 pwatters     other       3192 Mar 30 09:20 xmap
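Because every active process appears as a directory named after its PID, the process list can be recovered from a plain directory listing. The following is a minimal sketch of that idea, not a replacement for ps. The function name list_proc_pids and its optional root argument are assumptions introduced here so that the function can be tried against any directory tree.

```shell
# list_proc_pids [ROOT]
# Print the all-numeric directory names found directly under ROOT, sorted
# numerically. ROOT defaults to /proc. The ROOT parameter is hypothetical:
# it exists only so the function can be exercised on a scratch directory.
list_proc_pids() {
    root=${1:-/proc}
    for entry in "$root"/*; do
        name=${entry##*/}              # strip the leading path
        case $name in
            ''|*[!0-9]*) continue ;;   # skip anything that is not all digits
        esac
        if [ -d "$entry" ]; then
            printf '%s\n' "$name"
        fi
    done | sort -n
}
```

Called with no argument, list_proc_pids prints one PID per line for the processes visible under /proc.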

Each directory named after a PID contains files and subdirectories that hold state information and provide related control functions. In addition, a watchpoint facility is provided, which is responsible for controlling memory access.

Tip  

A series of proc tools are available to interpret the information contained in the /proc subdirectories.

Using proc tools

The proc tools are designed to operate on data contained within the /proc file system. Each utility takes a PID as its argument and performs operations associated with the PID. For example, the pflags command prints the flags and data model details for the PID in question.

For the preceding Korn shell example, you can easily print out this status information:

 # /usr/proc/bin/pflags 29081
 29081:  /bin/ksh
         data model = _ILP32  flags = PR_ORPHAN
   /1:   flags = PR_PCINVAL|PR_ASLEEP [ waitid(0x7,0x0,0x804714c,0x7) ]

You can also print the credential information for this process, including the effective and real UID and GID of the process owner, by using the pcred command:

 $ /usr/proc/bin/pcred 29081
 29081:  e/r/suid=100  e/r/sgid=10

Here, both the effective and the real UID are 100 (user pwatters), and the effective and real GID are 10 (group staff).

To examine the address space map of the target process, including all of the libraries it requires to execute, you can use the pmap command:

 # /usr/proc/bin/pmap 29081
 29081:  /bin/ksh
 08046000      8K read/write/exec     [ stack ]
 08048000    160K read/exec         /usr/bin/ksh
 08070000      8K read/write/exec   /usr/bin/ksh
 08072000     28K read/write/exec     [ heap ]
 DFAB4000     16K read/exec         /usr/lib/locale/en_AU/en_AU.so.2
 DFAB8000      8K read/write/exec   /usr/lib/locale/en_AU/en_AU.so.2
 DFABB000      4K read/write/exec     [ anon ]
 DFABD000     12K read/exec         /usr/lib/libmp.so.2
 DFAC0000      4K read/write/exec   /usr/lib/libmp.so.2
 DFAC4000    552K read/exec         /usr/lib/libc.so.1
 DFB4E000     24K read/write/exec   /usr/lib/libc.so.1
 DFB54000      8K read/write/exec     [ anon ]
 DFB57000    444K read/exec         /usr/lib/libnsl.so.1
 DFBC6000     20K read/write/exec   /usr/lib/libnsl.so.1
 DFBCB000     32K read/write/exec     [ anon ]
 DFBD4000     32K read/exec         /usr/lib/libsocket.so.1
 DFBDC000      8K read/write/exec   /usr/lib/libsocket.so.1
 DFBDF000      4K read/exec         /usr/lib/libdl.so.1
 DFBE1000      4K read/write/exec     [ anon ]
 DFBE3000    100K read/exec         /usr/lib/ld.so.1
 DFBFC000     12K read/write/exec   /usr/lib/ld.so.1
  total     1488K
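The total that pmap reports is simply the sum of the size column, which can be checked with a short filter. This sketch assumes the three-column layout shown in the listing above (address, size in K, permissions); the function name pmap_total_kb is made up for illustration, and other pmap versions or options may print different columns.

```shell
# pmap_total_kb
# Read pmap-style output on standard input and print the sum of the size
# column in kilobytes. A line is counted only when its first field is a
# hexadecimal address and its second field looks like "160K", so the
# "PID: command" header and the final "total" line are both ignored.
pmap_total_kb() {
    awk '
        $1 ~ /^[0-9A-Fa-f]+$/ && $2 ~ /^[0-9]+K$/ {
            sum += substr($2, 1, length($2) - 1)   # drop the trailing K
        }
        END { print sum + 0 }
    '
}
```

A typical use would be /usr/proc/bin/pmap 29081 | pmap_total_kb, which should agree with the total line printed by pmap itself.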

It's always surprising to see how many libraries are loaded when an application is executed, especially something as complicated as a shell, leading to a total of 1,488KB of memory used. You can obtain a list of the dynamic libraries linked to each process by using the pldd command:

 # /usr/proc/bin/pldd 29081
 29081:  /bin/ksh
 /usr/lib/libsocket.so.1
 /usr/lib/libnsl.so.1
 /usr/lib/libc.so.1
 /usr/lib/libdl.so.1
 /usr/lib/libmp.so.2
 /usr/lib/locale/en_AU/en_AU.so.2

As discussed in the earlier section "Sending Signals," signals are the way in which processes communicate with each other, and they can also be used from shells to communicate with spawned processes (usually to suspend or kill them).

By using the psig command, it is possible to list how each signal is currently handled (caught, ignored, or left at its default action) by a process:

 $ /usr/proc/bin/psig 29081
 29081:  /bin/ksh
 HUP     caught  RESTART
 INT     caught  RESTART
 QUIT    ignored
 ILL     caught  RESTART
 TRAP    caught  RESTART
 ABRT    caught  RESTART
 EMT     caught  RESTART
 FPE     caught  RESTART
 KILL    default
 BUS     caught  RESTART
 SEGV    default
 SYS     caught  RESTART
 PIPE    caught  RESTART
 ALRM    caught  RESTART
 TERM    ignored
 USR1    caught  RESTART
 USR2    caught  RESTART
 CLD     default NOCLDSTOP
 PWR     default
 WINCH   default
 URG     default
 POLL    default
 STOP    default
 TSTP    ignored
 CONT    default
 TTIN    ignored
 TTOU    ignored
 VTALRM  default
 PROF    default
 XCPU    caught  RESTART
 XFSZ    ignored
 WAITING default
 LWP     default
 FREEZE  default
 THAW    default
 CANCEL  default
 LOST    default
 RTMIN   default
 RTMIN+1 default
 RTMIN+2 default
 RTMIN+3 default
 RTMAX-3 default
 RTMAX-2 default
 RTMAX-1 default
 RTMAX   default
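With this many signals, it is often easier to filter the psig output for a single disposition, for example to see which signals a process ignores. The sketch below assumes one signal per line with the disposition in the second column, as psig prints it; the function name psig_with_action is hypothetical.

```shell
# psig_with_action ACTION
# Read psig-style output on standard input and print the names of the
# signals whose disposition (second field) equals ACTION, which is expected
# to be one of: caught, ignored, default.
psig_with_action() {
    awk -v action="$1" '$2 == action { print $1 }'
}
```

For the Korn shell shown above, /usr/proc/bin/psig 29081 | psig_with_action ignored would list QUIT, TERM, TSTP, TTIN, TTOU, and XFSZ, which explains why an interactive shell does not exit when it is sent SIGTERM.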

It is also possible to print a hexadecimal format stack trace for each lightweight process (LWP) in a process by using the pstack command. This can be useful in the same way that the truss command was used:

 $ /usr/proc/bin/pstack 29081
 29081:  /bin/ksh
  dfaf5347 waitid   (7, 0, 804714c, 7)
  dfb0d9db _waitpid (ffffffff, 8047224, 4) + 63
  dfb40617 waitpid  (ffffffff, 8047224, 4) + 1f
  0805b792 job_wait (719d) + 1ae
  08064be8 sh_exec  (8077270, 14) + af0
  0805e3a1 ???????? ()
  0805decd main     (1, 8047624, 804762c) + 705
  0804fa78 ???????? ()

Perhaps the most commonly used proc tool is the pfiles command, which displays all of the open files for each process. This is useful for determining operational dependencies between data files and applications:

 $ /usr/proc/bin/pfiles 29081
 29081:  /bin/ksh
   Current rlimit: 64 file descriptors
    0: S_IFCHR mode:0620 dev:102,0 ino:319009 uid:6049 gid:7 rdev:24,8
       O_RDWR|O_LARGEFILE
    1: S_IFCHR mode:0620 dev:102,0 ino:319009 uid:6049 gid:7 rdev:24,8
       O_RDWR|O_LARGEFILE
    2: S_IFCHR mode:0620 dev:102,0 ino:319009 uid:6049 gid:7 rdev:24,8
       O_RDWR|O_LARGEFILE
   63: S_IFREG mode:0600 dev:174,2 ino:990890 uid:6049 gid:1 size:3210
       O_RDWR|O_APPEND|O_LARGEFILE FD_CLOEXEC
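When a process has many descriptors open, a one-line summary per descriptor is easier to scan than the full pfiles output. The following sketch keeps only the descriptor number and the file type; it assumes each descriptor line starts with "N:" followed by an S_IF type, as in the listing above, and the function name pfiles_fd_types is an invention for this example.

```shell
# pfiles_fd_types
# Read pfiles-style output on standard input and print one line per open
# descriptor in the form "FD TYPE" (for example "63 S_IFREG"). The header
# line, the rlimit line, and the flag lines are skipped because they do not
# have a numeric "N:" first field followed by an S_IF type.
pfiles_fd_types() {
    awk '$1 ~ /^[0-9]+:$/ && $2 ~ /^S_IF/ {
        sub(/:$/, "", $1)     # "63:" becomes "63"
        print $1, $2
    }'
}
```

In the example above, descriptors 0, 1, and 2 are character devices (the terminal), and descriptor 63 is a regular file.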

In addition, it is possible to obtain the current working directory of the target process by using the pwdx command:

 $ /usr/proc/bin/pwdx 29081
 29081:  /home/paul

If you need to examine the process tree for all parent and child processes containing the target PID, you can use the ptree command. This is useful for determining dependencies between processes that are not apparent by consulting the process list:

 $ /usr/proc/bin/ptree 29081
 247   /usr/dt/bin/dtlogin -daemon
   28950 /usr/dt/bin/dtlogin -daemon
     28972 /bin/ksh /usr/dt/bin/Xsession
       29012 /usr/dt/bin/sdt_shell -c       unset DT;      DISPLAY=lion:0;
         29015 ksh -c       unset DT;      DISPLAY=lion:0;                 /usr/dt/bin/dt
           29026 /usr/dt/bin/dtsession
             29032 dtwm
               29079 /usr/dt/bin/dtterm
                 29081 /bin/ksh
                   29085 /usr/local/bin/bash
                     29230 /usr/proc/bin/ptree 29081

Here, ptree has been executed from the Bourne again shell (bash), which was started from the Korn shell (ksh), spawned from the dtterm terminal window, which was spawned from the dtwm window manager, and so on.

Tip  

Although many of these proc tools will seem obscure, they are often very useful when trying to debug process-related application errors, especially in large applications like database management systems.

Using the lsof Command

lsof stands for "list open files" and lists information about files that are currently opened by the active processes running on Solaris. It is not included in the Solaris distribution; however, the current version can always be downloaded from ftp://vic.cc.purdue.edu/pub/tools/unix/lsof. Keep in mind that lsof is very sensitive to changes in OS releases, and recompilation may be necessary between Solaris 8 and 9.

What can you use lsof for? The answer largely depends on how many problems you encounter that relate to processes and files. Often, administrators are interested in knowing which processes are currently using a target file or files from a particular directory. This can occur when a file is locked by one application but is required by another application (again, a database system's data files are one example where this might happen, if two database instances attempt to write to the files at once). If you know the path to a file or directory of interest, you can use lsof to determine which processes are using files there.

To examine the processes that are using files in the /tmp file system, use this:

 $ lsof /tmp
 COMMAND    PID USER      FD   TYPE DEVICE SIZE/OFF      NODE NAME
 ssion      338 pwatters  txt   VREG    0,1   271596 471638794 /tmp (swap)
 (unknown)  345 pwatters  txt   VREG    0,1   271596 471638794 /tmp (swap)
 le        2295 pwatters  txt   VREG    0,1   271596 471638794 /tmp (swap)
 le        2299 pwatters  txt   VREG    0,1   271596 471638794 /tmp (swap)

Obviously, there's a bug in the routines that obtain the command name (the first four characters are missing!), but since the PID is correct, this is enough information to identify the four applications that are currently using files in /tmp. For example, dtsession (PID 338) manages the CDE session for the user pwatters, who is using a temporary text file in the /tmp directory. Later versions of lsof have fixed this bug.

Another common problem that lsof helps with, with respect to the /tmp file system, is the identification of processes that continue to write to unlinked files: space is being consumed, but it may appear that no files are growing any larger! This confusing activity can be traced back to a process by using lsof. However, rather than using lsof on the /tmp directory directly, you would need to examine the root directory ("/") on which /tmp is mounted. If the size of a file is changing across several different sampling epochs (for example, when you run the command once a minute), you've probably found the culprit, and once you have identified the process that is writing to the open file, that process can be killed:

 # lsof /
 COMMAND    PID   USER   FD  TYPE DEVICE SIZE/OFF   NODE NAME
 (unknown)    1   root  txt  VREG  102,0   446144 118299 / (/dev/dsk/c0d0s0)
 (unknown)    1   root  txt  VREG  102,0     4372 293504 / (/dev/dsk/c0d0s0)
 (unknown)    1   root  txt  VREG  102,0   173272 293503 / (/dev/dsk/c0d0s0)
 sadm        62   root  txt  VREG  102,0   954804 101535 / (/dev/dsk/c0d0s0)
 sadm        62   root  txt  VREG  102,0   165948 101569 / (/dev/dsk/c0d0s0)
 sadm        62   root  txt  VREG  102,0    16132 100766 / (/dev/dsk/c0d0s0)
 sadm        62   root  txt  VREG  102,0     8772 100765 / (/dev/dsk/c0d0s0)
 sadm        62   root  txt  VREG  102,0   142652 101571 / (/dev/dsk/c0d0s0)
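Comparing sampling epochs by eye is error-prone, so the comparison can be scripted. The sketch below takes two saved lsof listings and reports every open file whose size increased between them. It assumes the nine-column layout shown above, with a one-word COMMAND, the PID in column 2, a purely numeric SIZE/OFF in column 7, and the NODE in column 8; lsof output for other file types can differ. The function name lsof_growing and the snapshot file names in the usage note are hypothetical.

```shell
# lsof_growing OLD_SNAPSHOT NEW_SNAPSHOT
# Compare two saved lsof listings and print "PID NODE OLD_SIZE NEW_SIZE" for
# each (PID, NODE) pair whose SIZE/OFF value is larger in the newer listing.
lsof_growing() {
    awk '
        FNR == 1  { next }                            # skip each header line
        NR == FNR { size[$2 " " $8] = $7 + 0; next }  # remember older sizes
        {
            key = $2 " " $8
            if ((key in size) && ($7 + 0) > size[key])
                print $2, $8, size[key], $7
        }
    ' "$1" "$2"
}
```

For example, run lsof / > /tmp/snap1, wait a minute, run lsof / > /tmp/snap2, and then call lsof_growing /tmp/snap1 /tmp/snap2. Any line that is printed names a process and inode worth investigating.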

One of the restrictions on mounting a file system is that you can't unmount that file system while files are open on it: if a file system were dismounted with files still open, any changes made to those files might not be saved, resulting in data loss. Looking at a process list may not always reveal which processes are opening which files, and this can be very frustrating if Solaris refuses to unmount a file system because some files are open. Again, lsof can be used to identify the processes that are opening files on a specific file system.

The first step is to consult the output of the df command to obtain the names of currently mounted file systems:

 $ df -k
 Filesystem            kbytes    used   avail capacity  Mounted on
 /proc                      0       0       0     0%    /proc
 /dev/dsk/c0d0s0      2510214  929292 1530718    38%    /
 fd                         0       0       0     0%    /dev/fd
 /dev/dsk/c0d0s3      5347552  183471 5110606     4%    /usr/local
 swap                  185524   12120  173404     7%    /tmp
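Looking up the device behind a mount point in the df output can also be done with a one-line filter, which is handy when the result is fed straight to another command. This is a sketch that assumes the six-column df -k layout shown above and mount points without embedded spaces; the function name device_for_mount is hypothetical.

```shell
# device_for_mount MOUNT_POINT
# Read "df -k" output on standard input and print the entry in the
# Filesystem column whose "Mounted on" column equals MOUNT_POINT.
# The header line is skipped.
device_for_mount() {
    awk -v mnt="$1" 'NR > 1 && $NF == mnt { print $1 }'
}
```

With the listing above, df -k | device_for_mount /usr/local prints /dev/dsk/c0d0s3.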

If you want to unmount the /dev/dsk/c0d0s3 file system, but you are prevented from doing so because of open files, you can obtain a list of all open files under /usr/local by using this command:

 $ lsof /dev/dsk/c0d0s3
 COMMAND PID   USER  FD TYPE DEVICE SIZE/OFF   NODE NAME
 httpd   981   root txt VREG  102,3  1747168 457895 /usr/local
 httpd   982   root txt VREG  102,3   333692  56455 /usr/local
 httpd   983   root txt VREG  102,3   333692  56455 /usr/local
 httpd   984   root txt VREG  102,3   333692  56455 /usr/local
 javac   985   root txt VREG  102,3   333692  56455 /usr/local
 httpd   986   root txt VREG  102,3   333692  56455 /usr/local
 httpd   987   root txt VREG  102,3   333692  56455 /usr/local
 httpd   988   root txt VREG  102,3   333692  56455 /usr/local
 httpd   989   root txt VREG  102,3   333692  56455 /usr/local
 httpd   990   root txt VREG  102,3   333692  56455 /usr/local
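To act on these processes, you usually need only the distinct PIDs. The following sketch extracts them from the listing; it assumes a one-word COMMAND column so that the PID is always the second field, and the function name lsof_pids is hypothetical. Many lsof builds also accept a -t option that prints PIDs only, so check the manual page for the installed version before relying on a filter like this one.

```shell
# lsof_pids
# Read lsof output on standard input, skip the header line, and print each
# distinct PID (second column) once, in ascending numeric order.
lsof_pids() {
    awk 'NR > 1 { print $2 }' | sort -n -u
}
```

For the listing above, lsof /dev/dsk/c0d0s3 | lsof_pids would print the ten PIDs from 981 to 990, one per line.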

Obviously, all of these processes will need to stop using the open files before the file system can be unmounted. If you're not sure where a particular command is running from, or on which file system its data files are stored, you can also use lsof to check open files by passing the PID on the command line. First, you need to identify a PID by using the ps command:

 $ ps -eaf | grep apache
   nobody  4911  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
   nobody  4910  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
   nobody  4912  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
   nobody  4905     1  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
   nobody  4907  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
   nobody  4908  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
   nobody  4913  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
   nobody  4909  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
   nobody  4906  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
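In this listing, PID 4905 is the parent of every other httpd process: its own parent is 1, and it appears in the PPID column of all the others. That relationship can be pulled out of the ps output with a small filter. The sketch below assumes the full-format (-f) column order, with the PID in the second field and the PPID in the third; the function name children_of is hypothetical.

```shell
# children_of PARENT_PID
# Read "ps -ef" style output on standard input and print the PID of every
# process whose PPID (third column) equals PARENT_PID, sorted numerically.
children_of() {
    awk -v parent="$1" '$3 == parent { print $2 }' | sort -n
}
```

Here, ps -eaf | children_of 4905 would list the eight child processes, 4906 through 4913.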

Now examine process 4905, the parent Apache process, to see which files it currently has open:

 $ lsof -p 4905
 COMMAND  PID  USER   FD   TYPE DEVICE  SIZE/OFF       NODE NAME
 d       4905 nobody txt   VREG  102,3   333692  56455 /usr/local (/dev/dsk/c0d0s3)
 d       4905 nobody txt   VREG  102,0    17388 100789 / (/dev/dsk/c0d0s0)
 d       4905 nobody txt   VREG  102,0   954804 101535 / (/dev/dsk/c0d0s0)
 d       4905 nobody txt   VREG  102,0   693900 101573 / (/dev/dsk/c0d0s0)
 d       4905 nobody txt   VREG  102,0    52988 100807 / (/dev/dsk/c0d0s0)
 d       4905 nobody txt   VREG  102,0     4396 100752 / (/dev/dsk/c0d0s0)
 d       4905 nobody txt   VREG  102,0   175736 100804 / (/dev/dsk/c0d0s0)

Apache obviously has a number of open files!

The ps Command

The following table summarizes the main options used with ps.

 Option    Description
 -a        Lists most frequently requested processes.
 -A, -e    Lists all processes.
 -c        Lists processes in scheduler format.
 -d        Lists all processes except session leaders.
 -f        Prints comprehensive process information.
 -g        Prints process information on a group basis for a single group.
 -G        Prints process information on a group basis for a list of groups.
 -j        Includes SID and PGID in printout.
 -l        Prints complete process information.
 -L        Displays LWP details.
 -p        Lists process details for a list of specified processes.
 -P        Lists the CPU ID to which a process is bound.
 -s        Lists session leaders.
 -t        Lists all processes associated with a specific terminal.
 -u        Lists all processes for a specific user.

kill

The following table summarizes the main signals used to communicate with processes using kill.

 Signal     Code   Action   Description
 SIGHUP     1      Exit     Hang up
 SIGINT     2      Exit     Interrupt
 SIGQUIT    3      Core     Quit
 SIGILL     4      Core     Illegal instruction
 SIGTRAP    5      Core     Trace
 SIGABRT    6      Core     Abort
 SIGEMT     7      Core     Emulation trap
 SIGFPE     8      Core     Arithmetic exception
 SIGKILL    9      Exit     Killed
 SIGBUS     10     Core     Bus error
 SIGSEGV    11     Core     Segmentation fault
 SIGSYS     12     Core     Bad system call
 SIGPIPE    13     Exit     Broken pipe
 SIGALRM    14     Exit     Alarm clock
 SIGTERM    15     Exit     Terminate
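The effect of a signal whose default action is Exit can be observed from the shell, because the exit status returned by wait shows that the child was terminated by a signal. The following sketch starts a throwaway sleep process, signals it, and prints that status. The function name signal_exit_status is hypothetical, and the sketch is intended only for signals that terminate the child; a signal that stops or is ignored by the child would leave the function waiting for the sleep to finish.

```shell
# signal_exit_status SIGNAL
# Start a throwaway child process, send it SIGNAL (a name such as TERM or
# KILL, without the SIG prefix), and print the exit status that the shell
# reports for the child once it has terminated.
signal_exit_status() {
    sleep 60 >/dev/null 2>&1 &       # child that would otherwise run for 60s
    child=$!
    kill -s "$1" "$child"
    status=0
    wait "$child" || status=$?       # capture the status without aborting
    printf '%s\n' "$status"
}
```

A status greater than 128 indicates death by signal. Bourne-style shells such as bash typically report 128 plus the signal code from the table, so signal_exit_status TERM would normally print 143, although some shells use a different offset.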

pgrep

The pgrep command is used to search for a list of processes whose names match a pattern specified on the command line. The command returns a list of corresponding PIDs. This list can then be passed to another command, such as kill (for example, through command substitution), to perform some action on the processes or send them a signal.

For example, to kill all processes associated with the name java, the following command would be used:

 $ kill -9 `pgrep java` 
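One weakness of this idiom is that when pgrep finds no matching process, kill is invoked with no PID at all and fails with a usage error. A thin wrapper can make that case explicit. The sketch below is one possible approach under that assumption; the function name signal_pids is hypothetical.

```shell
# signal_pids SIGNAL [PID...]
# Send SIGNAL (a name such as TERM or KILL) to every PID given. When the PID
# list is empty, print a message on standard error and return 1 instead of
# running kill with no targets.
signal_pids() {
    sig=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "signal_pids: no matching processes" >&2
        return 1
    fi
    kill -s "$sig" "$@"
}
```

It would be used as signal_pids KILL `pgrep java`, with the command substitution deliberately left unquoted so that each PID becomes a separate argument.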

pkill

The pkill command can be used to send signals to processes that have the same name. It is a more specific version of pgrep, since it can be used only to send signals, and the list of PIDs cannot be piped to another program.

To kill all processes associated with the name java, the following command would be used:

 $ pkill -9 java 

killall

The killall command is used to kill all processes running on a system. It is called by shutdown when the system is being brought to run level 0. However, since a signal can be passed to the killall command, it is possible for a superuser to send a different signal (other than 15) to all processes. For example, to send a SIGHUP signal to all processes, the following command could be used:

 # killall 1 
 
 
   


Sun Certified Solaris 9.0 System and Network Administrator
Sun Certified Solaris(tm) 9 System and Network Administrator All-in-One Exam Guide
ISBN: 0072225300
EAN: 2147483647
Year: 2003
Pages: 265
Authors: Paul Watters
