2.2 Processes

In simple terms, a process is a single executable program that is running in its own address space.^[10] It is distinct from a job or a command, which, on Unix systems, may be composed of many processes working together to perform a specific task. Simple commands like ls are executed as a single process. A compound command containing pipes will execute one process per pipe segment. For Unix systems, managing CPU resources must be done in large part by controlling processes, because the resource allocation and batch execution facilities available with other multitasking operating systems are underdeveloped or missing.

^[10] I am not distinguishing between processes and threads at this point.
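The one-process-per-segment rule can be observed directly. The following minimal Python sketch builds the equivalent of printf 'b\na\n' | sort by hand and reports the PID of each segment. The function name pipeline_pids and the choice of printf and sort as the two segments are mine, for illustration only, and the sketch assumes a Unix system with both commands on the search path.

```python
import subprocess


def pipeline_pids():
    """Run the equivalent of: printf 'b\\na\\n' | sort"""
    # First segment: its standard output feeds a pipe.
    producer = subprocess.Popen(["printf", "b\\na\\n"], stdout=subprocess.PIPE)
    # Second segment: its standard input is the read end of that pipe.
    consumer = subprocess.Popen(
        ["sort"], stdin=producer.stdout, stdout=subprocess.PIPE, text=True
    )
    # Close our copy of the pipe so only the two segments hold it open.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return producer.pid, consumer.pid, output


if __name__ == "__main__":
    first_pid, second_pid, sorted_text = pipeline_pids()
    print("printf ran as PID", first_pid)
    print("sort ran as PID", second_pid)
    print(sorted_text, end="")
```

The two PIDs differ because each segment of the pipeline runs as its own process.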

Unix processes come in several types. We'll look at the most common here.

2.2.1 Interactive Processes

Interactive processes are initiated from and controlled by a terminal session. Interactive processes may run either in the foreground or the background. Foreground processes remain attached to the terminal; the foreground process is the one with which the terminal communicates directly. For example, typing a Unix command and waiting for its output means running a foreground process.

While a foreground process is running, it alone can receive direct input from the terminal. For example, if you run the diff command on two very large files, you will be unable to run another command until it finishes (or you kill it with CTRL-C).

Job control allows a process to be moved between the foreground and the background at will. For example, when a process is moved from the foreground to the background, the process is temporarily stopped, and terminal control returns to its parent process (usually a shell). The background job may be resumed and continue executing unattached to the terminal session that launched it. Alternatively, it may eventually be brought to the foreground, and once again become the terminal's current process. Processes may also be started initially as background processes.

Table 2-6 reviews the ways to control foreground and background processes provided by most current shells.

Table 2-6. Controlling processes
Form	Meaning and examples
`&`	Run command in background.
	$ long_cmd &
`^Z`	Stop foreground process.
	$ long_cmd
	^Z
	Stopped
	$
`jobs`	List background processes.
	$ jobs
	[1] - Stopped emacs
	[2] - big_job &
	[3] + Stopped long_cmd
`%n`	Refers to background job number n.
	$ kill %2
`fg`	Bring background process to foreground.
	$ fg %1
`%?str`	Refers to the background job command containing the specified characters.
	$ fg %?em
`bg`	Restart stopped background process.
	$ long_cmd
	^Z
	Stopped
	$ bg
	[3] long_cmd &
`~^Z`	Suspend `rlogin` session.
	bridget-27 $ ~^Z
	Stopped
	henry-85 $
`~~^Z`	Suspend second-level `rlogin` session. Useful for nested `rlogin`s; each additional tilde says to pop back to the next highest level of `rlogin`. Thus, one tilde pops all the way back to the lowest level job (the job on the local system), two tildes pop back to the first `rlogin` session, and so on.
	bridget-28 $ ~~^Z
	Stopped
	peter-46 $
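Underneath, the shell's job control rests on signals: ^Z causes the terminal driver to send SIGTSTP to the foreground job, and bg and fg resume a stopped job with SIGCONT. The sketch below imitates that sequence from Python. Since no terminal is involved, it uses SIGSTOP (the stop signal that cannot be caught) in place of SIGTSTP. The function name and the use of sleep as a stand-in for long_cmd are assumptions of the example.

```python
import os
import signal
import subprocess


def stop_and_resume():
    """Stop a child process, confirm it is stopped, then let it continue."""
    # Stand-in for long_cmd: a child that would otherwise sleep for 30 seconds.
    child = subprocess.Popen(["sleep", "30"])

    # Roughly what ^Z does (the terminal sends SIGTSTP; SIGSTOP cannot be caught).
    os.kill(child.pid, signal.SIGSTOP)

    # WUNTRACED makes waitpid report a stopped child, not only an exited one.
    _, status = os.waitpid(child.pid, os.WUNTRACED)
    was_stopped = os.WIFSTOPPED(status)

    # Roughly what bg does: the job resumes without regaining the terminal.
    os.kill(child.pid, signal.SIGCONT)

    # Clean up, much as kill %n would.
    child.terminate()
    child.wait()
    return was_stopped


if __name__ == "__main__":
    print("child was stopped:", stop_and_resume())
```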

2.2.2 Batch Processes

Batch processes are not associated with any terminal. Rather, they are submitted to a queue, from which jobs are executed sequentially. Unix offers a very primitive batch command, but vendors whose customers require queuing have generally implemented something more substantial. Some of the best known are the Network Queuing System (NQS), developed by NASA and used on many high-performance computers including Crays, as well as several network-based process-scheduling systems from various vendors. These facilities usually support heterogeneous as well as homogeneous networks, and they attempt to distribute the aggregate CPU load evenly among the workstations in the network, a process known as load balancing or load leveling.

2.2.3 Daemons

Daemons are server processes, often initiated at boot time, that run continuously while the system is up, waiting in the background until a process requires their service.^[11] For example, network daemons are idle until a process requests network access.

^[11] Daemon is an ancient Greek word meaning "divinity" or "spirit" (but keep the character of the Greek gods in mind). The OED defines it as a "tutelary deity": the guardian of a particular person, place or thing. More recently, the poet Yeats wrote at length about daemons, defining them as that which we continually struggle against yet paradoxically need in order to survive, simultaneously the source of our pain and of our strength, even in some sense, the very essence of our being. For Yeats, the daemon is "of all things not impossible the most difficult."

Table 2-7 provides a brief overview of the most important Unix daemons.

Table 2-7. Important Unix daemons
Facility	Description	Daemon Names
`init`	First created process	`init`
`syslog`	System status/error message logging	`syslogd`
email	Mail message transport	`sendmail`
printing	Print spooler	`lpd`, `lpsched`, `qdaemon`, `rlpdaemon`
cron	Periodic process execution	`crond`
tty	Terminal support	`getty` (and similar)
sync	Disk buffer flushing	`update`, `syncd`, `syncher`, `fsflush`, `bdflush`, `kupdated`
paging and swapping	Daemons to support virtual memory management	`pagedaemon`, `vhand`, `kpiod`, `pageout`, `swapper`, `kswapd`, `kreclaimd`
`inetd`	Master TCP/IP daemon, responsible for starting many others on demand: `telnetd`, `ftpd`, `rshd`, `imapd`, `pop3d`, `fingerd`, `rwhod` (see /etc/inetd.conf for a full list)	`inetd`
name resolution	DNS server process	`named`
routing	Routing daemon	`routed`, `gated`
DHCP	Dynamic network client configuration	`dhcpd`, `dhcpsd`
RPC	Remote procedure call facility network port-to-service mapper	`portmap`, `rpcbind`
NFS	Network File System: native Unix network file sharing	`nfsd`, `rpc.mountd`, `rpc.nfsd`, `rpc.statd`, `rpc.lockd`, `nfsiod`
Samba	File/print sharing with Windows systems	`smbd`, `nmbd`
WWW	HTTP server	`httpd`
network time	Network time synchronization	`timed`, `ntpd`

2.2.4 Process Attributes

Unix processes have many associated attributes. Some of the most important are:

Process ID (PID): A unique identifying number used to refer to the process.
Parent process ID (PPID): The PID of the process's parent process (the process that created it).
Nice number: The process's scheduling priority, which is a number indicating its importance relative to other processes. This needs to be distinguished from its actual execution priority, which is dynamically changed based on both the process's nice number and its recent CPU usage. See Section 15.3 for a detailed discussion of nice numbers and their effect on execution priority.
TTY: The terminal (or pseudo-terminal) device associated with the process.
Real and effective user ID (RUID, EUID): A process's real UID is the UID of the user who started it. Its effective UID is the UID that is used to determine the process's access to system resources (such as files and devices). Usually the real and effective UIDs are the same, and the process accordingly has the same access rights as the user who launched it. However, when the setuid access mode is set on an executable image, then the EUIDs of processes executing it are set to the UID of the file's user owner, and they are accorded corresponding access rights.
Real and effective group ID (RGID, EGID): A process's real GID is the user's primary or current group. Its effective GID, used to determine the process's access rights, is the same as the real GID except when the setgid access mode is set on an executable image. The EGIDs of processes executing such files are set to the GID of the file's group owner, and they are given corresponding access to system resources.
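A process can ask the system for each of these attributes about itself. The sketch below, which assumes Python 3 on a Unix system, gathers them into a dictionary; the function name and dictionary keys are my own labels, not standard names.

```python
import os


def process_attributes():
    """Collect the attributes described above for the current process."""
    try:
        tty = os.ttyname(0)  # device behind standard input, if it is a terminal
    except OSError:
        tty = None  # standard input is not a terminal (e.g., batch job or daemon)
    return {
        "pid": os.getpid(),
        "ppid": os.getppid(),
        "nice": os.nice(0),  # an increment of 0 just reports the current value
        "tty": tty,
        "ruid": os.getuid(),
        "euid": os.geteuid(),
        "rgid": os.getgid(),
        "egid": os.getegid(),
    }


if __name__ == "__main__":
    for name, value in process_attributes().items():
        print(f"{name:5} {value}")
```

For an ordinary command, the real and effective IDs reported here are the same; they differ only when the program file has setuid or setgid access.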

2.2.4.1 The life cycle of a process

A new process is created in the following manner. An existing process makes an exact copy of itself, a procedure known as forking. The new process, called the child process, has the same environment as its parent process, although it is assigned a different process ID. Then, the image in the child process's address space is overwritten by the one the child will run; this is done via the exec system call. Hence, the often-used phrase fork-and-exec. The new program (or command) completely replaces the one duplicated from the parent. However, the environment of the parent still remains, including the values of environment variables; the assignments of standard input, standard output, and standard error; and its execution priority.

Let's make this picture a bit more concrete. What happens when a user runs a command like grep? First, the user's shell process forks, creating a new shell process to run the command. Then, the new shell process execs grep, overlaying the shell's executable image in memory with grep's, and grep begins executing. When the grep command finishes, the process dies.
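The same two steps can be written out explicitly. What follows is a minimal sketch of fork-and-exec in Python, not a description of how any particular shell is implemented; the function name fork_and_exec and the DEMO_STATUS variable are hypothetical, chosen only for the example.

```python
import os


def fork_and_exec(argv):
    """Run argv in a child process; return (child PID, exit status)."""
    pid = os.fork()  # the child starts as a copy of this process
    if pid == 0:
        # Child: replace the copied image with the new program.
        try:
            os.execvp(argv[0], argv)  # returns only if the exec fails
        finally:
            os._exit(127)  # status shells conventionally use for "not found"
    # Parent: fork returned the child's PID; wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    exit_status = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    return pid, exit_status


if __name__ == "__main__":
    # The child inherits this (hypothetical) variable across both fork and exec.
    os.environ["DEMO_STATUS"] = "3"
    print(fork_and_exec(["sh", "-c", 'exit "$DEMO_STATUS"']))
```

The child can see DEMO_STATUS even though it is running a different program, which is the environment inheritance described above.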

This is the way that all Unix processes are created. The ultimate ancestor for every process on a Unix system is the process with PID 1, init, created during the boot process (see Chapter 4). init creates many other processes (all by fork-and-exec). Among them are usually one or more executing the getty program. The gettys are each assigned to a different serial line; they display the login prompt and wait for someone to respond to it. When someone does, the getty process execs the login program, which validates user logins, among other activities.^[12]

^[12] The process is similar for an X terminal window. In the latter case, the xterm or other process is created by the window manager in use, which was itself started by a series of other X-related processes, ultimately deriving from a command issued from the login shell (e.g., startx) or as part of the login process itself.

Once the username and password are verified,^[13] login execs the user's shell. Forking is not always required to run a new program, and login does not fork in this case. After logging in, the user's shell is the same process as the getty that was watching the unused serial line. That process changed programs twice by execing a new executable, and it will go on to create new processes to execute the commands that the user types. Figure 2-3 illustrates Unix process creation in the context of initial user login.

^[13] If the login attempt fails, login exits, sending a signal to its parent process, init, indicating it should create a new getty process for the terminal.
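The point that exec changes the program but not the process can be checked by having a process report its PID before and after an exec. In this sketch, which assumes a POSIX sh on the search path, a child execs sh, which prints its PID and then execs a second sh that prints its own. The function name and the two-stage command are illustrative stand-ins for the getty, login, shell sequence, not the real programs.

```python
import os


def pid_after_exec_chain():
    """Return (child PID, [PID seen by first program, PID seen by second])."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send standard output into the pipe, then change programs twice.
        os.close(read_end)
        os.dup2(write_end, 1)
        try:
            # First exec: sh prints its PID, then execs a second sh
            # (no fork in between), which prints its own PID.
            os.execvp("sh", ["sh", "-c", "echo $$; exec sh -c 'echo $$'"])
        finally:
            os._exit(127)  # reached only if the exec itself failed
    # Parent: read what the two programs reported, then collect the child.
    os.close(write_end)
    with os.fdopen(read_end) as pipe:
        reported = [int(line) for line in pipe.read().split()]
    os.waitpid(pid, 0)
    return pid, reported


if __name__ == "__main__":
    print(pid_after_exec_chain())
```

Both reported PIDs match the PID that fork returned to the parent: one process, three successive programs.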

Figure 2-3. Unix process creation: fork and exec

When any process exits, it sends a signal to inform its parent process that it has completed. So, when a user logs out, her login shell sends a signal to its parent, init, as it dies, letting init know that it's time to create a new getty process for the terminal. init forks again and starts the getty, and the whole cycle repeats itself again and again as different users use that terminal.
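The signal in question is SIGCHLD, which the kernel delivers to the parent on the exiting child's behalf. The sketch below shows a parent noticing a child's exit this way. It is only an illustration of the mechanism, not of how init is written; the function name and the timeout parameter are my own, and it must be called from the main thread, since that is where Python allows signal handlers to be installed.

```python
import os
import signal
import time


def child_exit_notifies_parent(timeout=5.0):
    """Fork a child that exits at once; return the signals the parent received."""
    received = []
    previous = signal.signal(
        signal.SIGCHLD, lambda signum, frame: received.append(signum)
    )
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately
        # Parent: wait until the handler has run (or give up after timeout).
        deadline = time.monotonic() + timeout
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)
        os.waitpid(pid, 0)  # collect the exit status so no zombie is left
    finally:
        # Put back whatever handler was in place before the demonstration.
        signal.signal(
            signal.SIGCHLD, previous if previous is not None else signal.SIG_DFL
        )
    return received


if __name__ == "__main__":
    print(child_exit_notifies_parent())
```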

2.2.4.2 Setuid and setgid file access and process execution

The purpose of the setuid and setgid access modes is to allow ordinary users to perform tasks requiring privileges and access rights that are ordinarily denied to them. For example, on many systems the write command is owned by the tty group, which also owns all of the terminal and pseudo-terminal device files. The write command has setgid access, allowing any user to use it to write a message to another user's terminal or window (to which they do not normally have any access). When users execute write, their effective GID is set to that of the group owner of the executable file (often /usr/bin/write) for the duration of the command.

Setuid and/or setgid access are also used by the printing subsystem, by programs like mailers, and by some other system facilities. However, setuid programs are also notorious security risks. In practice, setuid almost always means setuid to root, and the danger is that somehow, through program stupidity or their own cleverness or both, users will figure out a way to perform additional, unauthorized functions while the setuid command is running or to retain their inherited root status after the command ends. In general, setuid access should be avoided since it involves greater security risks than setgid, and almost any function can be performed by using the latter in conjunction with carefully designed groups. See Chapter 7 for a more detailed discussion of the security issues involved with setuid and setgid programs. Keep in mind, though, that while setgid programs are safer than setuid ones, they are not risk-free themselves.
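Whether a given command has setuid or setgid access can be read from the mode bits of its executable file. The following sketch tests those two bits; the function names are hypothetical, and it assumes Python 3 on a Unix system.

```python
import os
import stat


def special_bits(mode):
    """Report which of the setuid and setgid bits are set in a file mode."""
    return {
        "setuid": bool(mode & stat.S_ISUID),  # octal 4000
        "setgid": bool(mode & stat.S_ISGID),  # octal 2000
    }


def special_bits_of(path):
    """Same report for a file on disk, e.g. a command's executable."""
    return special_bits(os.stat(path).st_mode)
```

On a system where write is installed as described above, special_bits_of("/usr/bin/write") would be expected to report the setgid bit, although both the path and the mode vary between systems.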

2.2.4.3 The relationship between commands and files

The Unix operating system does not distinguish between commands and files in the ways that some systems do. Aside from a few commands that are built into each Unix shell, Unix commands are executable files stored in one of several standard locations within the filesystem. Access to commands is exactly equivalent to access to these files. By default, there is no other privilege mechanism. Even I/O is handled via special files, stored in the directory /dev, which function as interfaces to the device drivers. All I/O operations look just like ordinary file operations from the user's point of view.

Unix shells use search paths to locate the executable images for commands that users enter. In its simplest form, a search path is simply an ordered list of directories in which to look for command executables, and it is typically set in an initialization file ($HOME/.profile or $HOME/.login). A faulty (incomplete) search path is the most common cause for "Command not found" error messages.

Search paths are stored in the PATH environment variable. Here is a typical PATH:

$ echo $PATH
/bin:/usr/ucb:/usr/bin:/usr/local/bin:.:$HOME/bin

The various directories in the PATH are separated by colons. The search path is used whenever a command name is entered without an explicit directory location. As an example, consider the following command:

$ od data.raw

The od command is used to display a raw dump of a file. To locate this command, the shell first looks for a file named od in /bin. If such a file exists, it is executed. If there is no od file in the /bin directory, /usr/ucb is checked next, followed by /usr/bin (where od is in fact usually located). If it were necessary, the search would continue in /usr/local/bin, the current directory, and finally the bin subdirectory of the user's home directory.

The order of the directories in the search path is important when more than one version of a command exists. Such effects come into play most frequently when both the BSD and the System V versions of commands are available on a system. In this case, you should put the directory holding the versions you want to use first in your search path. For example, if you want to use the BSD versions of commands such as ls and ln on a System V-based system, then put /usr/ucb ahead of /usr/bin in your search path. Similarly, if you want to use the System V-compatible commands available on some systems, put /usr/5bin ahead of /usr/bin and /usr/ucb in your search path. These same considerations will obviously apply to users' search paths that you define for them in their initialization files (see Section 4.2).
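The lookup itself is simple enough to sketch in a few lines. The function below is a simplified model of the search, not the code any shell actually uses, and the name find_command is hypothetical. It walks the directories in order and returns the first executable file with the requested name, which is why the order of the directories matters.

```python
import os


def find_command(name, path):
    """Return the first executable called name in the colon-separated path."""
    for directory in path.split(":"):
        # An empty entry in a search path stands for the current directory.
        directory = directory or "."
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate  # first match wins; later directories are ignored
    return None  # what the shell reports as "Command not found"


if __name__ == "__main__":
    print(find_command("od", os.environ.get("PATH", "")))
```

Putting /usr/ucb ahead of /usr/bin in the path argument makes the function return the /usr/ucb version of a command when both exist, mirroring the advice above.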

Most of the Unix administrative utilities are located in the directories /sbin and /usr/sbin. However, the locations of administrative commands can vary widely between Unix versions. These directories typically aren't in the search path unless you put them there explicitly. When executing administrative commands, you can either add these directories to your search path or provide the full pathname for the command, as in the example below:

# /usr/sbin/ping hamlet

I'm going to assume in my examples that the administrative directories have been added to the search path. Thus, I won't be including the full pathname for any of the commands I'll be discussing.

The Unix Way of System Administration

System administrators are stereotypically arrogant, single-minded, and opinionated. For Unix system administrators, the stereotype was born in the days when Unix was this bizarre operating system that ran on only a few systems, and the local Unix guru was some guy who generally kept to himself, locked away with his system, or so the story goes.

The skepticism I'm exhibiting with this view of Unix system managers does not mean that there is no truth in it at all. Like most caricatures, this one has roots in reality. For example, it is all too easy to find people who will tell you that there is one right editor to use, one right shell for writing scripts, one right way to do anything you care to name. Discussing the advantages and liabilities of alternative approaches to problems can be both useful and entertaining, but only within reason.

Since you're reading this introductory chapter, I'm assuming that you are only beginning your exploration of Unix administration. I certainly want to encourage you to consider for yourself all the tasks and issues you will face as you proceed and to provide help when I can. You'll quickly form your own opinions and define what system administration is for you. Doing so is a process, which can continue for as long and range as widely as you want it to. However, if you get to a point where fanaticism replaces thinking, you've gone too far.