Maintenance Tasks | UNIX: The Complete Reference, Second Edition (Complete Reference Series)

Once your system is set up, it is important that you stay in touch with your computer and its users. The remainder of this chapter describes how you can help ensure the good working condition of your system. This includes means for checking on the system and suggestions about what you can do if you find something wrong.

Several subjects pertaining to ongoing maintenance are not in this chapter but are important for keeping your system working. See Chapter 14 for discussions of these and other administrative topics not covered here.

Communicating with Users

If more than one or two people are using your system, you will probably want to use some of the tools the UNIX System provides to communicate with users. The talk command, the wall command, the news command, and the /etc/motd file are of particular interest.

The talk Command

If you want to chat with someone on your machine or another machine, you can use the talk command to do so. This facility is the forerunner of the chat capability used by many Internet users, and is similar to IM (instant messaging) capabilities. For example, the user jennifer on machine sis1 can set up an interactive talk session with the user sharlene on machine sis2 by issuing the command

 talk sharlene@sis2

This will send a message to the screen for user sharlene with text similar to this:

 Message from TalkDaemon@sis1 at 10:03 a.m.
 talk: connection requested by jennifer
 talk: respond talk jennifer@sis1

Sharlene would then reply

 talk jennifer@sis1

to complete the connection. Each user can then type text that will be displayed on the other user’s screen. The conversation is ended when one of the participants enters an interrupt (or EOF character). At this point the other participant will receive a message that the conversation has been terminated.

If you don’t want to be disturbed during a work session, you can prevent other users from attempting to contact you by using the mesg command. Entering

 mesg -n

at your command prompt will set your terminal to reject messages from other users. Entering

 mesg -y

will reset your terminal to allow subsequent messages from other users.

The wall Command

If you want to immediately send a message to every user that is currently logged in, you can use the wall command. This is most often used when you need to bring down the system in an emergency while other users are logged in. Here is an example of how to use the wall command:

 # wall
 I need to bring the system down in about 5 minutes.
 Log off now or risk having your work interrupted.
 I expect to have it running again in about two hours.
 CTRL-D

The message will be directed to every active terminal on the system. It will show up right in the middle of the user’s work, but it will not cause any damage. Note that you must end the wall message by typing a CTRL-D.

/usr/news Messages

Longer messages can be written in a file and placed in the /usr/news directory. Any user can read the news and, if the permissions on the /usr/news directory are open, write their own news messages. To read the news, type the following:

 $ news
 notice (root) Wed Jul 12 11:30:15 2006
     We just purchased another printer to attach to trigger.
     If you have any suggestions about where it should be located,
     please send mail to trigger!root.

You will see all news messages that have been added since the last time you read your news. The name of the file is the message name, the user is shown in parentheses, and the date and time the message was created are also listed.

Message of the Day (/etc/motd)

The message-of-the-day file (/etc/motd) is used to communicate short messages to users on a more regular basis. You can simply add information to the /etc/motd file using a text editor. The information will then be displayed automatically when the user logs in. The description of the /etc/profile file shows how the motd file is read.

Following is an example of the kind of information you might want to put in your computer’s /etc/motd file:

 10/10: Trigger down at 1:00 pm today for  1 hour to add cards.
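
If you post notices often, a small helper can keep the entries consistently dated. The following is only a sketch: the function name add_motd_notice is made up for this illustration, and it assumes a POSIX shell and the MM/DD prefix used in the example above.

```shell
#!/bin/sh
# add_motd_notice FILE MESSAGE...
# Appends one line of the form "MM/DD: message" to FILE.
# Hypothetical helper; on a real system FILE would be /etc/motd,
# which normally requires superuser permission to write.
add_motd_notice() {
    motd_file=$1
    shift
    printf '%s: %s\n' "$(date +%m/%d)" "$*" >> "$motd_file"
}
```

For example, add_motd_notice /etc/motd "Trigger down at 1:00 pm today for 1 hour to add cards." appends a notice like the one shown above. Old notices still have to be removed by hand.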

Checking the System

If you are doing administration for a system that is already set up, you will want to familiarize yourself with the system. For instance, you will want to know the system’s name, its current run state, the users who have logins to the system, and who are logged in. You might also be interested in what processes are currently running, the file systems that are available for storing data, and how much space is currently available in each file system.

The following commands will help you find out how the system is configured and what activities are occurring on the system.

Display System Name (uname)

You can use the uname -a command to display all system name information. Other options to uname let you display or change parts of this information. Here’s an example of uname with the -a option:

 # uname -a
 SunOS attlis 10 sun4u sparc SUNW, Ultra-Enterprise

“SunOS” identifies the operating system name, “attlis” is the computer’s communication node name, “10” is the operating system release (Solaris 10), and “sun4u…” is the machine hardware name (here a Sun Enterprise 5000).

You will need to know the node name if you want to tell other users and systems how to identify your system. The operating system version is important to know if a software package you want to run is dependent on a particular operating system version.

If you just want to know the operating system name, you can use uname with the -s option. Similar to the previous example,

 # uname -s
 SunOS

indicates that the machine’s operating system name is SunOS.

Display Current System State (who)

You can use the who command to see whether your system is in single-user state or one of the multiuser states. To display the current system state of your computer, type this:

 # who -r
    .       run-level 2 Oct 16 16:16   2   0   S

You see that the run level is 2 (multiuser state). The three fields after the date and time show the current state, the number of times the system has previously been in that state, and the previous state.

Display User Names

To see the names of those who have logins on your system, along with their user IDs, group names/IDs, and other information, type this:

 # logins
 root          0       other       1     0000-Admin(0000)
 sysadm        0       other       1     0000-Admin(0000)
 daemon        1       other       1     0000-Admin(0000)
 bin           2       bin         2     0000-Admin(0000)
 sys           3       sys         3     0000-Admin(0000)
 uucp          5       uucp        5     0000-uucp(0000)
 lp            7       tty         7
 nuucp         10    10    0000-uucp(0000)
 oamsys        101     other       1     Object Architecture Files
 mib           102     docs        77    Ida Beecher
 gwagner       210     docs        77    Greg Wagner
 gkw           212     docs        77    Karen Williams
 oasys         215     other       1     Object Architecture Files

Some reasons you might want to do this are that you forgot Ida Beecher’s user name; you want to add gwagner’s login to another system and you want to use the same UID number he has on this system; or you need to see which users are in the docs group because you want to add the whole group to another machine.
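
On systems without the logins command, the last of these questions can be approximated from the password file. This is a minimal sketch, assuming the standard colon-separated passwd format; the function name users_with_gid is made up for this illustration.

```shell
#!/bin/sh
# users_with_gid GID
# Reads passwd-format lines (name:passwd:uid:gid:comment:home:shell) on
# standard input and prints the login names whose primary group is GID.
users_with_gid() {
    awk -F: -v gid="$1" '$4 == gid { print $1 }'
}
```

Running users_with_gid 77 < /etc/passwd on the system above would list the logins whose primary group is docs. Users who belong to the group only through an entry in /etc/group are not reported by this sketch.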

Display Who Is on the System

To get a list of who is currently logged into the system, the ports where they are logged in, the times/dates they logged in, how long a user has been inactive (“.” if currently active), and the process ID that relates to each user’s shell, type this:

 # who -u
 root       console     Oct 18 13:06   .       3158
 mcn        term/12     Oct 18 20:06   .       8224

You may want to do this to check who is on the system before you shut it down. Or you may want to check for terminals that have been inactive for a long time, since long inactivity may mean that users left for the day without turning off their terminals.
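
The check for inactive terminals can be sketched as a filter over the who -u output. The function name idle_sessions is made up for this illustration, and the sketch assumes the seven-column layout of the example; other systems print the date differently, so the field numbers may need adjusting.

```shell
#!/bin/sh
# idle_sessions
# Reads "who -u" output on standard input and prints "user line idle" for
# every session whose idle field is not "." (that is, not currently active).
# Assumes seven columns: name line month day time idle pid
idle_sessions() {
    awk 'NF >= 7 && $6 != "." { print $1, $2, $6 }'
}
```

A typical use would be who -u | idle_sessions before shutting the system down.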

Display System Definition

Most UNIX systems have some sort of utility that will display basic system definition information. This might include such information as the device used to access swap space (/dev/swap), the UNIX System boot program (/boot/KERNEL), the boards that are in each slot in the computer, and the system’s tunable parameters.

Among the most important items of system information are tunable parameters. Tunable parameters help set the sizes of various tables for the UNIX System kernel and devices and put limits on resource usage. For example, the MAXUP parameter limits the number of processes a user (other than superuser) can have active simultaneously in the kernel.

Usually the default tunable settings are acceptable. However, if you are having performance problems or are running applications that place heavy demands on the system, such as networking applications, you should explore your system’s tunables. Check the documentation that comes with your system for a description of its tunables.

The sysdef utility is used on some UNIX variants to display system definition information. Here is an example of some of the contents:

 # sysdef
 * Hostid
   806d5cid
 * Devices
 /dev/swap           17,1       0  30192  28804
    .
 (long list of devices may follow)
    .
 * Tunable Parameters
 *
    100 buffers in buffer cache (NBUF)
    60  entries in callout table (NCALL)
    25  processes per user id (MAXUP)
      .
      .
      .

Other variants have similar utilities to list devices and drivers. For example, in AIX you use the lsdev -C command, in HP-UX you use the /sbin/ioscan command, and in Red Hat Linux you use the cat /proc/devices command.
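
A script that has to run on several variants can choose the command from the operating system name that uname -s reports. The following is a sketch only: the function name device_list_cmd is made up for this illustration, the pairing of SunOS with sysdef is an assumption based on the example above, and ioscan is given without a path so that the shell finds it wherever it is installed.

```shell
#!/bin/sh
# device_list_cmd [OSNAME]
# Prints the device-listing command named in the text for the given
# operating system name (as reported by "uname -s"). With no argument it
# uses the local system's name. Returns 1 for names it does not know.
device_list_cmd() {
    os=${1:-$(uname -s)}
    case $os in
        SunOS) echo "sysdef" ;;
        AIX)   echo "lsdev -C" ;;
        HP-UX) echo "ioscan" ;;
        Linux) echo "cat /proc/devices" ;;
        *)     echo "unknown system: $os" >&2; return 1 ;;
    esac
}
```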

Display Mounted File Systems

File systems are specific areas of storage media (such as hard disks) where information is stored. When a file system is mounted, it becomes accessible from a particular point in the UNIX System directory structure. See Chapters 3 and 14 for a description of file systems.

To display the file systems that are mounted on your system, use the mount command, like this:

 # mount
 / on  /dev/root read/write/setuid on Thu Jul 27  15:06:40 2006
 /proc on /proc read/write on Thu Jul 27 15:06:41 2006
 /stand on  /dev/dsk/c1d0s3 read/write on Thu Jul 27 15:06:44 2006
 /var on  /dev/dsk/c1d1s8 read/write on Thu Jul 27 15:07:11 2006
 /usr on  /dev/dsk/c1d0s2 read/write on Thu Jul 27 16:47:44 2006
 /home2 on  /dev/dsk/c1d0sa read/write on Thu Jul 27 16:47:48 2006
 /home on  /dev/dsk/c1d1s9 read/write on Thu Jul 27 16:47:52 2006

The information that is returned tells you the point in the directory structure on which the file system is mounted, the device through which it is accessible, whether the file system is read-only or readable and writable, and the date on which it was last mounted. This listing will also include any file systems that are mounted from another system across the network (remote).

After you have changed system states, you can check the mounted file systems to make sure they were successfully mounted and unmounted as appropriate.

Display Disk Space

Occasionally you will want to check how much disk space is available on each file system on your system to make sure that there is enough space to serve your users’ needs. To see the amount of disk space available in each file system on your system, use the df command as follows:

 # df -k
 /               (/dev/root       ):     12150 blocks    2339 files
                             total:     25146 blocks    3136 files
 /proc           (/proc           ):         0 blocks     185 files
                             total:         0 blocks     202 files
 /stand          (/dev/dsk/c1d0s3):      1095 blocks      45 files
                             total:      5148 blocks      51 files
 /var            (/dev/dsk/c1d1s8):     37128 blocks    2145 files
                             total:     40192 blocks    2496 files
 /usr            (/dev/dsk/c1d0s2):     29982 blocks    7330 files
                             total:     86308 blocks   10784 files
 /home2          (/dev/dsk/c1d0sa):      1972 blocks      93 files
                             total:      2000 blocks      96 files
 /home           (/dev/dsk/c1d1s9):     59420 blocks    3988 files
                             total:    108504 blocks    6752 files

For each file system, you will see the mount point, related device, total number of disk blocks, and files used. Listed underneath the blocks and files used are the total number of each available in the file system.

Even if you check nothing else, check this information occasionally. If you begin to run out of either disk blocks or the number of files you can create in that file system, consider following one of these courses:

You can distribute files to different file systems that have more room. In particular, you may want to relocate software add-on packages or one or more users to a file system with more space.
You can delete files you no longer need. Do a cleanup of administrative log files and spool files (see the description of the du command coming up). Also encourage your users to do the same.

You can copy files that do not need to be immediately accessible onto tape or floppy storage. You can always restore them later if you need to.
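
The occasional check can be partly automated. The sketch below flags file systems that are nearly full. It does not read the listing format shown above; instead it assumes the portable six-column output of df -P, which is a substitution made here for ease of parsing. The function name full_filesystems is made up for this illustration, and the sketch looks only at blocks, not at the number of files.

```shell
#!/bin/sh
# full_filesystems LIMIT
# Reads "df -P" output on standard input and prints "mountpoint percent"
# for each file system whose capacity is at or above LIMIT percent.
# Assumes the POSIX portable format: a one-line header followed by six
# columns, and mount points without embedded spaces.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%$/, "", pct)
        if (pct + 0 >= limit + 0) print $6, pct "%"
    }'
}
```

For example, df -P | full_filesystems 90 lists every file system that is at least 90 percent full.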

Display Disk Usage

If you are running out of disk space, you can use the du command to see how much space is being used by each directory. The following example shows the amount of disk space used by each directory under the directory /var/spool:

 # du /var/spool
 4       /var/spool/pkg
 4       /var/spool/locks
 52      /var/spool/uucp/trigger
 88      /var/spool/uucp
 4       /var/spool/uucppublic
 8       /var/spool/lp/admins
 4       /var/spool/lp/fifos
 4       /var/spool/lp/requests
 4       /var/spool/lp/system
 4       /var/spool/lp/tmp
 36      /var/spool/lp
 140     /var/spool

Note that each directory shows the amount of space used in it and each directory below it.

Some files and directories will grow over time. In particular, you should keep an eye on log files. These are files that keep records of different types of activities on the system, such as file transfers and system resource usage. You can set up your system to delete these files at given times (see the description of cron earlier in this chapter).

Here is a list of some of the files and directories that you should monitor:

/var/spool/uucp This directory contains files that are waiting to be sent by the Basic Networking Utilities. Files that cannot be sent because of bad addressing or network problems can accumulate here.
/var/spool/uucppublic This directory contains files that are received by Basic Networking Utilities. If these files are not retrieved by the users they are intended for, the directory may begin to fill up.
/var/adm/sulog This file contains a history of attempts to use the su command to become the superuser or another user. It will grow if it is not truncated or deleted occasionally.
/var/cron/log This file contains a history of jobs that are kicked off by the cron facilities. Like sulog, it should be truncated or deleted occasionally.
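
Truncating a log file while keeping its most recent entries can be sketched as follows. The function name trim_log is made up for this illustration. The sketch assumes the mktemp utility is available (it is common but not part of every UNIX variant), and any lines written to the log between the tail and cat steps are lost.

```shell
#!/bin/sh
# trim_log FILE LINES
# Keeps only the last LINES lines of FILE. The file is rewritten in place
# rather than replaced, so its owner and permissions are unchanged.
trim_log() {
    log=$1
    keep=$2
    tmp=$(mktemp) || return 1
    tail -n "$keep" "$log" > "$tmp" && cat "$tmp" > "$log"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For example, trim_log /var/cron/log 500 would keep the last 500 lines of the cron log.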

System Activity Reporting (sar)

You can gather a wide variety of system activity information from your UNIX system using the sar command and related tools. The sar command can show you performance activity of the central processor or of a particular hardware device. Activity can be monitored for different time periods.

Here are a few examples of the reports you can generate using the sar command with various options:

 # sar -d
 SunOS attlis 10 sun4u    07/21/06
 13:46:28  device %busy avque r+w/s blks/s  avwait  avserv
 13:46:58  sd01      6   1.6     3      5    13.8    23.7
           sd04     93   2.1     2      4   467.8   444.0
 13:47:28  sd04     13   1.3     4      8    10.8    32.3
           sd05    100   3.1     2      5   857.4   404.1
 13:47:58  sd04     17    .7     2     41      .6    48.1
           sd09    100   4.4     2      6  1451.9   406.5
 Average   sd04     12   1.2     3     18     8.4    34.7
           sd09    100   3.2     2      5   925.7   418.2

The information given by the preceding command shows disk activity for various hard disk devices. At given times, it shows the percentage of time each disk was busy, the average number of requests that are outstanding, the number of read and write transfers to the device per second, the number of blocks transferred per second, and the average time (in milliseconds) that transfer requests wait in the queue and take to be completed. The command

 # sar -u
 SunOS attlis 10 sun4u    07/21/06
 10:02:07    %usr    %sys    %wio   %idle
 10:02:27      82      18       0       0
 10:02:47      39      35      16      10
 10:03:07       7      28      16      50
 10:03:27       1      16       0      83
 Average       32      24       8      36

shows the central processor unit utilization. It shows the percentage of time the CPU is in user mode (%usr), system mode (%sys), waiting for input/output completion (%wio), and idle (%idle) for a given time period. It is possible to run sar as a cron job to take snapshots of your system throughout the day. You can store this information in a file to be viewed by the superuser. If you wish to get an accurate picture of system performance, this is a good way to do it.
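
When sar output is collected in a file, the busiest intervals can be picked out with a short filter. This is a sketch under assumptions: the function name busy_samples is made up for this illustration, and it expects the five-column sar -u layout of the example (time, %usr, %sys, %wio, %idle). Other versions of sar print additional columns, so the field number may need adjusting.

```shell
#!/bin/sh
# busy_samples LIMIT
# Reads "sar -u" output on standard input and prints "time idle" for each
# sample whose %idle value is below LIMIT.
# Header, banner, and Average lines are skipped because their first field
# is not a time or their fifth field is not a number.
busy_samples() {
    awk -v limit="$1" '
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $5 ~ /^[0-9.]+$/ {
            if ($5 + 0 < limit + 0) print $1, $5
        }'
}
```

For example, sar -u | busy_samples 20 prints the samples in which the processor was idle less than 20 percent of the time.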

Check Processes Currently Running (ps -ef)

You can use the ps command with the -ef options to see all the processes currently running on the system. You may want to do this if performance is very slow and you suspect either a runaway process or that particular users are using more than their share of the processor.

Following is an example of some of the processes you would typically see on a running system:

 # ps -ef
      UID   PID  PPID  C    STIME TTY      TIME COMD
     root     1     0  0   Oct 29 ?       14:47 /sbin/init
     root   213     1  0   Oct 29 ?        0:40 /usr/lib/saf/sac -t 300
     root  3107     1  0   Nov 01 ?        0:04 /usr/lib/lp/lpsched
     root   103     1  0   Oct 29 ?        0:03 /usr/slan/lib/admdaemon
     root   113     1  0   Oct 29 ?        3:03 /usr/sbin/cron
     root   216     1  0   Oct 29 ?        0:04 /usr/lib/saf/ttymon -g -m ldterm -d /dev/contty -l contty
     root  3157     1  0   Nov 01 console  0:03 /usr/lib/saf/ttymon -g -p Console Login: -m ldterm -d /dev/console -l console
     root   217     1  0   Oct 29 ?        0:01 /usr/sbin/hdelogger
     root   221   213  0   Oct 29 ?        0:21 /usr/lib/saf/ttymon
     root   222   213  0   Oct 29 ?        0:19 /usr/lib/saf/ttymon
      mcn  4431   221  4 02:43:20 term/11  0:03 -sh
      mcn  4436  4431 32 02:43:57 term/11 11:54 testprog

If the system is very slow, you may want to check for runaway processes on your system. If you see a process that is consuming a great deal of CPU time, you may want to consider killing that process (see Chapter 11).

Caution

Do not kill processes without careful consideration. If you delete one of the important system processes by mistake, you may have to reboot your system to correct the problem.

To kill the runaway process called testprog in the preceding example, you first need to know that you kill process ids (PIDs), not process names. Since the PID for testprog is 4436, typing

 kill -9 4436

will terminate the process unconditionally.
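
Looking up the PID for a process name can be sketched as a filter. The function name pids_named is made up for this illustration. The sketch assumes a ps that supports the POSIX -o option, and it compares only the first word of the command name; on some systems that name is truncated or includes a path.

```shell
#!/bin/sh
# pids_named NAME
# Reads "pid command" pairs on standard input, as produced by
#   ps -e -o pid= -o comm=
# and prints the PID of every process whose command name equals NAME.
pids_named() {
    awk -v name="$1" '$2 == name { print $1 }'
}
```

In the preceding example, ps -e -o pid= -o comm= | pids_named testprog would print 4436. Check the result against the ps -ef listing before passing it to kill, since several processes can share a name.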

The Sticky Bit

An innovative feature of UNIX in the early days of small machines was the concept of the sticky bit in file permissions. As originally implemented, if an executable file had the sticky bit set, the operating system would not delete the program text from memory when the last user process terminated. The program text would be available in memory when the next user of the file executed it. Consequently, the program did not need to be loaded, and execution was much faster. This was a useful feature, improving performance, in the days of small machines and expensive memory. Today, however, with fast disk drives and cheap memory, using the sticky bit to keep a program in memory is obsolete, and most UNIX systems simply ignore it.

One use of the sticky bit is still important for system administration: setting it on a directory provides some added security. Some directories on the UNIX System, such as tmp and spool, must allow general read, write, and search permission. A danger with this arrangement is that others could delete a user’s files. In most current UNIX versions, the sticky bit can be set for directories to prevent others from removing a user’s files. If the sticky bit is set on a directory, files in that directory can only be removed if one or more of the following conditions is true:

The user owns the file.
The user owns the directory.
The user has write permission for the file.
The user is the superuser.

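In order to set the sticky bit, you use the chmod command with either an absolute (octal) mode, in which the leading 1 is the sticky bit, or the symbolic mode +t, like this: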

 # chmod 1753 progfile

 # chmod +t progfile

In order to change the access permissions of a file, you must either own the file or be the superuser. To see if the sticky bit is set, use the ls -l command to check permissions. If you set the sticky bit of a file, a t will appear in the execute portion of the others permissions field, like this:

 $ ls -l vi
 -rwxr-xr-t   5 bin      bin       213824 Jul  1  2006 vi
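
The same check can be scripted. The following sketch tests the last character of the permissions field in the ls -ld listing; the function name has_sticky is made up for this illustration. A lowercase t means the sticky bit is set together with execute permission for others, and an uppercase T means it is set without that execute permission.

```shell
#!/bin/sh
# has_sticky PATH
# Succeeds (exit status 0) if the sticky bit is set on PATH.
# Looks at the tenth character of the "ls -ld" mode string, which is
# "t" or "T" when the sticky bit is set.
has_sticky() {
    mode=$(ls -ld -- "$1" | awk '{ print substr($1, 10, 1) }')
    case $mode in
        t|T) return 0 ;;
        *)   return 1 ;;
    esac
}
```

For example, after chmod 1777 on a shared directory, has_sticky on that directory succeeds, and a script can use the result to warn about world-writable directories that lack the bit.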