Once your system is set up, it is important that you stay in touch with your computer and its users. The remainder of this chapter describes how you can help ensure the good working condition of your system. This includes means for checking on the system and suggestions about what you can do if you find something wrong.
Several subjects pertaining to ongoing maintenance are not in this chapter but are important for keeping your system working. See Chapter 14 for discussions of these and other administrative topics not covered here.
If more than one or two people are using your system, you will probably want to use some of the tools the UNIX System provides to communicate with users. The talk command, the wall command, the news command, and the /etc/motd file are of particular interest.
If you want to chat with someone on your machine or another machine, you can use the talk command to do so. This facility is the forerunner of the chat capability used by many Internet users, and is similar to IM (instant messaging) capabilities. For example, the user jennifer on machine sis1 can set up an interactive talk session with the user sharlene on machine sis2 by issuing the command
talk sharlene@sis2
This will send a message to the screen for user sharlene with text similar to this:
Message from TalkDaemon@sis1 at 10:03 a.m.
talk: connection requested by jennifer
talk: respond talk jennifer@sis1
Sharlene would then reply
talk jennifer@sis1
to complete the connection. Each user can then type text that will be displayed on the other user’s screen. The conversation is ended when one of the participants enters an interrupt (or EOF character). At this point the other participant will receive a message that the conversation has been terminated.
If you don’t want to be disturbed during a work session, you can prevent other users from attempting to contact you by using the mesg command. Entering
mesg -n
at your command prompt will set your terminal to reject messages from other users. Entering
mesg -y
will reset your terminal to allow subsequent messages from other users.
If you want to immediately send a message to every user that is currently logged in, you can use the wall command. This is most often used when you need to bring down the system in an emergency while other users are logged in. Here is an example of how to use the wall command:
# wall
I need to bring the system down in about 5 minutes.
Log off now or risk having your work interrupted.
I expect to have it running again in about two hours.
CTRL-D
The message will be directed to every active terminal on the system. It will show up right in the middle of the user’s work, but it will not cause any damage. Note that you must end the wall message by typing a CTRL-D.
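When the same kind of notice is sent often, the text can be generated and piped to wall instead of typed by hand. The following is only a sketch: the function name shutdown_notice and its two arguments are hypothetical, and it assumes that wall reads its message from standard input, as in the example above.

```shell
# Hypothetical helper: print a shutdown notice.
#   $1 = minutes until the system goes down
#   $2 = expected hours of downtime
shutdown_notice() {
    minutes=$1
    hours=$2
    cat <<EOF
I need to bring the system down in about $minutes minutes.
Log off now or risk having your work interrupted.
I expect to have it running again in about $hours hours.
EOF
}

# Usage (as root). The pipe supplies the end-of-file, so no CTRL-D is typed:
#   shutdown_notice 5 2 | wall
```

Keeping the text in a function makes it easy to preview the message on your own terminal before broadcasting it.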
Longer messages can be written in a file and placed in the /usr/news directory. Any user can read the news and, if the permissions on the /usr/news directory are open, write their own news messages. To read the news, type the following:
$ news
notice (root) Wed Jul 12 11:30:15 2006
We just purchased another printer to attach to trigger.
If you have any suggestions about where it should be located,
please send mail to trigger!root.
You will see all news messages that have been added since the last time you read your news. The name of the file is the message name, the user is shown in parentheses, and the date and time the message was created are also listed.
The message-of-the-day file (/etc/motd) is used to communicate short messages to users on a more regular basis. You can simply add information to the /etc/motd file using a text editor. The information will then be displayed automatically when the user logs in. The description of the /etc/profile file shows how the motd file is read.
Following is an example of the kind of information you might want to put in your computer’s /etc/motd file:
10/10: Trigger down at 1:00 pm today for 1 hour to add cards.
If you are doing administration for a system that is already set up, you will want to familiarize yourself with the system. For instance, you will want to know the system’s name, its current run state, the users who have logins to the system, and which of them are currently logged in. You might also be interested in what processes are currently running, the file systems that are available for storing data, and how much space is currently available in each file system.
The following commands will help you find out how the system is configured and what activities are occurring on the system.
You can use the uname -a command to display all system name information. Other options to uname let you display or change parts of this information. Here’s an example of uname with the -a option:
# uname -a
SunOS attlis 10 sun4u sparc SUNW,Ultra-Enterprise
“SunOS” identifies the operating system name, “attlis” is the computer’s communication node name, “10” is the operating system release (Solaris 10), and “sun4u…” is the machine hardware name (here a Sun Enterprise 5000).
You will need to know the node name if you want to tell other users and systems how to identify your system. The operating system version is important to know if a software package you want to run is dependent on a particular operating system version.
If you just want to know the operating system name, you can use uname with the -s option. Similar to the previous example,
# uname -s
SunOS
indicates that the machine’s operating system name is SunOS.
You can use the who command to see whether your system is in single-user state or one of the multiuser states. To display the current system state of your computer, type this:
# who -r
. run-level 2 Oct 16 16:16 2 0 S
You see that the run level is 2 (multiuser state). The three fields after the date and time show the current state, the number of times the system has previously been in that state, and the previous state.
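If a script needs only the run level itself, it can be picked out of the who -r output. This is a minimal sketch; the function name current_run_level is hypothetical, and it assumes the output contains the word "run-level" followed by the level, as in the example above.

```shell
# Hypothetical helper: read "who -r" output on standard input and print
# the token that follows the word "run-level".
current_run_level() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "run-level") { print $(i + 1); exit }
    }'
}

# Usage:
#   who -r | current_run_level
```

Searching for the word rather than counting fields keeps the sketch working on systems that omit the leading "." column.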
To see the names of those who have logins on your system, along with their user IDs, group names/IDs, and other information, type this:
# logins
root      0     other   1    0000-Admin(0000)
sysadm    0     other   1    0000-Admin(0000)
daemon    1     other   1    0000-Admin(0000)
bin       2     bin     2    0000-Admin(0000)
sys       3     sys     3    0000-Admin(0000)
uucp      5     uucp    5    0000-uucp(0000)
lp        7     tty     7
nuucp     10            10   0000-uucp(0000)
oamsys    101   other   1    Object Architecture Files
mib       102   docs    77   Ida Beecher
gwagner   210   docs    77   Greg Wagner
gkw       212   docs    77   Karen Williams
oasys     215   other   1    Object Architecture Files
Some reasons you might want to do this are that you forgot Ida Beecher’s user name; you want to add gwagner’s login to another system and you want to use the same UID number he has on this system; or you need to see which users are in the docs group because you want to add the whole group to another machine.
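The last of these tasks, listing everyone in the docs group, can be done by filtering the logins output. The sketch below is an illustration only: the function name logins_in_group is hypothetical, and it assumes the column order shown above (login name, user ID, group name, group ID, comment).

```shell
# Hypothetical helper: read "logins" output on standard input and print
# the login names whose group name (third column) matches $1.
logins_in_group() {
    awk -v group="$1" '$3 == group { print $1 }'
}

# Usage:
#   logins | logins_in_group docs
```

With the listing above, this would print mib, gwagner, and gkw, one per line.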
To get a list of who is currently logged into the system, the ports where they are logged in, the times/dates they logged in, how long a user has been inactive (“.” if currently active), and the process ID that relates to each user’s shell, type this:
# who -u
root   console   Oct 18 13:06   .   3158
mcn    term/12   Oct 18 20:06   .   8224
You may want to do this to check who is on the system before you shut it down. Or you may want to check for terminals that have been inactive for a long time, since long inactivity may mean that users left for the day without turning off their terminals.
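Checking for long-idle terminals can be automated by filtering the who -u output. This is a sketch under stated assumptions: the function name idle_terminals is hypothetical, the columns are assumed to be in the order shown above (so the idle time is the sixth field), and the idle time is assumed to appear as ".", as hours:minutes, or as the word "old" for terminals idle more than a day.

```shell
# Hypothetical helper: read "who -u" output on standard input and print
# the user and terminal of every session idle for at least $1 minutes.
idle_terminals() {
    awk -v limit="$1" '
        $6 == "old" { print $1, $2; next }       # idle for more than a day
        $6 ~ /^[0-9]+:[0-9]+$/ {
            split($6, t, ":")                    # t[1] = hours, t[2] = minutes
            if (t[1] * 60 + t[2] >= limit) print $1, $2
        }'
}

# Usage: list sessions idle for an hour or more
#   who -u | idle_terminals 60
```

Systems that print the login date as a single field shift the idle column, so check the layout of your own who -u output before relying on the field number.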
Most UNIX systems have some sort of utility that will display basic system definition information. This might include such information as the device used to access swap space (/dev/swap), the UNIX System boot program (/boot/KERNEL), the boards that are in each slot in the computer, and the system’s tunable parameters.
Among the most important items of system information are tunable parameters. Tunable parameters set the sizes of various tables for the UNIX System kernel and devices and put limits on resource usage. For example, the MAXUP parameter limits the number of processes a user (other than the superuser) can have active simultaneously.
Usually the default tunable settings are acceptable. However, if you are having performance problems or are running applications that place heavy demands on the system, such as networking applications, you should explore your system’s tunables. Check the documentation that comes with your system for a description of its tunables.
The sysdef utility is used on some UNIX variants to display system definition information. Here is an example of some of the contents:
# sysdef
* Hostid
806d5cid
* Devices
/dev/swap 17,1 0 30192 28804
.
(long list of devices may follow)
.
* Tunable Parameters
*
100 buffers in buffer cache (NBUF)
60 entries in callout table (NCALL)
25 processes per user id (MAXUP)
.
.
.
Other variants have similar utilities to list devices and drivers. For example, in AIX you use the lsdev -C command, in HP-UX you use the /sbin/ioscan command, and in Red Hat Linux you use the cat /proc/devices command.
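If you administer several variants, a small lookup keyed on the operating system name can remind you which command applies. The sketch below is illustrative: the function name device_list_command is hypothetical, and pairing sysdef with SunOS is an assumption based on the examples in this chapter. The names matched are the values uname -s reports on each system.

```shell
# Hypothetical helper: print the device-listing command for the
# operating system name given in $1 (as reported by "uname -s").
device_list_command() {
    case "$1" in
        SunOS) echo "sysdef" ;;
        AIX)   echo "lsdev -C" ;;
        HP-UX) echo "/sbin/ioscan" ;;
        Linux) echo "cat /proc/devices" ;;
        *)     echo "unknown"; return 1 ;;
    esac
}

# Usage:
#   device_list_command "$(uname -s)"
```

The function only prints the command name; running it is left to you, since some of these utilities require superuser privileges.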
File systems are specific areas of storage media (such as hard disks) where information is stored. When a file system is mounted, it becomes accessible from a particular point in the UNIX System directory structure. See Chapters 3 and 14 for a description of file systems.
To display the file systems that are mounted on your system, use the mount command, like this:
# mount
/ on /dev/root read/write/setuid on Thu Jul 27 15:06:40 2006
/proc on /proc read/write on Thu Jul 27 15:06:41 2006
/stand on /dev/dsk/c1d0s3 read/write on Thu Jul 27 15:06:44 2006
/var on /dev/dsk/c1d1s8 read/write on Thu Jul 27 15:07:11 2006
/usr on /dev/dsk/c1d0s2 read/write on Thu Jul 27 16:47:44 2006
/home2 on /dev/dsk/c1d0sa read/write on Thu Jul 27 16:47:48 2006
/home on /dev/dsk/c1d1s9 read/write on Thu Jul 27 16:47:52 2006
The information that is returned tells you the point in the directory structure on which the file system is mounted, the device through which it is accessible, whether the file system is read-only or readable and writable, and the date on which it was last mounted. This listing will also include any file systems that are mounted from another system across the network (remote).
After you have changed system states, you can check the mounted file systems to make sure they were successfully mounted and unmounted as appropriate.
Occasionally you will want to check how much disk space is available on each file system on your system to make sure that there is enough space to serve your users’ needs. To see the amount of disk space available in each file system on your system, use the df command as follows:
# df -k
/       (/dev/root      ):    12150 blocks    2339 files
                   total:    25146 blocks    3136 files
/proc   (/proc          ):        0 blocks     185 files
                   total:        0 blocks     202 files
/stand  (/dev/dsk/c1d0s3):     1095 blocks      45 files
                   total:     5148 blocks      51 files
/var    (/dev/dsk/c1d1s8):    37128 blocks    2145 files
                   total:    40192 blocks    2496 files
/usr    (/dev/dsk/c1d0s2):    29982 blocks    7330 files
                   total:    86308 blocks   10784 files
/home2  (/dev/dsk/c1d0sa):     1972 blocks      93 files
                   total:     2000 blocks      96 files
/home   (/dev/dsk/c1d1s9):    59420 blocks    3988 files
                   total:   108504 blocks    6752 files
For each file system, you will see the mount point, the related device, and the number of disk blocks and files that are still free. The line underneath lists the total number of blocks and files allocated to the file system.
Even if you check nothing else, check this information occasionally. If you begin to run out of either disk blocks or the number of files you can create in a file system, consider following one of these courses:
You can distribute files to different file systems that have more room. In particular, you may want to relocate software add-on packages or one or more users to a file system with more space.
You can delete files you no longer need. Do a cleanup of administrative log files and spool files (see the description of the du command coming up). Also encourage your users to do the same.
You can copy files that do not need to be immediately accessible onto tape or floppy storage. You can always restore them later if you need to.
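A routine check for nearly full file systems can be scripted. The sketch below does not parse the layout shown above; it assumes your df supports the POSIX -P option, which prints one line per file system with the columns Filesystem, 1024-blocks, Used, Available, Capacity, and Mounted on. The function name full_filesystems is hypothetical.

```shell
# Hypothetical helper: read "df -P -k" output on standard input and print
# the mount point and capacity of every file system at or above $1 percent.
# Mount points that contain spaces are not handled by this sketch.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)                  # "95%" becomes "95"
        if (pct + 0 >= limit) print $6, $5
    }'
}

# Usage: report file systems that are at least 90 percent full
#   df -P -k | full_filesystems 90
```

Run from cron, a check like this can mail you a warning before users start seeing write failures.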
If you are running out of disk space, you can use the du command to see how much space is being used by each directory. The following example shows the amount of disk space used by each directory under the directory /var/spool:
# du /var/spool
4      /var/spool/pkg
4      /var/spool/locks
52     /var/spool/uucp/trigger
88     /var/spool/uucp
4      /var/spool/uucppublic
8      /var/spool/lp/admins
4      /var/spool/lp/fifos
4      /var/spool/lp/requests
4      /var/spool/lp/system
4      /var/spool/lp/tmp
36     /var/spool/lp
140    /var/spool
Note that each directory shows the amount of space used in it and each directory below it.
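On a large directory tree the du listing can run to thousands of lines, so it helps to sort it and look only at the biggest entries. This is a minimal sketch; the function name largest_dirs is hypothetical.

```shell
# Hypothetical helper: read "du" output on standard input and print the
# $1 largest entries, biggest first.
largest_dirs() {
    sort -rn | head -n "$1"
}

# Usage: show the three largest directories under /var/spool
#   du /var/spool | largest_dirs 3
```

Because each directory's figure includes its subdirectories, the top of the sorted list is always the starting directory itself; the entries below it show where the space is actually going.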
Some files and directories will grow over time. In particular, you should keep an eye on log files. These are files that keep records of different types of activities on the system, such as file transfers and system resource usage. You can set up your system to delete these files at given times (see the description of cron earlier in this chapter).
Here is a list of some of the files and directories that you should monitor:
/var/spool/uucp This directory contains files that are waiting to be sent by the Basic Networking Utilities. Files that cannot be sent because of bad addressing or network problems can accumulate here.
/var/spool/uucppublic This directory contains files that are received by Basic Networking Utilities. If these files are not retrieved by the users they are intended for, the directory may begin to fill up.
/var/adm/sulog This file contains a history of attempts to use the su command to become the superuser or another user. It will grow if it is not truncated or deleted occasionally.
/var/cron/log This file contains a history of jobs that are kicked off by the cron facilities. Like sulog, it should be truncated or deleted occasionally.
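Truncating a log while keeping its most recent entries can be done with tail. The following is a sketch, not a complete log-rotation scheme: the function name trim_log is hypothetical, and any lines written to the log between the tail and the copy back would be lost.

```shell
# Hypothetical helper: keep only the last $2 lines of the log file $1.
# The result is copied back with cat so that the original file keeps
# its owner and permissions.
trim_log() {
    log=$1
    keep=$2
    tmp="$log.trim.$$"                     # temporary copy next to the log
    tail -n "$keep" "$log" > "$tmp" && cat "$tmp" > "$log"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage (as root): keep the last 500 lines of the cron log
#   trim_log /var/cron/log 500
```

Writing back into the existing file, rather than moving a new file into place, also means a process that holds the log open keeps writing to the same file.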
You can gather a wide variety of system activity information from your UNIX system using the sar command and related tools. The sar command can show you performance activity of the central processor or of a particular hardware device. Activity can be monitored for different time periods.
Here are a few examples of the reports you can generate using the sar command with various options:
# sar -d
SunOS attlis 10 sun4ru 07/21/06
13:46:28   device   %busy   avque   r+w/s   blks/s   avwait   avserv
13:46:58   sd01         6     1.6       3        5     13.8     23.7
           sd04        93     2.1       2        4    467.8    444.0
13:47:28   sd04        13     1.3       4        8     10.8     32.3
           sd05       100     3.1       2        5    857.4    404.1
13:47:58   sd04        17      .7       2       41       .6     48.1
           sd09       100     4.4       2        6   1451.9    406.5
Average    sd04        12     1.2       3       18      8.4     34.7
           sd09       100     3.2       2        5    925.7    418.2
The information given by the preceding command shows disk activity for various hard disk devices. At given times, it shows the percentage of time each disk was busy, the average number of requests that are outstanding, the number of read and write transfers to the device per second, the number of blocks transferred per second, and the average time (in milliseconds) that transfer requests wait in the queue and take to be completed. The command
# sar -u
SunOS attlis 10 sun4ru 07/21/06
10:02:07   %usr   %sys   %wio   %idle
10:02:27     82     18      0       0
10:02:47     39     35     16      10
10:03:07      7     28     16      50
10:03:27      1     16      0      83
Average      32     24      8      36
shows the central processing unit (CPU) utilization. It shows the percentage of time the CPU is in user mode (%usr), in system mode (%sys), waiting for input/output completion (%wio), and idle (%idle) for a given time period. It is possible to run sar as a cron job to take snapshots of your system throughout the day. You can store this information in a file to be viewed by the superuser. If you wish to get an accurate picture of system performance, this is a good way to do it.
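When sar output is collected regularly, a filter can pick out the intervals in which the CPU was nearly saturated. This is a sketch under stated assumptions: the function name busy_intervals is hypothetical, each sample line is assumed to start with a time in hh:mm:ss form, and %idle is assumed to be the last column, as in the report above.

```shell
# Hypothetical helper: read "sar -u" output on standard input and print
# the time and %idle of every sample whose %idle is below $1.
busy_intervals() {
    awk -v limit="$1" '
        $1 ~ /^[0-9]+:[0-9]+:[0-9]+$/ &&      # sample lines start with a time
        $NF ~ /^[0-9.]+$/ &&                  # skip the column-heading line
        $NF + 0 < limit { print $1, $NF }'
}

# Usage: show samples with less than 20 percent idle time
#   sar -u | busy_intervals 20
```

The heading and Average lines are skipped because their first or last fields do not match the expected patterns.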
You can use the ps command with the -ef options to see all the processes currently running on the system. You may want to do this if performance is very slow and you suspect either a runaway process or that particular users are using more than their share of the processor.
Following is an example of some of the processes you would typically see on a running system:
# ps -ef
UID    PID   PPID   C   STIME      TTY       TIME    COMD
root   1     0      0   Oct 29     ?         14:47   /sbin/init
root   213   1      0   Oct 29     ?         0:40    /usr/lib/saf/sac -t 300
root   3107  1      0   Nov 01     ?         0:04    /usr/lib/lp/lpsched
root   103   1      0   Oct 29     ?         0:03    /usr/slan/lib/admdaemon
root   113   1      0   Oct 29     ?         3:03    /usr/sbin/cron
root   216   1      0   Oct 29     ?         0:04    /usr/lib/saf/ttymon -g -m ldterm -d /dev/contty -l contty
root   3157  1      0   Nov 01     console   0:03    /usr/lib/saf/ttymon -g -p Console Login: -m ldterm -d /dev/console -l console
root   217   1      0   Oct 29     ?         0:01    /usr/sbin/hdelogger
root   221   213    0   Oct 29     ?         0:21    /usr/lib/saf/ttymon
root   222   213    0   Oct 29     ?         0:19    /usr/lib/saf/ttymon
mcn    4431  221    4   02:43:20   term/11   0:03    -sh
mcn    4436  4431   32  02:43:57   term/11   11:54   testprog
If the system is very slow, you may want to check for runaway processes on your system. If you see a process that is consuming a great deal of CPU time, you may want to consider killing that process (see Chapter 11).
Caution: Do not kill processes without careful consideration. If you kill one of the important system processes by mistake, you may have to reboot your system to correct the problem.
To kill the runaway process called testprog in the preceding example, you first need to know that you kill process IDs (PIDs), not process names. Since the PID for testprog is 4436, typing
kill -9 4436
will terminate the process unconditionally.
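Because signal 9 gives a program no chance to clean up, a gentler approach is to send the default termination signal first and fall back to kill -9 only if the process is still there after a short wait. The following is a sketch of that idea; the function name stop_process and the default five-second grace period are hypothetical choices.

```shell
# Hypothetical helper: ask process $1 to terminate, wait up to $2 seconds
# (default 5), then kill it unconditionally if it is still running.
stop_process() {
    pid=$1
    grace=${2:-5}
    # Default signal (TERM); failure means no such process or no permission.
    kill "$pid" 2>/dev/null || return 1
    while [ "$grace" -gt 0 ]; do
        # kill -0 sends no signal; it only tests whether the PID exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done
    kill -9 "$pid" 2>/dev/null
    return 0
}

# Usage, with the PID from the preceding example:
#   stop_process 4436
```

A well-behaved program exits on the first signal, so in the common case the unconditional kill is never sent.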
An innovative feature of UNIX in the early days of small machines was the concept of the sticky bit in file permissions. As originally implemented, if an executable file had the sticky bit set, the operating system would not delete the program text from memory when the last user process terminated. The program text would be available in memory when the next user of the file executed it. Consequently, the program did not need to be loaded, and execution was much faster. This was a useful feature, improving performance, in the days of small machines and expensive memory. Today, however, with fast disk drives and cheap memory, using the sticky bit to keep a program in memory is obsolete, and most UNIX systems simply ignore it.
One use of the sticky bit is still important for system administration: setting it on a directory provides some added security. Some directories on the UNIX System, such as tmp and spool, must allow general read, write, and search permission. A danger with this arrangement is that others could delete a user’s files. In most current UNIX versions, the sticky bit can be set on directories to prevent others from removing a user’s files. If the sticky bit is set on a directory, files in that directory can be removed only if one or more of the following conditions is true:
The user owns the file.
The user owns the directory.
The user has write permission for the file.
The user is the superuser.
In order to set the sticky bit, you use the chmod command, like this:
# chmod 1753 progfile
or
# chmod +t progfile
In order to change the access permissions of a file, you must either own the file or be the superuser. To see if the sticky bit is set, use the ls -l command to check permissions. If you set the sticky bit of a file, a t will appear in the execute portion of the others permissions field, like this:
$ ls -l vi
-rwxr-xr-t 5 bin bin 213824 Jul 1 2006 vi
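The same check can be made from a script by looking at the last character of the permissions field. This is a minimal sketch; the function name has_sticky_bit is hypothetical. It treats both t and T as set, since an uppercase T is shown when the sticky bit is on but execute permission for others is off.

```shell
# Hypothetical helper: succeed if the file or directory $1 has the
# sticky bit set, judging by the tenth character of the ls -ld listing.
has_sticky_bit() {
    flag=$(ls -ld "$1" | cut -c10)
    [ "$flag" = "t" ] || [ "$flag" = "T" ]
}

# Usage:
#   has_sticky_bit /tmp && echo "/tmp is protected by the sticky bit"
```

A check like this is useful after restoring or recreating a world-writable directory, when the sticky bit is easy to forget.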