Not that anything ever goes wrong with your system, but when something does, there are some things you can try in order to figure out what has happened.
Troubleshooting problems in a Unix system is similar in many ways to troubleshooting on any system: You start by comparing the symptoms of the problem with the patient's medical history. When did the problem start? Oh, right after you installed the system-configuration files you were up all night editing? Hmmm. Maybe that's a clue to the problem . . .
The system log files (described earlier in this chapter) often have an error message related to the problem you are experiencing. Usually you won't understand the error message, but don't stop there. You can search the Web for information regarding the exact error message you are seeing.
To search the Web for an error message:
1. | Copy whatever seems to be the most descriptive part of the error message. |
2. | Use your favorite Web search engine to search for the error message. This usually means enclosing all the words in quotes, for example, "DNSAgent: dns_send_query_server - timeout" |
3. | Consider adding "Mac OS X" or "Darwin" as a separate search string. For example, using the Google search engine, "DNSAgent: dns_send_query_server - timeout" + "Mac OS X" limits the search to pages that contain both of the phrases enclosed in quotes. (We found five pages with that search.) |
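Before you can search for a message, you have to fish it out of the log. Here is a minimal sketch of one way to do that, assuming a plain-text log file. The function name recent_errors and the keyword list are our own illustrative choices, not anything supplied by the system.

```shell
# recent_errors LOGFILE [COUNT]
# Print the last COUNT lines (default 5) of LOGFILE that mention a
# likely problem keyword. The keyword list is an assumption; adjust it
# to suit the messages you are hunting for.
recent_errors() {
    logfile=$1
    count=${2:-5}
    grep -i -E 'error|fail|denied|timeout' "$logfile" | tail -n "$count"
}
```

You might call it as recent_errors /var/log/system.log 10 (assuming that is where your system log lives), then copy the most descriptive part of a line into your search.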
If you are getting an error that includes the phrase "Permission denied" or something similar, it's a sign that you have a permission problem somewhere, a common problem in Unix. Permission problems crop up because a program might not be able to write to a directory or file it expects to, or because it might not be able to read a file and thus is missing some configuration information.
Tracking down permission problems, like much computer troubleshooting, requires that you think like the machine. Remember that in order to create a file, a process must have write permission for the directory containing the file (because the filename is an entry in the directory), while in order to change a file, you must have write permission on the file itself.
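The two rules above can be checked from the command line. The following is a rough sketch using the shell's built-in file tests; the function names can_create_in and can_modify are hypothetical helpers we made up for illustration.

```shell
# can_create_in DIR: succeeds if the current process may add an entry
# to DIR (it needs write and search permission on the directory itself).
can_create_in() {
    [ -d "$1" ] && [ -w "$1" ] && [ -x "$1" ]
}

# can_modify FILE: succeeds if the current process may change the
# contents of an existing FILE (it needs write permission on the file).
can_modify() {
    [ -f "$1" ] && [ -w "$1" ]
}
```

For example, can_create_in /Library/Logs || ls -ld /Library/Logs shows the directory's permissions only when the check fails. Keep in mind that these tests answer for whoever runs them, so under sudo they will generally succeed regardless of the permission bits.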
One quick thing to try is the "Repair Permissions" feature in the GUI application Disk Utility (located in the Utilities folder of the Applications folder). It will restore the permissions on many system files to their Apple-supplied defaults. Review Chapter 8 for details on permissions.
Another problem you are likely to run into sooner or later is when a disk fills up.
If you see an error message that says, "Write Error: No space left on device," you've filled up a disk volume; that is, you've used up all the available storage space. (Note that in Unix documentation the terms volume and partition are often used interchangeably, for example, in the man pages for df and diskutil.)
Although this doesn't happen every day (hopefully!), the consequences can be pretty harsh: Some programs may simply stop working. For example, a mail server cannot save incoming mail if there is no disk space left.
You can quickly see if you are running out of disk space by using the df command (described in "To see a summary of disk usage for the entire system," earlier in this chapter). Figure 11.39 shows an example of a machine with two disks, one of which has two volumes (also called partitions). In the example, volume s9 on disk0 is almost full.
If you see that any of the regular volumes are over 90 percent capacity, it's time to start worrying. By "regular volumes," we mean the ones where the filesystem column in df starts with /dev/disk. Remember that df displays information about various pseudovolumes that always show up at 100 percent capacity, for example, the fdesc (file descriptor) filesystem, which is used to keep track of open files.
On some systems df can even show a volume at more than 100 percent capacity. This is possible because the operating system keeps a small amount of disk space in reserve to reduce the chance of a volume's filling up. If you see that a disk volume is at 101 percent capacity, then you have a problem now.
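The 90 percent check is easy to automate. Here is a minimal sketch that reads df output and reports only the regular volumes at or above a limit you choose. It assumes your df accepts the POSIX -P option (so that the capacity is always the fifth column) and that mount points contain no spaces; the function name full_volumes is our own.

```shell
# full_volumes LIMIT
# Read "df -kP" style output on standard input and print the mount point
# and capacity of each /dev/disk volume at or above LIMIT percent.
# Pseudovolumes such as fdesc are skipped because their first column
# does not start with /dev/disk.
full_volumes() {
    awk -v limit="$1" '
        NR > 1 && $1 ~ /^\/dev\/disk/ {
            pct = $5
            sub(/%/, "", pct)          # strip the percent sign
            if (pct + 0 >= limit) print $6, $5
        }'
}
```

A typical use would be df -kP | full_volumes 90, which prints nothing when every regular volume is below the limit.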
Basic steps to free up space on a disk volume:
On almost all other Unix systems, adding a disk is more complicated, but you can mount new disks on any directory. For example, you could mount a new disk on /Users (though you would still have to copy the old contents onto the new disk).
To clear out users' Trash for them:
1. | Find each user's Trash directory. Each user has a .Trash directory in his or her home directory. If the volume that's filling up is the one that holds users' home directories, go into each user's home directory and delete his or her .Trash directory (it is re-created when the user needs it). You'll need to do the next step for each user. |
2. | sudo rm -rf ~username/.Trash That removes a user's entire .Trash directory. (Using rm -rf ~username/.Trash/* instead will remove the contents only, and will miss dot-files at the top level of the .Trash directory.) If you use the Finder to trash a file that is on a different volume than your home directory, then instead of going into ~/.Trash, that file goes into a different Trash directory: a .Trashes directory at the root level of the directory on which that volume is mounted. Huh, you say? Here is an example: Let's say you have two disks, and one of them has two partitions, so you have a total of three disk volumes. Your df output might look like that shown in Figure 11.39. (Note the use of the -lk options to show only local volumes, with sizes in kilobytes. Note too that df will show only volumes, not the Trash files themselves.) In that case, there are three directories, each called .Trashes: /.Trashes, /Volumes/partition2/.Trashes, and /Volumes/flamepit/.Trashes. Each of these directories has a subdirectory for each user ID that has trashed files from that volume. So if user puffball has uid 502, then there might be /Users/puffball/.Trash, /Volumes/partition2/.Trashes/502, and /Volumes/flamepit/.Trashes/502. You want to remove the .Trashes directory from the critical volume (don't worry, it will be re-created when needed). |
3. | sudo rm -rf /volume-in-trouble/.Trashes For example, to remove the .Trashes directory for the volume mounted on /Volumes/partition2: sudo rm -rf /Volumes/partition2/.Trashes |
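If you are wondering why removing the whole directory is preferred over removing its contents with *, the following sketch shows the difference. It works entirely inside a scratch directory created with mktemp, so nothing in a real Trash is touched; the file names are made up for the demonstration.

```shell
# Build a throwaway stand-in for a Trash directory.
scratch=$(mktemp -d)
mkdir "$scratch/.Trash"
touch "$scratch/.Trash/visible.txt" "$scratch/.Trash/.hidden"

# The * pattern does not match names that begin with a dot,
# so this removes visible.txt but leaves .hidden behind.
rm -rf "$scratch/.Trash"/*
leftover=$(ls -A "$scratch/.Trash")
echo "left behind: $leftover"

# Removing the directory itself takes everything with it.
rm -rf "$scratch/.Trash"
rmdir "$scratch"
```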
Going Over 100 Percent
On some Unix systems, df may show you that a filesystem (or volume) is over 100 percent capacity. On those systems, the "used" and "available" columns in df add up to something less than the column showing the total capacity. On Mac OS X the numbers add up exactly.
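You can check how the columns add up on your own system. This is a small sketch, again assuming df accepts the POSIX -P option and that mount points contain no spaces; the function name reserved_kb is hypothetical.

```shell
# reserved_kb
# Read "df -kP" style output on standard input and print, for each /dev
# volume, the mount point and the number of kilobytes not accounted for
# by the used and available columns (total - (used + available)).
reserved_kb() {
    awk 'NR > 1 && $1 ~ /^\/dev\// { print $6, $2 - ($3 + $4) }'
}
```

Run it as df -kP | reserved_kb. A result of 0 for a volume means the columns add up exactly; a positive number is space the system is holding back.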
If you have one volume that is filling up and another with more space (perhaps you've added a second disk), you can move directories from the full volume to the spacious one and replace the original directory with a symbolic link.
To move directories from one disk to another while keeping the original path:
1. | Use ditto to copy the directory. For example, if you want to move the /Users directory to the volume mounted on /Volumes/partition2, you can use sudo ditto -rsrc /Users /Volumes/partition2/Users |
2. | Rename the old directory so that it is out of the way: mv olddir olddir.save For example: mv /Users /Users.save You'll delete it later after making sure that everything is OK. |
3. | Create a symbolic link where the old directory was, pointing to the new directory. For example: ln -s /Volumes/partition2/Users /Users So anything that accesses /Users still works. |
4. | When everything seems OK, delete the old directory. For example: rm -rf /Users.save |
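The first three steps can be wrapped up in one small function, shown here as a sketch rather than a finished tool. It uses cp -Rp as a portable stand-in for ditto (so it will not preserve resource forks the way ditto -rsrc does), it expects absolute paths, and the name relocate_dir is our own invention. Try it on a scratch directory before trusting it with anything important.

```shell
# relocate_dir OLD NEW
# Copy directory OLD to NEW, keep the original as OLD.save, and leave a
# symbolic link at OLD that points to NEW. Both paths should be absolute.
# NEW must not exist yet; otherwise cp would copy OLD *into* it.
relocate_dir() {
    old=$1
    new=$2
    [ -d "$old" ] || return 1
    [ -e "$new" ] && return 1
    cp -Rp "$old" "$new" || return 1      # step 1: copy (stand-in for ditto)
    mv "$old" "$old.save" || return 1     # step 2: set the original aside
    ln -s "$new" "$old"                   # step 3: old path now leads to new
}
```

The function deliberately stops short of step 4. Deleting the .save copy is left for you to do by hand once you are sure everything works.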
In the unlikely and scary event that your machine won't completely boot up, you may still be able to get things working again, assuming that the machine can at least begin the boot process.
If you are able to boot into single-user mode, then you can attempt to repair file system damage using the fsck (file system check) command.
To watch all the system-startup messages:
In most cases your disks will be using a journaling file system, which makes file system corruption extremely unlikely. You can see a list of your volumes with
diskutil list
and see what kind of file system a volume has with
diskutil info volume
For example:
diskutil info /dev/disk0s2
If the output includes Journaled HFS+, then the volume has a journaled file system and the following tasks will probably have no effect.
This next task is useful mainly if your disk(s) are not formatted with a journaling file system, and should be considered a last resort. Do not attempt it unless you are willing to risk losing data (maybe even all the data on your disk) and you have tried all other available approaches.
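If you have several volumes to check, you can let the shell look for the telltale phrase. This is a minimal sketch: the function name is_journaled is hypothetical, and it simply searches whatever text it is given for the literal string Journaled HFS+.

```shell
# is_journaled
# Read "diskutil info" output on standard input and succeed if it
# mentions a journaled HFS+ file system. The -F option makes grep treat
# the pattern as a fixed string, so the + is matched literally.
is_journaled() {
    grep -qF 'Journaled HFS+'
}
```

For example: diskutil info /dev/disk0s2 | is_journaled && echo "journaled"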
To check and repair the file system with fsck:
1. | Boot into single-user mode. You do this by holding down both the Command and S keys while the machine starts up. If the system isn't too badly messed up, you end up at a prompt like this: localhost# Your next move is to try to check and repair the file system. |
2. | /sbin/fsck -fy This is basically the command-line version of the repair feature in Disk Utility. See the man page for fsck to learn more about that command. When you get back to a prompt, you can try mounting the root volume and booting the machine. First, though, run fsck again to make sure that the repairs were effective. |
3. | /sbin/fsck -fy If you get a message saying that your disk "appears to be OK," then fsck worked. |
4. | reboot Hopefully, the machine starts up and all is well. |
Tip
In /Library/Logs you may find log files that have some record of things that have gone wrong. Look for files with names such as panic.log and a CrashReporter subdirectory. There may also be CrashReporter subdirectories in any users' own Library/Logs directories, that is, /Users/username/Library/Logs/CrashReporter/.
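Rather than poking through each directory by hand, you can have find do the looking. The following is a rough sketch; the function name crash_logs is our own, and it only lists what it finds without reading or changing anything.

```shell
# crash_logs DIR...
# List any panic.log files and CrashReporter directories found under
# each DIR. Errors (such as unreadable directories) are discarded.
crash_logs() {
    find "$@" \( -name panic.log -o -type d -name CrashReporter \) -print 2>/dev/null
}
```

For example: crash_logs /Library/Logs /Users/*/Library/Logs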