The following figure shows a network that has several problems: computers on the network can't access the Internet; the Domain Name System (DNS) server is down; some computers can't see the other computers on the network because they are getting their Internet Protocol (IP) addresses and DNS information from an unsanctioned Dynamic Host Configuration Protocol (DHCP) server; and a Mac OS X computer (in this case, the iMac) has services such as File Transfer Protocol (FTP) and remote login turned on. In this lesson, you'll learn how to troubleshoot these issues.

Establishing a Methodology

The following flowchart, which provides a framework for the network troubleshooting process, is a condensed version of the Apple General Troubleshooting Flowchart.
Gather Information

The first step in this process is to gather information about the issue. You're trying to establish its exact nature by getting as much information as possible. For example, you may find that the symptom the end user reports is what stopped the user from working, yet has nothing to do with the underlying issue. Initial reports may be misleading: "I can't connect to the Internet" tells you very little until you have more information.

To ensure that you have the best possible understanding of the report, ask a mix of open-ended and yes/no questions. Keep in mind that your end user may not understand networking in general and almost certainly does not know your network architecture. The following questions are useful:
You should resist jumping to conclusions or making suggestions based only on the answers to one or two questions. While these suggestions might keep your users at bay for a short time or even temporarily cure the symptom, you still have not identified the cause. When you are gathering information, don't hesitate to request logs or System Profiler reports. You can also log in to the remote computer to view relevant log entries or run System Profiler remotely.

Verify the Issue

The next step is to verify the issue. Ask yourself if you recognize the issue, log in to the remote computer, and try to reproduce it there. Walk your end user through the process and see if you can identify where the issue recurs. Use Apple Knowledge Base documents at www.apple.com/support as a reference.

When you have completed the information-gathering and verification steps, you should have enough information to try a fix. Evaluate the nature of the issue: Is it local to this machine, specific to the network, or specific to a particular server?

Note: Fixing the issue may involve network configuration on servers that you do not control, so you'll want to discuss the issue with other system administrators in your organization.

When you are ready to try a fix, start by isolating as much as possible. Eliminate possible sources. Narrow your scope from general topics ("the network is slow") to specifics ("the network is slow when browsing specific websites using specific machines"). Often the answer will reveal itself without your having to make major changes to the network. In any case, before making any changes, consult with your network architect or a senior system administrator to double-check your reasoning.

Fix the Issue

Once you have established and verified the issue and have a solution in mind, apply the appropriate fix.
Evaluate the fix to see whether it resolves the issue, and pay special attention to ensure that you have not introduced network instability or new issues for other end users. Give yourself a time frame for evaluating the results: In most cases, if the issue goes away for more than 24 hours, it is resolved.

Finally, if you reach the point where you have evaluated several solutions and none of your fixes has worked, reevaluate your reasoning. If you can't find a flaw in your approach, or you can't find a fresh approach, escalate the issue to a senior system administrator or your network architect.

Troubleshooting Network Access

When a computer cannot access other computers on the network, first check the physical connection. Many network problems stem from loose or incorrectly wired cables. To thoroughly check the physical connection between two machines, you may need to check a series of switches for activation lights.

If the physical connection is active and you are working in a DHCP network environment, see whether the computer received a valid IP address and subnet mask from the DHCP server. Also check whether the computer can use Bonjour connections to servers.

You can also use several command-line tools to troubleshoot connectivity, as illustrated in the following figure. Here are detailed descriptions of the command-line tools you can use to troubleshoot connectivity:
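As a first connectivity check, the summary that ping prints can be reduced to a simple verdict. The following is a minimal sketch, not a definitive tool: packet_loss and check_host are hypothetical helper names, and the sketch assumes ping accepts -c to limit the number of echo requests and prints a summary line containing "% packet loss".

```shell
# packet_loss: read ping output on stdin and print the packet-loss
# percentage from the summary line (for example "0.0" or "100").
# Prints nothing if no summary line is present.
packet_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# check_host: hypothetical helper. Send three echo requests to the host
# named in $1 and report a coarse verdict based on the loss percentage.
check_host() {
    loss=$(ping -c 3 "$1" 2>/dev/null | packet_loss)
    case "$loss" in
        "")        echo "$1: no summary (name did not resolve, or ping failed)" ;;
        0|0.0)     echo "$1: reachable" ;;
        100|100.0) echo "$1: no replies" ;;
        *)         echo "$1: intermittent ($loss% packet loss)" ;;
    esac
}
```

For example, running check_host against the router address and then against a host outside the local network helps show whether the problem is local or further upstream. A "no replies" verdict is not conclusive by itself, because the remote end may simply not answer echo requests.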
Note: Some sites do not allow Internet Control Message Protocol (ICMP) traffic. This can hamper the troubleshooting effectiveness of ping and traceroute on those networks.

Troubleshooting DNS and Domain Names

If you determine that you have no problems accessing other computers on the network, but you cannot connect to hosts using their domain names, the error most likely lies in the domain name lookup process. The following figure illustrates that the problem likely lies with the DNS server.

Make sure that the DNS server is properly set in the Network pane of System Preferences. Then use the following commands to figure out the issue:
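One way to confirm which DNS servers the machine is actually configured to use, and to query each of them directly, is sketched below. This is only an illustration under assumptions: it assumes the resolver configuration is available in resolv.conf format at /etc/resolv.conf and that the host utility is installed; nameservers and lookup_via are hypothetical helper names.

```shell
# nameservers: read resolv.conf-format text on stdin and print the
# address from each "nameserver" line, one per line.
nameservers() {
    awk '$1 == "nameserver" { print $2 }'
}

# lookup_via: hypothetical helper. Resolve the name in $1 against each
# configured server in turn, so that a single failing server stands out.
# Assumes /etc/resolv.conf exists and the host utility is available.
lookup_via() {
    for server in $(nameservers < /etc/resolv.conf); do
        echo "== $server"
        host "$1" "$server" || echo "lookup failed via $server"
    done
}
```

Calling lookup_via www.apple.com prints one result per configured server. If every server fails while connections by IP address still work, the DNS configuration or the DNS server itself is the likely culprit.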
Another useful tool to resolve names is lookupd -d (used with the options hostWithName: or hostWithInternetAddress:). Lesson 11, "Planning and Deploying Directory Services," covers lookupd in more detail.

Note: You can also use the Lookup pane of the Network Utility tool to perform domain name lookups.

Troubleshooting Network Services

If you are running services on your computer, and other computers are having difficulty reaching your machine, as shown in the following figure, you should ensure that the services are configured properly. Check the configuration files for each process; in this case, you would first examine the /System/Library/LaunchDaemons directory. Try to connect to these services locally from your own machine, for example with ssh yourusername@127.0.0.1. If the service does not allow you to connect, then there is an issue with the service running locally on your computer.

You can also use tools such as netstat, which let you see network statistics as well as the sockets and ports that are open on your machine. For example, the netstat -an command displays the state of the ports that are currently being used. In the output, entries can contain one of the following keywords:
When looking at the output of the netstat -an command, check for patterns that might indicate an issue. For example, if you notice that port 22 is being used by an unknown system when you know that only systems with certain IP addresses should be using the port, it may be a sign of intrusion into your system. Also, if you notice that port 139 is closed when it's supposed to be open, that explains why Windows machines can't access your computer.

One way to quickly check the status of ports is to use the following command:

    netstat -an | grep LISTEN

Another command that lists processes listening for Internet connections is the following:

    sudo lsof -i | grep LISTEN

The lsof -i command lists all open Internet files; filtering its output for LISTEN leaves only the listening entries. Each entry lists the process that has opened the file and the port on which it's listening.

Another thing to check when troubleshooting access to services is the firewall. Make sure that the firewall is not preventing other machines from connecting to you. You can use sudo ipfw list to see firewall rules, or use the Firewall pane of the Sharing preferences in System Preferences. Use grep to filter the output to show rules based on whether they allow access or deny it:

    sudo ipfw list | grep allow
    sudo ipfw list | grep deny

You should also check service-specific log files and run any service in question in the foreground or in debug mode. Most log files are found in either /var/log/ or /Library/Logs/.
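The port checks described above can be scripted so that an expected port is tested by number instead of being picked out by eye. The following is a minimal sketch under assumptions: listening_ports and port_is_listening are hypothetical helper names, and the sketch assumes netstat -an output in which the local address is the fourth column and listening TCP entries end with the keyword LISTEN.

```shell
# listening_ports: read "netstat -an" output on stdin and print the
# unique local port numbers of entries in the LISTEN state.
listening_ports() {
    awk '$NF == "LISTEN" {
        port = $4
        sub(/.*[.:]/, "", port)   # keep only what follows the last "." or ":"
        print port
    }' | sort -n | uniq
}

# port_is_listening: succeed if the port number given as $1 appears
# among the listening ports found in the netstat output on stdin.
port_is_listening() {
    listening_ports | grep -qx "$1"
}
```

For example, netstat -an | port_is_listening 139 || echo "port 139 is not listening" flags the situation in which Windows machines cannot reach your computer. The sketch reports only whether a port is listening; it does not show which remote systems are connected to it.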