A network user named Alice calls the network help desk and reports that she has been trying to access a particular Web site for several hours and is consistently receiving an error message.
This is a common occurrence for all Internet users, because all Internet resources are prone to occasional and sometimes frequent outages. However, it's also possible that this is an indication of a problem with the caller's computer or with the internal network. Based on the information provided in the scenario, and knowing nothing about Alice's level of expertise, the help desk technician has no way of knowing whether the problem is being caused by user error, a computer configuration problem, a faulty network connection, a malfunction of the router providing the Internet access, or even some issue with the Internet or the specific Web site itself—either of which is beyond the local network's sphere of influence.
The first step for any technical support call is for the help desk staff to begin to document the incident. Help desks for many organizations use software that enables technicians to document calls and store them in a database. Help desk software typically makes it possible to assign a priority to each call; escalate calls to senior technicians, if necessary; list all of the information obtained from the caller; and document the steps taken to solve the problem.
Because the technician has only the most rudimentary information about Alice's problem at this point, it isn't possible for him or her to accurately assign a priority to this call. If the problem turns out to be with the router or the network and a large number of users are affected, it could be very serious indeed, especially if the organization relies on its Internet access for vital business communications. If, for example, the organization is a company that sells products over the Web, and the Web servers are located on site, an Internet connection failure means that the Web site is down and no orders are coming in. In a case like this, the call might be assigned the highest possible priority. If, on the other hand, revenue-producing work can go on without Internet access, the priority of the call would be somewhat lower. If the problem lies in Alice's computer or in her procedures, the priority of the call would be much lower, unless of course Alice is the company president. It might seem as though political considerations should not affect the priority assigned to a technical support call, but they invariably do, so you had better learn to live with it.
Many technical support operations separate their technicians into two or more tiers, depending on their expertise and experience. First-tier technicians typically take help desk calls, and if the problem is determined to be serious or complex enough, the first-tier technician escalates the call to the second tier. In a well-organized technical support team, the circumstances in which calls are escalated are explicitly documented. For example, problems involving user error and individual workstations might remain in the first tier, whereas network outages and problems affecting multiple users might be immediately escalated. Escalation should also occur when a technician in the first tier makes several earnest attempts to resolve the problem and is unable to do so. Of course, the escalation process is also likely to be affected by political concerns, just like the assignment of priorities. The purpose of this hierarchical arrangement is to prevent the organization's more experienced (and presumably more highly paid) technicians from spending their time fielding calls about elementary problems.
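Documented escalation rules of this kind can be written down as a simple decision function. The following Python sketch is only an illustration: the Call record, its field names, and the attempt threshold are assumptions made for the example and are not taken from any particular help desk product.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical record of one help desk call."""
    users_affected: int    # how many users report the same symptom
    network_outage: bool   # True if a shared network component is suspected
    failed_attempts: int   # earnest first-tier attempts that did not fix it


# Assumed threshold; a real team would take this from its written policy.
MAX_FIRST_TIER_ATTEMPTS = 3


def should_escalate(call: Call) -> bool:
    """Return True when the example rules send the call to the second tier."""
    # Outages and multi-user problems are escalated immediately.
    if call.network_outage or call.users_affected > 1:
        return True
    # Otherwise escalate only after several unsuccessful attempts.
    return call.failed_attempts >= MAX_FIRST_TIER_ATTEMPTS
```

A single-user call with no failed attempts stays in the first tier, whereas a suspected outage is escalated at once.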
In this particular scenario, and in most others as well, the next step in the troubleshooting process is for the technician to ask the user about the exact circumstances under which the problem occurred. Until more information is available, it's impossible to assign a priority to the call or determine if it should be escalated.
When asked to describe what she was doing when the error occurred, Alice says that she has been trying to open a Web site in Microsoft Internet Explorer, one that had always worked before, and after a few seconds she received an error message. She tried again several times over the course of an hour, and received the same error message every time. Alice had not written down the error message at the time, but she was able to re-create the error at will by trying again to access the site. The error message was the familiar "This Page Cannot Be Displayed" screen, shown in Figure 19.1, which also says "Cannot Find Server or DNS Error."
Figure 19.1 A common Internet Explorer error message
This error message is a common one that every user of Internet Explorer has likely seen at one time or another. This message can appear for many reasons: because the Web server the browser is trying to contact is down, because the client computer's Internet connection is broken, or because the client's Domain Name System (DNS) server fails to resolve the DNS name in the requested Uniform Resource Locator (URL). Determining the cause of the problem is a matter of isolating the component or components that are malfunctioning, which you do by eliminating all of the properly functioning components until you are left with only the problematic ones.
Difficulty in accessing the Internet is one of the most common problems handled by the help desk in almost any organization with a network that provides routed access to the Internet. For an organization with more than a handful of users, setting up a router that connects to an Internet service provider (ISP), as shown in Figure 19.2, is the easiest and most economical way of providing users with Internet access. The alternative is to equip all users with their own modems, telephone lines, and Internet access accounts, which is not only expensive, but requires the network support staff to install the modem and configure the operating system's dial-out capability with the right parameters on each computer. Depending on the size of the organization and the needs of the users, the router could be a stand-alone unit connected to an ISP using a leased telephone line, such as a T1 line; a computer with a modem that connects to the ISP using a standard dial-up connection and is configured to share that connection with network users; or any one of many solutions falling between these two extremes.
Figure 19.2 Most networks provide users with Internet access by sharing a router's connection to an ISP
There are a number of things that can go wrong with this type of routed Internet access solution, including the following:
Generally speaking, a router problem like this is one of the least likely causes of the problem Alice is experiencing. In addition, if the router were malfunctioning, the help desk would probably be receiving calls from many different users with the same problem. However, router problems are one of the easiest causes to check for, and the potential seriousness of a router problem makes it a high priority for the technical support staff. Therefore, it does no harm for the technician to eliminate the router as a possible source of the problem at the very beginning of the troubleshooting process.
The easiest way for the technician to test the router is to try to access an Internet site using a computer that shares the same routed Internet connection. In Alice's organization, all of the users on the network share a single Internet connection, so the technician simply has to launch his or her own Web browser and connect to an Internet site to determine that the connection and the router are indeed functioning properly. This narrows down the source of the problem to Alice's procedures, her computer, or her computer's connection to the router.
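A technician who prefers a scripted check to launching a browser could attempt a plain TCP connection to a Web server from a computer that shares the same routed connection. The following Python sketch shows one way to do that; the function name can_reach and the default port and timeout are assumptions chosen for the example.

```python
import socket


def can_reach(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A True result for a well-known Internet site, such as can_reach("www.microsoft.com"), suggests that the router and the ISP connection are working. A False result does not by itself say which component failed.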
If the technician's computer also fails to access the Internet, the problem could lie in any one of three areas of the network:
In some cases, network users access Internet Web sites through a proxy server or other device that functions as a "middle man" between the client and the Web server. This introduces another possible source of the problem the user is experiencing. However, if the technician or other users can access the Internet through the same server, you know that it, along with the router and the ISP connection, is functioning properly.
If none of these is the cause of the problem, the difficulty lies in the ISP's network or in the Internet itself. The problem might clear up by itself in a few minutes or hours, but if Internet access is essential to the business, the ISP should be contacted right away. Dealing with the ISP might be the responsibility of a senior technical support representative, so it's likely that the call would be escalated, if this were found to be the problem.
In Alice's case, the technician determines that the router is functioning normally because he can connect to an Internet site using his own browser.
The next step in narrowing down the cause of Alice's problem is to determine exactly what kinds of network communications are affected. This procedure should methodically test the entire data connection from Alice's computer to the Internet and, when a failure occurs, should trace backward, component by component, until the source of the problem is detected.
As a help desk technician, you should begin this process while you are still on the telephone with the user. First, ask the user to try connecting to a different Web site. Using one of the default links supplied with the browser is a good idea because these sites are nearly always in operation, and you minimize the possibility of user error. If you must have the user type in a Web site address, dictate the exact URL to the user, and keep it simple, such as www.microsoft.com. If the browser can connect to other Internet Web sites, you know that the network, the router, and the Internet connection are functioning properly. In this case, the problem can nearly always be traced to either a Web site that is down or user error. If the user's Web browser is unable to connect to any other Internet sites, you should then determine if any other network communications are possible.
Next, ask the user to open a different client application and try to connect to the Internet. The application you select doesn't matter, as long as it connects directly to an Internet site. For example, an e-mail client or a newsreader is a good choice, as long as the user would not be connecting to a mail or news server on the local network. As a last resort, you can always have the user launch the File Transfer Protocol (FTP) client from the command line. Virtually every operating system that supports Transmission Control Protocol/Internet Protocol (TCP/IP) includes an FTP client, but you might have to walk the user carefully through the process of connecting to an FTP server.
If the user cannot use a Web browser to access Internet sites but can connect to the Internet using a different client application, you know that the problem lies in the browser software running on the user's computer. If the user can't connect to the Internet at all using any client application (and other users can), the next step is to determine which part of the computer's Internet access architecture is failing.
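The reasoning in the last few paragraphs amounts to a short chain of eliminations. As an informal summary, it can be sketched in Python as follows; the function and parameter names are invented for this illustration.

```python
def likely_cause(other_site_ok: bool, other_client_ok: bool,
                 other_users_ok: bool) -> str:
    """Map the telephone test results to the most likely problem area."""
    if other_site_ok:
        # The browser reaches other sites, so the access path works.
        return "Web site down or user error"
    if other_client_ok:
        # Another Internet client works, so only the browser is affected.
        return "browser software on the user's computer"
    if other_users_ok:
        # Nothing on this computer reaches the Internet, but others can.
        return "user's computer or its connection to the network"
    # Nobody can reach the Internet through the shared connection.
    return "router, ISP connection, or the Internet itself"
```

Each test is applied only after the previous one has failed, which mirrors the order in which the technician asks the questions.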
One of the most common causes of Internet access problems (and of the error message that Alice received) is the failure of the user's computer to resolve DNS names into the Internet Protocol (IP) addresses that client applications need to communicate with Internet servers. DNS servers are a vital part of any Internet communication that uses a name to refer to an Internet server. IP communications are based solely on IP addresses, not names, so the first thing that a client application does when given a name of a computer, such as www.microsoft.com, is send the name to a DNS server for resolution. When you type the name of a server into your Web browser, part of the brief delay that you experience before the Web page starts loading is the result of the time it takes for the client application to generate a DNS Request message containing the server name, send it to a DNS server, and wait for a reply from the DNS server containing the IP addresses associated with the name. Only then can the client transmit its first Hypertext Transfer Protocol (HTTP) message to the Web server.
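The name resolution step can be observed on its own, separately from any Web traffic. The following Python sketch asks the operating system's resolver for the addresses associated with a name, which is roughly what a client application does before it sends its first HTTP message. The function name resolve is an assumption made for the example.

```python
import socket
from typing import List


def resolve(name: str) -> List[str]:
    """Return the distinct IP addresses the system resolver gives for name."""
    # getaddrinfo uses the DNS servers configured in the TCP/IP client.
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    addresses: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

If the name cannot be resolved, getaddrinfo raises socket.gaierror, which is the programmatic counterpart of the browser's "Cannot Find Server or DNS Error" message.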
The address of the DNS server that a computer uses to resolve names is supplied as part of the system's TCP/IP client configuration. On a computer running Microsoft Windows 2000, for example, the DNS server address is found in the Internet Protocol (TCP/IP) Properties dialog box, shown in Figure 19.3. If the addresses in the Preferred DNS Server and Alternate DNS Server fields in this dialog box do not point to DNS servers that are up and running, the name resolution process will fail when the user attempts to connect to a Web server, resulting in the error message shown earlier.
Figure 19.3 The Windows 2000 Internet Protocol (TCP/IP) Properties dialog box
To configure the DNS server addresses on a computer running Windows 2000, open the Network And Dial-Up Connections window from the Start menu's Settings group, right-click the Local Area Connection icon, and select Properties from the shortcut menu. Highlight the Internet Protocol (TCP/IP) entry in the components list and click Properties to display the Internet Protocol (TCP/IP) Properties dialog box. The other Windows operating systems use a similar arrangement of dialog boxes, although the access procedures are slightly different.
The easiest way to test for a DNS name resolution problem is to use an IP address instead of a server name in the URL you supply to the Web browser. For example, when the user's browser fails to connect to a Web server using its name, but other computers are able to access the Internet, use the Ping program on another computer to resolve the name of the desired server into an IP address, using a command like the following:
ping servername
This command first displays the server's name followed by the server's IP address, then displays the results of the attempt to communicate with that server. When the attempt is successful, the program lists each of the replies received from the server, with information such as the number of data bytes included in the message, the time elapsed between the transmission of the request and the receipt of the reply, and the Time To Live (TTL) value for the transmission. On a computer running Windows 2000, the Ping output appears as follows:
Pinging www.microsoft.com [38.144.95.172] with 32 bytes of data:
Reply from 38.144.95.172: bytes=32 time=320ms TTL=238
Reply from 38.144.95.172: bytes=32 time=280ms TTL=238
Reply from 38.144.95.172: bytes=32 time=381ms TTL=238
Reply from 38.144.95.172: bytes=32 time=280ms TTL=238
Ping statistics for 38.144.95.172:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 280ms, Maximum = 381ms, Average = 315ms
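If you collect Ping output in a file or a help desk record, the resolved address can be picked out of the first line mechanically. The following Python sketch assumes the English-language Windows output format shown above; the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches the header line "Pinging name [address] with 32 bytes of data:".
PING_HEADER = re.compile(r"^Pinging\s+(\S+)\s+\[([0-9.]+)\]", re.MULTILINE)


def resolved_address(ping_output: str) -> Optional[Tuple[str, str]]:
    """Return (name, ip) from the Ping header, or None if no name was resolved."""
    match = PING_HEADER.search(ping_output)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

When Ping is given an IP address rather than a name, the header contains no bracketed address, so the function returns None.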
If the ping command fails to resolve the name (perhaps because one of the network's DNS servers is not available), you can use the nslookup command to send a name resolution request to a particular DNS server, on the local network or the Internet, that you know is operational, as demonstrated in Lesson 2: TCP/IP Utilities, in Chapter 10, "TCP/IP Applications."
Have the user replace the server name in the browser's URL with the IP address you've discovered. If the browser succeeds in connecting to the server using an IP address when using a server name failed, there is definitely a problem with the DNS name resolution process.
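To avoid dictation mistakes over the telephone, the technician could prepare the address-based URL in advance. The following Python sketch substitutes an IP address for the server name in a URL; the function name url_with_ip is an assumption made for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def url_with_ip(url: str, ip: str) -> str:
    """Return url with its host name replaced by ip, keeping any port."""
    parts = urlsplit(url)
    # Preserve an explicit port number if the original URL had one.
    netloc = ip if parts.port is None else f"{ip}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
```

For example, url_with_ip("http://www.microsoft.com/default.htm", "38.144.95.172") yields "http://38.144.95.172/default.htm", which the user can type into the browser in place of the original URL.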
DNS name resolution problems have two major causes: either the computer's TCP/IP client is configured with incorrect DNS server addresses or the DNS servers themselves are not functioning properly. One easy way to check the addresses of the DNS servers on a computer running a Windows operating system is to use the IPCONFIG.EXE program (for Windows 2000 or Microsoft Windows NT) or the WINIPCFG.EXE program (for Microsoft Windows 95, Microsoft Windows 98, and Microsoft Windows Me) to display the TCP/IP configuration. For more information about using these programs, see Lesson 2: TCP/IP Utilities, in Chapter 10, "TCP/IP Applications." If the addresses are incorrect, they must be changed, using the Internet Protocol (TCP/IP) Properties dialog box shown earlier.
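When the output of the ipconfig /all command has been saved as text, the DNS server addresses can be extracted for comparison with the addresses the network is supposed to use. The following Python sketch assumes English-language output in which the first address follows the "DNS Servers" label and any additional addresses appear alone on the following lines. The function name and the sample addresses in the usage note are hypothetical.

```python
import re
from typing import List

# A line consisting of nothing but a dotted-decimal IPv4 address.
IPV4_ONLY = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def dns_servers(ipconfig_output: str) -> List[str]:
    """Return the DNS server addresses listed in ipconfig /all output."""
    servers: List[str] = []
    collecting = False
    for line in ipconfig_output.splitlines():
        if "DNS Servers" in line and ":" in line:
            value = line.split(":", 1)[1].strip()
            if value:
                servers.append(value)
            collecting = True
        elif collecting and IPV4_ONLY.match(line.strip()):
            # Continuation line holding an additional server address.
            servers.append(line.strip())
        else:
            collecting = False
    return servers
```

Given a listing whose DNS Servers entry shows 192.168.2.10 followed by 192.168.2.11 on the next line, the function returns both addresses in that order.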
The user can conceivably perform all of the tests described thus far with instruction from the help desk technician over the telephone. However, modifying the computer's TCP/IP configuration might be a task that the technician should perform in person. Depending on the user's location and computing skills and the organization's technical support policies, the technician might decide to travel to the user's site and personally perform the tests on the computer.
How the DNS server addresses got changed, if the computer was previously functioning properly, might remain a mystery. When users are asked if they've changed anything in their computer's configuration recently, those who have been messing around with settings they don't understand invariably answer "No." However, if your network uses Dynamic Host Configuration Protocol (DHCP) servers to configure its TCP/IP clients automatically, you should definitely check the DHCP server configuration to see if it is supplying incorrect addresses to the network clients. If this is the case, do not manually change the DNS server configuration in the user's computer, but rather correct the DHCP server's configuration instead. After you have done this, you can repair the user's computer by renewing the DHCP lease using the IPCONFIG.EXE or WINIPCFG.EXE program.
If the DNS server addresses in the user's TCP/IP client configuration are correct, the problem might lie in the DNS servers themselves or in the computer's network connection to the DNS servers. The DNS servers that a network uses for Internet name resolution might be supplied by the organization's ISP, or they might be located on site. If the DNS servers belong to the ISP, all you can do is test to see if they are available. If you can contact the DNS servers using the Ping command with an IP address, you know that they are at least up and running. However, this does not necessarily mean that they are capable of processing DNS Request messages. Nonetheless, if you can execute a Ping command using a server name successfully, you've proven that the DNS server can resolve the server's name into its IP address.
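Because a successful Ping by address shows only that the server computer is reachable, a more direct test is to send the server a DNS Request and see whether it answers. The following Python sketch builds a minimal query for an address record using the message layout defined in RFC 1035 and sends it over UDP to port 53. The function names, the query identifier, and the timeout are assumptions made for the example, and the sketch makes no attempt to decode the answer itself.

```python
import socket
import struct

QUERY_ID = 0x1234  # arbitrary identifier that the server echoes back


def build_query(name: str, query_id: int = QUERY_ID) -> bytes:
    """Build a minimal DNS query for the A record of name."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Each label is preceded by its length; a zero byte ends the name.
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    # Question trailer: QTYPE 1 (A record), QCLASS 1 (Internet).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def answers_queries(server_ip: str, name: str, timeout: float = 3.0) -> bool:
    """Return True if server_ip answers the query without an error code."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            return False  # no reply, or the server is unreachable
    if len(reply) < 12:
        return False
    reply_id, flags, _qd, answers, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    # The low four bits of the flags field hold the response code.
    return reply_id == QUERY_ID and (flags & 0x000F) == 0 and answers > 0
```

A False result from answers_queries for a server that responds to Ping suggests that the computer is running but its DNS service is not processing requests.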
If the DNS servers belong to your organization, you can check them more thoroughly. However, this is another area in which the first-tier technician might be obligated to escalate the call to a senior technician. A Ping test can determine that the DNS server computer is up and reachable, but checking the status of the DNS server software itself depends on the operating system and the application software running on the computer. On a Windows 2000 Server computer running Microsoft DNS Server, for example, you can start by opening the Services console from the Start menu's Administrative Tools group and checking to see that the DNS Server service is running, as shown in Figure 19.4.
Figure 19.4 The Windows 2000 Services console
If the service isn't running, you must find out why. The Startup Type field for the DNS Server service should be set to Automatic, indicating that the service loads when the computer starts. If the Startup Type field is set to Manual or Disabled, this is the reason the service isn't running. However, before you manually start the service or change the Startup Type setting to Automatic, check with your colleagues to see if someone has configured it this way for a good reason. If the Startup Type is set to Automatic but the service isn't running, someone manually stopped it, the service failed to start, or the service shut itself down.
Check the computer's Event Viewer (also accessible from the Administrative Tools group) for log entries that might explain why the service isn't running. Failure of the service to start during boot time should generate a log entry indicating the reason. Various types of environmental problems could cause the service to shut down, including a memory shortage or a configuration problem. Troubleshooting issues like these requires knowledge of the operating system and the DNS Server software.
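The checks on the DNS Server service follow a fixed order, which the following Python sketch summarizes. It is only a restatement of the reasoning above; the function name and the wording of the returned advice are invented for the example.

```python
def diagnose_service(running: bool, startup_type: str) -> str:
    """Suggest the next step from the service state and its Startup Type."""
    if running:
        return "service is running; examine the server configuration"
    if startup_type in ("Manual", "Disabled"):
        # Someone may have chosen this setting deliberately.
        return "startup type explains it; ask colleagues before changing it"
    if startup_type == "Automatic":
        # Stopped by hand, failed at boot, or shut itself down.
        return "check the Event Viewer logs for the reason"
    raise ValueError(f"unknown startup type: {startup_type!r}")
```

The three Startup Type values are those shown in the Services console.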
If the DNS Server service is running but names are still not being resolved, it's time to look at the server software and the DNS communications process in more detail. Examining the DNS server's configuration files is a good place to start. For example, if the server's list containing the names and addresses of the DNS root name servers has somehow been modified or erased, this would prevent names from being resolved, despite everything else functioning correctly. The DNS server's own network connection and Internet access are also vital to the name resolution process. The server itself might be functioning properly, but if network conditions prevent it from receiving DNS Request messages from the client or if it can't access the Internet to relay the requests to other DNS servers, the name resolution process stops.
If the DNS server's configuration files show no obvious problems, you might have to go so far as to use a protocol analyzer to determine if the DNS server is communicating with the network and the Internet properly, by examining the network traffic running to and from the DNS server computer. A protocol analyzer is a hardware device or software program that captures network traffic and displays it for study, as described in Lesson 2: Logs and Indicators, in Chapter 18, "Network Troubleshooting Tools."
Using the protocol analyzer, you should be able to see the DNS Request packets arriving at the server, and the server's own DNS Requests being transmitted to other DNS servers on the Internet, as shown in Figure 19.5. Analyzing network traffic in this way requires familiarity with what is known as a baseline. In other words, you have to know what the network traffic pattern is supposed to look like before you can determine what's wrong. By analyzing the traffic traveling to and from the server, you might be able to isolate the problem as being in the server's communications with the local network or in its communications with the Internet.
Figure 19.5 A captured DNS traffic exchange, as displayed in a protocol analyzer
The procedures for diagnosing and repairing DNS name resolution problems described here are also useful in other scenarios. Computers running the Windows operating system, for example, might use the Windows Internet Naming Service (WINS) to resolve NetBIOS names into IP addresses, just as they use DNS servers to resolve DNS names. The same type of client and server configuration problems affecting DNS name resolution can also affect the WINS name resolution process. You can check the addresses of the WINS servers in the client computer's TCP/IP configuration and the functionality of the WINS servers in much the same way you check the equivalent DNS resources.
If the user's problem is not being caused by an Internet communications problem or a DNS name resolution problem, it's time to start examining the computer's general network communication capabilities. The technician begins by having the user try to access resources on the local network. Local network resources can include shared server drives, internal network applications (such as e-mail or database servers), and browsing the network using a tool like Windows Explorer. The best way to start is by having the user try to access nearby resources.
The first test might be for the user to open My Network Places in Windows Explorer and see if computers belonging to other nearby users are visible. The assumption here is that other computers nearby are connected to the same network hub as the user experiencing the problem. If there is an internal network communications difficulty, the object is to narrow down where it might be.
Information about which computers are connected to specific hubs and LANs should be available to the help desk technician, preferably in the form of a map or diagram that shows the cables and connection devices that make up the network. This resource should be developed during the initial planning stages of the network and maintained consistently throughout its life. Relying on someone's memory of the network installation makes the technical support process far more difficult, especially as people leave the company or move on to other jobs. It's also important for the technician to remember that users probably do not have access to this type of network information and wouldn't know what to do with it if they did.
Windows Explorer displays the computers on the network in terms of domains and workgroups, which probably don't correspond to the hubs and LANs that form the network's physical configuration. If the user and the technician are still working together over the telephone at this point, many of the instructions the user is receiving won't make much sense, so it's important for the technician to explain carefully what must be done, without introducing unnecessary technical details. This is another case where the technician might consider traveling to the user's site, if it is at all practical to do so.
Using My Network Places, if the user can't see the other computers connected to the same hub, the problem is likely to be in the user's connection to the hub, in the computer hardware or software, or in the user's procedures. In some cases, testing the computer's connection to the hub can be quite easy. If the computer is connected to the hub using a prefabricated network cable, you can try replacing the cable with one that you know is functioning properly. If the computer is connected to the hub using an internal cable run, begin by switching the network cable plugged into the user's computer with a cable from a nearby computer that is working properly. If the user's computer can now access the network, you know that the problem is somewhere in the original cable run, and you can start trying to determine exactly where the problem is.
Internal cable installations use three lengths of cable per connection: a patch cable connecting the computer to the wall plate, the cable inside the walls or ceilings running from the wall plate to the patch panel, and another patch cable connecting the patch panel port to a hub port. Because the patch cables are exposed, it's easy to test them first by replacing them. For more information about internal cable installations, see Chapter 15, "Installing a Network."
Begin by swapping out the patch cables at both ends of the connection with replacements that you know are working properly. If the patch cables are not the cause of the problem, you can proceed to test the internal cable run. If you have the proper cable testing equipment handy, you can test the cable run that way. A multifunction cable tester, a wire map tester, or even an inexpensive tone generator and locator can tell you if the cable is wired properly and signals are getting through.
If there is a break in the cable, the multifunction tester can also tell you where it is in relation to the end you're testing from. If you don't have cable-testing equipment, you can plug the patch cables at both ends into a different cable run that you know is working properly. Swapping out equipment wherever possible is one of the most basic and most effective troubleshooting techniques.
For more information about cable testing equipment, see Lesson 3: Network Testing and Monitoring Tools, in Chapter 18, "Network Troubleshooting Tools."
Problems with internal cable runs don't usually happen by themselves. Usually they're the result of someone working in the spaces where the cables are located and accidentally damaging one of the cables. In fact, just moving a cable inside a drop ceiling closer to a fluorescent light fixture can be enough to induce communication problems on that connection. Therefore it is strongly recommended that you secure your cables well when installing them, even when they're running through relatively inaccessible areas, such as walls and ceilings.
If the user's computer can see and access other computers connected to the same hub, the next step is to try to access other computers on the same LAN that are connected to different hubs. If the user can access computers attached to the same hub, but can't access the other computers on the LAN connected to different hubs, the problem might be in the connection between the user's hub and the rest of the network. What to check next depends on the physical configuration of the network. If, for example, the user's hub is connected to another hub, that connection might not be functioning properly for several reasons, such as the following:
The same problems can affect a switch.
If the user can access other computers on other segments of the LAN, it's time to test connections to other LANs. This assumes that the organization's network is really an internetwork that consists of multiple LANs connected by routers. Once again, a technician can test the computer's connectivity simply by using Windows Explorer to access computers that are located on other networks. If the user's computer can access resources in all of the LANs that make up the organization's internetwork, the problem is not one of network connectivity, and it's time to look at the computer itself.
If the user's computer can access resources in some LANs but not others, the problem might be in one of the routers that connect the networks together. The difficulty of locating the malfunction depends on how complicated the internetwork configuration is. If the network consists of 30 LANs interconnected by dozens of routers with redundant access paths, finding one malfunctioning router can be a complicated process, one that almost certainly has to be attended to by the technicians at the top of the organization's technical support hierarchy.
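The local network tests described in this section widen in scope one step at a time: the same hub, then other hubs on the same LAN, then other LANs. As an informal summary, the following Python sketch maps the first failed test to the area to investigate; the function and parameter names are assumptions made for the example.

```python
def narrow_fault(same_hub_ok: bool, other_hubs_ok: bool,
                 all_lans_ok: bool) -> str:
    """Return the area to investigate, based on the first test that fails."""
    if not same_hub_ok:
        return "connection to the hub, the computer, or the user's procedures"
    if not other_hubs_ok:
        return "connection between the user's hub and the rest of the LAN"
    if not all_lans_ok:
        return "a router connecting the LANs"
    # Every connectivity test passed.
    return "not network connectivity; examine the computer itself"
```

Because each test presupposes that the narrower one succeeded, only the first failure is significant.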
One method for isolating the router causing the user's problem is to use the Traceroute program to see exactly where the packets generated by the computer are going. Traceroute is a TCP/IP command-line utility that transmits packets to a given destination and displays a list of the routers that the packets pass through on the way to that destination. Most TCP/IP implementations include a version of Traceroute; on computers running the Windows operating system, the program is called TRACERT.EXE. Run Traceroute with the name of the Web server the user is trying to reach. A display similar to the one shown here will indicate exactly how far the packets are going through the local internetwork:
Tracing route to www.abccorp.co.uk [173.146.1.1]
over a maximum of 30 hops:
1 <10 ms 1 ms <10 ms 192.168.6.1
2 1 ms 1 ms <10 ms 192.168.10.1
3 1 ms <10 ms <10 ms 192.168.17.1
When the packets reach a router that is malfunctioning, the program should stop displaying information. In other words, the last router listed in the Traceroute display should be the last properly functioning router in the path to the destination. With knowledge of your network's configuration, you should be able to figure out which router the packets are trying to go to next. This is the router that either isn't receiving the packets or isn't forwarding them properly, causing the user's communication failure.
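If the TRACERT.EXE output has been saved as text, the last responding router can be found mechanically. The following Python sketch assumes the English-language Windows format shown earlier, in which each hop line holds a hop number, three round-trip times (or asterisks for probes that received no reply), and the router's address. The function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Hop number, three fields that are each a time in ms or "*", then the rest.
HOP_LINE = re.compile(r"^\s*(\d+)\s+((?:(?:<?\d+\s+ms|\*)\s+){3})(.*)$")


def last_responding_hop(tracert_output: str) -> Optional[Tuple[int, str]]:
    """Return (hop number, address) of the last hop that sent any reply."""
    last = None
    for line in tracert_output.splitlines():
        match = HOP_LINE.match(line)
        if match is None:
            continue  # header lines and blank lines
        hop, times, target = match.groups()
        # A hop counts as responding if at least one probe was timed.
        if re.search(r"\d+\s+ms", times):
            last = (int(hop), target.strip())
    return last
```

Applied to the three hops in the display shown earlier, the function returns hop 3 and the address 192.168.17.1, so the router that should come next in the path is the one to examine.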
Suppose, for example, that your network consists of a number of LANs containing user computers, all of which are connected to a single backbone LAN, as shown in Figure 19.6.
Figure 19.6 Routers provide communications between LANs; a router failure can be inconvenient or catastrophic
One of the user LANs also contains the router that connects the network to the Internet. Any of the following scenarios could cause the problem that Alice is experiencing. All of these scenarios are likely to cause more than one call to the help desk, with the last one probably causing a flood of complaints:
Sometimes router failure is a less likely cause of communication problems because of the configuration of the internetwork. The internetwork in this example has only one path between each pair of LANs. To guard against the outages caused by router failures, many internetworks are designed with redundant routers and backbones, in which case there would have to be two major failures at the same time to cause any of the three preceding problem scenarios.
For a single user help call like Alice's, a diagnosis of router failure is comparatively rare. It's far more likely for a problem like Alice's to be caused by a procedural error, a configuration error in her computer, or possibly a minor network problem. A router failure would probably result in a more general network failure that would cause a large number of simultaneous complaints, which would immediately be brought to the attention of the network's senior support staff, and not left to the help desk. When the network administrators are aware of the problem, the role of the first-tier technician is to inform users that they know of the problem and that a fix is forthcoming. There is no need to troubleshoot each call when they all have the same cause.
If the user's computer can't access the network in any way, and troubleshooting has determined that neither the network nor the computer's cable connecting it to the network is at fault, it's time to look at the computer itself. Although it might seem that it has been a long journey to this point, a problem that prevents any network access would eliminate the hub and router troubleshooting processes described in the previous sections. The technician might even proceed to this point as soon as he or she determines that no network communication is possible.
Unless the user is familiar with the configuration interface of the operating system, it's generally preferable for the technician to troubleshoot the computer in person. This eliminates the difficulties that can arise from giving instructions over the telephone.
If the user's problem is determined to be in the computer, the difficulty can exist at almost any level, and it's a good idea to use the Open Systems Interconnection (OSI) reference model to list the various possible causes, as explained in the following sections.
If it has been determined that the cable used to connect the computer to the network is functioning properly, the problem could be in the computer's network interface adapter itself. One common cause of communication problems is the network interface card (NIC) being loose in its bus slot. If the card is not installed firmly into the slot and secured in place with a screw or other device, a tug on the network cable can loosen the card and break the connection between the NIC and the computer. If the NIC is completely disconnected, most operating systems report that the device is not functioning. The Device Manager application in most versions of the Windows operating system reports whether a device is functioning properly, as shown in Figure 19.7. However, if the NIC is only slightly loosened and not pulled completely out of the slot, the problem could be intermittent and especially difficult to detect.
Figure 19.7 The Windows 2000 Device Manager displays information about the network interface adapter and other hardware devices
The network interface adapter could also be physically damaged by a power surge, static electricity, or a manufacturing defect. If the adapter's cable connector is damaged, the contacts in the cable plug might not connect properly to the contacts in the adapter's jack. Cases like this are difficult to detect, except by ruling out all other possible causes of the problem. The solution is nearly always to replace the network interface adapter, but technicians rarely do this until they have checked the configuration of the computer's networking software. If the network interface adapter comes with a diagnostic program, however, and you have a loopback connector available, you can test the adapter without having to open up the computer.
For more information about network adapter loopback testing, see Lesson 3: Network Testing and Monitoring Tools, in Chapter 18, "Network Troubleshooting Tools."
Apart from the network interface adapter itself, the network interface adapter device driver implements the data-link layer protocol in the computer. The driver must be configured with the same hardware settings as the network interface adapter so that the two can communicate. Incorrect configuration settings are a common reason a computer cannot communicate with the network, but this generally does not occur in a computer that has been functioning properly unless someone manually changes the configuration settings or a device installation affects them.
When something used to work but now doesn't work, the technician should ask the user what has changed on the computer. Has the user installed any new hardware or software? Has the user changed any configuration settings? The answer from the user is usually "No," however, even when it becomes increasingly obvious that something has changed.
In most cases, the hardware settings of both the network interface adapter and the network interface adapter driver are configurable. You generally configure the adapter driver using an interface provided by the operating system, like that shown in Figure 19.8.
Figure 19.8 The Properties dialog box for a network interface adapter driver
To manually configure the adapter, you typically have to use a special utility that the manufacturer supplies. Today, most network interface adapters are installed using Plug and Play, which automatically configures both the adapter and its driver to use the same settings. The settings chosen are based on an evaluation of the hardware requirements for all of the devices in the computer, so installing a new piece of hardware into the computer can cause Plug and Play to alter the settings of existing devices. It isn't common, but it is possible for Plug and Play to select hardware settings that cause either the adapter or its driver to malfunction. If you determine that some new hardware device has been installed, you might have to disable it or remove it to determine if it is the cause of the network adapter's configuration problem. If this is the case, you might have to manually configure the new device to use it in the computer.
If the configuration of the adapter or driver parameters has been manually changed (presumably accidentally), the best course of action is to delete the device from the system configuration (again using Device Manager in Windows 2000), restart the computer, and let Plug and Play detect the adapter and reinstall it, reconfiguring both the adapter and the driver in the process.
Although they span other layers as well, the primary functions of the TCP/IP protocols are at the network and transport layers, and the TCP/IP client configuration is one of the chief causes of network communication problems. As mentioned earlier, improperly configured DNS server addresses can prevent the computer from resolving server names into addresses and, as a result, prevent the user from accessing the Internet. WINS servers perform the same type of name resolution process for NetBIOS names, and incorrect WINS server addresses can prevent the computer from accessing some of the other computers on the network. A computer running the Windows operating system that is not configured with WINS server addresses can still resolve the names of other computers on its own LAN using broadcast messages. However, broadcasts cannot reach the computers on other LANs, so WINS is needed to resolve these names.
WINS support is included in Windows 2000 only to enable the computer to communicate with other computers that use NetBIOS names, such as Windows NT and Windows 98 systems. Windows 2000's own directory service, Active Directory, relies on DNS servers to resolve names.
Incorrect DNS and WINS server addresses can prevent a computer from accessing other computers by name, but other TCP/IP configuration parameters can have an even greater effect on network communications. An incorrect IP address or subnet mask can completely prevent all network communications, and, even worse, an IP address duplicated on a second computer can prevent both computers from accessing the network. Therefore, an interruption can occur if the IP address on the user's computer has been changed or if a computer somewhere else on the network has been configured to use the same IP address as the user's computer.
To test for a duplicate IP address, shut down the user's computer and ping that computer's IP address using another workstation. If you receive a response to the Ping command, there is another computer using that same IP address.
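The duplicate address test can also be run from a script on the second workstation. The following Python sketch is a minimal illustration, not a finished tool: the function names and the address 192.168.6.25 are invented for this example, and the sketch assumes that the local ping program signals a reply with an exit status of zero. Some Ping implementations also exit with zero after an "unreachable" message from a router, and a firewall on the other computer can suppress replies entirely, so read the Ping output before drawing a conclusion.

```python
import platform
import subprocess


def ping_command(address, count=1, system=None):
    """Build a ping command line for the current (or a named) platform."""
    system = system or platform.system()
    # Windows ping takes -n for the number of echo requests;
    # most other implementations take -c.
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), address]


def address_answers(address, run=subprocess.run):
    """Return True if ping exits with status 0 for the address."""
    result = run(ping_command(address), capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Run this only after the user's computer has been shut down.
    # 192.168.6.25 is a hypothetical address for that computer.
    if address_answers("192.168.6.25"):
        print("Something else is answering at this address.")
    else:
        print("No reply; a duplicate address is unlikely.")
```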
An incorrect or missing default gateway parameter can also be the cause of the user's problem. As with the router failures described earlier, a computer that is not configured with a correct default gateway address can access the other computers on its own LAN, but not any of the other LANs on the internetwork. Without a default gateway address, the computer does not know where to send packets that are destined for other networks. This would prevent the user's Web browser from connecting to any sites on the Internet. In Windows 2000, to modify any of the TCP/IP configuration parameters listed here, use the Internet Protocol (TCP/IP) Properties dialog box as described earlier in this chapter.
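A quick way to sanity-check these three parameters together is to confirm that the default gateway lies on the same subnet as the computer's own address. The following Python sketch uses the standard ipaddress module to do that. The function name check_ip_config and the address 192.168.6.25 are assumptions made for this illustration; 192.168.6.1 is borrowed from the first hop in the Traceroute display shown earlier.

```python
import ipaddress


def check_ip_config(address, netmask, gateway):
    """Return a list of problems found in a manual IPv4 configuration."""
    try:
        # ip_interface accepts "address/mask" with a dotted-decimal mask.
        interface = ipaddress.ip_interface(f"{address}/{netmask}")
    except ValueError as err:
        return [f"invalid IP address or subnet mask: {err}"]

    if not gateway:
        return ["no default gateway: only the local LAN is reachable"]

    try:
        gateway_ip = ipaddress.ip_address(gateway)
    except ValueError as err:
        return [f"invalid default gateway: {err}"]

    problems = []
    if gateway_ip not in interface.network:
        problems.append(
            f"gateway {gateway_ip} is not on the local subnet {interface.network}"
        )
    if gateway_ip == interface.ip:
        problems.append("gateway is the same as the computer's own address")
    return problems


print(check_ip_config("192.168.6.25", "255.255.255.0", "192.168.6.1"))
print(check_ip_config("192.168.6.25", "255.255.255.0", "192.168.10.1"))
```

An empty list means only that the values are consistent with one another. It does not prove that they are the values the network administrator intended.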
If the network has DHCP servers that configure the network's TCP/IP clients, none of the fields in the Internet Protocol (TCP/IP) Properties dialog box should have values in them. Manually configured TCP/IP parameters take precedence over those supplied by DHCP. If someone has been "experimenting" by supplying their own TCP/IP values, remove them before reactivating the DHCP client.
It's also important for the technician to know what allocation mode the DHCP servers are using. If they're using automatic allocation, which assigns the IP address to clients permanently, moving the computer to a different subnet requires that you manually release the assigned IP address and renew it so that the DHCP server can assign one from the proper subnet. This is another way for the computer to have an incorrect IP address. If you move computers around on the network frequently, consider using dynamic allocation, which leases addresses to computers for a short period of time and renews them each time the computer starts.
Application layer networking protocols are generally not configurable, but there can be problems at the application level that affect network communications. One issue that is best to get out of the way early is the possibility of a virus infection. It isn't likely that a virus could be the cause of the user's failure to access a Web site, but new viruses that can have unpredictable effects on a computer are constantly being invented. If you do not already have antivirus software installed on the computer, you should install it, make sure the virus signatures are updated, and run a complete system scan, just to be safe.
Although it doesn't affect Internet access directly, having the incorrect network client installed on a computer can also cause network communication problems. For computers running the Windows operating system, the Client for Microsoft Networks module provides the redirector that enables the computer to send resource access requests to other computers running the Windows operating system. If this component is removed, there is a break in the protocol stack and network communication ceases.
Applications themselves can be damaged or improperly configured as well, interfering with network communications. If, for example, Alice were to modify the configuration of her browser, causing it to access the Internet by dialing out to an ISP instead of using the LAN, she would be unable to access any Web sites if a modem was not installed or a dial-up account was not properly configured. This problem would be specific to the browser, however, and would be caught when the technician had Alice try to use another application to access the Internet.
Errors in user procedures are one of the most common causes of help desk calls, and listing this possible cause last does not imply that you should go through all of the testing procedures described thus far before addressing the possibility of user error. In fact, it is often possible to quickly determine that the user's equipment and the network are functioning properly, and that the problem must be in something the user did. However, in the interests of diplomacy, it's often a good idea to be certain that a procedural error is the problem before you broach the subject with the user. Some people are perfectly willing to admit that they might be at fault, whereas others can be very sensitive about it. Part of the help desk technician's job is to resolve callers' problems without making them feel foolish, a skill that is becoming increasingly rare in the technical support industry.
User error can easily be the reason for a failure to access a Web site, and it can sometimes be difficult for the technician to detect when working with the user over the telephone. Many common Internet access problems are caused by the incorrect entry of URLs into the browser. For this reason, when a technician is having the user test the system by trying to access other sites, it is best to use existing bookmarks or favorites whenever possible. It might seem as though the user is experiencing a severe Internet connectivity problem, and the technician might be compelled to perform all sorts of network and hardware tests like those described earlier, when the problem is actually that the user is typing URLs with backslashes instead of forward slashes, or is inserting three forward slashes after the http: prefix instead of two.
This latter error is, in fact, what was causing Alice's problem. She had somehow gotten the impression that three forward slashes were correct, and was using them even when the technician was dictating the URLs of other sites she should try over the telephone to test her Internet connectivity. He started his dictation with www, knowing that typing the http:// prefix isn't necessary in most cases, but Alice added it to each URL on her own, assuming that it had to be there, but with three forward slashes instead of two. Thus, this particular problem could have been solved almost immediately if the technician had gone to Alice's location and watched her type in the URLs. This is not to say that every call to the help desk should be immediately followed by a trip to the user's location. In many cases, that would be impractical, but this particular case demonstrates how important the communication between the technician and the user can be.
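The two URL entry mistakes described above are easy to check for mechanically. The following Python sketch is only an illustration of the idea. The function name url_entry_problems is invented for this example, and the check covers only backslashes and the number of forward slashes after an http: or https: prefix, not every way a URL can be mistyped.

```python
import re

# A scheme is a letter followed by letters, digits, "+", "." or "-",
# then a colon. The second group captures the slashes that follow it.
SCHEME_PREFIX = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):(/*)")


def url_entry_problems(text):
    """Return a list of the entry mistakes found in a typed URL."""
    problems = []
    if "\\" in text:
        problems.append("contains backslashes; URLs use forward slashes")
    match = SCHEME_PREFIX.match(text)
    if match and match.group(1).lower() in ("http", "https"):
        slash_count = len(match.group(2))
        if slash_count != 2:
            problems.append(
                f"{slash_count} forward slashes after the prefix; expected 2"
            )
    return problems


print(url_entry_problems("http:///www.abccorp.co.uk"))
print(url_entry_problems("http://www.abccorp.co.uk"))
```

A URL typed without any prefix, as the technician dictated it, produces no complaints from this check.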
There are many other common procedural errors that can interfere with a user's network connectivity, and many of these can be very difficult to catch over the telephone. Sometimes there is no substitute for actually watching what the user is doing. User logons, for example, are a common source of difficulties. Users often call the help desk because they are unable to log on to the network. If they have been trying to log on repeatedly and are failing every time, the technician should first check to see if the user has been locked out of the account. Many networks are configured to disable accounts after a certain number of failed logon attempts, in an effort to prevent brute-force attacks by intruders. If the account is not locked, password policies might also be to blame. Users might ignore a message telling them that a periodic password change is required or attempt to reuse an old password when policy prohibits it. Another common mistake among Windows 2000 and Windows NT users is trying to log on to the wrong domain, or to the local system using the wrong account. The domain selector in the logon dialog box might have been changed somehow, which is something that a technician is not likely to realize without actually watching the user try to log on.
On an internetwork consisting of several user segments connected by a backbone, with an Internet router connected directly to the backbone, specify whether the following network conditions would normally cause Internet access problems for one user only, for all of the users connected to one hub, for all of the users on one LAN, or for the entire internetwork.