Troubleshooting DNS, DHCP, and Active Directory Issues
Because of the interdependence of Active Directory and DNS, many DNS issues may actually seem to be Active Directory problems at first glance. The remainder of this chapter provides insight into some commonly experienced problems and troubleshooting tips to help resolve issues quickly.
Troubleshooting Internal DNS Lookup Issues
By far the most common internal network problems occur when the server or a workstation does not have DNS configured correctly. The next two examples identify the behavior seen and describe why the problem occurs.
Server Hangs at Applying Network Settings
With few exceptions, when an SBS server takes a long time to boot, and specifically appears to hang for 20 minutes or longer at the Preparing Network Connections portion of the boot process, the network cards on the server are pointing to an external server for DNS. The output of an ipconfig /all command on a server experiencing this problem might look like this:
In this instance, when the server is in the Preparing Network Connections stage, it is attempting to register all its Active Directory information with the DNS server(s) listed on the NIC. Needless to say, the DNS servers hosted by the ISP are not going to accept DNS registrations from just any server out on the Net, and they certainly do not recognize the nonroutable DNS suffix. When the attempt is made to register the DNS information with the remote server, the remote server will not respond, and the attempt will eventually time out. Unfortunately, the Active Directory startup routines are persistent and will keep trying to register this information with the DNS server until they finally give up. This process can take 20 minutes or longer, depending on how many external DNS servers are listed for each NIC.
This behavior also occurs when the IP address of the internal NIC is listed as the DNS server for both NICs, but one of the NICs also has a secondary DNS server listed with an external address. The Active Directory DNS registration process is successful for the internal DNS server, but the server will attempt to register Active Directory information with each DNS server listed in the NIC configuration. The only way to avoid this situation is to have the internal IP address of the SBS server listed as the only DNS server for each network card in the server.
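The rule above (every NIC lists only the server's internal IP address for DNS) can be checked mechanically once the DNS entries have been copied out of the ipconfig /all output. The following is a minimal sketch, not a tool from the SBS product: the function name, the NIC labels, and the 203.0.113.53 address (a documentation-range stand-in for an ISP DNS server) are all assumptions made for illustration.

```python
import ipaddress


def misconfigured_dns_entries(nic_dns, internal_ip):
    """Return (nic, dns_server) pairs that are not the server's internal IP.

    nic_dns maps a NIC label to the DNS servers configured on that NIC,
    as transcribed by hand from ipconfig /all (hypothetical input format).
    """
    internal = ipaddress.ip_address(internal_ip)
    problems = []
    for nic, servers in nic_dns.items():
        for server in servers:
            if ipaddress.ip_address(server) != internal:
                problems.append((nic, server))
    return problems


# Hypothetical dual-NIC server: the external NIC carries an extra ISP DNS entry.
nics = {
    "Internal": ["192.168.16.2"],
    "External": ["192.168.16.2", "203.0.113.53"],
}
print(misconfigured_dns_entries(nics, "192.168.16.2"))
```

Any pair this returns is a DNS entry that the startup registration would also try to contact, so an empty list is the desired result.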
Connect Computer Wizard Fails to Find Users and Computers
Another common error occurs when a workstation is attempting to join the SBS domain using the Connect Computer Wizard. After the wizard is started from the web browser, an error is generated that says "The list of users and computers could not be found on the server. Make sure that the Small Business Server network adapters are configured correctly." The error occurs because the client workstation is not configured correctly, not because the server is misconfigured as the error implies. Microsoft has published KB article 837369 (http://support.microsoft.com/?id=837369) on this error. The KB article also indicates that the problem is the result of the client workstation having a DNS server entry that is not the SBS server's internal IP address. Again, this problem is resolved by modifying the network settings on the workstation so that it points to the SBS server as the only server for DNS.
In SBS 2003 SP1, a different error is generated in this situation (see Figure 5.8). Instead of the cryptic error described previously, the error details the exact problem and gives steps to resolve the problem.
Figure 5.8. The Connect Computer Wizard describes the most common reason for not being able to complete.
Using nslookup to Search for Internal DNS Names
Sometimes you may run across a situation where internal DNS name resolution just doesn't work correctly but with no obvious cause. Perhaps users start reporting that when they open their web browser, the Companyweb page fails to load and generates a Page Cannot Be Displayed error. Or they are suddenly unable to open a share on another server or workstation in the local network. If the problem seems to be isolated to a single machine or a small group of computers, it is unlikely that a problem exists on the SBS server, so troubleshooting should start at the workstation.
The best tool for troubleshooting client DNS problems is the command-line tool nslookup. This tool is installed by default on every Windows 2000, Windows XP, and Windows Server 2003 system. For this type of troubleshooting, we will use nslookup in interactive mode, which is entered by typing nslookup at a command prompt or after choosing Start, Run. In interactive mode, you are presented with a > prompt where you can enter multiple lookup commands. Type exit at the > prompt to leave interactive mode.
To test the DNS lookup of a local system, enter the system name at the nslookup prompt and press Enter. If you enter the name of a local workstation (jimdough01 in this example), you see a result similar to the following listing.
C:\>nslookup
Default Server: sbs.smallbizco.local
Address: 192.168.16.2
> jimdough01
Server: sbs.smallbizco.local
Address: 192.168.16.2
Name: jimdough01.SmallBizCo.local
Address: 192.168.16.25
When nslookup first enters interactive mode, it displays the name and IP address of the default DNS server being used by the client. In the preceding example, the workstation is pointing to the local SBS server, which is the correct configuration. If you see a different server listed in the initial nslookup output, you know that the default DNS server for the workstation is not set correctly, which is the likely cause of the problem. In this example, the workstation can look up the name of the workstation jimdough01 and get an IP address for the workstation.
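If many workstations need checking, the first two lines of nslookup output can be compared against the expected server automatically. The sketch below assumes the opening text has already been captured as a string; the function name and the expected address variable are illustrative, and the sample banner is the one shown in the listing above.

```python
import re


def default_dns_server(banner):
    """Extract (name, address) of the default server from nslookup's opening lines."""
    name = re.search(r"Default Server:\s*(\S+)", banner)
    addr = re.search(r"Address:\s*(\S+)", banner)
    return (name.group(1) if name else None, addr.group(1) if addr else None)


# Opening lines as shown in the example session.
banner = "Default Server:  sbs.smallbizco.local\nAddress:  192.168.16.2\n"
name, addr = default_dns_server(banner)

expected = "192.168.16.2"  # internal IP address of the SBS server in this chapter
if addr == expected:
    print("workstation is using the SBS server for DNS")
else:
    print(f"workstation is using {addr} for DNS, expected {expected}")
```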
If the reverse DNS lookup zones are not properly configured, the initial response from nslookup generates the following output:
*** Can't find server name for address 192.168.16.2: Non-existent domain
Default Server: UnKnown
Address: 192.168.16.2
This response does not mean that DNS lookups through the server will fail. It means only that nslookup could not resolve the server's own address back to a name. You can resolve this issue by creating the reverse lookup zone for the internal network and adding a pointer (PTR) record for the SBS server in the zone.
When nslookup queries the name of a system that is not in the DNS table of the SBS server, it generates a response similar to the following listing. In this case, you would want to check the DNS entries in the forward lookup zone for the smallbizco.local domain and see whether there is an entry present for companyweb. In this example, there is not:
> companyweb
Server: sbs.smallbizco.local
Address: 192.168.16.2
*** sbs.smallbizco.local can't find companyweb: Non-existent domain
In the following example, nslookup returns an address for companyweb from the server:
> companyweb
Server: sbs.smallbizco.local
Address: 192.168.16.2
Name: companyweb.SmallBizCo.local
Address: 192.168.16.8
However, the address returned is not the same as the address of the SBS server. In this case, we know that the failure to load companyweb in the workstation's Internet browser is because the DNS record is pointing to the wrong address.
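The two failure cases just described (no record at all, or a record pointing at the wrong address) can be told apart by reading the reply text. The following sketch assumes a single nslookup reply has been captured as a string; the function name and return strings are hypothetical, and the parsing relies only on the Name and Address labels visible in the listings above.

```python
import re


def check_lookup(reply, expected_ip):
    """Classify one nslookup reply against the address the name should resolve to."""
    if "Non-existent domain" in reply:
        return "missing record"
    # The answer follows the "Name:" line; the earlier Address line is the DNS server itself.
    match = re.search(r"Name:\s*\S+\s+Address(?:es)?:\s*(\S+)", reply)
    if match is None:
        return "no answer"
    found = match.group(1)
    return "ok" if found == expected_ip else f"wrong address {found}"


reply = (
    "Server:  sbs.smallbizco.local\n"
    "Address:  192.168.16.2\n"
    "Name:    companyweb.SmallBizCo.local\n"
    "Address:  192.168.16.8\n"
)
print(check_lookup(reply, "192.168.16.2"))
```

A "missing record" result points to adding the entry in the forward lookup zone, and a "wrong address" result points to correcting the existing entry.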
As you can see from these few examples, nslookup can provide quite a bit of information on the local network with just a few commands. The next section takes a deeper look at how to use nslookup to troubleshoot DNS problems on the external network.
Troubleshooting External DNS Lookup Issues
System administrators learn quickly when there is a problem with the organization's Internet connection. Users tend to be quick to complain when they cannot get to a certain website, but often that call for help is phrased as "The Internet is down!" instead of "I am having trouble reaching this one site in particular even though other sites are working fine." So the first step in troubleshooting DNS problems on the Internet is asking a few pointed questions to determine the scope of the problem.
Certain Sites Have Intermittent Connection Problems
Intermittent problems are often the most difficult to diagnose because they do not always fail or do not fail in the same way every time they are encountered. When users complain that certain sites sometimes work and sometimes do not, but the problem is limited to a particular set of sites whereas others work with no difficulty, you will first want to take a look at EDNS as the source of the problem.
EDNS (Extension Mechanisms for DNS, often referred to as Extended DNS) is an enhanced DNS query process that is enabled by default in Windows Server 2003. The EDNS specification allows DNS responses sent over UDP to exceed the 512-byte limit of standard DNS, and these larger responses can cause problems in some network configurations, for example when a firewall or router drops the oversized packets.
To turn off EDNS on the server and clear the DNS cache, enter the following two commands in a command prompt on the SBS server:
dnscmd /Config /EnableEdnsProbes 0
ipconfig /flushdns
The first command tells the server to send standard DNS queries instead of the extended DNS queries. The second command flushes the local DNS lookup cache on the SBS server and forces new lookups on all DNS requests. If EDNS was the cause, this change should resolve the intermittent connection failures to specific websites.
Connections to All External Sites Fail Periodically
The other side of the Internet connectivity coin comes when all access to external sites fails intermittently. If the actual Internet connection itself is good, meaning that you can access the SBS server from the Internet or you can access certain sites by IP address, the next step is to take a look at the DNS server on the SBS server. Again, you can use the nslookup tool to help with the troubleshooting.
The following is a sample nslookup session that attempts to determine whether there is a problem with the SBS DNS server:
C:\>nslookup
Default Server: sbs.smallbizco.local
Address: 192.168.16.2
> www.google.com
Server: sbs.smallbizco.local
Address: 192.168.16.2
DNS request timed out.
timeout was 2 seconds.
*** Request to sbs.smallbizco.local timed-out
> www.sams.com
Server: sbs.smallbizco.local
Address: 192.168.16.2
DNS request timed out.
timeout was 2 seconds.
*** Request to sbs.smallbizco.local timed-out
> companyweb
Server: sbs.smallbizco.local
Address: 192.168.16.2
Name: sbs.Smallbizco.local
Address: 192.168.16.2
Aliases: companyweb.Smallbizco.local
After starting nslookup, two queries to well-known websites fail. The error, *** Request to sbs.smallbizco.local timed-out, seems to indicate a problem with the DNS Server service on the SBS box. However, a third query for a local name, Companyweb, succeeds, which indicates that the server is working correctly. The next step would be to use nslookup to do a direct query against the DNS server or servers listed in the Forwarders section of the DNS Management Console. The following listing shows an example transcript:
> server 18.104.22.168
DNS request timed out.
timeout was 2 seconds.
Default Server: [18.104.22.168]
Address: 18.104.22.168
> www.google.com
Server: [18.104.22.168]
Address: 18.104.22.168
DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 2 seconds.
*** Request to [18.104.22.168] timed-out
The first command listed in the example is used to change the DNS server that nslookup will use. In this case, we see a timeout when attempting to change the DNS server. Although this initial response is not unusual when changing DNS servers, it could be an indication that there is a problem with the remote DNS server. The second command is a lookup attempt against a well-known web address. In this case, we get two timeout responses from the request. This is a solid indication that there is a problem with the remote DNS server.
One way to confirm that the DNS servers listed as forwarders are having problems is to do a lookup against a different DNS server. When ISPs provide DNS server information for network connections, they usually provide the addresses of two servers so that the second is available as a backup if the first goes down. Not all ISPs keep their DNS servers on different network segments, however, so a segment failure that cuts off one server is likely to cut off the second as well. To that end, many consultants and IT professionals keep a list of alternate DNS servers available for use and testing. They may use DNS servers from other ISPs they have used in the past, or they may use well-known DNS servers.
The next step in this troubleshooting process is to test DNS resolution against the secondary server provided by the ISP. The same steps shown in the previous listing will verify whether the secondary server is working. If that test fails as well, try the DNS servers at 18.104.22.168 and 22.214.171.124, two well-known public DNS servers. Those servers generally respond to DNS lookup requests unless network routing problems prevent the client site from reaching the servers on the Net. Because these servers do respond to ping requests, a simple ping 18.104.22.168 or ping 22.214.171.124 will determine whether the servers are reachable. If these servers respond to DNS queries, the lookup problems exist with the servers specified as forwarders.
As mentioned earlier in the chapter, the DNS server on SBS does not have to have DNS forwarders specified. If no DNS servers are listed as forwarders, the DNS server uses the root hint servers for lookups. This would be one additional test that could be used to determine whether there is a problem with the forwarders. If the SBS DNS server begins processing lookups correctly when the forwarders are removed, the problem lies with the forwarders.
To resolve a problem with forwarders failing to respond to DNS queries, either remove the forwarders from the DNS Management Console and use the root hints, or configure alternate DNS forwarders in the console. The changes take effect immediately so there is no need to restart the DNS Server service, but it may be necessary to run an ipconfig /flushdns command after making the changes to clear out any bad DNS lookups from the local cache.
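The sequence of tests in this section amounts to a small decision procedure. The sketch below restates it under stated assumptions: the three inputs are the pass/fail results of the nslookup tests described above, recorded by hand, and the function name and wording of the conclusions are illustrative rather than anything the tools report.

```python
def diagnose_external_dns(internal_name_ok, forwarder_ok, alternate_ok):
    """Map three manual nslookup results to the likely cause.

    internal_name_ok: a local name (such as companyweb) resolved through the SBS server
    forwarder_ok:     a direct query against a configured forwarder succeeded
    alternate_ok:     a direct query against a different external DNS server succeeded
    """
    if not internal_name_ok:
        return "check the DNS Server service on the SBS server"
    if forwarder_ok:
        return "forwarders respond; these tests do not identify the cause"
    if alternate_ok:
        return "forwarders are failing; configure alternates or use the root hints"
    return "no external DNS server reachable; suspect routing or the Internet connection"


# The situation in the example transcripts: local lookups work, the forwarder times out.
print(diagnose_external_dns(internal_name_ok=True, forwarder_ok=False, alternate_ok=True))
```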
Troubleshooting DHCP Configuration Issues
In the SBS world, two main DHCP issues will crop up from time to time. The first is that the DHCP service on the SBS server will stop unexpectedly and generate errors in the event logs. This almost always occurs when a second DHCP server is activated on the internal SBS network. The SBS DHCP Server service detects the second DHCP server and shuts itself down to avoid conflicts. Unfortunately, in doing so the SBS box allows the rogue DHCP server to handle DHCP requests, usually passing on invalid configuration information. This problem often presents itself on the network as though the DHCP server is not configuring the workstations correctly. Only on further review will it become clear that the DHCP Server service on the SBS server is actually shut down and not handing out configuration information at all.
The SBS server generates two errors in the System event log at startup when it detects another DHCP server on the local network. The first error is a 1053 error from DhcpServer. The error description reads:
The DHCP/BINL service on this computer running Windows Server 2003 for Small Business Server has encountered another server on this network with IP Address, [IP address], belonging to the domain: .
The second error is a 1054 error, also from DhcpServer, reading:
The DHCP/BINL service on this computer is shutting down. See the previous event log messages for reasons.
In a dual-NIC configuration, the SBS server will not complain about an active DHCP server on the external network. In some cases, the server may be configured to get its external IP address from a DHCP server. The only time it will have problems is when it identifies another DHCP server on the internal network.
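When reviewing an exported System log, the pair of events described above is the signature to look for. The following sketch assumes the log entries have already been reduced to (source, event ID) pairs by some other means; that input format and the function name are assumptions, and only the DhcpServer source and the 1053 and 1054 IDs come from the text.

```python
def rogue_dhcp_suspected(events):
    """Return True when both DhcpServer events 1053 and 1054 are present.

    events is an iterable of (source, event_id) pairs taken from the System log
    (hypothetical input format).
    """
    dhcp_ids = {event_id for source, event_id in events if source == "DhcpServer"}
    return {1053, 1054} <= dhcp_ids


# Illustrative entries only, not an actual log export.
system_log = [("Netlogon", 5781), ("DhcpServer", 1053), ("DhcpServer", 1054)]
print(rogue_dhcp_suspected(system_log))
```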
The second main issue occurs when the internal network IP address is changed on the server without using the Change IP Address Wizard. If the IP address on the internal NIC is changed in the network card configuration directly, the DHCP scope is not updated automatically. In this case, when a workstation boots up, it will not be able to get an address from the DHCP server and will end up with an Automatic Private IP Address (APIPA) in the 169.254.x.x range. This situation presents itself as a workstation no longer able to communicate with the network. Running an ipconfig /all command on the workstation and comparing that output to the output of an ipconfig /all command run on the server will reveal that the workstation and the server are on separate networks or that the workstation is looking to the wrong IP address for the SBS server. You may also see this situation if the server IP address is changed as described previously and the workstation has not had its DHCP lease renewed since the change. Again, the IP address range on the server and the workstation will be different.
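The comparison of the two ipconfig /all outputs comes down to two questions: is the workstation address in the 169.254.x.x APIPA range, and if not, does it fall inside the server's network? A minimal sketch using Python's standard ipaddress module follows. The function name, the return strings, and the default 255.255.255.0 subnet mask are assumptions, so substitute the mask shown in the server's own output.

```python
import ipaddress

APIPA = ipaddress.ip_network("169.254.0.0/16")


def classify_client(client_ip, server_ip, netmask="255.255.255.0"):
    """Compare a workstation address with the server's internal address."""
    client = ipaddress.ip_address(client_ip)
    if client in APIPA:
        return "APIPA address: no DHCP lease obtained"
    # strict=False lets a host address plus mask stand in for its network.
    server_net = ipaddress.ip_network(f"{server_ip}/{netmask}", strict=False)
    if client not in server_net:
        return f"client is outside the server's network {server_net}"
    return "client and server share a network"


print(classify_client("169.254.10.20", "192.168.16.2"))
print(classify_client("192.168.1.25", "192.168.16.2"))
print(classify_client("192.168.16.25", "192.168.16.2"))
```

The second case corresponds to a workstation still holding a lease from before the server's address was changed.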
To resolve this problem, the DHCP server settings can be modified manually, but the easier route is to run the Change Server IP Address Wizard, which will rebuild the DHCP scope automatically. When this wizard is run, however, the new DHCP scope will be set with the SBS defaults. Any customizations that had been made to the DHCP scope previously will be lost.
Troubleshooting DNS-Related Active Directory Issues
Problems with Active Directory can often be traced back to DNS configuration problems or service errors. Some of these issues have been mentioned earlier in the chapter (NIC settings pointing to an external server for DNS, for example), but a number of other errors that may seem like AD failures are really just problems with the DNS service itself. This last section of the chapter looks at a few ways to quickly recover from the DNS problems that may be causing Active Directory errors.
DNS 4004/4015 Errors
If you encounter a number of DNS 4004 or DNS 4015 errors in the event logs, the first place to check is the DNS configuration for Active Directory in the DNS Management Console. Compare the contents of the Forward Lookup Zone for the internal domain to those shown in Figure 5.1 earlier in the chapter. The main lookup zone must contain at least these four records:
The first two records will have the internal FQDN of the server in the data field, and the last two will have the internal IP address in the data field, an example of which can be seen in Figure 5.1. If one of these records is missing or has incorrect data, the corrections can be made directly within the DNS Management Console by either adding the missing record or by editing a record and correcting any errors.
Figure 5.9 shows a portion of the Active Directory Forward Lookup Zone in the DNS Management Console. As with the Forward Lookup Zone for the internal domain discussed previously, some key elements must be present for Active Directory to function properly. In Figure 5.9, the _msdcs zone contains SOA (Start of Authority) and NS (Name Server) records just like the internal domain, and both of those records point to the internal FQDN of the SBS server. The _msdcs zone also contains an alias record, which points to the FQDN of the SBS server. Under the domains zone, there is a zone for the GUID for the domain as well.
Figure 5.9. The _msdcs forward lookup zone contains records for the server and domain.
Because the DNS service relies on a database for storage of its information, it is subject to database corruption like other systems. One sign of database corruption is that the CNAME record for the server in the _msdcs lookup zone is missing. Fortunately, recovering from this database corruption is not difficult.
The Netlogon service is the component that ties the DNS service in with Active Directory. It maintains the DNS records for AD in two files located in the config directory under system32. The files are netlogon.dns and netlogon.dnb. If these files are missing when the Netlogon process starts, they will be created automatically with the proper DNS information for Active Directory. If the files are present but corrupt, the Netlogon service will start but may produce unexpected results.
The Netlogon databases can be repaired in a single command line. First, set the current directory in a command prompt to C:\Windows\system32\config. Then enter the following command:
net stop netlogon && del netlogon.* && net start netlogon
This stops the Netlogon service, deletes the netlogon.dnb and netlogon.dns files from the config folder, and restarts the Netlogon process. If you look in the config folder after running this command, you will find that both the netlogon.dns and netlogon.dnb files have been re-created. When you refresh the DNS Management Console display, you will find that the CNAME record for the server has been re-created if it was missing. This process also re-creates the domains zone under _msdcs if it was missing as well.
If you find that the netlogon.dns and netlogon.dnb files do not get re-created in the config folder and you see warnings in the Event log (Netlogon 5781), check that the DNS server listed in the TCP/IP settings for every NIC in the server is the internal IP address of the SBS server. If the NIC DNS settings point elsewhere, this process will not register the DNS records correctly.
netdiag and dcdiag
Another set of tools that are useful in diagnosing network and Active Directory issues are included in the Support Tools package. Because the Support Tools are not installed by default, the package must be installed before the netdiag and dcdiag tools can be used.
Because the output of both netdiag and dcdiag fills several screen pages, the output from the commands should be redirected to a file for ease of searching. Use the following commands at the command prompt to run netdiag and dcdiag with verbose output, redirect the output to a file, and open the output file in Notepad after the command completes:
netdiag /v > netdiag.txt && netdiag.txt
dcdiag /v > dcdiag.txt && dcdiag.txt
When Notepad brings up the output file, search through the file for the terms "fail" and "fatal" to quickly identify problems that the tools have identified. If any problems are found, a Google search on the error messages from the output can help quickly track down the source of the problem, if the problem is not evident from the description in the file itself.
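The same search can be scripted when several output files need reviewing. The sketch below does a case-insensitive scan for the two terms and reports matching lines with their line numbers. The function name is hypothetical, and the sample text is placeholder content written for this example, not actual netdiag or dcdiag output.

```python
def find_problem_lines(report_text, terms=("fail", "fatal")):
    """Return (line_number, line) pairs whose text contains any search term."""
    hits = []
    for number, line in enumerate(report_text.splitlines(), start=1):
        lowered = line.lower()
        if any(term in lowered for term in terms):
            hits.append((number, line.strip()))
    return hits


# Placeholder text standing in for a saved output file such as netdiag.txt.
sample = "first check: passed\nsecond check: Failed\n    [FATAL] placeholder message\n"
for number, line in find_problem_lines(sample):
    print(f"{number}: {line}")
```

To scan a real file, read it into a string first and pass the contents to the function.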
Running netdiag /fix also repairs corruption within the Netlogon database files. This is effectively the same as stopping the Netlogon service, deleting the Netlogon database files, and restarting Netlogon as described previously.