Site-to-site VPN problems typically fall into the following categories:
Unable to connect. As with remote access, the procedures for troubleshooting the initial connection states follow the industry-standard protocols and are straightforward. The process is reiterated in this chapter so that you have in one place a clear methodology to work through to troubleshoot site-to-site connections.
Unable to reach locations beyond the VPN routers. This is where things start to differ from remote access. In remote access, only one side of the connection needed to handle routing issues, and it was able to mandate what the client’s routing looked like. In site-to-site, both sides of the connection are acting as routers for the sites they manage, and they both need to handle the IP routing issues. We will look at what to check to make sure routing operations are working according to specification.
Unable to reach the virtual interfaces of VPN routers. In remote access, only the VPN server needed to deal with IP address assignment. In site-to-site, each side needs to handle IP address assignment for its side of the connection, and each VPN router assigns an address to the other router.
On-demand connection is not made automatically. In site-to-site configurations, demand-dial filters determine what kind of traffic will initiate the connection or prevent the connection from being initiated. You need to be able to troubleshoot these filters and make sure connections are being created as needed.
Use the information in the following sections to isolate the configuration or infrastructure issue that is causing the problem. We start with the same basic connection troubleshooting that we used in Chapter 11, so much of this material is repeated. We will, however, emphasize the distinct differences you have to watch for.
When a calling router is unable to connect, check the following items:
Using the Ping command when connected to the Internet, verify that the host name for the answering router is being resolved to its correct IP address. Ping itself might not be successful because packet filtering might be preventing Internet Control Message Protocol (ICMP) Echo messages from being processed by the answering router.
If you are using password-based credentials, verify that the calling router’s credentials (user name, password, and domain name) are correct and can be validated by the answering router. Each side needs to maintain a set of credentials for the other. This is different from remote access, where only one side needed to maintain a credential set.
Verify that the user account of the calling router is not locked out, expired, or disabled, and that the time the connection is being made corresponds to the configured logon hours.
Verify that the user account of the calling router is not configured to change its password at the next logon and that the password has not expired. A calling router cannot change an expired password during the connection process. If the password has expired or must be changed at the next logon, the connection attempt is rejected.
Verify that the user account of the calling router has not been locked out because of remote access account lockout.
Verify that the Routing And Remote Access service is running on the answering router.
Verify that the answering router is enabled for both LAN and demand-dial routing by checking the General tab in the Properties dialog box of an answering router in the Routing And Remote Access snap-in.
On both the calling and answering routers, verify that the WAN Miniport (PPTP) and WAN Miniport (L2TP) devices are enabled for demand-dial routing connections (inbound and outbound) from the properties of the Ports object in the Routing And Remote Access snap-in.
Verify that the calling router, the answering router, and the remote access policy corresponding to site-to-site VPN connections are configured to use at least one common authentication method.
Verify that the calling router and the remote access policy corresponding to VPN connections are configured to use at least one common encryption strength.
Verify that the parameters of the connection are authorized through remote access policies.
For the connection to be accepted, the parameters of the connection attempt must do the following:
Match all the conditions of at least one remote access policy.
Be granted remote access permission through the user account (set to Allow Access). Or, if the user account has the Control Access Through Remote Access Policy option selected, the remote access permission of the matching remote access policy must have the Grant Remote Access Permission option selected.
Match all the settings of the profile.
Match all the settings of the dial-in properties of the user account.
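The four acceptance conditions above can be sketched as a small decision function. This is only a simplified model of the rules as stated in this list, not the actual Routing And Remote Access or IAS evaluation logic. The `Policy` and `Account` classes, the attribute dictionaries, and the permission labels "allow", "deny", and "policy" are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    name: str
    conditions: dict      # attribute -> required value (hypothetical model)
    grants_access: bool   # models "Grant Remote Access Permission"
    profile: dict = field(default_factory=dict)


@dataclass
class Account:
    permission: str       # "allow", "deny", or "policy" (hypothetical labels)
    dial_in: dict = field(default_factory=dict)


def _matches(required, attempt):
    """True when every required attribute has the same value in the attempt."""
    return all(attempt.get(key) == value for key, value in required.items())


def is_accepted(attempt, account, policies):
    # 1. The attempt must match all conditions of at least one policy.
    policy = next((p for p in policies if _matches(p.conditions, attempt)), None)
    if policy is None:
        return False
    # 2. Permission comes from the account, or from the policy when the
    #    account defers to remote access policy.
    if account.permission == "allow":
        permitted = True
    elif account.permission == "policy":
        permitted = policy.grants_access
    else:
        permitted = False
    # 3 and 4. Profile settings and dial-in properties must also match.
    return (permitted
            and _matches(policy.profile, attempt)
            and _matches(account.dial_in, attempt))
```

Walking a failed connection attempt through a model like this can help narrow down which of the four conditions to inspect first in the snap-in.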
To obtain the name of the remote access policy that rejected the connection attempt, scan the accounting log for the entry corresponding to the connection attempt and look for the policy name. If Internet Authentication Service (IAS) is being used as a Remote Authentication Dial-In User Service (RADIUS) server, check the system event log for an entry for the connection attempt.
If you are logged on using an account with domain administrator permissions when you run the Routing And Remote Access Server Setup Wizard, it automatically adds the computer account of the answering router to the RAS and IAS Servers domain-local security group. This group membership allows the answering router computer to access user account information. If the answering router is unable to access user account information, verify that:
The computer account of the answering router computer is a member of the RAS and IAS Servers security group for all the domains that contain user accounts for which the answering router is authenticating. You can use the netsh ras show registeredserver command at the command prompt to view the current registration. You can use the netsh ras add registeredserver command to register the server in a domain in which the answering router is a member or other domains. Alternatively, you or your domain administrator can add the computer account of the answering router computer to the RAS and IAS Servers security group of all the domains that contain user accounts for which the answering router is authenticating site-to-site VPN connections.
If you add the answering router computer to or remove it from the RAS and IAS Servers security group, the change does not take effect immediately (because of the way that Windows Server 2003 caches Active Directory directory service information). For the change to take effect immediately, you need to restart the answering router computer.
For an answering router that is a member server in a Windows mixed-mode or a Windows native-mode Active Directory domain that is configured for Windows authentication, verify that:
The RAS and IAS Servers security group exists. If it doesn’t, create the group and set the group type to Security and the group scope to Domain Local.
The RAS and IAS Servers security group has Read permission to the RAS and IAS Servers Access Check object by checking the security permissions on the object and making sure that the security group exists and that it has Read permissions.
Verify that IP is enabled for routing on both the calling router and answering router.
Verify that all Point-to-Point Tunneling Protocol (PPTP) or Layer Two Tunneling Protocol (L2TP) ports on the calling router and answering router are not already being used. If necessary, go to the properties dialog box of the Ports object in the Routing And Remote Access snap-in and change the number of PPTP or L2TP ports to allow more concurrent connections.
Verify that the answering router supports the tunneling protocol of the calling router.
By default, a Windows Server 2003 demand-dial interface with the VPN Type set to Automatic will try to establish a PPTP-based VPN connection first, and then try an L2TP/Internet Protocol Security (IPSec)–based VPN connection. If either the Point to Point Tunneling Protocol (PPTP) or Layer 2 Tunneling Protocol (L2TP) VPN type option is selected, verify that the answering router supports the selected tunneling protocol.
Depending on your selections when running the Routing And Remote Access Server Setup Wizard, a Windows Server 2003–based computer running the Routing And Remote Access service is a PPTP and L2TP server with five or 128 L2TP ports and five or 128 PPTP ports. To create a PPTP-only server, set the number of L2TP ports to zero. To create an L2TP-only server, set the number of PPTP ports to 1 and disable remote access inbound connections and demand-dial connections for the WAN Miniport (PPTP) device in the properties dialog box of the Ports object in the Routing And Remote Access snap-in.
Verify the configuration of the authentication provider. The answering router can be configured to use either Windows or RADIUS to authenticate the credentials of the calling router.
For RADIUS authentication, verify that the answering router can communicate with the RADIUS server.
For an answering router that is a member of a native-mode domain, verify that the answering router has joined the domain.
For either a computer running Microsoft Windows NT version 4.0 Service Pack 4 (and later) with a Routing And Remote Access Service (RRAS) server that is a member of a Windows 2000 mixed-mode domain, or a Windows Server 2003 answering router that is a member of a Windows NT 4.0 domain that is accessing user account properties for a user account in a trusted Active Directory domain, use the net localgroup "Pre-Windows 2000 Compatible Access" command to verify that the Everyone group has been added to the Pre-Windows 2000 Compatible Access group. If it has not, issue the net localgroup "Pre-Windows 2000 Compatible Access" everyone /add command on a domain controller computer and then restart the domain controller.
For a Windows NT version 4.0 Service Pack 3 (and earlier) RRAS server that is a member of a Windows 2000 mixed-mode domain, verify that the Everyone group has been granted list contents, read all properties, and read permissions to the root node of your domain and all sub-objects of the root domain.
For PPTP connections using Microsoft Challenge-Handshake Authentication Protocol (MS-CHAP) and attempting to negotiate 40-bit Microsoft Point-to-Point Encryption (MPPE), verify that the user’s password is not longer than 14 characters.
Verify that packet filtering on a router or firewall interface between the calling router and the answering router is not preventing the forwarding of tunneling protocol traffic. See Appendix B, “Configuring Firewalls for VPN,” for information on the types of traffic that must be allowed for PPTP and L2TP/IPSec traffic.
On a Windows Server 2003–based answering router, IP packet filtering can be separately configured in the advanced TCP/IP properties dialog box and in the properties dialog box of an interface under IP Routing in the Routing And Remote Access snap-in. Check both places for filters that might be excluding VPN connection traffic.
Verify that the Winsock Proxy client is not currently running on the calling router.
You can tell if you have the Winsock Proxy Client installed on your computer by going to Control Panel and looking for the WSP Client icon. If it is present, go into the properties and disable it so that the VPN can operate.
When the Winsock Proxy client is active, Winsock API calls such as those used to create tunnels and send tunneled data are intercepted and forwarded to a configured proxy server. Proxy servers are typically used so that private users in an organization can have access to public Internet resources as if they were directly attached to the Internet. VPN connections are typically used so that authorized public Internet users can gain access to private organization resources as if they were directly attached to the private network. A single computer can act as a proxy server (for private users) and an answering router (for authorized Internet users) to facilitate both exchanges of information.
A proxy server–based computer allows an organization to access specific types of Internet resources (typically Web and FTP) without directly connecting that organization to the Internet. The organization can instead use private IP network IDs (such as 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16).
We provide a typical L2TP log on the companion CD for your use to compare to your own. The most common problems that cause site-to-site L2TP/IPSec connections to fail are the following:
By default, site-to-site L2TP/IPSec connections require that the calling and answering router exchange computer certificates for IPSec peer authentication. Check the Local Computer certificate stores of both the calling and answering router using the Certificates snap-in to ensure that a suitable certificate exists.
If certificates exist, they must be verifiable. Unlike manually configuring IPSec rules, the list of certification authorities (CAs) for L2TP/IPSec connections is not configurable. Instead, each router in the L2TP/IPSec connection sends a list of root CAs to its IPSec peer from which it accepts a certificate for authentication. The root CAs in this list correspond to the root CAs that issued computer certificates to the computer. For example, if Router A was issued computer certificates by root CAs CertAuth1 and CertAuth2, it notifies its IPSec peer during main mode negotiation that it will accept certificates for authentication from only CertAuth1 and CertAuth2. If the IPSec peer, Router B, does not have a valid computer certificate issued from either CertAuth1 or CertAuth2, IPSec security negotiation fails.
The calling router must have a valid computer certificate installed that was issued by a CA that follows a valid certificate chain from the issuing CA up to a root CA that the answering router trusts. Additionally, the answering router must have a valid computer certificate installed that was issued by a CA that follows a valid certificate chain from the issuing CA up to a root CA that the calling router trusts.
A network address translator (NAT) between the calling and answering routers.
If either the calling or answering router is running Windows 2000 Server and there is a NAT between the calling and answering router, you cannot establish an L2TP/IPSec connection because NAT-traversal (NAT-T) is not supported in Windows 2000 Server. IPSec NAT-T is supported only by Windows Server 2003 for site-to-site VPN connections.
A firewall between the calling and answering routers.
If there is a firewall between the calling and answering router and you cannot establish an L2TP/IPSec connection, verify that the firewall allows L2TP/IPSec traffic to be forwarded. For more information, see Appendix B, “Configuring Firewalls for VPN.”
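Returning to the certificate item in this list, the CertAuth1 and CertAuth2 example can be reduced to a set intersection. The sketch below assumes, as the text describes, that each router accepts only the root CAs that issued its own computer certificates. The function name `shared_root_cas` is hypothetical, and real IPSec negotiation also validates each certificate in the chain, which this model leaves out.

```python
def shared_root_cas(router_a_roots, router_b_roots):
    """Return the root CAs that both routers have computer certificates from.

    Each router advertises the root CAs of its own certificates as the only
    ones it accepts, so peer authentication needs at least one root in common.
    """
    return set(router_a_roots) & set(router_b_roots)


router_a = {"CertAuth1", "CertAuth2"}
router_b = {"CertAuth3"}

if not shared_root_cas(router_a, router_b):
    print("No common root CA: IPSec security negotiation would fail")
```

An empty result points to a certificate enrollment problem rather than a network problem.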
One of the best tools for troubleshooting IPSec authentication issues is the Oakley log. For more information, see the “Oakley Logging” section in Chapter 11. For a sample Oakley log, see the companion CD.
When Extensible Authentication Protocol-Transport Layer Security (EAP-TLS) is used for authentication, the calling router submits a Router (Offline request) user certificate and the authenticating server (the answering router or the RADIUS server) submits a computer certificate.
Verify that the calling router and answering router are correctly configured. To do this, use the following steps:
On the calling router, verify that EAP is configured as the authentication protocol in the advanced security properties of the demand-dial interface. Verify the settings of the properties of the Smart Card Or Other Certificate (encryption-enabled) EAP type. Verify that the correct Router (Offline request) certificate is selected when configuring the credentials of the demand-dial interface.
On the answering router, verify that EAP is enabled as an authentication method on the answering router and EAP-TLS is enabled on the matching remote access policy. Verify that the correct computer certificate of the authenticating server (the answering router or IAS server) is selected from the configuration settings of the Smart Card Or Other Certificate EAP type in the remote access policy for site-to-site VPN connections.
For the authenticating server to validate the certificate of the calling router, the following must be true for each certificate in the certificate chain sent by the calling router:
The current date must be within the validity dates of the certificate.
When certificates are issued, they are issued with a range of valid dates, before which they cannot be used and after which they are considered expired.
The certificate has not been revoked.
Issued certificates can be revoked at any time. Each issuing CA maintains a list of certificates that should no longer be considered valid by publishing an up-to-date certificate revocation list (CRL). By default, the authenticating server checks all the certificates in the calling router’s certificate chain (the series of certificates from the calling router certificate to the root CA) for revocation. If any of the certificates in the chain have been revoked, certificate validation fails.
If the CRL is locally available, it can be checked. In some configurations, the CRL cannot be checked until after the connection is made. The CRL is stored at the root CA and, optionally, in Active Directory. For a branch office router that is acting as an answering router in a site that does not contain the root CA, there are two solutions to this problem:
Publish the CRL in Active Directory. For more information, see the topics titled “Schedule the publication of the certificate revocation list” or “Manually publish the certificate revocation list” in Windows Server 2003 Help And Support. Once the CRL is published in Active Directory, the local domain controller in the site will have the latest CRL after Active Directory synchronization.
On the branch office router, create the following registry entry, and set the value to a DWORD of 1:
To view the CRL distribution points for a certificate in the Certificates snap-in, right-click the certificate and select Open, click the Details tab, and then click the CRL Distribution Points field from the drop-down list.
The certificate revocation validation works only as well as the CRL publishing and distribution system. If the CRL in a certificate is not updated often, a certificate that has been revoked can still be used and considered valid because the published CRL that the authenticating server is checking is out of date.
The certificate has a valid digital signature.
CAs digitally sign certificates they issue. The authenticating server verifies the digital signature of each certificate in the chain, with the exception of the root CA certificate, by obtaining the public key from the certificate’s issuing CA and mathematically validating the digital signature.
The calling router certificate must also have the Client Authentication certificate purpose (also known as Enhanced Key Usage [EKU] object identifier [OID] 1.3.6.1.5.5.7.3.2) and must contain either a user principal name (UPN) of a valid user account or a fully qualified domain name (FQDN) of a valid computer account for the Subject Alternative Name property of the certificate.
To view the EKU for a certificate in the Certificates snap-in, double-click the certificate in the Contents pane, click the Details tab, and then click the Enhanced Key Usage field. To view the Subject Alternative Name property for a certificate in the Certificates snap-in, double-click the certificate in the contents pane, click the Details tab, and then click the Subject Alternative Name field.
Finally, to trust the certificate chain offered by the calling router, the authenticating server must have the root CA certificate of the issuing CA of the calling router certificate installed in its Trusted Root Certification Authorities store. To access the store, click Start, click Run, type mmc, and then add the Certificates snap-in for the local computer account to the console.
Additionally, the authenticating server verifies that the identity sent in the EAP-Response/Identity message is the same as the name in the Subject Alternative Name property of the certificate. This prevents a malicious user from masquerading as a different user from that specified in the EAP-Response/Identity message.
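The per-certificate conditions described above (validity dates, revocation status, and digital signature) can be sketched as a loop over the chain. This is a simplified illustration rather than how EAP-TLS is implemented. The `Cert` class and its fields are hypothetical, the `signature_ok` flag stands in for a real cryptographic check, and the EKU, Subject Alternative Name, and trusted-root checks are omitted.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Cert:
    subject: str
    not_before: datetime
    not_after: datetime
    serial: str
    signature_ok: bool = True  # stand-in for real signature verification


def chain_problems(chain, revoked_serials, now):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for cert in chain:
        # The current date must fall within the certificate's validity dates.
        if not (cert.not_before <= now <= cert.not_after):
            problems.append(f"{cert.subject}: outside validity dates")
        # The certificate must not appear on the revocation list.
        if cert.serial in revoked_serials:
            problems.append(f"{cert.subject}: revoked")
        # The issuing CA's signature must verify.
        if not cert.signature_ok:
            problems.append(f"{cert.subject}: invalid signature")
    return problems
```

Collecting every problem, instead of stopping at the first one, mirrors how you would read through the certificate details when several things are wrong at once.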
If the authenticating server is a Windows Server 2003 answering router or an IAS server, the following registry settings in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\RasMan\PPP\EAP\13 can modify the behavior of EAP-TLS when performing certificate revocation:
IgnoreNoRevocationCheck. When set to 1, the authenticating server allows EAP-TLS clients to connect even when it does not perform or cannot complete a revocation check of the calling router’s certificate chain (excluding the root certificate). Typically, revocation checks fail because the certificate doesn’t include CRL information.
IgnoreNoRevocationCheck is set to 0 (disabled) by default. An EAP-TLS client cannot connect unless the server completes a revocation check of the client’s certificate chain (including the root certificate) and verifies that none of the certificates have been revoked.
You can use this entry to authenticate clients when the certificate does not include CRL distribution points, such as those from third parties.
IgnoreRevocationOffline. When set to 1, the authenticating server allows EAP-TLS clients to connect even when a server that stores a CRL is not available on the network. IgnoreRevocationOffline is set to 0 by default. The authenticating server does not allow clients to connect unless it can complete a revocation check of their certificate chain and verify that none of the certificates have been revoked. When it cannot connect to a server that stores a revocation list, EAP-TLS considers the certificate to have failed the revocation check.
Setting IgnoreRevocationOffline to 1 prevents certificate validation from failing when poor network conditions keep the revocation check from completing successfully.
NoRevocationCheck. When set to 1, the authenticating server prevents EAP-TLS from performing a revocation check of the calling router’s certificate. The revocation check verifies that the calling router’s certificate and the certificates in its certificate chain have not been revoked. NoRevocationCheck is set to 0 by default.
NoRootRevocationCheck. When set to 1, the authenticating server prevents EAP-TLS from performing a revocation check of the calling router’s root CA certificate. NoRootRevocationCheck is set to 0 by default. This entry eliminates only the revocation check of the client’s root CA certificate. A revocation check is still performed on the remainder of the calling router’s certificate chain.
You can use this entry to authenticate clients when the certificate does not include CRL distribution points, such as those from third parties. Also, this entry can prevent certification-related delays that occur when a certificate revocation list is offline or is expired.
All these registry settings must be added as a DWORD type and have the valid values of 0 or 1. The calling router does not use these settings.
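To see how these settings interact, it can help to model the revocation stage as a small function. The sketch below is one reading of the descriptions above, not the actual EAP-TLS implementation. The function name and the result labels "ok", "revoked", "incomplete", and "offline" are hypothetical, and NoRootRevocationCheck is left out because it only narrows which certificates are checked.

```python
def passes_revocation_stage(check_result, settings):
    """Model whether the revocation stage lets the connection continue.

    check_result: "ok", "revoked", "incomplete" (no CRL information), or
                  "offline" (CRL server unreachable); hypothetical labels.
    settings:     registry value name -> 0 or 1; a missing entry counts as
                  the default of 0.
    """
    if settings.get("NoRevocationCheck", 0) == 1:
        return True  # revocation checking is skipped entirely
    if check_result == "ok":
        return True
    if check_result == "offline":
        return settings.get("IgnoreRevocationOffline", 0) == 1
    if check_result == "incomplete":
        return settings.get("IgnoreNoRevocationCheck", 0) == 1
    return False  # revoked, or an unrecognized result


# With default settings, an unreachable CRL server blocks the connection.
print(passes_revocation_stage("offline", {}))  # False
```

In this model, a certificate that is actually revoked is rejected unless revocation checking is turned off completely, which is why NoRevocationCheck deserves the most caution.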
For the calling router to validate the certificate of the authenticating server for EAP-TLS authentication, the following must be true for each certificate in the certificate chain sent by the authenticating server:
The current date must be within the validity dates of the certificate.
The certificate must have a valid digital signature.
Additionally, the authenticating server computer certificate must have the Server Authentication EKU (OID 1.3.6.1.5.5.7.3.1). To view the EKU for a certificate in the Certificates snap-in, double-click the certificate in the contents pane, click the Details tab, and then click the Enhanced Key Usage field.
Finally, to trust the certificate chain offered by the authenticating server, the calling router must have the root CA certificate of the issuing CA of the authenticating server certificate installed in its Certificates (Local Computer)\Trusted Root Certification Authorities store.
Notice that the calling router does not perform certificate revocation checking for the certificates in the certificate chain of the authenticating server’s computer certificate. The assumption is that the calling router does not yet have a connection to the network, and therefore might not have access to a Web page or other resource in order to check for certificate revocation.
Now that we have the two VPN routers connecting, we must make sure they are able to forward packets to each other’s network. We will now discuss routing and filtering issues. If traffic cannot be sent and received between locations on the intranet that are beyond the VPN routers, check the following:
Verify that IP routing is enabled (on the IP tab in the properties dialog box of the VPN router, in the Routing And Remote Access snap-in). Without this enabled, there are no routing capabilities on the server, and you will not be able to route traffic between interfaces as needed.
Verify that the demand-dial interface over which traffic is being sent has been added to the IP Routing\General folder in the Routing And Remote Access snap-in. This is done automatically when you create the interface with the Demand-Dial Interface Wizard.
Verify that there are routes in the site routers on the calling router’s and answering router’s sites so that all locations on both networks are reachable. You can add routes to the routers of each site through static routes or by enabling a routing protocol on the site interface of the calling and answering routers. In practice, try to use the techniques of route summarization for the rest of the network. This accomplishes two things:
It eliminates the need to have extensive routing tables on the VPN routers.
It makes the convergence of the network much faster for the VPN servers in the case of a network change. If route summarization is properly used, the VPN routers will not have to change their routing tables at all.
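As a small illustration of route summarization, Python's standard `ipaddress` module can collapse contiguous subnets into a single prefix. The subnet values and the `summarize` helper name are assumptions chosen for the example; your site's address plan determines what can actually be summarized.

```python
import ipaddress


def summarize(subnets):
    """Collapse a list of subnet strings into the fewest covering prefixes."""
    networks = [ipaddress.ip_network(s) for s in subnets]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


# Four contiguous /24 subnets behind a site router (hypothetical values).
site_subnets = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
print(summarize(site_subnets))  # ['10.1.0.0/22']
```

With a summary like this, the VPN router needs one static route for the remote site instead of four, and a change to an individual subnet inside the range requires no change on the VPN router.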
Unlike a remote access connection, a demand-dial connection does not automatically create a default route. You need to create routes on both sides of the demand-dial connection so that traffic can be routed to and from the other side of the demand-dial connection.
You can manually add static routes to the routing table, or you can use routing protocols. For persistent demand-dial connections, you can enable Open Shortest Path First (OSPF) or Routing Information Protocol (RIP) across the demand-dial connection. Do not use dynamic routing on on-demand connections. Doing so can cause a condition known as flapping, in which the connection looks like a link that is continually activating and deactivating on the network, because OSPF and RIP constantly send updates to change the routing tables when nonpersistent connections are used. Always use static routes for on-demand connections, and let the “next-hop” router beyond the VPN deal with the dynamic routing. For on-demand site-to-site VPN connections, you can automatically update routes through an auto-static RIP update.
For two-way initiated site-to-site VPN connections, verify that the answering router is not interpreting the site-to-site VPN connection as a remote access connection.
For two-way initiated connections, either router can be the calling router or the answering router. The user names and demand-dial interface names must be properly matched. For example, two-way initiated connections would work in the configuration shown in Table 12-1.
Router 1 in New York: Has a demand-dial interface named NEW YORK that is configured to use User_Seattle as the user name when sending authentication credentials.
Router 2 in Seattle: Has a demand-dial interface named SEATTLE that is configured to use User_NewYork as the user name when sending authentication credentials.
This example assumes that Router 2 can validate the User_Seattle user name and Router 1 can validate the User_NewYork user name.
If the incoming caller is a router, the port on which the call was received shows a status of Active and the corresponding demand-dial interface is in a Connected state. If the user account name in the credentials of the calling router appears under Remote Access Clients in the Routing And Remote Access snap-in on the answering router, the answering router has interpreted the calling router as a remote access client.
For a one-way initiated demand-dial connection, verify that the appropriate static routes are enabled on the user account of the calling router and that the answering router is configured with a routing protocol so that when a connection is made, the static routes of the user account of the calling router are advertised to neighboring routers.
Verify that there are no IP packet filters on the demand-dial interfaces of the calling router and answering router that prevent the sending or receiving of TCP/IP.
You can configure each demand-dial interface with IP input and output filters to control the exact nature of TCP/IP traffic that is allowed into and out of the demand-dial interface.
The virtual interfaces of the VPN routers are the interfaces on either side of the site-to-site VPN connection that represent the ends of the VPN tunnel. If traffic cannot be sent and received between the VPN router virtual interfaces, check the following:
Verify the IP address pool of the calling router and answering router.
If the VPN router is configured to use a static IP address pool, verify that the routes to the range of addresses defined by the static IP address pools are reachable by the hosts and routers of the site. If they aren’t, you must either add IP routes consisting of the VPN router static IP address ranges (as defined by the IP address and mask of the range) to the routers of the site or enable the routing protocol of your routing infrastructure on the VPN router. If the routes to the address range subnets are not present, the calling-router logical interfaces cannot receive traffic from locations on the site. Routes for the subnets are implemented either through static routing entries or through a routing protocol, such as OSPF or RIP.
If the VPN router is configured to use Dynamic Host Configuration Protocol (DHCP) for IP address allocation and no DHCP server is available, the VPN router assigns addresses from the Automatic Private IP Addressing (APIPA) address range from 169.254.0.1 through 169.254.255.254. Assigning APIPA addresses to VPN routers works only if the network to which the VPN router is attached is also using APIPA addresses.
If the VPN router is using APIPA addresses when a DHCP server is available, verify that the proper adapter is selected from which to obtain DHCP-allocated IP addresses. By default, the VPN router chooses the adapter to use to obtain IP addresses through DHCP based on your selections in the Routing And Remote Access Server Setup Wizard. You can manually choose a local area network (LAN) adapter from the Adapter list on the IP tab in the properties dialog box of the VPN router in the Routing And Remote Access snap-in.
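A quick way to recognize an APIPA assignment when reviewing addresses is to test membership in 169.254.0.0/16, which covers the range given above. This is a minimal sketch using Python's standard `ipaddress` module; the `is_apipa` helper name and the sample addresses are assumptions.

```python
import ipaddress

# The APIPA range 169.254.0.1 through 169.254.255.254 sits in this network.
APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address):
    """Return True when the address was likely self-assigned through APIPA."""
    return ipaddress.ip_address(address) in APIPA_NETWORK


print(is_apipa("169.254.10.20"))  # True
print(is_apipa("192.168.1.10"))   # False
```

An APIPA address on a virtual interface when a DHCP server exists is a strong hint that the wrong adapter is selected for obtaining DHCP-allocated addresses.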
If the static IP address pools are a range of IP addresses that are a subset of the range of IP addresses for the network to which the VPN router is attached, verify that the addresses in the static IP address pool are not assigned to other TCP/IP nodes, either through static configuration or through DHCP.
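If you have a list of addresses already in use on the subnet (for example, exported from DHCP leases and static assignment records), a short script can flag collisions with the static pool. The sketch below assumes the pool is a single contiguous range. The function name and all addresses are hypothetical.

```python
import ipaddress


def pool_conflicts(pool_start, pool_end, assigned):
    """Return the assigned addresses that fall inside the static pool."""
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    inside = [a for a in assigned if start <= ipaddress.ip_address(a) <= end]
    # Sort numerically so the report reads in address order.
    return sorted(inside, key=ipaddress.ip_address)


in_use = ["192.168.1.10", "192.168.1.210", "192.168.1.205"]
print(pool_conflicts("192.168.1.200", "192.168.1.220", in_use))
# ['192.168.1.205', '192.168.1.210']
```

Any address the function returns should be either removed from the pool range or reassigned on the conflicting node.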
If an on-demand connection is not being made automatically, check the following:
Verify that IP routing is enabled on the IP tab in the properties of the calling router.
In the Routing And Remote Access snap-in, check that the correct static routes exist and are configured with the appropriate demand-dial interface.
For the static routes that use a demand-dial interface, verify that the Use This Route To Initiate Demand-Dial Connections check box in the properties dialog box of the route is selected.
Verify that the demand-dial interface is not in a disabled state.
To enable a demand-dial interface that is in a disabled state, right-click the demand-dial interface under Network Interfaces, and then click Enable.
Verify that the dial-out hours for the demand-dial interface on the calling router are not preventing the connection attempt.
To configure dial-out hours, right-click the demand-dial interface under Network Interfaces, and then click Dial-Out Hours.
Verify that the demand-dial filters for the demand-dial interface on the calling router are not preventing the connection attempt.
To configure demand-dial filters, right-click the demand-dial interface under Network Interfaces, and then click Set IP Demand-Dial Filters.
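The checklist above can be captured as an ordered series of checks that reports every blocking condition at once. This is only a sketch of the checklist as a data structure. The `DemandDialState` fields are hypothetical and would need to be filled in by hand from what you observe in the Routing And Remote Access snap-in.

```python
from dataclasses import dataclass


@dataclass
class DemandDialState:
    ip_routing_enabled: bool
    static_route_uses_interface: bool
    route_initiates_connection: bool  # "Use This Route To Initiate Demand-Dial Connections"
    interface_enabled: bool
    within_dial_out_hours: bool
    filters_permit_traffic: bool


def blocking_reasons(state):
    """Return every condition that would stop an on-demand connection."""
    checks = [
        (state.ip_routing_enabled, "IP routing is not enabled"),
        (state.static_route_uses_interface, "no static route uses the demand-dial interface"),
        (state.route_initiates_connection, "route is not set to initiate demand-dial connections"),
        (state.interface_enabled, "demand-dial interface is disabled"),
        (state.within_dial_out_hours, "dial-out hours block the attempt"),
        (state.filters_permit_traffic, "demand-dial filters block the traffic"),
    ]
    return [reason for passed, reason in checks if not passed]
```

Listing all failures together is useful because several of these settings are often misconfigured at the same time on a newly deployed calling router.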