9.7 Link state routing | Microsoft Exchange Server 2003 Administrators Pocket Consultant


Any large messaging infrastructure is usually in a state of flux. While network links are normally available all the time, human, computer, or network error can conspire to interrupt traffic on a circuit and block the flow of messages. The more distributed and extensive the network, the more likely it is that some part of the network is currently unavailable.

As mentioned earlier, Exchange 5.5 uses the GWART to maintain a list of the routes messages can take to a final destination, including gateways. The GWART does not attempt to keep track of temporary network outages and merely consists of information about routes, so it is essentially a list about point-to-point connections that may or may not be available to route messages at any point in time. If a block occurs along the route for any reason, large queues can quickly build up in the MTA or connectors. Exchange now sends link state information between servers to provide an up-to-date picture of available routes messages can take. Exchange sends link state information using two methods:

Within a routing group, the routing service on each bridgehead server binds to port 691 via the IIS on the routing master to send and receive link state table updates. Communication occurs using a special protocol called LSA,^[8] specially developed by Microsoft for this purpose. In its turn, after receiving updates from bridgehead servers, the routing master broadcasts changes to all the servers in the routing group, again via port 691. If the IETF eventually sets out a common mechanism for email servers to share routing information, Exchange's current routing architecture is flexible enough to accommodate multiple protocols.
Exchange sends link state table updates between bridgehead servers in different routing groups whenever an update is available. The bridgehead server then passes the data on to the group's routing master. If Exchange uses an RGC or SMTP connector to link routing groups, it sends SMTP messages between the servers using port 25 instead of the LSA protocol across port 691. The connection starts with an "EHLO" to tell the server that ESMTP is going to be used, and then an "X-LINK2STATE" command to advertise the fact that the server is capable of exchanging link state information. If the receiving server acknowledges the command, the two servers then trade link state information. Exchange passes link state data in a highly compressed format and only requires a single DWORD to pass the up/down information; so little overhead is required to accommodate the basic data. Configuration data updates take up a little more space, and Exchange also sends the GUID and digest for the organization, but the overhead usually remains small.^[9] X.400 connections use a field to store and transmit link state information. Before Exchange sends any information, servers check that they belong to the same Exchange organization by verifying that they share the same organizational GUID and digest. The digest contains a hash of the organization name and version number and Exchange uses this to generate a string value that it can check quickly against a value generated by another server. If the check passes, the servers proceed with the update.
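The organization check that precedes any exchange of link state data can be sketched in a few lines of Python. This is only an illustration of the idea: the `OrgIdentity` class, the `same_organization` function, and the use of SHA-256 over a "name:version" string are all assumptions made for the example, because the text does not say which hash or encoding Exchange really uses.

```python
import hashlib
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class OrgIdentity:
    """Hypothetical stand-in for what a server knows about its organization."""
    guid: uuid.UUID
    name: str
    version: int

    def digest(self) -> str:
        # Assumption: SHA-256 over "name:version" is a placeholder. The real
        # hash and encoding used by Exchange are not described in the text.
        payload = f"{self.name}:{self.version}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def same_organization(local: OrgIdentity, remote_guid: uuid.UUID, remote_digest: str) -> bool:
    """Proceed with the update only if both the GUID and the digest string match."""
    return local.guid == remote_guid and local.digest() == remote_digest
```

Comparing a short digest string is cheap, which fits the description of a value that one server "can check quickly" against the value generated by another.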

Exchange 2003 improves the performance of link state routing updates, especially for slow links such as those used by remote offices. For example, in a hub and spoke network, where routing groups in the spokes have only one possible link back to the hub, it does not make sense to generate new routing tables if that link fails, so Exchange 2003 now suppresses updates in these circumstances.

In addition, Exchange 2003 attempts to dampen the number of updates that "oscillating" connections (those that come up and down frequently) cause within the network, which reduces bandwidth consumption because Exchange transfers fewer routing updates between servers. To accomplish the goal, Exchange 2003 makes two changes. First, where Exchange 2000 has a strictly binary view of whether a connector is available (up/down), Exchange 2003 attempts to analyze whether the connector is experiencing a transient failure that might clear up before it decides that the connector has really failed. Second, Exchange 2003 suppresses "minor" link state updates (which flag a connection being up or down) and only transmits "major" updates. Major updates are caused by the installation of a new connector and similar changes to the routing infrastructure.

If you need to, you can suppress link state updates on an Exchange 2000 server by creating a new DWORD value called SuppressStateChanges in the registry at:

HKLM\SYSTEM\CurrentControlSet\Services\RESvc\Parameters

Set the value to 1 to suppress link state updates. This is not something to rush into unless you understand the consequences of suppressing the updates and are confident that the net saving in bandwidth will make a difference.
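If you need to apply the setting to several servers, it can be convenient to generate a registration file rather than edit each registry by hand. The following Python sketch builds the text of such a file; the function name `build_reg_file` is hypothetical, and the script only produces text, so you still have to import the result on each server yourself.

```python
# Key and value name as given in the text above.
KEY_PATH = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\RESvc\Parameters"
VALUE_NAME = "SuppressStateChanges"


def build_reg_file(value: int = 1) -> str:
    """Return the contents of a .reg file that sets the DWORD value."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY_PATH}]",
        f'"{VALUE_NAME}"=dword:{value:08x}',  # DWORDs are written as 8 hex digits
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file())
```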

Anything to reduce bandwidth consumption always helps, and you can gain a further boost by running on Windows 2003 servers, because Exchange can then use the updated Windows DNS resolver code, which is less sensitive to slow responses to DNS queries than the previous DNS resolver. In large organizations that have good network links and stable DNS environments, you may not notice these improvements, since they become most obvious when servers and links change state frequently. You can obtain some insight on the flow of SMTP communication between two Exchange servers from Figure 9.23, which shows two typical transactions between servers.

11:39:47 19.209.12.154 HELO - 250
11:39:47 19.209.12.154 MAIL - 250
11:39:47 19.209.12.154 RCPT - 250
11:39:47 19.209.12.154 DATA - 250
11:39:47 19.209.12.154 QUIT - 0
11:41:02 19.40.65.204 EHLO - 250
11:41:02 19.40.65.204 x-link2state - 200
11:41:02 19.40.65.204 MAIL - 250
11:41:02 19.40.65.204 RCPT - 250
11:41:02 19.40.65.204 DATA - 250
11:41:02 19.40.65.204 QUIT - 0

Figure 9.23: Extract from SMTP log file.

The first transaction is with the server with IP address 19.209.12.154 and consists of a HELO/MAIL/RCPT/DATA/QUIT command sequence. These commands establish a link, set up a recipient for the message and some originator details, pass some message data, and then terminate the connection. We know that the server we are corresponding with is not second-generation Exchange, since the initial connect is made with the HELO command rather than EHLO. The next transaction is with the server with IP address 19.40.65.204 and begins with EHLO. This does not automatically mean that this is a recent Exchange server, since many other SMTP servers support extended SMTP, but it is a good start. The extract does not tell us how the remote server responded to the EHLO command, but the fact that Exchange then issues an X-LINK2STATE command is a very good indicator that we have connected to another Exchange server. As it happens, the remote server is a bridgehead for another routing group. Bridgeheads always take the opportunity to update each other every time they talk.
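The same reasoning can be applied mechanically to a larger log. The sketch below groups the entries from the extract into sessions and flags those that used EHLO and those that exchanged link state data. It assumes the five-field layout seen in the extract (time, IP address, command, parameter, status); real SMTP logs can be configured with other fields, and the `Session` and `parse_sessions` names are invented for the example.

```python
from dataclasses import dataclass, field

LOG_EXTRACT = """\
11:39:47 19.209.12.154 HELO - 250
11:39:47 19.209.12.154 MAIL - 250
11:39:47 19.209.12.154 RCPT - 250
11:39:47 19.209.12.154 DATA - 250
11:39:47 19.209.12.154 QUIT - 0
11:41:02 19.40.65.204 EHLO - 250
11:41:02 19.40.65.204 x-link2state - 200
11:41:02 19.40.65.204 MAIL - 250
11:41:02 19.40.65.204 RCPT - 250
11:41:02 19.40.65.204 DATA - 250
11:41:02 19.40.65.204 QUIT - 0
"""


@dataclass
class Session:
    peer: str
    commands: list = field(default_factory=list)

    @property
    def uses_esmtp(self) -> bool:
        return "EHLO" in self.commands

    @property
    def exchanged_link_state(self) -> bool:
        return "X-LINK2STATE" in self.commands


def parse_sessions(log_text: str) -> list:
    """Group log lines into sessions; a HELO/EHLO or a new peer starts a session."""
    sessions = []
    current = None
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip anything that does not match the assumed layout
        _time, peer, command, _param, _status = parts
        command = command.upper()
        if command in ("HELO", "EHLO") or current is None or current.peer != peer:
            current = Session(peer)
            sessions.append(current)
        current.commands.append(command)
    return sessions


if __name__ == "__main__":
    for s in parse_sessions(LOG_EXTRACT):
        print(s.peer, "ESMTP" if s.uses_esmtp else "SMTP",
              "link state exchanged" if s.exchanged_link_state else "no link state")
```

A session that shows X-LINK2STATE is a strong hint that the remote server is an Exchange bridgehead, which is the conclusion reached by hand above.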

Assuming we have a LAN-quality connection, sending link state information does not take very much time (less than a second in this case), but it is always done first to allow the remote server to update its LST if the need occurs. After Exchange passes the link state information, the two servers settle down to the normal sequence of commands necessary to send a message and the transaction then ends.

It is possible that you may want to restrict LST updates within an organization, especially if the servers are connected using low-bandwidth links. To suppress LST change updates, create the following DWORD value (and set the value to 1) in the system registry on all Exchange servers that you want to control:

HKLM\System\CurrentControlSet\Services\RESvc\Parameters\SuppressStateChanges

The LST holds information about connection availability and cost for an entire organization. The picture of the network changes as you add, remove, and update servers and connectors or adjust connection costs. The major value of the dynamic nature of the LST is when an organization evolves, as in when you upgrade the servers in an organization, or when a company goes through a period of acquisition, merger, and divestiture. In these scenarios, the LST is often in a state of dynamic flux as the routing group master updates it with information coming in from other servers in the same routing group plus information from other routing groups.

You can envisage the LST as many different tables, one for each routing group, as shown in Table 9.5. The concept of availability extends beyond the boundaries of Exchange, since external connections are also included. This prevents the Routing Engine from continually attempting to send messages across a connection such as an external Internet gateway when a network link is down. When a particular link fails, a retry is attempted. If the retry fails, the Routing Engine fires an event to tell the server that it must issue a link state update to the routing group master.

Table 9.5: Example Link State Table
Link States for London Routing Group	State	Cost	Version 125
Link from London RG to Dublin RG	Up	20
Link from London RG to Paris RG	Up	20
Link from London RG to Frankfurt RG	Up	20
Link from London RG to New York RG	Down	10
Link from London RG to Copenhagen RG	Up	20
Link States for New York RG	State	Cost	Version 146
Link from New York RG to San Francisco RG	Up	20
Link from New York RG to Boston RG	Up	20
Link from New York RG to Houston RG	Up	20
Link from New York RG to London RG	Down	10
Link from New York RG to Copenhagen RG	Up	50
Link States for Copenhagen RG	State	Cost	Version 343
Link from Copenhagen RG to London RG	Up	20
Link from Copenhagen RG to New York RG	Up	50
Link from Copenhagen RG to Stockholm RG	Up	20

Every routing group maintains a version number for its link state information. The version number can only be incremented by the routing group master, and this happens whenever the information changes because of a link state update. Routing groups use the version number to compare information about the state of the network. You can think of this data as the unique view of the network known to each routing group, and if the versions do not match during link state operations, the servers know that they have to update each other.

The reason for maintaining dynamic link state information is to allow the Routing Engine to find the optimum path for messages, basing the decision on the LST data. Exchange uses a modified form of Dijkstra's algorithm, a commonly used method to determine the shortest path between two points in a network, to assess the available routing paths. The Open Shortest Path First (OSPF) protocol is the best-known example of this type of routing, and many network routers use a variation of the same algorithm to send packets between computers in the most efficient manner.

Inside a network, routers determine the optimum route using factors such as delay, throughput, and connectivity. Messaging is a little different, because other factors come into play, such as the eventual destination of a message (e.g., does Exchange have to route the message across a specific connector to reach another email system), its size, the sender, and message priority. During the decision process, the Exchange organization is modeled as a network, with each routing group represented as a network node and each connector as a link between nodes. The basic decision that has to be made is: Given a message and its properties (current location, sender, recipient, priority, and size) and the network infrastructure (link state and cost), what is the next best hop to route the message? As already discussed, Exchange allows connectors to be limited to handle particular sizes of messages or only accept messages from specific email addresses.
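To make the network model concrete, here is a minimal sketch of a shortest-path calculation over the link states in Table 9.5. It is the plain textbook form of Dijkstra's algorithm, not the modified form that Exchange uses, and it ignores message properties such as size, sender, and priority. The `LINKS` dictionary copies the London, New York, and Copenhagen rows from the table; the Dublin to London link does not appear in the table and is an assumption (taken to mirror the London to Dublin entry). The `best_route` function name is invented for the example.

```python
import heapq

# (from, to): (state, cost), taken from Table 9.5.
LINKS = {
    ("London", "Dublin"): ("Up", 20),
    ("London", "Paris"): ("Up", 20),
    ("London", "Frankfurt"): ("Up", 20),
    ("London", "New York"): ("Down", 10),
    ("London", "Copenhagen"): ("Up", 20),
    ("New York", "San Francisco"): ("Up", 20),
    ("New York", "Boston"): ("Up", 20),
    ("New York", "Houston"): ("Up", 20),
    ("New York", "London"): ("Down", 10),
    ("New York", "Copenhagen"): ("Up", 50),
    ("Copenhagen", "London"): ("Up", 20),
    ("Copenhagen", "New York"): ("Up", 50),
    ("Copenhagen", "Stockholm"): ("Up", 20),
    # Assumption: not in Table 9.5, mirrors the London -> Dublin entry.
    ("Dublin", "London"): ("Up", 20),
}


def best_route(links, source, target):
    """Return (total_cost, path) over links marked Up, or None if unreachable."""
    graph = {}
    for (start, end), (state, cost) in links.items():
        if state == "Up":  # downed links are simply left out of the graph
            graph.setdefault(start, []).append((end, cost))

    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, link_cost in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbour, path + [neighbour]))
    return None


if __name__ == "__main__":
    print(best_route(LINKS, "Dublin", "Boston"))
```

With the London to New York link marked as down, the only route from Dublin to Boston runs through Copenhagen at a total cost of 110. If the same link is marked as up, the direct route through London and New York costs 50 and is chosen instead.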

Avoiding message "ping-pong" and the type of rerouting that occurs in Exchange 5.5 are major reasons for implementing link state routing. The GWART is static, so messages can be routed to an inoperative connector. When this happens, the MTA checks the GWART to discover whether another route exists and attempts to send the messages across the alternate route. If this connector is also unavailable, the MTA will attempt other routes until it exhausts all possible routes, in which case the messages remain queued until a route becomes available. It sounds OK to reroute messages in this fashion, but the messages ping-pong around sites and connectors until all available routes are exhausted, and in a large organization containing many sites and connectors, it can take some time before the MTA decides that it has tried all available routes. Another complication arises from the fact that connectors built with the Exchange Development Kit (EDK) maintain their own queues, and once the MTA has passed responsibility for a message to a connector by placing it onto the connector's queue, no further rerouting can take place. The IMS is the best example of how this can cause a problem. If an Internet connection is down, then all of the messages queued to the IMS that serves the connection will remain on that queue until the connection comes back up. Because manual intervention is required to force a GWART update (by increasing the cost to use the IMS that is down), and time is required to replicate the GWART to all sites, messages continue to accumulate on the queue until every site is updated.

Dynamic updates address the problem. The key advance is to propagate updates quickly so that all routing groups learn about downed connectors and messages do not have to travel to the connector only to discover that they need to be rerouted. Each routing group maintains its own LST and has a copy of the LST from every other routing group. Updates can occur literally each time a bridgehead server communicates with another bridgehead server, and the LST is updated as quickly as the bridgehead server can make a connection across port 691 to the routing master. Exchange uses the updated LST immediately as the basis for subsequent routing decisions, so message queues do not accumulate. Best of all, despite the fact that the routing group connector and SMTP connector are both SMTP based, they both allow rerouting, so even if an SMTP connection is unavailable, messages can be rerouted quickly as soon as a connector is deemed to be unavailable, assuming that an alternate path exists. This makes the routing system very efficient, because less processing is expended to route messages to their final destination. Note that an SMTP connector configured with a "*" address space is never marked as unavailable. This is logical, because such a connector is capable of handling traffic to any SMTP domain, and a failure to send to one domain (such as a.com) does not necessarily mean that the connector cannot transfer messages to other domains.

The MTA does not have the same central role in message routing in Exchange 2000/2003 as it had in previous versions. However, the MTA continues to perform this role for Exchange 5.5 servers, even in mixed-mode sites, and it is able to take advantage of the faster notification of downed connectors to make better routing decisions. Think of the new role of the MTA as the protocol interface for X.400 and the gateway (via the Store) to EDK connectors, such as those for IBM PROFS and SNADS, and the wide variety of available fax connectors. Eventually, as these connectors are phased out (and the fact that they are no longer supported by Exchange 2003 is a good signpost for the future), the MTA will gradually disappear, if Microsoft can find all the places where the MTA is referenced in the rest of the Exchange code base.

Microsoft originally intended that the Exchange routing master should automatically take over the RID master role (also known as the routing calculation master) in sites in mixed-mode organizations. The RID master is the server that builds a GWART within a site and publishes it to the other servers. Acting in this role, the routing master would be able to combine its knowledge of the routing table and GWART data provided by other Exchange 5.5 sites (obtained by replication through the site replication service and the ADC) and generate a GWART for sharing with the Exchange 5.5 servers in the site. The advantage of this mechanism is that the Exchange 5.5 servers are able to take some advantage of the dynamic nature of the LST. Exchange uses static routing to other Exchange 5.5 sites, because the GWART data from those sites remains essentially static, but the updates flowing into the link state routing table flow into the GWART generated in mixed-mode sites. One little flaw affected the plan.

Exchange 5.5 supports the concepts of subsites through locations that you assign as properties of servers. The availability of connectors can be limited by setting their scope, one of which limits a connector to the subsite or location into which you install the server. The routing master supports connector scopes, but administrative and routing groups have replaced sites, so the concept of subsites has gone away. If you use the routing master to generate the GWART, it will ignore subsite scopes, so if you use this feature in Exchange 5.5 you do not want the routing master to generate the GWART. Fortunately, you can assign any server in a mixed-mode site to act as the Exchange 5.5 routing calculation master. You set the server to act as the Exchange 5.5 routing calculation master through the Site Addressing properties of the site object, as shown in Figure 9.24.

Figure 9.24: Exchange 5.5 routing calculation.

There is obviously a big difference between the way the MTA handles address spaces in the GWART and the new link state routing mechanism. If you have deployed connectors that limit their scope to a single site, you may find that you need to review the connectors before you start to deploy Exchange 2000/2003 servers.

9.7.1 Routing, retries, and updates

To explain what happens when a failure occurs on the network and how Exchange updates the LST, we can use the LST data outlined in Table 9.5 to follow the path of a message generated on a server in the Dublin routing group sent to a mailbox on a server in the Boston routing group. Figure 9.25 shows how the routing groups are connected. Network connectivity is similar to the type of links used in large corporate deployments. The network is organized into a series of hubs (New York, London, and Copenhagen), with the major links between New York and London and London and Copenhagen. A backup transatlantic link is available between Copenhagen and New York, but it is costed to prevent traffic from going across the link unless no other route is available.

Figure 9.25: Link state routing and network outages.

The message starts by routing to the bridgehead server in the Dublin routing group. Exchange automatically generates a direct SMTP link to route the message from the originating server to the bridgehead, which then attempts to create a connection to the bridgehead server in the London routing group. After London successfully receives the message, Exchange analyzes the address information in the message header to determine how to route the message for the next hop to New York. London then attempts to open a connection to a bridgehead server in New York, but the attempt fails because of a network outage. If there are multiple bridgehead servers defined for New York, the London bridgehead will attempt to open a connection to each. All attempts fail.

The London bridgehead now goes into a "glitch-retry" state. This means that the server has recognized that a problem exists but will try to establish a connection in 60 seconds in case the fault is temporary. After 60 seconds, an event fires to tell the server to try again. Exchange attempts to contact each bridgehead server in New York but fails due to a continuing network problem. The London bridgehead goes through the "glitch-retry" sequence three times before applying the retry schedule set on the SMTP virtual server. The messages that caused the retry are rerouted as soon as a problem is detected and do not have to wait for the "glitch-retry" sequence to finish. The connection is then marked as "down," and the bridgehead server informs the routing group master in London about the problem.
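The retry sequence can be sketched as a small state machine. The 60-second interval and the three passes through the "glitch-retry" sequence come from the description above; reading that as "the initial failure plus three failed retries" is an interpretation, and the 900-second figure used for the SMTP virtual server retry schedule is purely a placeholder. The `ConnectorMonitor` class is hypothetical and does not correspond to any Exchange interface.

```python
from enum import Enum

GLITCH_RETRY_SECONDS = 60   # from the text
GLITCH_RETRY_LIMIT = 3      # from the text


class LinkState(Enum):
    UP = "up"
    GLITCH_RETRY = "glitch-retry"
    DOWN = "down"


class ConnectorMonitor:
    def __init__(self, schedule_seconds: int = 900):
        # Assumption: 900 is a placeholder for the retry schedule
        # configured on the SMTP virtual server.
        self.state = LinkState.UP
        self.failed_retries = 0
        self.schedule_seconds = schedule_seconds

    def on_attempt(self, connected: bool):
        """Record a connection attempt; return seconds until the next retry, or None."""
        if connected:
            self.state = LinkState.UP
            self.failed_retries = 0
            return None
        if self.state is LinkState.UP:
            # First failure: assume the fault may be temporary.
            self.state = LinkState.GLITCH_RETRY
            return GLITCH_RETRY_SECONDS
        if self.state is LinkState.GLITCH_RETRY:
            self.failed_retries += 1
            if self.failed_retries < GLITCH_RETRY_LIMIT:
                return GLITCH_RETRY_SECONDS
            # Retries exhausted: mark the link down. At this point the
            # bridgehead would inform the routing group master.
            self.state = LinkState.DOWN
        return self.schedule_seconds
```

The sketch tracks only the state of the link. Rerouting of the queued messages is a separate matter and, as noted above, does not wait for the sequence to finish.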

After receiving the update, the routing group master updates its LST and sends updates to all of the other servers in the London routing group. The bridgehead server consults the updated LST (Table 9.5) and decides that an alternative, higher-cost route is available via Copenhagen. The server then sends link state updates to the other routing groups via the ESMTP X-LINK2STATE command to inform them that the London to New York link is currently unavailable. The update occurs before an attempt is made to send any other messages to prevent servers in the Dublin, Frankfurt, Paris, Copenhagen, and Stockholm routing groups from attempting to send messages to London for onward processing. The routing groups that receive the link state update compare the version number on the update against the data held in their own tables. If the version number is higher, the update is applied and a new LST is created for the routing group.
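The version comparison is simple enough to sketch. In the example below, a receiving routing group keeps one entry per remote routing group and replaces it only when an incoming update carries a higher version number. The `GroupLinkState` class and the `apply_update` function are invented for the illustration and say nothing about how Exchange stores the LST internally.

```python
from dataclasses import dataclass


@dataclass
class GroupLinkState:
    """One routing group's view of its own links, as in Table 9.5."""
    version: int
    links: dict  # destination routing group -> (state, cost)


def apply_update(table: dict, group: str, update: GroupLinkState) -> bool:
    """Apply the update if it is newer than what we hold; return True if applied."""
    current = table.get(group)
    if current is not None and update.version <= current.version:
        return False  # stale or duplicate information is ignored
    table[group] = update
    return True


if __name__ == "__main__":
    lst = {"London": GroupLinkState(124, {"New York": ("Up", 10)})}
    newer = GroupLinkState(125, {"New York": ("Down", 10)})
    print(apply_update(lst, "London", newer))   # applied
    print(apply_update(lst, "London", newer))   # same version, ignored
```

Because only the routing group master increments the version number, a higher number always means newer information about that routing group.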

Connectors are one way, so the London routing group first detects the problem when it attempts to send a message. At the other side of the Atlantic, a message sent to London will prompt the bridgehead server in New York to go through the same discovery process and update its own routing group master with a "down" status. Exchange then publishes the updated link state information to servers in the New York, Boston, and San Francisco routing groups, which proceed to update their copies of the LST.

The link between New York and Copenhagen becomes the preferred transatlantic connection until a bridgehead server in either London or New York determines that the link between the two routing groups is now available. The retry schedule on the SMTP virtual server determines when attempts are made to investigate the current status of the connector, and as soon as a connection is successful, the link is marked as "up," in which case a series of LST updates begins again to inform all routing groups that the connection is back and available for routing.

9.7.2 Looking at routing information

Short of trawling through memory and making some excellent guesses about what you find there, there is no out-of-the-box way of getting a detailed view of the LST on a server. The WinRoute tool provides the best insight (see section 9.11.1), and the information displayed through the Status node in the Monitoring and Status section of ESM is the next best thing. As Figure 9.26 shows, the Status option lists all the servers in the routing group that a server is connected to plus their status. In this case, our server (HPQNET-DC9) provides a view of the current routing environment as it sees it. We can see that the HPQEU-DC4 and HPQEU-DC24 servers are unavailable for some reason, possibly because the set of Exchange services is not running. In any case, Exchange cannot route messages to these servers now.

Figure 9.26: Taking a snapshot of the routing environment.

The top portion of the display shown in Figure 9.26 lists all the connectors available to the servers. Lack of attention to naming conventions has made the output in Figure 9.26 a real mess, and there is no immediate indication of the purpose some of the connectors serve. For example, what is the difference between the "SMTP to Internet" and "External SMTP" connectors? We also seem to have two connectors for "Netherlands - Ireland" (albeit in different routing groups). However, a little work to apply a consistent naming convention and remove any unnecessary connectors will create a much more informative view.

Details of connectors are stored as AD objects, and you can rename them at any time. A short delay occurs before the rename is effective and displayed in the status window. This is because ESM keeps a cache of configuration data to stop it from having to go back to the AD each time it repaints a window. Within ten minutes of renaming your connectors, the new names should appear in the list, which is shown in Figure 9.27. By simply following a naming convention, we improve the information that an administrator sees at a glance, because all the connectors have names identifying their location and purpose. You do not have to use a strict naming convention, and it is possible to rename the connectors after creation by clicking on the name within the routing group and typing in a new name. Nevertheless, if you do not adopt a naming convention from the start, administrators will forget and will not go back and clean up the names afterward, which may result in confusion later on. According to Murphy's 233rd law of computing, that confusion will inevitably occur during a crisis, just as you are trying to debug an onerous routing problem.

Figure 9.27: Improved naming conventions.

If you are unsure about the route that Exchange is currently using to send messages, select a message that was recently sent between two routing groups and examine the message header. All of the servers that handled the message en route are captured in the header. Figure 9.28 illustrates the point. In this instance, we are using Outlook Express to examine the properties of a message that had some delivery problems. The "details" tab of the properties reveals the route information, and it is often easier to follow this data by clicking the "Message Source" button to view the complete message in a resizable window.

Figure 9.28: Examining the headers of a message.
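If you save the message source to a file, you can also extract the route with a few lines of script. The sketch below uses Python's standard `email` package to pull out the Received headers and list them in the order in which the servers handled the message. The sample message and its host names (`bh-dublin.example.com` and so on) are made up for the illustration, and `trace_route` is a hypothetical helper name.

```python
from email import message_from_string

# Made-up message source; the host names are placeholders.
RAW_MESSAGE = (
    "Received: from bh-london.example.com by bh-newyork.example.com;"
    " Mon, 1 Jan 2001 10:00:05 +0000\r\n"
    "Received: from bh-dublin.example.com by bh-london.example.com;"
    " Mon, 1 Jan 2001 10:00:02 +0000\r\n"
    "Subject: Routing test\r\n"
    "\r\n"
    "Message body\r\n"
)


def trace_route(raw_message: str) -> list:
    """Return the Received headers, earliest hop first."""
    message = message_from_string(raw_message)
    hops = message.get_all("Received", [])
    # Each server adds its header at the top, so reverse for chronological order.
    return [" ".join(hop.split()) for hop in reversed(hops)]


if __name__ == "__main__":
    for number, hop in enumerate(trace_route(RAW_MESSAGE), start=1):
        print(number, hop)
```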

^[8] . LSA is the protocol currently used by Exchange, and Microsoft may propose it in the future as the basis of an implementation for link state updates between SMTP servers to the IETF. If this proposal is accepted and turned into a formal RFC, it's likely that LSA will continue to be used in future releases. However, if another protocol is defined, it can be inserted instead of LSA. The X-LINK2STATE command is implemented as an ESMTP extension. See Figure 9.2 for a list of the ESMTP commands supported by Exchange 2000/2003.

^[9] . The WinRoute utility shows the actual link state data transmitted between servers.
