Designing Your Directory Replication System

To maximize the reliability and performance of your directory service through replication, spend some time planning for both your current requirements and future expansion. To begin your planning, gather the following information:

  • A network map of your organization, showing all locations and the types of network connectivity between them. Label your network map in terms of zones of high connectivity and low connectivity, and label the reliability of long-haul network links.

  • An inventory of any directory-enabled applications you are running or intend to run. For example, LDAP-enabled applications such as Sun ONE Messaging Server and Enterprise Web Server use the directory for authentication and access control. If you know the physical locations of these applications and clients, note them on your network map. If you don't yet have a plan for deploying these servers and clients, that's OK; estimate as well as you can and revisit your assumptions as you gain experience through the piloting phase of your deployment.

Each vendor offers deployment and planning guides that help you plan the optimal replication design for your environment, and you should consult those guides. However, a few general principles apply to all replication designs:

  • Understand your zones of high connectivity and high network reliability, and how they are connected.

  • Place enough replicas in each high-connectivity zone to handle the load in that zone.

  • Try to minimize traffic over the links that connect the zones.

As we work through the design process, realize that it's easy to overengineer a replication solution. Think about airplanes: Very few have ten engines; most commercial jets these days have two because that's been found to be an optimal number, given fuel costs, plane sizes, and engine reliability. Now think about your directory: Although it's certainly possible to put three redundant directory servers on every Ethernet segment in an office building (which would definitely increase your directory's reliability), don't forget that someone has to set up and manage all those replicas!

In an office in which all workstations and servers have 10Mbps or better connectivity, reliability is probably so good that it's unnecessary to worry about network failures. Even if that's not a valid assumption, it's probably still cheaper to fix any network problems than it is to set up and manage many directory replicas. On the other hand, if your organization is spread over a wide geographical area and your sites are linked by slower, less-reliable network links, you should definitely pay close attention to how you allocate replicas across your network.

Finally, view this design process as iterative. Make a pass through it, thinking about maximizing reliability and performance. As your solution evolves, you may find that you want to revisit a design decision. In addition, as you think about replication, you may want to revisit some of your design decisions about directory topology and even your namespace design. Don't be afraid to do this; it will be time well spent.

Designing for Maximum Reliability

When you design for maximum reliability, you make your directory impervious to the failure of a single directory server. Then if one of your servers fails, directory clients can use another replica to obtain their directory services.

How exactly do directory clients deal with the failure of a particular server? LDAP client applications are responsible for detecting the failure of their primary server and reconnecting to an alternate server. At present, there is no standard method for locating an alternate server that can provide service, so it's useful to ask the supplier of an LDAP-based application how this is handled.

For example, client applications that use the Netscape LDAP C software development kit (SDK) can provide multiple server names when establishing a connection; if a given server is unavailable, the SDK tries another. Another option is to use a hardware failover device such as Cisco Systems' LocalDirector or Nortel Networks' Alteon Link Optimizer, which can balance client load across multiple servers. Such a device can also detect when a server has failed and avoid directing clients to that server until it is returned to service. Of course, the hardware failover device itself can fail, as can any of the network devices connecting it to the directory servers and the network. A complete solution for high availability makes all components redundant, including servers, load balancers, network switches, network routers, and power supplies. Whether the cost of this level of redundancy is justified depends on your particular needs.
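To make the client-side failover idea concrete, here is a minimal sketch in Python using the open-source ldap3 library (an assumption; the example above uses the Netscape LDAP C SDK). The host names, credentials, and search parameters are placeholders.

    # A minimal failover sketch in Python using the ldap3 library.
    # All host names, credentials, and the search below are placeholders.
    from ldap3 import Server, ServerPool, Connection, ROUND_ROBIN

    # List every replica that can answer this client's queries.
    servers = [Server("ldap1.example.com", port=389),
               Server("ldap2.example.com", port=389)]

    # exhaust=60 takes a failed server out of rotation for 60 seconds;
    # active=True keeps retrying until a reachable server is found.
    pool = ServerPool(servers, ROUND_ROBIN, active=True, exhaust=60)

    # If ldap1 is down, the connection is made to ldap2 instead.
    conn = Connection(pool, user="cn=Directory Manager",
                      password="secret", auto_bind=True)
    conn.search("dc=example,dc=com", "(uid=someuser)",
                attributes=["cn", "mail"])
    print(conn.entries)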

To maximize reliability, place within each major zone of your network at least two replicas connected via a high-speed (10Mbps or better) network link. (If you have a single well-connected network, you have only one zone to worry about.) For example, if your network comprises a single set of buildings connected by high-speed fiber-optic links, you might choose to deploy two replicated servers. If either server fails, client requests will be handled by the remaining server (see Figure 11.17).

Figure 11.17. Multiple Replicas Located at a Site

Suppose now that your organization consists of a central office with a series of remote offices connected by slower, 56Kbps network connections. In this case you might choose to place a replica at each remote site (see Figure 11.18) so that you can avoid wasting your scarce network bandwidth on LDAP client traffic.

Figure 11.18. Replicas Placed to Limit Client Traffic on WAN Links

Under normal circumstances, the directory clients shown in Figure 11.18 in the remote offices contact the onsite replica for directory operations. If the server fails, the remote clients can contact one of the servers in the central office. You could even place more than one replica in each remote office so that even if one of the onsite servers fails, the clients can obtain directory service without sending requests across the slower interoffice network link. If you use such a configuration, you may want to schedule replication updates to occur only during off-peak hours, if your directory server software supports that option.

Designing for Maximum Performance

When you design replication to enhance performance, you should strive to design a system that can handle your existing client load today but can be expanded easily to handle a larger load in the future.

Strive to provide a sufficient number of replica servers to handle your client load. Estimating your client load involves a bit of guesswork, but you should be able to get a good idea by understanding how often a given client makes a request and how a typical request from that client would affect the server. Then multiply this figure by the number of clients you expect to use the directory.

For example, suppose that 1,000 users will use a Web-based address book application to look up other employees in the directory. Begin by making some assumptions about how often people will use the service. (These assumptions can be verified, and adjusted if necessary, during the pilot phase of your deployment; see Chapter 14, Piloting Your Directory Service.) Let's assume that each person will perform 10 lookups during the 8 A.M.-to-5 P.M. workday; that means you should expect to see 10,000 queries in eight hours. This translates into 1,250 queries per hour, or approximately one query every three seconds. If additional knowledge leads you to believe that usage will not be uniform, but instead will have spikes during the day, consider that as well.
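Writing the arithmetic out as a small script makes the assumptions easy to revisit during the pilot. The numbers below simply restate the example above; the peak factor is an illustrative assumption.

    # Back-of-the-envelope query-load estimate for the example above.
    users = 1000                 # employees using the address book
    lookups_per_user = 10        # lookups per person per workday
    workday_seconds = 8 * 60 * 60

    total_queries = users * lookups_per_user        # 10,000 per day
    avg_rate = total_queries / workday_seconds      # ~0.35 queries/second
    print(f"{total_queries} queries/day, about one every {1 / avg_rate:.1f} seconds")

    # If usage spikes (say, most traffic lands in a quarter of the day),
    # size for the peak rather than the average.
    peak_factor = 4              # illustrative assumption
    print(f"peak estimate: {avg_rate * peak_factor:.2f} queries/second")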

Next, understand the impact that the load will have on the directory server. Are the queries typically made on indexed attributes (which perform well and place a lighter load on the server), or might some of the queries be on unindexed attributes? Will the searches typically return a single entry or many entries? Will the retrieved entries contain large attributes, or will the amount of transmitted data be small (perhaps up to a kilobyte or two)?
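When classifying the queries your applications send, it can help to look at the filters themselves. The filter strings below are illustrative only, and the attribute names are examples, not recommendations.

    # Illustrative LDAP search filters; attribute names are examples only.
    equality  = "(uid=jsmith)"            # fast if uid has an equality index
    substring = "(cn=*smith*)"            # needs a substring index to be fast
    unindexed = "(description=*budget*)"  # likely unindexed; forces a costly scan
    compound  = "(&(objectClass=person)(mail=jsmith@example.com))"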

Finally, determine how many replicated servers you will need to handle the load. You may be fortunate enough to have some data from the software vendor that tells you how many typical queries the software can service while running on standard hardware in a given time period. For example, Netscape Directory Server 6 can perform from several hundred to thousands of searches per second on a typical server-class computer, so a single server would be entirely capable of handling the client load in the previous example. If, however, you don't really know how many operations the server can perform per second, you should measure that factor when evaluating server software or piloting your directory.
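A rough sizing calculation along these lines might look like the following sketch. The per-server throughput and headroom figures are placeholders to be replaced with numbers you measure yourself.

    # Rough replica-count estimate; replace the constants with measured values.
    import math

    peak_queries_per_second = 1.4   # from the load estimate above
    measured_capacity = 500         # indexed searches/second on your hardware (placeholder)
    headroom = 0.5                  # plan to run each server at 50% of capacity

    needed_for_load = math.ceil(peak_queries_per_second /
                                (measured_capacity * headroom))
    # Keep at least two replicas for reliability, even if one could carry the load.
    replicas = max(needed_for_load, 2)
    print(f"replicas needed: {replicas}")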

Tip

When you're doing capacity planning, always take vendor-supplied performance figures with a grain of salt. Performance figures that appear in vendor data sheets reflect the performance measured by the vendor on one system with one particular set of assumptions. Performance may be affected by many other factors, including the operating system platform, amount of available memory, speed of disk drives, namespace design, complexity of access control rules, characteristics of LDAP clients, and much more. If possible, measure your systems under the types of loads you expect them to handle to get a better idea of individual server performance capabilities before planning your total number of replicas.


Consider the write performance of your directory separately from its search performance. In a replicated environment, each change made by a client needs to be made at each replica as well. For this reason, write performance in a replicated environment does not scale as well as search performance does. If your application modifies the directory frequently, you may find it advantageous to partition your directory so that each server needs to handle fewer modifications. Remember that directories are typically optimized for searching, so don't be surprised to discover that your software can perform only a few write operations per second, even if it can perform hundreds or even thousands of indexed searches per second.
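The asymmetry is easy to see with a toy model: searches divide across replicas, but every write must eventually be applied on every replica. The numbers below are purely illustrative.

    # Toy model: searches divide across replicas; writes do not.
    replicas = 6
    client_searches_per_second = 400
    client_writes_per_second = 20

    searches_per_replica = client_searches_per_second / replicas   # load is shared
    writes_per_replica = client_writes_per_second                  # applied everywhere

    print(f"each replica serves about {searches_per_replica:.0f} searches/s "
          f"but must apply all {writes_per_replica} writes/s")

    # Partitioning the tree across two sets of servers halves the writes
    # each server must apply, at the cost of a more complex topology.
    partitions = 2
    print(f"writes per server after partitioning: "
          f"{client_writes_per_second / partitions:.0f}/s")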

Another way to improve your directory's performance is to provide and tune dedicated servers for particular applications. For example, a busy Sun ONE Messaging Server imposes a heavy load on a directory server as it goes about its business of delivering mail. However, it uses only LDAP equality search filters (as opposed to substring or approximate filters). You could create a Netscape Directory Server 6 replica that is dedicated to servicing the messaging server's queries. Because you know that substring and approximate indexes are not required on that replica, you could remove them from the server's configuration, thereby improving its write performance and reducing its memory requirements.

One other consideration is how your directory clients will know that multiple servers are able to handle their requests. Some directory services handle this situation automatically via their proprietary protocols, but LDAP does not currently provide a way for a server to inform clients about its other replicas. One way to achieve load balancing in the absence of this capability is to use the Domain Name System (DNS) round-robin capability to map a given host name to all the IP addresses of hosts containing a copy of the replicated data. With DNS round-robin, the DNS server reorders the list of Internet Protocol (IP) addresses each time it responds to a query for the host name. In this way, client load can be divided among any number of servers. As previously mentioned, it is also possible to use a hardware load-balancing and failover device such as Cisco Systems' LocalDirector or Nortel Networks' Alteon Link Optimizer to distribute load across multiple servers.
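A quick way to see what DNS round-robin gives a client is to resolve the shared host name and look at the addresses behind it. The host name below is a placeholder.

    # Resolve a (placeholder) shared host name and list the replica addresses
    # behind it. With round-robin DNS, the order varies from query to query.
    import socket

    addresses = {info[4][0]
                 for info in socket.getaddrinfo("ldap.example.com", 389,
                                                proto=socket.IPPROTO_TCP)}
    print("replicas behind this name:", sorted(addresses))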

Using DNS round-robin to distribute clients across a set of replicas has a major drawback, however. If one of the directory servers fails, the DNS server will continue to direct clients to it. If the server will be out of commission for an extended period of time, you will need to remove the server's IP address from the DNS until the server is brought back online. Hardware load balancers typically notice when a server fails, and they stop directing clients to it until the server comes back online.
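A client can partially compensate for this drawback by probing an address before using it. The sketch below, a plain TCP connection attempt to the LDAP port, is a generic illustration rather than a feature of any particular directory product, and the addresses are documentation placeholders.

    # Return the first address that accepts a TCP connection on the LDAP port,
    # or None if none of them do.
    import socket

    def first_reachable(addresses, port=389, timeout=2.0):
        for addr in addresses:
            try:
                with socket.create_connection((addr, port), timeout=timeout):
                    return addr
            except OSError:
                continue
        return None

    print(first_reachable(["192.0.2.10", "192.0.2.11"]))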

Other Considerations

In the discussion of reliability and performance, we've focused primarily on where to put replicas. However, we haven't considered some other factors that may be important when you're designing your replication system.

First you need to consider the maximum number of replicas your software can gracefully handle. This number is highly dependent on the software in use and the number of updates that your system receives. In general, it's a good idea to try to limit the number of replicas supplied by a single server to somewhere between five and ten. If you try to manage a larger number of replicas, the servers may spend so much time propagating updates that they are unable to answer client requests in a timely fashion. If your directory sees very few modifications, you can probably use more replicas; if your directory handles many modifications, you may need to use fewer replicas.

What if you find that your directory service is bogged down with synchronization traffic? One option is to partition your directory tree among a larger number of servers. When you do this, each server needs to handle fewer update requests and therefore needs to send fewer updates to replicas. Of course, the total amount of network traffic would still be the same (in fact, it might actually increase somewhat because of the partition management overhead of some directory systems), so if you find that your network is the bottleneck, a network upgrade may be in order. In practice, however, the network is rarely the bottleneck. More often the limiting factors are server CPU, disk I/O, and memory usage.

Another option for reducing the replication burden on a master server is to use a cascaded replication configuration. In a cascaded configuration, a change propagates from a supplier to a small number of consumers, and then from each of those consumers to a larger number of consumers, and so on until all replicas have been updated (see Figure 11.19). This approach lengthens the time it takes for a given change to propagate to all replicas, but it does make it possible to feed a larger number of replicas. Your directory server software may or may not allow this type of configuration, so consult your documentation.

Figure 11.19. Cascaded Replication
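A small calculation shows why cascading helps; the fan-out and tier counts below are illustrative, not recommendations.

    # With a fan-out of 5 consumers per supplier, two tiers of cascading reach
    # far more replicas than a single master feeding everyone directly.
    fan_out = 5      # replicas fed directly by any one server (illustrative)
    tiers = 2        # levels of consumers below the master

    total = sum(fan_out ** level for level in range(1, tiers + 1))
    print(f"replicas reachable with {tiers} tiers: {total}")   # 5 + 25 = 30

    # The trade-off: a change may take up to `tiers` replication hops to reach
    # the outermost replicas instead of one.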

The second factor to consider is the overhead associated with managing a complex replication system. What if a replica goes down? How difficult is it to bring it back up? How do you monitor the system to be sure it's working properly? How difficult is it to find out whether a user's complaint results from a replication problem? In general, use of the KISS principle (keep it simple, stupid) is a wonderful idea: The simpler you can make the replication configuration, the better off you'll be. It'll be easier to troubleshoot, simpler to fix, and probably more reliable overall. If your boss is unimpressed because the system looks too simple, you can describe it as "elegant." That usually works.

Choosing Replication Solutions

If you need to provide a reliable, high-performance directory service, you should evaluate the software you're considering in terms of its replication capabilities. The following are some general questions to ask when performing this evaluation:

  • Does the software support incremental updates, in which only the minimum set of changes needed to bring a replica into synchronization is sent?

  • Does the software support single-master replication, multimaster replication, or both?

  • How do client applications behave when a replica becomes unavailable? Do they automatically select a new server?

  • How easy is it to replace a master server that has failed?

  • How easy is it to manage replication? Does the software provide tools for determining the state of replication and whether replicas are up-to-date?

  • Does the software support the features you need? Does it, for example, allow replication to be scheduled at a particular time of day?

  • Can the software send updates securely (for example, in encrypted form) to prevent network eavesdropping?

More information about selecting appropriate directory server software can be found in Chapter 13, Evaluating Directory Products.