Understanding and Deploying LDAP Directory Services > 10. Replication Design > Replication Concepts
Replication Concepts

Before we dive into designing our replication system, we should spend some time understanding the basic issues concerning directory replication. These issues are as follows:

- Suppliers, consumers, and replication agreements
- The unit of replication
- Consistency and convergence
- Incremental and total updates
- Initial population of a replica
- Replication strategies, including conflict resolution
Each issue is discussed in the following sections.

Suppliers, Consumers, and Replication Agreements

In replication systems, we use the terms supplier and consumer to identify the source and destination of replication updates, respectively. A supplier server sends updates to another server; a consumer server accepts those changes. These roles are not mutually exclusive: a server that is a consumer may also be a supplier. The configuration information that tells a supplier server about a consumer server (and vice versa) is termed a replication agreement. This configuration information typically includes the unit of replication (discussed next), the hostname and port of the remote server, and other information about the replication to be performed, such as scheduling information. In other words, the replication agreement describes what is to be replicated, where it is to be sent, and how it will be done.

The Unit of Replication

When we talk about replication, we need some common language to describe what is to be replicated. In an abstract sense, we are interested in specifying:

- The subtree of the directory to be replicated
- Which entries within that subtree are to be replicated
- Which attributes of those entries are to be replicated
A natural way to describe a set of entries to be replicated is to specify the distinguished name (DN) at the top of a subtree and replicate all entries subordinate to (below) it (see Figure 10.4).

Figure 10.4 Replicating an entire subtree.

In Figure 10.4, the complete subtree rooted at ou=Accounting, dc=airius, dc=com is being replicated. Virtually all directory server implementations support this ability to specify that a complete subtree is to be replicated. This subtree usually corresponds to a directory partition, as described in Chapter 9.

We might be interested in selecting only certain entries from a subtree. A reasonable thing to do would be to select entries based on their object class. For example, we might want to replicate only those entries that represent people or organizational units (see Figure 10.5).

Figure 10.5 Replication of selected entries.

In Figure 10.5, the root of the replicated subtree is once again ou=Accounting, dc=airius, dc=com, but only organizationalUnit and person entries are being replicated. The X.500 standards define this ability as the specification filter component of the unit of replication.

One complication that can arise from selecting only certain entries is that the replicated directory may contain "holes." In the example depicted in Figure 10.5, if entries of object class organizationalUnit had not been selected, the replicated tree would look like the one shown in Figure 10.6. To be a valid directory tree, every entry except the root entry must have a parent; however, the consumer's directory tree violates that rule. To remedy this situation, the supplier could create on the consumer a placeholder in place of the entry that was not replicated. The X.500 model describes a specific type of placeholder, termed a glue entry, used for just this purpose.
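To make the glue-entry idea concrete, here is a minimal sketch of filtered subtree replication. The data model (a dict mapping DNs to entries) and the function names are invented for illustration; real servers implement this internally, and the DN parsing here is naive (it ignores escaped commas).

```python
# Sketch: select entries by objectClass, then patch any "holes" with
# glue entries so every replicated entry has an ancestor chain.

def parent_dn(dn):
    """Return the parent DN, e.g. 'cn=a,ou=b,dc=c' -> 'ou=b,dc=c'."""
    parts = dn.split(",")
    return ",".join(parts[1:]) if len(parts) > 1 else None

def filtered_replica(entries, root, wanted_classes):
    """Select entries under `root` matching the specification filter,
    adding glue entries wherever selected ancestors were filtered out."""
    replica = {}
    for dn, entry in entries.items():
        if dn.endswith(root) and entry["objectClass"] & wanted_classes:
            replica[dn] = entry
    # Patch holes: every entry needs a chain of parents back to the root.
    for dn in list(replica):
        p = parent_dn(dn)
        while p and p != root and p not in replica:
            replica[p] = {"objectClass": {"glue"}}  # placeholder entry
            p = parent_dn(p)
    if root not in replica:
        replica[root] = {"objectClass": {"glue"}}
    return replica

entries = {
    "ou=Accounting,dc=airius,dc=com": {"objectClass": {"organizationalUnit"}},
    "ou=Widgets,ou=Accounting,dc=airius,dc=com": {"objectClass": {"organizationalUnit"}},
    "cn=John Doe,ou=Widgets,ou=Accounting,dc=airius,dc=com": {"objectClass": {"person"}},
}
# Replicate only person entries, as in the Figure 10.6 scenario: the
# organizational units become glue entries on the consumer.
replica = filtered_replica(entries, "ou=Accounting,dc=airius,dc=com", {"person"})
```

Because only the person entry matches the filter, both organizational units end up as glue placeholders rather than disappearing and leaving John Doe's entry orphaned.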
Figure 10.6 A hole in the directory information tree (DIT) arising from filtered replication.

In addition to selecting only certain types of entries for replication, we might want to replicate only certain attributes. For example, when providing a publicly searchable directory of employee information outside a corporate firewall, an organization might elect to replicate only full names, email addresses, and office telephone numbers and omit all other personal information. Notice in Figure 10.7 how the copy of John Doe's entry accessible outside the firewall contains fewer attributes than the master entry inside the firewall. The X.500 standards define this as the attribute selection component of the unit of replication.

Figure 10.7 Replicating only selected attributes from an entry.

Consistency and Convergence

Consistency describes how closely the contents of replicated servers match each other at a given point in time. A strongly consistent replica is one that provides the same information as its supplier at all times; that is, a change made on one server is not visible to any other client until it has been propagated to and acknowledged by all replicas. On the other hand, a weakly consistent replica is permitted to diverge from its supplier for some period of time. For example, Figure 10.8 shows that there is a period of time after a supplier has been updated but before the update has been propagated to a replica; during that time the supplier and the replica contain different data.

Figure 10.8 Weakly consistent replicas.

We say that a supplier and a replica have converged when they contain the same data. It is important that replication systems eventually converge over time so that all clients see the same view of the directory. In a directory system that uses weakly consistent replication, directory clients should not expect their updates to be immediately reflected in the directory.
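Attribute selection amounts to projecting each entry onto an allowed attribute set before sending it to the consumer. The following sketch assumes a hypothetical policy and helper name; the attribute names match the firewall example above.

```python
# Sketch of attribute selection: strip an entry down to the attributes
# permitted outside the firewall. PUBLIC_ATTRS is an assumed policy.

PUBLIC_ATTRS = {"cn", "mail", "telephoneNumber"}

def select_attributes(entry, allowed=PUBLIC_ATTRS):
    """Return a copy of the entry containing only allowed attributes."""
    return {attr: values for attr, values in entry.items() if attr in allowed}

master_entry = {
    "cn": ["John Doe"],
    "mail": ["jdoe@airius.com"],
    "telephoneNumber": ["+1 408 555 1212"],
    "homePhone": ["+1 408 555 9999"],   # omitted from the public replica
    "salary": ["50000"],                # omitted from the public replica
}
public_entry = select_attributes(master_entry)
```

The replica outside the firewall receives only the three public attributes; the personal ones never leave the master.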
For example, a directory application should not expect that it can update an entry and then immediately read it back and obtain the updated values.

It may come as a surprise that all practical directory systems use weakly consistent replicas. Why? The answer has to do with performance. Imagine that a single supplier feeds three replicas, and that each of the replicas handles a large client load of search requests. If the supplier maintains strong consistency with its replicas, it must send a change to each replica and receive a positive acknowledgment before returning a result to the client that sent the change. Because each replica is heavily loaded, it may be slow in sending the result to the supplier. The supplier can therefore return a result to the client no faster than the slowest replica acknowledges the update. This can reduce performance unacceptably.

Additionally, implementing strong consistency among replicas requires that replicas support a two-phase commit protocol. This is necessary so that the supplier server can back out an update if any of the consumers should fail to acknowledge the change. The supplier would then return an error code to the client, and the client would presumably retry the operation later. This means that all consumer servers must be functional for a supplier server to accept a change, which is undesirable.

In addition to its lower performance, strong consistency is incompatible with scheduled replication, an advanced feature we'll discuss later in this chapter. Briefly, scheduled replication permits updates to be deferred to some particular window in time, perhaps the middle of the night. Because a strongly consistent system requires that updates be propagated immediately, it is essentially at odds with scheduled replication.
Given all these challenges, weakly consistent replication systems are much easier to implement and provide better performance, at the expense of temporary inconsistencies between supplier and replica servers. For virtually all directory applications, this is perfectly acceptable and represents a well-informed compromise on the part of directory designers.

Incremental and Total Updates

To bring two servers into synchronization, we might choose either to completely replace the contents of the consumer server or to transmit only the minimum information necessary to bring the servers into synchronization. The former approach, termed a total update in X.500 parlance, is useful when initially creating a replica (you'll learn more about this creation operation later in this chapter). It is very inefficient, however, to always use a total update strategy when updating consumer servers because all entries are transmitted even if they have not been modified.

In an incremental update, only the changes made to the supplier's directory are sent to the consumer server. For example, if a directory client modifies an entry by replacing its description attribute, it's necessary to perform only that same change on all replicas to bring them into synchronization. It's not necessary to send the entire entry, and it's certainly not necessary to transmit the entire contents of the database to all replicas. Incremental updates are much more efficient, and all widely used LDAP directory server software supports them.

Note: If a replica's directory tree is in some unknown state (perhaps it has been damaged or reloaded from an extremely out-of-date backup), it may be desirable to wipe out any existing contents and perform a total update. This is also what is done when a replica is initially populated with data.
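The difference between the two strategies can be sketched in a few lines. The change-record format below is invented for illustration (it loosely resembles an LDIF modify record); the point is that an incremental update touches only what changed, while a total update retransmits everything.

```python
# Sketch: applying one incremental change record to a replica, versus
# replacing the replica's contents wholesale (a total update).

def apply_change(replica, change):
    """Apply one change record of the form (op, dn, attr, values)."""
    op, dn, attr, values = change
    if op == "replace":
        replica[dn][attr] = values
    elif op == "delete":
        del replica[dn]
    return replica

def total_update(replica, supplier_contents):
    """Wipe the replica and copy everything from the supplier."""
    replica.clear()
    replica.update(supplier_contents)
    return replica

replica = {"cn=John Doe,dc=airius,dc=com": {"description": ["old text"]}}
# Incremental: only the changed attribute crosses the wire.
apply_change(replica, ("replace", "cn=John Doe,dc=airius,dc=com",
                       "description", ["new text"]))
```

One change record is enough to converge the replica; no other entries or attributes need to be sent.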
To better understand how the incremental update process works, let's look at the process from a general view, and then we'll examine how real-world directory services perform incremental updates. Following is an outline of the incremental update process:

1. The supplier determines which changes have been made since the consumer was last updated, using some bookmark or timestamp maintained for that consumer.
2. The supplier connects to the consumer and sends those changes.
3. The consumer applies the changes to its copy of the directory data.
4. The supplier records the point up to which the consumer has been updated, in preparation for the next replication session.
In this way, a supplier transmits only the minimum number of updates necessary to bring the consumer server into synchronization. To provide some more concrete examples, let's examine how two popular directory services, Netscape Directory Server and Novell Directory Services (NDS), incrementally update a consumer.

The Netscape Directory Server Update Process

The Netscape Directory Server updates consumers by replaying the changes it receives. For example, if a client connects to a Netscape Directory Server and adds a new entry, the supplier connects to all of its consumers and adds the same entry. Each change, when received by the supplier, is assigned a unique changenumber; this is then logged to a changelog, a database that records all changes made to the server. The supplier keeps track of the changes it has replayed to each consumer by storing in the consumer's directory tree the number of the last change applied. Figure 10.9 illustrates the Netscape Directory Server update process.

Figure 10.9 The Netscape Directory Server update process.

The Netscape Directory Server performs the following steps when incrementally updating a replica:

1. The supplier connects and binds to the consumer server.
2. It reads from the consumer's directory tree the number of the last change applied to that consumer.
3. It selects from its changelog all changes with greater changenumbers.
4. It replays each of those changes, in order, to the consumer.
5. It updates the changenumber stored on the consumer to reflect the last change applied.
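The changelog scheme described above can be sketched as follows. The class and attribute names are invented for illustration, not the Netscape Directory Server's actual interfaces; the essential ideas are the monotonically increasing changenumber and the per-consumer bookmark.

```python
# Sketch of changelog-based replay, loosely modeled on the scheme
# described above (a minimal sketch, not a real replication protocol).

import itertools

class Supplier:
    def __init__(self):
        self.changelog = []                 # list of (changenumber, change)
        self._counter = itertools.count(1)  # unique, ever-increasing numbers

    def record(self, change):
        """Log a change received from a client, assigning a changenumber."""
        self.changelog.append((next(self._counter), change))

    def replay_to(self, consumer):
        """Send only changes newer than the consumer's bookmark."""
        for num, change in self.changelog:
            if num > consumer.last_applied:
                consumer.apply(num, change)

class Consumer:
    def __init__(self):
        self.applied = []
        self.last_applied = 0   # bookmark stored in the consumer's tree

    def apply(self, num, change):
        self.applied.append(change)
        self.last_applied = num

supplier, consumer = Supplier(), Consumer()
supplier.record("add cn=A")
supplier.record("modify cn=B")
supplier.replay_to(consumer)   # first session: both changes replayed
supplier.record("delete cn=A")
supplier.replay_to(consumer)   # second session: only the new change
```

Because each replay starts from the consumer's stored bookmark, no change is ever sent twice, and a consumer that was offline for a while simply catches up from wherever it left off.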
The Novell Directory Services Update Process

NDS servers track updates by storing along with each attribute a timestamp that indicates when that attribute was last updated. To determine which updates need to be applied to a consumer server, an NDS supplier locates all attribute values whose timestamp is greater than the last update timestamp for the consumer. An NDS server updates a consumer server's copy of a directory partition by sending any attributes that have changed since the last replication session. The timestamp of the last update is stored in the Synchronize Up To vector (or SynchUpTo vector) on the consumer server, and it is retrieved by the supplier server at the beginning of each replication session. The NDS update process is shown in Figure 10.10.

Figure 10.10 The NDS update process.

An NDS server performs the following steps when sending changes to another NDS server:

1. The supplier contacts the consumer and retrieves its SynchUpTo vector.
2. It scans the partition for attribute values with timestamps later than the vector indicates.
3. It sends those attribute values to the consumer, which applies them.
4. The consumer's SynchUpTo vector is updated to reflect the new synchronization point.
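The timestamp-based (state-based) approach can be sketched like this. The data model, with each attribute stored alongside its timestamp, is hypothetical and the timestamps are simplified to plain integers; real NDS timestamps are structured values, as discussed later in this chapter.

```python
# Sketch of NDS-style synchronization: only attribute values stamped
# after the consumer's SynchUpTo value are collected and sent.

def changes_since(partition, synch_up_to):
    """Collect (dn, attr, value) for attributes stamped after synch_up_to."""
    updates = []
    for dn, attrs in partition.items():
        for attr, (value, stamp) in attrs.items():
            if stamp > synch_up_to:
                updates.append((dn, attr, value))
    return updates

partition = {
    "cn=John Doe,o=airius": {
        "telephoneNumber": ("+1 408 555 1212", 1005),  # changed recently
        "title": ("Engineer", 900),                    # unchanged
    },
}
# The consumer last synchronized at timestamp 1000, so only the phone
# number needs to be sent.
pending = changes_since(partition, 1000)
```

Note that however many times an attribute changed between sessions, only its final state is transmitted, which is exactly the advantage over changelog replay discussed next.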
As you can see, the update processes for Netscape Directory Server and NDS are quite similar. The main difference is in how the updates themselves are stored on the supplier. Netscape Directory Server stores a record of each change in the changelog as it is received and processed, and it replays these changes to consumer servers. NDS (and Microsoft Active Directory as well) places a unique, ever-increasing number on each changed attribute value and sends updates as a series of attribute values to be applied on the consumer servers.

The changelog approach has the advantage that no special action needs to be taken when an entry is deleted, renamed, or moved; the changelog simply records the operation that the client performed. NDS and Active Directory, on the other hand, must create a placeholder entry (called a tombstone or obituary) that records the previous location of the entry and any associated timestamp or sequence number values. The main disadvantage of the changelog approach is that it records all changes, even when the same attribute of the same entry is modified multiple times. If the supplier simply replays all changes in order, as is typically done, more changes might be transmitted than necessary. The approach used by Active Directory and NDS requires just one update that reflects the final state of the attribute.

Initial Population of a Replica

When a consumer server is initially configured, it contains no data. The replica must somehow be populated with a consistent snapshot of the supplier's data so that it can subsequently be kept in synchronization. Or, in the event that a consumer server has become damaged, the consumer must be brought back into synchronization, usually by removing the damaged data and creating a fresh copy of the directory data from the supplier.

Note: Be sure that the replica does not attempt to service requests until it has been completely initialized.
Were it to begin servicing requests before being completely populated, it might give erroneous results. For example, it might claim that a given entry does not exist when in fact it has not yet received the entry from the supplier. Virtually all directory server software automatically arranges for a replica to be offline during replica initialization. The replica typically issues a referral to the master server or chains the operation to the master.

How is replica initialization performed? Directory vendors accomplish this task using various methods, although all are similar. X.500's Directory Information Shadowing Protocol (DISP) supports a total update strategy while synchronizing, which allows a supplier server to completely repopulate a unit of replication on the consumer. (An X.500-compliant server from one vendor should, in theory, be able to reinitialize a consumer server from another vendor.) NDS uses a proprietary protocol for all replication operations, including creation of a replica. Netscape Directory Server 3.0 uses LDAP itself to initialize a replica, sending a series of delete operations to remove undesired entries and a series of add operations to populate the directory.

Replication Strategies

The term replication strategy refers to the way updates flow from server to server and the way servers interact when propagating updates. After a client has successfully modified, deleted, added, or renamed an entry, how does the server that received the change make it visible on all the other replicated servers? There are three main approaches to solving this problem: single-master replication, floating-master replication, and multi-master replication.

In single-master replication, there is one (and only one) server that contains a writable copy of a given directory entry. All other replicas contain read-only copies of the entry.
Note that this does not imply that you can have only a single master server for all of your directory content. If you have divided your directory into several directory partitions, each of them should have its own supplier server feeding consumer servers. The master server is the only one that can perform write operations, whereas any server may perform a search, compare, or bind operation (see Figure 10.11).

Figure 10.11 Single-master replication.

Because a typical directory-enabled application performs many more search operations than modify operations, it's beneficial to use read-only replicas. A read-only replica server can handle search operations just as well as the writable master server. If a client attempts to perform a write operation on a read-only server (e.g., adding, deleting, modifying, or renaming an entry), we need some way to arrange for the operation to be submitted to the read-write server. There are two ways this can be made to happen.

The first way is via a referral, which is simply a way for a server to say to a client: "I cannot handle this request, but here is the location of a server that should be able to." Figure 10.12 shows the steps involved when a directory client submits a change to a read-only replica.

Figure 10.12 Directing an update to a master server by using referrals.

The other way to get a write operation to the read-write copy is by chaining the request. That is, the server resubmits the request, on behalf of the client, to the read-write copy; it then obtains the result and forwards it to the client (see Figure 10.13).

Figure 10.13 Directing an update to a master server by chaining.

A more thorough discussion of referrals and chaining may be found in Chapter 9, "Topology Design." Typically, all these multistep interactions between clients and servers are handled automatically by the application software.
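The two ways of redirecting a write can be contrasted in a short sketch. The classes, the referral URL, and the behavior below are illustrative only, not a real LDAP API: with a referral the client is told where to go; with chaining the replica quietly does the forwarding itself.

```python
# Sketch contrasting referral and chaining on a read-only replica.

class Referral(Exception):
    """Raised to tell the client which server can handle the request."""
    def __init__(self, url):
        self.url = url

class MasterServer:
    def __init__(self):
        self.entries = {}

    def modify(self, dn, attrs):
        self.entries[dn] = attrs
        return "success"

class ReadOnlyReplica:
    def __init__(self, master, chain=False):
        self.master = master
        self.chain = chain

    def modify(self, dn, attrs):
        if self.chain:
            # Chaining: resubmit to the master on the client's behalf
            # and forward the result back to the client.
            return self.master.modify(dn, attrs)
        # Referral: hand the client the master's location instead.
        raise Referral("ldap://master.airius.com")
```

With a referral, the client (or its library) must reconnect to the named server and retry; with chaining, the client never learns a replica was involved at all.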
Directory client users are unlikely to witness all of this; instead, they simply see the modify operation complete, and the change eventually becomes available on the replica. (Note that there is a period of time when the read-write copy contains newer data than the read-only copy, as mentioned in the discussion of consistency and convergence.)

The astute reader will notice that a single-master replication system has a single point of failure: the read-write server. There is only one server that can process write operations for a given entry; if it goes down, no client can modify that portion of the directory (although search and read operations can continue at read-only replicas). Depending on the type of directory client software and directory-enabled application in use, this may or may not be acceptable. However, single-master replication is simpler to implement than the other types of replication, so it can be found in most directory server software products on the market.

One replication strategy that avoids a single point of failure is floating-master replication. This strategy still has only one writable copy at any given time. However, if the read-write server should become unavailable for some reason, a new read-write master server is selected by some algorithm, typically a voting algorithm in which the remaining servers collectively agree on a server to become the new master (see Figure 10.14). The actual mechanism of selecting a new master server is typically complicated and beyond the scope of this book.

Figure 10.14 Floating-master replication: selecting a new master.

Additional complications arise when a network becomes partitioned and a new master is elected on each side of the network partition (see Figure 10.15). The procedures for reconciling the two masters when the network is rejoined can be rather complicated.
Although no traditional directory products use a floating-master scheme, Microsoft Windows NT 4.0 uses this approach when designating a given domain controller as either a primary domain controller (PDC), which can be modified, or a backup domain controller (BDC), which holds a read-only copy of the NT domain controller database.

Figure 10.15 Multiple masters selected in a partitioned network.

In a multi-master replication system, there may be (and almost always is) more than one read-write copy available. Clients may submit a write operation to any of the read-write replicas. It then becomes the responsibility of the set of cooperating servers to ensure that changes are eventually propagated to all servers in a consistent manner. Figure 10.16 shows two replicated servers that are capable of handling client write requests.

Figure 10.16 Multi-master replication.

Like floating-master replication, multi-master replication eliminates the single point of failure and thus offers greater reliability for directory clients. However, allowing more than one server to accept write operations brings additional complexity, most notably the need for an update conflict resolution policy. This is used to resolve an update conflict, which can occur when an attribute of an entry is modified at approximately the same time on two different master servers. We will discuss this topic in the next section.

One obvious question might be: "If multi-master replication offers better reliability, why do most implementations use single-master replication?" In the case of X.500, the designers felt that the added complexity of conflict resolution made a multi-master approach unworkable in the globally distributed directory they were designing. As of this writing, however, this decision is being revisited, and a multi-master version of X.500 DISP may emerge.
Work has also begun to define a standard replication protocol for LDAP servers, which is likely to involve multi-master and/or floating-master replication (in addition to single-master replication).

Conflict Resolution in Multi-master Replication

In multi-master replication systems, more than one directory server may accept modifications for a given entry. Sometimes this creates a situation in which two directory clients modify the same entry on two different servers at the same time. But what happens when the clients write different values for the same attribute (see Figure 10.17)?

Figure 10.17 Setting the stage for an update conflict.

In Figure 10.17, Client 1 modifies the entry cn=John Doe, dc=airius, dc=com and replaces the telephoneNumber attribute with the single value +1 408 555 1212, submitting the change to Server A. At the same time, Client 2 modifies the entry cn=John Doe, dc=airius, dc=com and replaces the telephoneNumber attribute with a different value, +1 650 555 1212, submitting the change to Server B. After these operations complete on each server, the entries are in conflict: it's impossible for both changes to be retained, so one must be discarded. Because we require that the set of cooperating servers eventually converge, we need some way of resolving this conflict. Note that there isn't really any correct way to resolve the conflict; each client's change is as good as the other's. Of course, each user thinks that his or her change will be made on all replicas, and they may be somewhat surprised to discover otherwise.

All currently available multi-master directory replication systems use a "last writer wins" policy to resolve such conflicts. Every attribute is marked with a timestamp that indicates the most recent time it was modified. If, while synchronizing with another server, the synchronization algorithm detects a conflict, the attribute value with the later timestamp is selected, and the other value is discarded.
It follows, then, that to implement such a policy, the system clocks on each cooperating server must be kept in close synchronization so that timestamps from different servers can be meaningfully compared. NDS has an extensive time synchronization system that keeps the NDS server clocks in synchronization.

Resolving Identical Timestamps

Astute readers might ask what happens if two NDS servers assign the same timestamp to the same updated entry. Which server wins? In fact, NDS timestamps are structured such that it is impossible for this to happen. An NDS timestamp, which is 64 bits in length, consists of three parts: a 32-bit quantity that represents the number of seconds since the epoch (0000 UTC on January 1, 1970); a 16-bit quantity that represents the replica number that received the update; and a 16-bit event ID field, sequentially assigned by the server, that allows up to 65,536 updates within a single second. Because the replica number is guaranteed to be unique (unique replica numbers are assigned by the partition master during replica creation), there can never be a timestamp collision.

You might imagine other, more sophisticated conflict resolution policies that reflect some set of business rules. For example, it may make sense to have a rule stating that changes made by a person in the human resources group always take precedence over changes made by other users. Whatever the conflict resolution policy, it is critical that all cooperating servers use exactly the same policy; if different policies are in use, it cannot be guaranteed that the directory contents will eventually converge. This fact will become increasingly important as vendors standardize a vendor-independent multi-master replication protocol.
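The timestamp layout in the sidebar can be worked through directly. Packing the three fields into one 64-bit integer makes "last writer wins" a plain numeric comparison, and the unique replica number guarantees that two servers can never produce identical timestamps. The function names below are invented for illustration.

```python
# Sketch of the NDS timestamp layout described above: 32 bits of
# seconds since the epoch, a 16-bit replica number, and a 16-bit
# per-second event ID, packed into a single 64-bit integer.

def nds_timestamp(seconds, replica, event):
    assert replica < 2**16 and event < 2**16
    return (seconds << 32) | (replica << 16) | event

def resolve(value_a, value_b):
    """Last writer wins: keep the (value, timestamp) pair stamped later."""
    return value_a if value_a[1] > value_b[1] else value_b

# Two servers update John Doe's phone number in the very same second;
# the differing replica numbers still make the timestamps distinct.
a = ("+1 408 555 1212", nds_timestamp(1_000_000, replica=1, event=0))
b = ("+1 650 555 1212", nds_timestamp(1_000_000, replica=2, event=0))
winner = resolve(a, b)
```

Here the value written through replica 2 wins the tie; the important property is not which one wins but that every pair of servers deterministically agrees on the same winner, so the replicas converge.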
2002, O'Reilly & Associates, Inc.