Understanding and Deploying LDAP Directory Services > 10. Replication Design > Replication Concepts
Replication Concepts

Before we dive into designing our replication system, we should spend some time understanding the basic issues concerning directory replication. These issues are as follows:

- Suppliers, consumers, and replication agreements
- The unit of replication
- Consistency and convergence
- Incremental and total updates
- Initial population of a replica
- Replication strategies, including conflict resolution
Each issue is discussed in the following sections.

Suppliers, Consumers, and Replication Agreements

In replication systems, we use the terms supplier and consumer to identify the source and destination of replication updates, respectively. A supplier server sends updates to another server; a consumer server accepts those changes. These roles are not mutually exclusive: a server that is a consumer may also be a supplier. The configuration information that tells a supplier server about a consumer server (and vice versa) is termed a replication agreement. This configuration information typically includes the unit of replication (discussed next), the hostname and port of the remote server, and other information about the replication to be performed, such as scheduling information. In other words, the replication agreement describes what is to be replicated, where it is to be sent, and how it will be done.

The Unit of Replication

When we talk about replication, we need some common language to describe what is to be replicated. In an abstract sense, we are interested in specifying:

- The subtree of the directory to be replicated
- Which entries within that subtree are to be replicated
- Which attributes of those entries are to be replicated
A natural way to describe a set of entries to be replicated is to specify the distinguished name (DN) at the top of a subtree and replicate all entries subordinate to (below) it (see Figure 10.4).

Figure 10.4 Replicating an entire subtree.

In Figure 10.4, the complete subtree rooted at ou=Accounting, dc=airius, dc=com is being replicated. Virtually all directory server implementations support this ability to specify that a complete subtree is to be replicated. This subtree usually corresponds to a directory partition, as described in Chapter 9.

We might be interested in selecting only certain entries from a subtree. A reasonable thing to do would be to select entries based on their object class. For example, we might want to replicate only those entries that represent people or organizational units (see Figure 10.5).

Figure 10.5 Replication of selected entries.

In Figure 10.5, the root of the replicated subtree is once again ou=Accounting, dc=airius, dc=com, but only organizationalUnit and person entries are being replicated. The X.500 standards define this ability as the specification filter component of the unit of replication.

One complication that can arise from selecting only certain entries is that the replicated directory may contain "holes." In the example depicted in Figure 10.5, if entries of object class organizationalUnit had not been selected, the replicated tree would look like the one shown in Figure 10.6. To be a valid directory tree, every entry except the root entry must have a parent; however, the consumer's directory tree violates that rule. To remedy this situation, the supplier could create on the consumer a placeholder in place of the entry that was not replicated. The X.500 model describes a specific type of placeholder, termed a glue entry, used for just this purpose.
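To make the glue-entry idea concrete, here is a minimal sketch of filtered subtree replication. The data model (a dict mapping DNs to entries) and the function names are invented for illustration; real servers implement this internally, and the DN parsing here is naive (it ignores escaped commas).

```python
# Sketch: select entries by objectClass, then patch any "holes" with
# glue entries so every replicated entry has an ancestor chain.

def parent_dn(dn):
    """Return the parent DN, e.g. 'cn=a,ou=b,dc=c' -> 'ou=b,dc=c'."""
    parts = dn.split(",")
    return ",".join(parts[1:]) if len(parts) > 1 else None

def filtered_replica(entries, root, wanted_classes):
    """Select entries under `root` matching the specification filter,
    adding glue entries wherever selected ancestors were filtered out."""
    replica = {}
    for dn, entry in entries.items():
        if dn.endswith(root) and entry["objectClass"] & wanted_classes:
            replica[dn] = entry
    # Patch holes: every entry needs a chain of parents back to the root.
    for dn in list(replica):
        p = parent_dn(dn)
        while p and p != root and p not in replica:
            replica[p] = {"objectClass": {"glue"}}  # placeholder entry
            p = parent_dn(p)
    if root not in replica:
        replica[root] = {"objectClass": {"glue"}}
    return replica

entries = {
    "ou=Accounting,dc=airius,dc=com": {"objectClass": {"organizationalUnit"}},
    "ou=Widgets,ou=Accounting,dc=airius,dc=com": {"objectClass": {"organizationalUnit"}},
    "cn=John Doe,ou=Widgets,ou=Accounting,dc=airius,dc=com": {"objectClass": {"person"}},
}
# Replicate only person entries, as in the Figure 10.6 scenario: the
# organizational units become glue entries on the consumer.
replica = filtered_replica(entries, "ou=Accounting,dc=airius,dc=com", {"person"})
```

Because only the person entry matches the filter, both organizational units end up as glue placeholders rather than disappearing and leaving John Doe's entry orphaned.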
Figure 10.6 A hole in the directory information tree (DIT) arising from filtered replication.

In addition to selecting only certain types of entries for replication, we might want to replicate only certain attributes. For example, when providing a publicly searchable directory of employee information outside a corporate firewall, an organization might elect to replicate only full names, email addresses, and office telephone numbers and omit all other personal information. Notice in Figure 10.7 how the copy of John Doe's entry accessible outside the firewall contains fewer attributes than the master entry inside the firewall. The X.500 standards define this as the attribute selection component of the unit of replication.

Figure 10.7 Replicating only selected attributes from an entry.

Consistency and Convergence

Consistency describes how closely the contents of replicated servers match each other at a given point in time. A strongly consistent replica is one that provides the same information as its supplier at all times; that is, a change made on one server is not visible to any other client until it has been propagated to and acknowledged by all replicas. On the other hand, a weakly consistent replica is permitted to diverge from its supplier for some period of time. For example, Figure 10.8 shows that there is a period of time after a supplier has been updated but before the update has been propagated to a replica; during that time the supplier and the replica contain different data.

Figure 10.8 Weakly consistent replicas.

We say that a supplier and a replica have converged when they contain the same data. It is important that replication systems eventually converge over time so that all clients see the same view of the directory. In a directory system that uses weakly consistent replication, directory clients should not expect their updates to be immediately reflected in the directory.
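Attribute selection amounts to projecting each entry onto an allowed attribute set before sending it to the consumer. The following sketch assumes a hypothetical policy and helper name; the attribute names match the firewall example above.

```python
# Sketch of attribute selection: strip an entry down to the attributes
# permitted outside the firewall. PUBLIC_ATTRS is an assumed policy.

PUBLIC_ATTRS = {"cn", "mail", "telephoneNumber"}

def select_attributes(entry, allowed=PUBLIC_ATTRS):
    """Return a copy of the entry containing only allowed attributes."""
    return {attr: values for attr, values in entry.items() if attr in allowed}

master_entry = {
    "cn": ["John Doe"],
    "mail": ["jdoe@airius.com"],
    "telephoneNumber": ["+1 408 555 1212"],
    "homePhone": ["+1 408 555 9999"],   # omitted from the public replica
    "salary": ["50000"],                # omitted from the public replica
}
public_entry = select_attributes(master_entry)
```

The replica outside the firewall receives only the three public attributes; the personal ones never leave the master.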
For example, a directory application should not expect that it can update an entry and then immediately read it back and obtain the updated values.

It may come as a surprise that all practical directory systems use weakly consistent replicas. Why? The answer has to do with performance. Imagine that a single supplier feeds three replicas, and that each of the replicas handles a large client load of search requests. If the supplier maintains strong consistency with its replicas, it must send a change to each replica and receive a positive acknowledgment before returning a result to the client that sent the change. Because each replica is heavily loaded, it may be slow in sending the result to the supplier. The supplier can therefore return a result to the client no faster than the slowest replica acknowledges the update. This can reduce performance unacceptably.

Additionally, implementing strong consistency among replicas requires that replicas support a two-phase commit protocol. This is necessary so that the supplier server can back out an update if any of the consumers should fail to acknowledge the change. The supplier would then return an error code to the client, and the client would presumably retry the operation later. This means that all consumer servers must be functional for a supplier server to accept a change, which is undesirable.

In addition to its lower performance, strong consistency is incompatible with scheduled replication, an advanced feature we'll discuss later in this chapter. Briefly, scheduled replication permits updates to be deferred to some particular window in time, perhaps the middle of the night. Because a strongly consistent system requires that updates be propagated immediately, it is essentially at odds with scheduled replication.
Given all these challenges, weakly consistent replication systems are much easier to implement and provide better performance, at the expense of temporary inconsistencies between supplier and replica servers. For virtually all directory applications, this is perfectly acceptable and represents a well-informed compromise on the part of directory designers.

Incremental and Total Updates

To bring two servers into synchronization, we might choose either to completely replace the contents of the consumer server or to transmit only the minimum information necessary to bring the servers into synchronization. The former approach, termed a total update in X.500 parlance, is useful when initially creating a replica (you'll learn more about this creation operation later in this chapter). It is very inefficient, however, to always use a total update strategy when updating consumer servers because all entries are transmitted even if they have not been modified.

In an incremental update, only the changes made to the supplier's directory are sent to the consumer server. For example, if a directory client modifies an entry by replacing its description attribute, it's necessary to perform only that same change on all replicas to bring them into synchronization. It's not necessary to send the entire entry, and it's certainly not necessary to transmit the entire contents of the database to all replicas. Incremental updates are much more efficient, and all widely used LDAP directory server software supports them.

Note: If a replica's directory tree is in some unknown state (perhaps it has been damaged or reloaded from an extremely out-of-date backup), it may be desirable to wipe out any existing contents and perform a total update. This is also what is done when a replica is initially populated with data.
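The difference between the two strategies can be sketched in a few lines. The change-record format below is invented for illustration (it loosely resembles an LDIF modify record); the point is that an incremental update touches only what changed, while a total update retransmits everything.

```python
# Sketch: applying one incremental change record to a replica, versus
# replacing the replica's contents wholesale (a total update).

def apply_change(replica, change):
    """Apply one change record of the form (op, dn, attr, values)."""
    op, dn, attr, values = change
    if op == "replace":
        replica[dn][attr] = values
    elif op == "delete":
        del replica[dn]
    return replica

def total_update(replica, supplier_contents):
    """Wipe the replica and copy everything from the supplier."""
    replica.clear()
    replica.update(supplier_contents)
    return replica

replica = {"cn=John Doe,dc=airius,dc=com": {"description": ["old text"]}}
# Incremental: only the changed attribute crosses the wire.
apply_change(replica, ("replace", "cn=John Doe,dc=airius,dc=com",
                       "description", ["new text"]))
```

One change record is enough to converge the replica; no other entries or attributes need to be sent.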
To better understand how the incremental update process works, let's look at the process from a general view, and then we'll examine how real-world directory services perform incremental updates. Following is an outline of the incremental update process:

1. The supplier determines which changes have been made since the consumer was last updated, using some bookmark or timestamp maintained for that consumer.
2. The supplier connects to the consumer and sends those changes.
3. The consumer applies the changes to its copy of the directory data.
4. The supplier records the point up to which the consumer has been updated, in preparation for the next replication session.
In this way, a supplier transmits only the minimum number of updates necessary to bring the consumer server into synchronization. To provide some more concrete examples, let's examine how two popular directory services, Netscape Directory Server and Novell Directory Services (NDS), incrementally update a consumer.

The Netscape Directory Server Update Process

The Netscape Directory Server updates consumers by replaying the changes it receives. For example, if a client connects to a Netscape Directory Server and adds a new entry, the supplier connects to all of its consumers and adds the same entry. Each change, when received by the supplier, is assigned a unique changenumber; this is then logged to a changelog, a database that records all changes made to the server. The supplier keeps track of the changes it has replayed to each consumer by storing in the consumer's directory tree the number of the last change applied. Figure 10.9 illustrates the Netscape Directory Server update process.

Figure 10.9 The Netscape Directory Server update process.

The Netscape Directory Server performs the following steps when incrementally updating a replica:

1. The supplier connects and binds to the consumer server.
2. It reads from the consumer's directory tree the number of the last change applied to that consumer.
3. It selects from its changelog all changes with greater changenumbers.
4. It replays each of those changes, in order, to the consumer.
5. It updates the changenumber stored on the consumer to reflect the last change applied.
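The changelog scheme described above can be sketched as follows. The class and attribute names are invented for illustration, not the Netscape Directory Server's actual interfaces; the essential ideas are the monotonically increasing changenumber and the per-consumer bookmark.

```python
# Sketch of changelog-based replay, loosely modeled on the scheme
# described above (a minimal sketch, not a real replication protocol).

import itertools

class Supplier:
    def __init__(self):
        self.changelog = []                 # list of (changenumber, change)
        self._counter = itertools.count(1)  # unique, ever-increasing numbers

    def record(self, change):
        """Log a change received from a client, assigning a changenumber."""
        self.changelog.append((next(self._counter), change))

    def replay_to(self, consumer):
        """Send only changes newer than the consumer's bookmark."""
        for num, change in self.changelog:
            if num > consumer.last_applied:
                consumer.apply(num, change)

class Consumer:
    def __init__(self):
        self.applied = []
        self.last_applied = 0   # bookmark stored in the consumer's tree

    def apply(self, num, change):
        self.applied.append(change)
        self.last_applied = num

supplier, consumer = Supplier(), Consumer()
supplier.record("add cn=A")
supplier.record("modify cn=B")
supplier.replay_to(consumer)   # first session: both changes replayed
supplier.record("delete cn=A")
supplier.replay_to(consumer)   # second session: only the new change
```

Because each replay starts from the consumer's stored bookmark, no change is ever sent twice, and a consumer that was offline for a while simply catches up from wherever it left off.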
The Novell Directory Services Update Process

NDS servers track updates by storing along with each attribute a timestamp that indicates when that attribute was last updated. To determine which updates need to be applied to a consumer server, an NDS supplier locates all attribute values whose timestamp is greater than the last update timestamp for the consumer. An NDS server updates a consumer server's copy of a directory partition by sending any attributes that have changed since the last replication session. The timestamp of the last update is stored in the Synchronize Up To vector (or SynchUpTo vector) on the consumer server, and it is retrieved by the supplier server at the beginning of each replication session. The NDS update process is shown in Figure 10.10.

Figure 10.10 The NDS update process.

An NDS server performs the following steps when sending changes to another NDS server:

1. The supplier contacts the consumer and retrieves its SynchUpTo vector.
2. It scans the partition for attribute values with timestamps later than the vector indicates.
3. It sends those attribute values to the consumer, which applies them.
4. The consumer's SynchUpTo vector is updated to reflect the new synchronization point.
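The timestamp-based (state-based) approach can be sketched like this. The data model, with each attribute stored alongside its timestamp, is hypothetical and the timestamps are simplified to plain integers; real NDS timestamps are structured values, as discussed later in this chapter.

```python
# Sketch of NDS-style synchronization: only attribute values stamped
# after the consumer's SynchUpTo value are collected and sent.

def changes_since(partition, synch_up_to):
    """Collect (dn, attr, value) for attributes stamped after synch_up_to."""
    updates = []
    for dn, attrs in partition.items():
        for attr, (value, stamp) in attrs.items():
            if stamp > synch_up_to:
                updates.append((dn, attr, value))
    return updates

partition = {
    "cn=John Doe,o=airius": {
        "telephoneNumber": ("+1 408 555 1212", 1005),  # changed recently
        "title": ("Engineer", 900),                    # unchanged
    },
}
# The consumer last synchronized at timestamp 1000, so only the phone
# number needs to be sent.
pending = changes_since(partition, 1000)
```

Note that however many times an attribute changed between sessions, only its final state is transmitted, which is exactly the advantage over changelog replay discussed next.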
As you can see, the update processes for Netscape Directory Server and NDS are quite similar. The main difference is in how the updates themselves are stored on the supplier. Netscape Directory Server stores a record of each change in the changelog as it is received and processed, and it replays these changes to consumer servers. NDS (and Microsoft Active Directory as well) places a unique, ever-increasing number on each changed attribute value and sends updates as a series of attribute values to be applied on the consumer servers.

The changelog approach has the advantage that no special action needs to be taken when an entry is deleted, renamed, or moved; the changelog simply records the operation that the client performed. NDS and Active Directory, on the other hand, must create a placeholder entry (called a tombstone or obituary) that records the previous location of the entry and any associated timestamp or sequence number values. The main disadvantage of the changelog approach is that it records all changes, even when the same attribute of the same entry is modified multiple times. If the supplier simply replays all changes in order, as is typically done, more changes might be transmitted than necessary. The approach used by Active Directory and NDS requires just one update that reflects the final state of the attribute.

Initial Population of a Replica

When a consumer server is initially configured, it contains no data. The replica must somehow be populated with a consistent snapshot of the supplier's data so that it can subsequently be kept in synchronization. Or, in the event that a consumer server has become damaged, the consumer must be brought back into synchronization, usually by removing the damaged data and creating a fresh copy of the directory data from the supplier.

Note: Be sure that the replica does not attempt to service requests until it has been completely initialized.
Were it to begin servicing requests before being completely populated, it might give erroneous results. For example, it might claim that a given entry does not exist when in fact it has not yet received the entry from the supplier. Virtually all directory server software automatically arranges for a replica to be offline during replica initialization. The replica typically issues a referral to the master server or chains the operation to the master.

How is replica initialization performed? Directory vendors accomplish this task using various methods, although all are similar. X.500's Directory Information Shadowing Protocol (DISP) supports a total update strategy while synchronizing, which allows a supplier server to completely repopulate a unit of replication on the consumer. (An X.500-compliant server from one vendor should, in theory, be able to reinitialize a consumer server from another vendor.) NDS uses a proprietary protocol for all replication operations, including creation of a replica. Netscape Directory Server 3.0 uses LDAP itself to initialize a replica, sending a series of delete operations to remove undesired entries and a series of add operations to populate the directory.

Replication Strategies

The term replication strategy refers to the way updates flow from server to server and the way servers interact when propagating updates. After a client has successfully modified, deleted, added, or renamed an entry, how does the server that received the change make it visible on all the other replicated servers? There are three main approaches to solving this problem: single-master replication, floating-master replication, and multi-master replication.

In single-master replication, there is one (and only one) server that contains a writable copy of a given directory entry. All other replicas contain read-only copies of the entry.
Note that this does not imply that you can have only a single master server for all of your directory content. If you have divided your directory into several directory partitions, each of them should have its own supplier server feeding consumer servers. The master server is the only one that can perform write operations, whereas any server may perform a search, compare, or bind operation (see Figure 10.11).

Figure 10.11 Single-master replication.

Because a typical directory-enabled application performs many more search operations than modify operations, it's beneficial to use read-only replicas. A read-only replica server can handle search operations just as well as the writable master server. If a client attempts to perform a write operation on a read-only server (e.g., adding, deleting, modifying, or renaming an entry), we need some way to arrange for the operation to be submitted to the read-write server. There are two ways this can be made to happen.

The first way is via a referral, which is simply a way for a server to say to a client: "I cannot handle this request, but here is the location of a server that should be able to." Figure 10.12 shows the steps involved when a directory client submits a change to a read-only replica.

Figure 10.12 Directing an update to a master server by using referrals.

The other way to get a write operation to the read-write copy is by chaining the request. That is, the server resubmits the request, on behalf of the client, to the read-write copy; it then obtains the result and forwards it to the client (see Figure 10.13).

Figure 10.13 Directing an update to a master server by chaining.

A more thorough discussion of referrals and chaining may be found in Chapter 9, "Topology Design." Typically, all these multistep interactions between clients and servers are handled automatically by the application software.
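The two ways of redirecting a write can be contrasted in a short sketch. The classes, the referral URL, and the behavior below are illustrative only, not a real LDAP API: with a referral the client is told where to go; with chaining the replica quietly does the forwarding itself.

```python
# Sketch contrasting referral and chaining on a read-only replica.

class Referral(Exception):
    """Raised to tell the client which server can handle the request."""
    def __init__(self, url):
        self.url = url

class MasterServer:
    def __init__(self):
        self.entries = {}

    def modify(self, dn, attrs):
        self.entries[dn] = attrs
        return "success"

class ReadOnlyReplica:
    def __init__(self, master, chain=False):
        self.master = master
        self.chain = chain

    def modify(self, dn, attrs):
        if self.chain:
            # Chaining: resubmit to the master on the client's behalf
            # and forward the result back to the client.
            return self.master.modify(dn, attrs)
        # Referral: hand the client the master's location instead.
        raise Referral("ldap://master.airius.com")
```

With a referral, the client (or its library) must reconnect to the named server and retry; with chaining, the client never learns a replica was involved at all.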
Directory client users are unlikely to witness all of this; instead, they simply see the modify operation complete, and the change eventually becomes available on the replica. (Note that there is a period of time when the read-write copy contains newer data than the read-only copy, as mentioned in the discussion of consistency and convergence.)

The astute reader will notice that a single-master replication system has a single point of failure: the read-write server. There is only one server that can process write operations for a given entry; if it goes down, no client can modify that portion of the directory (although search and read operations can continue at read-only replicas). Depending on the type of directory client software and directory-enabled application in use, this may or may not be acceptable. However, single-master replication is simpler to implement than the other types of replication, so it can be found in most directory server software products on the market.

One replication strategy that avoids a single point of failure is floating-master replication. This strategy still has only one writable copy at any given time. However, if the read-write server should become unavailable for some reason, a new read-write master server is selected by some algorithm, typically a voting algorithm in which the remaining servers collectively agree on a server to become the new master (see Figure 10.14). The actual mechanism of selecting a new master server is typically complicated and beyond the scope of this book.

Figure 10.14 Floating-master replication: selecting a new master.

Additional complications arise when a network becomes partitioned and a new master is elected on each side of the network partition (see Figure 10.15). The procedures for reconciling the two masters when the network is rejoined can be rather complicated.
Although no traditional directory products use a floating-master scheme, Microsoft Windows NT 4.0 uses this approach when designating a given domain controller as either a primary domain controller (PDC), which can be modified, or a backup domain controller (BDC), which holds a read-only copy of the NT domain controller database.

Figure 10.15 Multiple masters selected in a partitioned network.

In a multi-master replication system, there may be (and almost always is) more than one read-write copy available. Clients may submit a write operation to any of the read-write replicas. It then becomes the responsibility of the set of cooperating servers to ensure that changes are eventually propagated to all servers in a consistent manner. Figure 10.16 shows two replicated servers that are capable of handling client write requests.

Figure 10.16 Multi-master replication.

Like floating-master replication, multi-master replication eliminates the single point of failure and thus offers greater reliability for directory clients. However, allowing more than one server to accept write operations brings additional complexity, most notably the need for an update conflict resolution policy. This is used to resolve an update conflict, which can occur when an attribute of an entry is modified at approximately the same time on two different master servers. We will discuss this topic in the next section.

One obvious question might be: "If multi-master replication offers better reliability, why do most implementations use single-master replication?" In the case of X.500, the designers felt that the added complexity of conflict resolution made a multi-master approach unworkable in the globally distributed directory they were designing. As of this writing, however, this decision is being revisited, and a multi-master version of X.500 DISP may emerge.
Work has also begun to define a standard replication protocol for LDAP servers, which is likely to involve multi-master and/or floating-master replication (in addition to single-master replication).

Conflict Resolution in Multi-master Replication

In multi-master replication systems, more than one directory server may accept modifications for a given entry. Sometimes this creates a situation in which two directory clients modify the same entry on two different servers at the same time. But what happens when the clients write different values for the same attribute (see Figure 10.17)?

Figure 10.17 Setting the stage for an update conflict.

In Figure 10.17, Client 1 modifies the entry cn=John Doe, dc=airius, dc=com and replaces the telephoneNumber attribute with the single value +1 408 555 1212, submitting the change to Server A. At the same time, Client 2 modifies the entry cn=John Doe, dc=airius, dc=com and replaces the telephoneNumber attribute with a different value, +1 650 555 1212, submitting the change to Server B. After these operations complete on each server, the entries are in conflict: it's impossible for both changes to be retained, so one must be discarded. Because we require that the set of cooperating servers eventually converge, we need some way of resolving this conflict. Note that there isn't really any correct way to resolve the conflict; each client's change is as good as the other's. Of course, each user thinks that his or her change will be made on all replicas, and they may be somewhat surprised to discover otherwise.

All currently available multi-master directory replication systems use a "last writer wins" policy to resolve such conflicts. Every attribute is marked with a timestamp that indicates the most recent time it was modified. If, while synchronizing with another server, the synchronization algorithm detects a conflict, the attribute value with the later timestamp is selected, and the other value is discarded.
It follows, then, that to implement such a policy, the system clocks on each cooperating server must be kept in close synchronization so that timestamps from different servers can be meaningfully compared. NDS has an extensive time synchronization system that keeps the NDS server clocks in synchronization.

Resolving Identical Timestamps

Astute readers might ask what happens if two NDS servers assign the same timestamp to the same updated entry. Which server wins? In fact, NDS timestamps are structured such that it is impossible for this to happen. An NDS timestamp, which is 64 bits in length, consists of three parts: a 32-bit quantity that represents the number of seconds since the epoch (0000 UTC on January 1, 1970); a 16-bit quantity that represents the replica number that received the update; and a 16-bit event ID field, sequentially assigned by the server, that allows up to 65,536 updates within a single second. Because the replica number is guaranteed to be unique (unique replica numbers are assigned by the partition master during replica creation), there can never be a timestamp collision.

You might imagine other, more sophisticated conflict resolution policies that reflect some set of business rules. For example, it may make sense to have a rule stating that changes made by a person in the human resources group always take precedence over changes made by other users. Whatever the conflict resolution policy, it is critical that all cooperating servers use exactly the same policy; if different policies are in use, it cannot be guaranteed that the directory contents will eventually converge. This fact will become increasingly important as vendors standardize a vendor-independent multi-master replication protocol.
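The timestamp layout in the sidebar can be worked through directly. Packing the three fields into one 64-bit integer makes "last writer wins" a plain numeric comparison, and the unique replica number guarantees that two servers can never produce identical timestamps. The function names below are invented for illustration.

```python
# Sketch of the NDS timestamp layout described above: 32 bits of
# seconds since the epoch, a 16-bit replica number, and a 16-bit
# per-second event ID, packed into a single 64-bit integer.

def nds_timestamp(seconds, replica, event):
    assert replica < 2**16 and event < 2**16
    return (seconds << 32) | (replica << 16) | event

def resolve(value_a, value_b):
    """Last writer wins: keep the (value, timestamp) pair stamped later."""
    return value_a if value_a[1] > value_b[1] else value_b

# Two servers update John Doe's phone number in the very same second;
# the differing replica numbers still make the timestamps distinct.
a = ("+1 408 555 1212", nds_timestamp(1_000_000, replica=1, event=0))
b = ("+1 650 555 1212", nds_timestamp(1_000_000, replica=2, event=0))
winner = resolve(a, b)
```

Here the value written through replica 2 wins the tie; the important property is not which one wins but that every pair of servers deterministically agrees on the same winner, so the replicas converge.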
2002, O'Reilly & Associates, Inc.