Looking Ahead | Understanding and Deploying LDAP Directory Services (2nd Edition)

Coexistence Techniques

As mentioned in the previous section, there are multiple ways of maintaining directory coexistence. A directory implementing any of these solutions is sometimes broadly called a metadirectory. The idea behind the term is that the directory serves as an aggregation point for information that lives in other directories. Metadirectory is not a very well-defined term, as it covers quite a lot of ground. In this section we'll explore the various meanings of metadirectory and techniques for maintaining directory coexistence.

Migration

Data migration is the most rudimentary form of coexistence. We are hard-pressed to even describe it as coexistence because it refers to a one-time event rather than an ongoing process. Nevertheless, it is a good starting point for our discussion.

Data migration is simply a way of populating your directory from an external data source. Implicit in the migration is the fact that the source is not used after its data is entered into the directory. A good time to use data migration might be when switching from one email system to another, for example. If the old email system has its own application-specific directory containing user and group information, and the new email system uses your enterprise directory for its user and group database, migration is a good way to get the data out of the old system and installed in the new system with minimal disruption and inconvenience to your users. Data migration is illustrated in Figure 22.2.

Figure 22.2 Data migration.

Most directory products come with migration tools. Some, such as Netscape's LDIF import tools, rely on you to provide a text file of information in a standard format. The system you migrate from is often able to produce a text file, either in that format or one that is easily convertible. Other tools are specific to a particular application. For example, Netscape's Messaging Server contains migration tools from a number of email packages. Other vendors provide similar tools.
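To make the "easily convertible" text file concrete, here is a minimal sketch that turns a CSV export from an old system into LDIF. The column names (uid, cn, sn, mail), the base DN, and the choice of the inetOrgPerson object class are assumptions made for illustration, and the sketch assumes uid values contain no characters that need escaping in a DN. A real migration would follow your own schema and naming plan.

```python
import base64
import csv
import io

BASE_DN = "ou=People,dc=example,dc=com"  # assumed suffix, for illustration only


def ldif_line(attr, value):
    """Format one attribute line, base64-encoding values LDIF cannot carry as plain text."""
    safe = (
        value.isascii()
        and value.isprintable()
        and not value.startswith((" ", ":", "<"))
        and not value.endswith(" ")
    )
    if safe:
        return f"{attr}: {value}"
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return f"{attr}:: {encoded}"


def csv_to_ldif(csv_text):
    """Turn a CSV export (assumed columns: uid, cn, sn, mail) into LDIF text."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines = [
            ldif_line("dn", f"uid={row['uid']},{BASE_DN}"),
            "objectClass: top",
            "objectClass: person",
            "objectClass: organizationalPerson",
            "objectClass: inetOrgPerson",
        ]
        for attr in ("uid", "cn", "sn", "mail"):
            if row.get(attr):  # skip empty or missing columns
                lines.append(ldif_line(attr, row[attr]))
        records.append("\n".join(lines))
    # Records are separated by a blank line.
    return "version: 1\n\n" + "\n\n".join(records) + "\n"
```

The resulting text can then be handed to whatever LDIF import tool your directory product provides.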

One-Way Synchronization

One step up from data migration is one-way data synchronization. With this approach, the directory is periodically populated from the data source. The reverse is also possible: An external database can be populated periodically from the directory. In one-way synchronization, data is changed in the source, not in the destination. Updates are propagated either by replacing the entire contents of the directory or by applying only those changes that have occurred since the last update.

The advantage of doing a total replacement is primarily simplicity: You just delete the old data and enter the new data. Some legacy database and directory systems have no facility for tracking changes, making it difficult to generate incremental updates and easier just to perform a total replacement. The disadvantage of this method is performance: For large data sets, it can take a long time to completely re-create the entire directory each time an update occurs. The directory may even need to be down during this process.

Incremental updates typically perform much better. If only 5% of the data changes between updates, an incremental update has about one-twentieth as much data to process, so it can be cheaper than a total replacement by roughly a factor of 20. Another advantage of incremental updating is that it can usually be done online while the directory is up and running. This tends to give the update a smaller impact on the service. Also, because updates come in over the same mechanism as regular directory updates, there is one fewer potential security hole to worry about.

Even if the end system you synchronize with does not support incremental change reporting, consider implementing this feature yourself. For example, you could save the last full extract from the system and compare it to the next full extract, thus creating an incremental update.
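The extract-comparison idea can be sketched in a few lines. This is only an illustration under stated assumptions: each extract is assumed to have been loaded into a dictionary keyed by a stable entry identifier, with each value a dictionary of single-valued attributes. The function name `diff_extracts` is our own.

```python
def diff_extracts(previous, current):
    """Compare two full extracts and return the incremental update between them.

    Each extract maps entry ID -> dict of attributes. The result is a tuple
    (adds, modifies, deletes) describing how to turn `previous` into `current`.
    """
    # Entries that appear only in the new extract must be added.
    adds = {key: attrs for key, attrs in current.items() if key not in previous}
    # Entries that appear only in the old extract must be deleted.
    deletes = sorted(key for key in previous if key not in current)

    modifies = {}
    for key in previous.keys() & current.keys():
        old, new = previous[key], current[key]
        # A value of None means "remove this attribute from the entry".
        changed = {
            attr: new.get(attr)
            for attr in old.keys() | new.keys()
            if old.get(attr) != new.get(attr)
        }
        if changed:
            modifies[key] = changed
    return adds, modifies, deletes
```

Only the entries and attributes returned here need to be sent to the directory, which is where the savings over a total replacement come from.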

One-way synchronization is often used to extract data from your directory and populate other directories. This way you retain central control over the data while still making it available for use with legacy systems. One-way synchronization is also often used to extract information from corporate data sources, such as the human resources database, to populate a read-only directory. This gives directory clients access to the data they need while leveraging your corporate data management procedures already in place.

Most directory coexistence plans call for a number of different one-way synchronization relationships for various attributes. For example, a user's name and job title might be synchronized one-way from the HR database, whereas the telephone number might be one-way synchronized from the phone system database. The directory itself might one-way-synchronize its user name and email address data to a number of application-specific directories for use in email address books. A typical example of this kind of synchronization is shown in Figure 22.3.
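One way to picture per-attribute synchronization relationships is as a table that names the single owning source for each attribute. The sketch below assumes hypothetical source names ("hr" and "phone") and LDAP-style attribute names; it is not taken from any particular product.

```python
# Which source owns each attribute (assumed mapping, for illustration only).
ATTRIBUTE_SOURCES = {
    "cn": "hr",
    "title": "hr",
    "telephoneNumber": "phone",
}


def build_entry(entry_id, sources):
    """Assemble one directory entry, taking each attribute only from its owning source.

    `sources` maps a source name to a dict of entry ID -> attribute dict.
    """
    entry = {}
    for attr, source_name in ATTRIBUTE_SOURCES.items():
        record = sources[source_name].get(entry_id, {})
        if attr in record:
            entry[attr] = record[attr]
    return entry
```

Because each attribute has exactly one owner, a value for `title` that happens to exist in the phone system is simply ignored, and no conflict can arise.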

Figure 22.3 Multiple one-way synchronization relationships.

Two-Way Synchronization

Two-way synchronization involves propagating changes to a data element in two or more directions (see Figure 22.4). Changes to the data element can be made at any of two or more locations. The changes are propagated to all data repositories participating in the synchronization effort.

Figure 22.4 Two-way synchronization.

The advantage of this scheme is that it provides maximum flexibility. There is no need to select a single owner of the data and make other repositories read-only. Instead, every repository of the data can continue to be a read-write source for the data. The need for this kind of synchronization is illustrated by the password example previously mentioned.

The disadvantage of two-way propagation is its complexity and its occasional unpredictability. With changes occurring at any number of locations, it's relatively easy for conflicts to occur. A change made to a data element in one location may be in conflict with a change made to the same data element in another location at roughly the same time. To maintain a consistent view of the data in all locations, conflicts like this must be resolved in a predictable and efficient way.

Resolving these conflicts can be difficult. Even a simple approach, perhaps involving serializing access based on synchronized time, requires an additional service to keep the times synchronized, and ties can still occur. Other solutions, perhaps involving a policy-based conflict resolution strategy, can be simpler to implement but often result in unexpected behavior from the user's point of view. For example, if a user sees a change he made get overwritten by another change, he may not understand the conflict resolution policy that led to this behavior.
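A time-based rule with an explicit tie-breaker might look like the following sketch. The `Change` record and the tie-breaking policy (the repository name that sorts last wins) are assumptions chosen only to show that every repository must apply the same deterministic rule; they are not a recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    attr: str
    value: str
    timestamp: float  # assumed to come from clocks kept in sync by a separate service
    origin: str       # name of the repository where the change was made


def resolve(a, b):
    """Pick the winning change between two conflicting updates to the same attribute.

    The later timestamp wins. On a tie, the origin name that sorts last wins,
    so every repository reaches the same answer regardless of arrival order.
    """
    return max(a, b, key=lambda change: (change.timestamp, change.origin))
```

Note that the losing change is silently discarded, which is exactly the kind of behavior that can surprise the user who made it.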

Certain circumstances may require two-way synchronization, but the added complexity and potential user confusion it causes are usually reason enough to avoid it. If you think you have a specific need for two-way synchronization, be creative about how you might avoid it. For example, you might deploy a centralized service that changes passwords in one location and uses one-way synchronization to push those changes out to all other data sources. This would probably be a minor inconvenience to users, but it would make life a lot easier for everyone in the long term. Alternatively, consider how you could intercept calls to change passwords at other data sources and reroute them to the centralized service. This approach would give the illusion of two-way synchronization without the same complexity.
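The centralized password service can be pictured as a single entry point that fans each change out to the other data sources. In this sketch the class name, the shape of the `targets` mapping, and the opaque `credential` argument are all hypothetical; how each system actually stores or accepts a password is outside its scope.

```python
class PasswordService:
    """Single place where passwords are changed; pushes each change out one-way."""

    def __init__(self, targets):
        # targets: name -> callable(user, credential) that updates one data source
        self.targets = targets

    def change_password(self, user, credential):
        """Push the change to every target and return the targets that failed."""
        failures = []
        for name, push in self.targets.items():
            try:
                push(user, credential)
            except Exception as exc:  # keep going so one bad target does not block the rest
                failures.append((name, exc))
        return failures
```

A front end at another data source that intercepts password changes would simply call `change_password` instead of updating its own store.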

N-Way Join

When synchronizing data from multiple sources, you usually want a way to match up related information in each data source. For example, if your coexistence policy calls for synchronizing names, job titles, and managers from the corporate human resources database and telephone numbers from the telephone operations database, you would like to end up with one entry in your metadirectory that contains all this information for a single person. You do not want to end up with two entries for each person: one from the HR database and one from the telephone operations database. The process of matching up entries from disparate databases is called joining.

To join two entries in different databases, you need to have a piece of information that is common to both databases. For example, if the databases each contain a field for Social Security number (SSN), you can use it to determine which entries correspond to the same people in both databases (see Figure 22.5). Using a uniquely identifying field such as SSN is much better than using a potentially non-unique field such as name.

Figure 22.5 A join.

Using an SSN can be less than ideal for several reasons. SSNs are sensitive information, a point explained in more detail later in this chapter. SSNs are also not immutable; they can change for a number of reasons. Such changes may be unlikely, but in large-scale directories even rare exceptions can be expensive. Finally, SSNs are not universal; not everyone has one, and you may not have access to SSNs for all the people represented in your directory. Foreign employees, customers, and others may not have an SSN or want to give theirs to you. For these reasons, it's often better to use or make up another unique identifier.
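A join on a unique identifier is straightforward, as the following sketch suggests. The field name `employee_id` stands in for whatever unique identifier you choose or make up, and the record layout (a list of dictionaries per database) is an assumption for illustration.

```python
def join_on_key(hr_rows, phone_rows, key="employee_id"):
    """Join HR records with phone records that share the same unique key.

    Returns (joined, unmatched): merged entries, and HR records that had no
    counterpart in the phone data.
    """
    phone_by_key = {row[key]: row for row in phone_rows}
    joined, unmatched = [], []
    for hr in hr_rows:
        phone = phone_by_key.get(hr[key])
        if phone is None:
            unmatched.append(hr)
        else:
            # If both records carry the same field, the HR value is kept.
            joined.append({**phone, **hr})
    return joined, unmatched
```

Because the key is unique in both databases, each HR record matches at most one phone record and no manual review is needed for the matched set.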

You often won't have anything better than names on which to join. This can lead to reduced efficiency in your synchronization procedure, create the need for manual synchronization, or even cause incorrect joining of information. Overcoming these problems is one of the biggest challenges in providing a metadirectory service that has reliably accurate data. A join on first and last names may typically match no better than 50% of the names in your databases. The numbers get significantly worse as the number of entries in your databases increases (e.g., the chances of having two Babs Jensens in a database of 100,000 people are significantly greater than the chances of having two Babs Jensens in a database of 100 people).

The result of this inadequate matching is usually a lot of manual work. An administrator typically goes through the unmatched entries by hand, comparing other information to try to determine a match. A worse outcome could be that the wrong match is made either automatically or by a careless administrator. In this situation, one person's information can appear in another person's entry. The consequences of this kind of error can range from annoying to serious, depending on the type of information, directory, and people involved.
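When names are all you have, a cautious join accepts only unambiguous matches and sets everything else aside for an administrator. The sketch below assumes each record is a dictionary with a `name` field and uses a deliberately simple normalization (case and whitespace only); real name matching usually needs more than this.

```python
from collections import defaultdict


def normalize(name):
    """Lowercase a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def join_on_name(left, right):
    """Join two lists of records on normalized name.

    Returns (matched, needs_review). A pair is matched automatically only when
    the name occurs exactly once on each side; anything else goes to review.
    """
    index = defaultdict(list)
    for rec in right:
        index[normalize(rec["name"])].append(rec)

    left_counts = defaultdict(int)
    for rec in left:
        left_counts[normalize(rec["name"])] += 1

    matched, needs_review = [], []
    for rec in left:
        key = normalize(rec["name"])
        candidates = index.get(key, [])
        if len(candidates) == 1 and left_counts[key] == 1:
            matched.append((rec, candidates[0]))
        else:
            # No candidate, or more than one plausible candidate: leave it to a person.
            needs_review.append((rec, candidates))
    return matched, needs_review
```

Sending duplicates to review rather than guessing trades administrative work for accuracy; an environment with different priorities might choose the opposite.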

The ability to join entries is an important feature that is needed for efficient synchronization. When you evaluate directory coexistence software, be sure to evaluate its ability to provide this feature. Also, investigate the software's advanced abilities, such as joining across multiple attributes and the extent to which you can tune the joining algorithm. Some environments may be willing to sacrifice accuracy for less administrative work. Other environments may be unable to tolerate inaccurate joins for any reason.

Virtual Directory

A final kind of metadirectory we will discuss is the virtual directory. A relatively new addition to the directory world, the virtual directory takes a different approach to directory coexistence. Instead of synchronizing data among directories periodically, a virtual directory provides a real-time directory view of selected data from multiple data sources.

The concept is simple: A virtual directory looks to the outside world like a regular LDAP server, but it holds no data. When it receives a request, it reformats and reroutes it to the necessary back-end data sources. The answers received are collated, reformatted, and sent back to the requestor. The virtual directory system is shown in Figure 22.6.

Figure 22.6 A virtual directory.
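The route-and-collate behavior can be sketched without any LDAP machinery at all. In the following illustration the class, the `routes` table, and the back ends (plain callables that return a dictionary for an entry ID) are all hypothetical; a real virtual directory would sit behind an LDAP protocol front end and would also have to translate filters and handle back-end failures.

```python
from collections import defaultdict


class VirtualDirectory:
    """Holds no data itself; routes each requested attribute to the back end that owns it."""

    def __init__(self, backends, routes):
        self.backends = backends  # back-end name -> callable(entry_id) returning a dict
        self.routes = routes      # attribute name -> back-end name

    def lookup(self, entry_id, attrs):
        """Fetch the requested attributes in real time and collate them into one entry."""
        # Group the requested attributes by back end so each source is queried once.
        wanted = defaultdict(list)
        for attr in attrs:
            if attr in self.routes:
                wanted[self.routes[attr]].append(attr)

        result = {}
        for backend_name, names in wanted.items():
            record = self.backends[backend_name](entry_id)
            for attr in names:
                if attr in record:
                    result[attr] = record[attr]
        return result
```

Nothing is cached between calls, so every lookup reflects the current state of the source data, at the cost of one round-trip per back end involved.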

The advantages of this scheme are several. First, the question of who updates multiple copies of data is neatly solved. The virtual directory simply routes update requests to the appropriate data sources; no copies of the data are made. Second, the propagation delays inherent in a synchronization-based approach are also avoided. No data is copied, and each query is mapped onto the source data store in real time. Finally, the virtual directory allows you to dispense with messy and costly data management procedures designed to synchronize data.

There are also several drawbacks to the virtual directory scheme. First, it can be pretty complex. The algorithms required to map queries here and there, collate results, deal with failures that may have occurred in one source database but not another, and so on can be difficult to determine and implement. Second, performance is likely to suffer compared to the centralized, synchronized directory approach. The source directories are not likely to be as fast as your native enterprise directory to begin with, and adding extra network round-trips, real-time query and data mapping, and result and error processing can reduce performance even further. A final disadvantage is that virtual directories are new, and they do not yet qualify as off-the-shelf software. You would probably be looking at a substantial development project to get a virtual directory up and working. Of course, you may already be looking at a substantial development project to get data synchronization working in the first place!

The pros and cons of a virtual directory add up to a few conclusions. Virtual directories are probably best in environments in which your goal is to provide access from a few well-known applications to an existing database or set of databases. Knowing the applications, and therefore the kinds of requests they generate, allows you to significantly reduce the scope of your virtual directory development project. Another important consideration is performance: The virtual directory would not be as fast as a native directory. You should be sure to benchmark the virtual directory early on to ensure that it meets the performance needs of your applications.

As far as we know, at the time of this writing Netscape is the only commercial vendor to support a virtual directory. Support is provided through a database plug-in feature, allowing you to write your own back-end database. You can develop a virtual directory by writing your own back end that maps queries and reformats results (see Figure 22.7).

Figure 22.7 Netscape virtual directory server architecture.

Understanding and Deploying LDAP Directory Services, 2002 New Riders Publishing


2002, O'Reilly & Associates, Inc.