The Data Maintenance Policy | Understanding and Deploying LDAP Directory Services (2nd Edition)

In Chapter 7, Data Design, you learned how to identify, locate, and obtain the data needed to populate your directory. The result of that phase of your design process was a table listing all your data sources, the data you need from each source, and the procedures you will use to obtain the data. This information plays an important role in formulating your data maintenance policy.

Your data maintenance policy determines who is responsible for maintaining which attributes in the directory. If more than one entity is allowed to maintain an attribute, your data maintenance policy determines how conflicts are resolved. For example, suppose that users and the human resources databases are both allowed to update the telephoneNumber attribute. What do you do if both sources update the attribute with different values at the same time? Your data maintenance policy should outline procedures to determine the answer to this question. It should also determine the frequency of updates for pieces of information in the directory and the security those updates require.
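One way to make such a conflict rule explicit is a per-attribute precedence table. The following is a minimal sketch, not a feature of any directory product: the source names (`hr_database`, `user`), the precedence order, and the `resolve` function are all assumptions made for illustration.

```python
# Hypothetical precedence table: for each attribute, sources are listed
# from most to least authoritative. Names and ordering are assumptions.
PRECEDENCE = {
    "telephoneNumber": ["hr_database", "user"],
    "homePostalAddress": ["user", "hr_database"],
}


def resolve(attribute, proposals):
    """Pick the winning value among conflicting proposals.

    proposals maps a source name to the value that source wants to write.
    Returns (source, value), or None if no listed source proposed a value.
    """
    for source in PRECEDENCE.get(attribute, []):
        if source in proposals:
            return source, proposals[source]
    return None


winner = resolve(
    "telephoneNumber",
    {"user": "+1 555 0100", "hr_database": "+1 555 0199"},
)
print(winner)
```

Writing the rule down as data has the side benefit that the same table can be published as part of the policy documentation.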

Another important procedure determined by your data maintenance policy is how exceptions are handled. Every policy has exceptions. Do not fool yourself into thinking your data maintenance policy is the exception to this rule.

There are potentially as many reasons to make exceptions to your data maintenance policy as there are users of your directory. For example, consider an operation that obtains home address information from the official payroll database. Although this might be just fine for the vast majority of your employees, consider an employee who has his checks mailed to a location different from his home. Including this mailing address in the employee's home contact information would be just plain wrong. You may want to make an exception for this user.

How you handle data maintenance exceptions will have a great impact on the cost of maintaining your directory service. Consider the number of exceptions in relation to the projected size of your directory. Is the policy you choose applicable to 90 percent of your users? 99 percent? All of them? Make some educated guesses here, and realize that the larger your directory becomes, the more important it is that the policy be nearly universal.

For example, consider a small directory containing entries for 100 people. If your policy correctly covers 99 out of these 100 users, it is a relatively small burden to do something special for the remaining person. On the other hand, if your directory supports an e-commerce application and contains 1 million entries, 99 percent coverage is pretty close to a disaster: You would have to make special exceptions for 10,000 people, a task that can become arduous and expensive.
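The arithmetic behind this comparison is simple enough to script when you are estimating support costs. This small sketch (the function name is ours) reproduces the two cases just described:

```python
def expected_exceptions(entries, coverage_percent):
    """Number of entries the policy does not cover (integer percentages)."""
    return entries * (100 - coverage_percent) // 100


for entries in (100, 1_000_000):
    print(entries, "entries ->", expected_exceptions(entries, 99), "exceptions")
```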

Consider automating the exception process itself. This may sound counterintuitive, but it is possible in some situations. For example, you might allow users to modify their own entries in a way that exempts them from the standard data policy. One way to do this would be to have the user set a flag in his or her entry. The flag would then be read by the automated data maintenance procedure, prompting special handling of the user's entry.
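The flag-based approach might look like the following sketch. It is illustrative only: the flag attribute `addressSyncExempt` is a hypothetical name that you would have to define in your own schema, and the entry is modeled as a plain dictionary rather than a real directory entry.

```python
# Hypothetical flag attribute; the name is an assumption for this sketch.
EXEMPT_FLAG = "addressSyncExempt"


def sync_home_address(entry, payroll_address):
    """Copy the payroll address into the entry unless the user opted out.

    Returns True if the entry was changed, False otherwise.
    """
    if entry.get(EXEMPT_FLAG) == "TRUE":
        return False  # user asked for special handling; leave entry alone
    if entry.get("homePostalAddress") == payroll_address:
        return False  # already up to date; avoid a needless write
    entry["homePostalAddress"] = payroll_address
    return True


regular = {"uid": "alee", "homePostalAddress": "1 Old Road"}
exempt = {"uid": "bkim", "homePostalAddress": "9 Home Lane", EXEMPT_FLAG: "TRUE"}
print(sync_home_address(regular, "2 New Street"))
print(sync_home_address(exempt, "PO Box 77"))
```

Because the user sets the flag, no administrator has to be involved in granting the exception; the automated procedure simply honors it.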

For the purposes of this chapter, we separate the attributes in your directory into six categories:

Attributes maintained by directory administrators. These kinds of attributes might include access control attributes, password policy information, or other attributes used in operating the directory.
Attributes maintained by directory content administrators. These attributes might include account information maintained by system administrators of other systems or services. They might be maintained by your Help Desk, departmental administrative assistants, or other agents acting as proxies for other users.
Attributes maintained by official data sources. By "official," we mean corporate data sources maintained by your organization's human resources, finance, or other departments. These attributes might include official name, work telephone number, employee identification number, title, salary, manager, and other attributes your organization maintains in other databases.
Attributes maintained by directory end users. These attributes might include home address and telephone number, description, picture, and other attributes containing personal information about the user. For an e-commerce application with self-serve account creation, users may maintain all their attributes.
Attributes maintained by directory-enabled applications. These attributes might include application-specific preference information, application or user state information, or attributes shared across multiple instances of the application.
Attributes maintained by the directory service itself. This category includes attributes such as modifiersName, modifyTimeStamp, and others that the directory maintains either as a convenience or to ensure its proper operation.

You probably don't need to worry about attributes that the directory itself maintains. Application-maintained data is not up to you to maintain, but it's important that you monitor and approve how applications use the directory. In the following discussion we also consider centrally maintained data (that is, data maintained by administrators and other data sources) and user-maintained data.

Application-Maintained Data

Data maintained by directory-enabled applications may end up being the majority of the data in your directory. The health of this kind of data and the applications that maintain it can have a tremendous effect on the quality and performance of your directory service. Make sure that the directory-enabled applications that you or others develop use the directory in an efficient and sensible way. Be proactive when you do this; don't wait for application developers to come to you.

There are many aspects of efficient and high-performance directory-enabled application development. A tutorial on directory-enabled application development is beyond the scope of this book, but the following list highlights some of the more important things to remember. Following these tips will go a long way toward making any directory-enabled application perform better:

Minimize connections. Both your application and your directory service will benefit if you open a connection only once and reuse it for many operations. A search request, with a single network round-trip to the directory server, can often be processed in only a few milliseconds. Opening a connection to perform the same operation can take much longer and consume far more network and server resources.
Perform only efficient searches. The capabilities of your directory software and how you configure it determine what kinds of searches your directory can handle efficiently. The difference in response time between a search your directory can respond to efficiently and one it can't is often measured in minutes. Efficient searches are important both for reducing the response time perceived by the application making the query and for increasing the overall throughput provided by your directory server. Clearly, a balance must be reached between application developers and directory administrators. Sometimes application developers can be shown a better way to do things; at other times, directory administrators may be able to reconfigure the server to better serve an application's requests.
Minimize the number of searches. Application developers often don't think in terms of consolidating operations for efficiency. As a result, multiple searches might be performed when only one would do. For example, if neighboring parts of the code call for an application to retrieve the mail and title attributes, an application developer might make two searches, one for each attribute. A more efficient approach would be to do one search asking for both attributes at the same time.
Retrieve only required attributes. Another area often ripe for improvement is the number of unnecessary attributes retrieved by an application. Sometimes developers are tempted to retrieve attributes they don't really need, perhaps even every attribute in the entry. In some directory installations, this can be a lot of data! For example, you might maintain a JPEG photo or audio greeting attribute that is tens of thousands of bytes long. Transmitting such attributes needlessly wastes bandwidth and computing power and can have severe performance implications because of access control and other processing overhead. It's better to ask only for those attributes the application will use.
Minimize updates. As discussed in Chapter 1, Directory Services Overview and History, a defining characteristic of a directory is that it is better equipped to handle read operations than update operations. Typical directory implementations can handle several orders of magnitude more reads and searches than writes. The less writing an application does to the directory, the better performance everyone accessing the directory will experience. It's also important to make updates as efficient as possible. Encourage application developers to change only the minimum information necessary. A common mistake is to perform a modification by deleting the old entry and adding the modified entry even if only one attribute value in the entry has changed. This kind of behavior can make updates substantially less efficient.
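As an illustration of the last tip, an application can compare the entry it read with the entry it wants to write and send only the differences. The sketch below models entries as dictionaries mapping attribute names to lists of values; the function name and the tuple format are assumptions for this example, although the three operation names mirror the add, delete, and replace choices of the LDAP modify operation.

```python
def minimal_changes(old, new):
    """Return the attribute-level changes needed to turn old into new.

    Both arguments map attribute names to lists of values. The result is
    a list of (operation, attribute, values) tuples covering only the
    attributes that actually differ.
    """
    changes = []
    for attr in sorted(set(old) | set(new)):
        before = old.get(attr, [])
        after = new.get(attr, [])
        if sorted(before) == sorted(after):
            continue  # unchanged attribute: send nothing
        if not after:
            changes.append(("delete", attr, []))
        elif not before:
            changes.append(("add", attr, after))
        else:
            changes.append(("replace", attr, after))
    return changes


old_entry = {"cn": ["Pat Lee"], "mail": ["pat@example.com"], "title": ["Engineer"]}
new_entry = {"cn": ["Pat Lee"], "mail": ["pat@example.com"], "title": ["Senior Engineer"]}
print(minimal_changes(old_entry, new_entry))
```

Here a single replace of the title attribute is produced, rather than a delete of the whole entry followed by an add.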

There are many techniques for communicating these guidelines to application developers. Among the more effective methods we've found are the following:

Documentation. If you simply document and publicize good practices, developers will often follow them. You might do this in the form of printed or online documentation.
Training and seminars. Most inefficiencies in the way directory-enabled applications use the directory are simply due to a lack of education. Documentation can help, but sometimes developing a formal training course is called for.
Sample code. Whatever vehicle you use to distribute guidelines to developers, be sure to include plenty of sample code. Not only does providing sample code put things in a language developers can understand, but it also makes incorporation into an application easier.
Consulting. In some situations it may be worthwhile to provide one-on-one consulting to application developers. You might do this during the application's design phase, during development when you can act as a support resource, and during application testing when you can help fix any last-minute problems.
Laboratory testing. One good way to root out and correct potential problems is to host application developers and their applications in your testing laboratory. This is good for developers because it gives them a chance to test their application in a controlled, easily monitored environment. It's good for you and your directory service because it gives you a chance to correct bad behavior before it is unleashed on your production service.

You may not have the resources to implement all these techniques. Think about the kind of community you're dealing with and which techniques will give you the biggest bang for your buck. Make sure you document the policy to which applications must conform. This documentation will give you an objective tool to use in judging and correcting applications.

Centrally Maintained Data

The first thing to decide about centrally maintained data is who is the central authority responsible for the maintenance. Options include the directory administrator or a third party. In either case you'll need to think about how the data is maintained, the frequency of updates, the effect updates will have on the operation of the directory, and other issues. These choices are summarized in the following list:

Online or offline update. Do the updates come in over LDAP, or do they come in via a file or other format imported into the directory service by other means? Typically, such imports go directly into the underlying directory database. If possible, have the updates come in over LDAP. That way, you can use the normal LDAP security mechanisms to protect the directory and avoid taking the directory down.
Update security. If the information being updated is sensitive and has to travel over insecure connections on its way to the directory, you will need to consider how to protect it. You might use Secure Sockets Layer (SSL) or Transport Layer Security (TLS) to protect updates that come in over LDAP. Other solutions, such as Secure Shell (SSH), are available to protect non-LDAP data transfers.
Update process. If the update process is regular, you should almost certainly automate it as much as you can. If you can't, you need to consider alternatives. Who performs the updates? What kind of training do they need? What kind of support do you need to provide for them?
Exception handling. Any process, automated or not, has exceptions. Depending on the size of your database and the number of updates you process, these exceptions can be significant. Think about making exception handling as automated and low-cost as possible.
Update frequency. The frequency at which data is updated depends on many factors, including the volatility of the data, the timeliness required, and the consequences of out-of-date data. Updates should be scheduled with care to avoid affecting the service and inconveniencing users.
Data validation. You may be surprised at the low quality of data that creeps into some of the data sources around your organization. Use the directory update process as an opportunity to improve the quality of this data. Higher quality of the data will improve the quality of your directory service and serve as an incentive to the data source to increase the quality of its data.

Make sure that you fully understand the implications of each of your choices. For example, opting for an offline process may involve shutting down the directory during the update. Depending on the update frequency, shutting down may or may not be acceptable. Opting for an online updating process, however, may degrade the performance of the directory. An online updating process may not complete as quickly as an offline process.

Update security may have implications as well. Can you arrange to give the updating entity access to only the fields it is allowed to update? Or do you need to give it more access than it really should have? If the update is accomplished offline, consider the implications for the security of your system. For example, you may need to provide physical access to the machine. If this is not acceptable, you may need to act as the update agent.

You'll probably want to develop as automated a process as possible to save staff costs. How much will the development of this system cost you? If you opt to protect the security of online updates via technologies such as SSL or TLS, consider the effect that this process will have on your service. How much will service be degraded? Will you need hardware acceleration to get acceptable performance? These considerations are all implications of the update process.

Update frequency is another choice that can have many implications. There is a conflict between wanting the data in your directory to be as up-to-date as possible and wanting the service to always be up and performing at peak capacity. For example, an offline update process may force you to take the system down while you import data. This is typically the fastest way to get data into your directory, but it means that the server being updated is totally inaccessible during the update. This lack of access during updates limits how often you can perform such an update. Extracting update data from the data source might also be an expensive process that cannot be performed often.

For an online update process, different concerns must be addressed. Depending on the number and complexity of updates being applied, online updates can significantly degrade your directory's performance. For example, consider updates from a database with 100,000 entries, each of which changes once an hour. Keeping absolutely up-to-date with these changes requires your directory to process 100,000 updates per hour. Multiply that by the number of replicas in your system; depending on the capabilities of your directory software and the load of other queries on the system, this may be too much. A more prudent approach may be to update the directory only once a day or even once a week. Make sure that you understand how up-to-date the information needs to be to be useful.
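A rough way to compare update schedules is to estimate the total number of writes each one generates. The sketch below rests on simplifying assumptions that are ours, not the text's: every entry has changed since the previous synchronization, each synchronization writes each entry once, every replica applies every write, and the replica count of 3 is purely illustrative.

```python
def writes_per_day(entries, syncs_per_day, replicas=1):
    """Upper bound on writes per day across all replicas.

    Assumes each sync writes every entry once and that every replica
    must apply every write.
    """
    return entries * syncs_per_day * replicas


ENTRIES = 100_000
REPLICAS = 3  # assumed replica count, for illustration only

for label, syncs in (("hourly", 24), ("daily", 1)):
    print(label, writes_per_day(ENTRIES, syncs, REPLICAS))
```

Under these assumptions, moving from hourly to daily synchronization cuts the write load by a factor of 24, at the price of data that can be up to a day old.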

In general, we have found that the following principles help make centralized update processes safe and efficient:

Automate as much as possible. Automation reduces staff costs for updating, which can be one of the most expensive aspects of running your directory service.
Be prepared to handle exceptions and errors. No process, no matter how foolproof it seems, will be able to handle everything. Make sure that you train a group of people to field exception reports and take appropriate action.
Use an online update process whenever possible. Updating online maximizes directory availability and allows you to use the directory's built-in security features to protect the updates.
Keep logs so that you can figure out what went wrong. Logs are invaluable when you are first developing and testing the system. Keeping detailed logs of the update process will continue to be valuable as your service moves into production.
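Several of these principles can be combined in the update driver itself. The following is a minimal sketch under stated assumptions: `apply_one` stands for whatever callable actually performs the write (for example, a wrapper around your LDAP client library), and the logger name and message format are arbitrary choices made for this example.

```python
import logging

logger = logging.getLogger("directory_update")


def apply_updates(updates, apply_one):
    """Apply each update, log the outcome, and collect the failures.

    updates is an iterable of (dn, changes) pairs. apply_one is the
    callable that performs a single write and raises on error. Returns a
    list of (dn, error message) pairs for staff to review.
    """
    failures = []
    for dn, changes in updates:
        try:
            apply_one(dn, changes)
        except Exception as exc:  # one bad record must not stop the run
            logger.error("update failed dn=%s changes=%r error=%s", dn, changes, exc)
            failures.append((dn, str(exc)))
        else:
            logger.info("update applied dn=%s changes=%r", dn, changes)
    return failures
```

The returned list is the exception report: it can be handed to the people trained to field exceptions, while the log preserves the full history of the run.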

Here's one final word of advice: You can save yourself a lot of work and potential for trouble if you avoid centralized update processes altogether. The closer you can push update responsibility to the owners of the data being updated, be they human resource administrators or end users, the better off you will be.

User-Maintained Data

Some data contained in your directory is best maintained by the users to which it pertains. This kind of data includes things such as home address and phone number, vehicle license number, user-owned mailing lists, and other information that the user has the most interest in seeing up-to-date. Having users update this information themselves can be a benefit to both administrators and users for the following reasons:

Accuracy of the data. For many kinds of user information, the most accurate source is the users themselves. This is true of information such as home addresses and telephone numbers, which users know better than you do. It's also true of other information (such as title, salary, manager, and so on) that you might think would be better known by corporate sources. Rarely do you want to allow users to change this information, but you should make it easy for them to request that changes be made.
Less work for you. There is no question that maintaining data can be a labor-intensive process. Even automated systems do not relieve you of all this burden, because they must be developed, maintained, and monitored. The more you can distribute this task across your entire user population, the better.
Empowerment of your users. The more empowered users feel, the happier they are likely to be. This is especially true today because users are inundated with information. Personal information is maintained in literally dozens, if not hundreds, of databases across the globe, usually with little or no opportunity for it to be corrected when it's wrong. Although your directory service cannot address the larger problem, it can provide users with the opportunity and the means to correct mistakes they notice. Make sure that you ask users to update only the data for which they are the best source.

Despite these advantages, there are still problems to be overcome before you enable your users to update their own information. Although user updating of data has many long-term benefits to you, your directory service, and your users, you will have to make an investment up front to enable this process. In most cases, however, this investment is well worth the cost.

User-maintained data is not a panacea; it comes with its own set of problems. For example, users often do not have the motivation or expertise required to perform updates. They might not realize the consequences of out-of-date information, or they might not have the time to take care of it. You will probably want to take control of information that you deem critically important to providing a good directory service. Another good approach is to train a small group of responsible users, such as administrative assistants or system administrators, to perform the updates.

Update-Capable Clients

The first thing a user requires to update his own entry is a client capable of performing the update. Depending on the type and capabilities of the service you have deployed, such clients may already be available. If not, consider three alternatives:

General-purpose client. A general-purpose client can be one of the many generic directory clients on the market. The advantage is that such clients come ready to go, requiring of you only configuration, distribution, and training. The costs involved can still be substantial, however. Make sure that the client you select has the functionality you require but is not so general-purpose as to be overwhelming to users. Users may find it frustrating if the client has capabilities they cannot use because of access control or other restrictions. Sometimes clients allow configuration changes to remove such extra options.
Special-purpose client. This might be an application you distribute to each user's desktop or a centralized Web application accessible from any Web browser. The latter option is attractive in terms of development and distribution costs.
Online request system. With this option, users don't directly make changes to their entries in real time. Instead they access the online request system to submit their changes. The request is then handled by administrative staff or an automated offline process. Such changes may be applied to the directory itself or to the source databases (an attractive feature). Another advantage is the human verification of update requests that can be performed if this method is being used. These advantages need to be weighed against the lack of instant update capability, which may confuse users.

See Chapter 21, Developing New Applications, for tips and techniques you can use to develop some of these applications yourself.

Authentication and Security

If your users have an update-capable client in their hands, they'll need to authenticate to the directory before it allows them to make changes. This is an important step toward protecting the security and integrity of information in the directory.

Chapter 12, Privacy and Security Design, discussed options for authentication in detail, so by this point you should already have an authentication and security plan. Part of your data maintenance plan must include provisions for distributing and maintaining authentication credentials. These provisions might involve passwords, certificates, or another form of credentials.

As soon as users have authenticated, you need to be concerned about the security implications of the updates they make. Again, this topic was covered in detail in Chapter 12. We will not repeat that discussion here, except to recall that different kinds of information require different kinds of protection. For example, public information needs its integrity protected, but privacy is not generally a concern. Sensitive information needs both privacy and integrity protection.

You should also apply access controls to your directory, again as described in Chapter 12. These controls ensure that users are able to update the fields you want them to update. Even more importantly, access controls ensure that users are unable to update data that you don't want them to update.
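Before translating your policy into your server's access control syntax, it can help to write it down as a simple table of which role may update which attribute. The sketch below is only a planning aid and does not represent any server's access control mechanism; the role names and the table contents are hypothetical.

```python
# Hypothetical policy table: attribute -> roles allowed to update it.
# Anything not listed is denied by default.
WRITABLE_BY = {
    "homePhone": {"self", "content_admin"},
    "homePostalAddress": {"self", "content_admin"},
    "title": {"hr_source"},
    "manager": {"hr_source"},
    "userPassword": {"self", "directory_admin"},
}


def may_update(role, attribute):
    """True if the policy lets this role update this attribute."""
    return role in WRITABLE_BY.get(attribute, set())


def rejected_attributes(role, changes):
    """Attributes in a proposed change that the role may not update."""
    return sorted(attr for attr in changes if not may_update(role, attr))


print(rejected_attributes("self", {"homePhone": "+1 555 0100", "title": "CEO"}))
```

The default-deny lookup reflects the point made above: keeping users out of fields they should not change matters even more than letting them into the ones they should.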

It's often important to maintain an audit trail when allowing users to update their own entries. Problems are inevitable, and you need to be able to distinguish among user errors, directory system errors, and security breaches. Maintaining an adequate audit trail showing who updated what and when they did it can go a long way toward resolving these problems. Most directory server software supports some kind of audit capability.

Training and Support Costs

One of the few downsides of allowing users to update their own entries is the extra cost of training and support that may be required. You need to balance the extra cost against the savings you get by not having to spend administrative staff time performing updates.

Training costs depend on the complexity of the update process you devise and the sophistication of your user community. If your update process is embodied in a self-explanatory Web application, your training costs will be minimal. If your update process is embodied in a more complicated standalone application, more extensive training may be required.

Support costs also can vary depending on the update process you devise. In addition to Help Desk calls received from users asking how to use the update applications, you can expect to receive calls from people wanting to update fields that you don't want to allow them to update. Be prepared to explain your data maintenance policy, and publish the policy as widely as you can.

One good way to lower support costs is to be proactive about giving users as much information as possible. For example, if a user's manager is responsible for updating certain fields for the Human Resources department before the changes are reflected in the directory, explain this in your online documentation. A user reading this documentation may then go directly to her manager rather than bothering the Help Desk, which would not have the relevant information to help her.

System Effects

Just as with centrally maintained data, you need to consider the effects of your user-maintained data policies on the directory service. The effects concern your directory's performance, its replication system, the quality of data, and other factors.

Directory performance can be affected by user updates just as it can by other updates. This type of traffic tends to slow down the directory, and many updates coming in unchecked could even result in a denial of service to other users of the directory. This might happen because of a malicious user or simply because of an error in the way someone is using the directory. For example, consider a user who tries to write a little program to update his directory entry several times a day. If this program becomes caught in some kind of loop, the load on the directory can be substantial. Use your audit trail and normal directory performance monitoring methods to guard against problems like this.
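If your audit trail can be exported as records of who changed what and when, spotting a runaway updater is a matter of counting. The sketch below assumes a record format of (timestamp, who, target DN) tuples and uses an arbitrary threshold of 20 updates per hour; both are assumptions, and a real audit log would first need to be parsed into this shape.

```python
from collections import Counter
from datetime import datetime, timedelta


def noisy_updaters(audit_records, now, window=timedelta(hours=1), limit=20):
    """Return {who: count} for identities exceeding limit updates in window.

    audit_records is an iterable of (timestamp, who, dn) tuples.
    """
    cutoff = now - window
    counts = Counter(who for ts, who, _dn in audit_records if ts >= cutoff)
    return {who: n for who, n in counts.items() if n > limit}


now = datetime(2024, 1, 1, 12, 0)
records = [(now - timedelta(minutes=i), "uid=looper", "uid=looper") for i in range(30)]
records.append((now - timedelta(minutes=5), "uid=normal", "uid=normal"))
print(noisy_updaters(records, now))
```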

Replication performance is also affected by the number of updates your directory processes. Too many updates can cause replication to become bottlenecked. The result may be long delays during which different copies of your directory contain different information. Although loose consistency like this is a fundamental characteristic of most directories, taken to extremes it can lead to user confusion and more calls to the Help Desk.

User-maintained data does not always cause more updates to the directory. It may well be that the same or even fewer updates result. The question is one of control. With user-maintained data, you have no control over how many updates are performed, when they are performed, and by whom. You should guard against an out-of-control user update problem, even if it is unlikely to occur. Also guard against users not updating their information in a timely manner, which leads to stale data in the directory.

Data Validation

Quality of data is another concern when you're allowing user updates. We stated that improving data quality is one of the main reasons for introducing user updates in the first place, so this may come as a bit of a surprise. Keep in mind, however, that although users may be able to provide up-to-date and accurate information, they are, after all, only human. Unintentional mistakes are often made, and you should do what you can to guard against such mistakes. Also keep in mind that user motivation to update data may be highly variable.

One good method of guarding against data quality problems is to screen the data entered by users. With data screening, a process inspects the data to be added or changed in the directory. If the data is found to be faulty, the update is not made. You can often catch syntactic errors this way, although rarely semantic ones. Examples of syntactic errors include entering an e-mail address that contains spaces, a telephone number with nonprintable characters, a JPEG photograph that does not conform to the JPEG standard, and so on. Examples of semantic errors include entering a valid but incorrect e-mail address, somebody else's telephone number, and more. Some semantic checks can be performed. For example, you can perform minimal verification of an e-mail address by looking up mail exchange (MX) or address (A) records in the Domain Name System (DNS). If a user enters the telephone number of an existing user, a simple directory search can be used to detect this duplication and reject the change.
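A screening routine for the examples just mentioned might be sketched as follows. This is a deliberately shallow illustration: the function name and return convention are ours, the JPEG check only looks for the start-of-image marker rather than validating the whole file, the duplicate check uses an in-memory collection where a real system would search the directory, and the DNS lookup is omitted.

```python
def screen_update(attribute, value, existing_phone_numbers=()):
    """Return a rejection reason, or None if the value passes screening."""
    if attribute == "mail":
        if any(ch.isspace() for ch in value):
            return "e-mail address contains whitespace"
    elif attribute == "telephoneNumber":
        if not value.isprintable():
            return "telephone number contains nonprintable characters"
        # Stand-in for a directory search for an existing owner.
        if value in existing_phone_numbers:
            return "telephone number already belongs to another entry"
    elif attribute == "jpegPhoto":
        # JPEG data begins with the start-of-image marker FF D8.
        if not value.startswith(b"\xff\xd8"):
            return "photo is not JPEG data"
    return None


print(screen_update("mail", "pat lee@example.com"))
print(screen_update("telephoneNumber", "+1 555 0100", {"+1 555 0100"}))
```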

You can screen users' data in two places. First, you can modify the clients used to update users' entries to do the screening, if you have access to the source code and have the in-house expertise. Some off-the-shelf clients already provide support for this kind of data validation, although most do not. This method has the disadvantage that no checking is done if a user finds a different way to update her entry, perhaps by using a different client.

Another option is to enable data validation in the directory servers themselves. This approach eliminates the possibility of a user bypassing the validation because all updates come to the directory servers. As you learned in Chapter 2, Introduction to LDAP, attributes have a syntax associated with them that determines the kind of information their values can contain. Many directory servers provide validation at this basic syntax level. This is enough to keep nonprintable characters out of phone numbers and e-mail addresses, but it's probably not enough to check the syntax of an e-mail address, and it is certainly not enough for checking something like a special employee number.

To perform more elaborate data validation checks such as those shown in the last few examples, you need the ability to change the directory's behavior. There are different ways of doing this, although few directory software products provide this capability. Netscape Directory Server 6 provides a set of plug-in interfaces that allow you to write a bit of code to perform a validation. (The Value Constraint plugin example from Chapter 4, Overview of Netscape Directory Server, is an example of code that uses this set of interfaces.) This code then gets plugged into the directory server and called before an update operation is executed. Your plug-in has the option of refusing the operation (if the data is really messed up) or changing the data to an acceptable format (to remove dashes from an employee number, for example). Make sure that your directory software supports the kind of validation you need.
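The two options open to such a plug-in (refuse the operation, or rewrite the data) can be illustrated conceptually. The sketch below is written in Python purely to show the control flow; it is not the Netscape Directory Server plug-in interface, and the names `RejectUpdate`, `PRE_UPDATE_HOOKS`, and `pre_update` are invented for this example.

```python
class RejectUpdate(Exception):
    """Raised by a hook to refuse the whole update operation."""


def normalize_employee_number(value):
    """Strip dashes; refuse the update if anything but digits remains."""
    cleaned = value.replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        raise RejectUpdate(f"invalid employee number: {value!r}")
    return cleaned


# Hypothetical registry: attribute name -> validation/normalization hook.
PRE_UPDATE_HOOKS = {"employeeNumber": normalize_employee_number}


def pre_update(changes):
    """Run hooks before an update; return the possibly rewritten changes."""
    result = {}
    for attr, value in changes.items():
        hook = PRE_UPDATE_HOOKS.get(attr)
        result[attr] = hook(value) if hook else value
    return result


print(pre_update({"employeeNumber": "12-345-678", "title": "Engineer"}))
```

A value such as "12-345-678" is rewritten to an acceptable form, whereas a value containing letters causes the whole operation to be refused.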