The Data Maintenance Policy

In Chapter 6, "Data Design," you learned how to identify, locate, and obtain the data needed to populate your directory. The result of that phase of your design process was a table listing all your data sources, the data you need from each, and the procedures you will use to obtain the data. This information played an important role in formulating your data maintenance policy.

Your data maintenance policy determines who is responsible for maintaining which attributes in the directory. If more than one entity is allowed to maintain an attribute, your data maintenance policy determines how conflicts are resolved. For example, suppose that users and the human resources databases are both allowed to update the telephoneNumber attribute. What do you do if both sources update the attribute with different values at the same time? Your data maintenance policy should outline procedures to determine the answer to this question. It also should determine the frequency of updates for pieces of information in the directory and the security those updates require.

Another important procedure determined by your data maintenance policy is how exceptions are handled. Every policy has exceptions. Do not fool yourself into thinking your data maintenance policy is the exception to this rule. There are potentially as many reasons to make exceptions to your data maintenance policy as there are users of your directory. For example, consider an operation that obtains home address information from the official payroll database. Although this might be just fine for the vast majority of your employees, consider an employee who has his checks mailed to a location different from his home. Including this address in the employee's home contact information would be plain wrong. You may want to make an exception for this user. How you handle data maintenance exceptions will have a great impact on the cost of maintaining your directory service.
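One way to make the conflict question above concrete is to write the policy down as data. The following Python sketch is only an illustration under stated assumptions: the source names (`hr_database`, `user`, `payroll_database`), the `POLICY` table, and the `resolve` function are hypothetical and do not come from any directory product. It assumes each attribute lists its permitted sources in priority order, and that among updates from the same source the most recent one wins.

```python
from dataclasses import dataclass

# Hypothetical policy table: for each attribute, the sources allowed to
# maintain it, listed from highest to lowest precedence.
POLICY = {
    "telephoneNumber": ["hr_database", "user"],
    "homePostalAddress": ["user", "payroll_database"],
}


@dataclass
class Update:
    attribute: str
    value: str
    source: str       # who submitted the update (hypothetical source name)
    timestamp: float  # seconds since the epoch


def resolve(updates):
    """Pick the winning update among competing updates to ONE attribute.

    Returns None if the list is empty or no update comes from an allowed source.
    """
    if not updates:
        return None
    allowed = POLICY.get(updates[0].attribute, [])
    candidates = [u for u in updates if u.source in allowed]
    if not candidates:
        return None
    # A lower index in the policy list means higher precedence; among updates
    # from the same source, the most recent one wins.
    return min(candidates, key=lambda u: (allowed.index(u.source), -u.timestamp))
```

With this table, an update to telephoneNumber from the human resources database beats a simultaneous update from the user, whereas for homePostalAddress the user's value takes precedence over payroll. Whether that ordering is right for your organization is exactly what the policy has to decide.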
Consider the number of exceptions in relation to the projected size of your directory. Is the policy you choose applicable to 90% of your users? 99%? More? Make some educated guesses here, and realize that the larger your directory gets, the more important it is that the policy be nearly universal. For example, consider a small directory containing entries for 100 people. If your policy correctly covers 99 out of these 100 users, it is a relatively small burden to do something special for the remaining person. On the other hand, if your directory contains entries for 100,000 people, 99% coverage is pretty close to a disaster: You would have to make special exceptions for 1,000 people, a task that can become quite arduous and expensive.

Consider automating the exception process itself. This may sound counterintuitive, but it is possible in some situations. For example, you might allow users to modify their own entries in a way that excepts them from the standard data policy. This might be accomplished by the user setting a flag in his or her entry. The flag is then read by the automated data maintenance procedure, prompting special handling of the user's entry.

For the purposes of this chapter, we'll separate the attributes in your directory into six categories:
You probably don't need to worry about attributes the directory itself maintains. Application-maintained data is often not up to you to maintain, but it's important that you monitor and approve how applications use the directory. In the discussion that follows, we will also consider data maintained by administrators and other data sources (centrally maintained data) and user-maintained data.

Application-Maintained Data

Data maintained by directory-enabled applications may end up being the majority of data in your directory. The health of this kind of data and the applications that maintain it can have a tremendous effect on the quality and performance of your directory service. Make sure the directory-enabled applications developed by you or others use the directory in an efficient and sensible way. Be proactive when you do this; don't wait for application developers to come to you.

There are many aspects of efficient and high-performance directory-enabled application development. It's beyond the scope of this book to give a tutorial on directory-enabled application development, but the following list highlights some of the more important things to remember. Following these tips will go a long way toward making any directory-enabled application perform better:
There are many techniques for communicating these guidelines to application developers. Among the more effective methods we've found are the following:
You may not have the resources to implement all these techniques. Think about what kind of community you are dealing with and which techniques will give you the biggest bang for your buck. Make sure you document the policy that applications must conform to. This will give you an objective tool to use in judging and correcting applications.

Centrally Maintained Data

The first thing to decide about centrally maintained data is who the central authority responsible for the maintenance is. This might be the directory administrator or some third party. In either case, you'll need to think about how the data is maintained, the frequency of updates, the effect updates will have on the operation of the directory, and other issues. These choices are summarized in the following list:
Make sure you fully understand the implications of each of your choices. For example, opting for an offline process may involve shutting down the directory during the update. Depending on the frequency of update, this may or may not be acceptable. Opting for an online updating process, however, may degrade the performance of the directory. An online updating process may not complete as quickly as an offline one.

Update security may have implications as well. Can you arrange to give the updating entity access only to the fields it is allowed to update? Or do you need to give it more access than it really should have? If the update is accomplished via an offline process, consider the implications for the security of your system. For example, it may mean giving physical access to the machine. If this is not acceptable, it may mean that you need to act as the update agent. You'll probably want to develop as automated a process as possible to save staff costs. How much will the development of this system cost you? If you opt to protect the security of online updates via some technology such as SSL, consider the effect this will have on your service. How much will service be degraded? Will you need hardware acceleration to get acceptable performance? These considerations are all implications of the update process.

Update frequency is another choice that can have many implications. There is a conflict between wanting the data in your directory to be as up-to-date as possible and wanting the service to always be up and performing at peak capacity. For example, an offline update process may cause you to have to take the system down while you import data. This is typically the fastest way to get data into your directory, but it means that the server being updated is totally inaccessible during the update. This limits how often you can perform such an update. Extracting update data from the data source might also be an expensive process that cannot be performed often.
For an online update process, different concerns must be addressed. Depending on the number and complexity of updates being applied, online updates can significantly degrade your directory's performance. For example, consider updates from a database with 100,000 entries, each of which changes once an hour. Keeping absolutely up-to-date with these changes requires your directory to process 100,000 updates per hour. Multiply that by the number of replicas in your system; depending on the capabilities of your directory software and the load of other queries on the system, this may be too much. A more prudent approach may be to update the directory only once a day or even once a week. Make sure you understand how up-to-date the information needs to be in order to be useful.

In general, we have found that the following principles help make centralized update processes safe and efficient:
One final word of advice: You can save yourself a lot of work and potential for trouble if you avoid centralized update processes altogether. The closer you can push update responsibility to the owners of the data being updated, be they human resource administrators or end users, the better off you are.

User-Maintained Data

Some of the data contained in your directory is best maintained by the users it pertains to. This kind of data includes things such as home address and phone number, vehicle license number, user-owned mailing lists, and other information that the user has the most interest in seeing up-to-date. Having users update this information themselves can be a benefit to both administrators and users for the following reasons:
Despite these advantages, there are still problems to be overcome before you enable your users to update their own information. Although user updating of data has many long-term benefits to you, your directory service, and your users, you will have to make an investment up front to enable this process. In most cases, however, this investment is well worth the cost.

User-maintained data is not a panacea; it comes with its own set of problems. For example, users often do not have the motivation or expertise required to perform updates. They might not realize the consequences of out-of-date information, or they might not have the time to take care of it. You probably want to take control of information that you deem critically important to providing a good directory service. Another good approach is to train a small group of responsible users, such as administrative assistants or system administrators, to perform the updates.

Update-Capable Clients

The first thing a user requires to update his or her own entry is a client capable of performing the update. Depending on the type and capabilities of the service you have deployed, such clients may already be available. If not, there are three alternatives you should consider:
See Chapter 20, "Developing New Applications," for tips and techniques you can use to develop some of these applications yourself.

Authentication and Security

If your users have an update-capable client in their hands, they'll need to authenticate to the directory before it allows them to make changes. This is an important step toward protecting the security and integrity of information in the directory. Chapter 11, "Privacy and Security Design," discussed options for authentication in detail, so by this point you should already have an authentication and security plan. Part of your data maintenance plan must include provisions for distributing and maintaining authentication credentials. These provisions might involve passwords, certificates, or some other form of credentials.

As soon as users have authenticated, you need to be concerned about the security implications of the updates they make. Again, this was covered in detail in Chapter 11. We will not repeat that treatment here, except to recall that different kinds of information require different kinds of protection. For example, public information needs its integrity protected, but privacy is not generally a concern. Sensitive information needs both privacy and integrity protection.

You should also apply access controls to your directory, as described in Chapter 11. These controls ensure that users are able to update those fields you want them to update. Even more importantly, access controls ensure that users are unable to update data that you don't want them to update.

It's often important to maintain an audit trail when allowing users to update their own entries. Problems will inevitably occur, and you need to be able to distinguish between user errors, directory system errors, and security breaches. Maintaining an adequate audit trail showing who updated what and when they did it can go a long way toward resolving these problems. Most directory server software supports some kind of audit capability.
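As a rough illustration of what "who updated what and when" means in practice, the Python sketch below keeps one record per change and answers two basic questions about an entry. The record fields, the `AuditTrail` class, and its method names are assumptions made for this example; real directory servers write audit logs in their own formats, and the exact string comparison of DNs used here is a simplification.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    when: datetime   # time of the change (UTC)
    actor: str       # DN that authenticated and made the change
    target: str      # DN of the entry that was changed
    attribute: str   # attribute that was modified
    old_value: str
    new_value: str


class AuditTrail:
    """In-memory audit trail (hypothetical; a real one would be persistent)."""

    def __init__(self):
        self._records = []

    def record(self, actor, target, attribute, old_value, new_value, when=None):
        # Default to the current time when the caller does not supply one.
        if when is None:
            when = datetime.now(timezone.utc)
        rec = AuditRecord(when, actor, target, attribute, old_value, new_value)
        self._records.append(rec)
        return rec

    def history(self, target):
        """All recorded changes to one entry, oldest first."""
        matching = [r for r in self._records if r.target == target]
        return sorted(matching, key=lambda r: r.when)

    def changes_by_others(self, target):
        """Changes to an entry made by someone other than the entry itself."""
        return [r for r in self.history(target) if r.actor != target]
```

Separating self-updates from updates made by others is one simple way to start telling a user error apart from an administrative change or a possible security breach.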
Audit logs were discussed in detail in Chapter 11.

Training and Support Costs

One of the few downsides of allowing users to update their own entries is the extra training and support costs that may be required. You need to balance these costs against the savings you get by not having to spend administrative staff time performing updates.

Training costs depend on the complexity of the update process you devise and the sophistication of your user community. If your update process is embodied in a self-explanatory Web application, your training costs will be minimal. If your update process is embodied in a more complicated standalone application, there may be more extensive training required.

Support costs also can vary quite a bit depending on the update process you devise. In addition to help desk calls received from users asking how to use the update applications, you can expect to receive calls from people wanting to update fields that you don't want to allow them to update. Be prepared to explain your data maintenance policy; if possible, publish the policy as widely as possible.

One good way to lower support costs is to be proactive about giving users as much information as possible. For example, if a user's manager is responsible for updating certain fields for the human resources department before the changes are reflected in the directory, explain this in your online documentation. A user reading this may then go directly to his or her manager rather than bothering the help desk, which would not have the relevant information needed to help the user.

System Effects

Just as with centrally maintained data, you need to consider the effects of your user-maintained data policies on the directory service. The effects concern your directory's performance, its replication system, the quality of data, and other factors.

Directory performance can be affected by user updates just as it can by other updates.
This type of traffic tends to slow the directory down, and many updates coming in unchecked could even result in a denial of service to other users of the directory. This might happen because of a malicious user or simply because of an error in the way someone is using the directory. For example, consider a user who tries to write a little program to update his directory entry several times a day. If this program becomes caught in some kind of loop, the load on the directory can be substantial. Use your audit trail and normal directory performance monitoring methods to guard against problems like this.

Replication performance is also affected by the number of updates your directory processes. Too many updates can cause replication to get backed up. This can cause long delays during which different copies of your directory contain different information. Although loose consistency like this is a fundamental characteristic of most directories, taken to extremes it can lead to user confusion and more calls to the help desk.

It is not always true that user-maintained data causes more updates to the directory. It may well be the case that the same or even fewer updates result. The question is one of control. With user-maintained data, you have no control over how many updates are performed, when they are performed, and by whom. You should guard against an out-of-control user update problem, even if it is unlikely to occur. You should also guard against users not updating their information in a timely manner, which leads to stale data in the directory.

Data Validation

Quality of data is another concern when allowing user updates. We stated that improving data quality is one of the main reasons for introducing user updates in the first place, so this may come as a bit of a surprise. Keep in mind that although users may be able to provide up-to-date and accurate information, they are, after all, only human.
Unintentional mistakes are often made, and you should do what you can to guard against such mistakes. Also, keep in mind that user motivation to update data may be highly variable.

One good method of guarding against data quality problems is to screen the data entered by users. With data screening, some process inspects the data to be added or changed in the directory. If the data is found to be faulty, the update is not made. You can often catch syntactic errors this way, although rarely semantic ones. Examples of syntactic errors include entering an email address that contains spaces, a telephone number with nonprintable characters, a JPEG photograph that does not conform to the JPEG standard, and so on. Examples of semantic errors include entering a valid but incorrect email address, somebody else's telephone number, and so on.

Some semantic checks can be performed. For example, minimal verification of an email address can be performed by looking up mail exchange (MX) or address (A) records in the DNS. If a user enters the telephone number of an existing user, a simple directory search can be used to detect this and reject the change.

There are two places where you can screen users' data. First, you can modify the clients used to update their entries to do the screening. Some clients already provide support for this kind of data validation. You may be able to use this support, although if you have any special requirements you may be out of luck. An example of a special requirement might be an employee number field whose values must have a specific format. Most clients don't provide generic data validation support. This method also has the disadvantage that no checking is done if a user finds a different way to update her entry, perhaps by using a different client.

Another option is to enable data validation in the directory servers themselves. This eliminates the possibility of a user bypassing the validation because all updates come to the directory servers.
As we learned in Chapter 3, "An Introduction to LDAP," attributes have a syntax associated with them that determines the kind of information their values can contain. Many directory servers provide validation at this basic syntax level. This is enough to keep nonprintable characters out of phone numbers and email addresses, but it's probably not enough to check the syntax of an email address and certainly not enough for something like a special employee number.

To perform more elaborate data validation checks such as these, you need the ability to change the directory's behavior. There are different ways of doing this, although few directory software products provide this capability. Netscape Directory Server provides a set of plug-in interfaces that allow you to write a little bit of code to perform a validation. This code then gets plugged into the directory server and called before an update operation is executed. Your plug-in has the option of refusing the operation (if the data is badly malformed) or changing the data to an acceptable format (to remove dashes from an employee number, for example). Check to make sure your directory software supports the kind of validation you need.
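The screening described in this section can be sketched as a small set of checks that run before an update is accepted. What follows is a minimal illustration in Python, not the Netscape Directory Server plug-in interface, which is not shown here. The rules are deliberately simple assumptions: the email check is far looser than the full address grammar, the telephone check only rejects empty values and nonprintable characters, and the employee number format (six digits, with dashes stripped) is invented for the example. All function names and the `VALIDATORS` table are hypothetical.

```python
import re


class ValidationError(ValueError):
    """Raised when a value should be refused."""


def check_mail(value):
    # Refuse whitespace; require exactly one "@" with text on both sides.
    if re.search(r"\s", value) or not re.fullmatch(r"[^@]+@[^@]+", value):
        raise ValidationError(f"bad email address: {value!r}")
    return value


def check_telephone(value):
    # Refuse empty values and values containing nonprintable characters.
    if not value or not value.isprintable():
        raise ValidationError("telephone number is empty or nonprintable")
    return value


def check_employee_number(value):
    # Hypothetical format: six digits. Dashes are removed rather than
    # refused, mirroring the "change the data to an acceptable format" option.
    cleaned = value.replace("-", "")
    if not re.fullmatch(r"[0-9]{6}", cleaned):
        raise ValidationError(f"bad employee number: {value!r}")
    return cleaned


# Attribute name -> screening function. Attributes without an entry pass
# through unchanged.
VALIDATORS = {
    "mail": check_mail,
    "telephoneNumber": check_telephone,
    "employeeNumber": check_employee_number,
}


def screen(attribute, value):
    """Return the (possibly normalized) value, or raise ValidationError."""
    validator = VALIDATORS.get(attribute)
    return validator(value) if validator else value
```

For instance, `screen("employeeNumber", "12-34-56")` returns `"123456"`, while `screen("mail", "jane doe@example.com")` raises `ValidationError`. Checks like these catch only syntactic errors; a valid but incorrect address would still pass, which is why the semantic checks mentioned earlier (DNS lookups, directory searches for duplicates) are a separate step.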
2002, O'Reilly & Associates, Inc.