In Chapter 7, Data Design, you learned how to identify, locate, and obtain the data needed to populate your directory. That phase of your design process produced a table listing all your data sources, the data you need from each source, and the procedures you will use to obtain the data. This information plays an important role in formulating your data maintenance policy.

Your data maintenance policy determines who is responsible for maintaining which attributes in the directory. If more than one entity is allowed to maintain an attribute, your data maintenance policy determines how conflicts are resolved. For example, suppose that both users and the human resources database are allowed to update the telephoneNumber attribute. What do you do if both sources update the attribute with different values at the same time? Your data maintenance policy should outline procedures for answering this question. It should also determine how often each piece of information in the directory is updated and what security those updates require.

Your data maintenance policy also determines how exceptions are handled. Every policy has exceptions, so do not fool yourself into thinking your data maintenance policy is the exception to this rule. There are potentially as many reasons to make exceptions to your data maintenance policy as there are users of your directory. For example, consider a process that obtains home address information from the official payroll database. Although this might be just fine for the vast majority of your employees, consider an employee who has his paychecks mailed to a location other than his home. Including this mailing address in the employee's home contact information would be just plain wrong, so you may want to make an exception for this user. How you handle data maintenance exceptions will have a great impact on the cost of maintaining your directory service.
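One way to make these rules concrete is to keep the policy as a table of attribute owners listed in priority order and apply it to every batch of competing updates. The Python sketch below works under that assumption. The source names (`hr_database`, `payroll`, `user`) and the per-entry exemption attribute `dataPolicyExempt` are hypothetical, and "higher-priority source wins, then the newest value" is only one possible conflict rule. The exemption shows how the employee with the separate paycheck address could be kept out of the payroll feed.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical ownership table: for each attribute, the sources allowed to
# update it, highest priority first. Names are illustrative only.
OWNERS = {
    "telephoneNumber": ["hr_database", "user"],
    "homePostalAddress": ["payroll", "user"],
    "title": ["hr_database"],
}

# Hypothetical per-entry attribute listing attributes that automated feeds
# must leave alone (the "exception" case).
EXEMPT_ATTR = "dataPolicyExempt"

@dataclass
class Update:
    attribute: str
    value: str
    source: str
    timestamp: datetime

def resolve(entry: dict, updates: list) -> dict:
    """Return the winning value for each attribute under the ownership policy.

    - Updates from sources that do not own the attribute are dropped.
    - Automated sources (anything except "user") are skipped for attributes
      the entry lists in EXEMPT_ATTR.
    - Otherwise the higher-priority source wins; within one source the most
      recent timestamp wins.
    """
    exempt = set(entry.get(EXEMPT_ATTR, []))
    winners = {}
    for u in updates:
        owners = OWNERS.get(u.attribute, [])
        if u.source not in owners:
            continue  # this source is not allowed to maintain the attribute
        if u.source != "user" and u.attribute in exempt:
            continue  # entry has an exception for automated updates
        best = winners.get(u.attribute)
        if best is None:
            winners[u.attribute] = u
            continue
        rank, best_rank = owners.index(u.source), owners.index(best.source)
        if rank < best_rank or (rank == best_rank and u.timestamp > best.timestamp):
            winners[u.attribute] = u
    return {attr: u.value for attr, u in winners.items()}
```

Because the exemption is stored in the entry itself, the same automated procedure handles both ordinary and exceptional entries. No separate manual list has to be maintained.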
Consider the number of exceptions in relation to the projected size of your directory. Is the policy you choose applicable to 90 percent of your users? To 99 percent? To all of them? Make some educated guesses here, and realize that the larger your directory becomes, the more important it is that the policy be nearly universal. For example, consider a small directory containing entries for 100 people. If your policy correctly covers 99 of these 100 users, doing something special for the remaining person is a relatively small burden. On the other hand, if your directory supports an e-commerce application and contains 1 million entries, 99 percent coverage is pretty close to a disaster: you would have to make special exceptions for 10,000 people, a task that can become arduous and expensive.

Consider automating the exception process itself. This may sound counterintuitive, but it is possible in some situations. For example, you might allow users to modify their own entries in a way that exempts them from the standard data policy. One way to do this would be to have the user set a flag in his or her entry. The automated data maintenance procedure would then read the flag and handle the user's entry specially.

For the purposes of this chapter, we separate the attributes in your directory into six categories:
You probably don't need to worry about attributes that the directory itself maintains. Application-maintained data is not yours to maintain, but it's important that you monitor and approve how applications use the directory. In the following discussion we also consider centrally maintained data (that is, data maintained by administrators and other data sources) and user-maintained data.

Application-Maintained Data

Data maintained by directory-enabled applications may end up being the majority of the data in your directory. The health of this data and of the applications that maintain it can have a tremendous effect on the quality and performance of your directory service. Make sure that the directory-enabled applications that you or others develop use the directory in an efficient and sensible way. Be proactive about this; don't wait for application developers to come to you. Developing efficient, high-performance directory-enabled applications involves many considerations. A tutorial on the subject is beyond the scope of this book, but the following list highlights some of the more important things to remember. Following these tips will go a long way toward making any directory-enabled application perform better:
There are many techniques for communicating these guidelines to application developers. Among the more effective methods we've found are the following:
You may not have the resources to implement all these techniques. Think about the kind of community you're dealing with and which techniques will give you the biggest bang for your buck. Make sure you document the policy to which applications must conform. This documentation will give you an objective tool for judging and correcting applications.

Centrally Maintained Data

The first thing to decide about centrally maintained data is which central authority is responsible for maintaining it. Options include the directory administrator or a third party. In either case you'll need to think about how the data is maintained, how frequently it is updated, what effect updates will have on the operation of the directory, and other issues. These choices are summarized in the following list:
Make sure that you fully understand the implications of each of your choices. For example, opting for an offline process may involve shutting down the directory during the update. Depending on the update frequency, this may or may not be acceptable. Opting for an online update process, on the other hand, may degrade the performance of the directory, and an online process may not complete as quickly as an offline one.

Update security has implications as well. Can you arrange to give the updating entity access to only the fields it is allowed to update, or do you need to give it more access than it really should have? If the update is accomplished offline, consider the implications for the security of your system. For example, you may need to provide physical access to the machine. If this is not acceptable, you may need to act as the update agent yourself, in which case you'll probably want to develop as automated a process as possible to save staff costs. How much will the development of this system cost you? If you protect online updates with technologies such as SSL or TLS, consider the effect that this protection will have on your service. How much will service be degraded? Will you need hardware acceleration to get acceptable performance? All of these are implications of the update process you choose.

Update frequency is another choice with many implications. There is a conflict between wanting the data in your directory to be as up-to-date as possible and wanting the service always to be up and performing at peak capacity. For example, an offline update process may force you to take the system down while you import data. This is typically the fastest way to get data into your directory, but it means that the server being updated is totally inaccessible during the update, which limits how often you can perform such an update.
Extracting update data from the data source might also be an expensive process that cannot be performed often. An online update process raises different concerns. Depending on the number and complexity of the updates being applied, online updates can significantly degrade your directory's performance. For example, consider updates from a database with 100,000 entries, each of which changes once an hour. Keeping absolutely up-to-date with these changes requires your directory to process 100,000 updates per hour. Multiply that by the number of replicas in your system; depending on the capabilities of your directory software and the load of other queries on the system, this may be too much. A more prudent approach may be to update the directory only once a day or even once a week. Make sure that you understand how up-to-date the information needs to be in order to be useful.

In general, we have found that the following principles help make centralized update processes safe and efficient:
Here's one final word of advice: You can save yourself a lot of work and potential for trouble if you avoid centralized update processes altogether. The closer you can push update responsibility to the owners of the data being updated, be they human resources administrators or end users, the better off you will be.

User-Maintained Data

Some data contained in your directory is best maintained by the users to whom it pertains. This kind of data includes home address and phone number, vehicle license number, user-owned mailing lists, and other information that the user has the most interest in keeping up-to-date. Having users update this information themselves can benefit both administrators and users for the following reasons:
Despite these advantages, there are still problems to overcome before you enable your users to update their own information. Although user updating of data has many long-term benefits for you, your directory service, and your users, you will have to make an up-front investment to enable this process. In most cases, however, this investment is well worth the cost.

User-maintained data is not a panacea. For example, users often do not have the motivation or expertise required to perform updates. They might not realize the consequences of out-of-date information, or they might not have the time to take care of it. You will probably want to retain control of information that you deem critically important to providing a good directory service. Another good approach is to train a small group of responsible users, such as administrative assistants or system administrators, to perform the updates.

Update-Capable Clients

The first thing a user requires to update his or her own entry is a client capable of performing the update. Depending on the type and capabilities of the service you have deployed, such clients may already be available. If not, consider three alternatives:
See Chapter 21, Developing New Applications, for tips and techniques you can use to develop some of these applications yourself.

Authentication and Security

Once your users have an update-capable client in their hands, they'll need to authenticate to the directory before it allows them to make changes. This is an important step toward protecting the security and integrity of information in the directory. Chapter 12, Privacy and Security Design, discussed options for authentication in detail, so by this point you should already have an authentication and security plan. Part of your data maintenance plan must include provisions for distributing and maintaining authentication credentials, whether passwords, certificates, or another form of credential.

Once users have authenticated, you need to be concerned about the security implications of the updates they make. Again, this topic was covered in detail in Chapter 12. We will not repeat that discussion here, except to recall that different kinds of information require different kinds of protection. For example, public information needs its integrity protected, but privacy is not generally a concern. Sensitive information needs both privacy and integrity protection. You should also apply access controls to your directory, again as described in Chapter 12. These controls ensure that users are able to update the fields you want them to update. Even more importantly, access controls ensure that users are unable to update data that you don't want them to update.

It's often important to maintain an audit trail when allowing users to update their own entries. Problems are inevitable, and you need to be able to distinguish among user errors, directory system errors, and security breaches. An adequate audit trail showing who updated what, and when, can go a long way toward resolving these problems. Most directory server software supports some kind of audit capability.
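In practice the directory server's own access control mechanism (Chapter 12) should enforce which attributes users may write. The Python sketch below is only illustrative. It shows the two ideas side by side: a self-service modification is checked against an allowed-attribute list, and every attempt, accepted or rejected, is recorded with who, what, and when. The attribute set (all standard inetOrgPerson attributes) and the log record layout are assumptions, not a particular server's audit format.

```python
from datetime import datetime, timezone

# Hypothetical self-service policy: attributes a user may change in his or
# her own entry (home phone, home address, mobile, vehicle license).
SELF_WRITABLE = {"homePhone", "homePostalAddress", "mobile", "carLicense"}

# Stand-in for the server's audit log or other append-only storage that
# users cannot modify.
audit_log = []

def self_update(bound_dn: str, target_dn: str, changes: dict) -> bool:
    """Accept a modification only if a user is changing permitted attributes
    of his or her own entry; record every attempt either way."""
    permitted = bound_dn == target_dn and all(a in SELF_WRITABLE for a in changes)
    audit_log.append({
        "when": datetime.now(timezone.utc),
        "who": bound_dn,
        "target": target_dn,
        "attributes": sorted(changes),
        "result": "accepted" if permitted else "rejected",
    })
    return permitted

def updates_by(who: str, since: datetime) -> int:
    """Count logged update attempts by one identity since a given time."""
    return sum(1 for r in audit_log if r["who"] == who and r["when"] >= since)
```

Logging rejected attempts as well as accepted ones is what lets you tell a confused user (a few rejected changes to a protected field) from a probing attacker or a misbehaving program (many attempts in a short time).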
Training and Support Costs

One of the few downsides of allowing users to update their own entries is the extra training and support that may be required. You need to balance this extra cost against the savings you get by not having administrative staff spend time performing updates. Training costs depend on the complexity of the update process you devise and the sophistication of your user community. If your update process is embodied in a self-explanatory Web application, your training costs will be minimal. If it is embodied in a more complicated standalone application, more extensive training may be required.

Support costs also vary with the update process you devise. In addition to Help Desk calls from users asking how to use the update applications, you can expect calls from people wanting to update fields that you don't allow them to update. Be prepared to explain your data maintenance policy, and publish it as widely as you can. One good way to lower support costs is to be proactive about giving users as much information as possible. For example, if a user's manager is responsible for updating certain fields for the Human Resources department before the changes are reflected in the directory, explain this in your online documentation. A user reading this documentation can then go directly to her manager rather than calling the Help Desk, which would not have the relevant information to help her.

System Effects

Just as with centrally maintained data, you need to consider the effects of your user-maintained data policies on the directory service. These effects concern your directory's performance, its replication system, the quality of its data, and other factors. Directory performance can be affected by user updates just as it can by other updates.
This type of traffic tends to slow down the directory, and many unchecked updates could even result in a denial of service to other users of the directory. This might happen because of a malicious user or simply because of an error in the way someone is using the directory. For example, consider a user who writes a little program to update his directory entry several times a day. If this program gets caught in a loop, the load on the directory can be substantial. Use your audit trail and normal directory performance monitoring methods to guard against problems like this.

Replication performance is also affected by the number of updates your directory processes. Too many updates can create a replication bottleneck. The result may be long delays during which different copies of your directory contain different information. Although loose consistency like this is a fundamental characteristic of most directories, taken to extremes it can lead to user confusion and more calls to the Help Desk.

User-maintained data does not always cause more updates to the directory; the same number or even fewer may result. The question is one of control. With user-maintained data, you have no control over how many updates are performed, when they are performed, and by whom. You should guard against an out-of-control user update problem, even if it is unlikely to occur. Also guard against users not updating their information in a timely manner, which leads to stale data in the directory.

Data Validation

Quality of data is another concern when you're allowing user updates. We stated that improving data quality is one of the main reasons for introducing user updates in the first place, so this may come as a bit of a surprise. Keep in mind, however, that although users may be able to provide up-to-date and accurate information, they are, after all, only human.
Users make unintentional mistakes, and you should do what you can to guard against them. Also keep in mind that user motivation to update data may be highly variable.

One good method of guarding against data quality problems is to screen the data entered by users. With data screening, a process inspects the data to be added or changed in the directory; if the data is found to be faulty, the update is not made. You can often catch syntactic errors this way, although rarely semantic ones. Examples of syntactic errors include an e-mail address that contains spaces, a telephone number with nonprintable characters, a JPEG photograph that does not conform to the JPEG standard, and so on. Examples of semantic errors include a valid but incorrect e-mail address, somebody else's telephone number, and so on. Some semantic checks can be performed, however. For example, you can perform minimal verification of an e-mail address by looking up the mail exchange (MX) or address (A) records for its domain in the Domain Name System (DNS). If a user enters the telephone number of an existing user, a simple directory search can detect the duplication so that the change can be rejected.

You can screen users' data in two places. First, you can modify the clients used to update users' entries to do the screening, if you have access to the source code and the in-house expertise. Some off-the-shelf clients already provide support for this kind of data validation, although most do not. This method has the disadvantage that no checking is done if a user finds a different way to update her entry, perhaps by using a different client. The other option is to enable data validation in the directory servers themselves. This approach eliminates the possibility of a user bypassing the validation, because all updates go through the directory servers.
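The following Python sketch shows screening checks of the kind just described, written so they could run either in an update client or in a server-side hook. Several pieces are assumptions. The regular expressions are deliberately loose rather than full RFC grammars. The JPEG test only checks the start-of-image and end-of-image markers, not full conformance. `phone_index` stands in for a directory search on telephoneNumber. The DNS MX/A lookup is omitted because it needs network access and a resolver library.

```python
import re

# Loose syntactic checks (assumptions, not full standards grammars).
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # one "@", no whitespace, dotted domain
PHONE_RE = re.compile(r"\+?[0-9 ()\-.]+")           # digits and common phone punctuation

def check_mail(value: str) -> bool:
    # fullmatch rejects embedded spaces and trailing newlines.
    return EMAIL_RE.fullmatch(value) is not None

def check_phone(value: str) -> bool:
    # Rejects nonprintable characters and letters; requires at least one digit.
    return PHONE_RE.fullmatch(value) is not None and any(c.isdigit() for c in value)

def check_jpeg(data: bytes) -> bool:
    # Shallow check: JPEG data starts with SOI (FF D8) and ends with EOI (FF D9).
    return len(data) >= 4 and data[:2] == b"\xff\xd8" and data[-2:] == b"\xff\xd9"

def check_phone_unique(value: str, owner_dn: str, phone_index: dict) -> bool:
    """Semantic check: reject a number already listed for a different entry.

    phone_index maps telephone numbers to the DN that holds them; in a real
    deployment this would be a search such as (telephoneNumber=<value>).
    """
    holder = phone_index.get(value)
    return holder is None or holder == owner_dn
```

An update would be made only if every check relevant to the changed attributes returns True. As the text notes, checks like these catch syntactic errors far more reliably than semantic ones.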
As you learned in Chapter 2, Introduction to LDAP, attributes have a syntax associated with them that determines the kind of information their values can contain. Many directory servers provide validation at this basic syntax level. This is enough to keep nonprintable characters out of phone numbers and e-mail addresses, but it's probably not enough to check the syntax of an e-mail address, and it is certainly not enough to check something like a special employee number.

To perform more elaborate data validation checks such as those in the last few examples, you need the ability to change the directory's behavior. There are different ways of doing this, although few directory software products provide this capability. Netscape Directory Server 6 provides a set of plug-in interfaces that allow you to write a bit of code to perform a validation. (The Value Constraint plug-in example from Chapter 4, Overview of Netscape Directory Server, is an example of code that uses this set of interfaces.) This code is plugged into the directory server and called before an update operation is executed. Your plug-in has the option of refusing the operation (if the data is really messed up) or changing the data to an acceptable format (to remove dashes from an employee number, for example). Make sure that your directory software supports the kind of validation you need.
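Netscape Directory Server plug-ins are written in C against the server's own plug-in API, which is not reproduced here. The Python sketch below only illustrates the decision such a pre-operation hook makes: refuse the update, or normalize it and let it proceed. The six-digit employeeNumber format and the `ConstraintViolation` exception are assumptions for illustration. A real plug-in would instead return an LDAP error such as constraintViolation to the client.

```python
import re

class ConstraintViolation(Exception):
    """Raised to refuse an update (a real server would return an LDAP error)."""

def preop_employee_number(mods: dict) -> dict:
    """Hypothetical pre-operation check for employeeNumber values.

    Assumes the site format is exactly six digits. Dashes and spaces are
    stripped (normalization); anything still not six digits is refused.
    Modifications that do not touch employeeNumber pass through unchanged.
    """
    if "employeeNumber" not in mods:
        return mods
    cleaned = []
    for value in mods["employeeNumber"]:
        normalized = re.sub(r"[-\s]", "", value)
        if not re.fullmatch(r"[0-9]{6}", normalized):
            raise ConstraintViolation(f"invalid employeeNumber: {value!r}")
        cleaned.append(normalized)
    return {**mods, "employeeNumber": cleaned}
```

For example, a modification carrying `"123-456"` would be rewritten to `"123456"`, while `"12a-456"` would be refused. Because the hook runs in the server, it applies no matter which client the user chooses.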