Understanding and Deploying LDAP Directory Services, Chapter 17: Maintaining Data
Checking Data Quality

The purpose of data maintenance is to ensure that the data in your directory service has the highest possible quality. Data quality has several aspects, but we will focus primarily on the accuracy and timeliness of data. Naturally, you will want to check the quality of your data, both to monitor how well your data maintenance procedures are working and to get an idea of the kind of service you are providing to the users of your directory. Bad data can creep into your directory service from a number of directions, including the following:
Methods of Checking Quality

There are several methods you can use to check the quality of data in your directory. The following are three common methods:
Some checks, such as verifying the syntactic validity of information, can be performed even when no source-of-truth database exists or is accessible. For example, you could read all (or a sample) of the email address attributes in the directory and determine whether each value is syntactically valid.

Implications of Checking Quality

It's important to consider the implications of your data quality checking methods for the operation of your directory service. Be sure to choose a method that does not significantly reduce directory performance. Depending on the method you choose, you may have to make a trade-off between how often you check for quality and the accuracy of your checking methods. The main concerns in this area are methods that place an excessive load on the directory or make it unavailable.

For example, consider a method that requires reading all the entries in your directory over LDAP. Your directory might have the capacity to respond to this kind of request without degrading performance for other users, but then again it might not. If you use a method like this, you can run the check at night or at another off-peak time, when the directory has plenty of spare capacity to respond to the data-checking requests. This may be difficult if your directory operates in a global environment in which there is no off-peak time. Another approach, then, is to create a dedicated directory replica that does nothing but process these data-verification tasks.

Consider also a method that requires you to dump your directory's data to a file. Some directory server software allows you to perform this operation without taking the service down, but some does not. If you are planning to use this method, be sure the software you choose supports online production of the necessary extracts or that your service can tolerate the downtime.
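The syntactic check of email addresses described earlier, combined with the load concerns above, might look like the following minimal Python sketch. It assumes entries arrive as (DN, attribute-dictionary) pairs, such as the results of an LDAP search; the function name `check_mail_syntax`, its parameters, and the deliberately loose email pattern are hypothetical and not part of any particular directory product. Sampling here reduces checking work but still walks every entry, and pausing only spreads the load on the server if `entries` is a lazy iterator that fetches results as it goes.

```python
import random
import re
import time

# Deliberately loose, hypothetical pattern: one "@", no whitespace, and a dot in
# the domain part. It is NOT a full RFC 5322 address validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_mail_syntax(entries, fraction=1.0, pause_every=100, pause_seconds=0.0, seed=None):
    """Return (dn, value) pairs whose "mail" value fails the syntax check.

    entries: iterable of (dn, attrs) pairs, where attrs maps attribute names to
             lists of string values (assumed shape, e.g. from an LDAP search).
    fraction: probability of checking each entry (1.0 checks everything).
    pause_every / pause_seconds: sleep after every `pause_every` entries read,
             to spread the load if `entries` pulls results from the server lazily.
    """
    rng = random.Random(seed)
    bad = []
    for i, (dn, attrs) in enumerate(entries, start=1):
        # Spot-check mode: skip entries not selected for this run.
        if fraction < 1.0 and rng.random() >= fraction:
            continue
        for value in attrs.get("mail", []):
            if not EMAIL_PATTERN.match(value):
                bad.append((dn, value))
        # Throttle periodically so a full pass does not monopolize the server.
        if pause_seconds > 0 and i % pause_every == 0:
            time.sleep(pause_seconds)
    return bad
```

Run at full coverage during an off-peak window, or with a small `fraction` more often; this is one way to express the trade-off between checking frequency and thoroughness described above.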
Remember that replication can help with the availability problem: consider taking a replica offline to produce the extract instead of taking down the master server. Also consider producing your own extract over LDAP, but be careful not to degrade performance, as discussed earlier.

Correcting Bad Data

Whatever method you use to check the quality of your data, be sure to investigate the cause any time you encounter an error. This will help you correct problems with the system that produced the bad data. Although this kind of investigation can be time-consuming and expensive, it's usually well worth it. You'll often find that many errors are caused by the same underlying problem, and fixing that problem can dramatically increase the quality of your data.

Many underlying problems can cause bad data, some of which were discussed briefly earlier. Systematic errors in programs or procedures should be treated as bugs and corrected. Bad data introduced through human error might be the result of inadequate training or documentation for either users or administrators; improving the quality and coverage of this training and documentation can bring corresponding improvements in the quality of your data. Human error can also be the result of poor software design. Spend time with the users and administrators responsible for updating the directory, and observe the steps they take when maintaining data. This can often reveal flaws in the software and procedures they use.

Finally, even if you can't eliminate poor data coming into your directory, you can mitigate the damage by installing data-validation filters. As mentioned earlier, these filters can be installed in the directory clients that users and administrators use to update the directory, or in the directory service itself.
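A client-side data-validation filter of the kind just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the rule table, its patterns, and the names `validate_changes` and `apply_if_valid` are hypothetical, and `write` stands in for whatever call your client actually uses to send a modification to the directory. Attribute names are looked up case-insensitively, as LDAP attribute names are case-insensitive.

```python
import re
from typing import Callable, Dict, List

Rule = Callable[[str], bool]

# Hypothetical rules keyed by lowercased attribute name; each value of the
# attribute must satisfy the predicate. Patterns are illustrative only.
RULES: Dict[str, Rule] = {
    "mail": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "telephonenumber": lambda v: re.fullmatch(r"\+?[0-9][0-9 ()-]{5,}", v) is not None,
    "uid": lambda v: re.fullmatch(r"[a-z][a-z0-9]{1,31}", v) is not None,
}


def validate_changes(changes: Dict[str, List[str]], rules: Dict[str, Rule] = RULES) -> List[str]:
    """Return human-readable problems; an empty list means the change may proceed."""
    problems = []
    for attr, values in changes.items():
        rule = rules.get(attr.lower())
        if rule is None:
            continue  # no rule for this attribute: accept it unchecked
        for value in values:
            if not rule(value):
                problems.append(f"{attr}: rejected value {value!r}")
    return problems


def apply_if_valid(changes: Dict[str, List[str]],
                   write: Callable[[Dict[str, List[str]]], None]) -> List[str]:
    """Filter step: call write() (e.g. an LDAP modify) only if every value passes."""
    problems = validate_changes(changes)
    if not problems:
        write(changes)
    return problems
```

The same rule table could be reused by a server-side filter, so that clients and the directory service enforce identical checks rather than drifting apart.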
2002, O'Reilly & Associates, Inc.