Maintenance

Maintenance of the Big State directory was relatively simple, but not without its problems. Some of the more tedious and time-consuming aspects of system maintenance are covered in the following sections.

Data Backup and Disaster Recovery

Three approaches are taken to provide backup and disaster recovery for the directory service.

First, each machine providing service is backed up to magnetic tape regularly. These tapes are saved for six weeks and rotated periodically. This kind of backup is used primarily to guard against any single system becoming corrupted. The system's configuration can be restored from the backup tape, and its directory can be repopulated using one of the techniques described later in this section.

Second, directory replicas are also used as a backup service. If a secondary replica becomes corrupted, directory data can be restored from the master replica. If the master directory server becomes corrupted, it can be restored from the database of the most up-to-date replica server. This kind of procedure often provides more timely backups than the magnetic tape method. The tapes are typically updated with an incremental backup once per day, whereas directory replicas are constantly updated and kept in sync with the master.

Third, the directory data is dumped to a text file and transferred to a different, secure machine every night. If the master and all replica directories become corrupt, they can be restored from this saved file, losing only the intervening changes. This kind of backup is helpful in recovering from such problems as an out-of-control administrative procedure that mistakenly deletes a bunch of entries from the directory. It is also helpful in recovering data mistakenly deleted or modified by a user.
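The case study does not describe the exact mechanism behind the nightly dump, but a minimal sketch of the idea, using the python-ldap library and entirely hypothetical hostnames, credentials, and paths, might look like this:

#!/usr/bin/env python
"""Nightly directory dump: a sketch of the third backup approach.
Hostnames, DNs, and paths are hypothetical illustrations."""
import subprocess
import time

import ldap   # python-ldap
import ldif   # LDIF output support shipped with python-ldap

MASTER_URL = "ldap://master.bigstate.edu"   # hypothetical master server
BASE_DN = "dc=bigstate,dc=edu"              # hypothetical suffix
BIND_DN = "cn=Directory Manager"            # hypothetical admin identity
BIND_PW = "secret"                          # read from a protected file in practice
DUMP_FILE = "/var/backups/directory-%s.ldif" % time.strftime("%Y%m%d")
SAFE_HOST = "backup.bigstate.edu"           # hypothetical secure machine

def dump_directory():
    """Dump every entry under the base DN to an LDIF text file."""
    conn = ldap.initialize(MASTER_URL)
    conn.simple_bind_s(BIND_DN, BIND_PW)
    results = conn.search_s(BASE_DN, ldap.SCOPE_SUBTREE, "(objectClass=*)")
    with open(DUMP_FILE, "w") as f:
        writer = ldif.LDIFWriter(f)
        for dn, entry in results:
            if dn is None:        # skip search references, keep only entries
                continue
            writer.unparse(dn, entry)
    conn.unbind_s()

def ship_offsite():
    """Copy the dump to a different, secure machine, so corruption of the
    directory servers cannot also destroy the backup."""
    subprocess.check_call(["scp", DUMP_FILE, SAFE_HOST + ":/backups/"])

if __name__ == "__main__":
    dump_directory()
    ship_offsite()

For a large directory, a server-side export utility (most directory servers ship a database-to-LDIF tool) would scale better than a full subtree search; the essential point is that a plain-text copy ends up on a machine outside the replication topology.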
Maintaining Data

One of the most trouble-prone and time-consuming tasks associated with the Big State directory service is data maintenance. Several procedures are related to this task, including the bulk and automated loading of data from institutional sources and the manual corrections made by directory maintenance staff.
The Big State directory is not unusual in the occasionally poor quality of its data. This problem leads to user complaints, which in turn lead to manual work by directory maintenance staff to correct the problems. These manual tasks can become quite burdensome and expensive.

Monitoring

Big State has an extensive monitoring system in place that focuses on network devices such as routers, hubs, and server network interfaces. As is the case in many organizations, the group that provides this monitoring is distinct from the group that deployed the directory. The Big State directory designers wanted to leverage this existing monitoring infrastructure as much as possible when monitoring the directory system.

The Big State monitoring system provides a mechanism for calling out to user-developed code to perform certain tests. The directory deployment team worked with the monitoring system maintenance staff to incorporate plug-in programs it wrote to perform a number of directory tests. These allowed directory alerts to be displayed on the monitoring system's trouble board, to be dealt with by monitoring system staff when appropriate.

The directory deployment team developed and documented procedures to help the monitoring staff know what to do in case of a directory alert. Initially, these procedures usually specified paging or emailing a directory team member, depending on the severity of the event. As both the monitoring and directory teams became more comfortable with the service, procedures were updated to allow the monitoring staff to troubleshoot certain problems. Some alerts were even automated, causing directory team members to be automatically paged in the event of a serious failure, such as the directory becoming unreachable or directory replication or email queues becoming inordinately large.

Another aspect of the Big State monitoring system is log analysis. The directory team developed log analysis software to produce daily and weekly summaries of directory-related activities. This includes the number and types of operations the directory servers themselves handle, as well as statistics on important directory-enabled applications. For example, the periodic reports detailing usage of the phone book and email applications are invaluable for predicting capacity problems, justifying funding expenditures, and general public relations.
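The book does not show the plug-in tests themselves. The following is a minimal sketch of what one such directory probe might look like, assuming python-ldap and the common 0/1/2 (OK/WARNING/CRITICAL) exit-code convention used by many monitoring systems; the server name, base DN, and thresholds are hypothetical:

#!/usr/bin/env python
"""Directory health probe: a sketch of a monitoring plug-in test.
Exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL (a common convention)."""
import sys
import time

import ldap   # python-ldap

SERVER_URL = "ldap://ldap.bigstate.edu"   # hypothetical server
BASE_DN = "dc=bigstate,dc=edu"            # hypothetical suffix
WARN_SECS = 2.0      # slow responses raise a warning
TIMEOUT_SECS = 10    # unreachable or unresponsive is critical

def main():
    start = time.time()
    try:
        conn = ldap.initialize(SERVER_URL)
        conn.set_option(ldap.OPT_NETWORK_TIMEOUT, TIMEOUT_SECS)
        conn.simple_bind_s("", "")   # anonymous bind
        # A cheap, representative operation: read the base entry itself.
        conn.search_s(BASE_DN, ldap.SCOPE_BASE, "(objectClass=*)",
                      ["objectClass"])
        conn.unbind_s()
    except ldap.LDAPError as e:
        print("CRITICAL: directory unreachable: %s" % e)
        sys.exit(2)
    elapsed = time.time() - start
    if elapsed > WARN_SECS:
        print("WARNING: directory responded in %.1fs" % elapsed)
        sys.exit(1)
    print("OK: directory responded in %.2fs" % elapsed)
    sys.exit(0)

if __name__ == "__main__":
    main()

Because the probe reports through exit status and a one-line message, the existing monitoring system can treat a directory failure exactly like any other device alert on its trouble board.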
Troubleshooting

Big State developed a number of troubleshooting procedures for dealing with directory problems. Some of the more interesting problems that led to the development of these procedures were infinite email loops caused by bad directory data, data feed disasters during bulk loads, and directory replication failures.
Several troubleshooting procedures developed from these problems. Perhaps the most important is related to problems with the email service. Step one in dealing with any email-related problem is to turn off email service. No harm is done by this because undeliverable mail is queued, usually for up to three days. Taking this step avoids the more serious error of bouncing email and buys the directory maintenance team valuable time to figure out and correct the problem.

The mail loop problem has never been fully resolved, but a number of steps were taken to mitigate it. First, the mail delivery software was modified to detect and reject the simplest forms of mail loops. Second, an automated process was developed to trawl the directory for other situations likely to cause mail loops, and administrators were alerted to suspected problems. Third, the directory monitoring software was improved to detect mail loops more quickly and accurately when they occur. Finally, better tools were developed to recover from mail loop disasters; these make it easier to hunt down and delete the loop-generating messages clogging the mail system.

Other tools were developed to help detect data feed disasters before serious damage can occur. These include changes to the munge program to look for large, unexpected changes in the user population during the monthly data merge. Large changes are reported to administrators, who can ensure they are legitimate.

Tools were also developed to detect and recover from replication failures. The directory monitoring system has been augmented with tests that alert administrators if replication appears stuck (for example, if the replication queue gets too big). Tools were also developed to make the process of recovering from a replication failure easier, including scripts that automate the process of creating one directory replica from another. The process of clearing out "bad" changes in a replication log remains manual, although more recently directory administrators have created surgical techniques for repairing entries damaged by replication errors without causing service interruptions.
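As an illustration of the sanity check added to the munge program described above, here is a minimal sketch of how an incoming feed could be compared against the current user population before a merge proceeds. The file format (one user identifier per line) and the 5% threshold are hypothetical:

#!/usr/bin/env python
"""Data feed sanity check: refuse to proceed with a merge if the user
population changes by a suspiciously large amount.  Format and
threshold are hypothetical illustrations."""
import sys

MAX_DELTA = 0.05   # abort if the population changes by more than 5%

def load_ids(path):
    # One unique user identifier per line (hypothetical feed format).
    with open(path) as f:
        return set(line.strip() for line in f if line.strip())

def main(current_path, incoming_path):
    current = load_ids(current_path)
    incoming = load_ids(incoming_path)
    added = incoming - current
    removed = current - incoming
    delta = (len(added) + len(removed)) / float(len(current) or 1)
    print("adds=%d deletes=%d delta=%.1f%%"
          % (len(added), len(removed), delta * 100))
    if delta > MAX_DELTA:
        # Stop the merge and make a human confirm the feed is legitimate
        # before a bad feed can delete a large chunk of the directory.
        print("ABORT: change exceeds %.0f%% threshold; notify administrators"
              % (MAX_DELTA * 100))
        sys.exit(1)
    sys.exit(0)

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])

The design point is the same one the case study makes about email troubleshooting: when input looks suspicious, halting and asking a human is far cheaper than recovering from a disaster after the fact.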