|    Hardware and human-caused errors are unavoidable. There may be occasions when you need to recover lost DS data, such as group membership information or users' Home Directory attribute values. At other times, the hard drive hosting NDS/eDirectory may fail, and you need to recover as quickly and as completely as possible. The following sections cover these scenarios:    -  
 Group membership recovery     -  
 Home Directory attribute recovery     -  
 Recovering from a server crash     -  
 Loss of all replicas except for SubRefs in a replica ring

Group Membership Recovery

One situation that we have seen is an administrator accidentally deleting group memberships from a large number of users. In this case, the administrator was attempting to add a number of users to a group by using the UImport utility. Unfortunately for the site in question, the administrator used a control file that specified REPLACE VALUE=Y, so the new group membership was added but the original group memberships were deleted. Because these group memberships were used to assign rights in the file system and to determine which applications were available to each user, this became a big problem very quickly.

Fortunately, the change was made off-hours, so the immediate impact was minimal. Even more important was a backup of the DS tree that had been made several weeks earlier. Although backups of DS are in many cases not of much use, in this case the backup contained a large percentage of the users in the tree and the information necessary to rebuild the majority of the users' group memberships.

The following tools were used for this recovery:

The first step in this recovery was restoring the old DS group information. The backup product used was capable of restoring only to a server with the same name as the server the backup was taken from. To accommodate this limitation, we took a lab server from our isolated network and renamed it. Next, the DS tree was restored to that server. To ensure that the dependencies for group memberships were restored properly, we restored the data twice.

While the DS tree was being restored on the isolated network, two awk scripts were developed. The first script was designed to create a batch file to list the group memberships for each user listed in the original UImport data file.
Because the number of users affected was about 100 out of 5,000, it did not make sense to restore group membership information for all users. Instead, a text file with the desired list of users and their contexts was created, using the following format:

    .UserID1.Context1
    .UserID2.Context1
    .UserID3.Context2

The following awk script was used to parse the preceding information into a batch file:

    # Start the batch file by deleting any old output file
    BEGIN { print "del grpinfo.txt" }
    {
         # Split ".UserID.Context" on dots; object[1] is empty
         count = split($0, object, ".")
         # Emit a command that changes to the user's context
         printf("cx ")
         for (x=3; x<= count; x++)
              printf(".%s", object[x])
         printf("\n")
         # Emit a command that appends the user's group memberships to grpinfo.txt
         printf("nlist user = %s", object[2])
         printf(" show \"group membership\" >> grpinfo.txt\n")
    }

The resulting batch file looks like this:

    del grpinfo.txt
    cx .Context1
    nlist user = UserID1 show "group membership" >> grpinfo.txt
    cx .Context1
    nlist user = UserID2 show "group membership" >> grpinfo.txt
    cx .Context2
    nlist user = UserID3 show "group membership" >> grpinfo.txt

When the DS restore finished, the batch file was run to generate a file called GRPINFO.TXT, showing all group memberships for the user objects in question. This GRPINFO.TXT file was in the following format:

    Object Class: User
    Current context: Context1
    User: userID1
            Group Membership: Group1.Admin.Groups.Admin...
            Group Membership: Group2.XYZCorp
    One User object was found in this context

    One User object was found.
    Object Class: User
    Current context: Context1
    User: userID2
            Group Membership: Group3.Admin.Groups.Admin...
            Group Membership: Group4.XYZCorp
    One User object was found in this context

    One User object was found.
    Object Class: User
    Current context: context2
    User: userID3
            Group Membership: Group1.Admin.Groups.Admin...
            Group Membership: Group2.XYZCorp
    One User object was found in this context

This file was then parsed, using a second awk script, to create the final data file used for the new run of UImport. This data file is in a format that is usable by UImport:

    ".userID1.context1",".Group1.Admin.Groups.Admin"
    ".userID1.context1",".Group2.XYZCorp"
    ".userID2.context1",".Group3.Admin.Groups.Admin"
    ".userID2.context1",".Group4.XYZCorp"
    ".userID3.context2",".Group1.Admin.Groups.Admin"
    ".userID3.context2",".Group2.XYZCorp"

You should note a couple of things about the data file. First, the user ID contains a leading dot. This is done so that the script can be run from any context and the input is still valid. Second, there are multiple entries for a given user ID, but UImport handles these entries just fine.

The challenge is in parsing the trailing dots on the group memberships in GRPINFO.TXT and coming up with a script that works reliably to perform the conversion. The following is the awk script that does this:

    # Remember the most recent context and user name
    /Current context:/ { cx = $3 }
    /User:/ {cn = $2}
    /Group Membership:/ {
         printf("\".%s.%s\",", cn, cx)
         # Strip the tab-indented label, leaving only the group name in $0
         gsub(/\tGroup Membership: /, "")
         grptmp = $0
         num = split(cx, tmpcx, ".")
         counter = 1
         # Count and remove trailing dots; counter ends at 1 + number of dots
         while (substr(grptmp, length(grptmp)) == ".")
         {
              counter++
              sub(/\.$/,"",grptmp)
         }
         printf("\".%s", grptmp)
         # Append the context parts left after skipping one per trailing dot
         for (y=counter;y<=num;y++)
         {
              printf(".%s", tmpcx[y])
         }
         printf("\"\n")
    }

This script counts the number of trailing dots and compares that to the number of parts in the current context.
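As a cross-check of the trailing-dot logic in the awk script above, the following is a minimal Python sketch of the same resolution rule. It assumes that each trailing dot on a relative name discards one leftmost component of the current context before the rest is appended. The function name resolve_relative_name and the sample names in the usage note are hypothetical; this sketch was not part of the original recovery.

```python
def resolve_relative_name(name: str, context: str) -> str:
    """Turn a context-relative NDS name into a fully qualified one.

    Assumed rule: each trailing dot on `name` drops one leftmost
    component of `context` before the remainder is appended.
    """
    base = name.rstrip(".")
    trailing_dots = len(name) - len(base)
    # Ignore empty components, such as one produced by a leading dot.
    parts = [p for p in context.split(".") if p]
    remaining = parts[trailing_dots:]
    return "." + ".".join([base] + remaining)
```

For instance, with a hypothetical current context of Sales.East.XYZCorp, the relative name Group1.Admin.. (two trailing dots) resolves to .Group1.Admin.XYZCorp, whereas Group2 (no trailing dots) resolves to .Group2.Sales.East.XYZCorp. Unlike the awk version, this sketch skips empty context components, which is a design choice made here for tolerance of a leading dot in the context.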
The awk script then removes leading portions of the current context, one for each trailing dot on the group name. Next, it appends the remaining portion of the context to the group name, which results in the correct context for the group.

After the new data file was created, we created a control file that uses two fields: one for the user login ID and one for the group membership being processed. By watching the UImport run, we were able to determine which user IDs had been moved or deleted. Even though the fix did not cover all the users, enough users were fixed to prevent a major outage the following day. In total, out of 100 users, only about 10 had to be modified.

NOTE

This example serves as a reminder that a disaster recovery solution need not be a 100% solution; if you can automate a large portion of the work in a reasonable amount of time, any remnants can be handled by hand or on a case-by-case basis.

REAL WORLD: Programmatically Adding a User to a Group

If you are developing your own application to add users to a group, instead of using an existing application such as UImport, you should be aware of a few things. Adding a user to a group involves a total of four major changes in DS:

-  
  Add the user's DN to the group's Member attribute.    -  
  Add the user's DN to the group's Equivalent to Me attribute.    -  
  Add the group's DN to the user's Group Membership attribute.    -  
  Add the group's DN to the user's Security Equals attribute.

The current DS module does not automatically make these four changes happen together. Therefore, if you are writing a program to accomplish this task, you must make all four of these changes in your program's code. If you use the NWUsrGrp ActiveX control in the Novell Developer Kit (NDK), it performs the four necessary steps for you when it adds a user to a group or deletes a user from a group. However, if you use the NWDir or NWIDir controls, you need to code the four steps as part of your program logic.

|

Home Directory Attribute Recovery

It is fairly common for the Home Directory attributes of User objects to be lost after certain DS-related issues are fixed. As discussed earlier in this chapter, in the "Unknown Objects" section, when an object that is referenced by any DS attribute is removed from the tree, that DS attribute's value is automatically cleared. Because Home Directory is a single-valued attribute, clearing its value means deleting the attribute.

TIP

The procedures discussed here can also be used to update existing Home Directory values when you physically move the folders from one volume or server to another.

The Home Directory attribute uses the SYN_PATH syntax and references a Volume object in its value. If, for any reason, that Volume object is removed from the tree, the Home Directory attribute is cleared. You can repopulate this value fairly easily by using one of the following methods:

-  
 Generate a text file that contains the username and home directory information and then use UImport to update the User objects. The text file would look something like this:

    ".userID.context", ".volume_object.context:\path"

-  
 Generate an LDIF file that contains the username and home directory information and then use Import Convert Export (ICE) to update the User objects via LDAP. The LDIF file would look something like this:

    version: 1
    dn: cn=username,ou=context,o=context
    changetype: modify
    replace: ndshomedirectory
    ndshomedirectory: cn=vol_object,ou=context,o=context#0#\users

-  
 The preceding two solutions require you to create a separate record for each user because the home directory path is unique for every user. An easier alternative is to use Homes (www.novell.com/coolsolutions/tools/1568.html), with which you can simply select a starting context and set the Home Directory attribute for all users inside a container (see Figure 11.18).

Figure 11.18. Setting home directory information by using Homes.

Recovering a Crashed SYS Volume or Server

One of the most-asked questions in any network is, "How do I correctly recover from a crashed server?" If you have worked with NetWare 3, you know the process is quite straightforward: Install a new server, restore the bindery from a backup, and restore your file system. In a single-server NDS/eDirectory network, the process is much the same: Install a new server, restore DS from a backup, and then restore your file system. Because of the distributed nature of DS, however, things are a little more interesting when you have a multiserver NDS/eDirectory network.

To successfully recover from a lost server in a multiserver environment, it is essential that you maintain a regular backup of the server-specific information (SSI) files for all the DS servers on your network. (Chapter 8 discusses the situation for eDirectory.) It also helps to have up-to-date documentation about your DS tree, such as where NCP Server and Volume objects are located. You should also have a record of the partitions and a list of servers where the Master and various other replicas are stored. Finally, you should have the correct license file(s) for the crashed server.

NOTE

The process for recovering from a crashed hard drive where NDS/eDirectory resides (such as the SYS volume on NetWare) is the same as that for recovering a dead server because your DS is gone.

NOTE

For more information about SSI files and their purposes, see Chapter 8.

The following are the steps you need to take to restore a crashed server or a SYS volume in a multiserver DS environment when you don't have a current set of SSI data available:

       -   Don't panic!
        -   Reconfigure time synchronization configuration in the tree, if necessary. 
       -   Create a  Computer  object in the tree to act as a placeholder for server references. 
       -   Use SrvRef (see ftp://ftp.dreamlan.com/srvref.zip) to replace server references in the tree (see Figure 11.19). 
        Figure 11.19. Replacing server references.

       -   Delete from the tree the old NCP Server object for the failed server. Do not delete the associated Volume objects, however. Leave them intact to preserve references that other objects (such as Directory Map objects) may have to them, as well as any DS trustee assignments made.
       -   If the failed server held a Master replica of any partition, go to another server in the replica ring that has either a Read/Write or Read-Only replica and use DSRepair to promote that replica to a Master. Repeat this step for every Master replica stored on the failed server. Then clean up the replica rings to remove the downed server from the lists. (See the "Replica Ring Inconsistency" section, earlier in this chapter, for details.)

        TIP

        After your replica ring cleanup, you should spot-check the DSTrace output on a number of servers to confirm that the replica rings are okay and that everything is synchronizing correctly. You do not want to install a server into a tree that's not fully synchronized.

       -   Rebuild the crashed server by using existing documentation. Ensure that the same server name, volume names, IPX/IP addresses, and so on are used. Install the server into a separate temporary tree.
       -   If you are just recovering a lost SYS volume, load DSREPAIR.NLM with the -XK6 switch (which deletes all volume trustees) and then perform a Check Volume Objects and Trustees operation. When prompted to make the change on the SYS volume, answer No; for all other volumes, answer Yes. See Step 22 in TID #10013535 for details on this step.
       -   Remove NDS from the rebuilt server. 
       -   Reconfigure the time synchronization setting on the rebuilt server, if necessary. 
       -   Install the rebuilt server back into the production tree, using the same context the original server was installed in. 
       -   Use SrvRef to restore server references in the tree. 
       -   Restore data and trustee information to the server. You should be careful when restoring the  SYS  volume data so that you don't overwrite any new support pack files with older ones. If you've made modifications to your  AUTOEXEC.NCF  file, you should ensure that the older copy from your backup does not overwrite it. 
       -   Reestablish replica information by using ConsoleOne. (You might want to wait until after-hours and after the data restoration has completed.) 
       -   Reinstall licenses, if necessary. 
       -   Reinstall any server-based applications, such as BorderManager. 
       -   Reissue any SSL certificates for the recovered server, as necessary. 
       -   Delete the temporary  Computer  placeholder object from the tree. 
When restoring files to a volume that was nearly full during the backup, you might run into insufficient disk space issues. This is especially true when volume compression is used. Although SMS-compliant backup software can back up and restore a compressed file in its compressed format, that's not the default in most backup software; therefore, chances are good that you'll restore previously compressed files in their uncompressed format. And because compression is a background operating system process, files are not compressed until the compression start time is reached. You can flag files as immediate compress, but that's an extra manual step you have to take. Afterward, you have to remember to undo the flag, or else the files will always be compressed again after access, causing unnecessarily high server utilization.

Another volume-related issue that can catch you during a restoration is suballocation. Again, because it is a background process, files are not suballocated as they are restored; therefore, if you're restoring many (small) files, you can run out of disk space before the complete restoration is done.

To work around these two disk space problems, it is best to maintain at least 15% to 20% free disk space on each volume. Even better, you should make certain that the replacement drive has a larger capacity.

After the restoration of the file system is complete, you should restart the server one more time to ensure that the restoration didn't overwrite any important system files. Then you should spot-check some of the restored directories and files for correct trustee assignments, file ownerships, and so on. You should also spot-check DS objects to ensure that you don't have any Unknown or renamed objects.

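The 15% to 20% guideline above can be turned into a quick pre-restore check. The following is a minimal sketch, not a NetWare tool: the function names, the megabyte units, and the default threshold of 15% are assumptions made for illustration. The restore size passed in should be the uncompressed size of the data, because previously compressed files are likely to be restored uncompressed.

```python
def free_fraction_after_restore(capacity_mb: float, restore_mb: float) -> float:
    """Fraction of the volume left free once restore_mb has been written."""
    if capacity_mb <= 0:
        raise ValueError("capacity_mb must be positive")
    return (capacity_mb - restore_mb) / capacity_mb


def restore_leaves_headroom(capacity_mb: float, restore_mb: float,
                            min_free: float = 0.15) -> bool:
    """True if the restore still leaves at least min_free of the volume free."""
    return free_fraction_after_restore(capacity_mb, restore_mb) >= min_free
```

With hypothetical figures, restoring 36,000MB of uncompressed data onto a 40,000MB volume would leave only 10% free, so the check fails and a larger replacement drive would be the safer choice.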
Subordinate References Only in the Replica Ring

The steps discussed in the section "Recovering a Crashed SYS Volume or Server" work well when you have replicas on other servers to recover DS information from; however, there is also the (very) unlikely situation in which you lose one partition within the tree and, for some reason, no replica of that partition exists. What can you do? First of all, take a deep breath and don't panic! Depending on the partition location within the tree structure, all may not be lost.

Consider the sample DS tree shown in Figure 11.20. Two of the servers in this tree contain the following replicas:

Figure 11.20. If FS2 is lost, a hole exists in the DS tree between OU=B and OU=E.

| Server FS1        | Server FS2  |
|-------------------|-------------|
| Master of [Root]  |             |
| Master of B       |             |
| SubRef of C       | Master of C |
| Read/Write of E   | Master of E |

NOTE

Because Server FS1 has a copy of B (the parent) but not C (the child), DS automatically placed a SubRef replica of C on server FS1.

If Server FS2 is lost due to hardware failure and no other servers hold a replica of C, you lose the only full replica of the C partition. (SubRef replicas are not full replicas; they contain only enough information to locate other replicas and track synchronization.) When this happens, you have a hole in the DS tree between OU=B and OU=E. You can't use any of the procedures discussed earlier in this chapter to recover the C partition because no other full replicas exist.

In this scenario, where a SubRef replica of the lost partition exists, it is possible to rebuild the links to the lost portion of the DS tree and then perhaps restore the objects from a recent backup.
The following procedure explains how you may recover from the loss of a single partition in a multipartition tree when you have no full replicas of that partition:

WARNING

The following procedure may not work in all cases; therefore, you should consider acquiring the assistance of Novell Support to rebuild the links to the missing partition in your tree. At the very least, you should test the procedure in a lab environment before ever using it in a production environment.

       -   Don't panic! Don't attempt any DS recovery or repair procedures.
        -   Follow the steps outlined in the section "Replica Ring Inconsistency," earlier in this chapter, to clean up the replica rings for other partitions that have replicas on this crashed server, and make sure your other partitions are synchronizing without errors. 
       -   If more than one server has a SubRef replica of the lost partition, choose one to work with. The best choice would be the server that holds the fewest replicas.
       -   On the server chosen in step 3, load DSRepair with the  -A  command-line switch and promote the SubRef replica to a Master by using the steps outlined earlier in this chapter, in the "Replica Ring Inconsistency" section. This changes the SubRef replica into a real replica; however, because a SubRef replica doesn't contain any object information, the recovered replica will be empty. 
    Depending on your replica placement of this lost partition, SubRef replicas of this partition on other servers may be upgraded to Read/Write replicas.

       -   Use DSTrace to check that this partition is synchronizing correctly. If it is not, you should consider opening an incident with Novell for further assistance.
       -   When the replica ring is synchronizing, use your most recent backup to perform a selective restoration of the DS objects that were in the lost partition. Take note of any objects in other parts of the tree that may have turned into  Unknown  objects due to loss of their mandatory attributes. You may need to do a selective restoration on those objects or re-create them. 
    Re-create any bindery objects and DS objects (such as print queues) that depend on object IDs. Reassign DS object trustee assignments, if necessary.

If you don't have a SubRef replica to work with, you need to first make sure that no one attempts any repair operations because they could make a bad situation worse. Then you should open a call with Novell Support for assistance.    |