Data Inconsistencies

Missing or lost essential attribute values can lead to data inconsistencies. Sometimes missing or lost values are due to human error (deleting a value or an object when not supposed to), but they can also be a result of system failure (such as time synchronization error or communication issues). The following sections examine the causes of, and possible solutions to, these situations:

Unknown objects
Renamed objects
Replica ring inconsistency
Replica inconsistency
Stuck obituaries

Unknown Objects

Novell management tools such as ConsoleOne use two different icons to represent unknown objects in DS: a yellow circled question mark and a cube with a black question mark beside it, as illustrated in Figure 11.13.

Figure 11.13. The two different types of unknown objects in ConsoleOne.

The cause for the white cubed question mark is completely different from that of the yellow circled question mark. The white cubed question mark means that ConsoleOne could not find the correct snap-in to associate the object with the necessary icon. Essentially, it is saying that the object is perfectly legitimate in DS, but the configuration of ConsoleOne is not correct; consequently, the object is "unmanageable." On the other hand, the yellow circled question mark is usually bad news.

The yellow circled question mark generally means that one or more of the mandatory attributes of the object are missing. When a mandatory attribute is missing from an object, NDS/eDirectory automatically changes the object's class to Unknown but leaves the name unchanged.

There are two conditions under which the presence of Unknown objects is normal and transitory. The first situation, which is related to timing, happens during replica synchronization. A new replica being added to a server when objects are still being updated from one replica to another can cause Unknown objects. Some objects may start as Unknown objects (when viewed from ConsoleOne, if the timing is right), but when the synchronization process is complete, they are updated with all the information they need and are turned into real objects. Depending on what you are doing, timing can sometimes make Unknown objects go away.

The other situation under which Unknown objects appear is during a DS restore. Because the objects are restored in the order in which they are backed up, some objects (such as a Volume object) may be restored before the objects (such as the NCP Server object) that define their mandatory attributes (Host Server, in the case of Volume objects) are restored. However, when all the objects are restored, the Unknown objects should turn into known objects. For example, if a group is restored but all its members (User objects) do not yet exist in the tree, placeholder (Unknown) objects are created until the User objects are restored. At that time, the placeholder objects become real User objects, and the User and Group objects are fully functional.
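The restore-order effect can be illustrated with a small simulation. This is only a sketch of the behavior described above, not how the restore process is implemented; the `restore` function and the tuple layout of the backup list are hypothetical.

```python
def restore(backup_order):
    """Replay a backup in order and record each object's class after every step.

    backup_order is a list of (name, object_class, referenced_names) tuples.
    A referenced object that does not exist yet gets an 'Unknown' placeholder,
    which is replaced once the real object is restored.
    """
    tree = {}
    history = []
    for name, object_class, references in backup_order:
        tree[name] = object_class            # real object replaces any placeholder
        for ref in references:
            tree.setdefault(ref, "Unknown")  # placeholder for a not-yet-restored object
        history.append(dict(tree))
    return tree, history


backup = [
    ("Staff", "Group", ["User1", "User2"]),  # the group is restored before its members
    ("User1", "User", []),
    ("User2", "User", []),
]
final_tree, steps = restore(backup)
print(steps[0])    # the members are placeholders right after the group is restored
print(final_tree)  # no Unknown objects remain once the restore completes
```

If Unknown objects remain after the last step of a real restore, they are not explained by ordering alone and deserve a closer look.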

If you have not done any of the previously mentioned operations and you have an Unknown object, you can delete it and then re-create it if it is replaceable. Before you do that, however, you should be familiar with the following repercussions:

When the Unknown object is a volume, deleting the object causes any user who has a Home Directory attribute pointing to that volume to lose its mapping; that is, the Home Directory attribute value is cleared.
When the Unknown object is a user, deleting the object results in the user losing his or her specific trustee assignments (both file system and DS assignments).
When the Unknown object is a server, deleting the object causes the server to be deleted from the tree, and all DS references to that specific server are lost. This type of Unknown object should not be deleted casually because such deletion can also lead to inconsistent replica rings.

In most cases, however, an Unknown object can be deleted and re-created. Anytime an NCP Server object or something of importance (such as the Admin User object or a Volume object) is turned into an Unknown object, however, you should consider the consequences of your actions before proceeding.

Before you delete an Unknown object and re-create the real object, you need to check whether other replicas have good copies of the Unknown object. If they do, you can rescue this object without having to re-create it. You can easily accomplish this by using a combination of NDS iMonitor and DSBrowse or DSRepair.

You can use NDS iMonitor to browse for the object. You'll notice that after you select an object and are viewing its entry information, a Replica frame is shown in the bottom-left corner of the window. This frame shows a list of all servers in the replica ring for this object. The server name that is not hyperlinked is the server you are reading the object's information from. If you want to read the same object from a different server's perspective, you click the hyperlink for that server. After you have determined that there is a good copy of the object in at least one replica of the replica ring, you can proceed. On the server that has Unknown objects in its DIB, you load DSRepair with the -P switch and perform a repair of the local DS database, leaving all settings at the defaults, but you need to make sure that the Rebuild Operational Schema option is set to YES (in Linux/Unix, you use ndsrepair -R -Ad -P). Following that, you use one of the following procedures to rectify the issue:

Reload DSRepair without any switches. (However, -A may be required for older versions of DSREPAIR.NLM.) For NetWare, select Advanced Options, Replica and Partition Operations, View Replica Ring. Then select the server that has the good objects and press Enter. Finally, select Send All Objects to Every Replica in the Ring. For Windows, expand the Partitions list, right-click the server that has the good objects, and select Send All Objects to Every Replica in the Ring. For Linux/Unix, start ndsrepair with the -P option, select the replica in question, and then select Replica Options, View Replica Ring. Next, select the server that has the good objects and select Server Options, Send All Objects to Every Replica in the Ring.
Instead of using DSRepair to send all objects in the replica, as outlined in the previous procedure, you can use DSBrowse because it has an option to re-send a single selected object. For NetWare, load DSBrowse with -A to get this option. Load DSBrowse on the server that has a good copy of the object, browse to the object, press F3, and choose Resend Selected Object. This changes the object from Present (Flags=1) to Present New Entry (Flags=801) on the sending server. DSBrowse also timestamps the object with a newer timestamp, which should send updates to the offending servers, provided that the bad object on those servers has its flags value set to 201 (Present Reference). For Windows, launch DSBrowse (-A is not required) on a server that has a good copy of the object, browse to the object, right-click, and select Send Object. This procedure cannot be used on Linux/Unix platforms because DSBrowse is not available; however, you can use NDS iMonitor instead, as described in the next procedure.
Instead of using DSRepair to send all objects in the replica, as outlined in the first procedure, you can use NDS iMonitor because it has an option to send a single selected object. NDS iMonitor is especially suited for Linux/Unix platforms because DSBrowse isn't available there. Use NDS iMonitor to locate a server that has a good copy of the object and then click the Send Entry to All Replicas link (see Figure 11.14).

Figure 11.14. Sending selected object information to other replicas.

NOTE

The DSRepair -P procedure marks all Unknown objects in the local DIB as Present Reference (Flags=201). With this flag set, the server is ready to receive the object. In a Reference state, the Unknown objects are overwritten if a valid object is sent to that server, and the server will not synchronize the Unknown object to other servers in the replica ring.
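The three flag values quoted in this section (1, 801, and 201) are consistent with a hexadecimal bit mask. The following sketch decodes them on that assumption. The bit assignments are inferred only from those three values, not from Novell documentation, and the function names are hypothetical.

```python
# Bit meanings inferred from the values quoted in this section
# (Flags=1, Flags=801, Flags=201); treat them as an assumption.
PRESENT = 0x001
REFERENCE = 0x200
NEW_ENTRY = 0x800


def describe_flags(flags_text):
    """Translate a hexadecimal Flags string such as '201' into a label."""
    value = int(flags_text, 16)
    parts = []
    if value & PRESENT:
        parts.append("Present")
    if value & NEW_ENTRY:
        parts.append("New Entry")
    if value & REFERENCE:
        parts.append("Reference")
    return " ".join(parts) if parts else "(no known flags)"


def ready_to_receive(flags_text):
    """True when the entry is in the Reference state set by DSRepair -P."""
    return bool(int(flags_text, 16) & REFERENCE)


for text in ("1", "801", "201"):
    print(text, "->", describe_flags(text))
```

A check such as `ready_to_receive("201")` mirrors the condition in the NOTE: only an entry in the Reference state is overwritten by a valid object sent from another server.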

Your choice on which of these procedures to use depends mostly on how many Unknown objects there are in the replica. If there are only one or two Unknown objects, the DSBrowse and NDS iMonitor options are the better choice because they generate only small amounts of traffic. However, if there are a fair number of Unknown objects, DSRepair may be more time-efficient, but at the cost of higher network traffic (depending on the number of objects in the replica).

It has been observed that sometimes Directory Map and Print Server objects spontaneously mutate into Unknown objects for no apparent reason. This can be caused by one of three events:

The server hosting the volume the Directory Map object was pointing to at the creation time of the Directory Map object has been deleted.
The server hosting the Print Server object it was pointing to at the creation time of the Print Server object has been deleted.
The Volume object the Directory Map object is pointing to has been deleted.

The last situation is easy to understand; however, the first two are not. They are due to a bug in the version of NWAdmin that shipped with NetWare 4 that appears when you're dealing with Directory Map and Print Server objects. (The same bug is also in the version of NetWare Administrator shipped with NetWare 5 and above and carried over into ConsoleOne.)

When a Directory Map object is created, its Host Server attribute (which is not visible in NWAdmin but which you can determine by looking in the Others tab in ConsoleOne) points to the server that hosts the volume referred to by the Directory Map object. If you change the Directory Map object to point to a volume on a different server, the Host Server attribute is not updated; it remains pointing to the old server. If the old server object referenced in the Host Server attribute gets deleted, DS automatically removes the attribute, and the Directory Map object turns into an Unknown object because it loses its mandatory Host Server attribute. For example, when you create a Directory Map object called TEST_DM and point it to NETWARE65-A_VOL1:HOME, the Host Server attribute points to NETWARE65-A (or whatever your current default server is). If you later change the Directory Map object to point to NETWARE65-B_SYS:DATA, the Host Server attribute of the Directory Map object remains pointing to NETWARE65-A. So if at a later time you remove NETWARE65-A from the network, TEST_DM becomes an Unknown object. A similar problem exists with the Host Device attribute of Print Server objects.
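To make the mismatch concrete, the following sketch flags Directory Map objects whose Host Server no longer matches the server that hosts the referenced volume. It works on plain Python dictionaries rather than a live tree. The data layout is hypothetical, and it assumes the default SERVER_VOLUME naming for Volume objects used in the example above, which will not hold if volumes have been renamed.

```python
def volume_server(volume_name):
    """Return the server part of a volume name such as 'NETWARE65-B_SYS'.

    Assumes the default SERVER_VOLUME naming used in this chapter's example.
    """
    return volume_name.rsplit("_", 1)[0]


def stale_host_servers(directory_maps):
    """Yield (name, host_server, expected_server) for mismatched objects.

    directory_maps is a list of dicts with 'name', 'path' (VOLUME:DIRECTORY),
    and 'host_server' keys; this structure is hypothetical.
    """
    for dm in directory_maps:
        volume = dm["path"].split(":", 1)[0]
        expected = volume_server(volume)
        if dm["host_server"] != expected:
            yield dm["name"], dm["host_server"], expected


maps = [
    {"name": "TEST_DM", "path": "NETWARE65-B_SYS:DATA", "host_server": "NETWARE65-A"},
    {"name": "HOME_DM", "path": "NETWARE65-A_VOL1:HOME", "host_server": "NETWARE65-A"},
]
for name, current, expected in stale_host_servers(maps):
    print(f"{name}: Host Server is {current}, volume is on {expected}")
```

Any object reported this way would turn Unknown if the server named in its Host Server attribute were removed from the tree.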

NOTE

In the case of Print Server objects, the Host Device attribute identifies where the Print Server object's log file is to be kept. When the Print Server object is brought up, a licensed connection is made to the server identified by the Host Device attribute, even if the log file option is not enabled. This can also cause performance issues or prevent the Print Server object from being loaded if the referenced server happens to be across a WAN link or if the (remote) server or the link to it is down.

Although NWAdmin doesn't show the Host Server and Host Device attributes, you can easily look them up by using ConsoleOne, NDS iMonitor, or NList. The following NList command shows the Host Device settings of all Print Server objects in the current context:

 Nlist "print server" show "host device"

The output would look something like this:

 Current context: test.xyzcorp
 Print Server: PS-test
         Host Device: NETWARE5-A.toronto.
 -------------------------------------------------------------
 One Print Server object was found in this context.
 One Print Server object was found.
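If you capture this output to a file, a short script can pull out the Print Server and Host Device pairs for review. The following is a minimal sketch that assumes the output looks like the sample shown here; the `host_devices` function is hypothetical, and other NList versions may format their output differently.

```python
import re

SAMPLE = """\
Current context: test.xyzcorp
Print Server: PS-test
        Host Device: NETWARE5-A.toronto.
-------------------------------------------------------------
One Print Server object was found in this context.
One Print Server object was found.
"""


def host_devices(nlist_output):
    """Map each Print Server name to its Host Device value.

    Works whether the output keeps its line breaks or has been flattened
    onto one line, because it scans for the two labels with a regex.
    """
    pattern = re.compile(r"Print Server:\s*(\S+)\s+Host Device:\s*(\S+)")
    return {name: device for name, device in pattern.findall(nlist_output)}


print(host_devices(SAMPLE))
```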

To address these problems, Novell has available the following (unsupported) solutions:

Novell's developer support Web site, at developer.novell.com/support/sample.htm, makes available a sample application called Mapobjch that is contained in a file called D3MAPOBJ.EXE. Mapobjch includes a browser to select what container you want to search and automatically changes the host server to that of the Volume object, if they are not the same.
An Appware utility called Hstdev enables you to change the host device of a Print Server object. You can locate this program by searching for HSTDEV.EXE, using the file finder at Novell's support Web site.

NOTE

If you are looking for a supported product, you might try HostEditor (see www.dreamlan.com/hostedit.htm). Besides working with Directory Map and Print Server objects, HostEditor also works with Print Queue and Volume objects.

There is one situation in which the presence of Unknown objects is valid. As discussed in Chapter 6, "Understanding Common eDirectory Processes," in a replica ring that consists of servers running eDirectory and legacy NDS versions, objects containing auxiliary class extensions appear as Unknown objects on pre-NDS 8 servers because those earlier versions of NDS do not know how to handle auxiliary classes. In such a case, you should not be concerned with these Unknown objects and not attempt to delete them unless you have a good reason for so doing.

Renamed Objects

Generally, when a normal object turns Unknown, the object name is unchanged (only its object class is changed to Unknown). If instead your normal objects have their names changed to names such as 1_2, 2_1, and 13_5 (that is, #_#) when you didn't name them that way, and they keep coming back after you've renamed or deleted them, you have a synchronization problem.

WARNING

You should not casually delete these renamed objects when you first detect them. They could be important objects, such as NCP Server objects, that got renamed. Deleting such objects could lead to dire synchronization errors or data inconsistency consequences if you are not careful.

These objects are called renames. Renames are caused by name collisions during synchronization. A collision occurs when the same object is found with different creation timestamps (CTSs). The name collision problem happens mostly in a mixed NetWare 4.0x and NetWare 4.10 environment, which is a rare combination these days. It can also occur with the newer versions of NDS/eDirectory on a LAN/WAN where communication is not stable.
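When you are scanning a long object list exported from one of the management tools, a simple pattern match can flag candidate renames. This is a minimal sketch; the `looks_like_rename` helper is hypothetical, and a match only means the name has the #_# form, so each hit still needs to be checked by hand before you act on it.

```python
import re

# Names of the form <digits>_<digits>, such as 1_2 or 13_5.
RENAME_PATTERN = re.compile(r"\d+_\d+")


def looks_like_rename(common_name):
    """True if the whole name matches the #_# collision-rename pattern."""
    return RENAME_PATTERN.fullmatch(common_name) is not None


names = ["1_2", "2_1", "13_5", "TEST_DM", "User3", "NETWARE65-A_VOL1"]
print([n for n in names if looks_like_rename(n)])
```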

You might also observe multiple renamed objects in the SLP Scope container when you have multiple Service Location Protocol (SLP) directory agents (DAs) servicing the same scope and writing the information into DS. This is because each DA is attempting to write the same service information it detected into DS, but at a slightly different time. To remedy this, you need to ensure that you are running the latest version of the SLP module for your operating system platform because it contains an option to not store SLP service information in DS but to keep it in the DA cache instead.

TIP

When using multiple DAs for the same scope, you should configure only one of the DAs to write service information to DS and have the other DAs use the cache.

The following steps can help you resolve name collision problems:

Ensure that the time is in sync on the network and that each server in the replica ring is running the same or the latest compatible version of the DS module. (All the servers in the tree should be running the latest version of DS.) Use the Time Synchronization option from DSRepair's main menu to check whether time is in sync on the network and the versions of the DS modules.
Make sure that all the replicas for the partition in question are in the On state. Also make sure that the partition has a Master replica and that the server holding that replica is accessible by the other servers. You can check them by performing a synchronization status report by using DSRepair, NDS iMonitor, or iManager. You should do this on the server that holds the Master replica of the partition.
Compare the replica ring information between the related servers in the ring. Resolve any conflicts or inconsistencies if any are found.
Use DSBrowse or NDS iMonitor to examine each replica in the replica ring to determine how many servers have been affected by the rename problem. There should be at least one server that shows the original name of the renamed object.
From the server that contains the Master replica of the partition, issue the following DSTrace commands:
```
set dstrace=on
set dstrace=+sync
set dstrace=*h
```
(or the equivalent, using NDSCon on Windows servers) and see whether the partition in question is synchronizing successfully.

If DSTrace reports All Processed = Yes, you should first try to rename the objects back to their original names before trying to delete the renamed objects. They should either keep the name change or be removed from the tree without reappearing. If they persist, you need to perform a DS health check to ensure that everything is in order. If nothing obvious shows up and renames keep happening, you should consider opening a call with Novell Support and have them dial in for a look at the underlying problem that is causing the renames to reoccur.

TIP

There is one more option you can try before calling Novell to deal with renamed objects. If at least one server in the replica ring is showing the original object name, you can move the Master replica to a server that shows the correct name. Then, one by one, you can remove the replicas from the other servers in the replica ring and then slowly add the replicas back again, waiting for each replica to be added before you go to the next server.

If the renamed object is an NCP Server object, refer to TID #10013224 for instructions on how to remove the affected server from DS and reinstall it into the tree. The TID was written for NetWare 4.11, but its concept is equally applicable to all versions of NetWare and non-NetWare DS servers.

Replica Ring Inconsistency

Although they are not very common, replica ring inconsistencies can reflect serious problems in a DS tree. A replica ring inconsistency is present when two or more servers holding a replica of a partition do not agree on what the replica rings look like.

The most common cause of this problem is a change in the replica ring while a server in the ring is down combined with a timestamp problem where the server that is down has a future timestamp on its replica information. When this occurs, the replica on the server that was down does not change its replica list to reflect the recent change. This can result in a number of odd situations: multiple servers holding the Master replica, inconsistent views for Subordinate Reference (SubRef) replicas, or servers missing from the replica ring.

NOTE

Multiple Master replicas can be a result of the disaster recovery process. If the server holding the Master replica is down for an extended period of time, another server in the replica ring could be designated the Master. When the downed server is brought back online, there would be two Master replicas.

One of the easiest ways to diagnose an inconsistent ring is to use NDS iMonitor to check continuity. You do this by selecting the Agent Synchronization link and then selecting Continuity for each replica hosted on the server. As discussed in Chapter 7, "Diagnostic and Repair Tools," this method provides a view of the DS partition from each server's perspective. By querying the continuity information from different servers in the ring, you can quickly determine whether there is an inconsistency in the replica ring. Figure 11.15 shows what this might look like for a two-server replica ring. One server sees two servers in the ring (as shown in the Replicas frame) but obtains status for only one server; there is no -625 or -626 error to indicate that there is a communication problem in contacting the other server; there simply wasn't any status information to be reported.

Figure 11.15. NDS iMonitor, suggesting that the replica ring for the `[Root]` partition is inconsistent.

NOTE

As discussed in Chapter 9, "Diagnosis and Recovery Techniques," you should not rely on a single tool to do your diagnosis. After you have narrowed down which servers are having an inconsistent view of the replica ring, you should use DSRepair to obtain a confirmation before formulating a repair plan.

There are a number of ways to resolve a replica ring that has inconsistent views. The first and most advisable is to contact Novell Support to examine the replica list information on the servers, using Novell's diagnostic tools, and repair the database manually. This is the most common resolution we recommend because there are a number of different sets of circumstances that can lead to this sort of situation. If you are comfortable with using DSRepair, however, and are reasonably sure that there are no additional but yet-undetected causes of the inconsistent replica ring problem, you might be able to correct the problem without involving Novell.

WARNING

It is important to realize that an inconsistent replica ring problem is one of the types of problems for which proceeding without Novell's direct assistance may result in the loss of both DS and file system trustee data.

To start working with an inconsistent replica ring problem, the first thing to do is determine which server has the inconsistent view. If you have more than two servers in the replica ring, the most consistent view is the one you want to work with. The server with the view that does not match the others is the one you want to correct in most circumstances. If there is more than one server with an inconsistent view, you should start with the one that is most inconsistent.
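The "most consistent view" rule amounts to a majority vote over the replica lists reported by each server. The following sketch shows the idea on hand-entered data. The `find_outliers` function and the server names are hypothetical, and the input would have to be collected manually from NDS iMonitor or DSRepair. With only two servers, or with an even split, there is no majority, and the result should not be trusted.

```python
from collections import Counter


def find_outliers(views):
    """Split servers into those sharing the most common ring view and the rest.

    views maps a server name to the replica ring as that server sees it,
    expressed as a dict of {server: replica_type}.
    Returns (consensus_view, sorted list of servers that disagree).
    """
    # Dicts are not hashable, so freeze each view before counting.
    frozen = {server: frozenset(ring.items()) for server, ring in views.items()}
    consensus, _count = Counter(frozen.values()).most_common(1)[0]
    outliers = sorted(s for s, ring in frozen.items() if ring != consensus)
    return dict(consensus), outliers


views = {
    "FS1": {"FS1": "Master", "FS2": "Read/Write", "FS3": "Read/Write"},
    "FS2": {"FS1": "Master", "FS2": "Read/Write", "FS3": "Read/Write"},
    "FS3": {"FS1": "Master", "FS3": "Read/Write"},  # FS3 does not list FS2
}
consensus, outliers = find_outliers(views)
print(outliers)  # ['FS3']
```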

The best way to correct the problem is to uninstall DS from the server in question and reinstall it. This ensures that the timestamps on the affected server are correct. The procedure for this is as follows:

Remove DS from that server by using either INSTALL.NLM for NetWare 4.x, NWCONFIG.NLM for NetWare 5.x and higher, the Add/Remove Programs applet for Windows, or nds-uninstall for Linux/Unix platforms.

NOTE

When running eDirectory 8.7 or higher, with the roll-forward log (RFL) enabled, you need to back up the logs before removing DS. This is because the RFL files are also removed by the uninstallation process.
Wait a few minutes. How long you wait depends on the speed of any WAN links involved and the number of replicas the change needs to replicate to.
Use NDS iMonitor or DSRepair to confirm that the replica lists on all servers in the ring show that the offending server is gone.
If NDS iMonitor still thinks the server is in the replica ring, there may be an additional problem with the server remote ID list. If this happens, use DSRepair on each server that is left to verify the remote ID for each server.
If the server still appears in the replica ring, run DSRepair with the -A switch (in Linux/Unix, run ndsrepair -P -Ad) on the server that holds the Master replica to manually remove the server from the replica ring. For NetWare, select Advanced Options, Replica and Partition Operations, View Replica Ring. Then select the server you want to remove and press Enter. Finally, select Remove This Server from the Replica Ring. For Windows, expand the Partitions list and select the server that is to be removed. Then select Partitions, Replica Rings, Remove Server from Ring. For Linux/Unix, run ndsrepair -P -Ad, select the replica in question, and then select Replica Options, View Replica Ring. Select the server that is to be removed and select Server Options, Remove This Server from Replica Ring. (This step is only necessary on the server with the Master because the rest of the servers receive the update from the server with the Master replica, and the list should appear consistently in NDS iMonitor after this change has propagated. However, if the offending server holds the Master, you need to first designate another server to become the Master before performing this step.)
When DS has finished synchronizing the changes to the replica ring, reinstall the server that was removed back into the tree and replace the replicas on that server. This also places SubRef replicas on servers where they are required.

NOTE

With eDirectory 8.7 or higher, if the RFL had been enabled previously, don't forget to reenable the RFL files. Refer to the "Configuring and Maintaining Roll-Forward Logs" section in Chapter 8 for details.

There are many different variations to this problem, so it is important to examine the entire situation carefully before proceeding with a plan of action. Remember: Doing something just for the sake of doing something can make a situation much worse.

It is easiest to fix a replica ring that has more than one Master replica. If there are more than two servers in the replica ring, you should run DSRepair on one of the servers that do not have the Master replica and designate that server to be the Master. The procedures for doing this are outlined in Table 11.2.

Table 11.2. Procedures for Fixing a Replica Ring

| Operating system | Procedure |
| --- | --- |
| NetWare | From the main menu, select Advanced Options, Replica and Partition Operations. From the list of replicas, select the desired partition and press Enter. Select Replica Options, Designate This Server as the New Master Replica (see Figure 11.16). You might need to load DSREPAIR.NLM with the -A option in order to see the Designate This Server as the New Master Replica option. |
| Windows | Start dsrepair.dlm from NDSCon. Expand the Partitions view and highlight the desired partition. Right-click and then select Designate This Server as the New Master Replica. |
| Linux/Unix | Run ndsrepair -P, select the desired partition, and then select Replica Options, Designate This Server as the New Master Replica. |

Figure 11.16. Using DSRepair to make the current server the Master replica server.

The two servers holding the Master should automatically be demoted to Read/Write replicas. You can then redesignate the original Master server as Master. This procedure must be carried out by using DSRepair and not one of the other management tools, such as ConsoleOne. This is because the other tools will first perform mini-health checks before changing the replica types and will abort when they see that there is more than one Master replica.

NOTE

If there are only two servers in the replica ring (and both are indicating that they are the Master), redesignating one of them as Master by using DSRepair will resolve the conflict.

Replica Inconsistency

Besides inconsistent views of servers in a replica ring, there can also be inconsistency in the number of objects between replicas. This could be due to time-related issues that lead to some servers holding some, but not all, of the objects in the replica ring. Depending on what objects are missing, the problem may go undetected for some time. The issue generally comes to light when users start complaining that they are having intermittent login trouble; they can log in fine when they are attached to one server but not when they are attached to a different server.

WARNING

Before you perform the following procedure, make sure there are no Unknown or renamed objects present. If there are any, refer to the earlier sections in this chapter to resolve them before proceeding.

The following procedure will help ensure that all servers in your replica ring hold the same number of objects. First, you need to use NDS iMonitor or DSBrowse to determine which server has the most accurate replica in the ring. Depending on your finding, you should then exercise one of these options:

If the Master has all the data and only a few servers in the ring have incomplete replicas, use DSRepair to perform a "receive all objects from Master to this replica" operation on the servers that have incomplete replicas.
If the Master is the only replica that has the complete replica, use DSRepair to perform a "send all objects to every replica in the ring" operation on the Master server.
If none of the replicas have complete information but each replica has some objects that other replicas are lacking, use DSRepair to perform a "send all objects to every replica in the ring" on each server in the ring.
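The three options can be expressed as a small decision function. The following is only a sketch under simplifying assumptions: it compares object names alone (not attribute values), it treats the union of all replicas as the complete set, and the `choose_operation` function and its input layout are hypothetical. It does not talk to DSRepair; it only suggests which operation matches the list above.

```python
def choose_operation(replicas, master):
    """Suggest which DSRepair operation fits the three cases listed above.

    replicas maps a server name to the set of object names in its replica;
    master is the name of the server holding the Master replica.
    Returns a list of (server, operation) pairs.
    """
    everything = set().union(*replicas.values())
    complete = {s for s, objs in replicas.items() if objs == everything}
    incomplete = sorted(set(replicas) - complete)

    if not incomplete:
        return []  # all replicas already hold the same objects
    if complete == {master}:
        return [(master, "send all objects to every replica in the ring")]
    if master in complete:
        return [(s, "receive all objects from Master to this replica")
                for s in incomplete]
    if not complete:
        return [(s, "send all objects to every replica in the ring")
                for s in sorted(replicas)]
    # A non-Master replica is complete but the Master is not:
    # this case is not covered by the three options above.
    raise ValueError("situation not covered; review the ring manually")


ring = {"FS1": {"a", "b", "c"}, "FS2": {"a", "b"}, "FS3": {"a", "b", "c"}}
print(choose_operation(ring, master="FS1"))
```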

Table 11.3 lists the detailed steps for accomplishing these options within each operating system.

Table 11.3. DSRepair Procedure to Synchronize Replicas

| Operating system | To send from the (Master) replica to all servers in the ring | To receive data from the Master replica |
| --- | --- | --- |
| NetWare (for older versions of DSREPAIR.NLM, you might need to load it with the -A switch) | Select Advanced Options, Replica and Partition Operations, View Replica Ring. Select the (Master replica) local server and press Enter. Select Send All Objects to Every Replica in the Ring. | Select Advanced Options, Replica and Partition Operations, View Replica Ring. Select the local server and press Enter. Select Receive All Objects for This Replica. |
| Windows | Expand the Partitions list, right-click the (Master replica) local server, and select Send All Objects to Every Replica in the Ring. | Expand the Partitions list, right-click the local server, and select Receive All Objects from Master to This Replica. |
| Linux/Unix (run ndsrepair with the -P option) | Select the replica in question and then select Replica Options, View Replica Ring. Select the (Master replica) local server and then select Server Options, Send All Objects to Every Replica in the Ring. | Select the replica in question and then select Replica Options, View Replica Ring. Select the local server and then select Server Options, Receive All Objects from the Master to This Replica. |

WARNING

The Send All Objects to Every Replica in the Ring option does exactly what it says: It sends every single object on that server's replica to every other server in the replica ring. Each receiving server will either discard the received information because it already has the object, add the object to the receiving server's DIB because it did not previously have it, or overwrite an Unknown object with a valid object it just received. This process could generate a lot of network traffic, depending on the size of the replica. Therefore, it is advisable that you perform this "send all" operation after-hours and wait for it to complete on each server before starting it on the next server.
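The three outcomes on the receiving side can be modeled as follows. This is a simplified sketch of the behavior described in the WARNING, not the actual synchronization logic; the `receive_object` function is hypothetical and ignores timestamps and entry flags.

```python
def receive_object(dib, name, object_class):
    """Apply one received object to a local DIB and report what happened.

    dib maps object names to their class on the receiving server.
    """
    current = dib.get(name)
    if current is None:
        dib[name] = object_class   # the server did not have the object
        return "added"
    if current == "Unknown":
        dib[name] = object_class   # a valid object replaces the Unknown one
        return "overwrote Unknown"
    return "discarded"             # the server already has the object


local_dib = {"User1": "User", "User2": "Unknown"}
incoming = [("User1", "User"), ("User2", "User"), ("User3", "User")]
results = [receive_object(local_dib, n, c) for n, c in incoming]
print(results)
```

Because every object is sent regardless of the outcome, the traffic grows with the size of the replica even when most objects end up discarded.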

Stuck Obituaries

As described in Chapter 6, DS makes extensive use of obituary notifications for object management, and obituary flags are eventually cleared out when an object is removed. There are times, however, when an obituary gets stuck so that DS can't finish the cleanup process. Most obituaries get stuck because a server was not notified that a change to objects has taken place. To see whether you have any stuck obituaries, you should use the latest available version of DSRepair and select Advanced Options, Check External References on the Master replica of each partition. (You need to load DSRepair with the -A command-line switch.) This generates a list of all obituaries on the server. Then you need to review this list, searching for any line with a Flags=0 value. The server listed (that is, the last entry on the Backlink process line) below this value has not been contacted. The following is a sample DSRepair log that shows obituaries:

 /*************************************************************/
 NetWare 6.00 Directory Services Repair 10515.37, DS 10510.64
 Log file for server "NETWARE65-B.Test.DreamLAN" in tree "NETWARE65-TEST"
 External Reference Check
 Start: Thursday, March 3, 2004 2:14:25 pm Local Time
 Found obituary for: EID: 11000FE8, DN: CN=User3.OU=Test.O=XYZCorp.NETWARE65-TEST
 -Value CTS : 01/16/2004 10:36:42 PM  R = 0001  E = 0003
 -Value MTS = 01/16/2004 10:36:42 PM  R = 0001  E = 0003, Type = 0001 DEAD,
 -Flags = 0000
 -RDN: CN=User3
 Found obituary for: EID: 11000FE8, CN=User3.OU=Test.O=XYZCorp.NETWARE65-TEST
 -Value CTS : 01/16/2004 10:36:42 PM  R = 0001  E = 0004
 -Value MTS = 01/16/2004 10:36:42 PM  R = 0001  E = 0004, Type = 0006 BACKLINK,
 -Flags = 0000
 -Backlink: Type = 00000005 NEW_RDN, RemoteID = ffffffff,
  ServerID = 010000BD, CN=TEST-FS1.OU=Test.O=XYZCorp.NETWARE65-TEST
 Found obituary for: EID: 11000FE8, CN=User3.OU=Test.O=XYZCorp.NETWARE65-TEST
 -Value CTS : 01/16/2004 10:36:42 PM  R = 0001  E = 0004
 -Value MTS = 01/16/2004 10:36:42 PM  R = 0001  E = 0004, Type = 0006 BACKLINK,
 -Flags = 0000
 -Backlink: Type = 00000005 NEW_RDN, RemoteID = ffffffff,
  ServerID = 030010C3, CN=TEST-FS2.OU=Test.O=XYZCorp.NETWARE65-TEST
 Found obituary for: EID: 11000FE8, CN=User3.OU=Test.O=XYZCorp.NETWARE65-TEST
 -Value CTS : 01/16/2004 10:36:42 PM  R = 0001  E = 0004
 -Value MTS = 01/16/2004 10:36:42 PM  R = 0001  E = 0004, Type = 0006 BACKLINK,
 -Flags = 0000
 -Backlink: Type = 00000005 NEW_RDN, RemoteID = ffffffff,
  ServerID = 03001101, CN=TEST-FS3.OU=Test.O=XYZCorp.NETWARE65-TEST
 Checked 0 external references
 Found: 4 total obituaries in this dib,
     4 Unprocessed obits, 0 Purgeable obits,
     0 OK_To_Purge obits, 0 Notified obits
 *** END ***
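Reviewing a long log by eye for Flags = 0000 entries is tedious, so a short script can list the servers that have not yet been contacted. The following is a minimal sketch that assumes the log has the layout of the sample above (shortened here to two records, with wrapped names joined); the `uncontacted_servers` function is hypothetical, and other DSRepair versions may format the log differently.

```python
import re

SAMPLE_LOG = """\
Found obituary for: EID: 11000FE8, DN: CN=User3.OU=Test.O=XYZCorp.NETWARE65-TEST
-Value CTS : 01/16/2004 10:36:42 PM  R = 0001  E = 0003
-Value MTS = 01/16/2004 10:36:42 PM  R = 0001  E = 0003, Type = 0001 DEAD,
-Flags = 0000
-RDN: CN=User3
Found obituary for: EID: 11000FE8, CN=User3.OU=Test.O=XYZCorp.NETWARE65-TEST
-Value CTS : 01/16/2004 10:36:42 PM  R = 0001  E = 0004
-Value MTS = 01/16/2004 10:36:42 PM  R = 0001  E = 0004, Type = 0006 BACKLINK,
-Flags = 0000
-Backlink: Type = 00000005 NEW_RDN, RemoteID = ffffffff,
 ServerID = 010000BD, CN=TEST-FS1.OU=Test.O=XYZCorp.NETWARE65-TEST
"""


def uncontacted_servers(log_text):
    """Return server common names from BACKLINK obituaries still at Flags = 0000."""
    servers = []
    # Each obituary record starts with the same marker text.
    for record in log_text.split("Found obituary for:")[1:]:
        obit_type = re.search(r"Type = (\d{4}) (\w+)", record)
        flags = re.search(r"-Flags = (\d{4})", record)
        server = re.search(r"ServerID = [0-9A-Fa-f]+,\s*CN=([^.\s]+)", record)
        if (obit_type and flags and server
                and obit_type.group(2) == "BACKLINK"
                and flags.group(1) == "0000"):
            servers.append(server.group(1))
    return servers


print(uncontacted_servers(SAMPLE_LOG))
```

The script only narrows down where to look; the reason a server was never notified still has to be investigated on that server.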

The information presented in this DSRepair log is interpreted as follows:

EID stands for entry ID. This is a record number in the 0.NDS file (or ENTRY.NDS file in NetWare 4) that specifies the object that has the Obituary attribute assigned.
CTS and MTS are timestamps. They denote when the Obituary attribute was created and modified, respectively.
Type indicates both a number and a text description. There are three categories of types: primary, secondary, and tracking. A primary obituary indicates an action on an object. A secondary obituary indicates the servers that must be contacted and informed of the primary obituary action. A tracking obituary is associated with certain primary obituaries. The following are the valid obituary types:
- Primary obituaries are 0000 Restored, 0001 Dead, 0002 Moved, 0005 NEW_RDN (New Relative Distinguished Name [RDN]), 0008 Tree_NEW_RDN (Tree New RDN; this refers to a partition root name, not a DS tree name), and 0009 Purge All.
- Secondary obituaries are 0006 Backlink (specifies a target server that needs to be contacted regarding an obituary) and 0010 Move Tree (this obituary is similar to the Backlink obituary). There is one Move Tree obituary for every server that needs to be contacted regarding a Tree_NEW_RDN operation.
- Tracking obituaries are 0003 Inhibit Move, 0004 OLD_RDN (Old RDN), and 0007 Tree_OLD_RDN (Tree Old RDN; this refers to a partition root name, not a DS tree name).
Flags indicate the level or stage to which the obituary is processed. The following are the valid flag values:
- 0000 (Issued): This flag indicates that the obituary has been issued and is ready for processing.
- 0001 (Notified): This flag indicates that the obituary is at the notify stage, which essentially means that the servers identified in the Backlink or Move Tree obituaries have been contacted and notified of the operation or action on an object.
- 0002 (OK-to-Purge): This flag indicates that the obituary is being cleaned up on the local database of each server identified in the Backlink or Move Tree obituaries. This cleanup includes resolving all objects that reference the object with the obituary and informing them of the change (for example, deletion or move).
- 0004 (Purgeable): This flag indicates that the obituary is ready to be purged. The purge process essentially recovers the value to the free chain and enables it to be reused.
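
The type and flag values listed above lend themselves to a small lookup table. The following Python fragment is only an illustrative sketch: the names `OBIT_TYPES`, `OBIT_FLAGS`, and `describe_obituary` are hypothetical and are not part of any Novell tool. The codes are keyed by the four-character strings exactly as they appear in the list, so no assumption is made about whether a value such as 0010 is decimal or hexadecimal.

```python
# Obituary types as listed in the text: code -> (name, category).
# Keys are the printed four-character codes, kept as strings on purpose.
OBIT_TYPES = {
    "0000": ("Restored", "primary"),
    "0001": ("Dead", "primary"),
    "0002": ("Moved", "primary"),
    "0003": ("Inhibit Move", "tracking"),
    "0004": ("OLD_RDN", "tracking"),
    "0005": ("NEW_RDN", "primary"),
    "0006": ("Backlink", "secondary"),
    "0007": ("Tree_OLD_RDN", "tracking"),
    "0008": ("Tree_NEW_RDN", "primary"),
    "0009": ("Purge All", "primary"),
    "0010": ("Move Tree", "secondary"),
}

# Obituary processing stages as listed in the text: flag value -> stage name.
OBIT_FLAGS = {
    "0000": "Issued",
    "0001": "Notified",
    "0002": "OK-to-Purge",
    "0004": "Purgeable",
}


def describe_obituary(type_code, flags):
    """Return a readable summary for a Type/Flags pair taken from a log."""
    name, category = OBIT_TYPES.get(type_code, ("Unknown type", "unknown"))
    stage = OBIT_FLAGS.get(flags, "Unknown stage")
    return f"{name} ({category}), {stage}"


if __name__ == "__main__":
    # A Backlink obituary that has been issued but not yet processed.
    print(describe_obituary("0006", "0000"))
```

Keeping the codes as strings means a value copied straight out of a DSRepair log can be looked up without conversion.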

NOTE

For a more detailed discussion about the obituary process, see Chapter 6.

Using this information, you can readily determine that the DSRepair log reports that User3.Test.XYZCorp has been deleted but the obituary is temporarily stuck because server NETWARE65-B (the server on which this DSRepair was run) is waiting to pass that information to servers TEST-FS1, TEST-FS2, and TEST-FS3.
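
When a log contains many obituaries, the same reading can be automated. The following Python sketch assumes the log text has been copied off the server and follows the layout of the sample above; the function names `parse_obituaries` and `unnotified_servers` are hypothetical, and the regular expressions are written against this one sample only, so other DSRepair versions may need adjustments.

```python
import re

# Two records shortened from the sample log shown earlier in this section.
SAMPLE = """\
Found obituary for: EID: 11000FE8, DN: CN=User3.OU=Test.O=XYZCorp.NETWARE65-TEST
-Value CTS : 01/16/2004 10:36:42 PM  R = 0001  E = 0003
-Value MTS = 01/16/2004 10:36:42 PM  R = 0001  E = 0003, Type = 0001 DEAD,
-Flags = 0000
-RDN: CN=User3
Found obituary for: EID: 11000FE8, CN=User3.OU=Test.O=XYZCorp.NETWARE65-TEST
-Value CTS : 01/16/2004 10:36:42 PM  R = 0001  E = 0004
-Value MTS = 01/16/2004 10:36:42 PM  R = 0001  E = 0004, Type = 0006 BACKLINK,
-Flags = 0000
-Backlink: Type = 00000005 NEW_RDN, RemoteID = ffffffff,
 ServerID = 010000BD, CN=TEST-FS1.OU=Test.O=XYZCorp.NETWARE65-TEST
"""


def parse_obituaries(log_text):
    """Split the log on 'Found obituary for' and pull out the key fields."""
    records = []
    for chunk in log_text.split("Found obituary for")[1:]:
        obj = re.search(r"CN=([^.\s,]+)", chunk)            # leaf name of the object
        typ = re.search(r"Type = (\d{4}) (\w+)", chunk)     # four-digit obituary type
        flags = re.search(r"-Flags = (\d{4})", chunk)       # processing stage
        server = re.search(r"ServerID = \w+,\s*CN=([^.\s,]+)", chunk)
        if not (obj and typ and flags):
            continue  # skip anything that does not look like the sample
        records.append({
            "object": obj.group(1),
            "type": typ.group(1),
            "type_name": typ.group(2),
            "flags": flags.group(1),
            "server": server.group(1) if server else None,
        })
    return records


def unnotified_servers(records):
    """Servers named in Backlink (0006) obituaries still at Flags = 0000."""
    return [r["server"] for r in records
            if r["type"] == "0006" and r["flags"] == "0000" and r["server"]]


if __name__ == "__main__":
    print(unnotified_servers(parse_obituaries(SAMPLE)))
```

Because the parser works on whitespace-tolerant patterns rather than line positions, it does not matter whether the log wraps a record across several lines or prints it on one.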

Armed with the information provided by DSRepair, you can then begin to find the problem with the servers that have not been contacted. Possible causes include a disabled Transaction Tracking System (TTS), a server that is down, SAP/RIP filtering, or a server that no longer exists but whose server object is still in the tree. By checking these issues, you can resolve almost all obituary problems. With NDS 5.95 and higher, you can use a SET DSTRACE=*ST command, and it will report in the DSTrace screen which servers are having problems with obituaries.

One of the most commonly reported obituary-related DS errors is -637 (0xFFFFFD83), which is a "previous move in progress" error. You may encounter this error when trying to do any kind of partition operation, such as adding or moving a replica or even adding a user (after a container move). In many cases with the newer versions of DS, especially with eDirectory, the -637 error can be resolved without the intervention of Novell Support. In some cases, however, Novell will need to dial in to your network to edit the DS database in order to remove the stuck obituary that's causing the problem.

For example, if the case is a server not able to communicate with one or more servers referencing the object being moved, you should be able to resolve the error without involving Novell. If a server referencing the object has actually been removed from the tree and the object move has still not completed, however, you may need to contact Novell for additional help.

TIP

For many -637 errors, the cause is a loss of communication with a server holding an external reference (exref) of the object that's being moved from one container to another. Sometimes the cause is that a server holding a subordinate reference of the replica was taken offline, thus preventing the synchronization cycle from completing.

The actual -637 error is caused by the Type=0003 (Inhibit Move) obituary. This obituary is placed on the object that has been moved and on the container it has been moved to, to prevent another move from taking place on this object until the previous move has completed. In some cases, two other obituary types may be involved as the cause of the -637 error: the Type=0002 (Moved) obituary and the Type=0006 (Backlink) obituary. The Moved obit is attached to the (original) object that has been moved from this container. The Backlink obit is attached to the object to point to another server holding an exref of the object, which must be notified when the object is modified (for example, deleted, renamed, moved). The Backlink obit can also be caused by an exref, where a server must hold information about an object in a partition that the server does not hold a replica of. DS stores this information in the server's database as an exref, which is a placeholder that contains information about the object that the server does not hold in a local replica. Servers holding replicas of the object periodically update exrefs through the Backlink process, which points to the object on the server holding the exref.

The following three steps can help you resolve most -637 errors without involving outside assistance:

Locate the object with the Inhibit Move obituary because that object is the source of the error. Go to the server holding the Master replica of the partition reporting the -637 error and use DSRepair (with the -A switch) to perform an exref check. Look for lines similar to this:
```
Found obituary for ... EID... RN CN=Objectname.OU=Container.O=Container ...
-Value MTS= ... Type=0003 Inhibit_move
-Flags=0000
```
This is the object causing a -637 error to be reported. Take note of the object's full name and context.
Locate the corresponding Moved obituary. It is placed on the object that was moved from another container to the one with the Inhibit Move obituary. You need to find where this object was moved from. A server holding a replica of the partition where the object was moved from gives you this information. If you are lucky, the same server holding the Inhibit Move obituary also holds a replica of the partition where the object was moved from. If you are not lucky, you will have to run DSRepair (with the -A switch) on every server in the tree that holds a Master replica of a partition and look for the following entry when checking exrefs:
```
Found obituary for ... EID ... CN=name.OU=ou_name.O=o_name ...
-Value MTS= ... Type=0002 Moved
-Flags=0000
```
You are looking for the same CN name as the Inhibit Move obituary, only in a different container (remember that the -637 error is caused by moving objects). A Moved obituary is placed on the object that has been moved until the move is completed. In the same DSREPAIR.LOG file, also look for lines similar to the following:
```
Found obituary for ... EID ... CN=name.OU=ou_name.O=o_name ...
-Value MTS= ... Type=0006 Backlink
-Flags=0000
```
Look for the same object as the Moved obituary object. This object has exrefs, held on other servers in the tree, that must be notified of the move. A Backlink obituary points to the server holding the exref, and Flags=0000 tells you that the server holding the exref has not yet been notified of the move.

TIP

If all Type=0006 Backlink obituaries are at Flags=0000, you should verify that you have a Master replica of each partition in the tree; the Master replica is responsible for forwarding obituary states.

Find out the status of the server(s) holding the exref(s). To find out which servers have exrefs to the moved object, use DSBrowse. On the server reporting the Moved and Backlink obituaries, use DSBrowse to locate the object and examine the value(s) of its Obituary attribute. You should see information similar to this:
```
-Flags = 0000
-Backlink: Type = 00000005 NEW_RDN, RemoteID = ffffffff,
 ServerID = ##, CN=Servername
```
When you find such entries, you have the names of the servers that are holding up the process.
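
The pairing described in the first two steps (an Inhibit Move obituary and a Moved obituary on the same CN in a different container) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the obituaries are assumed to have already been collected from the DSRepair logs into simple dictionaries, and the names `split_dn`, `match_moves`, and the sample objects are hypothetical.

```python
def split_dn(dn):
    """Split 'CN=Bob.OU=Sales.O=Acme' into ('Bob', 'OU=Sales.O=Acme')."""
    leaf, _, container = dn.partition(".")
    return leaf.split("=", 1)[-1], container


def match_moves(obits):
    """Pair each Inhibit Move (0003) obituary with a Moved (0002) obituary
    that has the same CN but sits in a different container.

    Returns a list of (inhibit_move_dn, moved_dn_or_None) tuples; None means
    no matching Moved obituary was found in the collected data.
    """
    moved = [o for o in obits if o["type"] == "0002"]
    pairs = []
    for inh in (o for o in obits if o["type"] == "0003"):
        cn, container = split_dn(inh["dn"])
        source = None
        for m in moved:
            m_cn, m_container = split_dn(m["dn"])
            if m_cn == cn and m_container != container:
                source = m["dn"]
                break
        pairs.append((inh["dn"], source))
    return pairs


if __name__ == "__main__":
    # Hypothetical obituaries gathered from several DSREPAIR.LOG files.
    collected = [
        {"dn": "CN=Bob.OU=Sales.O=Acme", "type": "0003"},
        {"dn": "CN=Bob.OU=Eng.O=Acme", "type": "0002"},
        {"dn": "CN=Amy.OU=Eng.O=Acme", "type": "0003"},
    ]
    for inhibit_dn, moved_dn in match_moves(collected):
        print(inhibit_dn, "<-", moved_dn)
```

An Inhibit Move entry that comes back paired with None has no Moved obituary in the data you collected, which may simply mean that a server's log is still missing from the set.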

When you have the server names, you need to determine whether the servers are up. Are they communicating properly (that is, no -625 errors)? Do the servers still exist? If the servers are simply down (say, for maintenance), you need to get them back up and running and communicating as soon as possible. If they no longer exist, you need to delete the NCP Server objects from DS by using the Partition view in ConsoleOne, and any exrefs they were holding should clean up after the server objects are deleted. If the servers are up and communicating, you can try the following on each of those servers:

Load DSRepair with the -XK3 switch (which kills all exrefs in the local DIB). Select Advanced Options, Repair Local DS Database. Set Check Local References to Yes; set the other options to No.
Perform the repair, save the repaired database, and exit DSRepair.
At the server console, enable the Backlink trace message and force the Backlink process to run:
```
SET DSTRACE = +BLINK    (+BLNK on Linux/Unix)
SET DSTRACE = *B
SET DSTRACE = *H
```
Toggle to the DSTrace screen and watch for the line BACKLINK: Finished checking backlinks successfully. If the screen scrolls too fast for you to catch the message, enter the following commands:
```
SET TTF = ON
SET DSTRACE = *R
SET DSTRACE = +BLINK    (+BLNK on Linux/Unix)
SET DSTRACE = *B
SET DSTRACE = *H
SET TTF = OFF
```
Then use EDIT.NLM, VIEW.NLM, or a text editor to examine the resulting SYS:SYSTEM\DSTRACE.DBG file. (Use /var/nds/ndstrace.log on Linux/Unix platforms.)
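
If the trace file is large, a short script can look for the completion message instead of a manual search. The following Python sketch assumes the trace file has been copied to a workstation; the function name `backlink_finished` and the sample trace lines are hypothetical, and only the completion message itself is taken from the text above.

```python
MARKER = "BACKLINK: Finished checking backlinks successfully"


def backlink_finished(trace_text):
    """Return True if any line of the trace contains the completion message."""
    return any(MARKER in line for line in trace_text.splitlines())


if __name__ == "__main__":
    # Hypothetical trace excerpt; a real file would be read with open(...).read().
    sample = (
        "some earlier trace output\n"
        "BACKLINK: Finished checking backlinks successfully\n"
    )
    print(backlink_finished(sample))
```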

The Backlink obituary should now have purged, which in turn enables the Moved and then the Inhibit Move obituaries to process. You can check the flags of the obituaries by using either DSBrowse or DSRepair, as described previously.

TIP

You should always allow NDS/eDirectory some time for its various background processes to do their jobs. You should wait at least 15 to 45 minutes. It may take a while for an Inhibit Move obituary to purge, depending on how many replicas and objects are in the partition.

If this procedure doesn't work, there is a trick you can try before having to call Novell. Recall from Chapter 6 that the Master replica initiates most of the obituary processing; the exception to this involves the processing of a Used By obit that is started by the replica that actually modified the object. The trick is to simply move the Master replica around in the replica ring. This forces each replica to process any obituaries in its local database and synchronize the changes to the other servers in the replica ring.

TIP

You may actually consider using the move-Master-around procedure before using the -XK3 option because the move trick is a lot less intrusive.

If your replica ring consists of mixed NDS/eDirectory versions, putting the Master replica on eDirectory helps to reduce the occurrence of Inhibit Move obits. If the replica ring has one or more NetWare 6 servers, putting the Master replica on one of them is also desirable.

On the other hand, if the server referenced by the obits no longer exists in the tree and the server object has been deleted from NDS, you need to call Novell Support for assistance.

NOTE

In some cases, there may be no corresponding Moved obituary because the DS obituary process was (somehow) abnormally interrupted. In such situations, you must contact Novell Support for assistance in cleaning up the orphaned Inhibit Move obituaries.

TIP

To prevent orphaned obituaries from occurring, you should run an exref check and confirm that no obituaries are still being processed (that is, at a state other than Flags=0004 [Purgeable]) before bringing down a server permanently or performing any other operation that may prevent communication to a server or its replica.
