Background Processes


The DS module maintains the database through several background processes running on each server. These processes run automatically and generally do not need to be manually invoked. There are cases in which forcing a process to run is beneficial, but as a general rule, you should not do so unless necessary. As discussed in Chapter 4, "Don't Panic," doing something just for the sake of doing it is frequently not a good idea.

The Synchronization Process

The Synchronization process, sometimes referred to as the Skulker process, keeps the information in multiple replicas of the DS database current on all servers. The process is event-driven, meaning it is kicked off after an object has been modified. Listing 6.2 shows a sample of the Sync process in the DSTrace screen.

NOTE

The exact format of DSTrace output varies, depending on the version of NDS/eDirectory (thus, the version of the utilities), flag settings, and sometimes the operating system platform. Therefore, the DSTrace, DSRepair, and other listings shown in this and other chapters in this book may not exactly match what you find on your systems, but they serve as a guide to the correct information.


Listing 6.2. A Sample Synchronization Process
 SYNC: Start sync of partition <[Root]> state:[0] type:[0]
 SYNC: Start outbound sync with (#=2, state=0, type=1)
       [010000C3]<RIGEL.West.XYZCorp>
 (21:11:57) SYNC: failed to communicate with server <CN=RIGEL> ERROR: -625
 SYNC: SkulkPartition for <[Root]> succeeded
 SYNC: End sync of partition <[Root]> All processed = NO.

Listing 6.2 demonstrates a failed synchronization condition. The local server is attempting to contact the server named CN=Rigel.OU=West.O=XYZCorp but is unable to complete the Synchronization process. The error -625 indicates a transport failure, also known as a communications failure. The easiest way to correct this problem is to verify that the target server is up and that the communications links between the two servers are working properly.

A successful synchronization cycle of the [Root] partition between the two servers is shown in Listing 6.3.

Listing 6.3. A Successful Synchronization
 SYNC: Start sync of partition <[Root]> state:[0] type:[0]
 SYNC: Start outbound sync with (#=2, state=0, type=1)
       [010000C3]<RIGEL.West.XYZCorp>
 SYNC: Using version 5 on server <CN=RIGEL>
  SENDING TO ------> CN=RIGEL
  SYNC: sending updates to server <CN=RIGEL>
    SYNC:[010000B7][(20:02:16),1,3] XYZCorp (Organization)
    SYNC:[010000B8][(22:20:00),2,1] ORION.East.XYZCorp (NCP Server)
    SYNC:[0100029A][(20:02:50),2,1] Jim.East.XYZCorp (User)
    SYNC:[0100029B][(19:50:43),2,1] Amy.East.XYZCorp (User)
    SYNC:[010002A4][(19:49:49),2,1] Kenny.East.XYZCorp (User)
    SYNC:[010002A8][(19:58:46),2,1] WINNT.Scripts.East.XYZCorp (Profile)
    SYNC:[100002E1][(02:36:26),1,1] WIN98.Scripts.East.XYZCorp (Profile)
   SYNC: Objects: 7, total changes: 25, sent to server <CN=RIGEL>
  SYNC: update to server <CN=RIGEL> successfully completed
  Merged transitive vector for [010000C3]<RIGEL.West.XYZCorp> succeeded
 SYNC: SkulkPartition for <[Root]> succeeded
 SYNC: End sync of partition <[Root]> All processed = YES.

This time the servers are talking to each other, and there are a few updates that need to be sent from one server to the other.

NOTE

Unlike many other DS implementations, NDS/eDirectory sends only the changed attribute values (the deltas) of a given object, even if they are part of a multivalued attribute.


The frequency at which the Sync process runs depends on the object attribute being changed. Each attribute has a synchronization flag that determines whether it is "high convergence." This flag has one of two possible values:

  • Sync Immediate (DS_SYNC_IMMEDIATE_ATTR) - With this flag, the attribute value is scheduled for immediate synchronization (with a 10-second holding time after the first event is detected, so that subsequent events within this time window can be processed at the same time). This flag is required on some attributes, such as the Password Required attribute of a User object, to maintain proper data integrity or security.

  • Sync Never (DS_SCHEDULE_SYNC_NEVER) - The name of this flag is a little misleading. This flag indicates that a change to the attribute's value does not trigger synchronization immediately. The attribute can wait to propagate the change until the next regularly scheduled synchronization cycle (30 minutes for NetWare 4 servers and 60 minutes for NetWare 5 servers and higher, including eDirectory servers) or until some other event triggers synchronization.

NOTE

If the Sync Immediate flag is not specified for an attribute, DS automatically assumes the attribute to be Sync Never.


A Per Replica (DS_PER_REPLICA) flag also exists and can be defined for attributes. When an attribute is defined as Per Replica, the information in the attribute is not synchronized with other servers in the replica ring. Most of the DirXML-related attributes are defined with this flag.
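To make the effect of these flags concrete, here is a minimal Python sketch of a change scheduler, assuming the default intervals quoted above; the flag names come from the text, but the scheduler itself is purely illustrative.

IMMEDIATE_HOLD_SECONDS = 10            # holding time after the first change event
HEARTBEAT_MINUTES = {"NetWare 4": 30, "NetWare 5+/eDirectory": 60}

def schedule_sync(flag, platform="NetWare 5+/eDirectory"):
    """Return how long (in seconds) a change may wait before being synchronized.

    Simplified model only; DS_PER_REPLICA values are never sent to other replicas."""
    if flag == "DS_PER_REPLICA":
        return None                               # stays local, never synchronized
    if flag == "DS_SYNC_IMMEDIATE_ATTR":
        return IMMEDIATE_HOLD_SECONDS             # batched for 10 seconds, then sent
    # DS_SCHEDULE_SYNC_NEVER: wait for the next heartbeat (or another trigger)
    return HEARTBEAT_MINUTES[platform] * 60

print(schedule_sync("DS_SYNC_IMMEDIATE_ATTR"))    # 10
print(schedule_sync("DS_SCHEDULE_SYNC_NEVER"))    # 3600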

TIP

Appendix C, "eDirectory Classes, Objects, and Attributes," lists all the attributes defined for eDirectory 8.7.3, along with synchronization flag information.


Nontransitive Synchronization in NetWare 4

In NetWare 4.x, any server that holds a replica of an NDS partition has to communicate with all the other servers that hold a replica of that partition. Figure 6.3 shows the type of communication that has to take place in order for synchronization to be completely successful on all NetWare 4.x servers.

Figure 6.3. Nontransitive replica synchronization between four NetWare 4.x servers.



As you can guess, the number of synchronization processes (or vectors, as they are sometimes called) that must complete grows rapidly as replicas are added. The amount of traffic generated can be tremendous. In fact, the number of communications vectors is n x (n - 1), where n represents the number of replicas in the replica ring. Thus, at 27 replicas, a total of 27 x 26, or 702, communications vectors exist.

Transitive Synchronization in NetWare 5 and Higher

In NetWare 5 Novell introduced the idea of transitive synchronization. Transitive synchronization is a synchronization methodology in which a server does not have to contact every other server in the replica list; it can rely on other servers to ensure that synchronization is complete, as demonstrated in Figure 6.4.

Figure 6.4. Transitive replica synchronization between four NetWare 5 and higher servers.



The reduction in traffic in a transitive synchronization environment is very significant, and the time needed to complete the entire synchronization cycle is reduced. Ideally, this creates a scenario in which the vector count simply equals n - 1, so with 27 replicas, only 26 communications vectors would be needed. Table 6.3 shows the difference in vectors between transitive and nontransitive synchronization.

Table 6.3. The Number of Communications Vectors with Transitive and Nontransitive Synchronization

 NUMBER OF SERVERS     NUMBER OF NONTRANSITIVE     NUMBER OF TRANSITIVE
 IN REPLICA RING       VECTORS                     VECTORS
 2                     2                           1
 3                     6                           2
 4                     12                          3
 5                     20                          4
 6                     30                          5
 7                     42                          6
 8                     56                          7
 9                     72                          8
 10                    90                          9

Table 6.3 represents the ideal number of synchronization vectors when using transitive synchronization. As you can see, the number of communications vectors with transitive synchronization is significantly smaller than the number with nontransitive synchronization. Depending on the network design and the availability of services, the actual number of transitive synchronization vectors can be larger than the ideal n - 1, but it will always be smaller than the number required without transitive synchronization.
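As a quick check of the arithmetic behind Table 6.3, the following Python sketch computes both vector counts for a given replica ring size, using the n x (n - 1) and n - 1 formulas from the text.

def nontransitive_vectors(n):
    """Every replica talks to every other replica: n x (n - 1) vectors."""
    return n * (n - 1)

def transitive_vectors(n):
    """Ideal case: the ring is covered with only n - 1 vectors."""
    return n - 1

# Reproduce Table 6.3 and the 27-replica example from the text.
for n in list(range(2, 11)) + [27]:
    print(f"{n:2d} replicas: nontransitive={nontransitive_vectors(n):3d}, "
          f"transitive={transitive_vectors(n):2d}")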

NOTE

In a way, you can consider transitive synchronization a feature of NDS 7 and higher. Therefore, you do not need to have NetWare servers to take advantage of it because the non-NetWare DS servers will be running eDirectory, which supports transitive synchronization.


Transitive synchronization also addresses mixed transport protocols used on different DS servers. Consider the example presented in Figure 6.4. Without transitive synchronization support, the servers Rigel and Orion will not be able to synchronize with the server Vega because they do not share a common transport protocol. With transitive synchronization, however, there is no problem because the server Betelgeuse acts as a gateway or a mediator.

WARNING

One side effect of replica rings with mixed transport protocols is that the servers Rigel and Orion in this example will attempt to talk directly to Vega (and vice versa). They will report "Unable to communicate with server x" errors. However, this does not indicate a problem with your DS; it simply means DS has detected a situation that is not really a problem.


To understand how transitive synchronization works, you must first be familiar with transitive vectors. NDS uses a time vector, also called a time array, to keep track of changes to a given partition. This time vector holds timestamps for all the replicas in the replica ring from a given server's perspective. (For instance, if there are two replicas for this partition, two timestamps will be found in the time vector, as illustrated in Figure 6.5.) Each server holds a copy of its own time vector as well as copies of time vectors from the other servers in the ring. This group of time vectors is collectively known as the transitive vector. The Transitive Vector attribute is multivalued and associated with the partition root object, so NDS/eDirectory can manage the synchronization process and determine what needs to be sent to other replicas. Each replica has its own transitive vector; there is only one transitive vector for each replica, and it is synchronized between all servers within the replica ring.

Figure 6.5. NDS iMonitor showing the time vector values of a transitive vector.

To see the transitive vector values in NDS iMonitor, as shown in Figure 6.5, start from Agent Summary, click the Partition Replica link, click the Partition link, select Attributes, and then click the Transitive Vector link.

NOTE

In NDS 6 and earlier, the single-valued attribute Synchronized Up To is used to determine when the latest changes were made. The value of this attribute is unique for each replica and is not synchronized to the other servers in the replica ring.


Because the transitive vector values themselves are synchronized, all the replicas can be kept current without every replica having to communicate with every other replica. Each time the replica synchronization process begins its scheduled run, it first checks the entries in the transitive vector to determine which other servers hold replicas that need to be synchronized. The check compares the timestamps in the time vectors of the source server that received the update with those of the destination server. If a timestamp is greater for the source server, replica updates are transferred. The source server updates its own time vector within the transitive vector and sends the updated transitive vector to the target server. At the end of the replica update process, the target server updates its own time vector within the transitive vector and sends that updated transitive vector back to the source server. Now the two servers know they are both up-to-date, and the target server will not try to send the same update back to the source server.
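The following Python sketch is a simplified model of the decision logic just described; it is not Novell's implementation. Each server's knowledge is modeled as a time vector (a mapping of replica number to the newest timestamp seen from that replica), and updates flow only where the source's timestamps are newer than the destination's.

def has_newer_data(source_times, dest_times):
    """True if the source holds changes the destination has not yet seen."""
    return any(t > dest_times.get(replica, 0)
               for replica, t in source_times.items())

def skulk(transitive_vector, source, dest):
    src_times = transitive_vector[source]
    dst_times = transitive_vector[dest]
    if not has_newer_data(src_times, dst_times):
        return False                      # nothing to send to this server
    # ... object updates newer than dst_times are transferred here ...
    # After the updates, the destination has seen everything the source has,
    # so its time vector is merged and exchanged with the source.
    for replica, t in src_times.items():
        dst_times[replica] = max(dst_times.get(replica, 0), t)
    return True

# Example: replica 1 has issued timestamp 105 that replica 2 has not seen.
tv = {1: {1: 105, 2: 90}, 2: {1: 100, 2: 90}}
print(skulk(tv, source=1, dest=2))        # True: updates flow from 1 to 2
print(skulk(tv, source=1, dest=2))        # False: replica 2 is now up-to-date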

NOTE

Under the transitive synchronization scenario, the source server does not request the target server's timestamps because they are already present in the transitive vector that is stored on the source server.


Multithreaded Synchronization

One of the most significant performance-enhancement features in eDirectory is the introduction of multithreaded replica synchronization, starting with eDirectory 8.6. In previous versions of eDirectory and NDS, all inbound and outbound synchronization was performed using a single thread. Partitions were synchronized in a serial manner: changes in one partition could not be synchronized until the previous partition had been completely processed. This is not very efficient for trees that have many partitions.

Starting with eDirectory 8.6, outbound synchronization is multithreaded. Partitions stored on one server can be synchronized outbound in parallel, allowing replicas to be kept current much more efficiently.

NOTE

Inbound synchronization is still single threaded. An eDirectory server can receive inbound synchronization for only one partition at a time.


Multithreaded synchronization takes place using one of two synchronization methods (see Figure 6.6):

  • By partition - This method causes eDirectory to send out one partition to multiple recipient servers at a time.

  • By server - This method causes eDirectory to send out multiple partitions to multiple unique servers at one time.

Figure 6.6. Multithreaded synchronization methods.



When eDirectory starts up, it analyzes all partitions and corresponding replica rings stored on the server. This analysis results in a list of all servers involved in replica synchronization and all partitions stored on those servers. If the number of partitions stored on the local server is equal to or greater than the number of unique servers minus one (the local server), eDirectory automatically synchronizes using the by-partition method. Otherwise, eDirectory uses the by-server method. By default, the synchronization method is dynamically adjusted (that is, selected by eDirectory upon startup), but you can also manually select a preferred method via NDS iMonitor (see Figure 6.7).
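A minimal sketch of this selection rule, assuming the partition and server counts have already been gathered (illustrative only):

def pick_sync_method(local_partition_count, unique_server_count):
    """Choose the outbound synchronization method described in the text."""
    # The local server itself is excluded from the server count.
    if local_partition_count >= unique_server_count - 1:
        return "by partition"    # one partition at a time, to many servers
    return "by server"           # many partitions at a time, to unique servers

print(pick_sync_method(local_partition_count=5, unique_server_count=4))  # by partition
print(pick_sync_method(local_partition_count=2, unique_server_count=6))  # by server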

Figure 6.7. Use the Agent Synchronization link under Agent Configuration in NDS iMonitor to view and manage multithreaded synchronization.

The number of threads used for synchronization determines how multithreaded synchronization behaves. For example, if only one thread is configured for synchronization, multithreaded synchronization is effectively disabled. By default, eDirectory allows a maximum of eight threads for multithreaded synchronization.

eDirectory automatically determines the number of threads to use in multithreaded synchronization by comparing the number of partitions with two times the number of unique servers in those partitions' replica rings. If the number of partitions is less than or equal to that value, eDirectory sets its maximum thread usage to the number of partitions stored on the local server; otherwise, the number of threads is set to half the number of unique servers in shared replica rings. This allocatable thread count is used only if it does not exceed the configured maximum thread count; if it does, the number of allocatable threads is set to the configured maximum thread count.
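The same heuristic, expressed as a short Python sketch; this is a paraphrase of the rule above rather than eDirectory's actual code, and the default maximum of eight threads comes from the text.

def allocatable_sync_threads(partitions, unique_servers, configured_max=8):
    """Compute the thread count for multithreaded outbound synchronization."""
    if partitions <= 2 * unique_servers:
        threads = partitions               # one thread per local partition
    else:
        threads = unique_servers // 2      # half the unique servers in shared rings
    return min(threads, configured_max)    # never exceed the configured maximum

print(allocatable_sync_threads(partitions=3, unique_servers=5))     # 3
print(allocatable_sync_threads(partitions=20, unique_servers=6))    # 3
print(allocatable_sync_threads(partitions=12, unique_servers=10))   # 8 (capped)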

Incremental Replication

Perhaps the most problematic issue in database synchronization is protecting against data loss and avoiding unnecessary duplication of synchronization work when communication fails. Prior to eDirectory 8.6, any type of communication failure during the replica synchronization process would cause the entire process to be restarted when communication was reestablished. With large partitions containing millions of objects, this could prove to be a very costly restart, especially if slow WAN links are involved.

eDirectory 8.6 addressed this problem by implementing incremental replication. Incremental replication allows the replica synchronization process to be interrupted and later resume from the point of failure. To understand how the incremental replication process works, you first need to understand the following related key terms and concepts:

  • Window vector - The window vector, stored in the SyncWindowVector attribute (of type SYN_OCTET_STRING) on the partition root object of the receiving server, is the point in time to which the source replica is attempting to move the destination replica. For example, if the latest modification timestamp in the source replica is 2/14/2004 2:35 p.m. and the destination replica has a timestamp of 2/14/2004 1:10 p.m., the window vector in use for the synchronization process would be 2/14/2004 2:35 p.m. (A small sketch of how the window vector is chosen appears after this list.)

    Generally speaking, the window vector is equal to the source replica's transitive vector, unless the destination replica is more than 30 days behind the source replica. In that situation, the window vector is divided into 30-day intervals.

  • Window pane - A window pane is a discrete unit of work. In the case of replica synchronization, a window pane represents a complete synchronization cycle. This would be the difference between the current transitive vector of the destination server and the transitive vector of the source server. In other words, the window vector represents the final point in the synchronization cycle, and the window pane represents the entire amount of work - the number of objects and attribute values that need to be sent - necessary to meet that window vector.

  • Distributed consistent ordering of objects - To allow incremental replication, the object synchronization process must be able to stop and then pick up again at the point where it was stopped. For fault tolerance and performance, the synchronization process must also be able to be resumed by any other server in the replica ring. This is possible only if all servers in the replica ring are synchronizing objects in the same order as every other server. Because objects can be added to any replica at any time, all servers in the replica ring must use a consistent index of objects, based on some value that is unique for all objects, within a partition. eDirectory uses the object creation timestamp because all creation timestamps are unique.

  • Synchronization point - The synchronization point is a collection of information that can be used to determine how far the synchronization process has progressed. This collection of information consists of the following types of data:

    • An object producer - The object producer is one of several sources or background processes that evaluate objects as candidates for the synchronization process. Examples of these producers are the Partition Send All, Change Cache, and Obituary processes.

    • An ordering of objects - The ordering of objects that have been produced by the object producer is based on the creation timestamps of the objects being produced.

    • A key - The key is the value used to determine the current synchronization location within the ordering of objects. This key is typically the creation timestamp of the objects being synchronized.

The synchronization point is stored as the SyncPanePoint attribute (of type SYN_OCTET_STRING) on the partition root object of the receiving server.
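As a rough illustration of how a window vector might be chosen, the sketch below applies the 30-day rule from the window vector description to single timestamps rather than full vectors; the helper name is hypothetical.

from datetime import datetime, timedelta

THIRTY_DAYS = timedelta(days=30)

def choose_window_vector(source_latest, dest_latest):
    """Pick the point in time the destination replica is moved toward."""
    if source_latest - dest_latest <= THIRTY_DAYS:
        return source_latest                  # sync all the way to the source
    return dest_latest + THIRTY_DAYS          # catch up one 30-day interval at a time

src = datetime(2004, 2, 14, 14, 35)
dst = datetime(2004, 2, 14, 13, 10)
print(choose_window_vector(src, dst))         # 2004-02-14 14:35:00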

REAL WORLD: Object Producers

Object producers are DS internal processes that are responsible for providing (that is, producing) entries based on different criteria. The following are the producers for the synchronization process and a brief description of the purpose of each:

  • ChangeCache - The ChangeCache producer is responsible for synchronizing all entries that exist in the local server's change cache for the current partition. (Entries are added to the change cache when they are modified in any way on the local server.)

  • EntrySendAll - The EntrySendAll producer is used when a Send All for a replica has been performed or a Send All has been performed on an individual entry.

  • Obituary - The Obituary producer is responsible for synchronizing all entries in an obituary state.

  • PartitionBoundary - The PartitionBoundary producer is responsible for sending information about the current partition's boundaries.

  • PartitionIndex - The PartitionIndex producer is used to walk through any partition on the server and is used by background processes such as the Janitor, Backlinker, and other processes.

  • PartitionIndexSync - The PartitionIndexSync producer is used to walk through the partition being synchronized. It also provides keys used in the synchronization process to establish synchronization points.

  • PartitionRoot - The partition root object is always synchronized first during all partition synchronization processes. The PartitionRoot producer is responsible for sending this object at the beginning of every synchronization cycle.


Now that you are familiar with the elements of incremental replication, let's look at the process itself. The following is an overview of the incremental replication portion of the replica synchronization process (a simplified sketch in code follows the list):

  1. The Replica Synchronization process begins. The transitive vector has been checked, and a replica synchronization process has been started.

  2. The replication process checks for the existence of the SyncPanePoint attribute on the target server. If the SyncPanePoint attribute is found, it indicates that the receiving server was in the middle of a replica synchronization process and was interrupted. When a SyncPanePoint attribute is located, the source server reads in the value of the SyncPanePoint attribute and determines the window vector, object producer, and key for the interrupted synchronization. Using that information, the source server resumes the synchronization process that was interrupted.

    If no SyncPanePoint attribute is located, the source server calculates and establishes a new window vector for the synchronization process.

    NOTE

    If the window vector of the interrupted synchronization is newer than the transitive vector of the source server, the source server reestablishes a window vector equal to the source server's local transitive vector.


  3. The replication process sends updates from the source server to the target server. Updates are sent as data packets across the wire. An individual packet can contain one or more object changes that need to be synchronized. To minimize data loss in the event of communication failure, each packet begins with a new SyncPanePoint attribute. The SyncPanePoint data contains the key, which indicates the present position in the synchronization process. This key provides a pointer for the last packet sent from the source server.

  4. The receiving server updates its SyncPanePoint attribute for each data packet received. In the event that communication is interrupted, all data received before the last SyncPanePoint attribute will be preserved. At most, two data packets' worth of information would be lost.

  5. The receiving server removes the SyncPanePoint attribute at the end of a successful sync. When the replica update process is completed, the SyncPanePoint attribute is removed from the receiving server's partition root object. This allows subsequent synchronization cycles to establish new window vectors.
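The following sketch strings the five steps together in Python. It is a conceptual model only; the function and field names (produce_packets, partition_root, and so on) are invented for illustration, and vectors are reduced to single comparable values.

def incremental_sync(source_transitive_vector, partition_root, produce_packets):
    """Conceptual model of incremental replication; not eDirectory's actual code."""
    pane_point = partition_root.get("SyncPanePoint")
    if pane_point is not None:
        # Step 2: an interrupted cycle was found; resume from its key, but never
        # aim past the source server's own transitive vector.
        start_key = pane_point["key"]
        window_vector = min(pane_point["window_vector"], source_transitive_vector)
    else:
        start_key = None
        window_vector = source_transitive_vector

    # Steps 3 and 4: each packet carries a new synchronization point, so at most
    # a couple of packets' worth of work is lost if communication fails here.
    for packet in produce_packets(start_key, window_vector):
        partition_root["SyncPanePoint"] = {"key": packet["last_key"],
                                           "window_vector": window_vector}
        # ... apply the object changes in the packet to the local replica ...

    # Step 5: a successful cycle removes the synchronization point.
    partition_root.pop("SyncPanePoint", None)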

As mentioned previously, incremental replication is available only in eDirectory 8.6 and higher. Safeguards are in place to prevent data loss when DS servers running versions older than eDirectory 8.6, which cannot process SyncPanePoint attributes, attempt to synchronize replicas. When one of these servers with the older DS attempts to synchronize with an eDirectory 8.6 or higher server, it encounters error -698 (ERR_REPLICA_IN_SKULK), indicating that the target server is currently in the middle of a replica synchronization process. The purpose of the -698 error is to allow time for another eDirectory 8.6 or higher server to synchronize with the server reporting the -698 error. When another eDirectory server that is capable of incremental replication encounters the SyncPanePoint attribute, the synchronization process picks up at the point of failure (as indicated by the window vector), and no data is lost.

TIP

An occasional -698 error is an example of an error that does not indicate a real problem. If it occurs frequently, however, it can point to a communication issue lurking in the background.


To ensure that an eDirectory server capable of incremental replication is not a requirement for future synchronization (because of a SyncPanePoint attribute left behind by an aborted sync), the SyncPanePoint attribute is automatically purged after a two-hour timeout. When the timeout period has passed, the SyncPanePoint attribute is purged, and any data received during the incomplete synchronization cycle is lost. At that point, any DS server can begin a new replica synchronization cycle with this server because there is no longer a SyncPanePoint attribute present to cause a -698 error.

NOTE

Although multithreading and incremental replication make the eDirectory synchronization process much more efficient, they also make LAN trace analysis and reading of DSTrace results more challenging.


Auxiliary Class Object Handling

NDS versions prior to NDS 8 do not understand or know how to handle auxiliary classes. Consequently, NDS 8 and higher servers send auxiliary class and auxiliary attribute information only to servers running NDS 8 and above. When synchronizing to servers running previous versions, eDirectory must send the auxiliary class information in a manner that is compatible with those releases. Because an auxiliary class adds attributes to an object that previous versions of NDS consider illegal, NDS 8 and eDirectory servers make the following modifications to objects with auxiliary classes before they are sent to servers with previous versions of NDS:

  • The AuxClass Object Class Backup attribute (of type SYNC_CLASS_NAME) is added to the object, and all the information from the object's Object Class attribute is stored in the attribute. This attribute is stored only on the pre-NDS 8 servers.

  • The object's class is changed to Unknown.

  • The auxClassCompatibility attribute (of type SYNC_CLASS_NAME) is added to the object on all replicas and is used to maintain timestamps for the object.

Table 6.4 demonstrates how eDirectory modifies an object's Object Class, AuxClass Object Class Backup, and auxClassCompatibility attributes as it synchronizes to an NDS 7 or older server when an auxiliary class is present for the object.

Table 6.4. Auxiliary Class and Attribute Information, as Seen on Servers Running Different DS Versions

 EDIRECTORY SERVER
   Object Class attribute value: User, Organizational Person, Person, ndsLoginProperties, Top
   auxClassCompatibility attribute value: Unknown, Top

 NDS 7.X OR OLDER SERVER
   Object Class attribute value: Unknown, Top
   AuxClass Object Class Backup attribute value: User, Organizational Person, Person, ndsLoginProperties, Top
   auxClassCompatibility attribute value: Unknown, Top

 NDS 8 SERVER
   Object Class attribute value: User, Organizational Person, Person, ndsLoginProperties, Top
   auxClassCompatibility attribute value: Unknown, Top

When an NDS 8/eDirectory server receives an Unknown object, it checks whether the object has an auxClassCompatibility attribute. If there is such an attribute, NDS 8/eDirectory replaces the Unknown class with information from the AuxClass Object Class Backup attribute and restores the object to normal. The auxClassCompatibility attribute is maintained on all servers in the replica ring as long as at least one NDS 7.x or older server is in the ring. When all NDS 7.x and older servers are removed from the replica ring, the attribute is removed from the object. This information is often referred to as the "Aux Class Lie."
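Here is a rough sketch of the "Aux Class Lie" in both directions. It is purely illustrative; the object is modeled as a Python dict, and the attribute names are taken from the text.

def downgrade_for_pre_nds8(obj):
    """Prepare an object that uses auxiliary classes for a pre-NDS 8 server."""
    sent = dict(obj)
    sent["AuxClass Object Class Backup"] = list(obj["Object Class"])
    sent["Object Class"] = ["Unknown", "Top"]
    sent["auxClassCompatibility"] = ["Unknown", "Top"]
    return sent

def restore_on_nds8(obj):
    """Undo the downgrade when an NDS 8/eDirectory server receives the object."""
    if "auxClassCompatibility" in obj and "AuxClass Object Class Backup" in obj:
        obj["Object Class"] = list(obj["AuxClass Object Class Backup"])
        del obj["AuxClass Object Class Backup"]
    return obj

user = {"Object Class": ["User", "Organizational Person", "Person",
                         "ndsLoginProperties", "Top"]}
on_old_server = downgrade_for_pre_nds8(user)
print(on_old_server["Object Class"])                    # ['Unknown', 'Top']
print(restore_on_nds8(on_old_server)["Object Class"])   # original class list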

NOTE

Because many existing applications that read NDS/eDirectory class definitions do not necessarily understand auxiliary classes, Novell modified the read class definition APIs to provide backward compatibility. All the new routines do is intercept the client responses and substitute the class information located in the Object Class attribute with the information located in the AuxClass Object Class Backup attribute. As a result, if you look at the object in DSBrowse or NDS iMonitor, the object will still show up with an Unknown class, but NetWare Administrator and ConsoleOne will show it as a known object. You should be able to administer such objects with NetWare Administrator or ConsoleOne as if they were normal objects. Only applications that have been updated to be compatible with NDS 8 and higher can display auxiliary class definitions with an auxiliary object class flag.


NOTE

You need DS.NLM 6.19/7.62 and higher to take advantage of the updated APIs.


The Schema Synchronization Process

You can modify the NDS schema by adding or deleting attribute definitions and object class definitions. Such changes need to be replicated among all the servers within the same tree that contain replicas. This synchronization is done through the Schema Synchronization process. This process is started within 10 seconds following completion of the schema modification operations; the 10-second delay enables several modifications to be synchronized at the same time.

NOTE

Although the Schema Sync process targets only servers hosting replicas, servers without replicas still receive schema information through the Janitor process (which is discussed later in this chapter).


NOTE

Keep in mind that base schema definitions cannot be modified. When a new attribute is added to a base class object definition, it cannot be removed.


The updates to the schema are propagated from one server to another; this is similar to the Replica Synchronization process. However, the Schema Synchronization process does not use a replica ring to determine which servers to send the schema updates to. Schema updates are sent to servers that contain either replicas of a given partition or Child partitions of the given partition.

Because schema modifications must occur on the server that is hosting the Master replica of [Root], the modifications flow from the [Root] partition down to the extreme branches of the tree.

The actual Schema Synchronization process is made up of several different processes:

  • Schema process - This process, which runs every four hours by default, is the main process. It schedules the execution of the following subprocesses (in the order listed). (DSTrace displays the message "Begin schema sync" at the start of the sync and either an "All Processed = Yes" or an "All Processed = No" message at the end. If processing is successful, the next Schema process is scheduled to run again after HeartBeatSchemaInterval, which is four hours by default; otherwise, the next Schema process is scheduled to run after SchemaUpdateInterval [60 seconds] plus 1 second.)

  • Skulk Schema process - This process determines which servers the local server needs to synchronize to (by maintaining a server-centric schema sync list in server memory) and in what order to synchronize to them. It also ensures that the local server is in a state to successfully synchronize the schema. If the process detects that a schema epoch is in progress, DSTrace reports a -654 error ("partition busy"). A -657 error ("schema sync in progress") will be reported if a schema reset is detected.

  • Send Schema Updates process - This process is the workhorse in the Schema Synchronization process. It is responsible for sending the schema changes - all deleted classes and deleted attributes - as well as the present attributes and present classes. eDirectory makes several passes through this process to ensure that all these changes are synchronized correctly. (During this phase, DSTrace reports "Sending <present or deleted> <attributes or classes>".)

  • Schema Purger process - This process is responsible for cleaning up any entry or value records that are no longer needed. (During the cleanup, DSTrace reports "Purger purged <class or attribute>; entries purged <number of values>.")

  • DSA Start Update Schema process - This is the process that the receiving server goes through while another server is sending schema to it. When a server receives a request to send schema, it goes through the next two processes. (DSTrace reports "* Start inbound sync from server <senderID> version <protocol number>, epoch <epoch in seconds>:<epoch replica number>.")

    TIP

Although the Schema Synchronization process never sends schema to itself, a check is made to ensure that the sender is never the receiver. In the unlikely event that the sender is the receiver, DSTrace displays "Warning - Rejecting DSAStartUpdateSchema Client <serverID>" and reports the error -699 ("fatal").


  • DSA Update Schema process - This process details what the receiving server does with each update it receives. This process is looped through over and over, as long as the sending server continues to send updates. (During this phase, DSTrace reports "DSAUpdateSchema: Processing inbound packet one at a time because of " or "DSAUpdateSchema: Packet with <number of updates> updates," depending on the information found inside the data packets.)

  • DSA End Update Schema process - This process signals the end of the update. The receiving server goes through the DSA End Update Schema process when it receives a DSAEndUpdateSchema request from the sending server. (Upon completion of the cycle, DSTrace reports "* End inbound sync from server <serverID>, Version <version>, Epoch <epoch in seconds>:<epoch replica number>.")

The detailed operation of the Schema Synchronization process is rather involved. The following simple example serves to illustrate how the Schema Synchronization process works from a high-level point of view. Figure 6.8 depicts a small tree with five servers and three partitions.

Figure 6.8. A Schema Synchronization process example.

A schema change is made on Betelgeuse because it holds the Master replica of the [Root] partition. After this server has been updated, it sends the schema changes out to the other servers that hold copies of [Root]: Rigel and Andromeda. After all servers in the [Root] partition have received the updates, DS sends the updates to the other servers in the tree. It does this by looking at the servers that hold copies of [Root] and reading the replica list information to find out what other replicas are out there; then it builds a schema sync list. Each server's schema sync list may be different, depending on what replicas it hosts.

You can look up a server's schema sync list by using either DSTrace or NDS iMonitor. It is easiest to use NDS iMonitor, as shown in Figure 6.9; the list is found under Service List under the Schema link.

Figure 6.9. A server's schema sync list.

To use DSTrace, you first enable the DSTrace filter with the +SCHEMA flag on NetWare or the +SCMA flag on Unix, and then you use the set dstrace=*ssl DSTrace option. The output looks similar to this:

 SchemaSyncList: --->>> [000080a3] <.DREAMLAN-W2KB-NDS.eDir_Book.W2K_EDIR_873.>
        Flags: 0001  Lists: 0005  Expiration: 2004/01/12 6:11:21
        List(s): [0005] Replica   Service
 Inbound schema synchronization lock status: Released
 resetSuccessfulSync = 0 in GetServersInSchemaSyncList

On Windows, you need to enable DSTrace's Schema Details from the DSTrace Edit menu and then trigger the Schema Sync process from the Trace tab of the DSA window.

By looking at the replica list on Rigel, for example, DS can determine that there are two child partitions, OU=West.O=XYZCorp and OU=East.O=XYZCorp. The replica list on Rigel also indicates what other servers are in the tree. DS determines that the servers Vega and Orion also need to be updated. Note that Vega and Rigel are listed twice because of the replication scheme in this tree; even though Rigel receives an update in the first round of schema synchronization, after Vega receives the updates to the schema, Rigel is again checked to see whether its schema is current. If the schema is not current, it is updated.
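A conceptual sketch of how a schema sync list might be derived from replica information is shown below. The tree layout matches Figure 6.8, but the rule is simplified (every server sharing a replica ring with the local server), and the data structures are invented for illustration.

# Partition -> servers holding a replica of it (tree from Figure 6.8, assumed).
replica_rings = {
    "[Root]":  ["Betelgeuse", "Rigel", "Andromeda"],
    "OU=West": ["Rigel", "Vega"],
    "OU=East": ["Rigel", "Orion"],
}

def schema_sync_targets(local_server):
    """Servers the local server pushes schema to: everyone sharing a ring with it."""
    targets = set()
    for ring in replica_rings.values():
        if local_server in ring:
            targets.update(s for s in ring if s != local_server)
    return sorted(targets)

print(schema_sync_targets("Betelgeuse"))  # ['Andromeda', 'Rigel']
print(schema_sync_targets("Rigel"))       # ['Andromeda', 'Betelgeuse', 'Orion', 'Vega']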

TIP

Schema updates are normally not something to be concerned about unless the change is being made because of an update in the DS module. In cases where Novell has introduced a schema change in a new version of the DS module, you should first update the module on the server that holds the Master replica of [Root], because that is where schema modification takes place, and then update the rest of your servers after the schema update has completed.


As discussed earlier in this chapter, schema changes are synchronized from the root of the DS tree down to its branches. Because a tree can have NDS 8 servers near the root, with NetWare 6 or 4.2 servers in the middle, and an eDirectory 8.7 server below them, eDirectory must be able to send schema information about auxiliary classes in a manner that is compatible with legacy versions of NDS. It must do so with sufficient clues that an eDirectory server can re-create an auxiliary class from the information. To accomplish this, when synchronizing schema with a server running NDS 7 or older, eDirectory makes the following changes to the three auxiliary class characteristics to make them compatible with previous versions of NDS:

  • Auxiliary class flag - NDS 8 introduced this object class flag to indicate which classes are auxiliary classes. Because pre-NDS 8 versions do not recognize this flag, eDirectory servers send auxiliary class definitions as standard class definitions with one additional attribute, the Auxiliary Class Flag attribute, which contains the auxiliary class flag information. When an eDirectory server receives a class definition with this attribute, it removes the attribute from the class definition and re-creates an auxiliary class from the class definition.

  • Superclasses - Prior to NDS 8, NDS required every class to have a superclass. To make auxiliary classes compatible with these rules, NDS 8 and higher servers send Top as the superclass of any auxiliary class that has declared no superclass. When an eDirectory server receives a class definition with the Auxiliary Class Flag attribute and with Top as its superclass, the server removes Top as its superclass.

  • Object Class attribute - In versions of NDS prior to NDS 8, the Object Class attribute is a Read-Only attribute. When NDS 8 or higher servers send the definition of this attribute to servers with previous versions of NDS, the source servers include the Read-Only constraint. When eDirectory servers receive the definition for this attribute from a server with previous versions of NDS, the Read-Only constraint is removed from the definition.

The Janitor Process

The NDS Janitor process is responsible for a number of different tasks, including the following:

  • Scheduling the Flat Cleaner process.

  • Issuing console messages when synthetic time is issued (on NetWare servers only).

  • Optimizing the local DS database.

  • Checking whether the partition root object has been renamed.

  • Updating and verifying the Inherited ACL attributes of partition root objects.

  • Updating the Status attribute in the DS database for the local server.

  • Ensuring that the local server is registered with another server to receive schema updates if there is no local replica.

  • Validating the partition nearest [Root] on the server and the replica depth of that partition.

The Janitor process has responsibility for some fairly critical tasks. By default, the Janitor process runs every two minutes, although it doesn't perform every task in its list each time it runs. (For example, it schedules the Flat Cleaner process only once every 60 minutes.)

DS uses synthetic time to manage situations where the current timestamp on an object is later than the current time. The Janitor process checks the timestamps on the objects held by the server whenever a new timestamp is needed for an object. If an object in the server's replicas has a timestamp greater than the current server time, the Janitor process notifies the operating system, and a message is generated on NetWare's system console:

 1-02-2004   6:33:58 pm:    DS-8.99-12 Synthetic Time is being issued on partition "NW7TEST." 

Timestamps and Synthetic Time

Chapter 2, "eDirectory Basics," discusses the importance of time synchronization in regard to event timestamping. The timestamp itself is not discussed in detail in Chapter 2. A timestamp consists of three fields: the time and date when the timestamp was issued (more specifically , the number of seconds since midnight January 1, 1970), the replica number, and an event counter. The event counter is incremented every time a timestamp is issued until one second has advanced or 65,535 (64KB minus 1) events have been issued. The following sample timestamp indicates that the server holding Replica 2 issued this timestamp on October 10, 2004, at 04:23:18, and it was for the 34th event within that second:

 10/10/2004 04:23:18  2;34 
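A small Python sketch of the timestamp structure just described (illustrative only; this is not the actual on-the-wire format):

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Timestamp:
    seconds: int        # seconds since midnight January 1, 1970 (UTC)
    replica: int        # replica number that issued the timestamp
    event: int          # event counter, 1..65535 within the same second

    def __str__(self):
        when = datetime.fromtimestamp(self.seconds, tz=timezone.utc)
        return f"{when:%m/%d/%Y %H:%M:%S}  {self.replica};{self.event}"

# The sample from the text: replica 2, 34th event within that second.
ts = Timestamp(seconds=int(datetime(2004, 10, 10, 4, 23, 18,
                                    tzinfo=timezone.utc).timestamp()),
               replica=2, event=34)
print(ts)   # 10/10/2004 04:23:18  2;34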

DS uses two types of timestamps to keep track of changes in the database:

  • Creation timestamp - This timestamp is issued when an object is created. A creation timestamp is used to identify an object; therefore, no two sibling objects (that is, objects in the same context) can have the same creation timestamp.

  • Modification timestamp - This timestamp is issued whenever an attribute is added to, modified, or removed from an object. Every attribute has a modification timestamp that denotes the date and time the attribute was created or last modified (but not when the attribute was removed).

When a timestamp (either a creation or modification timestamp) is issued, the Next Timestamp field (also known as the Partition Timestamp field) in the partition record representing the partition in which this modified object resides is updated. The value placed in the Next Timestamp field is equivalent to the timestamp just issued, but the event counter is incremented by one. This allows DS to identify the minimum value for the next timestamp to be issued.

When a new timestamp is needed, the server obtains the next timestamp based on the partition timestamp of the partition in which the object is being modified. The server also obtains the current time from the operating system. The server then performs one of the following tasks:

  • If the time obtained from the operating system is higher than the Next Timestamp value (that is, if it is later in time), the server resets the event counter back to 1 and issues a new timestamp, using the time provided by the operating system, its replica number, and the new event counter.

  • If the time obtained from the operating system is equal to the Next Timestamp value, the server uses the value from the Next Timestamp field.

  • If the time obtained from the operating system is less than the Next Timestamp value (that is, if the Next Timestamp value is in the future compared to the operating system's time), the server uses the Next Timestamp value and displays on the operating system console that it is using "synthetic time."

When synthetic time is used, the partition timestamp is frozen, and the only thing that changes is the event count portion of the timestamp. Because every change that occurs requires a unique timestamp, the event counter is incremented from 1 to 65,535 as the server issues timestamps. When the event counter reaches its maximum allowed value, the counter is reset to 1, the next second is used, and the process repeats until the partition timestamp catches up with the current system time.
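The three cases above, plus the synthetic-time rollover, can be summarized in a short sketch. This is a simplified model of the rules in the text, not actual DS code; timestamps are reduced to (seconds, replica, event) tuples.

MAX_EVENTS = 65535

def issue_timestamp(os_time, next_ts, replica):
    """Issue the next timestamp; next_ts is the partition's (seconds, replica, event)."""
    next_seconds, _, next_event = next_ts
    if os_time > next_seconds:
        issued = (os_time, replica, 1)            # clock moved on: fresh second
    elif os_time == next_seconds:
        issued = next_ts                          # reuse the Next Timestamp value
    elif next_event > MAX_EVENTS:
        # Synthetic time with the event counter exhausted: roll to the next
        # frozen second and restart the counter.
        issued = (next_seconds + 1, replica, 1)
    else:
        issued = next_ts                          # synthetic time: frozen second
    seconds, _, event = issued
    # The partition's Next Timestamp becomes the issued value plus one event.
    return issued, (seconds, replica, event + 1)

print(issue_timestamp(os_time=1000, next_ts=(1005, 2, 7), replica=2))
# ((1005, 2, 7), (1005, 2, 8))  <- synthetic time: frozen second, counter advances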

Synthetic time being issued is not always a critical problem. If a server's time is set back by anywhere from a few hours to a few days, it is not necessary to correct the problem. This situation is a case where waiting is a better solution than doing something. Using DSRepair to repair timestamps is a serious step to take in that the fix actually destroys replicas on all servers except the server with the Master replica. When all non-Master replicas are destroyed, the replicas are re-created. See Chapter 12, "eDirectory Management Tools," for information about resolving synthetic time errors.

Janitor Process Optimization

One of the Janitor process optimization steps is the rehashing of the database information to enable the server to perform lookups more quickly.

If the Janitor process detects that the name of the partition root object has changed, it notifies all servers holding external references of this object of the new name.

Updating the Inherited ACL attribute values starts with the first partition in the partition database. After the Janitor process has located the partition, it validates that the parent object is not an external reference and looks at the ACL to determine whether any of the attribute values have been modified. If they have, it validates whether the attribute is inheritable, and if it is, it recalculates the Inherited ACL attribute values. The Janitor process performs this process for all the partitions on the server.

REAL WORLD: Inherited ACL Explained

Inherited ACL is an attribute that is assigned to and found only on a partition root object. It is used to identify effective rights that are inherited by trustees on the partition root object and that were assigned at a container higher up in the tree.

The Inherited ACL attribute provides a way for NDS to determine each object's effective rights, without having to walk past that object's partition boundary (that is, upward in the tree).

The NDS Janitor process calculates the Inherited ACL attribute by starting at the partition root object of a partition (if there are multiple partitions, beginning with the one closest to [Root]) and performing the following tasks:

  1. The Janitor process searches each of the subordinate containers for rights assignments.

  2. The Janitor process replaces the rights assignment found on a subordinate container with the one found on a superior container object whenever it encounters a duplicate privilege assignment for a trustee.

  3. The Janitor process searches for and applies any inherited rights filter (IRF) found on a subordinate container object to all trustee rights assignments inherited by that subordinate container.

After the NDS Janitor process has reached the lower boundary of the partition, the process adds the Inherited ACL values gathered up to that point to the child partition's partition root object; the child partitions could be Master, Read/Write, Read-Only, or SubRef partitions. The process then repeats itself, proceeding through the same steps with each child partition until all partitions on the server have been processed and no additional partitions exist.

The Inherited ACL attributes are synchronized to the other servers in each partition's replica ring. In the process, the servers that hold child partitions, but not the parent partition, are able to calculate the proper rights for objects without having to communicate with other servers.


Updating the Status attribute involves validating that the DS attribute Status of the local server's NCP Server object is set to Up. Because the server that performs the validation is obviously up and running, it always checks for an Up value; if the attribute is set to Down, the Janitor process updates it. Figure 6.10 shows where in NDS iMonitor you can see the result of this operation. To reach this screen, you select Agent Summary, Known Servers; click the server of interest; and select Status.

Figure 6.10. A server status value shown in NDS iMonitor.

When an NCP Server object's Status attribute is set to Down, the synchronization process does not attempt to communicate with that server. Sometimes when a server is brought back online, its Status attribute value of Up might not be noticed by the other servers in the replica ring right away. You can manually force the status to Up by using NDS iMonitor, as shown in Figure 6.11, by clicking the Agent Configuration link and then selecting the Agent Triggers link.

Figure 6.11. Manually forcing server status to Up .

The Janitor process's role in ensuring that the server can receive schema updates if it holds no replicas is particularly important. Even if a server has no local replicas, it still receives information for external references (such as those used to grant rights in the file system). In order to handle this properly, the server needs to know about all the different class definitions in case an extended class object receives rights to the file system. Equally important is the need for the schema partition to be maintained in case a new replica is added to the server later. If the server does not have current information about the schema and a replica is added to the server, many objects will change to Unknown objects in the local database, which can cause problems with object administration if those copies of the objects are read by the various management tools.

Finally, the Janitor process is also responsible for updating the Revision attribute of external references when the attribute value on the referenced object is changed.

The Flat Cleaner Process

The Flat Cleaner process is scheduled by the Janitor process and runs every 60 minutes by default. Responsibilities of the Flat Cleaner process include the following:

  • Purging unused objects and attributes stored in the bindery partition or external reference partition

  • Purging obituaries that have reached the Purgeable state

  • Revalidating the Status and Version attributes of servers in all partitions of which the server has the Master replica

  • Verifying that all objects in the user-defined partitions on the server have valid public keys and Certificate Authority (CA) public keys.

NOTE

Because the Flat Cleaner process performs much of the purging of deleted records, it is also known as the Replica Purger process or simply the Purger process.


As described in Chapter 2, the bindery partition is the partition that is used to store information about the bindery user Supervisor. This partition also stores the SAP information that is received from IPX networks connected to the server. If a service is removed from the network, the SAP table in the server needs to be updated; this is one of the tasks the Flat Cleaner process is responsible for.

Obituaries that have reached the Purgeable stage need to be removed from the database, and the Flat Cleaner takes care of this. Essentially, the Flat Cleaner process removes any object or attribute flagged as Non Present .

REAL WORLD: Deletion of External Referenced Objects

When an entry is deleted, the Janitor and Flat Cleaner processes are the main processes responsible for cleaning up any associated external references. The Janitor process notifies the server holding an external reference of an object deleted from the tree. That server then marks the external referenced object as Non Present . The Flat Cleaner process then deletes the database record for this now non-present object.


The Flat Cleaner process is also responsible for validating the Up state of all servers that hold Master replicas. As discussed earlier in this chapter, the Janitor process is responsible for setting the server Status attribute to Up . The Flat Cleaner process is responsible for setting the Status attribute to Down as necessary if it finds that it cannot communicate with the server.

To understand the process better, let's consider a simple example where there are two servers, Orion and Rigel. Orion holds the Master copy of [Root] , the only partition in the tree. If Rigel is offline when Orion's Flat Cleaner process runs, Orion sets the Status attribute for Rigel to Down . When Rigel is brought back online, it runs its Janitor process, checks the Status attribute, and sees that it is set to Down . Because the server is no longer down, Rigel changes the Status attribute to Up .

The Flat Cleaner process also performs checks to validate all Public Key and CA Public Key attributes for objects the server holds. If it finds an invalid or missing key, it attempts to create new keys for the object. DS uses the Public Key and CA Public Key attribute values during the authentication process; if these keys are not valid on User objects, the user (or an administrator) has to change his or her password to fix the problem. If, however, these keys are corrupted on an NCP Server object, server-to-server authentication is disrupted, and synchronization does not occur.

The Backlink Process

The Backlink process, or Backlinker process, as it is called in some Novell documentation, checks on the validity of external references by verifying whether the original entry still exists and whether the reason for its existence is still valid. If the external reference is no longer needed or used after a given time period, the Backlink process removes it.

NOTE

The BackLink attribute consists of two fields: the DN of each remote server holding the external reference and the object ID of each of these servers (known as the remote ID). The Backlink process periodically verifies the DNs and remote IDs to ensure that they are valid.


The Backlink process also helps clean up external references. If the server holding the external reference no longer requires it, or if the ExRef partition's life span has expired after not being used for a certain amount of time (the default is 192 hours, that is, 8 days), the external reference of the object is deleted when the Backlink process has confirmed each of the following conditions seven times (a simple sketch of this check follows the list):

  • The object represented by the exref has no file system rights assignments to a volume on the server.

  • The object represented by the exref is not listed on the connection table of the server.

  • The object represented by the exref is not required to complete the fully qualified DNs (FQDNs) - that is, FDNs using typeful naming rules - of any subordinate objects.

  • The object represented by the exref is not used as a local reference by an attribute stored in the attribute database.
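A minimal sketch of this check, with hypothetical field names standing in for the real bookkeeping:

EXREF_LIFESPAN_HOURS = 192        # 8 days without use before cleanup is considered
REQUIRED_CONFIRMATIONS = 7

def exref_unused(exref):
    """True if all four conditions in the list above hold for this check."""
    return (not exref["has_file_system_rights"]
            and not exref["in_connection_table"]
            and not exref["needed_for_subordinate_names"]
            and not exref["referenced_by_local_attribute"])

def backlink_check(exref):
    """One Backlink pass: count consecutive clean checks; delete after seven."""
    if exref_unused(exref):
        exref["clean_checks"] = exref.get("clean_checks", 0) + 1
    else:
        exref["clean_checks"] = 0
    return exref["clean_checks"] >= REQUIRED_CONFIRMATIONS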

The Backlink process is also responsible for ensuring that every external reference has a copy of the actual object's GUID attribute. This is necessary for file system rights for NSS volumes in NetWare 6 and higher.

By default, this process runs every 13 hours (that is, 780 minutes). You can modify the default value via either DSTrace or NDS iMonitor (see Figure 6.12).

Figure 6.12. DSA background process settings in NDS iMonitor.

Because the Backlink process works with external references, it is also known as the External Reference Check process.

NOTE

NetWare 5 introduced distributed reference links (DRLs) to replace backlinks. DRLs have the advantage of referencing a partition rather than a specific server. Consequently, the Backlink process has since been updated to work with both backlinks and DRLs.


The Limber Process

The last of the automated background processes is the Limber process. Primary responsibilities of the Limber process include the following:

  • Verifying the network address for the server in all partitions of which the server holds a replica

  • Validating that the relative DN (RDN) for the server is correct on the server that holds the Master replica of the partition in which the server exists

  • Updating and maintaining the Version attribute for the server in the NDS database

  • Locating the entry closest to [Root] by pinging each server in the replica ring list, in order, and selecting the responding server whose entry closest to [Root] is closest to the tree root

  • Starting the Predicate Statistics collection (see Chapter 16, "Tuning eDirectory," for more information about Predicate Statistics)

  • Verifying that the network address for the server is correct in the server's DS object.

TIP

These operations perform verifications on the replica list information and server network addresses. If a replica list is inconsistent, forcing the Limber process to run by using NDS iMonitor (refer to Figure 6.11) on the server that appears to have the problem may correct the problem.


If a server name or address changes, the Limber process is responsible for ensuring that the modifications are made to each replica pointer table in the partition ring. If the changes occur on the server that holds the Master replica, the Limber process changes its local address in the replica pointer table. If the changes occur on a server that holds a non-Master replica, the Limber process tells the Master replica about the changes. The Limber process can initiate the Backlink process, which does part of the checking (on exref objects) for the Limber process.

TIP

After changing the name or network address (such as the IP address) of a server in a replica ring, you should force the Limber process to run to ensure that all other servers in the replica ring detect the change.


CAUTION

Never change the server name and its network address at the same time. If you do so, eDirectory will lose track of which server this is. You should follow these steps:

  1. Change the server name.

  2. Restart the server.

  3. Force the Limber process to run (for example, by using NDS iMonitor).

  4. Verify that the new object name has been synchronized throughout the ring and that the other servers in the replica ring see the new server name.

  5. Change the network address.

  6. Restart the server.

  7. Force the Limber process to run (for example, by using NDS iMonitor).

  8. Verify that the new network address has been synchronized throughout the ring and that the other servers in the replica ring see the new network address.

  9. If other servers in the replica ring also need their names or network addresses changed, repeat steps 1-8 on each server, one at a time.


As mentioned in Chapter 2, some of the information maintained by the Limber process is stored in the local System partition. The following tasks are considered to be secondary functions of the Limber process but are nonetheless important:

  • Verifying that the directory tree name stored in the server's System partition is correct

  • If the server does not hold a writable replica of the partition its own DS object is in, verifying that the external reference for this object is valid and checking that the BackLink attribute is valid on a server that holds a writable copy of the server object

  • Checking to ensure that the server's public/private key credentials are correct.

The Limber process is one of the background processes that cannot have its schedule interval changed. If the Limber process's primary operations complete successfully, the process reschedules itself to run again in three hours. If the primary operations have not completed successfully, the Limber process reschedules itself to run again in five minutes.


