Synchronization Service Overview and Architecture | Microsoft Application Center 2000 Resource Kit 2001

The Application Center Synchronization Service provides the means for deploying content and applications and for synchronizing a cluster, including its content, applications, network configurations, and load balancing configuration. Synchronization is therefore a core technology for maintaining consistent settings and content across Application Center clusters.

The Synchronization Service does not attempt to solve the problems of shrink-wrapped application installation or wide area deployment. However, it does not require any special hardware and implements the following guiding principles:

It works with existing publishing tools, such as Web Distributed Authoring and Versioning (WebDAV) and Microsoft FrontPage 2000.
Synchronization is virtually transparent to the user, and multiple content types are replicated seamlessly across the cluster (for example, files, metabase settings, and system Data Source Names [DSNs]).
At a minimum, synchronization performance is at the same level as the Site Server Content Deployment System (formerly called the Content Replication System) when handling large content updates. Typically it takes about 450 seconds to replicate 250 MB by using the Content Deployment System.
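The Content Deployment System baseline quoted above works out to an average throughput of a little over half a megabyte per second. The arithmetic, as a quick Python check, is shown below. This is a simple average and says nothing about per-file overhead or network conditions.

```python
# Baseline quoted for the Site Server Content Deployment System:
# about 250 MB replicated in about 450 seconds.
CONTENT_MB = 250
ELAPSED_S = 450

def throughput_mb_per_s(megabytes: float, seconds: float) -> float:
    """Average transfer rate in megabytes per second."""
    return megabytes / seconds

rate = throughput_mb_per_s(CONTENT_MB, ELAPSED_S)
# Multiply by 8 to express the same rate in megabits per second.
print(f"{rate:.2f} MB/s (about {rate * 8:.1f} Mbit/s)")  # 0.56 MB/s (about 4.4 Mbit/s)
```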

The Application Center "shared nothing" design means that every cluster member can stand alone as a complete unit for serving content and providing access to applications. It also means that any member can assume the cluster controller's role if required to do so, provided that the member contains a complete set of the controller and cluster configuration settings.

As noted in Chapter 2, "Feature Overview," Application Center replicates:

Web content and Active Server Pages (ASP) applications
COM+ applications
Virtual sites (and their associated ISAPI filters) and content
Global ISAPI filters
File system directories and files
Metabase configuration settings
Exportable Cryptographic Application Programming Interface (CAPI) server certificates
Registry keys
System DSNs
Windows Management Instrumentation (WMI) settings that Application Center uses

You can divide all of the items in the preceding list into two broad categories: configuration settings and content (or applications, depending on the cluster). Before examining these two categories in more detail, let's take a look at the underlying architecture that supports the Synchronization Service.

Synchronization Service Architecture

Application Center, like the Content Deployment System, uses a single master replication model. This means that changes to a cluster member are not pushed out to the rest of the cluster; instead, all changes to members come from the cluster controller. The model also determines what happens when changes are made directly to an individual member. If these changes include data that's contained in the controller's master inventory, they are overwritten the next time synchronization occurs. If, however, these changes are outside the scope of the inventory, the controller isn't aware of them and they're ignored, as illustrated in Figure 6.1.


Figure 6.1 How the single master replication model affects changes to member configuration or content

NOTE
Only files outside the scope of the inventory are ignored. All Internet Information Services (IIS) 5.0 configuration settings on members are overwritten by the controller's IIS configuration settings, and the overwrite extends to all the files that IIS references.
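The single master rule can be sketched as a one-way merge in which the controller's inventory always wins. In this minimal sketch, plain dictionaries (path to value) stand in for a server's content. The function name and data shapes are hypothetical, not Application Center APIs. Scope is simplified to "the keys of the controller's inventory", and the IIS-specific rule in the note above is not modeled.

```python
from __future__ import annotations

def synchronize_member(controller_inventory: dict[str, str],
                       member_state: dict[str, str]) -> dict[str, str]:
    """Return a member's state after a single-master synchronization.

    Items in the controller's inventory overwrite the member's copies.
    Items that exist only on the member are outside the inventory's scope,
    so they are left untouched. Nothing flows back to the controller.
    """
    result = dict(member_state)          # out-of-scope items are kept as-is
    result.update(controller_inventory)  # in-scope items: the controller wins
    return result

controller = {"/site/default.htm": "v2"}
member = {"/site/default.htm": "edited locally", "/tmp/notes.txt": "local only"}
synced = synchronize_member(controller, member)
print(synced)  # {'/site/default.htm': 'v2', '/tmp/notes.txt': 'local only'}
```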

The architecture for the Synchronization Service consists of two parts: the replication engine and replication drivers. Generally speaking, the replication engine manages the synchronization process, while the drivers manage different content types. Figure 6.2 provides a high-level view of the Application Center Synchronization Service architecture.

The Replication Engine

As noted earlier, the replication engine drives content replication and cluster synchronization. It uses asynchronous communication to support its two-phase synchronization process—first transferring the data, and then writing it. Although synchronization is a two-phase process, the absence of commit and rollback capability means that this process is not truly transactional.
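To see why a two-phase process without commit and rollback is not truly transactional, consider the following minimal sketch. Phase one stages every item, and phase two writes the items one at a time. Because nothing is undone when a write fails, the target can end up partly updated. The `write` callback and `flaky_write` helper are hypothetical stand-ins. The sketch records a failed write and continues with the next item, but the source does not specify what the engine itself does after a failure.

```python
from __future__ import annotations

from typing import Callable

Writer = Callable[[dict, str, bytes], None]

def two_phase_apply(updates: dict[str, bytes], target: dict[str, bytes],
                    write: Writer) -> tuple[list[str], list[str]]:
    """Transfer every update first, then write each one to the target.

    There is no commit or rollback: a failed write is only recorded, and
    the writes that succeeded before and after it stay on the target.
    """
    # Phase 1: transfer. All data is received before anything is written.
    staged = {path: bytes(data) for path, data in updates.items()}

    # Phase 2: write, one item at a time.
    written: list[str] = []
    failed: list[str] = []
    for path, data in staged.items():
        try:
            write(target, path, data)
            written.append(path)
        except OSError:
            failed.append(path)  # nothing already written is undone
    return written, failed

def flaky_write(target: dict[str, bytes], path: str, data: bytes) -> None:
    """Hypothetical writer that fails for one path, to simulate an error."""
    if path == "b.htm":
        raise OSError("file in use")
    target[path] = data

target: dict[str, bytes] = {}
written, failed = two_phase_apply(
    {"a.htm": b"A", "b.htm": b"B", "c.htm": b"C"}, target, flaky_write)
print(written, failed)  # ['a.htm', 'c.htm'] ['b.htm']
```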

NOTE
The replication engine encrypts DCOM and RPC traffic by using the same encryption mechanism (RPC Packet Privacy) that Microsoft Windows 2000 Server employs between domain controllers and servers. However, HTTP traffic, which consists only of files, is not encrypted. This means that metabase and registry entries are secure, but sensitive information stored in a file, such as a password, is not.

The engine's primary responsibilities include handling administration requests coming from the user interface, coordinating replications among the various drivers, and performing general synchronization tasks. These tasks encompass:

Cross-server communication: The engine handles communications between the controller and any member that's a synchronization target.
Security: The engine uses the ACC_computername account for cross-server communication.
Batching: During automatic synchronization, changed items are batched at the replication source until either a fixed period (10 seconds) elapses or the buffer size is exceeded. Because each buffer holds 250 items and the maximum number of buffers is 200, you can theoretically batch up to 50,000 changes.
Transport: The engine uses DCOM (or RPC if it's operating in stager mode) to transfer batched updates.
Eventing, logging, and notifications: The engine sends events, as appropriate, in response to specific conditions. The various drivers also send events.

Figure 6.2 The Application Center Synchronization Service architecture
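The batching rule in the list of engine tasks can be sketched as a buffer that is sealed when either 10 seconds have passed since its first item or it holds 250 items, with at most 200 sealed buffers waiting for transport. The class name, the injectable clock, and the choice to raise an error when every buffer is full are assumptions made for illustration. The source does not describe what happens at that limit. In this sketch the time check runs on each `add` or explicit `poll` call rather than on a background timer.

```python
from __future__ import annotations

import time
from typing import Callable, Optional

BATCH_WINDOW_S = 10.0  # seal the current buffer after this many seconds
BUFFER_SIZE = 250      # items per buffer
MAX_BUFFERS = 200      # sealed buffers that can wait for transport

class ChangeBatcher:
    """Hypothetical batcher for changed items at the replication source."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._current: list[str] = []
        self._opened_at: Optional[float] = None
        self.ready: list[list[str]] = []  # sealed buffers awaiting transport

    def add(self, item: str) -> None:
        """Record one changed item, sealing buffers by time or by size."""
        self.poll()
        if len(self.ready) >= MAX_BUFFERS:
            # Assumption: the source does not say what happens at this limit.
            raise OverflowError("all buffers are full")
        if not self._current:
            self._opened_at = self._clock()
        self._current.append(item)
        if len(self._current) >= BUFFER_SIZE:
            self._seal()

    def poll(self) -> None:
        """Seal the current buffer if its time window has elapsed."""
        if self._current and self._clock() - self._opened_at >= BATCH_WINDOW_S:
            self._seal()

    def _seal(self) -> None:
        self.ready.append(self._current)
        self._current = []
        self._opened_at = None

# Theoretical maximum number of batched changes.
print(BUFFER_SIZE * MAX_BUFFERS)  # 50000
```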

The relationship between the replication engine and the various drivers is shown in Figure 6.3, which also shows how data is transferred between the source and target servers.


Figure 6.3 Replication engine and driver architecture

Data Types and Transfer Methods

During the synchronization process, two types of data may be sent—either an XML content description list or binary data. There are three types of information that can be expressed as an XML list:

An IHave content list, which describes all the content present on the source.
An Action list, which is a request for updated content that a target sends to the source.
An Update list, which contains all the content that was updated, or a link to the content if it was transferred through the built-in HTTP engine.

For the sake of future synchronization processing examples, let's refer to the preceding lists as IHave, Action, and Update, respectively.
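A minimal sketch of one IHave/Action exchange follows, using Python's standard `xml.etree.ElementTree`. The source builds an IHave list of paths and signatures, and the target replies with an Action list that names the items it is missing or holds in a different version. The element and attribute names (`IHave`, `Item`, `Action`, `Get`, `path`, `sig`) are hypothetical, because the actual XML schema is not documented here. Building the Update list is not shown.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

def build_ihave(source_items: dict[str, str]) -> bytes:
    """Source side: describe all content present on the source."""
    root = ET.Element("IHave")
    for path, sig in sorted(source_items.items()):
        ET.SubElement(root, "Item", path=path, sig=sig)
    return ET.tostring(root, encoding="utf-8")

def build_action(ihave_xml: bytes, target_items: dict[str, str]) -> bytes:
    """Target side: request every item that is missing or has another signature."""
    root = ET.Element("Action")
    for item in ET.fromstring(ihave_xml).iter("Item"):
        path, sig = item.get("path"), item.get("sig")
        if target_items.get(path) != sig:
            ET.SubElement(root, "Get", path=path)
    return ET.tostring(root, encoding="utf-8")

def requested_paths(action_xml: bytes) -> list[str]:
    """Source side: read the Action list to decide what the Update list carries."""
    return [el.get("path") for el in ET.fromstring(action_xml).iter("Get")]

source = {"/a.htm": "s1", "/b.htm": "s2", "/c.htm": "s3"}
target = {"/a.htm": "s1", "/b.htm": "old"}
action = build_action(build_ihave(source), target)
print(requested_paths(action))  # ['/b.htm', '/c.htm']
```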

As shown in Figure 6.3, the two protocols for transferring data from the source server to a target server are HTTP and DCOM. In staging scenarios, RPC is used as an alternative transport to DCOM because it allows a specific port to be specified, which is required for deployments through a firewall.

NOTE
The default port specification is port 4243 for HTTP file transfers and port 4244 for RPC data transfers.

Replication By Using the DCOM and RPC Protocols

When DCOM replication takes place, the driver passes to the engine a well-formed XML object. This object contains the change description for a single resource, such as a path statement for a file or directory.

In cases where replication takes place through a firewall, which is a typical staging scenario, the DCOM data is converted to RPC format and passed through the specified RPC port on the firewall. The replication engine transports the XML object (UTF-8 encoded) to the destination server. The receiving server ensures that the entire batch is received, reconstitutes the RPC data as DCOM, and then instructs the driver to apply the changes by writing the new data to the target.

Replication By Using the HTTP Protocol

This method sends the data in binary format by using FastFile transmit, an HTTP API, which is capable of moving large amounts of data quickly. Communication between the source and target servers takes place on a single port and uses HTTP. If used through a firewall, this port has to be enabled as well. (Only the file system driver uses this protocol to copy data.)

NOTE
Although synchronization uses HTTP FastFile, configuring bandwidth throttling on the Web server has no effect on synchronization, because the replication engine does not use IIS or the front-end adapter.
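The transport rules described above can be summarized as a small lookup. File data copied by the File System driver travels over HTTP on the default port 4243. XML change descriptions travel over DCOM, and RPC on the default port 4244 replaces DCOM when replication crosses a firewall, as in a staging scenario. The function and type below are a hypothetical summary of those rules, not a product API. The text gives no fixed port for DCOM, so the sketch returns `None` there.

```python
from typing import NamedTuple, Optional

HTTP_FILE_PORT = 4243  # default port for HTTP file transfers
RPC_DATA_PORT = 4244   # default port for RPC data transfers

class Transport(NamedTuple):
    protocol: str
    port: Optional[int]  # None: the text gives no fixed port

def choose_transport(payload: str, through_firewall: bool) -> Transport:
    """Hypothetical summary of which transport carries a given payload.

    payload is "file" for file contents copied by the File System driver,
    or "xml" for an XML change description passed from a driver to the engine.
    """
    if payload == "file":
        # FastFile transmit over HTTP on a single port; if a firewall sits
        # between source and target, this port must be opened as well.
        return Transport("HTTP", HTTP_FILE_PORT)
    if payload == "xml":
        # DCOM inside the cluster; RPC with a fixed port through a firewall.
        if through_firewall:
            return Transport("RPC", RPC_DATA_PORT)
        return Transport("DCOM", None)
    raise ValueError(f"unknown payload type: {payload!r}")

print(choose_transport("xml", through_firewall=True))
```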

The Replication Drivers

Each driver shown in Figure 6.3 is responsible for replicating its own particular type of content. Of course, there are several additional activities that each driver handles. Depending on the driver, these activities can encompass:

Monitoring the driver data store.
Reading the driver-specific data.
Writing data to an XML object or to a driver-specific file transfer API.
Comparing the content list on the source and target.
Locking and/or unlocking a resource for writing.
Handling the transfer and committing it to permanent storage for a given resource.
Handling security at the resource level.
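One way to picture this division of labor is an abstract driver interface that the engine calls for each content type. The class and method names below are hypothetical and simply mirror some of the responsibilities listed above: reading, comparing content lists, and writing. Monitoring, locking details, and resource-level security are left out. The in-memory driver exists only to exercise the interface.

```python
from __future__ import annotations

import hashlib
from abc import ABC, abstractmethod

class ReplicationDriver(ABC):
    """Hypothetical interface for one content type (files, metabase, ...)."""

    @abstractmethod
    def list_content(self) -> dict[str, str]:
        """Read the driver's data store and map each resource to a signature."""

    @abstractmethod
    def read(self, resource: str) -> bytes:
        """Read the driver-specific data for one resource."""

    @abstractmethod
    def write(self, resource: str, data: bytes) -> None:
        """Lock the resource, commit the data to permanent storage, and unlock."""

    def changed_resources(self, target_content: dict[str, str]) -> list[str]:
        """Compare source and target content lists; return what must be sent."""
        return sorted(r for r, sig in self.list_content().items()
                      if target_content.get(r) != sig)

class InMemoryDriver(ReplicationDriver):
    """Toy driver backed by a dict, used only to exercise the interface."""

    def __init__(self, data: dict[str, bytes]) -> None:
        self.data = data

    def list_content(self) -> dict[str, str]:
        return {k: hashlib.md5(v).hexdigest() for k, v in self.data.items()}

    def read(self, resource: str) -> bytes:
        return self.data[resource]

    def write(self, resource: str, data: bytes) -> None:
        self.data[resource] = data  # a real driver would lock and commit here

source = InMemoryDriver({"a": b"1", "b": b"22"})
target = InMemoryDriver({"a": b"1", "b": b"old"})
for resource in source.changed_resources(target.list_content()):
    target.write(resource, source.read(resource))
print(target.data)  # {'a': b'1', 'b': b'22'}
```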

Resource replication order
The replication engine calls each driver in the order in which the resources are listed in the metabase (LM\WebReplication\DriverOrder). This order should not be changed.
The default order for resource replication is:

FS: file system
MB: metabase
IIS: IIS sites and bindings
NET: network and Network Load Balancing (NLB) configuration
REG: registry
DSN: system Data Source Names
CAPI: CryptoAPI
WMI: Windows Management Instrumentation
COM+: COM+ applications

Assuming that a synchronization or deployment needs to replicate all of the preceding resources, each driver is called sequentially, up to and including the NET settings resource group. After the NET settings are applied, the remaining changes (for example, DSNs and WMI settings) are applied concurrently, so their order is not fixed.

NOTE
This ordering guarantees that when an administrator creates a virtual directory that points to an existing path, the files referenced by the virtual directory are applied to each member before the virtual directory settings are written to the IIS metabase.
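A minimal sketch of this ordering follows, using `concurrent.futures` from the Python standard library. Drivers up to and including NET run one after another, and the remaining drivers are submitted to a thread pool together. The `apply` callback is hypothetical, and the driver list is hard-coded here for illustration, whereas the real order comes from LM\WebReplication\DriverOrder.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Default order described in the text.
DRIVER_ORDER = ["FS", "MB", "IIS", "NET", "REG", "DSN", "CAPI", "WMI", "COM+"]
LAST_ORDERED = "NET"

def replicate_all(apply: Callable[[str], None],
                  order: list[str] = DRIVER_ORDER) -> None:
    """Call apply(driver) sequentially through NET, then concurrently for the rest."""
    cut = order.index(LAST_ORDERED) + 1
    for driver in order[:cut]:
        # Sequential: FS runs before MB and IIS, so files exist before the
        # settings that reference them are written.
        apply(driver)
    with ThreadPoolExecutor() as pool:
        # Concurrent: the relative order of these drivers is not fixed.
        futures = [pool.submit(apply, driver) for driver in order[cut:]]
        for future in futures:
            future.result()  # re-raise any error from a driver

calls: list[str] = []
replicate_all(calls.append)
print(calls[:4])  # ['FS', 'MB', 'IIS', 'NET']
```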

Let's examine the Application Center replication drivers in more detail, starting with the File System driver.

File System Driver

The File System driver uses standard Microsoft Win32 APIs for read/write functions and picks up change notifications from ReadDirectoryChangesW, which works with FAT- and NTFS-formatted disks. The driver uses a file's last modified date, size, and attributes to create a signature. The File System driver compares the content tree of the controller with that of each member to identify content that needs to be replicated.

The driver informs the replication engine of changes as they occur, and when a file is synchronized, the entire file is copied to the target. The File System driver monitors the following areas by default:

All virtual directories identified in the IIS metabase.
All custom errors defined by the metabase.
All ISAPI and IIS filters identified in the IIS metabase.
User-specified directories in an application.

The File System driver assumes that the target has the same disk and directory configuration as the controller and does not attempt to translate drive letters or paths. Therefore, you must configure all members so that they have identical directory and file structures. The driver detects UNC paths and disk volumes that use the Exchange file system and does not replicate them.

File System Issues

FAT-based file systems do not support access control lists (ACLs). If you synchronize or deploy from a server that uses a FAT file system to a server that uses NTFS, the files copied to the target inherit the ACLs of the parent directory in which they're written on the target. If you synchronize files from an NTFS source to a FAT target, the synchronization or deployment is aborted and rolled back, and an appropriate error event is generated.

NOTE
By default, the global replication definition is set to replicate ACLs. You can turn off this feature in the cluster_name Properties dialog box, and then enable/disable ACL replication in a deployment in the New Deployment Wizard.

If possible, make sure that your file systems are homogeneous across the cluster. The key issue with mixed file systems is the granularity of the LastModified time, which is 2 seconds on FAT and 1 second on NTFS. Because a FAT target cannot store an odd number of seconds, if NTFS is the source and the LastModified time is an odd number, the file is always replicated, whether or not it has been updated.
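The arithmetic behind this can be sketched as follows. The sketch assumes, as a simplification, that the FAT target stores the source's LastModified time at its 2-second resolution (rounded down here; either rounding direction turns an odd second into an even one) and that any difference triggers a copy. The function names are hypothetical. The real driver folds the time into a signature along with size and attributes.

```python
def fat_timestamp(seconds: int) -> int:
    """Store a time at FAT's 2-second resolution (rounded down in this sketch)."""
    return seconds - seconds % 2

def needs_replication(source_mtime: int, target_mtime: int) -> bool:
    """Naive comparison: any difference in LastModified triggers a copy."""
    return source_mtime != target_mtime

# NTFS source with an odd LastModified time, in whole seconds.
odd_source = 1_000_000_001
print(needs_replication(odd_source, fat_timestamp(odd_source)))    # True, on every pass

# An even time survives the trip unchanged.
even_source = 1_000_000_000
print(needs_replication(even_source, fat_timestamp(even_source)))  # False
```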

The File System driver bypasses most file locking issues by renaming the old version of the file and moving the new version from a temporary location to its final destination. If this operation fails, the driver retries it several times before abandoning that write operation and moving on to the next file in its update list.
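The rename-and-move technique can be sketched with the standard `os` and `tempfile` modules. The new version is first written to a temporary file in the destination directory. On each attempt, the old version is renamed aside and the temporary file is moved into place. The retry count, delay, `.old` suffix, and function name are assumptions, because the source says only that the driver retries "several times" before moving on.

```python
import os
import tempfile
import time

def replace_file(dest: str, data: bytes, retries: int = 3,
                 delay_s: float = 0.1) -> bool:
    """Write data to dest by renaming the old file aside and moving a temp file in.

    Returns False if every attempt fails, so the caller can move on to the
    next file in its update list.
    """
    directory = os.path.dirname(os.path.abspath(dest))
    # Stage the new version in the same directory so the final move is a rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(data)

    old_path = dest + ".old"
    for _ in range(retries):
        try:
            if os.path.exists(dest):
                os.replace(dest, old_path)  # rename the old version aside
            os.replace(tmp_path, dest)      # move the new version into place
        except OSError:
            time.sleep(delay_s)             # for example, the file is locked
            continue
        try:
            os.remove(old_path)
        except OSError:
            pass  # no old version, or it is still in use; leave it behind
        return True

    # Every attempt failed: restore the old version if it was moved aside.
    if not os.path.exists(dest) and os.path.exists(old_path):
        os.replace(old_path, dest)
    os.remove(tmp_path)
    return False
```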

Metabase Driver

The Metabase driver uses the metabase BaseObject to perform read/write functions and uses the existing IIS notification code for change notification. The driver uses all the property identifiers and values in a metabase node to create a signature.

The Metabase driver is granular to the node level; that is, when there's notification of a property change, the driver sends all the properties for that node. For example, if you change the ServerComment for site 1, every property in /W3SVC/1 is replicated, but the properties of child nodes such as /W3SVC/1/Root are not (the operation is not recursive).
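Node-level granularity can be sketched with a dictionary that maps metabase paths to their properties. A change to one property sends every property of that node, but nothing from its child nodes. The function name and data layout are hypothetical. The property names follow IIS metabase naming, and the values are illustrative.

```python
def node_payload(metabase: dict, changed_node: str) -> dict:
    """Build the replication payload for a property change on changed_node.

    Every property of that node is sent, not just the one that changed.
    Child nodes (for example changed_node + "/Scripts") are not included,
    because the operation is not recursive.
    """
    return {changed_node: dict(metabase[changed_node])}

metabase = {
    "/W3SVC/1/Root": {"Path": r"c:\inetpub\wwwroot", "AccessRead": True,
                      "DefaultDoc": "default.htm"},
    "/W3SVC/1/Root/Scripts": {"AccessExecute": True},
}
metabase["/W3SVC/1/Root"]["DefaultDoc"] = "default.asp"  # one property changes
payload = node_payload(metabase, "/W3SVC/1/Root")
print(sorted(payload["/W3SVC/1/Root"]))    # ['AccessRead', 'DefaultDoc', 'Path']
print("/W3SVC/1/Root/Scripts" in payload)  # False
```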

During a full-synchronization replication, the driver walks the entire metabase and compares its values to those of a member. Any required settings and attributes are copied to the target.

NOTE
In a cluster that uses NLB, the IUSR and IWAM account information is replicated, as are the IP addresses associated with the controller's virtual sites.

Registry Driver

The Registry driver supports read/write functions and obtains change notifications by using standard Win32 APIs. Because the amount of data to be transferred is small, this driver doesn't use a signature. The default configuration of Application Center does not replicate any registry keys; however, you can identify and add registry keys to an application. The Applications view supports navigation through the entire registry down to the key level. Although the browser doesn't display a key's values, the registry path is sufficient for identifying keys and their values for replication purposes.

NOTE
Two things should be noted about this driver. First, it isn't intended to serve as a registry backup solution (for example, it doesn't handle secured keys). Second, the driver doesn't compare key values; it just sends the entire key content.

CAPI Driver

Using the CAPI2 APIs, the CAPI driver uses IIS to interrogate the metabase for CAPI certificates and certificate trust lists (CTLs) that are referenced by IIS virtual sites. It then accesses the CAPI store to extract the relevant information and runs the extracted identifiers through a hashing algorithm to generate a signature (see the Signatures sidebar below). When certificates change, the new information is encrypted with the signature and then replicated to members.

NOTE
By default, IIS server certificates are exportable. However, if you disable this capability, they can't be synchronized to other members. The CAPI driver generates an event for each certificate it can't replicate.

WMI Driver

The WMI driver performs read/write operations through standard WMI functions (for example, put, get, and delete member of instance) and a custom event consumer provider. The driver also provides change notification, and it uses the complete text of an object's instance to derive a signature.

The driver is responsible for comparing, and then moving, the Health Monitor and Application Center namespaces. During a full synchronization, the WMI driver compares controller and member signatures; if they don't match, the driver transfers the newer class/instance text to the target server. After the text is copied, the driver stores the class/instance information in WMI.

COM+ Driver

The COM+ driver uses the COM+ administrative APIs for read/write functions and uses object properties to create a signature. The driver is also used for replicating selected COM+ Global Catalog settings. Because the Global Catalog doesn't support change notification, these settings are replicated only when a full synchronization takes place or when a COM+ application is replicated. The driver compares COM+ applications on the source and target servers and synchronizes objects that have changed.

NOTE
Only the New Deployment Wizard uses the COM+ driver, so you can synchronize COM+ applications only by using the New Deployment Wizard.

For a more in-depth look at COM+ replication, see "Special Cases" later in this chapter.

Signatures
Application Center uses signatures to compare files or values.
A file signature is created by running a set of numeric values, such as file size and attributes, through the MD5 hashing algorithm; the resulting value is the signature. Every time a file changes, a new signature is created. If the new signature differs from the old one, you can assume that the file was changed in some fashion.

In most cases, Application Center uses the last write time, file size, and file attributes (except for archived files) to create any required signatures. See also "Cryptography and PKI Basics" at http://www.microsoft.com/TechNet/win2000/win2ksrv/cryptpki.asp.
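A minimal sketch of such a signature follows, using `os.stat`, `struct`, and `hashlib.md5` from the Python standard library. It assumes that "except for archived files" means the archive attribute is left out of the hash. It also packs the values in an arbitrary fixed layout, since the exact fields and byte layout Application Center hashes are not documented here. `st_file_attributes` exists only on Windows, so the sketch treats the attributes as 0 on other platforms.

```python
import hashlib
import os
import stat
import struct

def file_signature(path: str) -> str:
    """MD5 over last write time, size, and attributes, minus the archive bit."""
    st = os.stat(path)
    # st_file_attributes exists only on Windows; use 0 elsewhere.
    attributes = getattr(st, "st_file_attributes", 0)
    # Assumption: "except for archived files" is read as "ignore the archive
    # attribute", so setting or clearing that bit alone does not change the hash.
    attributes &= ~stat.FILE_ATTRIBUTE_ARCHIVE
    # Pack the numeric values in a fixed layout before hashing (layout is arbitrary).
    packed = struct.pack("<qQI", int(st.st_mtime), st.st_size, attributes)
    return hashlib.md5(packed).hexdigest()
```

Comparing the signature computed before and after an update is then a plain string comparison. Any difference marks the file as changed.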