A SyncML Server will likely serve thousands or tens of thousands of Clients. SyncML compliance and the above functional characteristics are necessary but not sufficient attributes of a production SyncML Server. A Server should also offer high performance, scalability, and reliability. This section discusses several design techniques that help a SyncML Server achieve these goals.
Exploiting SyncML Characteristics
In Chapter 4, we discussed certain characteristics of SyncML data synchronization that enable the building of scalable Servers. They are the following:
SyncML allows operations on datastores to be batched in one SyncML Package. A SyncML Server must take advantage of this batching, as many back-end data sources have appreciable connection overheads that can be effectively amortized by batching operations on the datastores. In addition, the SyncML Server can attempt to batch operations across many concurrent synchronization sessions. Batching operations across concurrent sessions enhances overall performance. Further, it enables the Server to detect and resolve conflicts caused by concurrent updates before incurring the cost of datastore operations.

Most commands in SyncML place no constraints on the ordering of operations. This facilitates building highly concurrent, multithreaded systems in which numerous threads process parts of the overall task and do not have to coordinate to preserve any order.

Most operations in SyncML also do not offer transactional guarantees. This implies that the Server threads do not have to wait until back-ends actually confirm that operations have succeeded.
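The batching idea can be sketched as follows. This is a minimal illustration, not a SyncML-defined interface: the names `BatchingAdapter`, `DatastoreOp`, `Backend`, and `applyBatch` are all assumptions made for the sketch. Operations arriving in a SyncML Package accumulate in memory and reach the back-end in one call, amortizing the connection cost.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: accumulate datastore operations from a SyncML
// Package and flush them to the back-end in one call.
class DatastoreOp {
    final String command;  // e.g. "Add", "Replace", "Delete"
    final String itemId;
    DatastoreOp(String command, String itemId) {
        this.command = command;
        this.itemId = itemId;
    }
}

interface Backend {
    // One connection, many operations: the whole batch goes down at once.
    void applyBatch(List<DatastoreOp> ops);
}

class BatchingAdapter {
    private final List<DatastoreOp> pending = new ArrayList<>();
    private final Backend backend;
    private final int batchSize;

    BatchingAdapter(Backend backend, int batchSize) {
        this.backend = backend;
        this.batchSize = batchSize;
    }

    // Called once per command in the incoming SyncML Package.
    void submit(DatastoreOp op) {
        pending.add(op);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Called at end-of-Package (or end of session) to drain the remainder.
    void flush() {
        if (!pending.isEmpty()) {
            backend.applyBatch(new ArrayList<>(pending));
            pending.clear();
        }
    }
}
```

The same accumulator could be shared across concurrent sessions to realize cross-session batching, at the cost of a synchronization point around `pending`.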
Exploiting Back-End Characteristics
Batching back-end operations, as discussed above, is the first step toward exploiting back-end characteristics. For administrative reasons, the back-end database often resides on a machine other than the synchronization Server. In such cases, back-end operations incur the additional cost of network transfer. Several back-ends, however, have built-in replication support (e.g., Lotus Notes). The synchronization Server should take advantage of back-end replication support whenever possible by operating on a local replica of the back-end datastore and replicating with the actual back-end datastore at certain intervals.
In multiple-path synchronization, it is especially critical to be able to exploit back-end characteristics when building the Datastore Adapters. It is sometimes difficult to keep track of changes made to back-end data (by non-sync applications). Several back-end stores provide means for applications to be notified of changes. For example, Relational Databases allow triggers, which can notify the Datastore Adapter when changes are made to the database. Certain Relational Databases also let applications track changes through APIs that access the database operation log. Depending on the particular database used, one mechanism may be more efficient than the other. Non-Relational Databases, such as Lotus Notes, often allow applications to register agents that can notify applications of changes. The Server implementation must carefully choose among the alternative mechanisms available for different back-ends on a case-by-case basis.
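One way to keep the choice of mechanism on a per-back-end basis is to hide it behind a common change-tracking interface in the Datastore Adapter. The sketch below is an assumption of how such an abstraction might look; the names `ChangeTracker`, `TriggerTracker`, and `LogScanTracker` are illustrative, and the "trigger" and "operation log" are simulated in memory rather than wired to a real database.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative change-tracking abstraction behind the Datastore Adapter.
// Each back-end plugs in the mechanism it supports best.
interface ChangeTracker {
    List<String> changedSince(long anchor);  // ids changed since the given point
}

// Trigger-based: a database trigger inserts changed ids into a side
// table; the tracker simply drains that table.
class TriggerTracker implements ChangeTracker {
    private final List<String> sideTable = new ArrayList<>();
    void onTrigger(String id) { sideTable.add(id); }  // fired by the DB trigger
    public List<String> changedSince(long anchor) {
        List<String> out = new ArrayList<>(sideTable);
        sideTable.clear();
        return out;
    }
}

// Log-based: scan the operation log and keep entries newer than the anchor.
class LogScanTracker implements ChangeTracker {
    static class LogEntry {
        final long seq; final String id;
        LogEntry(long seq, String id) { this.seq = seq; this.id = id; }
    }
    private final List<LogEntry> log = new ArrayList<>();
    void append(long seq, String id) { log.add(new LogEntry(seq, id)); }
    public List<String> changedSince(long anchor) {
        List<String> out = new ArrayList<>();
        for (LogEntry e : log) if (e.seq > anchor) out.add(e.id);
        return out;
    }
}
```

The rest of the Server sees only `ChangeTracker`, so a more efficient mechanism can be substituted for a given back-end without touching the Sync Engine.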
Exploiting Application Characteristics
Many common classes of data synchronization applications have special characteristics. Some applications are read-only: Client devices only read Server data and never change it. An insurance agent downloading daily rate quotes is a read-only application. Some applications are read-write but, by design, never generate conflicts. For example, if one uses a personal email application from a single mobile device only (a common case), emails are read and written but a conflict never arises. When synchronizing such applications, the Sync Adapter can make direct calls to the datastore APIs without incurring the cost of invoking the Sync Engines.
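The fast path for conflict-free applications amounts to a per-application routing decision in the Sync Adapter. The following sketch is an assumption of how that dispatch might be expressed; `AppProfile`, `SyncAdapter`, and the returned path labels are invented for illustration.

```java
// Illustrative routing: conflict-free applications (read-only, or
// single-writer read-write) bypass the Sync Engine entirely.
class AppProfile {
    final String name;
    final boolean conflictFree;  // read-only, or single-device read-write
    AppProfile(String name, boolean conflictFree) {
        this.name = name;
        this.conflictFree = conflictFree;
    }
}

class SyncAdapter {
    // Returns which path handled the update, for illustration.
    String applyUpdate(AppProfile app, String itemId) {
        if (app.conflictFree) {
            // Direct datastore call: conflict detection is unnecessary by design.
            return "datastore:" + itemId;
        }
        // Otherwise pay the full cost of conflict detection and resolution.
        return "syncEngine:" + itemId;
    }
}
```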
Application usage patterns are also an important factor to consider. Many applications generate a high load only during certain times of the day. If the Server can anticipate such high-load conditions for an application, it can establish anticipatory connections to back-ends, and it can prefetch and cache back-end updates. These steps will likely reduce the overall latency observed by the application. Servers will need to maintain application usage statistics and reconfigure parameters based on them.
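One simple form such statistics could take is a per-application histogram of requests by hour of day, consulted when deciding whether to prefetch. This is a sketch under stated assumptions: the class name `UsageStats`, the hour-of-day granularity, and the fixed threshold are all illustrative choices, not prescriptions.

```java
// Illustrative usage statistics: request counts per hour of day.
// When an hour has historically carried heavy load, the Server can
// open back-end connections and prefetch updates ahead of time.
class UsageStats {
    private final long[] requestsByHour = new long[24];

    void record(int hourOfDay) {
        requestsByHour[hourOfDay]++;
    }

    // Prefetch if this hour has historically seen at least `threshold` requests.
    boolean shouldPrefetch(int hourOfDay, long threshold) {
        return requestsByHour[hourOfDay] >= threshold;
    }
}
```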
Effective Use of Concurrency and Asynchrony
Experience in system design shows that a production SyncML Server must be able to exploit concurrency effectively. One can imagine designing the Server as one complex, monolithic procedure that implements all the functions of Protocol Management, the Sync Engine, and Data Management. Such a system is not conducive to low latency and high performance, as an arriving Client request must wait until every earlier request is completely processed. In addition, such a monolithic system is difficult to extend for new data types and applications.
Figure 12-1 tacitly suggests that a SyncML Server can be built in pipelined stages. Each stage of the pipeline performs a concise function such as authentication, SyncML Message parsing and/or generation, conflict detection/resolution, or data access operations. Pipelining allows an earlier stage to begin servicing the next Client request while later stages are still handling the current one, increasing the throughput of the system dramatically. Multiple concurrent threads can realize each pipelined stage. Ideally, using n threads for a stage reduces the average delay of the stage by a factor of n.
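A minimal two-stage pipeline can be sketched with a bounded queue between stages, so that a "parse" stage hands each request onward and immediately accepts the next while an "apply" stage is still working. The class name, stage bodies, and string tags below are placeholders for illustration, not the actual stage functions of Figure 12-1.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative two-stage pipeline joined by a bounded blocking queue.
class TwoStagePipeline {
    private static final String POISON = "__stop__";

    static List<String> process(List<String> requests) {
        BlockingQueue<String> parsed = new ArrayBlockingQueue<>(16);
        List<String> applied = Collections.synchronizedList(new ArrayList<>());

        // Stage 2 runs on its own thread, overlapping with stage 1 below.
        Thread applyStage = new Thread(() -> {
            try {
                for (String msg = parsed.take(); !msg.equals(POISON);
                        msg = parsed.take()) {
                    applied.add("applied:" + msg);  // stand-in for datastore work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        applyStage.start();

        try {
            for (String req : requests) {
                parsed.put("parsed:" + req);        // stand-in for SyncML parsing
            }
            parsed.put(POISON);                      // signal end of input
            applyStage.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return applied;
    }
}
```

Replacing the single `applyStage` thread with a pool of n workers draining the same queue is how a stage would be widened to n threads.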
If threads in one stage wait for threads in the next stage to complete, the overall benefits of concurrency are greatly reduced. If the Sync Adapter thread waits for a response from the Sync Engine thread, which in turn waits for a response from the datastore thread, the system overall slows down appreciably. Communication between threads should be asynchronous. A thread in a precursor stage should be called back when results from a thread in a successor stage are available. Languages such as Java natively support threads and asynchronous communication. Similar thread libraries are available for languages such as C and C++. There may be situations when different stages of the pipeline are implemented in different languages or executed in different processes, or in different machines altogether. In such situations, it may be beneficial to use message-queue-based communication, such as IBM MQSeries® or Microsoft Message Queue.
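In Java, the callback style described above can be expressed with `CompletableFuture`: the precursor stage registers a continuation instead of blocking on the successor's result. The stage names `syncEngine` and `syncAdapter` and the string results below are illustrative assumptions, standing in for real conflict resolution and status generation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

// Illustrative asynchronous hand-off: the Sync Adapter does not wait
// for the Sync Engine; it chains a callback that fires on completion.
class AsyncStages {
    static CompletableFuture<String> syncEngine(String request,
                                                ExecutorService pool) {
        // Runs on a Sync Engine thread; the caller's thread is free meanwhile.
        return CompletableFuture.supplyAsync(() -> "resolved:" + request, pool);
    }

    static CompletableFuture<String> syncAdapter(String request,
                                                 ExecutorService pool) {
        // thenApply registers the continuation instead of blocking.
        return syncEngine(request, pool)
                .thenApply(result -> "status:" + result);
    }
}
```

A message-queue product such as MQSeries plays the same role as the in-process future when the stages live in different processes or on different machines.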
Load balancing among different physical Servers is another way of achieving concurrency and high throughput. Load-balancing SyncML Servers is discussed in Chapter 4. The reader should note that different stages of the SyncML Server could also be load-balanced, with multiple machines dedicated to support a particular stage.
Failure and Recovery
Several failures can occur in an end-to-end data synchronization system. Clients can fail, communication can fail, back-ends can fail, and the Server itself may crash or fail. In SyncML, the Server is expected to facilitate recovery from Client and communication failures. The Server maintains synchronization session information for each Client. The Server can detect a Client failure by observing a discrepancy between sync anchors and can then initiate a slow synchronization with the Client. The Server can also use the stored session information to restart an ongoing synchronization session with a Client after a communication failure. The Server should be designed to mask transient back-end failures by retrying back-end operations or temporarily buffering them.
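The anchor comparison that drives this decision is simple: the Server compares the Last anchor it stored for the Client against the Last anchor the Client presents; a mismatch means the change-log state can no longer be trusted, and the Server falls back to slow sync. The class and method names in this sketch are illustrative.

```java
// Illustrative sync-anchor check for choosing the synchronization mode.
class AnchorCheck {
    enum SyncMode { TWO_WAY, SLOW }

    static SyncMode chooseMode(String storedLastAnchor, String clientLastAnchor) {
        if (storedLastAnchor != null
                && storedLastAnchor.equals(clientLastAnchor)) {
            return SyncMode.TWO_WAY;  // anchors agree: incremental sync is safe
        }
        return SyncMode.SLOW;         // discrepancy: recover with a slow sync
    }
}
```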
It is also important that the Server be prepared for the eventuality that it can itself crash or fail. State information such as capabilities, session state, and identifier mappings should reside in persistent storage or be backed up to persistent storage regularly. Using multiple machines per stage not only distributes load effectively but also helps the system continue operating when individual machines fail. For certain SyncML operations (such as Atomic, which has transactional semantics), the Server may be required to implement checkpoint and recovery functions.
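As a toy illustration of persisting identifier mappings, the sketch below checkpoints a LUID-to-GUID table to a properties file so it survives a crash. A production Server would use a database or a transactional log rather than a flat file; the class name and file format are assumptions made for the sketch.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative checkpoint of the LUID->GUID identifier mapping table.
class MappingCheckpoint {
    static void save(Properties luidToGuid, File file) {
        try (OutputStream out = new FileOutputStream(file)) {
            luidToGuid.store(out, "LUID->GUID mapping checkpoint");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Properties load(File file) {
        Properties p = new Properties();
        if (file.exists()) {
            try (InputStream in = new FileInputStream(file)) {
                p.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return p;  // empty if no checkpoint exists yet
    }
}
```

On restart, the Server reloads the last checkpoint and resumes; anything changed after the checkpoint is recovered through the normal anchor-mismatch slow-sync path.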