Mapping of Identifiers and Slow Synchronization | SyncML: Synchronizing and Managing Your Mobile Data


Team-Fly

	SyncML®: Synchronizing and Managing Your Mobile Data By Uwe Hansmann, Riku Mettälä, Apratim Purakayastha, Peter Thompson, Phillipe Kahn

	Chapter 5. Synchronization Protocol

This chapter has introduced the handshakes and other features of the Synchronization Protocol and considered their impact. Together, they create the overall functionality provided by the SyncML Synchronization Protocol. Now is a good time to look closely at two important features of the Synchronization Protocol: identifier mapping and slow synchronization. Their impact on implementations and on the overall infrastructure is explored below.

Nature of Identifier Mapping

As explained earlier, the whole idea of identifier mapping is to allow the Client and the Server to address the same data items using distinct identifiers. When data is synchronized and modifications are exchanged, the identifiers allocated by the Client are used. The Server's identifiers are used only when the Server adds a data item to a Client. Because of this, the Server needs to maintain an identifier mapping for every Client that it synchronizes with. From the Client's standpoint, this is a huge benefit, because the Client does not have to store the long globally unique identifiers (GUIDs) used by the Servers.^[2] For wireless devices implementing the Client functionality, using the same data identifiers as the Server would, in general, be an impossible requirement.

^[2] Servers commonly use globally unique identifiers (GUIDs) for data objects. These GUIDs are typically 64 to 128 bytes long. In practice, Clients use locally unique identifiers (LUIDs), which are 16 bytes long or shorter.

Figure 5-7 depicts a simple situation in which one Server (Server A) synchronizes with only two Clients (Client A and Client B). Server A has to keep two mapping tables, one for Client A and one for Client B. With these tables, the Server knows which identifier each Client uses for a given data object that has mapping information. Conversely, when a Client refers to a data item by its LUID (locally unique identifier), the Server knows which data item the modification targets.

Figure 5-7. One Server and two Clients in synchronization environment

graphics/05fig07.gif
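The per-Client mapping tables of Figure 5-7 can be pictured as one LUID-to-GUID table per Client. The following is a minimal sketch, assuming string identifiers; the class name MappingStore, its methods, and the sample identifiers are hypothetical and are not defined by the SyncML specification.

```python
from typing import Dict, Optional


class MappingStore:
    """Hypothetical Server-side store of per-Client LUID -> GUID tables."""

    def __init__(self) -> None:
        # clientID -> {LUID: GUID}; one table per Client, as in Figure 5-7
        self._tables: Dict[str, Dict[str, str]] = {}

    def add(self, client_id: str, luid: str, guid: str) -> None:
        self._tables.setdefault(client_id, {})[luid] = guid

    def guid_for(self, client_id: str, luid: str) -> Optional[str]:
        # Resolve which Server item a Client means when it sends a LUID.
        return self._tables.get(client_id, {}).get(luid)

    def luid_for(self, client_id: str, guid: str) -> Optional[str]:
        # Find the identifier a given Client uses for a Server item.
        for luid, mapped in self._tables.get(client_id, {}).items():
            if mapped == guid:
                return luid
        return None


store = MappingStore()
store.add("client-a", "12", "guid-0001")  # Client A calls the item "12"
store.add("client-b", "7", "guid-0001")   # Client B calls the same item "7"
print(store.luid_for("client-b", "guid-0001"))  # 7
print(store.guid_for("client-a", "12"))         # guid-0001
```

The same Server item is known under a different LUID by each Client, which is why a single table would not be enough.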

Figure 5-8 depicts a more complicated situation, in which two Servers (Server A and Server B) synchronize with two Clients (Client A and Client B). Both Server A and Server B keep ID mappings for Clients A and B. In this setup, duplicate items may occur. For example, if Client A and Client B both contain the same item and both synchronize with Server A, the item may be duplicated because the two Clients identify it with different LUIDs. Since the Client sends its modifications first, it is the Server's role to detect whether a data item already exists under a different identifier. In other words, the Server must be able to prevent the creation of duplicates.

Figure 5-8. Two Servers and two Clients in synchronization environment

graphics/05fig08.gif

The example in Figure 5-9 illustrates this issue in more detail. A user adds a data item to Client A, which is then synchronized with Server A. After that, Server A and Client B synchronize with each other, so the data item is now also found on Client B. Next, Client B and Server B synchronize their data, and Server B now has the data item. Finally, Client A and Server B synchronize. Client A sends the data item to Server B, because it does not know that the item already exists there. At this point, Server B cannot rely on ID mapping, because it has no mapping for this item with Client A. Thus, it needs another mechanism to find out that it already has the data item. If this mechanism succeeds, no duplicate of the data item is created on Client A or Server B.

Figure 5-9. Example of potential duplicates in identifier mapping

graphics/05fig09.gif

Mechanisms for preventing duplicates are very much implementation-specific and, as such, are outside the scope of the Synchronization Protocol. In general, those mechanisms are based on analyzing the content of a data item: the content of an incoming data item is compared with existing content before the Server decides whether the item needs to be added.
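Because the protocol leaves duplicate detection to the implementation, the following is only one possible content-based approach, sketched under simple assumptions: items are flat dictionaries of field names to values, and two items count as the same if their fields match after ignoring field order, letter case, and surrounding whitespace. The helpers fingerprint and find_existing are hypothetical; real Servers may use far more tolerant (fuzzy) matching.

```python
import hashlib
from typing import Dict, Optional

Item = Dict[str, str]  # field name -> field value


def fingerprint(item: Item) -> str:
    """Hash an item's normalized content, ignoring field order, case and
    surrounding whitespace."""
    normalized = sorted((k.strip().lower(), v.strip().lower())
                        for k, v in item.items())
    text = "\n".join(f"{k}={v}" for k, v in normalized)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_existing(store: Dict[str, Item], incoming: Item) -> Optional[str]:
    """Return the GUID of a stored item whose content matches, or None."""
    target = fingerprint(incoming)
    for guid, existing in store.items():
        if fingerprint(existing) == target:
            return guid
    return None


server_items = {"guid-0001": {"FN": "Ann Lee", "TEL": "+1 555 0100"}}
incoming = {"tel": "+1 555 0100 ", "fn": "ann lee"}
print(find_existing(server_items, incoming))  # guid-0001
```

If a match is found, the Server only needs to record a mapping for the incoming LUID instead of adding a new item.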

Slow Synchronization

Slow synchronization is the way to recover from failures that may happen during synchronization. It should be avoided when possible, since a slow sync requires the transmission of a lot of data, but it is sometimes unavoidable.

The Synchronization Protocol allows a great deal of flexibility regarding the functionality of slow synchronization. The Synchronization Protocol specification defines it in the following way: "The slow sync is a form of the two-way synchronization in which all items in one or more databases are compared with each other on a field-by-field basis. In practice, the slow sync means that the client sends all its data in a database to the server and the server does the sync analysis (field-by-field) for this data and the data in the server. After the sync analysis, the server returns all needed modifications back to the client."

Server implementations can decide quite freely how they process slow synchronization, as the specification does not define it strictly. In practice, implementations behave very differently when a slow sync is initiated. The drawback is inconsistency: the end-user experience offered by different Servers can vary greatly. On the other hand, an optimized implementation of the slow sync operation is a good opportunity for a Server vendor to differentiate itself and show how good its implementation is, for instance in terms of performance and the number of duplicates created.

Identifier mapping and slow synchronization have elements in common, because slow synchronization is always used when a Client and a Server synchronize for the first time, and many new data items may then enter the Server from the Client. Figure 5-10 gives an illustrative example of how the slow synchronization operation relates to identifier mapping. In this example, Client A first synchronizes with Server A. After that, the Servers synchronize with each other, so the content that was synchronized from Client A to Server A is now on Server B, too. Assuming that Client A and Server B have not synchronized with each other before, a slow synchronization is initiated when they do. During this slow synchronization, all data items from Client A are sent to Server B. Server B should be able to detect that these items already exist; if it does, it only needs to update the mapping for them.

Figure 5-10. Slow synchronization in an environment of multiple devices

graphics/05fig10.gif
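The Server-side handling described for Figure 5-10 can be sketched as follows. This is a minimal illustration, assuming flat dictionary items and the simplest possible content comparison (exact equality of all fields); the function slow_sync and its parameters are hypothetical. Items the Server already has are only mapped, genuinely new items get a fresh GUID (here a random UUID, an assumption), and Server items the Client does not have are returned so that they can be sent back to the Client.

```python
import uuid
from typing import Dict, List, Tuple

Item = Dict[str, str]


def content_key(item: Item) -> Tuple[Tuple[str, str], ...]:
    # Simplest possible content comparison: exact match on all fields.
    return tuple(sorted(item.items()))


def slow_sync(server_items: Dict[str, Item],
              client_items: Dict[str, Item],
              mapping: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Process all items sent by a Client during a slow sync.

    server_items: GUID -> item; client_items: LUID -> item;
    mapping: this Client's LUID -> GUID table, rebuilt here.
    Returns (GUIDs matched, GUIDs added, GUIDs to send to the Client).
    """
    by_content = {content_key(item): guid for guid, item in server_items.items()}
    matched: List[str] = []
    added: List[str] = []
    mapping.clear()  # the old mapping is not trusted after a slow sync
    for luid, item in client_items.items():
        key = content_key(item)
        guid = by_content.get(key)
        if guid is not None:
            matched.append(guid)          # already on the Server: only map it
        else:
            guid = str(uuid.uuid4())      # genuinely new item: store it
            server_items[guid] = dict(item)
            by_content[key] = guid
            added.append(guid)
        mapping[luid] = guid
    mapped = set(mapping.values())
    to_client = [guid for guid in server_items if guid not in mapped]
    return matched, added, to_client


server_b = {"g1": {"FN": "Ann"}, "g2": {"FN": "Bob"}}
client_a = {"1": {"FN": "Ann"}, "2": {"FN": "Cy"}}
mapping: Dict[str, str] = {}
matched, added, to_client = slow_sync(server_b, client_a, mapping)
print(matched, to_client)  # ['g1'] ['g2']
```

Only the item "Cy" is added to Server B; "Ann" is recognized by content and merely mapped, which is exactly the behavior that keeps a slow sync from creating duplicates.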

To summarize this subsection, the main messages to the developers are:

The slow synchronization operation is a very powerful tool for recovering from failures, but it should be used only when really necessary.
The identifier mapping operation is very useful when dealing with data items that have been synchronized earlier.
When using the slow synchronization operation or adding new items, checking the existence of the items needs to be based on the content of the items.

By following these rules, Client and Server implementations can avoid many quality- and performance-related problems, which puts them well on the way to providing a great user experience.

An Example Synchronization Dataflow

This section illustrates a hypothetical synchronization scenario between a Client and a Server using the SyncML Representation and Synchronization Protocols. The example shows a Two-way synchronization, in which the Client and the Server exchange their respective updates. For simplicity, possible failure scenarios such as unmatched synchronization anchors are ignored. For brevity, the exchanged information is expressed logically instead of in exact syntax. First, the changes made to data items on the Client and the Server are listed. Then, each subsection outlines what information is exchanged in one Package and what actions the Server and the Client take. This is analogous to the Message sequence chart in Figure 5-3. Because this is a normal Two-way synchronization initiated by the Client, Package #0 is not needed in this example.

In this example, the Client has made the following changes to its datastore since the last synchronization time with this Server:

Updating entry A
Deleting entry B
Inserting new entry C

During the same period, the following changes are made concurrently to the Server datastore:

Updating entry A
Updating entry B
Inserting new entry D
Updating entry E

This will result in the following conflicts and the assumed corresponding resolutions:

Update/Update conflict on entry A. Resolved in favor of the Server by the Server.
Delete/Update conflict on entry B. Resolved in favor of the Client by the user. In other words, the Client ignores the Server's update to entry B and keeps the entry deleted.
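The conflicts listed above follow mechanically from the two change sets: an entry conflicts when both sides changed it since the last synchronization. The sketch below assumes that entries are identified by the shared logical names A to E used in this example (in a real session the Server would first translate Client LUIDs through its mapping table); the helpers find_conflicts and one_sided are hypothetical, and the resolution policy itself is left out.

```python
from typing import Dict, List, Tuple

Changes = Dict[str, str]  # entry name -> operation ("add", "replace", "delete")


def find_conflicts(client: Changes, server: Changes) -> List[Tuple[str, str, str]]:
    """Entries changed on both sides: (entry, Client operation, Server operation)."""
    return [(entry, client[entry], server[entry])
            for entry in sorted(client.keys() & server.keys())]


def one_sided(changes: Changes, other: Changes) -> Changes:
    """Changes the other side did not touch; these can be applied directly."""
    return {entry: op for entry, op in changes.items() if entry not in other}


client_changes = {"A": "replace", "B": "delete", "C": "add"}
server_changes = {"A": "replace", "B": "replace", "D": "add", "E": "replace"}

print(find_conflicts(client_changes, server_changes))
# [('A', 'replace', 'replace'), ('B', 'delete', 'replace')]
print(one_sided(server_changes, client_changes))  # {'D': 'add', 'E': 'replace'}
print(one_sided(client_changes, server_changes))  # {'C': 'add'}
```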

Package #1: Client Initialization

This is the first Package (Package #1) in a Two-way synchronization used by the Client to initiate the session with the Server. The Package consists of the following:

Header (with serverID, clientID, and authentication data)
Alert command (with type=Two-way-sync, target datastore, source datastore, Last and Next synchronization anchors)

The header contains essential identifying information and authentication information. The Alert command indicates the requested Sync Type, and identifies the target and source datastores and the synchronization anchors.

Upon reception of this Message, the Server processes the header information, authenticates the Client, and authorizes it for synchronization with the target datastore. It compares the Last synchronization anchor with the stored Next synchronization anchor for this Client, identified by its clientID. The Server may inspect the Client's capabilities to decide whether multiple Messages are needed for certain synchronization Packages, in order to limit the buffer space the Client needs to process the Packages.
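The anchor comparison can be sketched as a small decision function. This is a minimal illustration, assuming timestamp-style anchor strings and a Server that keeps the Next anchor of each Client's last completed session; the SyncAlert class, choose_sync_type, and the string values "two-way" and "slow" are hypothetical stand-ins for the logical content of the Alert command. If there is no stored anchor, or it does not match the Client's Last anchor, the Server cannot trust its change tracking and falls back to a slow sync.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SyncAlert:
    """Logical content of the Alert command in Package #1 (illustrative)."""
    client_id: str
    sync_type: str    # e.g. "two-way"
    last_anchor: str  # Client's anchor from the previous session
    next_anchor: str  # Client's anchor for this session


def choose_sync_type(alert: SyncAlert, stored_next: Dict[str, str]) -> str:
    """Return the sync type the Server should proceed with.

    stored_next maps a clientID to the Next anchor the Server saved at the
    end of its last completed session with that Client.
    """
    expected = stored_next.get(alert.client_id)
    if expected is None or expected != alert.last_anchor:
        # No previous session, or the anchors disagree: fall back to slow sync.
        return "slow"
    return alert.sync_type


stored = {"client-a": "20240101T090000Z"}
alert = SyncAlert("client-a", "two-way", "20240101T090000Z", "20240102T100000Z")
print(choose_sync_type(alert, stored))  # two-way
```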

Package #2: Server Initialization

This Package (Package #2) consists of the following:

Header (with serverID, clientID, and authentication data)
Status command for previous Client commands
Alert command for Server synchronization anchors

The Server sends its authentication data so that the Client can authenticate the Server, if necessary. This type of Server authentication is essential for applications in which the data on Client devices is sensitive or proprietary and must be protected from theft. The Server may also send its synchronization anchors, which the Client can compare with its own synchronization anchors to determine whether it should proceed with normal synchronization.

Package #3: Client Modifications

This Package consists of the following:

Header
Several Status commands to reply to the commands from the Server
Sync command with the following operations: Replace A, Delete B, and Add C

The Server performs the necessary authentication and begins to process the Sync command. It finds that the Client's update of entry A conflicts with its own update of the same item and determines that its own update takes precedence. It prepares a Status command (Status 1) indicating the conflict and its resolution, which is included in the next Package from the Server to the Client. It also finds that the Client's deletion of entry B conflicts with its own update of entry B. In this case, however, it determines that the Client should resolve the conflict, so it prepares a Status command (Status 2) that identifies the conflict but gives no resolution. The Server then adds entry C to its datastore and prepares a Status command (Status 3) indicating the addition. Since entry C is a new item created by the Client, it has only a LUID associated with it. The Server assigns an appropriate GUID to entry C and updates its mapping table for this particular Client, storing the LUID and GUID for entry C.^[3]

^[3] See data identifier mapping details, discussed earlier in this chapter.
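The Server's processing of Package #3 can be sketched as below. This is a minimal illustration under several assumptions: the Server already knows which of its items changed since the last sync, its policy is "Server wins" for Update/Update conflicts and "let the Client decide" for Delete/Update conflicts (as in this example), and GUIDs are random UUIDs. The function process_client_sync and the status strings are hypothetical; they stand in for real SyncML Status commands and codes.

```python
import uuid
from typing import Dict, List, Set, Tuple

Item = Dict[str, str]
Command = Tuple[str, str, Item]  # (operation, Client LUID, data)


def process_client_sync(commands: List[Command],
                        server_changed: Set[str],
                        store: Dict[str, Item],
                        mapping: Dict[str, str]) -> List[Tuple[str, str]]:
    """Apply a Client's Sync command on the Server side.

    store: GUID -> item; mapping: LUID -> GUID for this Client;
    server_changed: GUIDs the Server modified since the last sync.
    Returns illustrative (LUID, status) pairs for the next Package.
    """
    statuses: List[Tuple[str, str]] = []
    for op, luid, data in commands:
        if op == "add":
            guid = str(uuid.uuid4())      # the Server allocates its own GUID
            store[guid] = dict(data)
            mapping[luid] = guid          # remember LUID <-> GUID for this Client
            statuses.append((luid, "added"))
            continue
        guid = mapping[luid]              # existing item: translate LUID to GUID
        if guid in server_changed:
            if op == "replace":           # Update/Update: the Server wins
                statuses.append((luid, "conflict, resolved: server wins"))
            else:                         # Delete/Update: left to the Client
                statuses.append((luid, "conflict, unresolved"))
            continue
        if op == "replace":
            store[guid] = dict(data)
            statuses.append((luid, "replaced"))
        elif op == "delete":
            store.pop(guid, None)
            del mapping[luid]
            statuses.append((luid, "deleted"))
    return statuses


store = {"gA": {"v": "server A"}, "gB": {"v": "server B"}}
mapping = {"1": "gA", "2": "gB"}
commands = [("replace", "1", {"v": "client A"}),   # Replace A
            ("delete", "2", {}),                    # Delete B
            ("add", "3", {"v": "C"})]               # Add C
statuses = process_client_sync(commands, {"gA", "gB"}, store, mapping)
```

The three returned entries correspond to Status 1, Status 2, and Status 3 in the text, and the new entry in mapping is the LUID/GUID pair stored for entry C.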

Package #4: Server Modifications

This Package (Package #4) consists of the following:

Header
Status commands (Status 1, 2, and 3)
Sync command with the following operations: Replace A, Replace B, Add D, and Replace E

The Status commands include the status information described in the previous section. As discussed, Status 1 and Status 2 indicate the conflicts on entries A and B. The Sync command contains the Server's updates to entries A, B, and E and the new entry D. The Client applies the Server's update to entry A. For entry B, the Client ignores the command and keeps entry B deleted. The Client adds the new item D, assigns a LUID to it, and creates a Map command containing the assigned LUID. It also prepares a Status command indicating the addition of this entry. The Status and Map commands are included in the next Package to the Server. Finally, the Client processes the Replace command for E, makes the necessary update in its local datastore, and prepares a Status command for this update.
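On the Client side, the processing of Package #4 can be sketched as follows. This is a minimal illustration, assuming that Replace commands target the Client's LUIDs, that Add commands carry the Server's GUID, and that the Client keeps a set of entries it chose to keep deleted; the LUID scheme (small integers that are never reused) and the names process_server_sync and keep_deleted are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Item = Dict[str, str]
Command = Tuple[str, str, Item]  # (operation, LUID or Server GUID, data)


def process_server_sync(
        commands: List[Command],
        store: Dict[str, Item],
        keep_deleted: Set[str]) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Apply a Server's Sync command on the Client side.

    store: LUID -> item; keep_deleted: LUIDs whose Delete/Update conflict
    was resolved in the Client's favor. Returns (statuses, map_items).
    """
    statuses: List[Tuple[str, str]] = []
    map_items: List[Tuple[str, str]] = []  # (LUID, GUID) pairs for the Map command
    # Hypothetical LUID scheme: small integers, never reusing deleted ones.
    used = [int(luid) for luid in store] + [int(luid) for luid in keep_deleted]
    next_luid = max(used, default=0) + 1
    for op, ident, data in commands:
        if op == "replace" and ident in keep_deleted:
            statuses.append((ident, "ignored, kept deleted"))
        elif op == "replace":
            store[ident] = dict(data)          # ident is the Client's LUID
            statuses.append((ident, "replaced"))
        elif op == "add":
            luid = str(next_luid)              # ident is the Server's GUID
            next_luid += 1
            store[luid] = dict(data)
            map_items.append((luid, ident))
            statuses.append((ident, "added"))
    return statuses, map_items


store = {"1": {"v": "client A"}, "5": {"v": "E"}}  # entry B ("2") was deleted
statuses, map_items = process_server_sync(
    [("replace", "1", {"v": "server A"}),   # Replace A
     ("replace", "2", {"v": "server B"}),   # Replace B
     ("add", "guid-D", {"v": "D"}),         # Add D
     ("replace", "5", {"v": "E2"})],        # Replace E
    store, keep_deleted={"2"})
print(map_items)  # [('6', 'guid-D')]
```

The map_items list is what the Client would place in the Map command of Package #5.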

Package #5: Status and Map

This Package consists of the following:

Header
Status commands for the addition and update
Map command for ID mapping

The Status commands include the status information described in the previous section. The Map command contains the Client's LUID and the Server's GUID for the new entry D. The Server extracts the LUID from the Map command and updates its data-identifier mapping table for the entry D. At this point (or just after sending Package #6), the Server also updates the stored sync anchor for the Client. This stored sync anchor must match the Client's Last sync anchor in the next session.
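The Server's handling of Package #5 amounts to recording the mapping and committing the anchor. The sketch below assumes the per-Client mapping tables and stored anchors are plain dictionaries keyed by clientID, and that the anchor to store is the Client's Next anchor received in Package #1; the function apply_map is hypothetical, and its return value is only one possible input for the Status command sent in Package #6.

```python
from typing import Dict, List, Tuple


def apply_map(map_items: List[Tuple[str, str]],
              mappings: Dict[str, Dict[str, str]],
              anchors: Dict[str, str],
              client_id: str,
              client_next_anchor: str) -> int:
    """Record (LUID, GUID) pairs from a Client's Map command and commit the anchor.

    mappings: clientID -> {LUID: GUID}; anchors: clientID -> stored anchor.
    Returns the number of mappings recorded.
    """
    table = mappings.setdefault(client_id, {})
    for luid, guid in map_items:
        table[luid] = guid
    # Store the Client's Next anchor from Package #1; in the next session it
    # must equal the Last anchor that the Client sends.
    anchors[client_id] = client_next_anchor
    return len(map_items)


mappings = {"client-a": {"3": "guid-C"}}
anchors: Dict[str, str] = {}
apply_map([("6", "guid-D")], mappings, anchors, "client-a", "20240102T100000Z")
print(mappings["client-a"])  # {'3': 'guid-C', '6': 'guid-D'}
```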

Package #6: Map Acknowledge

This Package consists of the following:

Header
Status command for the Client Map command

This Package primarily acts as a final acknowledgement from the Server that the synchronization is complete, including data-identifier mapping. At this point the Client may update any sync anchors that it may store for the Server.

