Changes made to different copies of a datastore can be propagated to other copies of that datastore in different ways. The synchronization topology defines the logical flow of the changes propagating through the network of computers hosting instances of that datastore. The four major topologies are one-to-one, many-to-one, many-to-many, and hybrids of many-to-one and many-to-many.
The one-to-one topology is the simplest case. The other topologies can be seen as an extension of this one. Here the data is only shared between one server (the square in Figure 1-1) and one client (the circle in Figure 1-1). A possible usage scenario for this topology is a datastore that is mirrored for backup purposes. All changes made to the client are also sent to the server to ensure that its copy of the data reflects the current version of the client copy. Assuming that data is only changed in the client directly (i.e. no modification is made to the server copy besides synchronizing with the client), then there is no risk of any conflict in this topology. The one-to-one topology is also known as the "Dedicated Pair" topology.
Figure 1-1. One-to-one topology
This kind of topology is also used between someone's PDA and personal computer, with the difference that changes are usually made on both the PDA and the personal computer. In this case, the conflicts are typically identified by the PC and directly resolved on the PC. In some cases the conflict is marked and the user is asked to resolve it.
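The PDA and PC case can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular product: it assumes each datastore is a plain dictionary, that a snapshot of the last synchronized state (`base`) is kept, and that the PC copy wins when both sides changed the same record. The function name `sync_pair` and this resolution policy are assumptions made for the example.

```python
def sync_pair(base, pda, pc):
    """Two-way sync of two dict replicas against the last agreed snapshot.

    Returns (merged, conflicts). A key is in conflict when both sides
    changed it to different values since `base`. Assumed policy: the PC
    value wins, and the key is reported so the user could be asked.
    A missing key is treated as None, which also models deletion.
    """
    merged, conflicts = {}, []
    for key in set(base) | set(pda) | set(pc):
        old, on_pda, on_pc = base.get(key), pda.get(key), pc.get(key)
        if on_pda == on_pc:      # both sides agree
            value = on_pda
        elif on_pda == old:      # only the PC changed the record
            value = on_pc
        elif on_pc == old:       # only the PDA changed the record
            value = on_pda
        else:                    # both changed it differently: conflict
            conflicts.append(key)
            value = on_pc        # assumed policy: PC wins
        if value is not None:
            merged[key] = value
    return merged, sorted(conflicts)
```

After the call, both devices would adopt `merged` as their copy and as the new `base`. Any key listed in `conflicts` could instead be marked and shown to the user.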
Numerous commercial systems are examples of the many-to-one topology (also known as central master or star topology). In this topology, data is propagated from a central master to the different entities containing copies of the data, as shown in Figure 1-2.
Figure 1-2. Many-to-one topology
The main advantage of the many-to-one topology is that it is relatively simple to implement compared with the many-to-many topology, which is described in the next section.
All clients exchange data with the central server only; two clients cannot exchange data directly without going through the central server. Because of this characteristic, conflicts can only arise at the central server, which needs to detect and resolve them. The clients themselves do not need to worry about conflicts. They just inform the central master about their local modifications and process the change requests they receive from the central master. Unlike in the many-to-many topology, a client never needs to determine where to send a change.
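The division of labor between clients and central master can be illustrated with a small Python sketch. It rests on several assumptions that are not taken from any real system: each record carries a version number, the master rejects an update that was based on a stale version (so the first writer wins), and the class and method names (`CentralMaster`, `Client`, `submit`, `pull`) are hypothetical.

```python
class CentralMaster:
    """Holds the master copy; the only place where conflicts are detected."""

    def __init__(self):
        self.records = {}   # key -> (version, value)
        self.log = []       # accepted changes, in order

    def submit(self, key, value, base_version):
        current_version, _ = self.records.get(key, (0, None))
        if base_version != current_version:
            return False    # conflict: the client edited a stale copy
        new_version = current_version + 1
        self.records[key] = (new_version, value)
        self.log.append((key, new_version, value))
        return True

    def changes_since(self, position):
        return self.log[position:], len(self.log)


class Client:
    """Reports local changes and applies whatever the master sends back."""

    def __init__(self, master):
        self.master = master
        self.copy = {}      # key -> (version, value)
        self.position = 0   # how much of the master's log was applied

    def update(self, key, value):
        base_version, _ = self.copy.get(key, (0, None))
        accepted = self.master.submit(key, value, base_version)
        self.pull()         # pick up the master's view either way
        return accepted

    def pull(self):
        changes, self.position = self.master.changes_since(self.position)
        for key, version, value in changes:
            self.copy[key] = (version, value)
```

Note that the client code contains no conflict logic at all. A rejected update simply returns `False`, and the client ends up with the master's current value.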
This topology is common when a person has a PDA, a cellular phone, and a personal computer sharing an application such as the calendar application, and both the cellular phone and the PDA are synchronized with the personal computer (but not between themselves). This kind of interaction is also common when family members carry cellular phones and update their shared family Web calendar independently or when mobile employees in an enterprise update inventory datastores independently.
The drawback of this architecture is that the central master can become both a bottleneck and a single point of failure that could immobilize the entire system. Consider an Internet service provider scenario with a central master that serves several hundred thousand accounts, all trying to synchronize with the same central datastore. Here the central master should not be a single server but a cluster of high-performance servers, so that response latency stays low even if one of the servers fails.
In many-to-many (or peer-to-peer) topology, there is no central server. Every client is also a server, as shown in Figure 1-3. For simplicity in this chapter, the client/server combination on each device in the many-to-many topology is just called client.
Figure 1-3. Many-to-many topology
Every client gets updates from and sends updates to every other client. After a record on one client is updated, this client is responsible for updating all the other copies of the data on all the other clients to ensure that the consistency of the distributed datastore is maintained. It might do this by directly contacting the other clients or by sending the updates to nearby clients, which are then responsible for propagating them further.
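The second option, handing an update to nearby clients that pass it on, can be sketched as a simple flooding scheme. The following Python example is only an illustration under stated assumptions: every update carries a unique identifier so that a client can ignore updates it has already applied, and conflict detection is left out entirely. The `Peer` class and its methods are hypothetical names.

```python
class Peer:
    """A client that is also a server: it applies and forwards updates."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []   # peers reachable directly
        self.data = {}        # key -> value
        self.seen = set()     # identifiers of updates already applied
        self.counter = 0

    def connect(self, other):
        self.neighbors.append(other)
        other.neighbors.append(self)

    def local_update(self, key, value):
        # Assumed id scheme: peer name plus a local counter.
        self.counter += 1
        self.receive(f"{self.name}-{self.counter}", key, value)

    def receive(self, update_id, key, value):
        if update_id in self.seen:
            return            # already applied; stops endless forwarding
        self.seen.add(update_id)
        self.data[key] = value
        for peer in self.neighbors:
            peer.receive(update_id, key, value)
```

For example, if peers `a`, `b`, and `c` are connected in a chain, an update made on `a` reaches `c` through `b` even though `a` and `c` never talk to each other directly.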
Consequently, every client must be able to detect and resolve conflicts. This requires more complex software on each client, which naturally increases the implementation cost, especially on small mobile clients, like mobile phones, in which memory is a scarce resource.
Compared to the many-to-one topology, the many-to-many topology is more robust but also clearly adds to the complexity. In this topology, it is very difficult to find out if a modification was indeed propagated to all clients at a given point in time.
One advantage of the peer-to-peer topology is that without a central server, there is no single point of failure. Every client has a copy of the data and can act as a server. The clients can continue to work and exchange data despite failures in other parts of the network. A client can retrieve updates from the closest server in the network, which gives quicker access to data otherwise stored remotely.
This topology may occur whenever there is no notion of a primary datastore involved in the system. Consider a team of emergency response workers taking readings such as measured temperatures, toxin levels, and structural stress conditions in a building or an affected area. They can synchronize these readings as they pass by each other using direct wireless or infrared links between their handheld devices.
Hybrids of Many-to-One and Many-to-Many
In an effort to combine the advantages of many-to-one and many-to-many topologies, hybrids containing characteristics of both types can be used, as shown in Figure 1-4.
Figure 1-4. Hybrids of many-to-one and many-to-many topologies
The cluster variant has a two-level structure of data copies. The top level consists of a cluster of servers, and the clients connect to these servers. All servers contain copies of the data and replicate between each other, but for each data object only one server keeps the authoritative copy. If one server fails, the other servers are unaffected. Distributing the servers geographically can also reduce the distance between servers and clients.
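The idea of one authoritative copy per data object can be sketched as follows. This Python example makes assumptions that go beyond the text: the authoritative server for a key is chosen by a checksum of the key, writes arriving at any other server are forwarded to it, and it then replicates the value to all servers. The names `ClusterServer`, `owner_of`, and `make_cluster` are hypothetical, and failure handling is omitted.

```python
import zlib


class ClusterServer:
    """One server of the top-level cluster; holds a full copy of the data."""

    def __init__(self, index, cluster):
        self.index = index
        self.cluster = cluster   # list shared by all servers
        self.data = {}

    def owner_of(self, key):
        # Assumed rule: a stable checksum of the key picks the owner.
        return self.cluster[zlib.crc32(key.encode()) % len(self.cluster)]

    def write(self, key, value):
        owner = self.owner_of(key)
        if owner is not self:
            return owner.write(key, value)   # forward to authoritative copy
        for server in self.cluster:          # owner replicates to all copies
            server.data[key] = value
        return owner.index


def make_cluster(size):
    cluster = []
    for index in range(size):
        cluster.append(ClusterServer(index, cluster))
    return cluster
```

A client can send a write to whichever server is closest. The write ends up at the same authoritative server regardless of the entry point, and every server can answer reads from its own copy.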
In a hierarchy, the server structure could be modeled according to the organizational structure of a company. The top part of the figure shows servers, which are at the same time clients of a server one level above. In this structure, even when one section experiences a failure, the overall topology can still work properly.
In commercial implementations using a central master topology, the master server itself often consists of a cluster of servers accessing a central datastore. This setup provides high availability and reduces the risk that the central master becomes a single point of failure. Nevertheless, in this setup the servers are physically at the same location, and a network failure could make all of them unreachable. That would not be the case in the cluster or hierarchy topology described above.