Group Communication | Scalable Internet Architectures

Every distributed architecture has one fundamental component that is critical to its overall effectiveness: communication between the various participants. Communication between just two nodes on the Internet can occasionally pose problems too, but the methods and caveats are well understood and are the subject of any Introduction to Networking class. Basically, using UDP/IP for unreliable communication and TCP/IP for reliable communication covers the vast majority of point-to-point communication needs. What happens, though, when communication is needed between several participants, especially if they are not all part of the same LAN?

Point-to-point communication between all participants without any additional logic is obviously both expensive and chaotic. Reliable IP multicast can be employed in certain situations, but it is better suited to scenarios that involve a single sender and a large number of receivers whose identities are not necessarily important to the sender. In contrast, the problems that we want to tackle usually involve a relatively small number of participants whose identities must be known at all times and who may act simultaneously as both senders and receivers. This points us toward another possible communication paradigm: group communication.

The group communication paradigm provides a framework meant to ease the process of managing the communication aspect of distributed applications. The paradigm provides an intuitive abstraction and a set of communication primitives with meaningful and well-defined properties that are not trivial to satisfy in an asynchronous, unreliable network.

First, we identify the participants (or processes) in a distributed application as members of a group. Any member of a group can send messages to the entire group and also receive all the messages sent by the other members. Groups may also be open, in which case processes that are not part of the group are allowed to send messages to the entire group.

The second important abstraction, directly related to the notion of a group, is group membership. A group communication system provides each process with primitives that identify all the members of a group that the process is part of at any given moment. This may include notifications when new members join the group or when current members leave the group, either voluntarily or due to intermittent network communication issues or process crashes.
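The group and membership abstractions described above can be illustrated with a small in-process model. This is only a sketch under stated assumptions: the Group class, its method names, and the callback signature are hypothetical and do not correspond to any particular group communication system, and a real system would do all of this across a network.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Toy, single-process model of a group (hypothetical API, for illustration)."""
    name: str
    is_open: bool = False                            # open groups accept non-member senders
    members: set = field(default_factory=set)
    listeners: list = field(default_factory=list)    # membership-change callbacks
    inboxes: dict = field(default_factory=dict)      # member -> list of (sender, payload)

    def _notify(self, event, who):
        # Every listener sees the event plus a snapshot of the current membership.
        view = frozenset(self.members)
        for callback in self.listeners:
            callback(event, who, view)

    def join(self, member):
        self.members.add(member)
        self.inboxes.setdefault(member, [])
        self._notify("join", member)

    def leave(self, member, reason="voluntary"):
        # reason might be "voluntary", "network", or "crash" in this sketch.
        self.members.discard(member)
        self._notify("leave:" + reason, member)

    def send(self, sender, payload):
        if sender not in self.members and not self.is_open:
            raise PermissionError(f"{sender} is not a member of closed group {self.name}")
        # Deliver to every current member other than the sender.
        for member in self.members:
            if member != sender:
                self.inboxes[member].append((sender, payload))
```

A process would typically register a listener before joining, so that it observes every membership change, including departures caused by crashes or network problems rather than by an explicit leave.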

Finally, group communication systems provide primitives that enable and govern the communication between the participating processes. These primitives are basically group broadcast primitives whose properties are defined from two perspectives:

Reliability
Ordering

Messages sent to a group may be unreliable or reliable. Unreliable messages may be lost and are not recovered by the group communication system. Reliable messages are received by all members of a group, as long as those members do not crash or become otherwise disconnected from the group.
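The text does not say how a system recovers lost reliable messages. One common approach, assumed here purely for illustration, is for each sender to number its messages so that a receiver can detect gaps and ask for the missing ones again. The GapDetector class and its method names are hypothetical.

```python
class GapDetector:
    """Tracks per-sender sequence numbers and reports which ones are missing.

    Assumption for this sketch: each sender numbers its messages 0, 1, 2, ...
    """

    def __init__(self):
        self.next_expected = {}   # sender -> lowest sequence number not yet received
        self.ahead = {}           # sender -> sequence numbers received out of order

    def on_receive(self, sender, seq):
        """Record one message and return the sequence numbers still missing."""
        expected = self.next_expected.get(sender, 0)
        got = self.ahead.setdefault(sender, set())
        if seq >= expected:       # anything lower is a duplicate and is ignored
            got.add(seq)
        highest = max(got, default=expected - 1)
        # Advance past the contiguous prefix that has now arrived.
        while expected in got:
            got.remove(expected)
            expected += 1
        self.next_expected[sender] = expected
        # Everything between the contiguous prefix and the highest seen is a gap.
        return [s for s in range(expected, highest) if s not in got]
```

A receiver could pass the returned list to whatever retransmission request mechanism the system offers. An unreliable service would simply skip this bookkeeping and accept the loss.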

The ordering guarantees define the order in which messages are delivered by the group communication system to each recipient. Several common ordering guarantees are identified here:

FIFO ordering: If process X sends messages A and B, in this order, all members of the group who receive A and B will receive them in the order in which they were sent.
Causal ordering: If messages A and B are sent by process X in this order, or if process X sends message B subsequent to receiving message A sent by another member, all members of the group will receive message A before receiving message B (B is potentially causally dependent on A). Causal order is an extension of FIFO ordering.
Total ordering: If process X receives messages A and B in this order, any process Y that receives messages A and B will receive them in the same order. Total ordering is not necessarily consistent with FIFO or causal ordering, although in practice it is particularly useful when combined with causal or at least FIFO guarantees.
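To make the causal guarantee concrete, the following sketch uses vector clocks with a hold-back queue, a well-known technique for causal delivery that is assumed here and not taken from the text. Each message carries a count of how many messages its sender had delivered from every member, and a receiver delays a message until it has delivered everything the message could depend on. The CausalProcess class and its method names are hypothetical, and the sketch assumes a fixed, known membership.

```python
class CausalProcess:
    """One group member that delivers messages in causal (and hence FIFO) order."""

    def __init__(self, me, members):
        self.me = me
        self.clock = {m: 0 for m in members}   # messages delivered so far, per sender
        self.pending = []                      # hold-back queue of undeliverable messages
        self.delivered = []                    # payloads in delivery order

    def send(self, payload):
        # Count our own message, then stamp it with a copy of the clock.
        self.clock[self.me] += 1
        return (self.me, dict(self.clock), payload)

    def _deliverable(self, sender, stamp):
        # Must be the next message from this sender (the FIFO condition) ...
        if stamp[sender] != self.clock[sender] + 1:
            return False
        # ... and we must already have delivered everything the sender had seen.
        return all(stamp[m] <= self.clock[m] for m in stamp if m != sender)

    def receive(self, message):
        self.pending.append(message)
        progress = True
        while progress:                        # one delivery may unblock others
            progress = False
            for msg in list(self.pending):
                sender, stamp, payload = msg
                if self._deliverable(sender, stamp):
                    self.pending.remove(msg)
                    self.clock[sender] = stamp[sender]
                    self.delivered.append(payload)
                    progress = True


# Usage: Y sends B after receiving A, but B reaches Z before A does.
members = ["x", "y", "z"]
x, y, z = (CausalProcess(m, members) for m in members)
a = x.send("A")
y.receive(a)
b = y.send("B")
z.receive(b)        # held back: Z has not yet delivered A
z.receive(a)        # A is delivered, which then releases B
```

This sketch does not provide total ordering: two processes may still deliver concurrent, causally unrelated messages in different orders.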