Discovery answers the big questions about a network:
Any P2P technology is forced to answer these questions. Unfortunately for the Java developer, not all P2P technologies are successful. Worse yet, some P2P technologies are closed and proprietary, or they hard-code implementations into one solution that would otherwise use open technology. Although many techniques exist for building peers, three types of peers have emerged as popular designs: simple peers, rendezvous peers, and router peers.
A simple peer is designed to be an endpoint that offers functions and data to peers making requests. Simple peers have the least responsibility of all three peer types. They usually reside outside a general network, and possibly behind a firewall or Network Address Translation (NAT) router. Simple peers are not expected to handle communication on behalf of other peers, or to serve information that they don't directly consume themselves.
Rendezvous peers provide a dating service in which peers discover other peers and peer resources such as data and functions. All three types of peers issue discovery queries to rendezvous peers, but the rendezvous peer usually also acts as a cache of previous requests. When a rendezvous peer lives behind a firewall, it must be able to communicate through the firewall with other peers.
Router peers provide a mechanism for peers to communicate through firewalls and NAT routers. A router peer tunnels peer requests across a network. The information needed to use a router peer is enough to replace the need for the Domain Name System (DNS), and it supports dynamic IP addressing.
Let's look at a simple example of the three peers in action. Imagine using a P2P client that looks for magazine articles on human genomics. The user initiates a search for the articles with a simple peer. The peer sends a discovery query to all its known simple peers and rendezvous peers. The rendezvous peers that receive the query check whether they have the data the simple peer is looking for. If so, the rendezvous peer might return a discovery response message containing advertisements from other peers that are stored in its cache. The rendezvous peer will also likely pass the same query along to its list of known peers.
Although we have described three different types of peers, in real-world P2P applications each peer might combine the functions described for simple, rendezvous, and router peers.
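The query flow in the genomics example can be sketched in a few lines of Java. This is only an illustration and not the API of any P2P library: the class `RendezvousSketch`, its methods, and the hop limit are hypothetical names and assumptions. The hop limit is added so the in-memory forwarding cannot loop forever.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical rendezvous peer: caches advertisements and answers discovery queries. */
class RendezvousSketch {
    // keyword -> advertisements previously received from other peers
    private final Map<String, List<String>> cache = new HashMap<>();
    // peers to which this rendezvous passes queries along
    private final List<RendezvousSketch> knownPeers = new ArrayList<>();

    void publish(String keyword, String advertisement) {
        cache.computeIfAbsent(keyword, k -> new ArrayList<>()).add(advertisement);
    }

    void addKnownPeer(RendezvousSketch peer) {
        knownPeers.add(peer);
    }

    /** Answers from the local cache, then forwards the same query while hops remain. */
    List<String> discover(String keyword, int hopsLeft) {
        List<String> results = new ArrayList<>(cache.getOrDefault(keyword, List.of()));
        if (hopsLeft > 1) {
            for (RendezvousSketch peer : knownPeers) {
                results.addAll(peer.discover(keyword, hopsLeft - 1));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        RendezvousSketch first = new RendezvousSketch();
        RendezvousSketch second = new RendezvousSketch();
        first.addKnownPeer(second);
        second.publish("genomics", "advertisement from a hypothetical article peer");
        // The first rendezvous has nothing cached, so the answer comes from the second.
        System.out.println(first.discover("genomics", 2));
    }
}
```

A real rendezvous peer would exchange messages over the network and would track which queries it has already seen. The sketch keeps everything in one process to show only the cache-then-forward behavior.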
Let's look at how peers discover data, functions, and services using a variety of P2P techniques.
Router Peers and Dynamic Networks
P2P technology expects to find a network filled with firewalls, dynamic addresses, and changing peer locations. P2P provides a loose coupling of peers, so the P2P network remains functional even when parts of the real network break. Three P2P discovery techniques have become popular in this environment:
These techniques will be joined, modified, and abandoned over time as new ways to dynamically form a network are identified. The following are some of the areas of study from which P2P technology innovations might spring:
Broadcasts
Traditionally, broadcast messages have been sent by devices that deal with network routing or data packet exchange at a low level, such as routers. Broadcast messages on IP networks contain a special address reserved for broadcasting. The network and host parts of the address are set to all ones (hex: FFFFFFFF). This indicates to the network layer that the packet is addressed to every device on the subnet, as seen in Figure 6.1.
Figure 6.1. Broadcasts try to reach all nodes on the subnet.
In a P2P context, broadcasting might sound like TCP/IP multicasting, but it isn't. P2P technology operates mostly at the application layer. The actual method for moving a broadcast message across the Internet might use multicasting or a number of other techniques that we will explore next.
Transport Multicast Versus Unicast Messaging
Multicast messaging is often compared to radio or TV broadcasts, in the sense that only those who have tuned their receivers to a particular frequency receive the information. Only the channels selected are heard. The sender sends the information without knowing the number of receivers. In contrast, when you send a packet and there is only one sender and one recipient, this is referred to as unicast. A unicast transmission is by definition point-to-point. Unicast can be used to send identical information to many different destinations; however, this involves replicating data, and is not the most efficient transport. Multicast addresses are in the Class D range, 224.0.0.0 through 239.255.255.255. Multicast messaging uses this range of addresses to define multicast groups, as shown in Table 6.1.
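The Class D range can be checked from Java without opening a socket. The following minimal sketch relies on the standard `java.net.InetAddress.isMulticastAddress()` method. The wrapper class `MulticastRangeCheck` and the sample addresses are illustrative choices, not part of the original text.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class MulticastRangeCheck {
    /** Returns true when the literal IP address is a multicast (Class D for IPv4) address. */
    static boolean isMulticast(String literalAddress) {
        try {
            // Given a literal IP address, getByName only validates the format; no lookup occurs.
            return InetAddress.getByName(literalAddress).isMulticastAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid address: " + literalAddress, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isMulticast("224.0.0.1"));        // true: start of the Class D range
        System.out.println(isMulticast("239.255.255.255"));  // true: end of the Class D range
        System.out.println(isMulticast("192.168.1.10"));     // false: ordinary unicast address
        System.out.println(isMulticast("255.255.255.255"));  // false: the broadcast address
    }
}
```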
Note: You can find all the reserved multicast addresses at http://www.iana.org/assignments/multicast-addresses.
Multicasting has produced mixed results in applications that require a number of machines in a distributed group to receive the same data, such as conferencing, group mail, news distribution, and network management. Multicasting suffers from the lack of a control protocol, which makes it unsuitable for large, reliable, and sustained transmissions. It nevertheless appears to be well suited to P2P: multicasting often fails to deliver 100% of its data to everyone listening, but peers on a P2P network do not require their data to be synchronized. Figure 6.2 shows multicasting being used in P2P networks for discovery.
Figure 6.2. Multicasting goes beyond simple subnet penetration, but it requires that receivers listen on a specific "channel." The underlying network supports the transport services.
Multicast advantages include the following:
Unfortunately, multicasting is not implemented everywhere. Hardware, specifically routers, often blocks multicast traffic from penetrating corporate networks or traversing ISPs. Firewalls and NAT devices often not only block multicast traffic but also constrain traffic in general to well-controlled choke points (ports). As a result, additional means of discovery are generally required in scalable P2P networks.
Radius of Broadcast
Broadcast packets need a mechanism to avoid bouncing around the network forever. This can happen when invalid addressing or routing information is delivered with a packet. The time-to-live (TTL) parameter, an 8-bit field in the IP packet header, has been defined to address this issue. It ensures that packets cannot traverse the network endlessly. Each packet has a TTL value, which is a counter that is decremented every time the packet passes through a hop, for instance, a router between networks. In the example in Figure 6.3, the TTL parameter is set to 4, and the broadcast request needs to make five hops (pass through five routers) to reach the nearest peer. Peer-2 will never "hear" the broadcast request, and Peer-1 will never "know" about Peer-2 through this route. The packet will be discarded when the TTL count reaches zero.
Figure 6.3. Time-to-live parameters define the extent to which a packet can travel across the network. Routers typically decrement the TTL value of the packet as it passes through the router. When it reaches zero, the packet is discarded.
When a peer receives a request, it looks at the TTL value. If the value is greater than 1, it decrements the value and transfers the request to the destination address or the next hop. If the value is 1 or less, it discards the message. In this respect, the P2P network is providing a layer of control that "overlays" the network layer.
Frequency of Broadcast
Most systems that use broadcast techniques place some control on the frequency of the broadcast.
For instance, when a peer activates, it sends a discovery message on the local subnet and waits a predetermined time before sending another discovery request. If no response is returned within that time interval, a subsequent request will be sent. In effect, the peer has started to poll the network. If responses are returned, the peer builds a map, or view of the peer network. This is important, because the peer view is probably very different from the physical view. The map reflects the peers that responded to the discovery request. As peers enter and leave the network, they must be able to update their view. One approach is to go into a heartbeat mode of polling. The peer periodically sends a discovery request. As responses are received, the map is updated. During the polling process, some peers might no longer be available. In Java fashion, these peers are eventually removed when the Java garbage collector destroys the object holding the instantiated map. New peers that respond will be added to the map. A simple ping map contains the list of peers that have responded to discovery requests. The ping map can be as simple as a list of active IP addresses, as in Table 6.2.
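A ping map with heartbeat-style updates might look like the following minimal sketch. It assumes that each response refreshes a timestamp and that peers are dropped explicitly once they have been silent longer than a timeout. The class `PingMap`, the timeout, and the addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical and not taken from the text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical ping map: IP address -> time of the most recent discovery response. */
class PingMap {
    private final Map<String, Long> lastSeen = new HashMap<>();
    private final long timeoutMillis;

    PingMap(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called whenever a peer answers a discovery request. */
    void recordResponse(String ipAddress, long nowMillis) {
        lastSeen.put(ipAddress, nowMillis);
    }

    /** Called after each heartbeat round to drop peers that have stopped responding. */
    void evictStale(long nowMillis) {
        lastSeen.values().removeIf(seen -> nowMillis - seen > timeoutMillis);
    }

    /** Returns a sorted snapshot of the peers currently in the map. */
    Set<String> activePeers() {
        return new TreeSet<>(lastSeen.keySet());
    }

    public static void main(String[] args) {
        PingMap map = new PingMap(30_000);
        map.recordResponse("192.0.2.10", 0);
        map.recordResponse("192.0.2.11", 0);
        // Second heartbeat: only one peer answers again.
        map.recordResponse("192.0.2.10", 40_000);
        map.evictStale(40_000);
        System.out.println(map.activePeers());
    }
}
```

Passing the clock value in as a parameter keeps the sketch deterministic. A running peer would more likely call `System.currentTimeMillis()` and drive the heartbeat from a scheduler.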
The ping map, which might also be viewed as a peer routing table, is built from scratch each time the peer activates. In this model, the peer does not implement the notion of memory. In other words, each time the peer activates, it invokes the discovery process and collects a new image of the peer network. This approach is unable to deal with many of the problems inherent in P2P networks. For instance:
The identity of the peer is directly mapped (implicitly) to the IP address. If a peer changes its IP address, it is considered a new member of the network. A history of prior interactions is not possible. The ping map can be extended to include the notion of identity, which resolves some of the problems. Persistence or memory of the peer network becomes more viable and attractive with identity. This approach requires each peer to have a unique ID. Once generated, the ID is fixed for the lifetime of the peer. When a discovery request is received, the responding peer returns its IP address (which might be different) and its unique ID (which never changes). This assumes that peers have a consistent method to generate unique IDs (see Table 6.3). ID collision occurs if two peers generate the same ID. Inconsistent ID representation (integer, String, UUID, and so on) causes identification problems throughout the network. Clearly, there are control mechanisms required even when using this simple approach.
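The identity map can be sketched by keying the map on a fixed peer ID instead of the IP address. The example below uses `java.util.UUID` as the single, consistent ID representation. That choice is an assumption made for illustration, as are the class `IdentityMap` and its method names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical identity map: fixed peer ID -> most recently reported IP address. */
class IdentityMap {
    private final Map<UUID, String> addressById = new HashMap<>();

    /**
     * Records a discovery response.
     * Returns true when the ID was already known, that is, when a known peer has
     * responded again, possibly from a new address.
     */
    boolean recordResponse(UUID peerId, String ipAddress) {
        return addressById.put(peerId, ipAddress) != null;
    }

    String addressOf(UUID peerId) {
        return addressById.get(peerId);
    }

    int size() {
        return addressById.size();
    }

    public static void main(String[] args) {
        IdentityMap map = new IdentityMap();
        UUID peerId = UUID.randomUUID();          // generated once, then fixed for the peer's lifetime
        map.recordResponse(peerId, "192.0.2.10");
        map.recordResponse(peerId, "192.0.2.99"); // same peer, new address
        System.out.println(map.size() + " peer at " + map.addressOf(peerId));
    }
}
```

Because the key never changes, a peer that returns with a different address updates its existing entry rather than appearing as a new member. Randomly generated UUIDs make ID collisions very unlikely, but they do not rule them out.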
Selective Broadcast
Instead of sending a discovery request to every peer on the network, peers are selected based on heuristics such as quality of service, content availability, or trust relationships. Trust relationships are commonly used when one or more specific peers act as a relay or router to the peer network. Usually the trusting peer is seeded with the IP address of the trusted peer. This is the technique used by JXTA routing and rendezvous peers. The trusted peer has some knowledge of the network and is publicly available. Selective broadcast requires that you maintain historical information on peer interactions, peer roles, peer identity, and so on. It begins to extend the ping and identity map concept to include the following:
Selective broadcast systems are much more scalable than simple broadcast networks. Instead of being sent to all peers, a request is selectively forwarded to specific peers that have a higher probability of being able to locate other peers or resources. Each peer must contain or have access to the information used to route or direct the requests it receives. Although this might be appropriate for relatively small networks, in larger networks this overhead can quickly grow to levels that are unsupportable.
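One way to picture selective forwarding is to score each known peer and send the request only to the best few. The sketch below is an assumption-laden illustration: the `PeerInfo` fields, the scoring formula, and the class `SelectiveBroadcast` are hypothetical and stand in for whatever heuristics a real system would maintain.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SelectiveBroadcast {
    /** Hypothetical per-peer history kept by the selecting peer. */
    static final class PeerInfo {
        final String id;
        final boolean trusted;     // for example, a seeded rendezvous or router peer
        final double successRate;  // fraction of past requests this peer answered, 0.0 to 1.0

        PeerInfo(String id, boolean trusted, double successRate) {
            this.id = id;
            this.trusted = trusted;
            this.successRate = successRate;
        }

        /** Illustrative heuristic: trusted peers always rank above untrusted ones. */
        double score() {
            return (trusted ? 1.0 : 0.0) + successRate;
        }
    }

    /** Picks the IDs of the highest-scoring peers instead of addressing every known peer. */
    static List<String> selectTargets(List<PeerInfo> known, int maxTargets) {
        return known.stream()
                .sorted(Comparator.comparingDouble(PeerInfo::score).reversed())
                .limit(maxTargets)
                .map(peer -> peer.id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<PeerInfo> known = List.of(
                new PeerInfo("peer-c", false, 0.1),
                new PeerInfo("peer-a", true, 0.2),
                new PeerInfo("peer-b", false, 0.9));
        System.out.println(selectTargets(known, 2));  // prints [peer-a, peer-b]
    }
}
```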
Adaptive Broadcast
As mentioned in Chapter 1, "What Is P2P?," adaptive broadcast tries to minimize network utilization while maximizing connectivity to the network. You can limit the growth of discovery and searching by predefining a resource tolerance level that, if exceeded, will begin to curtail the process. This will ensure that excessive resources are not being consumed because of a malfunctioning element, a misguided peer, or a malicious attack. Adaptive broadcast requires monitoring resources such as peer identity, queue size, port usage, and message frequency. Rules can be used to complement metadata to build sophisticated discovery techniques (see Table 6.5).
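A tolerance level on message frequency, one of the monitored resources listed above, can be sketched as a simple counter over a fixed time window. This is a generic rate-limiting illustration under assumed names (`DiscoveryThrottle`, `tryAcquire`) and assumed limits. It is not the mechanism used by any particular P2P protocol.

```java
/** Hypothetical tolerance check: at most maxMessages discovery messages per time window. */
class DiscoveryThrottle {
    private final int maxMessages;
    private final long windowMillis;
    private long windowStart = 0;
    private int sentInWindow = 0;

    DiscoveryThrottle(int maxMessages, long windowMillis) {
        this.maxMessages = maxMessages;
        this.windowMillis = windowMillis;
    }

    /** Returns true if a message may be sent now, false once the tolerance level is exceeded. */
    boolean tryAcquire(long nowMillis) {
        if (nowMillis - windowStart >= windowMillis) {
            // A new window begins: reset the counter.
            windowStart = nowMillis;
            sentInWindow = 0;
        }
        if (sentInWindow >= maxMessages) {
            return false;  // curtail discovery until the window rolls over
        }
        sentInWindow++;
        return true;
    }

    public static void main(String[] args) {
        DiscoveryThrottle throttle = new DiscoveryThrottle(2, 1_000);
        System.out.println(throttle.tryAcquire(0));      // true
        System.out.println(throttle.tryAcquire(10));     // true
        System.out.println(throttle.tryAcquire(20));     // false: limit of 2 reached
        System.out.println(throttle.tryAcquire(1_000));  // true: new window
    }
}
```

Keeping one such throttle per remote peer ID would allow a misbehaving peer to be curtailed without slowing discovery for the rest of the network.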
The ALPINE Network implements a form of adaptive broadcast in its adaptive social discovery protocol. It's based on the ALPINE-defined datagram protocol DTCP. See www.cubicmetercrystal.com/alpine/overview.html for more information on ALPINE networks and protocols.