In P2P systems, the transport refers to how messages are exchanged between peers. Peers must have some mechanism to transmit data over the network. A number of low-level activities must be managed to ensure orderly communication, and low-level network programming can be difficult. Fortunately, Java reduces the learning curve. As discussed earlier, the Java programming API hides the underlying details of network programming. This is both a blessing and, for some, a curse. The programming model is similar to the file programming model, so if you are experienced with file streams on local systems, reading from and writing to remote systems should be familiar. In addition, Java's built-in multithreading capabilities ease the development of applications that must handle multiple concurrent connections. Multiple concurrent connections are important to enabling peer group formation.

Chapter 6, "P2P Dynamic Networks," discusses the components and functions common to P2P networks. The transport is a critical component because of the impact dynamic discovery has on the performance of the overall system, and because of the frequency with which peers exchange data over the network. Placing the transport in context will help clarify the issues.

Models of Communication

The OSI reference model and the Department of Defense (DoD) Four-Layer model, illustrated in Figure 5.1, define and describe network communication services and protocols. They have become the standard reference models for application-to-application (A2A) communication.

Figure 5.1. The OSI reference model and DoD model are used to describe the layers of communication in networked applications. The DoD model is referenced as the model of the Internet.

The DoD model is the basis for the Internet protocols that are commonly used today. The model defines four layers: network access, internet, host-to-host, and process.
From a P2P perspective, the most noteworthy layers are the host-to-host and process layers of the model. More specifically, at the host-to-host layer you are concerned with the two workhorse protocols, TCP and UDP, while the focus of the process layer is on HTTP, SMTP, and new P2P entrants such as XML-defined JXTA.

Discovery Implications

Peer discovery requires that you make a number of transport design decisions. For instance, do you use connection-oriented (TCP) or connectionless (UDP) communication to discover peers? TCP creates a virtual circuit that remains open and connected during the life of the session, and guarantees a certain level of reliability during the exchange and thus the discovery process. This connection-oriented transport generally has additional overhead because of the number of packets that must be exchanged and acknowledged. The alternative is to use a connectionless transport such as UDP, which does not guarantee packet delivery and does not maintain a session for communication. The tradeoff is speed versus reliability. Although most broadcast and multicast styles of discovery use connectionless communication, there are systems, such as Gnutella, that broadcast over connection-oriented TCP.

Virtual Namespaces

As mentioned before, dynamic networks are an essential characteristic of P2P. Dynamic networks are enabled by discovery, identity, presence, and a virtual space (network) bound by the discovery horizon. Java uses the InetAddress class to represent an Internet network address. This class supports the getByName method, which retrieves the IP address of a host given its name. InetAddress makes it easy to map a host name to its address using the local and network naming services available, including DNS and NIS. An InetAddress object contains two fields: the host name and the IP address.
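The name-to-address mapping described above can be sketched in a few lines. This is a minimal illustration, not code from the chapter: the class name AddressLookup and the helper resolveToIp are hypothetical names introduced here, while InetAddress.getByName, getHostName, and getHostAddress are standard java.net calls.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper class, introduced only for this sketch.
final class AddressLookup {

    // Resolves a host name (or a literal IP address) to its textual IP address.
    // The checked UnknownHostException is wrapped so callers can use one line.
    static String resolveToIp(String host) {
        try {
            InetAddress address = InetAddress.getByName(host);
            return address.getHostAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Cannot resolve host: " + host, e);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // A literal IP address is only checked for format; no lookup is performed.
        System.out.println(resolveToIp("127.0.0.1"));

        // A name such as "localhost" goes through the local and network
        // naming services, so the result depends on the machine's configuration.
        InetAddress local = InetAddress.getByName("localhost");
        System.out.println(local.getHostName() + " -> " + local.getHostAddress());
    }
}
```

Because resolution depends on the naming services configured on the machine, a P2P application would normally treat a failed lookup as a recoverable condition rather than a fatal error.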
Virtual namespaces extend addressing beyond IP addresses, and might actually map a machine address to another domain-specific representation. As mentioned in Chapter 3, "P2P Application Types," identity and presence services are not dependent on IP addresses. They might map an identity to an IP address at runtime, but this mapping is needed more to resolve routing than to establish identity. Namespaces are important because they help to establish a context for higher-level communication. This context is required if peer applications are to interoperate.

Routing

Routing of P2P traffic occurs at the network and application levels. From a network perspective, routing refers to the path the network chooses to transport a message. This usually involves charting a course over a number of intermediate routing hops. The network and subnet portions of the IP address are used to determine the route's source and destination. The actual path the message traverses between these endpoints depends on the routing protocol. Different routing protocols are used to improve efficiency and optimize network usage.

P2P systems address higher-layer (application) routing issues, which might affect (overlay) network-layer routing. For instance, alternate routing often involves establishing paths between edge devices that would not normally be possible. In P2P networks, the path between the nodes emerges through the information sharing patterns, rather than being enforced by static configurations.

Firewalls and NAT (Network Address Translation) devices are often used to block transmissions or constrain point-to-point communication. P2P networks have come up with novel ways of addressing these constraints and restrictions. NAT devices enable nodes in a private network address domain to access the Internet when an insufficient number of unique public network addresses is available. They are also used to hide the identity of individual nodes from the network at large.
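As a rough illustration of the private address domains that NAT devices serve, the following sketch checks whether an IPv4 address falls in one of the private (site-local) ranges. The class PrivateAddressCheck and the method isPrivate are hypothetical names introduced here; InetAddress.isSiteLocalAddress is a standard java.net call. A private address only suggests that a peer sits behind a NAT device. It does not prove it, and a public address does not rule out a firewall.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper class, introduced only for this sketch.
final class PrivateAddressCheck {

    // Returns true when the literal IP address lies in a private (site-local)
    // range: 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16 for IPv4.
    static boolean isPrivate(String literalIp) {
        try {
            // For a literal IP address, getByName only validates the format.
            return InetAddress.getByName(literalIp).isSiteLocalAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid address: " + literalIp, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isPrivate("192.168.1.10")); // true
        System.out.println(isPrivate("10.0.0.5"));     // true
        System.out.println(isPrivate("8.8.8.8"));      // false
    }
}
```

A peer could run a check like this against its own address as a first hint about whether other peers will be able to reach it directly.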
Firewalls provide security by limiting the network traffic allowed between the Internet and a private network. Firewalls employ three basic techniques:
Most firewall and NAT configurations do not permit bidirectional communication between two peers separated by NAT devices or firewalls. Solutions permitting outbound connectivity are usually more mature than solutions enabling inbound connectivity. This is because externally initiated communication is assumed to be untrustworthy, and is less common in client/server computing. NAT devices and firewalls also typically limit communication so that only responses to requests that originated behind the devices are allowed back in. This blocks communication from hosts outside the protected domain to hosts inside the domain. Often, certain ports and services are closed down completely and access is denied.

These restrictions reduce the utility of peer-to-peer networks by prohibiting public nodes from initiating communication with blocked nodes, and by preventing direct communication between pairs of blocked nodes. This dramatically impedes the formation of a P2P network. If you are behind a firewall, special configuration will be required to enable participation. The Peer-to-Peer Working Group (www.peer-to-peerwg.org) has identified three primary techniques for overcoming firewall and NAT routing restrictions:
These solutions rely on a third-party host running special software that helps nodes behind firewalls detect what kind of connectivity they have and helps broker connections with other nodes. The following brokering techniques (see Figure 5.2) are often used:
Figure 5.2. Three common models of peer-to-peer routing around and through firewalls and NAT devices.

Overlay networks are an area of rapid development and change. Expect this to continue to evolve as peer-to-peer technologies become more prevalent in government, university, and corporate settings.