Transport | Java P2P Unleashed

In P2P systems, the transport refers to how messages are exchanged between peers. Peers must have some mechanism to transmit data over the network. There are a number of low-level activities that must be managed in order to ensure orderly communication, and low-level network programming can be difficult.

Fortunately, Java reduces the learning curve. As discussed earlier, the underlying details of network programming have been hidden by the Java programming API. This is both a blessing and, for some, a curse. The programming model is similar to the file programming model, so if you are experienced with file streams on local systems, reading from and writing to remote systems should be familiar. In addition, Java's built-in multithreading capabilities ease the development of applications that must handle multiple concurrent connections. Multiple concurrent connections are important in enabling peer group formation.
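To make the file-stream analogy concrete, here is a minimal sketch of a listener that hands each accepted connection to its own thread and a client that talks to it through ordinary readers and writers. The class name EchoPeer, the one-line "echo" exchange, and the use of the loopback interface are assumptions made for illustration; the lambda syntax requires Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; not part of any P2P library.
class EchoPeer {
    // Wraps the socket's input stream exactly as you would wrap a FileInputStream.
    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    // Wraps the socket's output stream; 'true' flushes on every println.
    static PrintWriter writer(Socket s) throws IOException {
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    // Serves one connection: read a line, write it back with a prefix, then close.
    static void handle(Socket socket) {
        try (Socket s = socket; BufferedReader in = reader(s); PrintWriter out = writer(s)) {
            out.println("echo: " + in.readLine());
        } catch (IOException e) {
            System.err.println("connection failed: " + e.getMessage());
        }
    }

    // Listens on an ephemeral loopback port and starts one thread per accepted connection.
    static ServerSocket startListener() throws IOException {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread acceptor = new Thread(() -> {
            while (!server.isClosed()) {
                try {
                    Socket s = server.accept();
                    new Thread(() -> handle(s)).start();
                } catch (IOException e) {
                    return; // accept() fails once the listener is closed
                }
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
        return server;
    }

    // Connects to the listener, sends one line, and returns the one-line reply.
    static String send(int port, String message) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = writer(s);
             BufferedReader in = reader(s)) {
            out.println(message);
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = startListener()) {
            System.out.println(send(server.getLocalPort(), "hello"));
        }
    }
}
```

A thread per connection is the simplest way to show concurrency here; a real peer would more likely bound the number of threads with a pool.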

Chapter 6, "P2P Dynamic Networks," discusses the components and functions common to P2P networks. The transport is a critical component because of the impact dynamic discovery has on the performance of the overall system, and the frequency with which peers exchange data over the network. Placing the transport in context will help clarify the issues.

Models of Communication

The OSI reference model and the Department of Defense (DoD) Four-Layer model, illustrated in Figure 5.1, define and describe network communication services and protocols. They have become the standard reference models for application-to-application (A2A) communication.

Figure 5.1. The OSI reference model and DoD model are used to describe the layers of communication in networked applications. The DoD model is referenced as the model of the Internet.


The DoD model is the basis for the Internet protocols that are commonly used today. The model defines four layers:

The network access layer is responsible for the physical transmission of data over specific hardware media. The model's creators deliberately left the protocols at this layer unspecified so that a wide array of hardware media could be supported. Protocols common to this layer include Ethernet, Token Ring, X.25, and Frame Relay.
The Internet layer is responsible for routing and for providing a single interface to higher-layer protocols. The Internet layer has logical routing intelligence, while the network access layer has physical point-to-point responsibility. ICMP, the protocol behind the popular "ping" utility, resides in this layer.
The host-to-host layer abstracts the complexities of the network from applications. It handles connection rendezvous, flow control, retransmission of lost data, and other generic data flow management. The Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) are most often cited as the workhorses of this layer.
The process layer contains protocols that implement user-level functions. These include telnet for remote access, FTP for file transfer, NFS for file sharing, and SMTP for mail delivery. These protocols implement the set of functions most readily known to users of distributed applications.

From a P2P perspective, the most noteworthy layers are the host-to-host and process layers of the model. More specifically, at the host-to-host layer you are concerned with the two workhorse protocols, TCP and UDP, while the focus of the process layer is on HTTP, SMTP, and new P2P entrants such as XML-defined JXTA.

Discovery Implications

Peer discovery requires that you make a number of transport design decisions. For instance, do you use connection-oriented (TCP) or connectionless (UDP) communication to discover peers?

TCP creates a virtual circuit that remains open and connected during the life of the session, and guarantees a certain level of reliability during the exchange and thus the discovery process. This connection-oriented transport generally has additional overhead because of the number of packets that must be exchanged and acknowledged.

The alternative is to use a connectionless transport such as UDP, which does not guarantee packet delivery and does not maintain a session for communication. The tradeoff is speed gained at the cost of reliability.
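The connectionless style can be sketched with Java's DatagramSocket. The example below sends a single datagram over the loopback interface and waits a bounded time for it, treating a timeout as "nothing arrived" because UDP itself reports no delivery failure. The class name UdpProbe, the "DISCOVER" payload, and the 512-byte buffer are assumptions chosen for illustration, not part of any discovery protocol.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; not part of any P2P library.
class UdpProbe {
    // Sends one datagram. No connection is set up and no acknowledgment is expected.
    static void send(DatagramSocket from, InetAddress host, int port, String text) throws IOException {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        from.send(new DatagramPacket(data, data.length, host, port));
    }

    // Waits up to timeoutMillis (must be > 0) for one datagram; returns null if none arrives.
    static String receive(DatagramSocket on, int timeoutMillis) throws IOException {
        on.setSoTimeout(timeoutMillis);
        byte[] buffer = new byte[512];
        DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
        try {
            on.receive(packet);
        } catch (SocketTimeoutException e) {
            return null; // the caller decides whether to retry or give up
        }
        return new String(packet.getData(), packet.getOffset(), packet.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket listener = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket(0, loopback)) {
            send(sender, loopback, listener.getLocalPort(), "DISCOVER");
            System.out.println(receive(listener, 1000));
        }
    }
}
```

Any retry or acknowledgment logic has to live in the application, which is the reliability cost that TCP would otherwise absorb.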

Although most broadcast and multicast styles of discovery use connectionless communication, some systems, such as Gnutella, broadcast over connection-oriented TCP links.

Virtual Namespaces

As mentioned before, dynamic networks are an essential characteristic of P2P. Dynamic networks are enabled by discovery, identity, presence, and a virtual space (network) bound by the discovery horizon.

Java uses the InetAddress class to represent an Internet network address. This class supports the getByName method, which retrieves the IP address of a host given its name. InetAddress makes it easy to map a host name to its address using the local and network naming services available, including DNS and NIS. An InetAddress carries two pieces of information: the host name and the IP address.
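As a brief illustration, the sketch below resolves a name with InetAddress.getByName and prints both pieces of information. The class name AddressLookup and the default of "localhost" are assumptions for the example; the output depends on the naming services configured on your machine, so none is shown.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical example class.
class AddressLookup {
    // Accepts a host name or an IP literal; a literal is only format-checked, not looked up.
    static InetAddress lookup(String hostOrLiteral) throws UnknownHostException {
        return InetAddress.getByName(hostOrLiteral);
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress address = lookup(args.length > 0 ? args[0] : "localhost");
        System.out.println("host name : " + address.getHostName());
        System.out.println("IP address: " + address.getHostAddress());
    }
}
```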

Virtual namespaces extend addressing beyond IP addresses, and might actually map a machine address to another domain-specific representation. As mentioned in Chapter 3, "P2P Application Types," identity and presence services are not dependent on IP addresses. They might map an identity to an IP address at runtime, but this mapping is required to address routing issues more than for identification. Namespaces are important because they help to establish a context for higher-level communication. This context is required if peer applications are to interoperate.
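One way to picture a virtual namespace is as a table that binds a stable peer identity to whatever endpoint the peer currently uses. The sketch below is an assumption-laden illustration, not JXTA's or any other system's actual mechanism: the class PeerNamespace, its method names, and the "peer:alice" identifier format are all hypothetical.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory namespace: identity -> current network endpoint.
class PeerNamespace {
    private final Map<String, InetSocketAddress> endpoints = new ConcurrentHashMap<>();

    // A peer announces (or re-announces) where it can be reached right now.
    void announce(String peerId, InetSocketAddress endpoint) {
        endpoints.put(peerId, endpoint);
    }

    // A peer that goes offline is removed; its identity remains valid for later use.
    void withdraw(String peerId) {
        endpoints.remove(peerId);
    }

    // The identity is mapped to an address only at the moment routing needs it.
    Optional<InetSocketAddress> resolve(String peerId) {
        return Optional.ofNullable(endpoints.get(peerId));
    }

    public static void main(String[] args) {
        PeerNamespace names = new PeerNamespace();
        names.announce("peer:alice", InetSocketAddress.createUnresolved("10.0.0.5", 9701));
        // The same identity survives a change of address.
        names.announce("peer:alice", InetSocketAddress.createUnresolved("192.168.1.20", 9701));
        System.out.println(names.resolve("peer:alice").isPresent());
    }
}
```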

Routing

Routing of P2P traffic occurs on the network and application levels. From a network perspective, routing refers to the path the network chooses to transport a message. This usually involves charting a course over a number of intermediate routing hops. The network and subnet portions of the IP address are used to determine the source and destination of the route. The actual path the message traverses between these endpoints is dependent on the routing protocol. Different routing protocols are used to improve efficiency and optimize network usage.
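To show what "network and subnet portions" means in practice, the following sketch masks an IPv4 address with a prefix length and compares two addresses. If both fall in the same network, delivery is local; otherwise the packet is handed to a router. The class SubnetCheck and its method names are hypothetical, and the sketch handles IPv4 dotted-quad literals only.

```java
// Hypothetical example class; IPv4 dotted-quad literals only.
class SubnetCheck {
    // Packs "a.b.c.d" into a 32-bit int, most significant octet first.
    static int toInt(String dottedQuad) {
        String[] parts = dottedQuad.split("\\.");
        if (parts.length != 4) throw new IllegalArgumentException("not an IPv4 literal: " + dottedQuad);
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) throw new IllegalArgumentException("bad octet: " + part);
            value = (value << 8) | octet;
        }
        return value;
    }

    // Keeps the leading prefixLength bits (the network portion) and zeroes the host bits.
    static int networkOf(String dottedQuad, int prefixLength) {
        if (prefixLength < 0 || prefixLength > 32) throw new IllegalArgumentException("bad prefix length");
        // Java shifts an int by the count modulo 32, so a zero-length prefix needs its own case.
        int mask = prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
        return toInt(dottedQuad) & mask;
    }

    static boolean sameNetwork(String a, String b, int prefixLength) {
        return networkOf(a, prefixLength) == networkOf(b, prefixLength);
    }

    // Formats a packed address back into dotted-quad form.
    static String toDotted(int v) {
        return ((v >>> 24) & 0xFF) + "." + ((v >>> 16) & 0xFF) + "." + ((v >>> 8) & 0xFF) + "." + (v & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(toDotted(networkOf("192.168.1.10", 24)));          // 192.168.1.0
        System.out.println(sameNetwork("192.168.1.10", "192.168.2.10", 24));  // false
    }
}
```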

P2P addresses higher-layer (application) routing issues, which might affect (overlay) network layer routing. For instance, alternate routing often involves establishing paths between edge devices that would not normally be possible. In P2P networks, the path between the nodes emerges through the information sharing patterns, rather than being enforced by static configurations. Firewalls and NAT (Network Address Translation) devices are often used to block transmissions or constrain point-to-point communication. P2P networks have come up with novel ways of addressing these constraints and restrictions.

NAT devices enable nodes in a private network address domain to access the Internet when an insufficient number of unique public network addresses are available. They are also used to hide the identity of individual nodes from the network at large. Firewalls provide security by limiting the network traffic allowed between the Internet and a private network. Firewalls employ three basic techniques:

Packet filtering: Packet filtering involves examining network traffic passing through and selectively passing or dropping packets.
Proxy servers: Proxy servers examine and filter packets, but also act as a relay between hosts behind the firewall and the external network domain. In this regard, they perform address translation functions similar to a NAT device. External systems only see the proxy server address.
Stateful inspection: Stateful-inspection methods monitor communication protocol and application-level state changes. An application context is built from the state changes to enable more intelligent inspection decisions based on the type and location of packets that should be exchanged in conjunction with the security policies in effect.
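The first of these techniques can be sketched as an ordered rule list. The example below is a simplification under stated assumptions: rules match on destination port only, the first matching rule wins, and unmatched traffic is dropped. Real packet filters also match on addresses, protocol, and direction, and their rule semantics vary by product. The class PacketFilter and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified packet filter: destination port rules, first match wins.
class PacketFilter {
    enum Action { PASS, DROP }

    private static final class Rule {
        final int lowPort;
        final int highPort;
        final Action action;

        Rule(int lowPort, int highPort, Action action) {
            this.lowPort = lowPort;
            this.highPort = highPort;
            this.action = action;
        }

        boolean matches(int port) {
            return port >= lowPort && port <= highPort;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Appends a rule covering the inclusive port range; returns this for chaining.
    PacketFilter add(int lowPort, int highPort, Action action) {
        rules.add(new Rule(lowPort, highPort, action));
        return this;
    }

    // Examines one packet's destination port and decides its fate.
    Action decide(int destinationPort) {
        for (Rule rule : rules) {
            if (rule.matches(destinationPort)) {
                return rule.action;
            }
        }
        return Action.DROP; // assumed default-deny policy
    }

    public static void main(String[] args) {
        PacketFilter filter = new PacketFilter()
                .add(80, 80, Action.PASS)       // allow web traffic
                .add(6346, 6346, Action.DROP);  // block a P2P port explicitly
        System.out.println(filter.decide(80));
        System.out.println(filter.decide(6346));
    }
}
```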

Most firewall and NAT configurations do not permit bidirectional communication between two peers separated by NAT devices and/or firewall(s). Solutions permitting outbound connectivity are usually more mature than solutions enabling inbound connectivity. This is because externally-initiated communication is assumed to be untrustworthy, and is not as predominant in client/server computing.

NAT devices and firewalls also typically limit communication so that only responses to requests that originated behind the devices are allowed back in. This blocks communications from hosts outside the protected domain to those hosts inside the domain. Often, certain ports/services are closed down completely and access is denied.
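The response-only behavior just described can be modeled as a table of flows opened from the inside. In this minimal sketch, inbound traffic is admitted only if it matches a flow that an inside host started earlier. The class ResponseOnlyFilter, its method names, and the string flow key are hypothetical; real devices also track protocol state and expire idle entries, which the sketch omits.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a device that admits only replies to inside-initiated traffic.
class ResponseOnlyFilter {
    private final Set<String> openFlows = new HashSet<>();

    private static String key(String insideHost, int insidePort, String outsideHost, int outsidePort) {
        return insideHost + ":" + insidePort + ">" + outsideHost + ":" + outsidePort;
    }

    // Called when an inside host sends a request out through the device.
    void recordOutbound(String insideHost, int insidePort, String outsideHost, int outsidePort) {
        openFlows.add(key(insideHost, insidePort, outsideHost, outsidePort));
    }

    // Inbound traffic passes only if it reverses a flow that was opened from the inside.
    boolean allowInbound(String outsideHost, int outsidePort, String insideHost, int insidePort) {
        return openFlows.contains(key(insideHost, insidePort, outsideHost, outsidePort));
    }

    public static void main(String[] args) {
        ResponseOnlyFilter device = new ResponseOnlyFilter();
        System.out.println(device.allowInbound("203.0.113.9", 80, "10.0.0.5", 40000)); // false
        device.recordOutbound("10.0.0.5", 40000, "203.0.113.9", 80);
        System.out.println(device.allowInbound("203.0.113.9", 80, "10.0.0.5", 40000)); // true
    }
}
```

This is why an outside peer cannot start a conversation with a protected peer: no matching entry exists until the protected peer speaks first.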

These restrictions reduce the utility of peer-to-peer networks by prohibiting public nodes from initiating communication to blocked nodes, and by preventing direct communication between pairs of blocked nodes. This dramatically impedes the formation of a P2P network. If you are behind a firewall, special configuration will be required to enable participation.

The Peer-to-Peer Working Group (www.peer-to-peerwg.org) has identified three primary techniques for overcoming firewall and NAT routing restrictions:

The node behind a NAT device or firewall initiates communication with a node that is publicly addressable. Examples of this include Napster and Gnutella.
A rendezvous server provides a repository for advertisement information, such as that used by JXTA to support discovery.
A publicly visible node acts as a relay, or router, for blocked nodes (for example, in JXTA).

These solutions rely on a third-party host, running special software that can help nodes behind firewalls detect what kind of connectivity they have and help broker connections with other nodes. The following brokering techniques (see Figure 5.2) are often used:

Reverse the connection: When Node A wants to communicate with Node B but Node B cannot receive connections, a third party tells Node B to initiate a connection to Node A. (This assumes Node B can initiate a connection to Node A.)
UDP requests: A NAT device can allow an outbound UDP packet to "open up" a hole that admits incoming traffic and routes it to the machine that sent the original packet. Node A, behind a NAT device, sends UDP packets to a third party. The third party then tells Node B where to send packets so that they reach Node A. Connection reversal works only when just one of the communicating nodes is behind a NAT device, while this solution works even if both Nodes A and B are behind NAT devices.
Push requests: With this solution, intermediaries receive requests from clients that are unable to establish direct communication. The intermediary forwards the request to the intended target. The model is based on "pushing" the request to the intended recipient, which then establishes a connection with the push originator. This is similar to the Gnutella network solution, in which no central or special-purpose servers need to be defined.

Figure 5.2. Three common models of peer-to-peer routing around and through firewalls and NAT devices.


Overlay networks are an area of rapid development and change. Expect this to continue to evolve as peer-to-peer technologies become more prevalent in government, university, and corporate settings.