Comparing Traditional Systems | Java P2P Unleashed

Now that we understand the basic building blocks of P2P networks, and have defined criteria for comparison, we can take a closer look at important distributed systems.

Usenet News and NNTP

Usenet News was developed in the late 1970s and was originally based on the Unix-to-Unix copy protocol (UUCP). It automates machine-to-machine file distribution and topic-based classification of messages posted between systems. Usenet was designed as a decentralized system with loose control over the topology and the reliability of the network. This absence of central control and of guaranteed network reliability resembles the P2P model.

Today, Usenet uses the Network News Transfer Protocol (NNTP) to coordinate the exchange of newsgroup articles between servers. Newsgroup articles are distributed via news servers, which contain databases of articles, and are operated by Internet service providers, universities, and public companies.

A newsgroup article is posted to a designated news server. The news server then sends copies of the article to its peers that have agreed to exchange articles. Those servers in turn send copies to their peers. Eventually, every server that carries the newsgroup has a copy. This resembles P2P propagation techniques and the communication patterns of products like Jabber.

However, there is no concept of discovery in the protocol. Servers are configured to receive articles from specific peers and send articles to specific peers based on newsgroups. In effect, the peer network is static and known in advance. Articles are replicated in many places on the network, and the timely delivery of any one article to a site is not guaranteed.
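The propagation described above can be sketched as a flood across a fixed peer list. This is a minimal illustration and not an NNTP implementation: the class name NewsServer, its fields, and the example host names are all assumptions made for the sketch, and articles are reduced to their Message-IDs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: a news server with a fixed, administrator-configured peer list.
class NewsServer {
    final String name;
    final List<NewsServer> peers = new ArrayList<>(); // static topology, no discovery
    final Set<String> articles = new HashSet<>();     // Message-IDs held locally

    NewsServer(String name) {
        this.name = name;
    }

    // Configure a two-way feed between two servers.
    static void link(NewsServer a, NewsServer b) {
        a.peers.add(b);
        b.peers.add(a);
    }

    // Store an article once, then pass it to every configured peer.
    void receive(String messageId) {
        if (!articles.add(messageId)) {
            return; // already held: stop, so the flood terminates even in a cycle
        }
        for (NewsServer peer : peers) {
            peer.receive(messageId);
        }
    }

    public static void main(String[] args) {
        NewsServer a = new NewsServer("a.example");
        NewsServer b = new NewsServer("b.example");
        NewsServer c = new NewsServer("c.example");
        link(a, b);
        link(b, c);
        link(c, a); // the feeds form a cycle
        a.receive("<1@a.example>");
        System.out.println(c.articles.contains("<1@a.example>")); // prints true
    }
}
```

Because the peer lists are filled in by hand before any article is posted, the sketch has no discovery step, which mirrors the static configuration described above.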

Since NNTP is similar in many respects to existing peer-to-peer applications, it is worth examining in some detail how NNTP manages the propagation of content.

Loop control is handled by a trace list (the Path header) and a list of the Message-IDs of received messages. Using these controls, a server can reject a message that it has already seen. The Path header lists the sites that the article has traveled through between the originating server and the current server. This is similar to the technique used by router peers in P2P networks. If the receiving server appears in the Path line, the sending server does not try to send the article, because it knows that the receiving server has already received a copy.
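The Path check can be sketched as follows. Path entries are separated by "!" and each relaying server prepends its own name. The class and method names (PathLoopControl, shouldOffer, prepend) are hypothetical, and the sketch simplifies by treating every entry, including the trailing one, as a site name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating Path-based loop control.
class PathLoopControl {
    // Split a Path value such as "b.example!a.example!not-for-mail" into entries.
    static List<String> sites(String path) {
        return Arrays.asList(path.trim().split("!"));
    }

    // True if the peer is absent from the Path, so the article may be offered to it.
    static boolean shouldOffer(String path, String peerName) {
        for (String site : sites(path)) {
            if (site.equalsIgnoreCase(peerName)) {
                return false; // the peer has already handled this article
            }
        }
        return true;
    }

    // A relaying server prepends its own name before passing the article on.
    static String prepend(String path, String ownName) {
        return ownName + "!" + path;
    }

    public static void main(String[] args) {
        String path = "b.example!a.example!not-for-mail";
        System.out.println(shouldOffer(path, "a.example")); // prints false
        System.out.println(shouldOffer(path, "c.example")); // prints true
        System.out.println(prepend(path, "c.example"));
    }
}
```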

The Message-ID header contains an identifying code that is unique for every article. Before transmitting the article, the sending server queries the receiving server to determine whether it already has the article or requires it to be transmitted.
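In NNTP this query is made with the IHAVE command: the receiving server answers 335 to ask for the article or 435 to decline it, and 235 after a successful transfer. The sketch below models only the receiving side's history lookup. The class name HistoryCheck and its methods are assumptions, and no network exchange takes place.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the receiving server's Message-ID history.
class HistoryCheck {
    private final Set<String> seen = new HashSet<>();

    // Reply to an offer: 335 asks for the article, 435 declines it.
    int offer(String messageId) {
        return seen.contains(messageId) ? 435 : 335;
    }

    // Record the article after transfer; 235 reports success.
    int store(String messageId) {
        seen.add(messageId);
        return 235;
    }

    public static void main(String[] args) {
        HistoryCheck history = new HistoryCheck();
        System.out.println(history.offer("<1@a.example>")); // prints 335
        history.store("<1@a.example>");
        System.out.println(history.offer("<1@a.example>")); // prints 435
    }
}
```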

Usenet News standardizes two variants of the NNTP protocol: one for communication between peer servers, and one for communication between a client and a server.

Information about a user of Usenet News is stored on the client. The server might not even know the identity of the clients using it. News clients keep state information about the news servers they connect to. If the servers are not carefully synchronized, clients can lose important session information.

Usenet News does not support closed groups easily. When you post an article, it can literally go around the world!

The decentralized model is the primary reason for the P2P comparison. Identity, presence, and virtual spaces are not well defined. Control is based on administrative parameters, and there is a definite distinction between client and server roles. In addition, security can be an issue because access controls are minimal.

Email

As opposed to news, which provides public communication, email provides private communication. Email messages are addressed to specific individuals or groups. The primary components of an email system are:

Mailbox: Typically a file or directory where messages are stored.
User agent: An application run directly by a user to interface with a transfer agent. This is probably the most recognized component of the system.
Transfer agent: The component that is used to transfer messages between machines. Transfer agents in effect resemble communicating peers from a functional perspective. They form a chain of responsibility for transferring email between cooperating agents.
Delivery agent: The delivery agent is responsible for adding the message to the user's mailbox. The transfer agent recognizes the destination address, and passes the message to the delivery agent for delivery.

Each message consists of two parts. The headers indicate the message's author, recipient, subject, the time and date of its creation, and so on. The body contains the message content from the sender.

Transfer agents communicate using a transfer protocol. Many exist, but the most common is SMTP (Simple Mail Transfer Protocol). Mail routing uses the Domain Name System (DNS). Domain name servers maintain mail exchanger (MX) records that tell mail transfer agents (MTAs) where to route mail messages. These MX records vary depending on the domain. An MX record has three parts: your domain name, the name of the machine that will accept mail for the domain, and a preference value. The preference value designates the order in which multiple mail servers are tried for mail deliveries, with lower values tried first. This provides a form of fault tolerance for your domain.
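The effect of the preference value can be sketched by sorting a domain's MX records so that the lowest value comes first. The MxRecord class, its fields, and the host names are hypothetical, and the records are built by hand rather than fetched from DNS. Real transfer agents also spread load among hosts with equal preference, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory form of the three parts of an MX record.
class MxRecord {
    final String domain;
    final String host;
    final int preference;

    MxRecord(String domain, String host, int preference) {
        this.domain = domain;
        this.host = host;
        this.preference = preference;
    }

    // Return the hosts in the order a transfer agent would try them: lowest preference first.
    static List<MxRecord> deliveryOrder(List<MxRecord> records) {
        List<MxRecord> ordered = new ArrayList<>(records); // leave the input untouched
        ordered.sort(Comparator.comparingInt((MxRecord r) -> r.preference));
        return ordered;
    }

    public static void main(String[] args) {
        List<MxRecord> records = new ArrayList<>();
        records.add(new MxRecord("example.com", "backup.example.com", 20));
        records.add(new MxRecord("example.com", "mail.example.com", 10));
        for (MxRecord r : deliveryOrder(records)) {
            System.out.println(r.preference + " " + r.host);
        }
    }
}
```

If the first host in the returned list cannot be reached, the transfer agent moves on to the next one, which is the fault tolerance mentioned above.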

The routing functionality of Internet email is based on DNS. The concept of dynamic discovery is not applicable in the email network. The comparisons to P2P arise from email servers being decentralized. However, email is actually based on the hierarchical structure of DNS. Email does have the concept of identity. In effect, your email address is your unique identity within a specific domain.

User agents are the client component of an email system. In this respect, email has a traditional client/server partitioning of functionality. Transfer agents could be considered peers. SMTP is well understood, and provides a standard protocol to enable message transfer.

Integration is occurring between instant messaging products and email. Expect this trend to continue.

Domain Name System (DNS)

As explained in Chapter 1, the Domain Name System (DNS) is a distributed Internet directory service. DNS is used to translate between domain names and IP addresses, and to control Internet email delivery. The Internet relies on DNS to organize and control distributed resource lookup and decentralized email delivery. Without DNS, the Internet, and more specifically the Web, would not be possible.

The DNS hierarchy is anchored by 13 root servers in various worldwide locations that contain replicated information. These servers maintain information about top-level domains, such as where to find other DNS servers that have more specific information about domains such as those represented by .com or .edu. Those servers are in turn used to find other DNS servers representing subdomains, which are responsible for identifying the address of a specific host. Although DNS is composed of thousands of machines distributed globally, it acts as a single directory, with servers communicating with other servers to provide address resolution.
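The walk from a root server down to the server that knows a host's address can be sketched with a small in-memory model. Nothing here speaks the DNS protocol: the NameServer class, its referrals and records maps, and the server names are assumptions, and the address comes from a range reserved for documentation. The sketch also assumes each server holds at most one matching referral and that referrals never form a cycle.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of one server in the DNS hierarchy.
class NameServer {
    final String name;
    final Map<String, NameServer> referrals = new HashMap<>(); // domain suffix -> more specific server
    final Map<String, String> records = new HashMap<>();       // host name -> address

    NameServer(String name) {
        this.name = name;
    }

    // Answer from local records, or follow a referral toward a more specific server.
    // Returns null when no server along the way knows the host.
    String resolve(String host) {
        String address = records.get(host);
        if (address != null) {
            return address;
        }
        for (Map.Entry<String, NameServer> referral : referrals.entrySet()) {
            String suffix = referral.getKey();
            if (host.equals(suffix) || host.endsWith("." + suffix)) {
                return referral.getValue().resolve(host);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        NameServer root = new NameServer("root");
        NameServer com = new NameServer("com-server");
        NameServer example = new NameServer("example-server");
        root.referrals.put("com", com);
        com.referrals.put("example.com", example);
        example.records.put("www.example.com", "192.0.2.10");
        System.out.println(root.resolve("www.example.com")); // prints 192.0.2.10
    }
}
```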

The DNS directory contains billions of resource records that are split into millions of files called zones. There are two types of DNS servers: authoritative and caching. Authoritative servers are responsible for maintaining information on specific zones. Caching servers query authoritative servers for specific DNS zone information and keep the answers for reuse. This helps to improve the scalability, efficiency, and response time of the DNS system in general. Like DNS, P2P networks are developing rendezvous peers that are used to query and cache information for peer discovery. In this respect, a rendezvous peer could be considered authoritative or caching depending on its implementation.
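The division of labor between the two server types can be sketched as a cache placed in front of an authoritative lookup. The CachingResolver class is hypothetical, the authoritative server is stood in for by a plain function, and the single fixed time-to-live is a simplification, since in DNS each record carries its own. The current time is passed in explicitly so that the behavior is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical caching server that forwards misses to an authoritative source.
class CachingResolver {
    private static final class CachedAnswer {
        final String address;
        final long expiresAtMillis;

        CachedAnswer(String address, long expiresAtMillis) {
            this.address = address;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Function<String, String> authoritative; // stands in for a query to an authoritative server
    private final long ttlMillis;                          // assumed fixed lifetime for every answer
    private final Map<String, CachedAnswer> cache = new HashMap<>();
    int upstreamQueries = 0;                               // counts queries sent to the authoritative source

    CachingResolver(Function<String, String> authoritative, long ttlMillis) {
        this.authoritative = authoritative;
        this.ttlMillis = ttlMillis;
    }

    // Serve from the cache while the answer is fresh; otherwise ask upstream and remember the reply.
    String resolve(String name, long nowMillis) {
        CachedAnswer cached = cache.get(name);
        if (cached != null && nowMillis < cached.expiresAtMillis) {
            return cached.address;
        }
        upstreamQueries++;
        String address = authoritative.apply(name);
        if (address != null) {
            cache.put(name, new CachedAnswer(address, nowMillis + ttlMillis));
        }
        return address;
    }

    public static void main(String[] args) {
        CachingResolver resolver = new CachingResolver(name -> "192.0.2.10", 1000);
        resolver.resolve("www.example.com", 0);    // miss: asks the authoritative source
        resolver.resolve("www.example.com", 500);  // hit: answered from the cache
        resolver.resolve("www.example.com", 1500); // expired: asks again
        System.out.println(resolver.upstreamQueries); // prints 2
    }
}
```

A rendezvous peer that answers discovery queries from remembered advertisements would play the caching role in this sketch, while one that owns the advertisements would play the authoritative role.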

DNS is a massively distributed and decentralized database. Its role as an underpinning for the Internet cannot be over-emphasized. It provides the identification and hierarchy to address and access resources across the Internet. P2P systems are augmenting DNS capabilities to support higher-level routing and edge devices, and to enable firewall penetration.

In the next few sections, we will compare and contrast P2P to the new entrants in distributed processing. We'll see how these technologies (Web services, Jini/JavaSpaces, and JXTA) relate to P2P.