A Brief History of P2P Networks | Skype: The Definitive Guide

When you hear people talking about a P2P network, generally, they are referring to two things:

The format or communications protocol that a group of technical people have agreed will be used to transmit and receive data among a particular set of devices that need to share information in a network
The decentralized P2P computing infrastructure described briefly in Chapter 1

Although not everybody agrees on the exact definition of a P2P network, there is broad agreement that P2P networks take advantage of the processing power, bandwidth, and file-storage and -retrieval capabilities provided by the individual computers in the network (versus using centralized computing equipment and servers). For an illustration of centralized computing equipment and servers, see Figure A-1.

Figure A-1. Centralized computing equipment and server

In a "true" P2P network, the individual computers and computing devices, or nodes, at the edges of the network join dynamically to route network traffic and to process CPU-intensive and bandwidth-intensive tasks (see Figure A-2).

Figure A-2. P2P decentralized network

The P2P network protocol itself is a technical description of a specific set of rules, as well as the types of interactions or behaviors that result when the rules are enacted. The protocol establishes what the network can do and the capabilities it has to offer. P2P networks also rely on an application, which generally is a software program that enables people to use the P2P network for a particular purpose.

You can think of the protocol and application in terms of transportation infrastructure and automobiles. The traffic laws, painted guidelines, signs, and stoplights act like a protocol. They define where vehicles should go and how they should behave. Your car is like an application, because it enables you to accomplish something. If you want to go somewhere, you use the application (the car) and follow the protocols (the rules of the road). The network is the whole system of roads that takes you from place to place.

Different kinds of transportation require different types of vehicles. Likewise, different uses of a P2P network require different types of applications. People used applications like the original Napster to swap MP3 files over the Internet, whereas making voice calls over the Internet requires an application like Skype.

The Earliest P2P Networks and Applications

The earliest P2P networks were, put kindly, difficult to use. People didn't think of them as P2P systems per se, because they were limited-use research and business-oriented networks. The engineers who designed them focused less on ease of use than on the technical details of how the bits of data should be distributed and shared among the peer computers for optimal performance.

Moreover, in the absence of a "killer app" to fuel adoption (such as downloading MP3s), the engineers were less concerned with figuring out how to make the networks simple to search and straightforward to use. Since these early beginnings, three generations of P2P networks have followed.

First-Generation P2P Networks and Applications

ICQ and, to a lesser extent, Napster are generally considered the first generation of popular P2P networks. These P2P networks worked by allowing someone to connect directly across the network to someone else who was using a copy of the same program.

Although ICQ has more in common with Skype, Napster provides a simple and straightforward example of how this type of P2P network functioned. Napster (the company) maintained central servers, which hosted a directory search index of the files that each Napster user had on his or her computer. When a user wanted to find a particular MP3 file, he or she simply searched the network by querying the central directory (see Figure A-3). If the directory search index contained the name of the file the user was looking for, Napster connected the two nodes directly so that the user could download the song (see Figure A-4).

Figure A-3. P2P network with a centralized directory

Figure A-4. Connecting two nodes directly in a P2P network
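The centralized lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CentralDirectory` class, its method names, and the peer addresses are hypothetical and do not reflect Napster's actual protocol. The point is that the server stores only the index, while the file transfer itself would happen directly between the two peers.

```python
from collections import defaultdict


class CentralDirectory:
    """Hypothetical stand-in for a central index server (not Napster's real API)."""

    def __init__(self):
        # file name -> set of peer addresses that reported having the file
        self._index = defaultdict(set)

    def register(self, peer, file_names):
        """A peer reports which files it is sharing."""
        for name in file_names:
            self._index[name].add(peer)

    def unregister(self, peer):
        """Remove a peer that went offline from every entry."""
        for peers in self._index.values():
            peers.discard(peer)

    def search(self, file_name):
        """Return the peers that hold the file, in sorted order."""
        return sorted(self._index.get(file_name, ()))


# Illustrative peers and file names (made up for this sketch).
directory = CentralDirectory()
directory.register("alice:6699", ["song_a.mp3", "song_b.mp3"])
directory.register("bob:6699", ["song_b.mp3"])

holders = directory.search("song_b.mp3")  # peers the requester could contact directly
missing = directory.search("song_c.mp3")  # nobody has registered this file
```

Every search in this sketch passes through the one `directory` object, so if that object disappears, no peer can find any other peer, even though the files themselves remain on the individual computers.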

The centralized directory turned out to be the Achilles' heel of Napster, because it made it easy for lawyers to demand that the service be shut down in response to claims that the network enabled copyright infringement. To shut the network down, the central servers simply needed to be disconnected from the network or unplugged.

Second-Generation P2P Networks and Applications

The next attempt to create a P2P network was spearheaded at Nullsoft, which was a subsidiary of America Online (AOL). Gnutella was both a remarkably simple P2P file-sharing protocol and an experimental application that avoided the use of a vulnerable centralized directory search index.

Gnutella worked by connecting a given application to a certain number of nodes, which in turn were connected to other nodes, and so on. The decentralized directory was spread across the network to individual nodes and did not rely on any central servers (see Figure A-5).

Figure A-5. P2P network with no centralized directory

Although this second-generation P2P network did not suffer from the same vulnerabilities as Napster, the design of the protocol was not at all efficient. Searches were slow, and groups of nodes tended to create disconnected islands of subnetworks that couldn't be searched.

Each time a user wanted to search for a file in the network, the user's application had to broadcast the request to the nodes to which it was connected. These nodes in turn had to propagate the request again, and so on until the file was located, if at all.
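The broadcast-and-propagate search described above can be sketched as a breadth-first flood over a graph of nodes. This is a simplified model, not the Gnutella wire protocol: the function name, the graph layout, and the node names are hypothetical, and the hop limit (`ttl`) is an assumption added here so that the flood terminates. Unlike the description above, the sketch keeps flooding until the hop limit runs out rather than stopping at the first hit.

```python
from collections import deque


def flood_search(graph, files, start, wanted, ttl):
    """Flood a query outward from `start`; return (nodes with the file, messages sent).

    graph: node -> list of neighbor nodes (hypothetical overlay)
    files: node -> set of file names stored on that node
    ttl:   maximum number of hops a query may travel (an assumption of this sketch)
    """
    seen = {start}
    queue = deque([(start, ttl)])
    hits = []
    messages = 0
    while queue:
        node, hops_left = queue.popleft()
        if wanted in files.get(node, set()):
            hits.append(node)
        if hops_left == 0:
            continue  # this copy of the query may not travel any further
        for neighbor in graph.get(node, []):
            messages += 1  # every forwarded copy costs bandwidth
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops_left - 1))
    return hits, messages


# Nodes a-d form one cluster; x and y form a disconnected island.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c"],
    "x": ["y"],
    "y": ["x"],
}
files = {"d": {"song.mp3"}, "y": {"rare.mp3"}}

found, cost = flood_search(graph, files, "a", "song.mp3", ttl=3)
island_hits, island_cost = flood_search(graph, files, "a", "rare.mp3", ttl=3)
```

In this toy network, finding one file two hops away takes eight messages among four nodes, and the search for `rare.mp3` spends the same eight messages and still finds nothing, because node `y` sits on an island that no query from `a` can reach.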

Gnutella was released briefly in March 2000. Coincidentally, this was around the time that AOL was merging with Time Warner and Napster was being investigated. When it became apparent to AOL that Gnutella was theoretically capable of the same types of potential copyright infringement that Napster was being investigated for, AOL requested that Nullsoft prevent people from downloading the application any further.

But the cat was out of the bag. Scores of people had already downloaded the Gnutella application in the short period of time it was available. Soon thereafter, hackers reverse-engineered the Gnutella application, and as a result, the protocol is now in the public domain. Even so, scalability and performance issues have prevented the broad adoption of the Gnutella P2P network.

Third-Generation P2P Networks and Applications

With Gnutella having paved the road for a decentralized approach to managing the directory search index, the FastTrack protocol emerged as the subsequent generation of P2P technology. FastTrack worked with several well-known applications, the most famous being KaZaA.

The third generation of P2P networks was more evolved because it supported supernodes, which offered significant enhancements over the previous two generations of P2P networks (see Figure A-6). Supernodes allowed for improved search performance, reduced file-transfer latency, better network scalability, and the ability to resume interrupted downloads and simultaneously download segments of one file from multiple peers.

Figure A-6. Series of network clusters joined by supernodes

A supernode is an ordinary node that, under particular circumstances, can take on special tasks. Supernodes improve network scalability by helping nearby nodes join dynamically. Supernodes detect which applications are online, establish connections among them, and guide encrypted traffic efficiently.

Supernodes also work in concert to support a new type of decentralized directory called a global index (see Figure A-7). Unlike the second generation's directory, which was spread flatly across individual nodes, the global index is managed by a hierarchical arrangement of all available supernodes. As in the second generation, it is not hosted on central servers. When a more powerful computer with a fast Internet connection runs the application software, it may automatically "wake up" as a supernode to act as a temporary directory index server for nearby applications, based on available memory, bandwidth, and uptime.

Figure A-7. Global index
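The following sketch illustrates the two ideas above: promoting capable nodes to supernodes, and letting each supernode index the files of its nearby nodes. It is a rough model under loud assumptions. The threshold values, the `NodeStats` fields, and all function names are hypothetical, the real FastTrack selection rules are not described here, and the hierarchy is flattened to just two levels (supernodes and ordinary nodes).

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    name: str
    free_memory_mb: int
    bandwidth_kbps: int
    uptime_hours: float


# Hypothetical thresholds, chosen only for illustration.
MIN_MEMORY_MB = 256
MIN_BANDWIDTH_KBPS = 1000
MIN_UPTIME_HOURS = 2.0


def can_be_supernode(node):
    """A node qualifies only if memory, bandwidth, and uptime are all sufficient."""
    return (
        node.free_memory_mb >= MIN_MEMORY_MB
        and node.bandwidth_kbps >= MIN_BANDWIDTH_KBPS
        and node.uptime_hours >= MIN_UPTIME_HOURS
    )


def pick_supernodes(nodes):
    return [node.name for node in nodes if can_be_supernode(node)]


def build_global_index(clusters):
    """clusters: supernode name -> {ordinary node name -> set of file names}."""
    index = {}
    for supernode, members in clusters.items():
        local = {}
        for node, names in members.items():
            for file_name in names:
                local.setdefault(file_name, []).append(node)
        index[supernode] = local  # each supernode holds only its own slice
    return index


def lookup(index, file_name):
    """Ask every supernode's slice; return (supernode, node) pairs that match."""
    return [
        (supernode, node)
        for supernode, local in index.items()
        for node in local.get(file_name, [])
    ]


nodes = [
    NodeStats("laptop", 128, 300, 0.5),
    NodeStats("desktop", 2048, 8000, 36.0),
    NodeStats("server", 4096, 20000, 1.0),  # fast, but not online long enough
]
supernodes = pick_supernodes(nodes)

clusters = {
    "desktop": {"laptop": {"song.mp3"}, "phone": {"talk.mp3", "song.mp3"}},
    "tower": {"tablet": {"song.mp3"}},
}
index = build_global_index(clusters)
matches = lookup(index, "song.mp3")
```

A query in this model touches only the supernodes rather than every node in the network, which is one way to picture why searches could be faster than in a flooded network.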

Like second-generation P2P networks, third-generation networks don't run the risk of being shut down from a central location. Supernodes also enable third-generation P2P applications to work behind most firewalls and Network Address Translation (NAT) devices. So long as two applications can establish an outgoing connection to the Internet, they can communicate regardless of whether they can connect directly with each other.
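One way to picture the last point is a relay: each application opens an outgoing connection to a supernode, and the supernode passes messages between them. The sketch below models that idea with in-memory inboxes instead of real sockets. The `RelaySupernode` class and its methods are hypothetical, and actual firewall and NAT traversal involves considerably more than this.

```python
class RelaySupernode:
    """Hypothetical relay: peers connect outward to it, and it forwards messages."""

    def __init__(self):
        self._inboxes = {}

    def connect(self, peer):
        # Models a peer opening an OUTGOING connection to the supernode,
        # which firewalls and NAT devices typically allow.
        self._inboxes.setdefault(peer, [])

    def forward(self, sender, recipient, payload):
        """Deliver a message only if both peers are connected to this relay."""
        if sender not in self._inboxes or recipient not in self._inboxes:
            return False
        self._inboxes[recipient].append((sender, payload))
        return True

    def receive(self, peer):
        """Hand over and clear the messages waiting for a connected peer."""
        if peer not in self._inboxes:
            return []
        messages, self._inboxes[peer] = self._inboxes[peer], []
        return messages


relay = RelaySupernode()
relay.connect("alice")  # alice sits behind a NAT device
relay.connect("bob")    # bob sits behind a firewall

delivered = relay.forward("alice", "bob", "hello")
inbox = relay.receive("bob")
undelivered = relay.forward("alice", "carol", "hi")  # carol never connected
```

Neither peer in this sketch ever accepts an incoming connection. Both only reach outward to the relay, which matches the condition stated above that each application needs nothing more than an outgoing connection to the Internet.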