17.1. Protocol Support

The Berkeley socket API was designed as a gateway to multiple protocols. Although this does necessitate extra complexity in the interface, it is much easier than inventing (or learning) a new interface for every new protocol you encounter. Linux uses the socket API for many protocols, including TCP/IP (both version 4 and version 6), AppleTalk, and IPX.

We discuss using sockets for two of the protocols available through Linux's socket implementation. The most important protocol that Linux supports is TCP/IP,^[1] which is the protocol that drives the Internet. We also cover Unix domain sockets, an IPC mechanism restricted to a single machine. Although they do not work across networks, Unix domain sockets are widely used for applications that run on a single computer.

^[1] The 2.6.x kernels covered by this book support both version 4 and version 6 (commonly referred to as IPv6) of the TCP/IP suite.

Protocols normally come in groups, or protocol families. The popular TCP/IP protocol family includes the TCP and UDP protocols (among others). Making sense of the various protocols requires you to know a few networking terms.

17.1.1. Nice Networking

Most users consider networking protocols to provide the equivalent of Unix pipes between machines. If a byte (or sequence of bytes) goes in one end of the connection, it is guaranteed to come out the other end. Not only is it guaranteed to come out the other end, but it also comes out right after the byte that was sent before it and immediately before the byte that was sent after it. Of course, all of these bytes should be received exactly as they were sent; no bytes should change. Also, no other process should be able to interject extra bytes into the conversation; it should be restricted to the original two parties.

A good visualization of this idea is the telephone. When you speak to your friends, you expect them to hear the same words you speak, and in the order you speak them.^[2] Not only that, you do not expect your mother to pick up her phone (assuming she is not in the same house as you) and start chatting away happily to you and your friend.

^[2] Well, this depends on the character of the friends and how late they were out the night before the conversation.

17.1.2. Real Networking

Although this may seem pretty basic, it is not at all how underlying computer networks work. Networks tend to be chaotic and random. Imagine a first-grade class at recess, except they are not allowed to speak to each other and they have to stay at least five feet apart. Now, chances are those kids are going to find some way to communicate, perhaps even with paper airplanes!

Imagine that whenever students want to send letters to one another they simply write the letters on pieces of paper, fold them into airplanes, write the name of the intended recipient on the outside, and hurl them toward someone who is closer to the final recipient than the sender is. This intermediary looks at the airplane, sees who the intended target is, and sends it toward the next closest person. Eventually, the intended recipient will (well, may) get the airplane and unfold it to read the message.

Believe it or not, this is almost exactly how computer networks operate.^[3] The intermediaries are called routers and the airplanes are called packets, but the rest is the same. Just as in the first-grade class, some of those airplanes (or packets) are going to get lost. If a message is too long to fit in a single packet, it must be split across multiple ones (each of which may be lost). All the students in between can read the packets if they like^[4] and may simply throw the message away rather than try to deliver it. Also, anyone can interrupt your conversation by sending new packets into the middle of it.

^[3] This is how packet-switched networks work, anyway. An alternative design, circuit-switched networks, acts more like telephone connections. They are not widely used in computer networking, however.

^[4] This is why cryptography has gained so much importance since the advent of the Internet.

17.1.3. Making Reality Play Nice

Confronted with the reality of millions of paper airplanes, protocol designers endeavor to present a view of the network more on par with the telephone than the first-grade class. Various terms have evolved to describe networking protocols.

Connection-oriented protocols have two endpoints, like a telephone conversation. The connection must be established before any communication takes place, just as you answer the phone by saying "hello" rather than just talking immediately. Other users cannot (or should not be able to) intrude into the connection. Protocols that do not have these characteristics are known as connectionless.
Protocols provide sequencing if they ensure the data arrives in the same order it was sent.
Protocols provide error control if they automatically discard messages that have been corrupted and arrange to retransmit the data.
Streaming protocols recognize only byte boundaries. Sequences of bytes may be split up and are delivered to the recipient as the data arrives.
Packet-based protocols handle packets of data, preserving the packet boundaries and delivering complete packets to the receiver. Packet-based protocols normally enforce a maximum packet size.
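
The difference between streaming and packet-based delivery can be observed on a single machine. The following is a minimal sketch, not a definitive implementation: the helper name first_read_size() is hypothetical, and it assumes a Linux system where socketpair() can create connected Unix domain sockets of either type. It writes a 3-byte message and then a 5-byte message, and reports how many bytes one read() hands back.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send two messages over a connected pair of
 * Unix domain sockets of the given type (SOCK_STREAM or SOCK_DGRAM)
 * and return how many bytes a single read() delivers, or -1 on error. */
ssize_t first_read_size(int type)
{
    int fds[2];
    char buf[64];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, type, 0, fds) < 0)
        return -1;

    /* Two separate writes: 3 bytes, then 5 bytes. */
    if (write(fds[0], "abc", 3) == 3 && write(fds[0], "defgh", 5) == 5)
        n = read(fds[1], buf, sizeof(buf));

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Called with SOCK_DGRAM, the helper returns 3, because the first packet is delivered whole and the second stays queued. Called with SOCK_STREAM, the two writes are just a run of bytes, so the read may return anything from 1 to 8 bytes; on Linux it commonly returns all 8.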

Although each of these attributes is independent of the others, two major types of protocols are commonly used by applications. Datagram protocols are packet-oriented transports that provide neither sequencing nor error control; UDP, part of the TCP/IP protocol family, is a widely used datagram protocol. Stream protocols, such as the TCP portion of TCP/IP, are streaming protocols that provide both sequencing and error control.

Although datagram protocols, such as UDP, can be useful,^[5] we focus on using stream protocols because they are easier to use for most applications. More information on protocol design and the differences between various protocols is available from many books [Stevens, 2004] [Stevens, 1994].

^[5] Many higher-level protocols, such as BOOTP and NFS, are built on top of UDP.

17.1.4. Addresses

As every protocol has its own definition of a network address, the sockets API must abstract addresses. It uses a struct sockaddr as the basic form of an address; its contents are defined differently for each protocol family. Whenever a struct sockaddr is passed to a system call, the process also passes the size of the address that is being passed. The type socklen_t is defined as a numeric type large enough to hold the size of any socket address used by the system.

All struct sockaddr types conform to the following definition:

    #include <sys/socket.h>

    struct sockaddr {
        unsigned short sa_family;
        char sa_data[MAXSOCKADDRDATA];
    };

The first two bytes (the size of a short) specify the address family this address belongs to. A list of the common address families that Linux applications use is in Table 17.1.
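
The convention of passing an address together with its size, and then reading the family out of the generic header, might look like the following sketch. The helper name local_address_family() is hypothetical. It also uses struct sockaddr_storage, a standard type not discussed above, as a buffer assumed to be large enough for any address family.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a socket in the given family, ask the
 * kernel for its local address, and return the family recorded in the
 * generic header (sa_family), or -1 on error. */
int local_address_family(int family)
{
    struct sockaddr_storage ss;   /* buffer big enough for any address */
    socklen_t len = sizeof(ss);   /* in: buffer size; out: address size */
    int result = -1;
    int fd = socket(family, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* The address is passed as a generic struct sockaddr pointer,
     * together with a pointer to its size. */
    if (getsockname(fd, (struct sockaddr *) &ss, &len) == 0)
        result = ((struct sockaddr *) &ss)->sa_family;

    close(fd);
    return result;
}
```

Even for a socket that has not been bound to anything, the kernel fills in the family field, so local_address_family(AF_INET) returns AF_INET and local_address_family(AF_UNIX) returns AF_UNIX.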

Table 17.1. Protocol and Address Families
Address Family	Protocol Family	Protocol Description
`AF_UNIX`	`PF_UNIX`	Unix domain
`AF_INET`	`PF_INET`	TCP/IP (version 4)
`AF_INET6`	`PF_INET6`	TCP/IP (version 6)
`AF_AX25`	`PF_AX25`	AX.25, used by amateur radio
`AF_IPX`	`PF_IPX`	Novell IPX
`AF_APPLETALK`	`PF_APPLETALK`	AppleTalk DDP
`AF_NETROM`	`PF_NETROM`	NetROM, used by amateur radio
