11.1. Interprocess-Communication Model

There were several goals in the design of the interprocess-communication enhancements to UNIX. The most immediate need was to provide access to communication networks such as the Internet [Cerf, 1978]. Previous work in providing network access had focused on the implementation of the network protocols, exporting the transport facilities to applications via special-purpose and often awkward interfaces [Cohen, 1977; Gurwitz, 1981]. As a result, each new network implementation resulted in a different application interface, requiring most existing programs to be altered significantly or rewritten completely. For 4.2BSD the interprocess-communication facilities were intended to provide a sufficiently general interface to allow network-based applications to be constructed independently of the underlying communication facilities.

The second goal was to allow multiprocess programs, such as distributed databases, to be implemented. The UNIX pipe requires all communicating processes to be derived from a common parent process. The use of pipes forced systems to be designed with a somewhat contorted structure. New communication facilities were needed to support communication between unrelated processes residing locally on a single host computer and residing remotely on multiple host machines.

Finally, it became important to provide new communication facilities to allow construction of local-area network services, such as file servers. The intent was to provide facilities that could be used easily in supporting resource sharing in a distributed environment and not to build a distributed UNIX system.

The interprocess-communication facilities were designed to support the following:

Transparency: Communication between processes should not depend on whether the processes are on the same machine.
Efficiency: The applicability of any interprocess-communication facility is limited by the performance of the facility. A naive implementation of interprocess communication is often very modular but inefficient because most interprocess-communication facilities, especially those related to networks, are broken down into many layers. At each layer boundary the software must perform some work, either adding information to a message or removing it. FreeBSD introduces layers only where they are absolutely necessary for the proper functioning of the system.
Compatibility: Existing naive processes should be usable in a distributed environment without change. A naive process is characterized as a process that performs its work by reading from the standard input file and writing to the standard output file. A sophisticated process uses knowledge about the richer set of interfaces provided by the operating system to do its work. A major reason why UNIX has been successful is the operating system's support for modularity by naive processes that act as byte-stream filters. Although sophisticated applications such as Web servers and screen editors exist, they are far outnumbered by the collection of naive application programs.

While designing the interprocess-communication facilities, the developers identified the following requirements to support these goals, and they developed a unifying concept for each:

The system must support communication networks that use different sets of protocols, different naming conventions, different hardware, and so on. The notion of a communication domain was defined for these reasons. A communication domain embodies the standard semantics of communication and naming. Different networks have different standards for naming communication end-points, which may also vary in their properties. In one network, a name may be a fixed address for a communication endpoint, whereas in another it may be used to locate a process that can move between locations. The semantics of communication can include the cost associated with the reliable transport of data, the support for multicast transmissions, the ability to pass access rights or capabilities, and so on.
A unified abstraction for an endpoint of communication is needed that can be manipulated with a file descriptor. The socket is the abstract object from which messages are sent and received. Sockets are created within a communication domain, just as files are created within a filesystem. Unlike files, however, sockets exist only as long as they are referenced.
The semantic aspects of communication must be made available to applications in a controlled and uniform way. Applications must be able to request different styles of communication, such as reliable byte stream or unreliable datagram, and these styles must be provided consistently across all communication domains. All sockets are typed according to their communication semantics. Types are defined by the semantic properties that a socket supports. These properties are
1. In-order delivery of data
2. Unduplicated delivery of data
3. Reliable delivery of data
4. Connection-oriented communication
5. Preservation of message boundaries
6. Support for out-of-band messages

Pipes have the first four properties, but not the fifth or sixth. An out-of-band message is one that is delivered to the receiver outside the normal stream of incoming, in-band data. It usually is associated with an urgent or exceptional condition. A connection is a mechanism that protocols use to avoid having to transmit the identity of the sending socket with each packet of data. Instead, the identity of each endpoint of communication is exchanged before transmission of any data and is maintained at each end so that it can be presented at any time. On the other hand, connectionless communications require a source and destination address associated with each transmission. A datagram socket provides unreliable, connectionless packet communication; a stream socket provides a reliable, connection-oriented byte stream that may support out-of-band data transmission; and a sequenced packet socket provides a sequenced, reliable, unduplicated connection-based communication that preserves message boundaries. Other types of sockets are desirable and can be added.
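The difference between a type that preserves message boundaries and one that does not can be observed directly. The following is a minimal sketch, assuming a system that supports datagram sockets in the local (AF_UNIX) domain through socketpair; the function name first_recv_size is hypothetical and chosen only for illustration.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send two 3-byte messages over a connected pair of local sockets of
 * the given type, then issue one receive with a large buffer.
 * Returns the number of bytes delivered by that first receive, or -1
 * on error.
 */
ssize_t
first_recv_size(int type)
{
	int sv[2];
	char buf[64];
	ssize_t n = -1;

	if (socketpair(AF_UNIX, type, 0, sv) == -1)
		return (-1);
	if (send(sv[0], "abc", 3, 0) == 3 && send(sv[0], "def", 3, 0) == 3)
		n = recv(sv[1], buf, sizeof(buf), 0);
	close(sv[0]);
	close(sv[1]);
	return (n);
}
```

Called with SOCK_DGRAM, the first receive delivers exactly one 3-byte message, because the datagram type preserves message boundaries. Called with SOCK_STREAM, the same receive may deliver any prefix of the six bytes sent, since a byte stream carries no record of where one send ended and the next began.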

Processes must be able to locate endpoints of communication so that they can rendezvous without being related, so sockets can be named. A socket's name is meaningfully interpreted only within the context of the communication domain in which the socket is created. The names used by most applications are human-readable strings. However, the name for a socket that is used within a communication domain is usually a low-level address. Rather than placing name-to-address translation functions in the kernel, FreeBSD 5.2 provides functions for application programs to use in translating names to addresses. In the remainder of this chapter, we refer to the name of a socket as an address.
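The text does not say which library routines perform the translation. As an illustration only, the sketch below uses getaddrinfo, a standard library function available on FreeBSD and other POSIX systems, to turn a host name and service name into an IPv4 address; the wrapper name lookup_ipv4 is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <string.h>

/*
 * Translate a host and service, given as strings, into an IPv4
 * stream-socket address. The translation runs entirely in user space.
 * Returns 0 on success and fills *out; otherwise returns the nonzero
 * getaddrinfo error code.
 */
int
lookup_ipv4(const char *host, const char *service, struct sockaddr_in *out)
{
	struct addrinfo hints, *res;
	int error;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_INET;		/* IPv4 only */
	hints.ai_socktype = SOCK_STREAM;	/* reliable byte stream */
	error = getaddrinfo(host, service, &hints, &res);
	if (error != 0)
		return (error);
	/* Use the first result; a real client would try each in turn. */
	memcpy(out, res->ai_addr, sizeof(*out));
	freeaddrinfo(res);
	return (0);
}
```

A caller might use lookup_ipv4("127.0.0.1", "80", &sin) and then pass sin to connect. The kernel sees only the resulting low-level address, never the string.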

Use of Sockets

Over the last several years a number of excellent books have been written about socket programming from the user's perspective [Stevens, 1998]. This section includes a brief description of a client and server program communicating over a reliable byte stream in the Internet communication domain. For more detailed information on writing network applications, please see the cited references. The client is described first and the server second.

A program that wants to use a socket creates it with the socket system call:

    int sock, domain = AF_INET, type = SOCK_STREAM, protocol = 0;

    sock = socket(domain, type, protocol);

The type of socket is selected according to the characteristic properties required by the application. In this example, reliable communication is required, so a stream socket (type = SOCK_STREAM) is selected. The domain parameter specifies the communication domain (or protocol family; see Section 11.4) in which the socket should be created, in this case the IPv4 Internet (domain = AF_INET). The final parameter, the protocol, can be used to indicate a specific communication protocol for use in supporting the socket's operation. Protocols are indicated by well-known (standard) constants specific to each communication domain. When zero is used, the system picks an appropriate protocol. The socket system call returns a file descriptor (a small integer; see Section 6.4) that is then used in all subsequent socket operations.
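A slightly fuller sketch of the same call, with error checking, is shown below. The helper names make_stream_socket and socket_type are hypothetical; the second uses the standard SO_TYPE socket option to confirm that the descriptor really refers to a socket of the requested type.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>

/*
 * Create a stream socket in the IPv4 Internet domain, letting the
 * system pick the protocol. Returns the descriptor, or -1 on error.
 */
int
make_stream_socket(void)
{
	int sock;

	sock = socket(AF_INET, SOCK_STREAM, 0);
	if (sock == -1)
		perror("socket");
	return (sock);
}

/*
 * Ask the system for the type of an existing socket.
 * Returns the type (for example, SOCK_STREAM), or -1 on error.
 */
int
socket_type(int sock)
{
	int type;
	socklen_t len = sizeof(type);

	if (getsockopt(sock, SOL_SOCKET, SO_TYPE, &type, &len) == -1)
		return (-1);
	return (type);
}
```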

After a socket has been created, the next step depends on the type of socket being used. Since this example is connection oriented, the sockets require a connection before being used. Creating a connection between two sockets usually requires that each socket have an address bound to it, which is simply a way of identifying each end of the communication.

Applications may explicitly specify a socket's address or may permit the system to assign one. The address to be used with a socket must be given in a socket address structure. The format of addresses can vary among domains; to permit a wide variety of different formats, the system treats addresses as variable-length byte arrays, which are prefixed with a length and a tag that identifies their format. Each domain has its own addressing format, which can always be mapped into the most generic one.
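As an illustration, the sketch below fills in an IPv4 socket address structure and then views it through the generic struct sockaddr. The function name fill_ipv4_addr is hypothetical. The length prefix described above appears on FreeBSD as the sin_len field; that field does not exist on every system, so the sketch sets it only when compiled on FreeBSD.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

/*
 * Fill in an IPv4 socket address from a dotted-decimal string and a
 * port number in host byte order.
 * Returns 0 on success, or -1 if the string is not a valid address.
 */
int
fill_ipv4_addr(struct sockaddr_in *sin, const char *dotted,
    unsigned short port)
{
	memset(sin, 0, sizeof(*sin));
#ifdef __FreeBSD__
	sin->sin_len = sizeof(*sin);	/* length prefix (BSD-specific) */
#endif
	sin->sin_family = AF_INET;	/* tag identifying the format */
	sin->sin_port = htons(port);	/* network byte order */
	if (inet_pton(AF_INET, dotted, &sin->sin_addr) != 1)
		return (-1);
	return (0);
}

/*
 * Read the format tag through the generic address structure, as the
 * system calls do when handed a (struct sockaddr *).
 */
int
generic_family(const struct sockaddr *sa)
{
	return (sa->sa_family);
}
```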

A connection is initiated with a connect system call:

    int error;
    int sock;                       /* Previously created by a socket() call. */
    struct sockaddr_in rmtaddr;     /* Assigned by the program. */
    int rmtaddrlen = sizeof (struct sockaddr_in);

    error = connect(sock, (struct sockaddr *)&rmtaddr, rmtaddrlen);

When the connect call completes, the client has a fully functioning communication endpoint on which it can send and receive data.

A server follows a different path once it has created a socket. It must bind itself to an address and then accept incoming connections from clients. The call to bind an address to a socket is as follows:

    int error;
    int sock;
    struct sockaddr_in localaddr;
    int localaddrlen = sizeof (struct sockaddr_in);

    error = bind(sock, (struct sockaddr *)&localaddr, localaddrlen);

where sock is the descriptor created by a previous call to socket.

For several reasons, binding a name to a socket was separated from creating a socket. First, sockets are potentially useful without names. If all sockets had to be named, users would be forced to devise meaningless names without reason. Second, in some communication domains, it may be necessary to supply additional information to the system before binding a name to a socket (for example, the "type of service" required when a socket is used). If a socket's name had to be specified at the time that the socket was created, supplying this information would not be possible without further complicating the interface.

In the server process, the socket must be marked to specify that incoming connections are to be accepted on it:

    int error, sock, backlog;

    error = listen(sock, backlog);

The backlog parameter in the listen call specifies an upper bound on the number of pending connections that should be queued for acceptance.

Connections are then received, one at a time, with

    int newsock, sock;
    struct sockaddr_in clientaddr;
    socklen_t clientaddrlen = sizeof(struct sockaddr_in);

    newsock = accept(sock, (struct sockaddr *)&clientaddr, &clientaddrlen);

The accept call returns a new connected socket and, through the clientaddr and clientaddrlen parameters, the address of the client. The length is passed by reference: on entry it gives the size of the buffer supplied, and on return it holds the size of the address actually stored. The new socket is the one through which communication can take place. The original socket, sock, is used solely for managing the queue of connection requests in the server.
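The client and server steps described so far can be put together in one place. The following is a minimal sketch, assuming a working IPv4 loopback interface; both ends run inside a single process purely to keep the example short, and the function names start_listener and loopback_exchange are hypothetical. Binding to port 0 asks the system to assign a free port, which getsockname then reports.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Server side: create a stream socket, bind it to the loopback address
 * with a system-assigned port, and mark it as accepting connections.
 * On success returns the descriptor and stores the bound address in
 * *addr; returns -1 on error.
 */
int
start_listener(struct sockaddr_in *addr)
{
	socklen_t len = sizeof(*addr);
	int sock;

	if ((sock = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return (-1);
	memset(addr, 0, sizeof(*addr));
	addr->sin_family = AF_INET;
	addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr->sin_port = htons(0);	/* let the system choose */
	if (bind(sock, (struct sockaddr *)addr, sizeof(*addr)) == -1 ||
	    listen(sock, 5) == -1 ||
	    getsockname(sock, (struct sockaddr *)addr, &len) == -1) {
		close(sock);
		return (-1);
	}
	return (sock);
}

/*
 * Connect a client socket to the listener, accept the connection, and
 * pass one short message from client to server.
 * Returns the number of bytes the server side received, or -1.
 */
ssize_t
loopback_exchange(const char *msg, char *buf, size_t buflen)
{
	struct sockaddr_in srvaddr, clientaddr;
	socklen_t clientaddrlen = sizeof(clientaddr);
	size_t msglen = strlen(msg);
	int lsock, csock, newsock = -1;
	ssize_t n = -1;

	if ((lsock = start_listener(&srvaddr)) == -1)
		return (-1);
	csock = socket(AF_INET, SOCK_STREAM, 0);
	if (csock == -1 ||
	    connect(csock, (struct sockaddr *)&srvaddr, sizeof(srvaddr)) == -1)
		goto out;
	/* The pending connection is waiting in the listen queue. */
	newsock = accept(lsock, (struct sockaddr *)&clientaddr,
	    &clientaddrlen);
	if (newsock == -1)
		goto out;
	if (send(csock, msg, msglen, 0) == (ssize_t)msglen)
		n = recv(newsock, buf, buflen, 0);
out:
	if (newsock != -1)
		close(newsock);
	if (csock != -1)
		close(csock);
	close(lsock);
	return (n);
}
```

Because the socket is a byte stream, a single recv is not guaranteed to return a whole message; the sketch relies on the message being short. A real server would loop on accept and handle each new socket separately, leaving the listening socket untouched.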

A variety of calls are available for sending and receiving data; these calls are summarized in Table 11.1. The richest of these interfaces are the sendmsg and recvmsg calls. Besides scatter-gather operations being possible, an address may be specified or received, optional flags are available, and specially interpreted ancillary data or control information may be passed (see Figure 11.1). Ancillary data may include protocol-specific data, such as addressing or options, and also specially interpreted data, called access rights.

Table 11.1. Sending and receiving data on a socket.
Routine	Connected	Disconnected	Address Info
read	Y	N	N
readv	Y	N	N
write	Y	N	N
writev	Y	N	N
recv	Y	Y	N
send	Y	Y	N
recvmsg	Y	Y	Y
sendmsg	Y	Y	Y

Figure 11.1. Data structures for the sendmsg and recvmsg system calls.
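The scatter-gather part of the sendmsg and recvmsg interface can be sketched as follows. The example assumes a connected pair of local datagram sockets from socketpair, so no address is supplied in the message header, and it passes no ancillary data; the function name gather_scatter is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>
#include <unistd.h>

/*
 * Gather two separate 4-byte buffers into one message with sendmsg,
 * then scatter that message across two receive buffers with recvmsg.
 * Returns the total number of bytes received, or -1 on error.
 */
ssize_t
gather_scatter(char head[4], char body[4])
{
	char part1[] = "HEAD", part2[] = "BODY";
	struct iovec out[2], in[2];
	struct msghdr msg;
	int sv[2];
	ssize_t n = -1;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
		return (-1);

	/* Describe the two pieces to be sent as one message. */
	out[0].iov_base = part1;
	out[0].iov_len = 4;
	out[1].iov_base = part2;
	out[1].iov_len = 4;
	memset(&msg, 0, sizeof(msg));	/* no address, no control data */
	msg.msg_iov = out;
	msg.msg_iovlen = 2;

	if (sendmsg(sv[0], &msg, 0) == 8) {
		/* Describe where the incoming bytes should land. */
		in[0].iov_base = head;
		in[0].iov_len = 4;
		in[1].iov_base = body;
		in[1].iov_len = 4;
		memset(&msg, 0, sizeof(msg));
		msg.msg_iov = in;
		msg.msg_iovlen = 2;
		n = recvmsg(sv[1], &msg, 0);
	}
	close(sv[0]);
	close(sv[1]);
	return (n);
}
```

The receiver fills the first buffer before moving on to the second, so the sender's two pieces arrive in the two buffers without any intermediate copy into a single contiguous area.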

In addition to these system calls, several other calls are provided to access miscellaneous services. The getsockname call returns the locally bound address of a socket, whereas the getpeername call returns the address of the socket at the remote end of a connection. The shutdown call terminates data transmission or reception at a socket, and two ioctl-style calls, setsockopt and getsockopt, can be used to set and retrieve various parameters that control the operation of a socket or of the underlying network protocols. Sockets are discarded with the normal close system call.
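Two of these calls are sketched below under simple assumptions. The first function sets the standard SO_REUSEADDR option and reads it back; the second shuts down the sending half of a local stream socket pair and shows that the peer then sees end-of-file. Both function names are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/*
 * Enable SO_REUSEADDR on a fresh stream socket and read the option
 * back. Returns 1 if the option reads back as enabled, 0 if it does
 * not, and -1 on error.
 */
int
reuseaddr_roundtrip(void)
{
	int sock, on = 1, value = 0, result = -1;
	socklen_t len = sizeof(value);

	if ((sock = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return (-1);
	if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == 0 &&
	    getsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &value, &len) == 0)
		result = (value != 0);
	close(sock);
	return (result);
}

/*
 * Terminate transmission on one end of a connected pair, then receive
 * on the other end. Returns what recv reports: 0 means end-of-file.
 */
ssize_t
recv_after_shutdown(void)
{
	int sv[2];
	char c;
	ssize_t n = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (-1);
	if (shutdown(sv[0], SHUT_WR) == 0)
		n = recv(sv[1], &c, 1, 0);
	close(sv[0]);
	close(sv[1]);
	return (n);
}
```

After shutdown with SHUT_WR the descriptor is still open and can still receive; only close releases the socket itself.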

The interface to the interprocess-communication facilities was purposely designed to be orthogonal to the existing standard system interfaces, that is, to the open, read, and write system calls. This decision was made to avoid overloading the familiar interface with undue complexity. In addition, the developers thought that using an interface that was completely independent of the filesystem would improve the portability of software because, for example, pathnames would not be involved. Backward compatibility, for the sake of naive processes, was still deemed important; thus, the familiar read-write interface was augmented to permit access to the new communication facilities wherever that made sense (e.g., when connected stream sockets were used).
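The point about naive processes can be illustrated with a small filter that knows nothing about sockets. In the sketch below, upcase_filter uses only read and write on the descriptors it is given, yet it works unchanged when those descriptors are connected stream sockets. The driver creates the sockets with socketpair in the local domain; both function names are hypothetical, and the example assumes input short enough to fit in the socket buffers.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <ctype.h>
#include <string.h>
#include <unistd.h>

/*
 * A naive byte-stream filter: copy from one descriptor to another,
 * converting to upper case. Returns the bytes copied, or -1 on error.
 * A short write is treated as an error to keep the sketch brief.
 */
ssize_t
upcase_filter(int in, int out)
{
	char buf[256];
	ssize_t n, i, total = 0;

	while ((n = read(in, buf, sizeof(buf))) > 0) {
		for (i = 0; i < n; i++)
			buf[i] = (char)toupper((unsigned char)buf[i]);
		if (write(out, buf, (size_t)n) != n)
			return (-1);
		total += n;
	}
	return (n == -1 ? -1 : total);
}

/*
 * Feed text to the filter through one socket pair and collect its
 * output from another. Returns the bytes collected, or -1 on error.
 */
ssize_t
filter_over_sockets(const char *text, char *result, size_t resultlen)
{
	int src[2], dst[2];
	size_t len = strlen(text);
	ssize_t n = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, src) == -1)
		return (-1);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, dst) == -1) {
		close(src[0]);
		close(src[1]);
		return (-1);
	}
	/* Shutting down the sending half gives the filter end-of-file. */
	if (write(src[0], text, len) == (ssize_t)len &&
	    shutdown(src[0], SHUT_WR) == 0 &&
	    upcase_filter(src[1], dst[0]) == (ssize_t)len)
		n = read(dst[1], result, resultlen);
	close(src[0]);
	close(src[1]);
	close(dst[0]);
	close(dst[1]);
	return (n);
}
```

The same filter could be handed a pipe, a file, or a terminal; the descriptors returned by socket and accept fit the interface it already uses.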
