27.1 Introduction | Linux Network Architecture

An application programming interface (API) is required to enable application programmers to access the network functionality implemented in the operating system. One of the most common interfaces for accessing transport protocols in the UNIX world is Berkeley sockets, or BSD sockets, which take their name from the UNIX variant Berkeley Software Distribution, where they were first implemented.

The design of Berkeley sockets (in the following discussion called sockets for short) follows the UNIX paradigm: Ideally, map all objects that are read or write accessed to files, so that they can be processed by use of the regular file write and read operations. Sending or receiving in a communication relationship can be easily mapped to write and read operations. The objects manipulated by such operations in the context of transport protocols are the endpoints of a communication relationship; these are represented by sockets.
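The file paradigm can be illustrated with a minimal sketch. It assumes a POSIX system and uses socketpair(), a call not discussed in this section, purely as a shortcut to obtain two already connected sockets; the function name echo_through_socketpair is hypothetical. The point is only that the ordinary write() and read() file operations work on socket descriptors.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends msg into one end of a connected socket pair
 * and reads it back from the other end, using only write() and read().
 * buflen must be at least 1. Returns the number of bytes read, or -1. */
ssize_t echo_through_socketpair(const char *msg, char *buf, size_t buflen)
{
    int sv[2];
    ssize_t n = -1;

    /* Assumption: socketpair() is used only to skip address setup. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* The socket descriptors are used exactly like file descriptors. */
    if (write(sv[0], msg, strlen(msg)) >= 0) {
        n = read(sv[1], buf, buflen - 1);
        if (n >= 0)
            buf[n] = '\0';
    }

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

Called with a short string and a sufficiently large buffer, the function should return the length of the string and leave a copy of it in the buffer.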

27.1.1 Socket Addresses

In the Internet, a communication endpoint in the transport layer is described by three parameters: the protocol used, an IP address, and a port number. These parameters therefore have to be assigned to a socket before it can be used for communication. In building a communication relationship, we additionally have to specify the communication partner's endpoint address.

`struct sockaddr`	/usr/include/sys/socket.h

The data structure used to represent socket addresses was kept quite general, because the socket interface can also support other protocols, in addition to Internet protocols:

 typedef unsigned short sa_family_t;

 struct sockaddr {
        sa_family_t        sa_family;
        char               sa_data[14];
 };

The sa_family element registers the address family (e.g., AF_INET for the family of Internet protocols). The exact address format is not yet defined in detail in the general sockaddr structure. For this reason, there is a more specific variant for Internet addresses, called sockaddr_in.

`struct sockaddr_in`	/usr/include/netinet/in.h

 struct in_addr {
        __u32 s_addr;
 };

 struct sockaddr_in {
        sa_family_t            sin_family;      /* Address family: AF_INET */
        unsigned short int     sin_port;        /* Port number */
        struct in_addr         sin_addr;        /* Internet address */

        /* Pad to size of 'struct sockaddr'. */
        unsigned char sin_zero[sizeof (struct sockaddr) -
                               sizeof (sa_family_t) -
                               sizeof (uint16_t) -
                               sizeof (struct in_addr)];
 };

The address family is at the same position as in the general sockaddr structure, and the IP address is stored as a 32-bit number in the element sin_addr.s_addr. The 16-bit port number is in the sin_port element. The remaining free space is not used. Notice that the addresses and port numbers have to be specified in the network byte order; see Section 27.2.3.
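Filling such an address structure might look like the following sketch. The helper name make_inet_addr is hypothetical; htons() and inet_pton() are the standard conversion routines, used here to put the port number and the IP address into network byte order.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fills *addr from a dotted-decimal IPv4 string and a
 * port number given in host byte order.
 * Returns 0 on success, -1 if ip is not a valid IPv4 address string. */
int make_inet_addr(struct sockaddr_in *addr, const char *ip,
                   unsigned short port)
{
    memset(addr, 0, sizeof(*addr));   /* also clears the sin_zero padding */
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);     /* host to network byte order */

    /* inet_pton() stores the address in network byte order. */
    if (inet_pton(AF_INET, ip, &addr->sin_addr) != 1)
        return -1;
    return 0;
}
```

Because the socket calls expect the general structure, the result is typically passed with a cast, for example as (struct sockaddr *)&addr together with sizeof(addr).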

27.1.2 Socket Operations

Sockets are represented by normal file descriptors at the programming interface. These file descriptors can be used to perform write and read operations. However, the establishment of a communication relationship is different from opening a file, so, from the application's view, additional system calls are available for sockets.

Figure 27-1 shows the system calls employed during use of a socket and their order. We distinguish between the client role (left) and the server role (right). This distinction does not refer to the payload transfer, but merely to the establishment of a communication relationship: A client actively initiates the establishment of a communication relationship, but a server initially remains passive, waiting for incoming communication requests.

Figure 27-1. System calls at the socket interface; grayed calls are not required for connectionless protocols (e.g., UDP).

We will briefly explain the meaning of each of these calls below:

A new socket is initially created by the socket() system call, which requires information about the protocol to be used (TCP or UDP in the Internet). The result of this operation is a file descriptor, which identifies the socket from then on and has to be specified in all subsequent calls.
bind() is used to assign a local address to the socket. For Internet sockets, this address consists of the IP address of a network interface of the local system and a port number. Clients can do without a bind() call, because their exact address often does not play any role; an address is then assigned to them automatically.
The listen() call is used by a server to inform the operating system that connections should be accepted at the socket. This is meaningful only for connection-oriented protocols (currently, for TCP only, where it causes the protocol state machine to move into the LISTEN state; see Section 24.3).
An active connection establishment (e.g., in TCP) to an address passed as parameter is initiated by the connect() call. For connectionless protocols (e.g., UDP), connect() can be used to specify a destination address for all packets subsequently transmitted.
accept() is used by a server to accept a connection, provided that it had previously received a connection request. Otherwise, the call will block until a connection request has been received.
The socket is copied when a connection is accepted: The original socket remains in the LISTEN state, but the new socket is in the CONNECTED state. A new file descriptor for the second socket is returned by the accept() call. This duplication of sockets during the accepting of a connection allows a server application to continue accepting new connections without having first to close previous connections.
Notice that accept() is not used by sockets for connectionless protocols.
Now that a communication relationship has been established, data can be transmitted. If a connection exists, or if connect() defined the destination address for a connectionless protocol, then the write() and read() file operations are applicable. Otherwise, the functions sendto() and recvfrom() can be used, which require a destination address to be specified for each data unit to be sent or supply a source address for each data unit received.
When a socket is no longer needed, the descriptor can be released by close(). This function also closes the connection, if one is still open.
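The sequence of calls described above can be sketched for TCP as follows. To keep the example self-contained, it plays both roles in a single process over the loopback interface, which is an assumption of this sketch and not how client and server are normally deployed. The function name tcp_loopback_demo is hypothetical, and getsockname() is used only to learn the port number chosen by the kernel.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: sends msg from a client socket to a server socket over
 * 127.0.0.1. buflen must be at least 1.
 * Returns the number of bytes the server side read into buf, or -1. */
ssize_t tcp_loopback_demo(const char *msg, char *buf, size_t buflen)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lfd = -1, cfd = -1, sfd = -1;
    ssize_t n = -1;

    /* Server role: socket(), bind(), listen(). */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        goto out;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);         /* port 0: kernel picks a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    if (listen(lfd, 1) < 0)
        goto out;
    /* Learn which port was actually assigned. */
    if (getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    /* Client role: socket() and connect(), without bind(). */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0)
        goto out;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    /* Server role: accept() returns a new descriptor for this connection;
     * lfd stays in the listening state. */
    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0)
        goto out;

    /* Data transfer with the regular file operations. */
    if (write(cfd, msg, strlen(msg)) < 0)
        goto out;
    n = read(sfd, buf, buflen - 1);
    if (n >= 0)
        buf[n] = '\0';

out:
    /* close() releases each descriptor and closes the connection. */
    if (sfd >= 0)
        close(sfd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return n;
}
```

In a real application, the client and server parts would run in separate processes, and the server would usually call accept() in a loop.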
The following section discusses each of these system calls for the socket interface in more detail.
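As a preview for the connectionless case, the following sketch exchanges a single UDP datagram with sendto() and recvfrom(). It again assumes a single process using the loopback interface, and the function name udp_loopback_demo is hypothetical. Neither listen(), connect(), nor accept() is called; the destination address is given with the datagram, and the source address is supplied on receipt.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: sends msg as one UDP datagram to 127.0.0.1 and
 * receives it on a second socket. The sender's address is stored in *from.
 * buflen must be at least 1. Returns the bytes received, or -1 on error. */
ssize_t udp_loopback_demo(const char *msg, char *buf, size_t buflen,
                          struct sockaddr_in *from)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr), fromlen = sizeof(*from);
    int rfd = -1, tfd = -1;
    ssize_t n = -1;

    /* Receiver: socket() and bind() only. */
    rfd = socket(AF_INET, SOCK_DGRAM, 0);
    if (rfd < 0)
        goto out;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);         /* port 0: kernel picks a free port */
    if (bind(rfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    if (getsockname(rfd, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    /* Sender: no bind(); a local address is assigned automatically. */
    tfd = socket(AF_INET, SOCK_DGRAM, 0);
    if (tfd < 0)
        goto out;
    if (sendto(tfd, msg, strlen(msg), 0,
               (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    /* recvfrom() supplies the source address of the datagram. */
    n = recvfrom(rfd, buf, buflen - 1, 0,
                 (struct sockaddr *)from, &fromlen);
    if (n >= 0)
        buf[n] = '\0';

out:
    if (tfd >= 0)
        close(tfd);
    if (rfd >= 0)
        close(rfd);
    return n;
}
```

After a successful call, from holds the loopback address and the port that was automatically assigned to the sending socket.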