25.2 Data Structures | Linux Network Architecture

The implementation of UDP in the Linux kernel does not require any additional or particularly complex data structures. This section describes the data structure used to pass the payload at the socket interface, the UDP datagram itself, which is contained in the general socket buffer structure, and the data structure instances used to integrate the protocol into the network architecture.

25.2.1 Passing the Payload

At the socket interface, the sendmsg() system call receives the payload in the form of an msghdr structure. The socket layer checks this structure and copies it into the kernel; the actual payload, however, initially remains in the user address space. Apart from that, the structure is passed unchanged to udp_sendmsg(), the function that sends UDP packets.

`struct msghdr`	include/linux/socket.h

    struct msghdr {
        void             *msg_name;
        int               msg_namelen;
        struct iovec     *msg_iov;
        __kernel_size_t   msg_iovlen;
        void             *msg_control;
        __kernel_size_t   msg_controllen;
        unsigned          msg_flags;
    };

When UDP packets are sent, msg_name is not really a name, but a pointer to a sockaddr_in structure (see Section 27.1.1) containing an IP address and a port number; msg_namelen specifies the length of this structure. The msg_iov pointer refers to an array of msg_iovlen iovec structures that reference the payload. The payload can thus be spread over a series of separate blocks, each described by an iovec structure through its start address (iov_base) and its length (iov_len):

    struct iovec {
        void             *iov_base;
        __kernel_size_t   iov_len;
    };
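To make the msghdr/iovec layout concrete, here is a minimal user-space sketch that gathers a payload from two separate buffers and sends it as a single UDP datagram over the loopback interface. The helper names send_two_blocks() and loopback_demo() are our own and purely illustrative; the sketch relies only on the standard sendmsg(), bind(), getsockname(), and recv() calls. The user-space msghdr declared in <sys/socket.h> has the same members as the kernel structure shown above, although some member types differ.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one UDP datagram whose payload is gathered from two separate
 * user-space blocks, using the msghdr/iovec layout described above. */
static ssize_t send_two_blocks(int fd, const struct sockaddr_in *dst,
                               const char *first, const char *second)
{
    struct iovec iov[2];
    struct msghdr msg;

    iov[0].iov_base = (void *)first;   /* block 1: start address ... */
    iov[0].iov_len  = strlen(first);   /* ... and length             */
    iov[1].iov_base = (void *)second;  /* block 2                    */
    iov[1].iov_len  = strlen(second);

    memset(&msg, 0, sizeof(msg));      /* no control data, no flags  */
    msg.msg_name    = (void *)dst;     /* destination: a sockaddr_in */
    msg.msg_namelen = sizeof(*dst);
    msg.msg_iov     = iov;
    msg.msg_iovlen  = 2;

    return sendmsg(fd, &msg, 0);
}

/* Send "Hello, " + "UDP" over loopback and read it back as one datagram.
 * Returns the number of bytes received, or -1 on error. */
static ssize_t loopback_demo(char *out, size_t outlen)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    ssize_t n = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    assert(out != NULL && outlen > 0);
    if (rx < 0 || tx < 0)
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                 /* let the kernel choose a port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) < 0)
        goto out;

    if (send_two_blocks(tx, &addr, "Hello, ", "UDP") < 0)
        goto out;
    n = recv(rx, out, outlen, 0);      /* both blocks arrive as one datagram */
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```

Although the payload is split across two iovec entries, the receiver gets a single 10-byte datagram, "Hello, UDP", because each sendmsg() call on a UDP socket produces exactly one datagram.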

The buffer specified by msg_control and msg_controllen can be used to pass protocol-specific control messages. We will not discuss the format of these messages here; detailed information can be found in the recv() system call manpage.

The msg_flags element can be used to pass different flags both from the user process to the kernel and in the opposite direction. For example, the kernel evaluates the following flags:

MSG_DONTROUTE specifies that the destination must be in the local area network and that, for this reason, the datagram should not be sent over a router to its destination.
MSG_DONTWAIT prevents the system call from blocking if, for example, there are no data to be received.
MSG_ERRQUEUE specifies that, instead of a packet, a detailed error message should be fetched from the socket's error queue, if one is available.

The following flag is an example of a flag that the kernel returns to the user process:

MSG_TRUNC indicates that the buffer space provided for receiving was insufficient, so that some of the packet data were lost.

The flags discussed above are only some examples; we will not describe all possible flags and their meanings here. Readers can find more information in the system calls' manpages.
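The following user-space sketch exercises two of the flags above on a loopback socket. MSG_DONTWAIT is passed in the flags argument of recv() and makes the call fail with EAGAIN (or EWOULDBLOCK) instead of blocking on an empty queue; MSG_TRUNC is then reported back by the kernel in the msg_flags member of the msghdr filled in by recvmsg() when a 10-byte datagram is read into a 4-byte buffer. The structure flag_result and the function flags_demo() are hypothetical names chosen for this illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

struct flag_result {
    int     would_block;  /* MSG_DONTWAIT on an empty queue gave EAGAIN */
    int     truncated;    /* kernel reported MSG_TRUNC in msg_flags     */
    ssize_t received;     /* bytes actually copied into the buffer      */
};

/* Returns 0 on success, -1 if a socket call failed unexpectedly. */
static int flags_demo(struct flag_result *res)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    char small[4];
    struct iovec iov = { .iov_base = small, .iov_len = sizeof(small) };
    struct msghdr msg;
    int rc = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    assert(res != NULL);
    memset(res, 0, sizeof(*res));
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &alen) < 0)
        goto out;

    /* 1. Nothing is queued yet: MSG_DONTWAIT returns at once. */
    if (recv(fd, small, sizeof(small), MSG_DONTWAIT) < 0 &&
        (errno == EAGAIN || errno == EWOULDBLOCK))
        res->would_block = 1;

    /* 2. Send a 10-byte datagram to this very socket ... */
    if (sendto(fd, "0123456789", 10, 0,
               (struct sockaddr *)&addr, sizeof(addr)) != 10)
        goto out;

    /* ... and read it into a 4-byte buffer; the rest is discarded. */
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    res->received = recvmsg(fd, &msg, 0);
    res->truncated = (msg.msg_flags & MSG_TRUNC) != 0;
    rc = 0;
out:
    close(fd);
    return rc;
}
```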

25.2.2 The UDP Datagram

`struct udphdr`	include/linux/udp.h

The union element h of the sk_buff structure includes a pointer, struct udphdr *uh, which references the UDP header within the packet data. The udphdr structure is declared as follows, based on the packet format shown in Figure 25-1:

    struct udphdr {
        __u16 source;
        __u16 dest;
        __u16 len;
        __u16 check;
    };
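All four fields of the UDP header are 16-bit values in network byte order. As an illustration, the sketch below parses the first eight bytes of a raw datagram in user space. It declares its own mirror structure, udp_header (a hypothetical name), rather than including <netinet/udp.h>, because the member names in that header differ between systems; memcpy() is used so that the code also works on a buffer that is not suitably aligned.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* User-space mirror of the kernel's struct udphdr: four 16-bit fields,
 * 8 bytes in total, all stored in network byte order (big-endian). */
struct udp_header {
    uint16_t source;
    uint16_t dest;
    uint16_t len;    /* header + payload, in bytes                   */
    uint16_t check;  /* 0 means "no checksum computed" (IPv4 only)    */
};

/* Copy the first 8 bytes of a datagram and convert every field
 * to host byte order. */
static struct udp_header parse_udp_header(const uint8_t *raw)
{
    struct udp_header h;

    assert(sizeof(h) == 8);          /* no padding expected */
    memcpy(&h, raw, sizeof(h));      /* avoids unaligned access */
    h.source = ntohs(h.source);
    h.dest   = ntohs(h.dest);
    h.len    = ntohs(h.len);
    h.check  = ntohs(h.check);
    return h;
}
```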

`struct udpfakehdr`	net/ipv4/udp.c

When a packet is sent and, where required, its checksum is computed, a somewhat more complex data structure is used. Besides a udphdr structure, in which the packet header is built and from which it is later copied into the packet, it contains the other data needed to construct the IP pseudo-header:

    struct udpfakehdr {
        struct udphdr   uh;
        u32             saddr;
        u32             daddr;
        struct iovec   *iov;
        u32             wcheck;
    };

The IP source and destination addresses are stored in saddr and daddr, the payload can be reached through the iovec structure referenced by iov, and wcheck accumulates the checksum while the packet is being sent.
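The checksum that wcheck accumulates is the Internet checksum defined in RFC 768: the 16-bit one's-complement sum over a pseudo-header (source address, destination address, a zero byte, the protocol number 17, and the UDP length), the UDP header with its check field set to zero, and the payload padded with a zero byte to an even length. The byte-oriented sketch below illustrates the arithmetic only; it is not the kernel's optimized code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add a buffer to a running 32-bit sum as big-endian 16-bit words;
 * an odd trailing byte is padded with a zero byte. */
static uint32_t sum_words(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1) {
        sum += (uint32_t)p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/* UDP checksum over pseudo-header + UDP header + payload.
 * saddr/daddr: IPv4 addresses in host byte order.
 * udp: complete datagram (header followed by payload), udp_len bytes long.
 * To compute, the check field in udp must be zero; to verify, it holds the
 * received value and a correct datagram yields 0. When sending, a computed
 * result of 0 must be transmitted as 0xFFFF (0 means "no checksum"). */
static uint16_t udp_checksum(uint32_t saddr, uint32_t daddr,
                             const uint8_t *udp, uint16_t udp_len)
{
    const uint8_t pseudo[12] = {
        (uint8_t)(saddr >> 24), (uint8_t)(saddr >> 16),
        (uint8_t)(saddr >> 8),  (uint8_t)saddr,
        (uint8_t)(daddr >> 24), (uint8_t)(daddr >> 16),
        (uint8_t)(daddr >> 8),  (uint8_t)daddr,
        0, 17 /* IPPROTO_UDP */,
        (uint8_t)(udp_len >> 8), (uint8_t)udp_len
    };
    uint32_t sum = sum_words(0, pseudo, sizeof(pseudo));

    sum = sum_words(sum, udp, udp_len);
    while (sum >> 16)                 /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Because the one's-complement sum of a value and its complement is 0xFFFF, inserting the computed checksum into the header and running the same function again yields 0; this is how a receiver can verify a datagram.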

25.2.3 Integration of UDP into the Network Architecture

As a transport protocol, UDP has two interfaces: one "downwards" to the network layer (the Internet Protocol) and one "upwards" to the application layer. The latter is formed by the sockets described in Chapter 26, more specifically, by the sockets of the PF_INET protocol family (see Section 26.3.1).

Interface to the Application Layer

To access the functionality of transport protocols, the socket implementation uses the proto structure; for UDP, this structure is defined in net/ipv4/udp.c:

    struct proto udp_prot = {
        name:         "UDP",
        close:        udp_close,
        connect:      udp_connect,
        disconnect:   udp_disconnect,
        ioctl:        udp_ioctl,
        setsockopt:   ip_setsockopt,
        getsockopt:   ip_getsockopt,
        sendmsg:      udp_sendmsg,
        recvmsg:      udp_recvmsg,
        backlog_rcv:  udp_queue_rcv_skb,
        hash:         udp_v4_hash,
        unhash:       udp_v4_unhash,
        get_port:     udp_v4_get_port,
    };

The members missing here (e.g., bind and accept), compared to the complete proto structure shown in Section 26.3.1, are implicitly initialized to zero (NULL pointers), which means that no transport-protocol-specific handling of the corresponding operations takes place.
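The effect of a NULL member can be illustrated with a heavily simplified, hypothetical stand-in for struct proto. In this sketch, the generic layer calls a protocol hook only if it is set and otherwise falls back to protocol-independent handling (here just a placeholder return value). The names toy_proto, generic_bind(), and generic_connect() are invented for the example, which uses the standard C99 .member = designated initializer syntax rather than the older GNU member: form shown above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct proto; the int argument
 * plays the role of a socket. */
struct toy_proto {
    const char *name;
    int (*connect)(int sock);
    int (*bind)(int sock);
};

static int toy_udp_connect(int sock)
{
    return 100 + sock;                /* marks that the UDP hook ran */
}

/* As in udp_prot, only some members are set; bind stays NULL. */
static const struct toy_proto toy_udp_prot = {
    .name    = "UDP",
    .connect = toy_udp_connect,
};

/* Generic layer: use the protocol hook if present, otherwise perform
 * protocol-independent handling (represented here by returning 0). */
static int generic_bind(const struct toy_proto *p, int sock)
{
    if (p->bind != NULL)
        return p->bind(sock);
    return 0;
}

static int generic_connect(const struct toy_proto *p, int sock)
{
    if (p->connect != NULL)
        return p->connect(sock);
    return -1;                        /* no handler available */
}
```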

Socket-state information is stored in the sock data structure mentioned in Section 26.3.1. The simplicity of UDP means that no protocol-specific additional data is required, so there is no UDP-specific part of the tp_pinfo field.

Most of the functions referenced in the proto structure are not very complex, so we will discuss them here only briefly:

udp_close(): When a UDP socket is closed, only the function inet_sock_release() (net/ipv4/af_inet.c), which is common to all PF_INET sockets, is invoked to release the socket data structure. That function in turn invokes udp_v4_unhash() (described below).
udp_connect(): UDP is a connectionless protocol, so the connect() system call, which connection-oriented protocols use to establish a connection, has a slightly different meaning here: It defines the destination of all UDP packets subsequently sent over the socket, so that the destination does not have to be specified each time. Accordingly, udp_connect() stores the destination address and destination port in the sock data structure. To record that this optional association exists, it sets the state field of the sock structure to TCP_ESTABLISHED, a state identifier "borrowed" from TCP. In addition, it uses ip_route_connect() to obtain a routing cache entry and stores it in the sock structure; reusing this entry when packets are sent avoids some routing overhead.
udp_disconnect(): The state is set to TCP_CLOSE, the destination address and destination port are deleted, and a stored routing cache entry is released.
udp_ioctl(): The ioctl() system call can be used here to query the number of bytes in the transmit queue (SIOCOUTQ) and the payload size of the next datagram waiting in the receive queue (SIOCINQ).
ip_setsockopt() and ip_getsockopt(): There are no socket options on the UDP level, so these two entries refer directly to the general handling routines of the IP level. (See also man setsockopt.)
udp_sendmsg() and udp_recvmsg(): These two functions implement the sending and receiving of UDP packets. Section 25.3 will discuss them in more detail.
udp_queue_rcv_skb(): This function will be discussed together with the description of udp_recvmsg() in Section 25.3.2.
udp_v4_hash(): When UDP packets are received, a decision must be made about which socket they belong to, so that they can be placed in that socket's receive queue and later fetched by the user process. To facilitate this assignment, the sock structures of all UDP sockets are registered in a hash table, struct sock *udp_hash[UDP_HTABLE_SIZE]. The port number modulo UDP_HTABLE_SIZE is used as the hash value.
Within the proto structure, the hash entry could reference a function that enters a socket into the hash table. However, the socket has already been entered into the table by udp_v4_get_port() (described below) when its UDP port number was assigned, so udp_v4_hash() is not actually needed and is never called.
udp_v4_unhash(): This function is invoked when a socket is released, to remove the sock structure from the hash table.
udp_v4_get_port(): The PF_INET implementation in net/ipv4/af_inet.c invokes this function whenever a local port number has to be assigned to a socket. The requested port number may also be zero; in this case, the function selects a free port whose hash chain contains as few sockets as possible.
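The connect() semantics described for udp_connect(), udp_disconnect(), and udp_ioctl() can be observed from user space. The sketch below connects a UDP socket once and then uses plain send() without a destination address, reads the size of the next queued datagram with FIONREAD (the same request as SIOCINQ on Linux), and finally dissolves the association by calling connect() with an address of family AF_UNSPEC, after which getpeername() fails. The names connect_result and connect_demo() are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

struct connect_result {
    int pending;      /* FIONREAD: payload size of the next datagram */
    int peer_before;  /* getpeername() return value while connected  */
    int peer_after;   /* getpeername() return value after disconnect */
};

/* Returns 0 on success, -1 if a socket call failed unexpectedly. */
static int connect_demo(struct connect_result *res)
{
    struct sockaddr_in rx_addr, peer;
    struct sockaddr unspec;
    socklen_t len = sizeof(rx_addr);
    char byte;
    int rc = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    assert(res != NULL);
    if (rx < 0 || tx < 0)
        goto out;

    memset(&rx_addr, 0, sizeof(rx_addr));
    rx_addr.sin_family = AF_INET;
    rx_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(rx, (struct sockaddr *)&rx_addr, sizeof(rx_addr)) < 0 ||
        getsockname(rx, (struct sockaddr *)&rx_addr, &len) < 0)
        goto out;

    /* Fix the destination once; later sends need no address. */
    if (connect(tx, (struct sockaddr *)&rx_addr, sizeof(rx_addr)) < 0)
        goto out;
    if (send(tx, "abcdef", 6, 0) != 6 || send(tx, "xy", 2, 0) != 2)
        goto out;

    /* Wait for the first datagram without removing it, then query its size. */
    if (recv(rx, &byte, 1, MSG_PEEK) < 0 ||
        ioctl(rx, FIONREAD, &res->pending) < 0)
        goto out;

    len = sizeof(peer);
    res->peer_before = getpeername(tx, (struct sockaddr *)&peer, &len);

    /* connect() with AF_UNSPEC dissolves the association again. */
    memset(&unspec, 0, sizeof(unspec));
    unspec.sa_family = AF_UNSPEC;
    if (connect(tx, &unspec, sizeof(unspec)) < 0)
        goto out;
    len = sizeof(peer);
    res->peer_after = getpeername(tx, (struct sockaddr *)&peer, &len);
    rc = 0;
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return rc;
}
```

FIONREAD reports 6 here even though 8 payload bytes were sent in total, because for UDP it returns only the size of the first datagram, which is all a single read will deliver.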
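The hashing and port selection described for udp_v4_hash() and udp_v4_get_port() can be sketched with a small model. Everything in it is hypothetical: the table size of 8, the port range 1024 to 1055, and the names toy_udp_table, toy_bind(), and toy_get_port(). The model simply scans the whole range for a free port on the shortest hash chain and stops early at an empty chain; it does not reproduce the kernel's exact scan order.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_HTABLE_SIZE 8     /* hypothetical; must be a power of two */
#define TOY_PORT_LOW    1024  /* hypothetical range for automatic ports */
#define TOY_PORT_HIGH   1055

struct toy_udp_table {
    int  bucket_len[TOY_HTABLE_SIZE];  /* sockets per hash chain      */
    bool in_use[65536];                /* in_use[port]: port is bound */
};

/* Port number modulo table size; the mask is equivalent for powers of two. */
static unsigned toy_hash(unsigned port)
{
    return port & (TOY_HTABLE_SIZE - 1);
}

static void toy_bind(struct toy_udp_table *t, unsigned port)
{
    assert(port < 65536 && !t->in_use[port]);
    t->in_use[port] = true;
    t->bucket_len[toy_hash(port)]++;
}

/* wanted != 0: take exactly that port if it is free.
 * wanted == 0: choose a free port whose hash chain is as short as possible.
 * Returns the bound port, or 0 on failure. */
static unsigned toy_get_port(struct toy_udp_table *t, unsigned wanted)
{
    unsigned best = 0;
    int best_len = -1;

    if (wanted != 0) {
        if (wanted >= 65536 || t->in_use[wanted])
            return 0;
        toy_bind(t, wanted);
        return wanted;
    }
    for (unsigned port = TOY_PORT_LOW; port <= TOY_PORT_HIGH; port++) {
        int len;

        if (t->in_use[port])
            continue;
        len = t->bucket_len[toy_hash(port)];
        if (best_len < 0 || len < best_len) {
            best = port;
            best_len = len;
            if (len == 0)
                break;                /* an empty chain cannot be beaten */
        }
    }
    if (best != 0)
        toy_bind(t, best);
    return best;
}
```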

Interface to IP

The interface through which UDP accepts packets received by the IP layer is defined by an inet_protocol structure, which is described in Section 14.2.5 together with the function inet_add_protocol() and shown in Figure 14-5. For UDP, as for all protocols running directly on top of IP, this structure is defined in net/ipv4/protocol.c:

    static struct inet_protocol udp_protocol = {
        handler:      udp_rcv,
        err_handler:  udp_err,
        next:         IPPROTO_PREVIOUS,
        protocol:     IPPROTO_UDP,
        name:         "UDP"
    };

The function udp_rcv() serves to receive incoming packets; Section 25.3.2 will discuss it in more detail. udp_err() handles ICMP error messages communicated by the IP layer.

To send packets over IP, UDP uses the function ip_build_xmit() from net/ipv4/ip_output.c. Unlike ip_queue_xmit(), this function does not take the complete IP payload as a parameter; instead, it takes a callback function that it uses to request the data. In addition, it does not handle routing itself, but uses a routing cache entry that is also passed as a parameter.
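The callback pattern can be sketched in user space as follows. In this hypothetical model, toy_build_xmit() never sees the payload directly; it pulls the data fragment by fragment through a callback, and the UDP-side callback toy_getfrag() gathers each requested range out of an iovec array while accumulating a running sum in the same pass. All names and the callback signature are illustrative assumptions, and the running sum is a plain byte sum standing in for the real checksum, not the Internet checksum itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical callback type: copy len bytes, starting at byte offset off
 * of the sender's payload, into the buffer at to. */
typedef int (*toy_getfrag_t)(void *src, char *to,
                             unsigned int off, unsigned int len);

/* Hypothetical sender context, loosely modeled on udpfakehdr. */
struct toy_frag_src {
    const struct iovec *iov;     /* scattered payload, as in msghdr  */
    int                 iovcnt;
    uint32_t            wcheck;  /* running sum, stand-in for wcheck */
};

/* UDP-side callback: copy [off, off + len) out of the iovec array and
 * add every copied byte to wcheck while copying. */
static int toy_getfrag(void *p, char *to, unsigned int off, unsigned int len)
{
    struct toy_frag_src *src = p;

    for (int i = 0; i < src->iovcnt && len > 0; i++) {
        size_t blen = src->iov[i].iov_len;
        size_t n;

        if (off >= blen) {           /* range starts in a later block */
            off -= blen;
            continue;
        }
        n = blen - off < len ? blen - off : len;
        memcpy(to, (const char *)src->iov[i].iov_base + off, n);
        for (size_t k = 0; k < n; k++)
            src->wcheck += (unsigned char)to[k];
        to += n;
        len -= n;
        off = 0;
    }
    return len == 0 ? 0 : -1;        /* -1: range beyond the payload */
}

/* IP-side routine: receives only the callback, never the payload, and
 * requests the data in pieces of at most frag_size bytes. */
static int toy_build_xmit(toy_getfrag_t getfrag, void *frag_src,
                          unsigned int length, unsigned int frag_size,
                          char *out)
{
    for (unsigned int off = 0; off < length; off += frag_size) {
        unsigned int n = length - off < frag_size ? length - off : frag_size;

        if (getfrag(frag_src, out + off, off, n) < 0)
            return -1;
    }
    return 0;
}
```

The design lets the transport protocol decide how its data is copied, so work such as checksumming can be folded into the single copy instead of requiring a separate pass over the payload.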