Section 13.3. Internet Protocol (IP) | The Design and Implementation of the FreeBSD Operating System

13.3. Internet Protocol (IP)

Having examined the operation of a simple transport protocol, we continue with a discussion of the network-layer protocol [Postel, 1981a; Postel et al., 1981]. The Internet Protocol (IP) is the level responsible for host-to-host addressing and routing, packet forwarding, and packet fragmentation and reassembly. Unlike the transport protocols, it does not always operate for a socket on the local host; it may forward packets, receive packets for which there is no local socket, or generate error packets in response to these situations.

The functions done by IP are illustrated by the contents of its packet header, shown in Figure 13.4. The header identifies source and destination hosts and the destination protocol, and it contains header and packet lengths. The identification and fragment fields are used when a packet or fragment must be broken into smaller sections for transmission on its next hop and to reassemble the fragments when they arrive at the destination. The fragmentation flags are Don't Fragment and More Fragments; the latter flag plus the offset are enough information to assemble the fragments of the original packet at the destination.

Figure 13.4. IPv4 header. IHL is the Internet header length specified in units of four octets. Options are delimited by IHL. All field lengths are given in bits.
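
As a rough illustration of how the header fields fit together, the following sketch decodes an IPv4 header from raw network-order bytes. The names `ipv4_info` and `ipv4_parse` are hypothetical and are not the kernel's `struct ip` or any FreeBSD routine; only the field layout follows the standard IPv4 header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_MIN_HLEN 20      /* header without options, in bytes */
#define IPV4_DF       0x4000  /* Don't Fragment */
#define IPV4_MF       0x2000  /* More Fragments */
#define IPV4_OFFMASK  0x1fff  /* fragment offset, in 8-octet units */

/* Hypothetical decoded view of the header; not the kernel's struct ip. */
struct ipv4_info {
    size_t   hlen;       /* header length in bytes (IHL * 4) */
    size_t   total_len;  /* header plus data, in bytes */
    uint16_t id;         /* identification, shared by all fragments */
    int      df, mf;     /* fragmentation flags */
    size_t   frag_off;   /* offset of this fragment's data, in bytes */
    uint8_t  ttl, proto;
    uint32_t src, dst;   /* addresses in host byte order */
};

/* Read big-endian (network-order) 16-bit and 32-bit values. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static uint32_t rd32(const uint8_t *p) { return ((uint32_t)rd16(p) << 16) | rd16(p + 2); }

/* Decode a header; return 0 on success or -1 if it is malformed. */
int ipv4_parse(const uint8_t *pkt, size_t len, struct ipv4_info *out)
{
    uint16_t ff;

    assert(pkt != NULL && out != NULL);
    if (len < IPV4_MIN_HLEN || (pkt[0] >> 4) != 4)
        return -1;
    out->hlen = (size_t)(pkt[0] & 0x0f) * 4;   /* IHL counts four-octet units */
    out->total_len = rd16(pkt + 2);
    if (out->hlen < IPV4_MIN_HLEN || out->hlen > len ||
        out->total_len < out->hlen)
        return -1;
    out->id = rd16(pkt + 4);
    ff = rd16(pkt + 6);                        /* flags and fragment offset */
    out->df = (ff & IPV4_DF) != 0;
    out->mf = (ff & IPV4_MF) != 0;
    out->frag_off = (size_t)(ff & IPV4_OFFMASK) * 8;
    out->ttl = pkt[8];
    out->proto = pkt[9];
    out->src = rd32(pkt + 12);
    out->dst = rd32(pkt + 16);
    return 0;
}
```

Any bytes between offset 20 and `hlen` are options; the fragment offset is stored in units of eight octets, which is why it is multiplied by 8 here.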

IP options are present in an IP packet if the header length field has a value larger than the minimum of five four-octet units (20 bytes). The no-operation option and the end-of-option-list option are each one octet in length. All other options are self-encoding, with a type and length preceding any additional data. Hosts and routers are thus able to skip over options that they do not implement. Examples of existing options are the timestamp and record-route options, which are updated by each router that forwards a packet, and the source-route options, which supply a complete or partial route to the destination.

In practice, these options are rarely used, and most network operators silently drop packets carrying a source-route option because source routing makes traffic on the network difficult to manage.
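
The self-encoding layout is what lets a host step over options it does not implement. The sketch below walks an option area using only the type and length octets and reports whether a source-route option is present. The function name and the `_SK` constants are hypothetical; the option numbers themselves (0, 1, 131, 137) are the standard IPv4 values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPOPT_EOL_SK   0    /* end of option list (one octet) */
#define IPOPT_NOP_SK   1    /* no operation (one octet) */
#define IPOPT_LSRR_SK  131  /* loose source route */
#define IPOPT_SSRR_SK  137  /* strict source route */

/*
 * Walk the option bytes that follow the fixed 20-byte header.
 * Returns the number of multi-octet options seen, or -1 if an
 * option's length octet is inconsistent with the space available.
 */
int ipv4_scan_options(const uint8_t *opt, size_t optlen, int *has_source_route)
{
    size_t i = 0;
    int count = 0;

    assert(has_source_route != NULL);
    assert(opt != NULL || optlen == 0);
    *has_source_route = 0;
    while (i < optlen) {
        uint8_t type = opt[i];
        size_t olen;

        if (type == IPOPT_EOL_SK)
            break;                      /* nothing meaningful follows */
        if (type == IPOPT_NOP_SK) {
            i++;                        /* single-octet padding */
            continue;
        }
        if (optlen - i < 2)
            return -1;                  /* no room for the length octet */
        olen = opt[i + 1];              /* length covers type and length octets */
        if (olen < 2 || olen > optlen - i)
            return -1;
        if (type == IPOPT_LSRR_SK || type == IPOPT_SSRR_SK)
            *has_source_route = 1;
        count++;
        i += olen;                      /* skip the option without decoding it */
    }
    return count;
}
```

A filter that wants to drop source-routed packets, as described above, would only need the `has_source_route` result; it never has to understand the contents of the other options.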

Output

We have already seen the calling convention for the IP output routine, which is

    int ip_output(
        struct mbuf *msg,
        struct mbuf *opt,
        struct route *ro,
        int flags,
        struct ip_moptions *imo,
        struct inpcb *inp);

As described in the subsection on output in Section 13.2, the parameter msg is an mbuf chain containing the packet to be sent, including a skeletal IP header; opt is an optional mbuf containing IP options to be inserted after the header. If the route ro is given, it may contain a reference to a routing entry (rtentry structure), which specifies a route to the destination from a previous call, and in which any new route will be left for future use. Since cached routes were removed from the inpcb structure in FreeBSD 5.2, this cached route is seldom used. The flags may allow the use of broadcast or may indicate that the routing tables should be bypassed. If present, imo includes options for multicast transmissions. The protocol control block, inp, is used by the IPSec subsystem (see Section 13.10) to hold data about security associations for the packet.

The outline of the work done by ip_output() is as follows:

1. Insert any IP options.

2. Fill in the remaining header fields (IP version, zero offset, header length, and a new packet identification) if the packet contains an IP pseudo-header.

3. Determine the route (i.e., outgoing interface and next-hop destination).

4. Check whether the destination is a multicast address. If it is, determine the outgoing interface and hop count.

5. Check whether the destination is a broadcast address; if it is, check whether broadcast is permitted.

6. Do any IPSec manipulations that are necessary on the packet such as encryption.

7. See if there are any filtering rules that would modify the packet or prevent us from sending it.

8. If the packet size is no larger than the maximum packet size for the outgoing interface, compute the checksum and call the interface output routine.

9. If the packet size is larger than the maximum packet size for the outgoing interface, break the packet into fragments and send each in turn.
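
The last two steps of the outline can be illustrated with a small planning routine that decides how a payload would be split for a given maximum packet size. This is a sketch under simplifying assumptions, not the code in ip_output(): the names `frag_sk` and `ipv4_plan_fragments` are hypothetical, every fragment is assumed to carry a header of the same length, and the Don't Fragment flag is not considered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one fragment's data. */
struct frag_sk {
    size_t off;   /* byte offset into the original payload */
    size_t len;   /* bytes of payload carried by this fragment */
    int    more;  /* value for the More Fragments flag */
};

/*
 * Plan the fragments for payload_len bytes of data behind an hlen-byte
 * header on an interface whose maximum packet size is mtu.
 * Returns the number of entries written to out, or -1 on error.
 */
int ipv4_plan_fragments(size_t payload_len, size_t hlen, size_t mtu,
                        struct frag_sk *out, size_t max_frags)
{
    size_t per, off = 0;
    int n = 0;

    assert(out != NULL);
    if (mtu <= hlen)
        return -1;
    if (payload_len + hlen <= mtu)
        per = payload_len;                 /* fits: send as a single packet */
    else
        per = (mtu - hlen) & ~(size_t)7;   /* offsets are in 8-octet units */
    if (per == 0 && payload_len > 0)
        return -1;                         /* no room for even 8 data bytes */
    do {
        size_t left = payload_len - off;

        if ((size_t)n == max_frags)
            return -1;
        out[n].off = off;
        out[n].len = left < per ? left : per;
        out[n].more = left > per;          /* set on all but the last piece */
        off += out[n].len;
        n++;
    } while (off < payload_len);
    return n;
}
```

For a 3000-byte payload with a 20-byte header and a maximum packet size of 1500, the plan is three fragments carrying 1480, 1480, and 40 bytes. The value placed in each header's offset field would be `off / 8`.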

We shall examine the routing step in more detail. First, if no route reference is passed as a parameter, an internal routing reference structure is used temporarily. A route structure that is passed from the caller is checked to see that it is a route to the same destination and that it is still valid. If either test fails, the old route is freed. After these checks, if there is no route, rtalloc_ign() is called to allocate a route. The route returned includes a pointer to the outgoing interface. The interface information includes the maximum packet size, flags including broadcast and multicast capability, and the output routine. If the route is marked with the RTF_GATEWAY flag, the address of the next-hop router is given by the route; otherwise, the packet's destination is the next-hop destination. If routing is to be bypassed because of a MSG_DONTROUTE option (see Section 11.1) or a SO_DONTROUTE option, a directly attached network shared with the destination is sought; if there is no directly attached network, an error is returned. Once the outgoing interface and next-hop destination are found, enough information is available to send the packet.
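
The routing step described above can be sketched as follows. The structures, flag values, and `demo_lookup()` are hypothetical stand-ins for the kernel's route and rtentry structures, the RTF_GATEWAY flag, and the real routing lookup; the sketch shows only the cache check and the choice of next-hop destination.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RT_UP_SK       0x1  /* hypothetical: route is still valid */
#define RT_GATEWAY_SK  0x2  /* hypothetical stand-in for RTF_GATEWAY */

struct rtentry_sk { int flags; uint32_t gateway; int ifindex; };
struct route_sk   { struct rtentry_sk *rt; uint32_t dst; };

typedef struct rtentry_sk *(*lookup_fn)(uint32_t dst);

/* A cached route is reusable only if it is valid and for the same destination. */
int route_cache_ok(const struct route_sk *ro, uint32_t dst)
{
    return ro->rt != NULL && (ro->rt->flags & RT_UP_SK) && ro->dst == dst;
}

/* Find the outgoing interface and next hop for dst; 0 on success, -1 if no route. */
int ip_pick_next_hop(struct route_sk *ro, uint32_t dst, lookup_fn lookup,
                     uint32_t *next_hop, int *ifindex)
{
    assert(ro != NULL && lookup != NULL && next_hop != NULL && ifindex != NULL);
    if (!route_cache_ok(ro, dst)) {
        ro->rt = lookup(dst);   /* a real implementation frees the old route */
        ro->dst = dst;
    }
    if (ro->rt == NULL)
        return -1;
    *ifindex = ro->rt->ifindex;
    /* Via a gateway, send to the router; otherwise send to the host itself. */
    *next_hop = (ro->rt->flags & RT_GATEWAY_SK) ? ro->rt->gateway : dst;
    return 0;
}

/* Toy table: network 10 is directly attached; everything else uses 10.0.0.254. */
static struct rtentry_sk demo_direct  = { RT_UP_SK, 0, 1 };
static struct rtentry_sk demo_default = { RT_UP_SK | RT_GATEWAY_SK, 0x0A0000FEu, 1 };

struct rtentry_sk *demo_lookup(uint32_t dst)
{
    return (dst >> 24) == 10 ? &demo_direct : &demo_default;
}
```

A caller that passes the same `route_sk` on successive calls to the same destination skips the lookup, which is the benefit the cached route was meant to provide.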

As described in Chapter 12, the interface output routine normally validates the destination address and places the packet on its output queue, returning errors only if the interface is down, the output queue is full, or the destination address is not understood.

Input

In Chapter 12, we described the reception of a packet by a network interface and the packet's placement on the input queue for the appropriate protocol. The network-interface handler then schedules the protocol to run by setting a corresponding bit in the network status word and scheduling the network thread. The IPv4 input routine is invoked via this software interrupt when network interfaces receive messages for the IPv4 protocol. The input routine, ip_input(), is called with an mbuf that contains the packet it is to process. The dequeueing of packets and the calls into the input routine are handled by the network thread calling netisr_dispatch(). A packet is processed in one of four ways: it is passed as input to a higher-level protocol, it encounters an error that is reported back to the source, it is dropped because of an error, or it is forwarded to the next hop on its path to its destination. In outline form, the steps in the processing of a packet on input are as follows:

1. Verify that the packet is at least as long as an IPv4 header and ensure that the header is contiguous.

2. Checksum the header of the packet and discard the packet if there is an error.

3. Verify that the packet is at least as long as the header indicates and drop the packet if it is not. Trim any padding from the end of the packet.

4. Do any filtering or security functions required by ipfw or IPSec.

5. Process any options in the header.

6. Check whether the packet is for this host. If it is, continue processing the packet. If it is not, and if acting as a router, try to forward the packet. Otherwise, drop the packet.

7. If the packet has been fragmented, keep it until all its fragments are received and reassembled, or until it is too old to keep.

8. Pass the packet to the input routine of the next-higher-level protocol.
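
Steps 1 through 3 of this outline amount to a handful of length checks and a header checksum. The sketch below applies them to a flat byte buffer rather than an mbuf chain; the function names are hypothetical, and the checksum is the standard one's-complement Internet checksum, under which an intact header (checksum field included) sums to zero after complementing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement checksum over the header, including its checksum
 * field. Returns 0 when the header is intact.
 */
uint16_t ipv4_hdr_cksum(const uint8_t *hdr, size_t hlen)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < hlen; i += 2)      /* hlen is a multiple of 4 */
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)                      /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Steps 1-3 of input processing. Returns the packet length after
 * trimming any trailing padding, or -1 if the packet must be dropped.
 */
long ipv4_input_check(const uint8_t *pkt, size_t len)
{
    size_t hlen, total;

    assert(pkt != NULL);
    if (len < 20)                          /* step 1: room for a header? */
        return -1;
    hlen = (size_t)(pkt[0] & 0x0f) * 4;
    if ((pkt[0] >> 4) != 4 || hlen < 20 || hlen > len)
        return -1;
    if (ipv4_hdr_cksum(pkt, hlen) != 0)    /* step 2: header checksum */
        return -1;
    total = ((size_t)pkt[2] << 8) | pkt[3];
    if (total < hlen || total > len)       /* step 3: truncated packet */
        return -1;
    return (long)total;                    /* bytes past total are padding */
}
```

A link layer with a minimum frame size may deliver more bytes than the header's total length; returning `total` models the trimming of that padding.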

When the incoming packet is passed into the input routine, it is accompanied by a pointer to the interface on which the packet was received. This information is passed to the next protocol, to the forwarding function, or to the error-reporting function. If any error is detected and is reported to the packet's originator, the source address of the error message will be set according to the packet's destination and the incoming interface.

The decision whether to accept a received packet for local processing by a higher-level protocol is not as simple as one might think. If a host has multiple addresses, the packet is accepted if its destination matches any one of those addresses. If any of the attached networks support broadcast and the destination is a broadcast address, the packet is also accepted.
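
A minimal sketch of this acceptance test follows. The `ifaddr_sk` structure and the function name are hypothetical, and the sketch assumes that the broadcast addresses of interest are the limited broadcast address (all ones) and the directed broadcast address of each attached network; the kernel's actual test covers more cases than these.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-interface address record, in host byte order. */
struct ifaddr_sk {
    uint32_t addr;
    uint32_t netmask;
    int      can_broadcast;  /* attached network supports broadcast */
};

/* Return 1 if a packet addressed to dst should be processed locally. */
int ipv4_is_for_us(uint32_t dst, const struct ifaddr_sk *ifa, size_t n)
{
    size_t i;

    assert(ifa != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (dst == ifa[i].addr)
            return 1;                    /* one of the host's own addresses */
        if (!ifa[i].can_broadcast)
            continue;
        if (dst == 0xffffffffu)
            return 1;                    /* limited broadcast */
        if (dst == (ifa[i].addr | ~ifa[i].netmask))
            return 1;                    /* directed broadcast for this network */
    }
    return 0;
}
```

Note that the loop checks every configured address, so a multihomed host accepts a packet for any of its addresses regardless of the interface on which the packet arrived.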

The IPv4 input routine uses a simple and efficient scheme for locating the input routine for the receiving protocol of an incoming packet. The protocol field in the packet is 8 bits long; thus, there are 256 possible protocols. Fewer than 256 protocols are defined or implemented, and the Internet protocol switch has far fewer than 256 entries. Therefore, ip_input() uses a 256-element mapping array to map from the protocol number to the protocol-switch entry of the receiving protocol. Each entry in the array is initially set to the index of a raw IP entry in the protocol switch. Then, for each protocol with a separate implementation in the system, the corresponding map entry is set to the index of the protocol in the IP protocol switch. When a packet is received, IP simply uses the protocol field to index into the mapping array and calls the input routine of the appropriate protocol.
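
The mapping scheme can be sketched in a few lines. The table, array, and function names below are hypothetical simplifications (a real protocol-switch entry holds much more than a name and an input routine), and the toy input routines only return a tag so that the dispatch is observable; the protocol numbers 1, 6, and 17 are the standard assignments for ICMP, TCP, and UDP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { H_RAW, H_ICMP, H_TCP, H_UDP };

/* Toy input routines; each just reports which handler ran. */
static int raw_input_sk(const uint8_t *p, size_t n)  { (void)p; (void)n; return H_RAW; }
static int icmp_input_sk(const uint8_t *p, size_t n) { (void)p; (void)n; return H_ICMP; }
static int tcp_input_sk(const uint8_t *p, size_t n)  { (void)p; (void)n; return H_TCP; }
static int udp_input_sk(const uint8_t *p, size_t n)  { (void)p; (void)n; return H_UDP; }

/* Hypothetical, much-reduced protocol switch; entry 0 is raw IP. */
struct protosw_sk {
    uint8_t     proto;
    const char *name;
    int       (*input)(const uint8_t *, size_t);
};

static const struct protosw_sk inetsw_sk[] = {
    { 0,  "raw",  raw_input_sk  },
    { 1,  "icmp", icmp_input_sk },
    { 6,  "tcp",  tcp_input_sk  },
    { 17, "udp",  udp_input_sk  },
};
#define NPROTOSW_SK (sizeof inetsw_sk / sizeof inetsw_sk[0])

static uint8_t ip_protox_sk[256];  /* protocol number -> switch index */

void ip_protox_init(void)
{
    size_t i;

    /* Every protocol number initially maps to the raw IP entry (index 0). */
    memset(ip_protox_sk, 0, sizeof ip_protox_sk);
    for (i = 1; i < NPROTOSW_SK; i++) {
        assert(inetsw_sk[i].input != NULL);
        ip_protox_sk[inetsw_sk[i].proto] = (uint8_t)i;
    }
}

/* Dispatch on the header's 8-bit protocol field with one array index. */
int ip_deliver(uint8_t proto, const uint8_t *pkt, size_t len)
{
    return inetsw_sk[ip_protox_sk[proto]].input(pkt, len);
}

const char *ip_proto_name(uint8_t proto)
{
    return inetsw_sk[ip_protox_sk[proto]].name;
}
```

Because the array has an entry for every possible value of the protocol field, no bounds check or search is needed on the receive path, and packets for unimplemented protocols fall through to raw IP.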

Forwarding

Implementations of IPv4 traditionally have been designed for use by either hosts or routers, rather than by both. That is, a system was either an endpoint for packets (as source or destination) or a router (which forwards packets between hosts on different networks but only uses upper-level protocols for maintenance functions). Traditional host systems do not incorporate packet-forwarding functions; instead, if they receive packets not addressed to them, they simply drop the packets. 4.2BSD was the first common implementation that attempted to provide both host and router services in normal operation. This approach had advantages and disadvantages. It meant that 4.2BSD hosts connected to multiple networks could serve as routers as well as hosts, reducing the requirement for dedicated router hardware. Early routers were expensive and not especially powerful. On the other hand, the existence of router-function support in ordinary hosts made it more likely for misconfiguration errors to result in problems on the attached networks. The most serious problem had to do with forwarding of a broadcast packet because of a misunderstanding by either the sender or the receiver of the packet's destination. The packet-forwarding router functions are disabled by default in FreeBSD. They may be enabled at run time with the sysctl call. Hosts not configured as routers never attempt to forward packets or to return error messages in response to misdirected packets. As a result, far fewer misconfiguration problems are capable of causing synchronized or repetitive broadcasts on a local network, called broadcast storms.

The procedure for forwarding IP packets received at a router but destined for another host is the following:

1. Check that forwarding is enabled. If it is not, drop the packet.

2. Check that the destination address is one that allows forwarding. Packets destined for network 0, network 127 (the official loopback network), or illegal network addresses cannot be forwarded.

3. Save at most 64 octets of the received message in case an error message must be generated in response.

4. Determine the route to be used in forwarding the packet.

5. If the outgoing route uses the same interface as that on which the packet was received, and if the originating host is on that network, send an ICMP redirect message to the originating host. (ICMP is described in Section 13.8.)

6. Handle any IPSec updates that must be made to the packet header.

7. Call ip_output() to send the packet to its destination or to the next-hop gateway.

8. If an error is detected, send an ICMP error message to the source host.
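
The early checks in this procedure (steps 1 through 3) and the redirect test in step 5 can be sketched as below. All names are hypothetical. The text does not enumerate the "illegal network addresses", so the sketch assumes they are those whose first octet is 224 or greater (multicast, which is handled separately, and the reserved range above it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FWD_SAVE_MAX 64  /* octets kept for a possible ICMP error */

/* Step 2: may a packet for dst (host byte order) be forwarded? */
int ipv4_can_forward(uint32_t dst)
{
    uint8_t net = (uint8_t)(dst >> 24);

    if (net == 0 || net == 127)   /* network 0 and the loopback network */
        return 0;
    if (net >= 224)               /* assumption: multicast and reserved */
        return 0;
    return 1;
}

/*
 * Steps 1-3: return 1 and set *save_len if forwarding may proceed,
 * or 0 if the packet should be dropped.
 */
int fwd_precheck(int forwarding_enabled, uint32_t dst, size_t pktlen,
                 size_t *save_len)
{
    assert(save_len != NULL);
    if (!forwarding_enabled || !ipv4_can_forward(dst))
        return 0;
    *save_len = pktlen < FWD_SAVE_MAX ? pktlen : FWD_SAVE_MAX;
    return 1;
}

/*
 * Step 5: a redirect is warranted when the packet leaves on the interface
 * it arrived on and the sender is on that interface's network.
 */
int fwd_needs_redirect(int in_if, int out_if, uint32_t src,
                       uint32_t if_addr, uint32_t if_mask)
{
    return in_if == out_if && ((src ^ if_addr) & if_mask) == 0;
}
```

The redirect test captures the case where the originating host could have reached the next hop directly, so the router tells it about the better first hop while still forwarding the packet.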

Multicast transmissions are handled separately from other packets. Systems may be configured as multicast routers independently from other routing functions. Multicast routers receive all incoming multicast packets, and forward those packets to local receivers and group members on other networks according to group memberships and the remaining hop count of incoming packets.