18.5. UDP

Apart from the framework improvements, the Solaris 10 product contained additional changes in the UDP packet flow through the stack. The internal code name for the project was Yosemite. Before the Solaris 10 release, the UDP processing cost was evenly divided between per-packet processing cost and per-byte processing cost. The per-packet processing cost was generally due to STREAMS, the stream head processing, and packet drops in the stack and driver. The per-byte processing cost was due to the lack of hardware checksum and to unoptimized code branches throughout the network stack.

18.5.1. UDP Packet Drop within the Stack

Although UDP is supposed to be unreliable, local area networks (LANs) have become quite reliable, and applications tend to assume that there will be no packet loss in a LAN environment. This assumption was largely true, but the pre-Solaris 10 stack was not very effective in dealing with UDP overload and tended to drop packets within the stack itself.
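An application that does not want to rely on the no-loss assumption can detect drops itself. The following is a minimal sketch, assuming the sender places a 32-bit sequence number in each datagram; the `loss_tracker_t` type and `loss_tracker_update()` function are hypothetical names introduced for illustration and are not part of Solaris. Sequence-number wraparound is deliberately ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical receiver-side state for spotting gaps in a datagram stream. */
typedef struct {
    uint32_t next_seq;  /* sequence number expected next */
    uint64_t lost;      /* datagrams inferred missing so far */
    int started;        /* nonzero once the first datagram has been seen */
} loss_tracker_t;

/* Record the sequence number carried by a received datagram. */
static void
loss_tracker_update(loss_tracker_t *t, uint32_t seq)
{
    if (!t->started) {
        /* The first datagram defines the starting point. */
        t->started = 1;
        t->next_seq = seq + 1;
        return;
    }
    if (seq >= t->next_seq) {
        /* Every number skipped between expected and seen counts as lost. */
        t->lost += seq - t->next_seq;
        t->next_seq = seq + 1;
    }
    /* seq < next_seq: late or duplicate datagram, ignored in this sketch. */
}
```

A receiver would call `loss_tracker_update()` once per datagram and inspect `lost` periodically. The counter cannot say where a datagram was dropped, only that it did not arrive in order.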

With inbound flow, packets were dropped at more than one layer throughout the receive path. For UDP, the most common and obvious place was at the IP layer, which lacked the resources needed to queue the packets. Another important area of packet drops was at the network adapter layer. This type of drop was fairly common when the machine was dealing with a high rate of incoming packets.

The UDP sockfs extension (sockudp) is an alternative path to socktpi used for handling socket-based UDP applications. It provides a more direct channel between the application and the network stack by eliminating the stream head and TPI message-passing interface. This channel allows direct data and function access throughout the socket and transport layers. That way, the stack becomes more efficient and, coupled with UDP hardware checksum offload (even for fragmented UDP), ensures that UDP packets are rarely dropped within the stack.

18.5.2. UDP Module

In Solaris 10, a fully multithreaded UDP module runs under the same protection domain as IP. Solaris 10 integrates the transport (UDP) more tightly with the layers above and below it, allowing socktpi to make direct calls to UDP. Similarly, UDP can make direct calls to the data link layer. With the latest generic LAN driver (GLDv3, see Section 18.8), the data link layer can also call the transport directly. In addition, utility functions can be called directly instead of through a message-based interface.

UDP needs exclusive operations on endpoints when executing functions that modify the endpoint state. The udp_rput_other() function deals with packets with IP options, and when processing these packets, ends up having to update the endpoint's option-related state. The udp_wput_other() function deals with control operations from the top, such as connect(), which need to update the endpoint state. In the STREAMS world this synchronization was achieved by means of shared inner-perimeter entry points and with qwriter_inner() to gain exclusive access to the endpoint.

The Solaris 10 model achieves this synchronization with an internal, STREAMS-independent perimeter, which provides the following entry points:

  • udp_enter(). Enter the UDP endpoint perimeter.

  • udp_become_writer(). Become exclusive on the UDP endpoint. The caller specifies a function that is called exclusively, either immediately or later, when the perimeter becomes available for exclusive access.

  • udp_exit(). Exit the UDP endpoint perimeter.

Entering UDP from the top or from the bottom must be done with udp_enter(). As in the general case, no locks may be held across these perimeters. When the exclusive mode is no longer required, udp_exit() must be called to exit from the perimeter. To support this, the new UDP model employs two modes of operation: UDP_MT_HOT mode and UDP_SQUEUE mode.

In UDP_MT_HOT mode, multiple threads may enter a UDP endpoint concurrently. This mode is used for sending or receiving normal data and is similar to the putshared() STREAMS entry points. Control operations and other special cases call udp_become_writer() to become exclusive on an endpoint, and this results in a transition to UDP_SQUEUE mode. An squeue, by definition, serializes access to the conn_t structure. When no more messages are pending on the squeue for the UDP connection, the endpoint reverts to UDP_MT_HOT mode. While some of the MT threads of an endpoint have not yet finished, messages are queued in the endpoint and UDP is in one of two transient modes: UDP_MT_QUEUED or UDP_QUEUED_SQUEUE.

While in the stable modes, UDP keeps track of the number of threads operating on the endpoint. The udp_reader_count variable represents the number of threads entering the endpoint as readers while it is in UDP_MT_HOT mode. Transitioning to UDP_SQUEUE happens when there is only a single reader, that is, when the counter drops to 1. Likewise, udp_squeue_count represents the number of threads operating on the endpoint's squeue while it is in UDP_SQUEUE mode. The mode transitions to UDP_MT_HOT after the last thread exits the endpoint.
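The mode transitions described above can be sketched as a small state machine. This is a simplified, single-threaded model written only from the description in this section, not the kernel implementation. The four mode names follow the text (spelled with underscores as C identifiers); `endpoint_t`, `ep_enter()`, `ep_become_writer()`, `ep_exit()`, and `set_option()` are hypothetical stand-ins. The model keeps at most one deferred function and omits the locking that concurrent code would need.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    UDP_MT_HOT,         /* stable: many threads may be inside */
    UDP_MT_QUEUED,      /* transient: exclusive work waits for readers */
    UDP_QUEUED_SQUEUE,  /* transient: queued work moves to the squeue */
    UDP_SQUEUE          /* stable: access is serialized */
} udp_mode_t;

struct endpoint;
typedef void (*excl_fn_t)(struct endpoint *);

typedef struct endpoint {
    udp_mode_t mode;
    int reader_count;   /* threads inside while in UDP_MT_HOT */
    int squeue_count;   /* work items pending while serialized */
    excl_fn_t queued;   /* the single deferred function (model limit) */
    int state;          /* stand-in for state that needs exclusive update */
} endpoint_t;

static void ep_exit(endpoint_t *ep);

/* Enter as a reader; the model returns 0 instead of queuing when not hot. */
static int
ep_enter(endpoint_t *ep)
{
    if (ep->mode != UDP_MT_HOT)
        return (0);
    ep->reader_count++;
    return (1);
}

/* Called by a thread that has already entered; fn must call ep_exit(). */
static void
ep_become_writer(endpoint_t *ep, excl_fn_t fn)
{
    assert(ep->mode == UDP_MT_HOT && ep->reader_count >= 1);
    if (ep->reader_count == 1) {
        /* Sole reader: become exclusive immediately. */
        ep->mode = UDP_SQUEUE;
        ep->reader_count = 0;
        ep->squeue_count = 1;
        fn(ep);
        return;
    }
    /* Other readers are active: defer fn and leave as a reader. */
    ep->mode = UDP_MT_QUEUED;
    ep->queued = fn;
    ep_exit(ep);
}

static void
ep_exit(endpoint_t *ep)
{
    switch (ep->mode) {
    case UDP_MT_HOT:
        ep->reader_count--;
        break;
    case UDP_MT_QUEUED:
        if (--ep->reader_count == 0) {
            /* Last reader left: hand the deferred work to the squeue. */
            excl_fn_t fn = ep->queued;
            ep->queued = NULL;
            ep->mode = UDP_QUEUED_SQUEUE;
            ep->squeue_count = 1;
            ep->mode = UDP_SQUEUE;
            fn(ep);
        }
        break;
    case UDP_QUEUED_SQUEUE:
    case UDP_SQUEUE:
        if (--ep->squeue_count == 0)
            ep->mode = UDP_MT_HOT;  /* nothing pending: revert */
        break;
    }
}

/* Example exclusive operation, such as updating option-related state. */
static void
set_option(endpoint_t *ep)
{
    assert(ep->mode == UDP_SQUEUE && ep->reader_count == 0);
    ep->state++;
    ep_exit(ep);
}
```

With a single reader inside, `ep_become_writer(&ep, set_option)` runs the function at once and the endpoint returns to UDP_MT_HOT. With two readers inside, the same call leaves the endpoint in UDP_MT_QUEUED until the other reader calls `ep_exit()`, at which point the deferred function runs exclusively.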

Though UDP and IP are running in the same protection domain, they are still separate STREAMS modules. Therefore, STREAMS plumbing is kept unchanged, and a UDP module instance is always pushed above IP. Although this behavior causes an extra open and close for every UDP endpoint, it provides backward compatibility for some applications that rely on such plumbing geometry to do certain things, for example, issuing I_POP on the stream to obtain direct access to IP.

The actual UDP processing is done within the IP instance. The UDP module instance possesses no state about the endpoint and merely acts as a dummy module, whose presence keeps the STREAMS plumbing appearance unchanged.

Solaris 10 permits two plumbing modes:

  • Normal. IP is opened first, and UDP is later pushed directly on top. This is the default action that occurs when a UDP socket or device is opened.

  • SNMP. UDP is pushed on top of a module other than IP. When this happens, UDP supports only SNMP semantics.

These modes imply that we don't support any intermediate module between IP and UDP. But in fact, no Solaris release has ever supported such a scenario, because the interlayer communication semantics between IP and transport modules are private.

18.5.3. UDP and Socket Interaction

A significant event that takes place during the socket() system call is the plumbing of modules associated with the socket's address family and protocol type. A TCP or UDP socket will most likely result in sockfs residing directly on top of the corresponding transport module. Before the Solaris 10 release, the socket layer used STREAMS primitives to communicate with the UDP module. Solaris 10 provides a function-call interface, which eliminates the need to use T_UNITDATA_REQ messages for metadata during each transmit from sockfs to UDP. Instead, the data and its ancillary information (that is, the remote socket address) are provided directly to an alternative UDP entry point, thereby avoiding the extra allocation cost.
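From the application's side, the pairing of data with its remote address is what a sendto() call supplies. The following sketch uses only the standard sockets API and shows the user-level view, not the kernel path between sockfs and UDP. The helper names `send_datagram()` and `open_loopback_receiver()` are hypothetical and exist only so the example can be exercised on the loopback interface.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send one datagram to ip:port; returns bytes or -1. */
static ssize_t
send_datagram(int fd, const char *ip, uint16_t port,
    const void *data, size_t len)
{
    struct sockaddr_in dst;

    memset(&dst, 0, sizeof (dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
        return (-1);
    /* The payload and the destination address travel in a single call. */
    return (sendto(fd, data, len, 0,
        (struct sockaddr *)&dst, sizeof (dst)));
}

/* Hypothetical helper: bind a UDP socket to an ephemeral loopback port. */
static int
open_loopback_receiver(uint16_t *port)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof (addr);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return (-1);
    memset(&addr, 0, sizeof (addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  /* let the system choose a free port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof (addr)) != 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &alen) != 0) {
        close(fd);
        return (-1);
    }
    *port = ntohs(addr.sin_port);
    return (fd);
}
```

A caller opens the receiver, creates a second SOCK_DGRAM socket, and passes it to `send_datagram()` with "127.0.0.1" and the returned port. A subsequent recv() on the receiving socket returns the payload.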

Transport modules, being directly beneath sockfs, can use synchronous STREAMS. This enables the transport layer to buffer incoming data for later retrieval (through synchronous STREAMS) when a read operation is issued, thereby shortening the receive processing time.




Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture (2nd Edition)
ISBN: 0131482092