Chapter 5: Socket Options




In this chapter, we investigate the socket options that are available for each of the layers of the TCP/IP stack. These options can affect the behavior of the Sockets layer, Transport layer, and IP layer. The most common use of socket options is to increase the performance of a connection, but they’re also widely used in other scenarios. We look at the plethora of options available, and provide examples for each. All code for this chapter can be found on the companion CD-ROM at /software/ch5.

Socket Options API

Getting and setting options for a given socket are performed through two functions, getsockopt and setsockopt. These functions provide a single interface for getting and setting a variety of options using a number of different structures. The socket option prototypes are defined as:

#include <sys/types.h>
#include <sys/socket.h>

int getsockopt( int sock, int level, int optname,
                void *optval, socklen_t *optlen );
int setsockopt( int sock, int level, int optname,
                const void *optval, socklen_t optlen );

All socket options require that the application specify the socket to which the option is to be applied; this is argument one of the call (sock). The level refers to the protocol layer to which this option will be applied (see Table 5.1 for a list of the protocols and the symbolic constants used by the API). The option name is defined by optname. This is the particular option to be used. Numerous options exist and they are divided by the protocol of interest. The optval argument specifies the value to be set or the location in which to store the option in a get request. Finally, optlen defines the length of the option structure. For setsockopt, it is passed by value; for getsockopt, it is a pointer that holds the size of the buffer on entry and the number of bytes stored on return. Because a number of different structures can be used to set or get options, this parameter prevents the call from overrunning the buffer.

Table 5.1: Level argument for the setsockopt/getsockopt functions

| Level | Description | Option Prefix |
|---|---|---|
| SOL_SOCKET | Sockets layer | SO_ |
| IPPROTO_TCP | TCP Transport layer | TCP_ |
| IPPROTO_IP | IP Network layer | IP_ |

Within each of the option levels, a number of options can be manipulated. The options are split by level because they affect the operation of the stack at the indicated level. The following sections investigate the socket options that are available and illustrate how to manipulate them.
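Because most options take a plain integer, the two calls are often wrapped in small helpers. The following is a minimal sketch, not code from the companion CD-ROM; the names set_int_option and get_int_option are hypothetical, and the helpers assume the option's value is an int.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: set an integer-valued option.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int set_int_option(int sock, int level, int optname, int value)
{
    return setsockopt(sock, level, optname, &value, sizeof(value));
}

/* Hypothetical helper: read an integer-valued option into *value.
 * Returns 0 on success, -1 on failure or if the stack stored
 * something other than an int-sized value. */
int get_int_option(int sock, int level, int optname, int *value)
{
    socklen_t len = sizeof(*value);   /* in: buffer size, out: bytes stored */

    if (getsockopt(sock, level, optname, value, &len) < 0)
        return -1;
    return (len == sizeof(*value)) ? 0 : -1;
}
```

A call such as set_int_option( sock, SOL_SOCKET, SO_BROADCAST, 1 ) then enables an option, and checking the return value catches options that the stack does not support.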

Sockets Layer Options

The Sockets layer options are those defined within the context of level SOL_SOCKET and focus on the Sockets API layer. The typical options for the Sockets layer are defined in Figure 5.1.

| Option Name | Description | get/set | value |
|---|---|---|---|
| SO_BROADCAST | Permits transmit of broadcast datagrams | g/s | int |
| SO_DEBUG | Enables debug logging | g/s | int |
| SO_DONTROUTE | Enables bypass of routing tables | g/s | int |
| SO_ERROR | Retrieve the current socket error | g | int |
| SO_LINGER | Enables linger on close if data present | g/s | struct linger |
| SO_KEEPALIVE | Enables TCP Keepalive probes | g/s | int |
| SO_RCVBUF | Modifies the size of the socket receive buffer | g/s | int |
| SO_SNDBUF | Modifies the size of the socket send buffer | g/s | int |
| SO_RCVLOWAT | Sets the minimum byte count for input | g/s | int |
| SO_SNDLOWAT | Sets the minimum byte count for output | g/s | int |
| SO_SNDTIMEO | Sets the timeout value for output | g/s | struct timeval |
| SO_RCVTIMEO | Sets the timeout value for input | g/s | struct timeval |
| SO_REUSEADDR | Enables local address reuse | g/s | int |
| SO_TYPE | Retrieves the socket type | g | int |

Figure 5.1: Sockets layer options.

The get/set column in Figure 5.1 defines whether the option can be retrieved, set, or both. The value column defines what is expected to retrieve or set the option. Let’s now look at examples of each of the options and better understand what effect they have.

SO_BROADCAST Option

The purpose of SO_BROADCAST is to permit a socket to send datagrams to a broadcast address. In order to send a broadcast datagram, the application must specify the destination of the datagram as the broadcast address. If the SO_BROADCAST socket option is not enabled, the broadcast datagrams will be dropped. If set, the datagrams are permitted to be sent. An example of using the SO_BROADCAST option and then sending a broadcast datagram is shown in Listing 5.1.

Listing 5.1 SO_BROADCAST and sending a broadcast datagram.

start example
int    sock, cnt, on=1;
socklen_t addrLen;
struct sockaddr_in addr;
char   buffer[512];

sock = socket( AF_INET, SOCK_DGRAM, 0 );

/* Permit sending broadcast datagrams */
setsockopt( sock, SOL_SOCKET, SO_BROADCAST,
            &on, sizeof(on) );

memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_port = htons(BCAST_PORT);
addr.sin_addr.s_addr = inet_addr("255.255.255.255");
addrLen = sizeof(addr);

...

/* Send a broadcast datagram (sendto expects a struct sockaddr pointer) */
cnt = sendto(sock, buffer, strlen(buffer), 0,
             (struct sockaddr *)&addr, addrLen);
end example

SO_DEBUG Option

The SO_DEBUG socket option enables internal logging of interesting events within the TCP layer of a stack; the events are recorded in a circular buffer. This logging is commonly compiled out through conditional compilation, so in a production environment the data is rarely available. If the stack has been compiled with debug logging enabled, enabling this option allows the stack to log this data for later collection. Enabling the option is performed as shown in the code fragment in Listing 5.2.

Listing 5.2 Enabling TCP layer debugging with SO_DEBUG.

start example
int    sock, on=1;

/* A stream (TCP) socket, because the logging applies to the TCP layer */
sock = socket( AF_INET, SOCK_STREAM, 0 );

/* Enable TCP layer debugging */
setsockopt( sock, SOL_SOCKET, SO_DEBUG, &on, sizeof(on) );
end example

The method for retrieving the logged data is different for each stack implementation and the relevant documentation should be consulted.

SO_DONTROUTE Option

The SO_DONTROUTE option is used to disable the underlying routing algorithms for a given socket. Before a datagram is emitted onto the physical medium, a set of algorithms is employed to determine where the datagram should be directed. In some cases, the default gateway is used to route datagrams on to their destination. If SO_DONTROUTE is set, the datagram is given to the interface that matches the network portion of the destination address, and the routing table is never consulted.

The SO_DONTROUTE option is a simple integer option, and is retrieved and set as shown in Listing 5.3.

Listing 5.3 Manipulating the SO_DONTROUTE socket option.

start example
int sock;
int val, ret;
socklen_t len;

sock = socket( AF_INET, SOCK_DGRAM, 0 );

...

/* Retrieve the current setting */
len = sizeof( val );
ret = getsockopt( sock, SOL_SOCKET, SO_DONTROUTE,
                  (void *)&val, &len );

printf(" so_dontroute = %d\n", val );

/* Enable the bypass of the routing tables */
val = 1;
ret = setsockopt( sock, SOL_SOCKET, SO_DONTROUTE,
                  (void *)&val, sizeof(int) );
end example

If no interface matches the network portion of the destination address, an error message is returned to the application (commonly “Network Unreachable”).

SO_ERROR Option

The SO_ERROR socket option permits the application to retrieve the last error that was recorded by the stack for the given socket. In Linux, the errno variable can be used for this purpose (checked after an error return from a socket call). Other stacks provide specialized functions to return the last error experienced for the socket. In special cases, for example to determine if a nonblocking connect has completed, the SO_ERROR can provide useful error codes.

Retrieving the SO_ERROR value for a given socket is shown in Listing 5.4.

Listing 5.4 Retrieving the SO_ERROR value for a socket.

start example
int sock;
int val, ret;
socklen_t len;

sock = socket( AF_INET, SOCK_DGRAM, 0 );

...

/* Reading the error also clears it */
len = sizeof( val );
ret = getsockopt( sock, SOL_SOCKET, SO_ERROR,
                  (void *)&val, &len );

printf( "so_error = %d\n", val );
end example

Setting of the error variable with setsockopt is not permitted. After the error value has been read in the getsockopt function, it is cleared. Prior to utilizing this functionality in your application, a review of the source or documentation should be done to ensure that SO_ERROR is indeed supported.
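The nonblocking connect case mentioned earlier is where SO_ERROR is most often used. The following is a minimal sketch; the helper name pending_socket_error is hypothetical, and it assumes the stack supports SO_ERROR.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: fetch (and thereby clear) the pending error on a
 * socket. Returns 0 and stores the error code in *err on success, or -1
 * if getsockopt itself failed. A stored value of 0 means no error. */
int pending_socket_error(int sock, int *err)
{
    socklen_t len = sizeof(*err);

    *err = 0;
    return getsockopt(sock, SOL_SOCKET, SO_ERROR, err, &len);
}
```

After select reports that a nonblocking socket is writable, a stored value of zero indicates that the connect completed; a nonzero value is the error code (such as ECONNREFUSED) that a blocking connect would have reported through errno.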

SO_LINGER Option

Before discussing the purpose of the SO_LINGER option, let’s review what happens when the application performs a socket write operation (write, send, sendto, and so on). The data is moved from the application into the context of the stack and buffered awaiting transmission to the peer. Based upon a number of factors including the advertised window from the peer (receiver flow control) and the congestion window (sender flow control), the data may not be sent immediately. What happens if this buffered data is still waiting to be sent when the sending application closes its end of the socket? This is where SO_LINGER comes into play. The SO_LINGER option tells the stack how to deal with this data that remains to be sent (the default action is to continue to try to send the data).

The SO_LINGER option includes two elements, an enable and a time value. The enable value (l_onoff) is obvious and enables or disables this option for lingering on close with send data present. The time value (l_linger) specifies the number of seconds to linger before closing the socket and discarding the unsent data. This data is encapsulated into a structure called linger on most systems:

struct linger {
    int l_onoff;     // enable(1)/disable(0)
    int l_linger;    // Linger time in seconds
};

Three interesting cases are important to understand when using the SO_LINGER option. These are shown in Table 5.2.

Table 5.2: Interesting combinations for the SO_LINGER socket option

| l_onoff | l_linger | Description |
|---|---|---|
| 0 | N/A | Linger is disabled, normal behavior |
| 1 | 0 | Discard data immediately after close is issued |
| 1 | > 0 | Linger for the number of seconds defined and then close |

Let’s look at the third case in a simple example. In this example, we’ll give the socket ten seconds before discarding any data (see Listing 5.5).

Listing 5.5 Example of the SO_LINGER socket option.

start example
int sock;
int ret;
struct linger ling;

sock = socket( AF_INET, SOCK_STREAM, 0 );

...

/* Linger for up to ten seconds on close */
ling.l_onoff = 1;
ling.l_linger = 10;
ret = setsockopt( sock, SOL_SOCKET, SO_LINGER,
                  (void *)&ling, sizeof(ling) );

...

ret = close( sock );
end example

The close function commonly blocks until the buffered data has been sent. If the time value (l_linger) specified with SO_LINGER times out, the close function will fail with an error.
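The combinations in Table 5.2 can be wrapped in a pair of helpers. This is a minimal sketch; the names set_linger and get_linger are hypothetical and are not part of the Sockets API.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: configure linger behavior.
 * enable = 0            -> normal close behavior
 * enable = 1, secs = 0  -> discard unsent data when close is issued
 * enable = 1, secs > 0  -> linger for up to secs seconds on close */
int set_linger(int sock, int enable, int secs)
{
    struct linger ling;

    ling.l_onoff  = enable;
    ling.l_linger = secs;
    return setsockopt(sock, SOL_SOCKET, SO_LINGER, &ling, sizeof(ling));
}

/* Hypothetical helper: read the current linger settings back. */
int get_linger(int sock, struct linger *ling)
{
    socklen_t len = sizeof(*ling);

    return getsockopt(sock, SOL_SOCKET, SO_LINGER, ling, &len);
}
```

Calling set_linger( sock, 1, 0 ) selects the second row of the table, which is sometimes used when a server must release a connection at once and the unsent data is of no further value.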

SO_KEEPALIVE Option

The SO_KEEPALIVE option is used to enable or disable the TCP keep-alive probes. These probes are used to maintain a TCP connection and regularly test the connection to ensure that it’s still available. The keep-alive probe packet solicits an Ack from the peer, identifying that the connection (and sometimes the peer) is still available.

The keep-alive probe is sent once every two hours by default, but only if there is no traffic on the given connection. If traffic exists on the connection, there is no point in sending the keep-alive probe because the peer stack is acknowledging data and can, therefore, be ruled alive. On most stacks, including Linux, keep-alive probes are disabled by default and are enabled or disabled per socket with this option. The TCP_KEEPALIVE option, described in the TCP layer options section, can be used to modify the time between keep-alive probes.

The following example (Listing 5.6) illustrates how to disable the keep-alive probes for a given connection.

Listing 5.6 Disabling the TCP keep-alive probes.

start example
int sock;
int ret, off;

sock = socket( AF_INET, SOCK_STREAM, 0 );

...

/* A value of zero disables the probes; nonzero enables them */
off = 0;
ret = setsockopt( sock, SOL_SOCKET, SO_KEEPALIVE,
                  (void *)&off, sizeof( off ) );
end example

SO_SNDBUF/SO_RCVBUF Options

The SO_SNDBUF and SO_RCVBUF options permit an application to change the size of the socket buffers used to queue data for transmission and queue data for receipt. These options are a very important mechanism to increase the performance of a connection (explained in more detail in Chapter 7, Optimizing Sockets Applications).

Because these options change the size of the queue between the Sockets layer and the transport protocol, they must be defined prior to a connection being established. This means that a client must set these options before the connect function is called, and a server must set them on the listening socket before the listen function is called (sockets returned by accept inherit the listening socket's buffer sizes).

An example of setting the send and receive buffers to 32 KB is shown in Listing 5.7.

Listing 5.7 Modifying the send and receive socket buffer sizes.

start example
int sock;
int value, ret;

sock = socket( AF_INET, SOCK_STREAM, 0 );

...

/* Request a 32 KB send buffer */
value = 32768;
ret = setsockopt( sock, SOL_SOCKET, SO_SNDBUF,
                  (void *)&value, sizeof(value) );

/* Request a 32 KB receive buffer */
value = 32768;
ret = setsockopt( sock, SOL_SOCKET, SO_RCVBUF,
                  (void *)&value, sizeof(value) );
end example

An application can also retrieve the default socket buffer sizes using the getsockopt function. This value typically differs based upon the stack being used. Therefore, the relevant socket option documentation should be consulted.
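The size that the stack actually applies may differ from the size requested, so it is worth reading the value back after setting it. The following is a minimal sketch; the helper name set_recv_buffer is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer size and report the size
 * the stack actually applied. Returns 0 on success, -1 on failure. */
int set_recv_buffer(int sock, int requested, int *effective)
{
    socklen_t len = sizeof(*effective);

    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof(requested)) < 0)
        return -1;

    /* The stack may round, clamp, or scale the requested size */
    return getsockopt(sock, SOL_SOCKET, SO_RCVBUF, effective, &len);
}
```

Linux, for example, doubles the requested value to leave room for bookkeeping overhead and caps it at a system-wide limit, so the effective value should not be assumed to equal the request.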

SO_RCVLOWAT Option

The SO_RCVLOWAT socket option defines the minimum number of bytes that must be available before an input operation (or a select for readability) is satisfied. Receive calls block if no data is available to be read. Once data arrives, a blocking receive call returns when the smaller of the number of bytes requested in the receive call or the SO_RCVLOWAT count is available.

An example of setting the SO_RCVLOWAT option to a value of 48 (wait for at least 48 bytes before returning) is shown in Listing 5.8.

Listing 5.8 Setting SO_RCVLOWAT to await 48 bytes before read operation return.

start example
int sock;
int value, ret;

sock = socket( AF_INET, SOCK_STREAM, 0 );

...

/* Wait for at least 48 bytes before a read returns */
value = 48;
ret = setsockopt( sock, SOL_SOCKET, SO_RCVLOWAT,
                  (void *)&value, sizeof( value ) );
end example

For Listing 5.8, the read API function will return when at least 48 bytes have been received for the particular connection. If an error occurs for the given connection, fewer bytes may be returned. The default value for SO_RCVLOWAT is one.

SO_SNDLOWAT Option

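The SO_SNDLOWAT option is the send-side counterpart of the SO_RCVLOWAT option discussed previously. This option sets the minimum number of bytes necessary for output operations. Not every stack allows the value to be changed; on Linux, for example, an attempt to set SO_SNDLOWAT fails with the ENOPROTOOPT error. See Listing 5.9 for an example of setting a minimum of 48 bytes for a select write operation.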

Listing 5.9 Setting SO_SNDLOWAT to await 48 bytes before write operation.

start example
int sock;
int value, ret;

sock = socket( AF_INET, SOCK_STREAM, 0 );

...

/* Require room for at least 48 bytes before a write proceeds */
value = 48;
ret = setsockopt( sock, SOL_SOCKET, SO_SNDLOWAT,
                  (void *)&value, sizeof( value ) );
end example

SO_SNDTIMEO/SO_RCVTIMEO Options

The SO_SNDTIMEO and SO_RCVTIMEO socket options hold the timeout values that apply to output and input operations, respectively. For SO_SNDTIMEO, the value represents the timeout that will be observed for blocking send operations. If a send operation blocks for the SO_SNDTIMEO interval or longer, either a partial send occurs or, if no data was sent, the call fails with an EWOULDBLOCK error (on Linux systems).

The SO_RCVTIMEO option provides the same timeout functionality for receive operations. If a blocking receive operation requires more time than is defined by the SO_RCVTIMEO option, a short count or an EWOULDBLOCK error is returned.

Getting the SO_SNDTIMEO socket option is illustrated in Listing 5.10; SO_RCVTIMEO is retrieved in the same way.

Listing 5.10 Retrieving the SO_SNDTIMEO timeout values.

start example
#include <stdio.h>
#include <sys/socket.h>
#include <sys/time.h>

int sock, ret;
socklen_t len;
struct timeval timeo;

sock = socket( AF_INET, SOCK_STREAM, 0 );

len = sizeof( timeo );
ret = getsockopt( sock, SOL_SOCKET, SO_SNDTIMEO,
                  (void *)&timeo, &len );

/* The timeval fields are not plain ints, so cast them for printf */
printf( "Timeout %ld seconds, %ld microseconds\n",
        (long)timeo.tv_sec, (long)timeo.tv_usec );
end example

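On Linux and most current stacks, these options can be both retrieved and set using a struct timeval; some older stacks permit only retrieval, or do not implement the options at all. Before attempting to use them, the availability of these socket options should be verified with the stack implementation.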
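On stacks that do permit the timeouts to be set (Linux is one), the value is passed in a struct timeval. The following is a minimal sketch for the receive side; the helper names set_recv_timeout and get_recv_timeout are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: set the receive timeout for blocking reads.
 * A timeout of zero seconds and zero microseconds means no timeout. */
int set_recv_timeout(int sock, long seconds, long microseconds)
{
    struct timeval tv;

    tv.tv_sec  = seconds;
    tv.tv_usec = microseconds;
    return setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Hypothetical helper: read the receive timeout back from the stack. */
int get_recv_timeout(int sock, struct timeval *tv)
{
    socklen_t len = sizeof(*tv);

    return getsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, tv, &len);
}
```

With a two-second timeout in place, a blocking receive that gets no data within that interval returns an error rather than waiting indefinitely, which gives the application a chance to retry or give up.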

SO_REUSEADDR Option

The SO_REUSEADDR socket option is used to permit reuse of local addresses within the bind function. By local address, we refer here to the address that was bound to a local socket.

To better understand why this option is important, let’s look at a couple of examples that illustrate its purpose. In the first example, we show how the problem most commonly occurs, in which SO_REUSEADDR provides a workaround.

When a server application binds a local address to a socket and then begins accepting connections on it, the local address is bound to the local socket. If we were to halt the server application and restart it, the bind would fail. Let’s first look at the server code to better understand why (see Listing 5.11).

Listing 5.11 Sample server code for the “address in use” error.

start example
int sock, ret;
struct sockaddr_in servaddr;

sock = socket( AF_INET, SOCK_STREAM, 0 );

memset( &servaddr, 0, sizeof(servaddr) );
servaddr.sin_family = AF_INET;
servaddr.sin_addr.s_addr = htonl( INADDR_ANY );
servaddr.sin_port = htons( MY_PORT );

/* bind expects a pointer to the generic struct sockaddr */
ret = bind( sock, (struct sockaddr *)&servaddr,
            sizeof(servaddr) );

printf( "bind returned %d\n", ret );

...
end example

This standard server code pattern illustrates creating a socket and then binding a local address to it. If we execute this code (and the subsequent accept function that would be present), and then terminate the server for some reason, a subsequent attempt to bind the local address will fail. This is commonly known as the “address in use” error. The reason that this occurs is that the local address is still associated with a connection in the state known as TIME_WAIT. The connection remains in this state for a fixed period (typically between one and four minutes, depending on the stack) and is then freed, permitting the local address to be reused. To force the ability to reuse the local address before this period expires, the SO_REUSEADDR socket option can be used. To enable reuse, the option must be enabled prior to the call to bind (see Listing 5.12).

Listing 5.12 Enabling local address reuse with SO_REUSEADDR.

start example
int sock, ret;
int on;
struct sockaddr_in servaddr;

sock = socket( AF_INET, SOCK_STREAM, 0 );

/* Permit reuse of the local address; must precede bind */
on = 1;
ret = setsockopt( sock, SOL_SOCKET, SO_REUSEADDR,
                  (void *)&on, sizeof( on ) );

memset( &servaddr, 0, sizeof(servaddr) );
servaddr.sin_family = AF_INET;
servaddr.sin_addr.s_addr = htonl( INADDR_ANY );
servaddr.sin_port = htons( MY_PORT );

ret = bind( sock, (struct sockaddr *)&servaddr,
            sizeof(servaddr) );

printf( "bind returned %d\n", ret );
end example

This particular option is very common and can be observed in the initialization code of almost any socket server.
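In a complete server, the option is usually set as part of a single setup routine that also checks each return value. The following is a minimal sketch of such a routine; the function name make_listener and its parameters are hypothetical.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: create a TCP socket, enable SO_REUSEADDR, bind it
 * to the given port on all interfaces, and start listening.
 * Returns the socket descriptor, or -1 on any failure. */
int make_listener(unsigned short port, int backlog)
{
    struct sockaddr_in servaddr;
    int on = 1;
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    if (sock < 0)
        return -1;

    /* Must be set before bind to affect the bind */
    if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        goto fail;

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family      = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port        = htons(port);

    if (bind(sock, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0)
        goto fail;
    if (listen(sock, backlog) < 0)
        goto fail;

    return sock;

fail:
    close(sock);   /* do not leak the descriptor on error */
    return -1;
}
```

Passing a port of zero asks the stack to choose an unused port, which is convenient when experimenting with the routine.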

TCP Layer Options

The TCP layer options are those defined within the context of level IPPROTO_TCP and focus on the TCP layer. The typical options for the TCP layer are defined in Figure 5.2.

| Option Name | Description | get/set | value |
|---|---|---|---|
| TCP_KEEPALIVE | Modifies number of seconds between TCP keepalives | g/s | int |
| TCP_MAXRT | Modifies the maximum TCP retransmit time | g/s | int |
| TCP_MAXSEG | Modifies the TCP maximum segment size | g/s | int |
| TCP_NODELAY | Enable/Disable TCP’s Nagle algorithm | g/s | int |

Figure 5.2: TCP layer options.

TCP_KEEPALIVE Option

The TCP_KEEPALIVE option is used to define the number of seconds between keep-alive probes when the SO_KEEPALIVE socket option is enabled. Recall from the discussion of SO_KEEPALIVE that these probes are sent only when the particular connection is inactive.

The TCP_KEEPALIVE option can be set and retrieved. The option name is not universal: Linux, for example, does not define TCP_KEEPALIVE and provides TCP_KEEPIDLE and TCP_KEEPINTVL instead, so the documentation for the stack should be consulted. Setting the TCP_KEEPALIVE socket option is illustrated in Listing 5.13.

Listing 5.13 Setting keep-alive probes to 10-second intervals.

start example
int sock, ret, interval;
struct sockaddr_in servaddr;

sock = socket( AF_INET, SOCK_STREAM, 0 );

/* Ten seconds between keep-alive probes */
interval = 10;
ret = setsockopt( sock, IPPROTO_TCP, TCP_KEEPALIVE,
                  (void *)&interval, sizeof( interval ) );
end example
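Because the name of this option varies between stacks, portable code commonly selects it at compile time. The following is a minimal sketch under that assumption; the function name enable_keepalive is hypothetical, and the sketch relies on Linux defining TCP_KEEPIDLE where some other stacks define TCP_KEEPALIVE.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper: enable keep-alive probes and set the idle time
 * (in seconds) before probing begins. Returns 0 on success, -1 on error. */
int enable_keepalive(int sock, int idle_seconds)
{
    int on = 1;

    /* The probes must be enabled at the Sockets layer first */
    if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;

#if defined(TCP_KEEPIDLE)
    /* Linux and several BSD-derived stacks */
    return setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE,
                      &idle_seconds, sizeof(idle_seconds));
#elif defined(TCP_KEEPALIVE)
    /* Stacks that use the name discussed in this section */
    return setsockopt(sock, IPPROTO_TCP, TCP_KEEPALIVE,
                      &idle_seconds, sizeof(idle_seconds));
#else
    /* No known option name for this stack */
    (void)idle_seconds;
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```

The conditional compilation keeps the calling code identical across stacks; only the helper needs to know which option name is available.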

TCP_MAXRT Option

The TCP_MAXRT option can be used to define how long to retransmit data over a TCP connection. Once retransmission occurs on a TCP connection, the value specified with the TCP_MAXRT option defines the desired behavior (see Table 5.3).

Table 5.3: Meaning of values for the TCP_MAXRT socket option

| Option Value | Description |
|---|---|
| -1 | Retransmit forever |
| 0 | Use the system default behavior |
| > 0 | Number of seconds before the connection is broken |

An example of setting the TCP_MAXRT option to three seconds is shown in Listing 5.14.

Listing 5.14 Defining a three-second TCP_MAXRT.

start example
int sock, ret, duration;
struct sockaddr_in servaddr;

sock = socket( AF_INET, SOCK_STREAM, 0 );

/* Break the connection after three seconds of retransmission */
duration = 3;
ret = setsockopt( sock, IPPROTO_TCP, TCP_MAXRT,
                  (void *)&duration, sizeof( duration ) );
end example

TCP_NODELAY Option

The TCP_NODELAY option permits us to enable or disable the Nagle algorithm within the TCP layer of the stack. The Nagle algorithm (created by John Nagle in the early 1980s at Ford Aerospace) was an important optimization in the TCP stack because it minimizes the number of small segments that can be sent by a device.

Consider an application that sends a small amount of data for each send operation. Without the Nagle algorithm, these small amounts of data would be packaged within TCP and IP headers and sent onto the wire. The Nagle algorithm delays the transmission of data, with the hope that within some small amount of time, more data will arrive for the socket that can be accumulated and sent as a larger packet.

This is an important optimization because maximizing network utilization depends upon the transmission of maximum-sized segments onto the wire (the maximum amount of payload data that can be sent within a packet).

Disabling the Nagle algorithm, via the TCP_NODELAY socket option, can be done as is shown in Listing 5.15.

Listing 5.15 Disabling the Nagle algorithm.

start example
int sock, ret, on;
struct sockaddr_in servaddr;

sock = socket( AF_INET, SOCK_STREAM, 0 );

/* Enabling TCP_NODELAY disables the Nagle algorithm */
on = 1;
ret = setsockopt( sock, IPPROTO_TCP, TCP_NODELAY,
                  (void *)&on, sizeof( on ) );
end example

We discuss the Nagle algorithm in Chapter 7, Optimizing Sockets Applications, and identify where and when this socket option should be used.
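The sense of TCP_NODELAY is inverted relative to the algorithm it controls, which is an easy source of mistakes. The following is a minimal sketch that hides the inversion behind two helpers; the names set_nagle and nagle_enabled are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper: enable (nonzero) or disable (zero) the Nagle
 * algorithm. TCP_NODELAY is inverted: a nonzero value disables Nagle. */
int set_nagle(int sock, int enabled)
{
    int nodelay = enabled ? 0 : 1;

    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
                      &nodelay, sizeof(nodelay));
}

/* Hypothetical helper: returns 1 if Nagle is enabled, 0 if it is
 * disabled, or -1 if the option could not be read. */
int nagle_enabled(int sock)
{
    int nodelay = 0;
    socklen_t len = sizeof(nodelay);

    if (getsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &nodelay, &len) < 0)
        return -1;
    return nodelay ? 0 : 1;
}
```

An interactive application that sends many small, latency-sensitive messages might call set_nagle( sock, 0 ), while a bulk-transfer application would normally leave the algorithm enabled.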

TCP_MAXSEG Option

The TCP_MAXSEG option permits us to change the size of the Maximum Segment Size, otherwise known as the MSS. Let’s first understand the purpose of the MSS and how it relates to other elements.

First, an interface operates with what’s known as a Maximum Transmission Unit, or MTU. This is the largest size packet that may be communicated over the particular interface (known as the “Interface MTU”). When communicating over a network, our packet may encounter a device whose Interface MTU is smaller than ours. If so, the device will fragment our packet into two or more packets to ensure that they fit the given MTU. If we extend this out to the endpoint, the smallest MTU encountered along the way is called the “Path MTU.”

Returning to the MSS, the MSS is the MTU minus the packet headers (the payload size of a packet). Figure 5.3 illustrates this concept.

Figure 5.3: Relationship of a packet with MTU and MSS.
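The relationship between the MTU and the MSS is simple arithmetic. The following is a minimal sketch; the function name mss_for_mtu is hypothetical, and the 20-byte header sizes assume IPv4 and TCP headers that carry no options.

```c
/* Assumed header sizes for IPv4 and TCP without options */
#define IPV4_HEADER_BYTES 20
#define TCP_HEADER_BYTES  20

/* Hypothetical helper: the MSS is the MTU minus the packet headers.
 * Returns 0 if the headers alone do not fit within the MTU. */
int mss_for_mtu(int mtu, int ip_header_bytes, int tcp_header_bytes)
{
    int mss = mtu - ip_header_bytes - tcp_header_bytes;

    return (mss > 0) ? mss : 0;
}
```

For an Ethernet interface with an MTU of 1500 bytes, this gives an MSS of 1460 bytes; header options reduce the payload further.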

The stack automatically determines the MSS for a given connection (because the Path MTU can be different for each connection in addition to the size of the packet headers). Using the TCP_MAXSEG option, we can statically define this to a size of our liking. This is illustrated in Listing 5.16.

Listing 5.16 Defining a static MSS for a given socket.

start example
int sock, ret, sz;
struct sockaddr_in servaddr;

sock = socket( AF_INET, SOCK_STREAM, 0 );

/* Define a static MSS of 128 bytes */
sz = 128;
ret = setsockopt( sock, IPPROTO_TCP, TCP_MAXSEG,
                  (void *)&sz, sizeof( sz ) );
end example

The MSS can be both set and retrieved, but setting it should be done with caution. If the MSS is set to a value that causes the MTU to exceed the Path MTU, fragmentation will occur with a result of performance loss.

IP Layer Options

The IP layer options are those defined within the context of level IPPROTO_IP and focus on the IP layer. The typical options for the IP layer are defined in Figure 5.4.

| Option Name | Description | get/set | value |
|---|---|---|---|
| IP_HDRINCL | IP header precedes data in buffer | g/s | int |
| IP_TOS | Modifies the IP Type-Of-Service header field | g/s | int |
| IP_TTL | Modifies the IP Time-To-Live header field | g/s | int |
| IP_ADD_MEMBERSHIP | Join a multicast group | s | struct ip_mreq |
| IP_DROP_MEMBERSHIP | Leave a multicast group | s | struct ip_mreq |
| IP_MULTICAST_IF | Modify the outgoing multicast interface | g/s | struct in_addr |
| IP_MULTICAST_TTL | Modify the outgoing multicast TTL | g/s | int |
| IP_MULTICAST_LOOP | Enable/Disable loopback of outgoing datagrams | g/s | int |

Figure 5.4: IP layer options.

IP_HDRINCL Option

The IP_HDRINCL option permits an application developer to write raw IP frames onto the wire and provide the IP header that will be attached. Because the point of this option is to provide our IP header given a raw socket (which is prefixed to each outgoing datagram), the argument for this option is whether an IP header is included in the outgoing datagram. In Listing 5.17, the sample source illustrates sending a raw IP datagram with an IP header of our choosing.

Listing 5.17 Providing an IP header for an IP datagram (ipdgram.c).

start example
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Standard IP Header */
typedef struct {
  unsigned char verHdrLen;
  unsigned char tos;
  unsigned short len;
  unsigned short ident;
  unsigned short flags;
  unsigned char ttl;
  unsigned char protocol;
  unsigned short checksum;
  struct in_addr sourceIpAdrs;
  struct in_addr destIpAdrs;
} ipHdr_t;

int main()
{
  int sock, on, ret;
  char buffer[255];
  ipHdr_t *ipDatagram;
  struct sockaddr_in addr;

  /* Checksum function on the CD-ROM... */
  unsigned short checksum( unsigned short *, int );

  ipDatagram = (ipHdr_t *)buffer;

  sock = socket( AF_INET, SOCK_RAW, 255 );

  /* We will supply the IP header ourselves */
  on = 1;
  ret = setsockopt( sock, IPPROTO_IP, IP_HDRINCL,
                    &on, sizeof(on) );

  ipDatagram->verHdrLen = 0x45;
  ipDatagram->tos = 0;
  ipDatagram->len = 20;                /* Just a header */
  ipDatagram->ident = htons( 1 );
  ipDatagram->flags = htons( 0x4000 ); /* Don't fragment */
  ipDatagram->ttl = 64;
  ipDatagram->protocol = 255;
  ipDatagram->checksum = 0;
  ipDatagram->sourceIpAdrs.s_addr = 0;

  memset( &addr, 0, sizeof(addr) );
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = inet_addr("192.168.1.1");

  ipDatagram->destIpAdrs.s_addr = addr.sin_addr.s_addr;

  /* Checksum the header itself, not the pointer variable */
  ipDatagram->checksum =
    checksum( (unsigned short *)ipDatagram,
              sizeof(ipHdr_t) );

  ret = sendto( sock, buffer, sizeof(ipHdr_t), 0,
                (struct sockaddr *)&addr, sizeof(addr) );

  close( sock );

  return 0;
}
end example

In Listing 5.17, the first step is to create a socket of type SOCK_RAW. This socket permits us to communicate using IP datagrams. Next, we enable the IP_HDRINCL option using the setsockopt function. Next, we create our IP header using the previously defined ipHdr_t typedef. We populate the IP header with a set of common values. The source and destination IP addresses are interesting fields to note. For the source address (sourceIpAdrs), we leave this field blank (set to zero), which notifies the stack to fill this in with the local source address for the outgoing interface. The destination address is constructed using the inet_addr function. We must provide a standard sockaddr_in structure with the sendto function, so we piggyback the generation of the destination address to also fill in the destination field of the IP header (destIpAdrs).

Note that in some cases, the htons function is used (ident and flags). This is required because what we provide will be the IP header for the datagram. Therefore, the fields that are expected to be in network byte order are converted. The checksum of the IP header must also be calculated; here, we use a checksum routine that is provided on the CD-ROM at /software/ch5/ipdgram.c.

Finally, we send our IP datagram using the sendto function. We include the destination address in the argument list (even though it also exists in our embedded IP header).
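The checksum routine itself is not shown in Listing 5.17. The following is a minimal sketch of the standard Internet checksum (the ones-complement sum of 16-bit words, as described in RFC 1071); it is not the routine from the CD-ROM, and the name ip_checksum and its signature are hypothetical. Unlike the routine used in the listing, this sketch reads the header byte by byte and returns the result as an ordinary host-order number, so the caller would store htons( result ) in the header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: Internet checksum over len bytes, intended for
 * header-sized buffers. The checksum field within the data must be zero
 * when the checksum is being generated. */
uint16_t ip_checksum(const void *data, size_t len)
{
    const uint8_t *bytes = data;
    uint32_t sum = 0;
    size_t i;

    /* Sum the data as big-endian 16-bit words */
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((bytes[i] << 8) | bytes[i + 1]);

    /* An odd trailing byte is padded with a zero byte */
    if (i < len)
        sum += (uint32_t)(bytes[i] << 8);

    /* Fold the carries back into the low 16 bits */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    /* The checksum is the ones complement of the folded sum */
    return (uint16_t)~sum;
}
```

A useful property for testing is that running the same function over a header that already contains a correct checksum yields zero.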

IP_TOS Option

The IP_TOS option permits an application to change the Type of Service (TOS) field within the IP header for a given socket. The TOS field is commonly used to specify service precedence within networks that support this feature. The TOS field permits segmenting traffic using quality of service parameters (see Figure 5.5).

Figure 5.5: IP TOS field in detail.

The three-bit precedence field defines a category of service (see Figure 5.6). The Delay, Throughput, and Reliability bits indicate a requested quality of service (along three axes).


Figure 5.6: Precedence values and meanings.

Setting the IP_TOS value is performed easily through an integer option, as shown in Listing 5.18.

Listing 5.18 Defining an IP Type of Service for a given connection.

start example
int sock, ret, tos;

#define LOW_DELAY           0x10
#define HIGH_THROUGHPUT     0x08
#define HIGH_RELIABILITY    0x04

sock = socket( AF_INET, SOCK_STREAM, 0 );

/* Request high throughput and high reliability */
tos = (HIGH_THROUGHPUT | HIGH_RELIABILITY);
ret = setsockopt( sock, IPPROTO_IP, IP_TOS,
                  (void *)&tos, sizeof( tos ) );
end example

The IP TOS field is commonly ignored on the WAN, but can be used in LAN environments to segment traffic based upon quality of service needs. Some implementations provide quality of service APIs that simplify the manipulation and management of this field.
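The three-bit precedence field occupies the upper bits of the TOS byte, above the Delay, Throughput, and Reliability bits, so a complete value can be assembled with a shift and a mask. The following is a minimal sketch; the names make_tos, apply_tos, and the TOS_ constants are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define TOS_LOW_DELAY        0x10
#define TOS_HIGH_THROUGHPUT  0x08
#define TOS_HIGH_RELIABILITY 0x04

/* Hypothetical helper: build a TOS byte from a precedence (0 through 7,
 * placed in the top three bits) and a set of service flag bits. */
int make_tos(int precedence, int flags)
{
    return ((precedence & 0x07) << 5) | (flags & 0x1C);
}

/* Hypothetical helper: apply a TOS value to a socket. */
int apply_tos(int sock, int tos)
{
    return setsockopt(sock, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}
```

For example, make_tos( 5, TOS_LOW_DELAY ) yields 0xB0. Whether a given precedence is honored, or even permitted without special privileges, depends on the stack and the network.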

IP_TTL Option

The IP_TTL option is used to initialize the Time To Live (TTL) field within the IP header. As an IP datagram traverses a network, the TTL field is decremented for each device that it passes through. Once the TTL field reaches zero, the datagram is dropped. The purpose of this field is to prevent a datagram from cycling forever through a network, and specifies the maximum number of hops an IP datagram may take.

Setting the IP_TTL value is performed easily through an integer option, as shown in Listing 5.19. Because the listing sets the TTL to one, datagrams are restricted to the current subnet: the first router or gateway that receives such a datagram discards it rather than forwarding it.

Listing 5.19 Defining an IP Time To Live of one for a given connection.

start example
int sock, ret, ttl;

sock = socket( AF_INET, SOCK_DGRAM, 0 );

/* A TTL of one keeps datagrams on the current subnet */
ttl = 1;
ret = setsockopt( sock, IPPROTO_IP, IP_TTL,
                  (void *)&ttl, sizeof( ttl ) );
end example

IP_ADD_MEMBERSHIP Option

The IP_ADD_MEMBERSHIP option is used to enable receipt of multicast packets for a given multicast address. The ip_mreq structure is used to configure multicast packet receipt and has the following layout:

    struct ip_mreq {
        struct in_addr imr_multiaddr;
        struct in_addr imr_interface;
    };

The imr_multiaddr is a 32-bit network-byte-order IP address that represents the multicast address for which we want to subscribe. The imr_interface represents the interface on which we want to subscribe to that multicast communication (such as is done with the bind function). This can be the address of an available interface on the host (such as “192.168.1.1”) or INADDR_ANY.

Joining a multicast group is illustrated in Listing 5.20. In this example, the multicast group address “239.255.255.253” is joined over all available interfaces on the current host (defined by INADDR_ANY). This provides the ability to receive packets sent to address “239.255.255.253” through the socket defined by sock.

Listing 5.20 Subscribing to a multicast group for a given connection.

start example
int sock, ret;
struct ip_mreq mreq;

sock = socket( AF_INET, SOCK_DGRAM, 0 );

...

bzero( (void *)&mreq, sizeof(mreq) );
mreq.imr_multiaddr.s_addr = inet_addr("239.255.255.253");
mreq.imr_interface.s_addr = htonl( INADDR_ANY );

/* Join the multicast group */
ret = setsockopt( sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                  (void *)&mreq, sizeof( mreq ) );
end example

If we want to restrict multicast receipt to a specific interface, we would specify the address in the imr_interface field of the mreq structure, such as:

    mreq.imr_interface.s_addr = inet_addr( "192.168.1.1" );

The IP_ADD_MEMBERSHIP option can be set with setsockopt, but not read with getsockopt.

IP_DROP_MEMBERSHIP Option

The IP_DROP_MEMBERSHIP option is used to drop membership in a multicast group for a given socket, as shown in Listing 5.21. The preparation for this call is identical to that for the IP_ADD_MEMBERSHIP option. We specify the multicast address in imr_multiaddr and the interface over which we’ve subscribed in imr_interface.

Listing 5.21 Leaving a multicast group for a given connection.

start example
int sock, ret;
struct ip_mreq mreq;

sock = socket( AF_INET, SOCK_DGRAM, 0 );

...

bzero( (void *)&mreq, sizeof(mreq) );
mreq.imr_multiaddr.s_addr = inet_addr("239.255.255.253");
mreq.imr_interface.s_addr = htonl( INADDR_ANY );

ret = setsockopt( sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                   (void *)&mreq, sizeof( mreq ) );

...

ret = setsockopt( sock, IPPROTO_IP, IP_DROP_MEMBERSHIP,
                   (void *)&mreq, sizeof( mreq ) );
end example

As shown in Listing 5.21, the same ip_mreq contents (group address and interface) that were used to join a multicast group should be used to leave it.

The IP_DROP_MEMBERSHIP option can be set with setsockopt, but not read with getsockopt.

IP_MULTICAST_IF Option

The previous options, IP_ADD_MEMBERSHIP and IP_DROP_MEMBERSHIP, are used to configure receipt of multicast datagrams. The IP_MULTICAST_IF option is used to configure which interface will be used to send multicast datagrams (for hosts that have multiple interfaces). The primary interface is the default interface from which multicast datagrams are transmitted, but this can be changed using the IP_MULTICAST_IF option.

The IP_MULTICAST_IF option uses an in_addr structure to define the interface for outgoing datagrams. The in_addr structure must contain the IP address for the interface of choice. Consider the following example in Listing 5.22.

Listing 5.22 Setting the outgoing multicast interface for a given connection.

start example
int sock, ret;
struct in_addr intf_addr;

sock = socket( AF_INET, SOCK_DGRAM, 0 );

...

intf_addr.s_addr = inet_addr("192.168.1.2");

ret = setsockopt( sock, IPPROTO_IP, IP_MULTICAST_IF,
                   (void *)&intf_addr, sizeof( intf_addr ) );
end example

In Listing 5.22, we initialize the s_addr field of our in_addr structure (intf_addr) with a network-byte-order IP address (converted by inet_addr). The setsockopt call is then used with intf_addr to configure the interface represented by “192.168.1.2” as our outgoing multicast interface.
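The interface address is often supplied as a string from a configuration file or the command line, so it is worth rejecting a malformed address before calling setsockopt. The following is a minimal sketch of that check; the helper name set_mcast_if is hypothetical, and inet_pton is used here in place of inet_addr because it reports malformed input explicitly.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: select the outgoing multicast interface by
 * its dotted-quad IP address.
 * Returns 0 on success, or -1 if addr is malformed or setsockopt
 * fails (for example, because no local interface has that address). */
int set_mcast_if( int sock, const char *addr )
{
  struct in_addr intf_addr;

  /* inet_pton returns 1 only for a well-formed address */
  if ( inet_pton( AF_INET, addr, &intf_addr ) != 1 ) {
    return -1;
  }

  return setsockopt( sock, IPPROTO_IP, IP_MULTICAST_IF,
                     (void *)&intf_addr, sizeof( intf_addr ) );
}
```

A call such as set_mcast_if( sock, "192.168.1.2" ) is equivalent to Listing 5.22 when the address is well formed.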

IP_MULTICAST_TTL Option

The IP_MULTICAST_TTL option is used to change the TTL field for outgoing multicast packets. This value defaults to one for multicast sockets, but can be adjusted up to 255 (representing the number of multicast router hops possible through the network before the packet is dropped). Listing 5.23 illustrates the use of the IP_MULTICAST_TTL option.

Listing 5.23 Setting the IP_MULTICAST_TTL.

start example
int sock, ret, mttl;

sock = socket( AF_INET, SOCK_DGRAM, 0 );

mttl = 10; /* 10 hops */
ret = setsockopt( sock, IPPROTO_IP, IP_MULTICAST_TTL,
                   (void *)&mttl, sizeof( mttl ) );
end example

In the example from Listing 5.23, a maximum of 10 hops is defined for the given multicast socket.
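Because the TTL field is a single byte, values outside the range 0 to 255 are not meaningful, and some BSD-derived stacks document this option as taking an unsigned char rather than an int. The following is a minimal sketch that checks the range and passes a one-byte value, which Linux also accepts; the helper names set_mcast_ttl and get_mcast_ttl are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: set the multicast TTL for sock.
 * Returns 0 on success, or -1 if hops is outside 0..255 or
 * setsockopt fails. */
int set_mcast_ttl( int sock, int hops )
{
  unsigned char mttl;

  if ( hops < 0 || hops > 255 ) {
    return -1;
  }
  mttl = (unsigned char)hops;

  return setsockopt( sock, IPPROTO_IP, IP_MULTICAST_TTL,
                     (void *)&mttl, sizeof( mttl ) );
}

/* Hypothetical helper: read the multicast TTL for sock.
 * Returns the TTL, or -1 on error. */
int get_mcast_ttl( int sock )
{
  unsigned char mttl = 0;
  socklen_t len = sizeof( mttl );

  if ( getsockopt( sock, IPPROTO_IP, IP_MULTICAST_TTL,
                   (void *)&mttl, &len ) < 0 ) {
    return -1;
  }

  return mttl;
}
```

With these helpers, set_mcast_ttl( sock, 10 ) has the same effect as Listing 5.23, and get_mcast_ttl( sock ) should then return 10.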

IP_MULTICAST_LOOP Option

For hosts on which multiple applications subscribe to a given multicast address, all outgoing datagrams are looped back so that other applications on the host can read them (loopback is enabled by default). The IP_MULTICAST_LOOP option permits the disabling of the loopback. This can be done as an optimization when no other local application needs the datagrams, because not looping datagrams back increases the performance of the application (and underlying multicast layer).

The following example, Listing 5.24, illustrates disabling the loopback option.

Listing 5.24 Setting the IP_MULTICAST_LOOP.

start example
int sock, ret, on;

sock = socket( AF_INET, SOCK_DGRAM, 0 );

on = 0; /* 0 disables loopback, 1 enables it */
ret = setsockopt( sock, IPPROTO_IP, IP_MULTICAST_LOOP,
                   (void *)&on, sizeof( on ) );
end example

This option should be used with care, because disabling loopback means that outgoing multicast traffic will still be seen by other subscribing hosts, but not by other applications on the current host.






BSD Sockets Programming from a Multi-Language Perspective
Network Programming for Microsoft Windows , Second Edition (Microsoft Programming Series)
ISBN: 1584502681
EAN: 2147483647
Year: 2003
Pages: 225
Authors: Jim Ohlund
