Connection-Oriented Protocols | Linux Server Hacks, Volume Two: Tips & Tools for Connecting, Monitoring, and Troubleshooting

In this first section, we'll cover the Winsock functions necessary for both receiving connections and establishing connections. We'll first discuss how to listen for client connections and explore the process for accepting or rejecting a connection. Then we'll describe how to initiate a connection to a server. Finally, we will discuss how data is transferred in a connection session.

Server API Functions

A server is a process that waits for any number of client connections with the purpose of servicing their requests. A server must listen for connections on a well-known name. In TCP/IP, this name is the IP address of the local interface and a port number. Every protocol has a different addressing scheme and therefore a different naming method. The first step in Winsock is to bind a socket of the given protocol to its well-known name, which is accomplished with the bind API call. The next step is to put the socket into listening mode, which is performed (appropriately enough) with the listen API function. Finally, when a client attempts a connection, the server must accept the connection with either the accept or the WSAAccept call. In the next few sections, we will discuss each API call that is required for binding and listening and for accepting a client connection. Figure 7-1 illustrates the basic calls a server and a client must perform in order to establish a communication channel.


Figure 7-1. Winsock basics for server and client

bind

Once a socket of a particular protocol is created, you must bind it to a well-known address. The bind function makes this association and is declared as

 int bind(SOCKET s, const struct sockaddr FAR* name, int namelen);

The first parameter, s, is the socket on which you want to wait for client connections. The second parameter is of type struct sockaddr, which is simply a generic buffer. You must actually fill out an address buffer specific to the protocol you are using and cast that as a struct sockaddr when calling bind. The Winsock header file defines the type SOCKADDR as struct sockaddr. We'll use this type throughout the chapter for brevity. The third parameter is simply the size of the protocol-specific address structure being passed. For example, the following code illustrates how this is done for a TCP socket:

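 SOCKET s;
 struct sockaddr_in tcpaddr;
 int port = 5150;
 s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
 memset(&tcpaddr, 0, sizeof(tcpaddr));          /* clear sin_zero */
 tcpaddr.sin_family = AF_INET;
 tcpaddr.sin_port = htons(port);                 /* network byte order */
 tcpaddr.sin_addr.s_addr = htonl(INADDR_ANY);    /* all local interfaces */
 if (bind(s, (SOCKADDR *)&tcpaddr, sizeof(tcpaddr)) == SOCKET_ERROR)
     printf("bind() failed: %d\n", WSAGetLastError());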

If the structure sockaddr_in looks mysterious to you, consult the TCP/IP addressing section in Chapter 6. From the example, you'll see a stream socket being created, followed by setting up the TCP/IP address structure on which client connections will be accepted. In this case, the socket is being bound to all local IP interfaces (INADDR_ANY) on port number 5150. The call to bind formally establishes this association of the socket with the address and port.

On error, bind returns SOCKET_ERROR. The most common error encountered with bind is WSAEADDRINUSE. When you're using TCP/IP, the WSAEADDRINUSE error indicates that another process is already bound to the local IP interface and port number or that the IP interface and port number are in the TIME_WAIT state. If you call bind again on a socket that is already bound, WSAEINVAL is returned; WSAEFAULT instead indicates that the name pointer is invalid or that namelen is too small.
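The sketch below shows one way a server might check the result of bind and tell these errors apart. It is an illustration under stated assumptions, not the book's code: bind_loopback and bound_port are hypothetical helper names, the socket is bound to the loopback address with port 0 (letting the stack choose a free port) rather than INADDR_ANY and 5150 so it can run anywhere, and on Windows it assumes WSAStartup has already succeeded. The preprocessor block maps the few Winsock names it uses onto BSD sockets so the same code also builds on Unix.

```c
#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>                 /* socklen_t */
#define SOCK_ERR() WSAGetLastError()
#else                                 /* BSD-sockets stand-ins for the Winsock names */
#include <errno.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int SOCKET;
#define INVALID_SOCKET (-1)
#define SOCKET_ERROR   (-1)
#define closesocket    close
#define SOCK_ERR()     errno
#define WSAEADDRINUSE  EADDRINUSE
#define WSAEINVAL      EINVAL
#endif
#include <string.h>

/* Bind s to 127.0.0.1:port (port 0 lets the stack pick a free port).
   Returns 0 on success, otherwise the socket error code. */
int bind_loopback(SOCKET s, unsigned short port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == SOCKET_ERROR)
        return SOCK_ERR();          /* e.g. WSAEADDRINUSE or WSAEINVAL */
    return 0;
}

/* Port number actually assigned to a bound socket, or 0 on failure. */
unsigned short bound_port(SOCKET s)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(s, (struct sockaddr *)&addr, &len) == SOCKET_ERROR)
        return 0;
    return ntohs(addr.sin_port);
}
```

Binding a second socket to the port reported by bound_port should fail with WSAEADDRINUSE, and calling bind_loopback a second time on the same socket should fail with WSAEINVAL.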

listen

The next piece of the equation is to put the socket into listening mode. The bind function merely associates the socket with a given address. The API function that tells a socket to wait for incoming connections is listen, which is defined as

 int listen(SOCKET s, int backlog);

Again, the first parameter is a bound socket. The backlog parameter specifies the maximum queue length for pending connections. This is important when several simultaneous requests are made to the server. For example, let's say the backlog parameter is set to 2. If three client requests are made at the same time, the first two will be placed in a "pending" queue so that the application can service their requests. The third connection request will fail with WSAECONNREFUSED. Note that once the server accepts a connection, the connection request is removed from the queue so that others can make a request. The backlog parameter is silently limited to a value determined by the underlying protocol provider. Illegal values are replaced with their nearest legal values. Additionally, there is no standard provision for finding the actual backlog value.
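To make the backlog behavior concrete, here is a toy model of the pending-connection queue that reproduces the example above (backlog 2, third request refused, room freed by accept). It is not how a TCP stack implements its queue: the struct, the function names, and the legal range of 1 to 16 used for clamping are all hypothetical.

```c
#include <stdbool.h>

#define MAX_BACKLOG 16              /* hypothetical provider limit */

/* Toy FIFO of pending connection requests (a circular buffer). */
struct pending_queue {
    int ids[MAX_BACKLOG];
    int head, count, backlog;
};

void pq_init(struct pending_queue *q, int backlog)
{
    /* Illegal values are replaced with the nearest legal value. */
    if (backlog < 1) backlog = 1;
    if (backlog > MAX_BACKLOG) backlog = MAX_BACKLOG;
    q->head = q->count = 0;
    q->backlog = backlog;
}

/* A connection request arrives. Returns false when the queue is full,
   i.e. the case where the client sees WSAECONNREFUSED in the text. */
bool pq_arrive(struct pending_queue *q, int client_id)
{
    if (q->count == q->backlog)
        return false;
    q->ids[(q->head + q->count) % MAX_BACKLOG] = client_id;
    q->count++;
    return true;
}

/* The server calls accept: remove and return the oldest pending request,
   or -1 if nothing is pending. */
int pq_accept(struct pending_queue *q)
{
    int id;

    if (q->count == 0)
        return -1;
    id = q->ids[q->head];
    q->head = (q->head + 1) % MAX_BACKLOG;
    q->count--;
    return id;
}
```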

The errors associated with listen are fairly straightforward. By far the most common is WSAEINVAL, which usually indicates that you forgot to call bind before listen. Less commonly, WSAEADDRINUSE is reported by listen rather than by bind, although this error usually surfaces at bind time.

accept and WSAAccept

Now you're ready to accept client connections. This is accomplished with either the accept or the WSAAccept function. The prototype for accept is

 SOCKET accept(SOCKET s, struct sockaddr FAR* addr, int FAR* addrlen);

Parameter s is the bound socket that is in a listening state. The second parameter should be the address of a valid SOCKADDR_IN structure, and addrlen should point to an integer holding the length of that structure. For a socket of another protocol, substitute the SOCKADDR structure corresponding to that protocol for SOCKADDR_IN. A call to accept services the first connection request in the queue of pending connections. When the accept function returns, the addr structure contains the IP address information of the client making the connection request, and the integer that addrlen points to holds the size of the returned address. Additionally, accept returns a new socket descriptor that corresponds to the accepted client connection; use this new socket for all subsequent operations with this client. The original listening socket remains in listening mode and is still used to accept other client connections.
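To see accept hand back a new socket while the listener keeps listening, here is a sketch that creates a listener on the loopback interface and accepts one pending connection. make_listener and accept_one are hypothetical helpers; the listener uses port 0 so the stack picks a free port, the preprocessor block maps Winsock names onto BSD sockets so it also builds on Unix, and on Windows it assumes WSAStartup has already been called.

```c
#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>                 /* socklen_t */
#else                                 /* BSD-sockets stand-ins for the Winsock names */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int SOCKET;
#define INVALID_SOCKET (-1)
#define SOCKET_ERROR   (-1)
#define closesocket    close
#endif
#include <string.h>

/* Create a TCP socket listening on 127.0.0.1 with a stack-chosen port,
   store that port in *port_out, and return the socket
   (INVALID_SOCKET on failure). */
SOCKET make_listener(int backlog, unsigned short *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    if (s == INVALID_SOCKET)
        return INVALID_SOCKET;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == SOCKET_ERROR ||
        listen(s, backlog) == SOCKET_ERROR ||
        getsockname(s, (struct sockaddr *)&addr, &len) == SOCKET_ERROR) {
        closesocket(s);
        return INVALID_SOCKET;
    }
    *port_out = ntohs(addr.sin_port);
    return s;
}

/* Accept the first pending connection. The returned socket is used for
   this client; `listener` stays in listening mode. *peer receives the
   client's address. */
SOCKET accept_one(SOCKET listener, struct sockaddr_in *peer)
{
    socklen_t len = sizeof(*peer);

    return accept(listener, (struct sockaddr *)peer, &len);
}
```

A client that connects to 127.0.0.1 on the reported port completes the handshake while its request waits in the backlog; accept_one then returns a socket for that client, distinct from the listening socket.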

Winsock 2 introduced the function WSAAccept, which has the ability to conditionally accept a connection based on the return value of a condition function. The prototype for this new function is

 SOCKET WSAAccept(SOCKET s, struct sockaddr FAR* addr, LPINT addrlen, LPCONDITIONPROC lpfnCondition, DWORD dwCallbackData);

The first three parameters are the same as the Winsock 1 version of accept. The lpfnCondition argument is a pointer to a function that is called upon a client request. This function determines whether to accept the client's connection request. The prototype for this function is

 int CALLBACK ConditionFunc( LPWSABUF lpCallerId, LPWSABUF lpCallerData, LPQOS lpSQOS, LPQOS lpGQOS, LPWSABUF lpCalleeId, LPWSABUF lpCalleeData, GROUP FAR * g, DWORD dwCallbackData );

The lpCallerId parameter is a value parameter that contains the address of the connecting entity. The WSABUF structure is commonly used by many Winsock 2 functions. It is declared as

 typedef struct __WSABUF { u_long len; char FAR * buf; } WSABUF, FAR * LPWSABUF;

Depending on its use, the len field refers either to the size of the buffer pointed to by the buf field or to the amount of data contained in the data buffer buf.

For lpCallerId, the buf pointer points to an address structure for the given protocol on which the connection is made. To correctly access the information, simply cast the buf pointer to the appropriate SOCKADDR type. In the case of TCP/IP, this is, of course, a SOCKADDR_IN structure that will contain the IP address of the client making the connection. Most network protocols can be expected to support caller ID information at connection-request time.

The lpCallerData parameter contains any connection data sent by the client along with the connection request. If caller data was not specified, this parameter is NULL. Be aware that most network protocols, such as TCP, do not support connect data. Whether a protocol supports connect or disconnect data can be determined by consulting its entry in the Winsock catalog with the WSAEnumProtocols function. See Chapter 5 for the specifics.

The next two parameters, lpSQOS and lpGQOS, specify any quality of service (QOS) parameters that are being requested by the client. Both parameters reference a QOS structure that contains information regarding bandwidth requirements for both sending and receiving data. If the client is not requesting QOS, these parameters will be NULL. The difference between these two parameters is that lpSQOS refers to a single connection, while lpGQOS is used for socket groups. Socket groups are not implemented or supported in Winsock 1 or 2. (See Chapter 12 for further details about QOS.)

The lpCalleeId is another WSABUF structure containing the local address to which the client has connected. Again, the buf field of this structure points to a SOCKADDR object of the appropriate address family. This information is useful in the event that the server is running on a multihomed machine. Remember that if a server binds to the address INADDR_ANY, connection requests are serviced on any network interface. This parameter will contain the specific interface on which the connection occurred.

The lpCalleeData parameter is the complement of lpCallerData. The lpCalleeData parameter points to a WSABUF structure that the server can use to send data back to the client as a part of the connection request process. If the service provider supports this option, the len field indicates the maximum number of bytes the server can send back to the client as a part of this connection request. In this case, the server would copy any number of bytes up to this amount into the buf portion of the WSABUF structure and update the len field to indicate the number of bytes being transferred. If the server does not want to return any connect data, the conditional accept function should set the len field to 0 before returning. If the provider does not support connect data, the len field will be 0. Again, most protocols do not support data exchange upon accept. In fact, none of the currently supported protocols on any Win32 platform support this feature.

Once the server has processed the parameters passed into the conditional function, the server must indicate whether to accept, reject, or defer the client's connection request. If the server is accepting the connection, the conditional function should return CF_ACCEPT. Upon rejection, the function should return CF_REJECT. If for some reason the decision cannot be made at this time, CF_DEFER can be returned. When the server is prepared to handle a deferred connection request, it should call WSAAccept again. Note that the condition function runs in the same thread as the WSAAccept function and should return as soon as possible. Also be aware that for the protocols supported by the current Win32 platforms, the conditional accept function does not imply that the client's connection request is delayed until a value is returned from this conditional function. In most cases, the underlying network stack has already accepted the connection at the time the conditional accept function is called. If the value CF_REJECT is returned, the underlying stack simply closes the connection. We won't go into the detailed usage of the conditional acceptance function now, as this information will be more useful in Chapter 12.

If an error occurs, INVALID_SOCKET is returned. The most common error encountered is WSAEWOULDBLOCK if the listening socket is in asynchronous or nonblocking mode and there is no connection to be accepted. When a conditional function returns CF_DEFER, WSAAccept returns the error WSATRY_AGAIN. If the condition function returns CF_REJECT, the WSAAccept error is WSAECONNREFUSED.
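A condition function usually bases its decision on the caller ID. The sketch below shows only that decision logic: a hypothetical caller_allowed helper that accepts callers from one IPv4 subnet. Inside a real condition function you would cast lpCallerId->buf to a SOCKADDR_IN pointer, pass it to a check like this one, and return CF_ACCEPT or CF_REJECT. The helper does no I/O, which suits a callback that runs on the WSAAccept thread and should return quickly; the subnet rule itself is an assumption chosen for illustration.

```c
#ifdef _WIN32
#include <winsock2.h>
#else
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#endif
#include <stdbool.h>
#include <stdint.h>

/* Return true if the caller's IPv4 address lies in network/prefix_len.
   `network` is in host byte order; prefix_len must be 0..32. */
bool caller_allowed(const struct sockaddr_in *caller,
                    uint32_t network, int prefix_len)
{
    uint32_t mask, addr;

    if (caller->sin_family != AF_INET)
        return false;
    /* Avoid shifting a 32-bit value by 32, which is undefined. */
    mask = (prefix_len == 0) ? 0 : 0xFFFFFFFFu << (32 - prefix_len);
    addr = ntohl(caller->sin_addr.s_addr);   /* to host byte order */
    return (addr & mask) == (network & mask);
}
```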

Client API Functions

The client is much simpler and involves fewer steps to set up a successful connection. There are only three steps for a client:

Create a socket with socket or WSASocket.

Resolve the server's name (dependent on underlying protocol).

Initiate the connection with connect or WSAConnect.

You already know from Chapter 6 how to create the socket and resolve an IP host name, so the only remaining step is establishing a connection. Chapter 6 also covers the various name-resolution methods for other protocol families.

TCP States
As a Winsock programmer, you are not required to know the actual TCP states, but by knowing them you will gain a better understanding of how the Winsock API calls effect change in the underlying protocol. Additionally, many programmers run into a common problem when closing sockets; the TCP states surrounding a socket closure are of the most interest.

The start state of every socket is the CLOSED state. When a client initiates a connection, it sends a SYN packet to the server and puts the client socket in the SYN_SENT state. When the server receives the SYN packet, it sends a SYN-and-ACK packet, which the client responds to with an ACK packet. At this point, the client's socket is in the ESTABLISHED state. If the server never sends a SYN-ACK packet, the client times out and reverts to the CLOSED state.

When a server's socket is bound and is listening on a local interface and port, the state of the socket is LISTEN. When a client attempts a connection, the server receives a SYN packet and responds with a SYN-ACK packet. The state of the server's socket changes to SYN_RCVD. Finally, the client sends an ACK packet, which causes the state of the server's socket to change to ESTABLISHED.

Once the application is in the ESTABLISHED state, there are two paths for closure. If your application initiates the closure, the closure is known as an active socket closure; otherwise, the socket closure is passive. Figure 7-2 illustrates both an active and a passive closure. In an active closure, your application calls closesocket or shutdown (with SD_SEND as its second argument), which sends a FIN packet to the peer and changes the state of your socket to FIN_WAIT_1. Normally, the peer responds with an ACK packet, and your socket's state becomes FIN_WAIT_2. If the peer also closes the connection, it sends a FIN packet; your computer responds by sending an ACK packet and places your socket in the TIME_WAIT state.

The TIME_WAIT state is also called the 2MSL wait state. MSL stands for Maximum Segment Lifetime and represents the amount of time a packet can exist on the network before being discarded. Each IP packet has a time-to-live (TTL) field, which when decremented to 0 causes the packet to be discarded. Each router on the network that handles the packet decrements the TTL by 1 and passes the packet on. Once an application enters the TIME_WAIT state, it remains there for twice the MSL time. This allows TCP to resend the final ACK if that ACK is lost and the peer retransmits its FIN. After the 2MSL wait state completes, the socket goes to the CLOSED state.

On an active close, two other paths lead to the TIME_WAIT state. In our previous discussion, only one side issues a FIN and receives an ACK response, but the peer is still free to send data until it too closes. This is where the other two paths come into play. In one path—the simultaneous close—a computer and its peer at the other side of a connection issue a close at the same time: the computer sends a FIN packet to the peer and receives a FIN packet from the peer. Then the computer sends an ACK packet in response to the peer's FIN packet and changes its socket to the CLOSING state. Once the computer receives the last ACK packet from the peer, the computer's socket state becomes TIME_WAIT.

Figure 7-2. TCP socket closure states

The other path for an active closure is just a variation on the simultaneous close: the socket transitions from the FIN_WAIT_1 state directly to the TIME_WAIT state. This occurs when an application sends a FIN packet but shortly thereafter receives a FIN-ACK packet from the peer. In this case, the peer is acknowledging the application's FIN packet and sending its own, to which the application responds with an ACK packet.

The major effect of the TIME_WAIT state is that while a TCP connection is in the 2MSL wait state, the socket pair defining that connection cannot be reused. A socket pair is the combination of local IP address, local port, remote IP address, and remote port. Some TCP implementations do not allow the reuse of any port number in a socket pair in the TIME_WAIT state. Microsoft's implementation does not suffer from this deficiency. However, if a connection is attempted in which the socket pair is already in the TIME_WAIT state, the connection attempt will fail with error WSAEADDRINUSE. One way around this (besides waiting for the socket pair that is using that local port to leave the TIME_WAIT state) is to use the socket option SO_REUSEADDR. Chapter 9 covers the SO_REUSEADDR option in detail.

The last point of discussion for socket states is the passive closure. In this scenario, an application receives a FIN packet from the peer and responds with an ACK packet. At this point, the application's socket changes to the CLOSE_WAIT state. Because the peer has closed its end, it can't send any more data, but the application still can until it also closes its end of the connection. To close its end of the connection, the application sends its own FIN, causing the application's TCP socket state to become LAST_ACK. After the application receives an ACK packet from the peer, the application's socket reverts to the CLOSED state.
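The transitions described in this sidebar can be summarized as a small state machine. The sketch below is a simplified model for study, not an implementation of RFC 793: the ST_ and EV_ names are hypothetical, the server side is modeled as a single socket moving from LISTEN to SYN_RCVD (real stacks keep the listening socket in LISTEN and track the new connection separately), and a single EV_TIMEOUT event stands in for both the connect timeout and the expiry of the 2MSL timer.

```c
enum tcp_state {
    ST_CLOSED, ST_LISTEN, ST_SYN_SENT, ST_SYN_RCVD, ST_ESTABLISHED,
    ST_FIN_WAIT_1, ST_FIN_WAIT_2, ST_CLOSING, ST_TIME_WAIT,
    ST_CLOSE_WAIT, ST_LAST_ACK
};

enum tcp_event {
    EV_APP_LISTEN,   /* bind + listen */
    EV_APP_CONNECT,  /* connect: send SYN */
    EV_APP_CLOSE,    /* closesocket or shutdown(SD_SEND): send FIN */
    EV_RCV_SYN,
    EV_RCV_SYN_ACK,
    EV_RCV_ACK,
    EV_RCV_FIN,
    EV_RCV_FIN_ACK,  /* peer ACKs our FIN and sends its own FIN together */
    EV_TIMEOUT       /* connect timeout or 2MSL expiry */
};

/* Return the next state; events not discussed in the text leave the
   state unchanged. */
enum tcp_state tcp_next(enum tcp_state s, enum tcp_event e)
{
    switch (s) {
    case ST_CLOSED:
        if (e == EV_APP_LISTEN)  return ST_LISTEN;
        if (e == EV_APP_CONNECT) return ST_SYN_SENT;
        break;
    case ST_LISTEN:
        if (e == EV_RCV_SYN)     return ST_SYN_RCVD;    /* reply SYN-ACK */
        break;
    case ST_SYN_SENT:
        if (e == EV_RCV_SYN_ACK) return ST_ESTABLISHED; /* reply ACK */
        if (e == EV_TIMEOUT)     return ST_CLOSED;
        break;
    case ST_SYN_RCVD:
        if (e == EV_RCV_ACK)     return ST_ESTABLISHED;
        break;
    case ST_ESTABLISHED:
        if (e == EV_APP_CLOSE)   return ST_FIN_WAIT_1;  /* active close */
        if (e == EV_RCV_FIN)     return ST_CLOSE_WAIT;  /* passive close */
        break;
    case ST_FIN_WAIT_1:
        if (e == EV_RCV_ACK)     return ST_FIN_WAIT_2;
        if (e == EV_RCV_FIN)     return ST_CLOSING;     /* simultaneous close */
        if (e == EV_RCV_FIN_ACK) return ST_TIME_WAIT;
        break;
    case ST_FIN_WAIT_2:
        if (e == EV_RCV_FIN)     return ST_TIME_WAIT;
        break;
    case ST_CLOSING:
        if (e == EV_RCV_ACK)     return ST_TIME_WAIT;
        break;
    case ST_TIME_WAIT:
        if (e == EV_TIMEOUT)     return ST_CLOSED;      /* after 2MSL */
        break;
    case ST_CLOSE_WAIT:
        if (e == EV_APP_CLOSE)   return ST_LAST_ACK;
        break;
    case ST_LAST_ACK:
        if (e == EV_RCV_ACK)     return ST_CLOSED;
        break;
    }
    return s;
}
```

Feeding the events of each path through tcp_next reproduces the sequences in the text, for example ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT, CLOSED for an ordinary active close.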

For more information regarding the TCP/IP protocol, consult RFC 793. This RFC and others can be found at http://www.rfc-editor.org.

connect and WSAConnect

The only new step is the connect. This is accomplished by calling either connect or WSAConnect. First we'll look at the Winsock 1 version of this function, which is defined as

 int connect( SOCKET s, const struct sockaddr FAR* name, int namelen );

The parameters are fairly self-explanatory: s is the valid TCP socket on which to establish the connection, name is the socket address structure (SOCKADDR_IN) for TCP that describes the server to connect to, and namelen is the length of the name variable. The Winsock 2 version is defined as

 int WSAConnect( SOCKET s, const struct sockaddr FAR * name, int namelen, LPWSABUF lpCallerData, LPWSABUF lpCalleeData, LPQOS lpSQOS, LPQOS lpGQOS );

The first three parameters are exactly the same as the connect API function. The next two, lpCallerData and lpCalleeData, are buffers used to send and receive data at the time of the connection request. The lpCallerData parameter is a pointer to a buffer that holds data the client sends to the server with the connection request. The lpCalleeData parameter points to a buffer that will be filled with any data sent back from the server at the time of connection setup. Both of these variables are WSABUF structures. For lpCallerData, set the len field to the number of bytes in buf that are to be transferred. For lpCalleeData, the len field refers to the size of the buffer in buf that can receive data back from the server. The last two parameters, lpSQOS and lpGQOS, refer to QOS structures that define the bandwidth requirements for both sending and receiving data on the connection to be established. The parameter lpSQOS is used to specify requirements for the socket s, while lpGQOS specifies the requirements for socket groups. Socket groups are not currently supported. A null value for lpSQOS indicates no application-specific QOS.

If the computer you're attempting to connect to does not have a process listening on the given port, the connect call fails with the error WSAECONNREFUSED. The other error you might encounter is WSAETIMEDOUT, which occurs if the destination you're trying to reach is unavailable (either because of a communication-hardware failure on the route to the host or because the host is not currently on the network).
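A client can capture the error from connect to tell these cases apart. The sketch below (connect_loopback is a hypothetical helper) connects to a port on the loopback interface and reports the socket error code on failure. The preprocessor block maps Winsock names onto BSD sockets so the same code also builds on Unix, and on Windows it assumes WSAStartup has already been called.

```c
#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>                 /* socklen_t */
#define SOCK_ERR() WSAGetLastError()
#else                                 /* BSD-sockets stand-ins for the Winsock names */
#include <errno.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int SOCKET;
#define INVALID_SOCKET  (-1)
#define SOCKET_ERROR    (-1)
#define closesocket     close
#define SOCK_ERR()      errno
#define WSAECONNREFUSED ECONNREFUSED
#define WSAETIMEDOUT    ETIMEDOUT
#endif
#include <string.h>

/* Connect a new TCP socket to 127.0.0.1:port. On success returns the
   connected socket; on failure returns INVALID_SOCKET and stores the
   socket error code (e.g. WSAECONNREFUSED) in *err. */
SOCKET connect_loopback(unsigned short port, int *err)
{
    struct sockaddr_in srv;
    SOCKET s;

    *err = 0;
    s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (s == INVALID_SOCKET) {
        *err = SOCK_ERR();
        return INVALID_SOCKET;
    }
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(port);
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (connect(s, (struct sockaddr *)&srv, sizeof(srv)) == SOCKET_ERROR) {
        *err = SOCK_ERR();          /* save it before closesocket runs */
        closesocket(s);
        return INVALID_SOCKET;
    }
    return s;
}
```

A caller can then branch on *err: WSAECONNREFUSED usually means nothing is listening on that port, while WSAETIMEDOUT points to a host that could not be reached.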

Data Transmission

Sending and receiving data is what network programming is all about. For sending data on a connected socket, there are two API functions: send and WSASend. The second function is specific to Winsock 2. Likewise, two functions are for receiving data on a connected socket: recv and WSARecv. The latter is also a Winsock 2 call.

An important thing to keep in mind is that all buffers associated with sending and receiving data are of the simple char type. That is, there are no UNICODE versions of these functions. This is especially significant on Windows CE, as it uses UNICODE by default. In situations in which you are using UNICODE, you have the option of sending a character string as is or casting it as a char *. The catch is that if you use a string length function such as wcslen to tell the Winsock API functions how many bytes to send or receive, you must multiply its result by 2, because each character occupies 2 bytes of the string array. The other option is to use WideCharToMultiByte to convert the UNICODE string to a multibyte (for example, ANSI) string before passing the data to the Winsock API functions.
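The multiply-by-2 rule comes from sizeof(wchar_t), which is 2 on Windows. A minimal sketch (wide_byte_len is a hypothetical name) that computes the byte count from sizeof(wchar_t) instead of a literal 2, so it also gives the right answer on platforms where wchar_t is 4 bytes:

```c
#include <stddef.h>
#include <wchar.h>

/* Bytes occupied by a wide string, excluding the terminator.
   wcslen counts characters; sizeof(wchar_t) converts to bytes
   (2 on Windows, commonly 4 on Unix-like systems). */
size_t wide_byte_len(const wchar_t *s)
{
    return wcslen(s) * sizeof(wchar_t);
}
```

With a connected socket s and a wide string msg, a call might then look like send(s, (const char *)msg, (int)wide_byte_len(msg), 0). The receiver must interpret the bytes with the same character size and byte order.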

Additionally, the error code returned by all send and receive functions is SOCKET_ERROR. Once an error is returned, call WSAGetLastError to obtain extended error information. The most common errors encountered are WSAECONNABORTED and WSAECONNRESET. Both of these deal with the connection being closed—either through a timeout or through the peer closing the connection. Another common error is WSAEWOULDBLOCK, which is normally encountered when either nonblocking or asynchronous sockets are used. This error basically means that the specified function cannot be completed at this time. In Chapter 8, we will describe various Winsock I/O methods that can help you avoid some of these errors.

send and WSASend

The first API function to send data on a connected socket is send, which is prototyped as

 int send( SOCKET s, const char FAR * buf, int len, int flags );

The SOCKET parameter is the connected socket to send the data on. The second parameter, buf, is a pointer to the character buffer that contains the data to be sent. The third parameter, len, specifies the number of bytes in the buffer to send. Finally, the flags parameter can be 0, MSG_DONTROUTE, MSG_OOB, or a bitwise OR of those flags. The MSG_DONTROUTE flag tells the transport not to route the packets it sends. It is up to the underlying transport to honor this request (for example, if the transport doesn't support this option, it will be ignored). The MSG_OOB flag signifies that the data should be sent out of band.

On a good return, send returns the number of bytes sent; otherwise, if an error occurs, SOCKET_ERROR is returned. A common error is WSAECONNABORTED, which occurs when the virtual circuit terminates because of a timeout failure or a protocol error. When this occurs, the socket should be closed, as it is no longer usable. The error WSAECONNRESET occurs when the application on the remote host resets the virtual circuit by executing a hard close or terminating unexpectedly, or when the remote host is rebooted. Again, the socket should be closed after this error occurs. The last common error is WSAETIMEDOUT, which occurs when the connection is dropped because of a network failure or the remote connected system going down without notice.
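Because send returns the number of bytes it actually accepted, which on a stream socket can be fewer than len, callers typically loop until the whole buffer has gone out. The sketch below assumes a hypothetical send_fn callback with send's return contract so the loop can be exercised without a network; in a real program the callback would simply wrap send(s, buf, len, 0). chunky_sender is a demo stand-in that accepts at most 3 bytes per call to simulate partial sends.

```c
#include <string.h>

/* Hypothetical callback with send()'s contract: returns the number of
   bytes accepted (possibly fewer than len) or -1 (SOCKET_ERROR). */
typedef int (*send_fn)(void *ctx, const char *buf, int len);

/* Keep sending until all len bytes are accepted.
   Returns len on success or -1 on the first error. */
int send_all(send_fn fn, void *ctx, const char *buf, int len)
{
    int total = 0;

    while (total < len) {
        int n = fn(ctx, buf + total, len - total);
        if (n <= 0)
            return -1;   /* error; 0 is also treated as an error so the
                            loop cannot spin forever */
        total += n;
    }
    return total;
}

/* Demo stand-in for send(): copies at most 3 bytes per call into a sink. */
struct sink { char data[64]; int used; };

int chunky_sender(void *ctx, const char *buf, int len)
{
    struct sink *k = ctx;
    int n = len < 3 ? len : 3;

    if (k->used + n > (int)sizeof(k->data))
        return -1;
    memcpy(k->data + k->used, buf, n);
    k->used += n;
    return n;
}
```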

The Winsock 2 version of the send API function, WSASend, is defined as

 int WSASend( SOCKET s, LPWSABUF lpBuffers, DWORD dwBufferCount, LPDWORD lpNumberOfBytesSent, DWORD dwFlags, LPWSAOVERLAPPED lpOverlapped, LPWSAOVERLAPPED_COMPLETION_ROUTINE lpCompletionROUTINE );

The socket is a valid handle to a connection session. The second parameter is a pointer to one or more WSABUF structures. This can be either a single structure or an array of such structures. The third parameter indicates the number of WSABUF structures being passed. Remember that each WSABUF structure is itself a character buffer and the length of that buffer. You might wonder why you would want to send more than one buffer at a time. This is called scatter-gather I/O and will be discussed later in this chapter; however, in the case of data sent using multiple buffers on a connected socket, each buffer is sent from the first to the last WSABUF structure in the array. The lpNumberOfBytesSent is a pointer to a DWORD that on return from the WSASend call contains the total number of bytes sent. The dwFlags parameter is equivalent to its counterpart in send. The last two parameters, lpOverlapped and lpCompletionROUTINE, are used for overlapped I/O. Overlapped I/O is one of the asynchronous I/O models supported by Winsock and is discussed in detail in Chapter 8.

The WSASend function sets lpNumberOfBytesSent to the number of bytes written. The function returns 0 on success and SOCKET_ERROR on any error, and generally encounters the same errors as the send function.
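The ordering rule for multiple buffers can be illustrated without a socket. The sketch below uses a hypothetical buf_desc struct with the same shape as WSABUF (a length and a pointer) so it builds without winsock2.h, and a hypothetical gather function that concatenates the buffers first to last, the order in which the text says WSASend transmits them on a connected socket. A common reason to pass several buffers is to send a header and a body kept in separate memory without first copying them together.

```c
#include <string.h>

/* Hypothetical stand-in with the same shape as WSABUF. */
struct buf_desc { unsigned long len; char *buf; };

/* Copy the buffers into out in array order, first to last.
   Returns the total number of bytes, or -1 if out is too small. */
long gather(const struct buf_desc *bufs, unsigned count,
            char *out, unsigned long cap)
{
    unsigned long total = 0;

    for (unsigned i = 0; i < count; i++) {
        if (bufs[i].len > cap - total)
            return -1;
        memcpy(out + total, bufs[i].buf, bufs[i].len);
        total += bufs[i].len;
    }
    return (long)total;
}
```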

WSASendDisconnect

This function is rather specialized and not generally used. The function prototype is

 int WSASendDisconnect(SOCKET s, LPWSABUF lpOutboundDisconnectData);

Out-of-Band Data
When an application on a connected stream socket needs to send data that is more important than regular data on the stream, it can mark the important data as out-of-band (OOB) data. The application on the other end of a connection can receive and process OOB data through a separate logical channel that is conceptually independent of the data stream.

In TCP, OOB data is implemented via an urgent 1-bit marker (called URG) and a 16-bit pointer in the TCP segment header that identify a specific downstream byte as urgent data. Two specific ways of implementing urgent data currently exist for TCP. RFC 793, which describes TCP and introduces the concept of urgent data, indicates that the urgent pointer in the TCP header is a positive offset to the byte that follows the urgent data byte. However, RFC 1122 describes the urgent offset as pointing to the urgent byte itself.
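The two RFCs differ by exactly one byte. Given a segment's sequence number and its urgent pointer, the last urgent byte is at seq + urg_ptr - 1 under RFC 793 and at seq + urg_ptr under RFC 1122. The small sketch below (hypothetical function names) states that arithmetic in C; unsigned 32-bit arithmetic provides the modulo 2^32 wraparound that TCP sequence numbers use.

```c
#include <stdint.h>

/* RFC 793 reading: the urgent pointer refers to the byte FOLLOWING the
   urgent data, so the last urgent byte is one earlier. */
uint32_t urgent_byte_rfc793(uint32_t seq, uint16_t urg_ptr)
{
    return seq + urg_ptr - 1u;
}

/* RFC 1122 reading: the urgent pointer refers to the urgent byte itself. */
uint32_t urgent_byte_rfc1122(uint32_t seq, uint16_t urg_ptr)
{
    return seq + urg_ptr;
}
```

Two hosts that disagree on the interpretation will therefore treat adjacent bytes as the urgent byte, which is one reason the text advises avoiding urgent data.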

The Winsock specification uses the term OOB to refer to both protocol-independent OOB data and TCP's implementation of OOB data (urgent data). In order to check whether pending data contains urgent data, you must call the ioctlsocket function with the SIOCATMARK option. Chapter 9 discusses how to use SIOCATMARK.

Winsock provides two ways of obtaining urgent data. Either the urgent data is inlined so that it appears in the normal data stream, or inlining is turned off so that a discrete call to a receive function returns only the urgent data. The socket option SO_OOBINLINE, also discussed in detail in Chapter 9, controls the behavior of OOB data.

Telnet and Rlogin use urgent data for several reasons. However, unless you plan on writing your own Telnet or Rlogin, you should stay away from urgent data. It's not well defined and might be implemented differently on platforms other than Win32. If you require a method of signaling the peer for urgent reasons, implement a separate control socket for this urgent data and reserve the main socket connection for normal data transfers.

The function initiates a shutdown of the socket and sends disconnect data. Of course, this function is available only to those transport protocols that support graceful close and disconnect data. None of the transport providers currently support disconnect data. The WSASendDisconnect function behaves like a call to the shutdown function with an SD_SEND argument, but it also sends the data contained in its lpOutboundDisconnectData parameter. Subsequent sends are not allowed on the socket. Upon failure, WSASendDisconnect returns SOCKET_ERROR. This function can encounter some of the same errors as the send function.

recv and WSARecv

The recv function is the most basic way to accept incoming data on a connected socket. This function is defined as

 int recv( SOCKET s, char FAR* buf, int len, int flags );

The first parameter, s, is the socket on which data will be received. The second parameter, buf, is the character buffer that will receive the data, while len is either the number of bytes you want to receive or the size of the buffer, buf. Finally, the flags parameter can be 0, MSG_PEEK, MSG_OOB, or a bitwise OR of these flags. Of course, 0 specifies no special actions. MSG_PEEK causes the available data to be copied into the supplied receive buffer without removing it from the system's buffer. The number of bytes pending is also returned.

Message peeking is bad. Not only does it degrade performance, as you now need to make two system calls (one to peek and one without the MSG_PEEK flag to actually remove the data), but it is also unreliable under certain circumstances. The data returned might not reflect the entire amount available. Also, by leaving data in the system buffers, the system has less and less space to contain incoming data. As a result, the system reduces the TCP window size for all senders. This prevents your application from achieving the maximum possible throughput. The best thing to do is to copy all the data you can into your own buffer and manipulate it there. You have seen the MSG_OOB flag before in the discussion on sending data. Refer to the previous section for more information.
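One way to follow that advice on a stream socket is to append whatever recv returns to an application-owned buffer and parse messages out of it there. The sketch below assumes a hypothetical newline-delimited protocol; rx_buffer, rx_append, and rx_take_line are illustrative names, and the 256-byte capacity is arbitrary. In a real loop, each successful n = recv(s, tmp, sizeof(tmp), 0) would be followed by rx_append(&rx, tmp, n) and then repeated calls to rx_take_line until it returns -1.

```c
#include <stddef.h>
#include <string.h>

/* Application-side receive buffer: everything recv() returns is copied
   here, so the system's buffers are drained instead of peeked at. */
struct rx_buffer {
    char data[256];
    size_t used;
};

/* Append n received bytes. Returns 0, or -1 if they do not fit. */
int rx_append(struct rx_buffer *rx, const char *bytes, size_t n)
{
    if (n > sizeof(rx->data) - rx->used)
        return -1;
    memcpy(rx->data + rx->used, bytes, n);
    rx->used += n;
    return 0;
}

/* If a complete '\n'-terminated line is buffered, copy it (without the
   newline, NUL-terminated) into out, remove it from rx, and return its
   length. Returns -1 if no complete line is buffered or out is too small. */
long rx_take_line(struct rx_buffer *rx, char *out, size_t cap)
{
    char *nl = memchr(rx->data, '\n', rx->used);
    size_t len;

    if (nl == NULL)
        return -1;
    len = (size_t)(nl - rx->data);
    if (len + 1 > cap)
        return -1;
    memcpy(out, rx->data, len);
    out[len] = '\0';
    /* Shift any bytes after the newline to the front of the buffer. */
    memmove(rx->data, nl + 1, rx->used - len - 1);
    rx->used -= len + 1;
    return (long)len;
}
```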

There are some considerations when using recv on a message- or datagram-based socket. In the event that the data pending is larger than the supplied buffer, the buffer is filled with as much data as it will contain. In this event, the recv call generates the error WSAEMSGSIZE. Note that the message-size error occurs with message-oriented protocols. Stream protocols buffer incoming data and will return as much data as the application requests, even if the amount of pending data is greater. Thus, for streaming protocols you will not encounter the WSAEMSGSIZE error.

The WSARecv function adds some new capabilities over recv, such as overlapped I/O and partial datagram notifications. The definition of WSARecv is

 int WSARecv( SOCKET s, LPWSABUF lpBuffers, DWORD dwBufferCount, LPDWORD lpNumberOfBytesRecvd, LPDWORD lpFlags, LPWSAOVERLAPPED lpOverlapped, LPWSAOVERLAPPED_COMPLETION_ROUTINE lpCompletionROUTINE );

Parameter s is the connected socket. The second and third parameters describe the buffers that receive the data: lpBuffers is an array of WSABUF structures, and dwBufferCount indicates the number of WSABUF structures in the array. The lpNumberOfBytesRecvd parameter points to the number of bytes received by this call if the receive operation completes immediately. The lpFlags parameter can be one of the values MSG_PEEK, MSG_OOB, or MSG_PARTIAL or a bitwise ORed combination of those values. The MSG_PARTIAL flag has several different meanings depending on where it is used or encountered. For message-oriented protocols, this flag is set upon return from WSARecv if the entire message could not be returned in this call because of insufficient buffer space. In this case, subsequent WSARecv calls set this flag until the entire message is returned, when the MSG_PARTIAL flag is cleared. If this flag is passed as an input parameter, the receive operation should complete as soon as data is available, even if it is only a portion of the entire message. The MSG_PARTIAL flag is used only with message-oriented protocols, not with streaming ones. Additionally, not all protocols support partial messages. The protocol entry for each protocol contains a flag indicating whether it supports this feature. See Chapter 5 for more information. The lpOverlapped and lpCompletionROUTINE parameters are used in overlapped I/O operations, discussed in Chapter 8.
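For a message-oriented protocol that supports partial messages, the usual pattern is to keep receiving into the same message buffer while MSG_PARTIAL is reported. The sketch below models that loop without Winsock: recv_fragment_fn is a hypothetical callback standing in for WSARecv that reports the partial condition through an int flag, and demo_fragment is a fake source that hands out a message at most 5 bytes at a time.

```c
#include <string.h>

/* Hypothetical stand-in for WSARecv on a message-oriented socket: copies
   up to cap bytes into buf, sets *partial nonzero while more of the same
   message remains (the MSG_PARTIAL case), returns the byte count or -1. */
typedef int (*recv_fragment_fn)(void *ctx, char *buf, int cap, int *partial);

/* Reassemble one whole message into msg. Returns its length, or -1 on a
   receive error, lack of progress, or a message larger than msg_cap. */
int recv_message(recv_fragment_fn fn, void *ctx, char *msg, int msg_cap)
{
    int total = 0;
    int partial = 0;

    do {
        int n = fn(ctx, msg + total, msg_cap - total, &partial);
        if (n < 0)
            return -1;
        total += n;
        if (partial && (n == 0 || total == msg_cap))
            return -1;               /* no progress or no room left */
    } while (partial);
    return total;
}

/* Demo source: delivers a fixed message at most `chunk` bytes per call. */
struct demo_src { const char *msg; int len, pos, chunk; };

int demo_fragment(void *ctx, char *buf, int cap, int *partial)
{
    struct demo_src *d = ctx;
    int n = d->len - d->pos;

    if (n > d->chunk) n = d->chunk;
    if (n > cap) n = cap;
    memcpy(buf, d->msg + d->pos, n);
    d->pos += n;
    *partial = d->pos < d->len;      /* more of this message remains */
    return n;
}
```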

WSARecvDisconnect

This function is the opposite of WSASendDisconnect and is defined as follows:

 int WSARecvDisconnect( SOCKET s, LPWSABUF lpInboundDisconnectData );

Like its sending counterpart, the function takes the connected socket handle and a WSABUF structure that receives the disconnect data. The data received can only be disconnect data sent by a WSASendDisconnect call on the other side; this function cannot be used to receive normal data. Additionally, once the data is received, this function disables reception from the remote party, which is equivalent to calling the shutdown function with SD_RECEIVE.

WSARecvEx

The WSARecvEx function is a Microsoft-specific extension of Winsock 1 and is identical to the recv function except that the flags parameter is passed by reference. This allows the underlying provider to set the MSG_PARTIAL flag. The function prototype is as follows:

 int PASCAL FAR WSARecvEx( SOCKET s, char FAR * buf, int len, int *flags );

The MSG_PARTIAL flag is returned in the flags parameter if the data received is not a complete message. This flag is of interest for message-oriented (nonstream) protocols. If the MSG_PARTIAL flag is passed as a part of the flags parameter and a partial message is received, the call returns immediately with that data. If the supplied receive buffer is not large enough to hold an entire message, WSARecvEx fails with the WSAEMSGSIZE error and the remaining data is truncated. Note that the difference between a MSG_PARTIAL flag and a WSAEMSGSIZE error is that with the error, the whole message arrives but the supplied data buffer is too small to receive it. The MSG_PEEK and MSG_OOB flags can also be used with WSARecvEx.

Stream Protocols

Because most connection-oriented protocols are also streaming protocols, we'll mention stream protocols here. The main thing to be aware of with any function that sends or receives data on a stream socket is that you are not guaranteed to read or write the amount of data you request. Let's say you have a character buffer with 2048 bytes of data you want to send with the send function. The code to send this is

 char sendbuff[2048];
 int  nBytes = 2048,
      ret;
 // Fill sendbuff with 2048 bytes of data
 // Assume s is a valid, connected stream socket
 ret = send(s, sendbuff, nBytes, 0);

It is possible for send to return having sent fewer than 2048 bytes; the ret variable holds the number of bytes actually sent. This can happen because the system allocates a limited amount of buffer space for each socket to send and receive data. When sending, these internal buffers hold data until it can be placed on the wire. Several common situations can fill them. For example, simply transmitting a huge amount of data will cause these buffers to fill quickly. Also, TCP/IP uses what is known as the window size: the receiving end adjusts this window size to indicate how much data it can receive. If the receiver is being flooded with data, it might set the window size to 0 in order to catch up with the pending data. This forces the sender to stop until it receives a new window size greater than 0. In the case of our send call, there might be buffer space for only 1024 bytes, in which case you would have to resubmit the remaining 1024 bytes. The following code ensures that all your bytes are sent.

 char sendbuff[2048];
 int  nBytes = 2048,
      nLeft,
      idx,
      ret;
 // Fill sendbuff with 2048 bytes of data
 // Assume s is a valid, connected stream socket
 nLeft = nBytes;
 idx = 0;
 while (nLeft > 0)
 {
     ret = send(s, &sendbuff[idx], nLeft, 0);
     if (ret == SOCKET_ERROR)
     {
         // Error: stop here rather than adjusting the counters by -1
         break;
     }
     nLeft -= ret;
     idx += ret;
 }

The foregoing holds true for receiving data on a stream socket but is less significant. Because stream sockets are a continuous stream of data, when an application reads it isn't generally concerned with how much data it should read. If your application requires discrete messages over a stream protocol, you might have to do a little work. If all the messages are the same size, life is pretty simple, and the code for reading, say, 512-byte messages would look like this:

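 char recvbuff[1024];
 int  ret,
      nLeft,
      idx;
 // Assume s is a valid, connected stream socket
 nLeft = 512;
 idx = 0;
 while (nLeft > 0)
 {
     ret = recv(s, &recvbuff[idx], nLeft, 0);
     if (ret == SOCKET_ERROR)
     {
         // Error
         break;
     }
     else if (ret == 0)
     {
         // Peer closed the connection before the full message arrived
         break;
     }
     idx += ret;
     nLeft -= ret;
 }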

Things get a little more complicated if your message sizes vary. You must impose your own protocol to let the receiver know how big the forthcoming message will be. For example, the first 4 bytes written to the receiver could always hold the size in bytes of the forthcoming message, stored as an integer in a byte order both sides agree on (network byte order is the usual choice). The receiver then starts every message by reading exactly those 4 bytes, converting them to an integer, and determining how many additional bytes that message comprises.
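A minimal sketch of the 4-byte length-prefix scheme described above follows. It is written against POSIX sockets so that it can be tried on any Unix-like system; under Winsock the send and recv calls are the same, apart from the SOCKET type and SOCKET_ERROR checks. The helper names send_all, recv_all, send_message, and recv_message are hypothetical, and the prefix is assumed to be an unsigned 32-bit length in network byte order.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>   /* htonl, ntohl */

/* Hypothetical helper: keep calling send until all len bytes are out. */
static int send_all(int s, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t ret = send(s, buf, len, 0);
        if (ret <= 0)
            return -1;               /* error; caller can inspect errno */
        buf += ret;
        len -= (size_t)ret;
    }
    return 0;
}

/* Hypothetical helper: keep calling recv until exactly len bytes arrive. */
static int recv_all(int s, char *buf, size_t len)
{
    while (len > 0) {
        ssize_t ret = recv(s, buf, len, 0);
        if (ret <= 0)
            return -1;               /* 0 = peer closed mid-message, <0 = error */
        buf += ret;
        len -= (size_t)ret;
    }
    return 0;
}

/* Send a 4-byte length prefix (network byte order), then the payload. */
static int send_message(int s, const char *msg, uint32_t len)
{
    uint32_t prefix = htonl(len);
    if (send_all(s, (const char *)&prefix, sizeof(prefix)) != 0)
        return -1;
    return send_all(s, msg, len);
}

/* Read one message into buf. Returns its length, or -1 on error or
   if the announced length does not fit in bufsize bytes. */
static long recv_message(int s, char *buf, size_t bufsize)
{
    uint32_t prefix;
    if (recv_all(s, (char *)&prefix, sizeof(prefix)) != 0)
        return -1;
    uint32_t len = ntohl(prefix);
    if (len > bufsize)
        return -1;
    if (recv_all(s, buf, len) != 0)
        return -1;
    return (long)len;
}
```

Because the receiver always reads the prefix with recv_all, it does not matter how the stream happens to split the bytes between individual recv calls.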

Scatter-Gather I/O
Scatter-gather support is a concept originally introduced in Berkeley Sockets with the functions readv and writev. This feature is available with the Winsock 2 functions WSARecv, WSARecvFrom, WSASend, and WSASendTo. It is most useful for applications that send and receive data that is formatted in a very specific way. For example, messages from a client to a server might always be composed of a fixed 32-byte header specifying some operation, followed by a 64-byte data block and terminated with a 16-byte trailer. In this example, WSASend can be called with an array of three WSABUF structures, one for each of the three message parts. On the receiving end, WSARecv is called with three WSABUF structures containing data buffers of 32 bytes, 64 bytes, and 16 bytes, respectively.

When using stream-based sockets, scatter-gather operations simply treat the supplied data buffers in the WSABUF structures as one contiguous buffer. Also, the receive call might return before all buffers are full. On message-based sockets, each call to a receive operation receives a single message up to the buffer size supplied. If the buffer space is insufficient, the call fails with WSAEMSGSIZE and the data is truncated to fit the available space. Of course, with protocols that support partial messages, the MSG_PARTIAL flag can be used to prevent data loss.
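The Berkeley Sockets counterparts of the WSABUF-based calls are readv and writev, which take an array of struct iovec instead of WSABUF. The sketch below uses those POSIX calls (so it runs on Unix-like systems, not under Winsock) to gather the hypothetical 32-byte header, 64-byte data block, and 16-byte trailer into one send and scatter them back into three separate buffers. It assumes a message-based AF_UNIX datagram socket pair, so one readv returns one whole message; on a stream socket the receive could return early, as noted above, and you would have to loop. The names send_parts and recv_parts are hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>     /* struct iovec, readv, writev */

#define HDR_LEN  32      /* fixed header   */
#define DATA_LEN 64      /* data block     */
#define TRL_LEN  16      /* fixed trailer  */

/* Gather: send header, data, and trailer from three separate buffers
   as a single message. */
static ssize_t send_parts(int s, const char *hdr, const char *data,
                          const char *trl)
{
    struct iovec iov[3];
    iov[0].iov_base = (void *)hdr;  iov[0].iov_len = HDR_LEN;
    iov[1].iov_base = (void *)data; iov[1].iov_len = DATA_LEN;
    iov[2].iov_base = (void *)trl;  iov[2].iov_len = TRL_LEN;
    return writev(s, iov, 3);
}

/* Scatter: one receive fills the three buffers in order. */
static ssize_t recv_parts(int s, char *hdr, char *data, char *trl)
{
    struct iovec iov[3];
    iov[0].iov_base = hdr;  iov[0].iov_len = HDR_LEN;
    iov[1].iov_base = data; iov[1].iov_len = DATA_LEN;
    iov[2].iov_base = trl;  iov[2].iov_len = TRL_LEN;
    return readv(s, iov, 3);
}
```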

Breaking the Connection

Once you are finished with a socket connection, you must close the connection and release any resources associated with that socket handle. To release the resources associated with an open socket handle, use the closesocket call. Be aware, however, that closesocket can have some adverse effects, depending on how it is called, that can lead to data loss. For this reason, a connection should be gracefully terminated with the shutdown function before a call to the closesocket function. These two API functions are discussed next.

shutdown

To ensure that all data an application sends is received by the peer, a well-written application should notify the receiver that no more data is to be sent. Likewise, the peer should do the same. This is known as a graceful close and is performed by the shutdown function, defined as

 int shutdown( SOCKET s, int how );

The how parameter can be SD_RECEIVE, SD_SEND, or SD_BOTH. For SD_RECEIVE, subsequent calls to any receive function on the socket are disallowed. This has no effect on the lower protocol layers. Additionally, for TCP sockets, if data is queued for receive or if data subsequently arrives, the connection is reset. On UDP sockets, however, incoming data is still accepted and queued. For SD_SEND, subsequent calls to any send function are disallowed. For TCP sockets, this causes a FIN packet to be generated after all data is sent and acknowledged by the receiver. Finally, specifying SD_BOTH disables both sends and receives.
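A minimal sketch of a graceful close: shut down the sending side, keep reading until the peer's own shutdown shows up as a zero-byte receive, and only then close the handle. It uses the POSIX names so it can run on Unix-like systems; under Winsock, SHUT_WR corresponds to SD_SEND and close corresponds to closesocket. The helper name graceful_close is hypothetical, and a real application would normally process the drained data rather than just count it.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>      /* close */

/* Hypothetical helper: returns the number of bytes the peer sent after
   our shutdown, or -1 if shutdown or recv failed. */
static long graceful_close(int s)
{
    char    buf[512];
    long    drained = 0;
    ssize_t ret;

    if (shutdown(s, SHUT_WR) != 0)   /* Winsock: shutdown(s, SD_SEND) */
        return -1;
    /* Read until recv returns 0 (peer's FIN) or fails. */
    while ((ret = recv(s, buf, sizeof(buf), 0)) > 0)
        drained += ret;
    close(s);                        /* Winsock: closesocket(s) */
    return ret == 0 ? drained : -1;
}
```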

closesocket

The closesocket function closes a socket and is defined as

 int closesocket (SOCKET s);

Calling closesocket releases the socket descriptor and any further calls using the socket fail with WSAENOTSOCK. If there are no other references to this socket, all resources associated with the descriptor are released. This includes discarding any queued data.

Pending asynchronous calls issued by any thread in this process are canceled without posting any notification messages. Pending overlapped operations are also canceled. Any event, completion routine, or completion port notification associated with an overlapped operation is still carried out, but the operation fails with the error WSA_OPERATION_ABORTED. Asynchronous and nonblocking I/O models are discussed in greater depth in Chapter 8. Additionally, one other factor influences the behavior of closesocket: whether the socket option SO_LINGER has been set. Consult the description of the SO_LINGER option in Chapter 9 for a complete explanation.

Putting It All Together

You might be a bit overwhelmed by the multitude of functions for sending and receiving data, but in reality most applications need only recv or WSARecv for receiving data and send or WSASend for sending. The other functions are specialized, with unique features that are not commonly used (or not supported by the transport protocols). With this said, we'll discuss a simple client/server example using the principles and functions we've covered so far. Figure 7-3 contains the code for a simple echo server. This application creates a socket, binds to a local IP interface and port, and listens for client connections. When a client connection request arrives, accept returns a new socket, which is passed to a newly spawned client thread. The thread simply reads data and sends it back to the client.

Figure 7-3. Echo server code

 // Module Name: Server.c
 //
 // Description:
 //    This example illustrates a simple TCP server that accepts
 //    incoming client connections. Once a client connection is
 //    established, a thread is spawned to read data from the
 //    client and echo it back (if the echo option is not
 //    disabled).
 //
 // Compile:
 //    cl -o Server Server.c ws2_32.lib
 //
 // Command line options:
 //    server [-p:x] [-i:IP] [-o]
 //           -p:x      Port number to listen on
 //           -i:str    Interface to listen on
 //           -o        Receive only; don't echo the data back
 //
 #include <winsock2.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <ctype.h>
 #define DEFAULT_PORT        5150
 #define DEFAULT_BUFFER      4096
 int    iPort      = DEFAULT_PORT; // Port to listen for clients on
 BOOL   bInterface = FALSE,        // Listen on the specified interface
        bRecvOnly  = FALSE;        // Receive data only; don't echo back
 char   szAddress[128];            // Interface to listen for clients on
 //
 // Function: usage
 //
 // Description:
 //    Print usage information and exit
 //
 void usage()
 {
     printf("usage: server [-p:x] [-i:IP] [-o]\n\n");
     printf("       -p:x      Port number to listen on\n");
     printf("       -i:str    Interface to listen on\n");
     printf("       -o        Don't echo the data back\n\n");
     ExitProcess(1);
 }
 //
 // Function: ValidateArgs
 //
 // Description:
 //    Parse the command line arguments, and set some global flags
 //    to indicate what actions to perform
 //
 void ValidateArgs(int argc, char **argv)
 {
     int i;
     for(i = 1; i < argc; i++)
     {
         if ((argv[i][0] == '-') || (argv[i][0] == '/'))
         {
             switch (tolower(argv[i][1]))
             {
                 case 'p':
                     if (strlen(argv[i]) > 3)
                         iPort = atoi(&argv[i][3]);
                     break;
                 case 'i':
                     bInterface = TRUE;
                     if (strlen(argv[i]) > 3)
                     {
                         // Copy at most sizeof(szAddress) - 1 characters
                         strncpy(szAddress, &argv[i][3], sizeof(szAddress) - 1);
                         szAddress[sizeof(szAddress) - 1] = '\0';
                     }
                     break;
                 case 'o':
                     bRecvOnly = TRUE;
                     break;
                 default:
                     usage();
                     break;
             }
         }
     }
 }
 //
 // Function: ClientThread
 //
 // Description:
 //    This function is called as a thread, and it handles a given
 //    client connection. The parameter passed in is the socket
 //    handle returned from an accept() call. This function reads
 //    data from the client and writes it back.
 //
 DWORD WINAPI ClientThread(LPVOID lpParam)
 {
     SOCKET  sock = (SOCKET)lpParam;
     char    szBuff[DEFAULT_BUFFER];
     int     ret,
             nLeft,
             idx;
     while(1)
     {
         // Perform a blocking recv() call, leaving room for the
         // terminating null character added below
         //
         ret = recv(sock, szBuff, DEFAULT_BUFFER - 1, 0);
         if (ret == 0)        // Graceful close
             break;
         else if (ret == SOCKET_ERROR)
         {
             printf("recv() failed: %d\n", WSAGetLastError());
             break;
         }
         szBuff[ret] = '\0';
         printf("RECV: '%s'\n", szBuff);
         //
         // If we selected to echo the data back, do it
         //
         if (!bRecvOnly)
         {
             nLeft = ret;
             idx = 0;
             //
             // Make sure we write all the data
             //
             while(nLeft > 0)
             {
                 ret = send(sock, &szBuff[idx], nLeft, 0);
                 if (ret == 0)
                     break;
                 else if (ret == SOCKET_ERROR)
                 {
                     printf("send() failed: %d\n", WSAGetLastError());
                     break;
                 }
                 nLeft -= ret;
                 idx += ret;
             }
             if (nLeft > 0)   // Send failed; stop servicing this client
                 break;
         }
     }
     closesocket(sock);       // Release the client socket before exiting
     return 0;
 }
 //
 // Function: main
 //
 // Description:
 //    Main thread of execution. Initialize Winsock, parse the
 //    command line arguments, create the listening socket, bind
 //    to the local address, and wait for client connections.
 //
 int main(int argc, char **argv)
 {
     WSADATA       wsd;
     SOCKET        sListen,
                   sClient;
     int           iAddrSize;
     HANDLE        hThread;
     DWORD         dwThreadId;
     struct sockaddr_in local,
                        client;
     ValidateArgs(argc, argv);
     if (WSAStartup(MAKEWORD(2,2), &wsd) != 0)
     {
         printf("Failed to load Winsock!\n");
         return 1;
     }
     // Create our listening socket
     //
     sListen = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
     if (sListen == INVALID_SOCKET)
     {
         printf("socket() failed: %d\n", WSAGetLastError());
         return 1;
     }
     // Select the local interface, and bind to it
     //
     memset(&local, 0, sizeof(local));
     if (bInterface)
     {
         local.sin_addr.s_addr = inet_addr(szAddress);
         if (local.sin_addr.s_addr == INADDR_NONE)
             usage();
     }
     else
         local.sin_addr.s_addr = htonl(INADDR_ANY);
     local.sin_family = AF_INET;
     local.sin_port = htons((u_short)iPort);
     if (bind(sListen, (struct sockaddr *)&local,
             sizeof(local)) == SOCKET_ERROR)
     {
         printf("bind() failed: %d\n", WSAGetLastError());
         return 1;
     }
     if (listen(sListen, 8) == SOCKET_ERROR)
     {
         printf("listen() failed: %d\n", WSAGetLastError());
         return 1;
     }
     //
     // In a continuous loop, wait for incoming clients. Once one
     // is detected, create a thread and pass the handle off to it.
     //
     while (1)
     {
         iAddrSize = sizeof(client);
         sClient = accept(sListen, (struct sockaddr *)&client,
                         &iAddrSize);
         if (sClient == INVALID_SOCKET)
         {
             printf("accept() failed: %d\n", WSAGetLastError());
             break;
         }
         printf("Accepted client: %s:%d\n",
             inet_ntoa(client.sin_addr), ntohs(client.sin_port));
         hThread = CreateThread(NULL, 0, ClientThread,
                     (LPVOID)sClient, 0, &dwThreadId);
         if (hThread == NULL)
         {
             printf("CreateThread() failed: %d\n", GetLastError());
             closesocket(sClient);
             break;
         }
         CloseHandle(hThread);
     }
     closesocket(sListen);
     WSACleanup();
     return 0;
 }

The client for this example, provided in Figure 7-4, is even more basic. The client creates a socket, resolves the server name passed into the application, and connects to the server. Once the connection is made, a number of messages are sent. After each send, the client waits for an echo response from the server. The client prints all data read from the socket.

The echo client and server don't fully illustrate the streaming nature of TCP. This is because, at least in the client's case, a read operation follows every write operation; for the server it is the other way around. Thus, each call to the read function by the server will almost always return the full message that the client sent. Don't be misled by this. If the client's messages grow larger than TCP's maximum segment size, each message will be split into separate segments on the wire, and the receiver may need to call the receive function several times to collect it. To better illustrate streaming, run the client and the server with the -o option. This causes the client to only send data and the server to only read data. Execute the server like this:

 server -p:5150 -o

and the client like this:

 client -p:5150 -s:IP -n:10 -o

What you'll most likely see is that the client calls send 10 times, but the server reads all 10 messages in one or two recv calls.
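If you don't have two machines handy, the same effect can be seen locally. The following is a minimal sketch, not one of the chapter's samples, that uses POSIX calls on an AF_UNIX stream socket pair rather than Winsock and TCP; the function name send_n_then_recv_once is hypothetical. It sends several copies of a message before the other end reads anything, then issues a single recv with a large buffer.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical demo: send count copies of msg on tx, then return what a
   single recv on the connected peer rx yields (or -1 if a send fails). */
static ssize_t send_n_then_recv_once(int tx, int rx, const char *msg,
                                     int count, char *buf, size_t bufsize)
{
    size_t len = strlen(msg);
    for (int i = 0; i < count; i++)
        if (send(tx, msg, len, 0) != (ssize_t)len)
            return -1;
    return recv(rx, buf, bufsize, 0);
}
```

On Linux the single recv typically returns more than one message's worth of data (usually all of it), because a stream socket keeps no message boundaries. How the bytes are divided between receive calls is up to the protocol stack and should never be relied on.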

Figure 7-4. Echo client code

 // Module Name: Client.c
 //
 // Description:
 //    This sample is the echo client. It connects to the TCP server,
 //    sends data, and reads data back from the server.
 //
 // Compile:
 //    cl -o Client Client.c ws2_32.lib
 //
 // Command Line Options:
 //    client [-p:x] [-s:IP] [-n:x] [-o]
 //           -p:x      Remote port to send to
 //           -s:IP     Server's IP address or host name
 //           -n:x      Number of times to send message
 //           -o        Send messages only; don't receive
 //
 #include <winsock2.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <ctype.h>
 #define DEFAULT_COUNT       20
 #define DEFAULT_PORT        5150
 #define DEFAULT_BUFFER      2048
 #define DEFAULT_MESSAGE     "This is a test of the emergency broadcasting system"
 char  szServer[128],             // Server to connect to
       szMessage[1024];           // Message to send to server
 int   iPort     = DEFAULT_PORT;  // Port on server to connect to
 DWORD dwCount   = DEFAULT_COUNT; // Number of times to send message
 BOOL  bSendOnly = FALSE;         // Send data only; don't receive
 //
 // Function: usage
 //
 // Description:
 //    Print usage information and exit
 //
 void usage()
 {
     printf("usage: client [-p:x] [-s:IP] [-n:x] [-o]\n\n");
     printf("       -p:x      Remote port to send to\n");
     printf("       -s:IP     Server's IP address or host name\n");
     printf("       -n:x      Number of times to send message\n");
     printf("       -o        Send messages only; don't receive\n");
     ExitProcess(1);
 }
 //
 // Function: ValidateArgs
 //
 // Description:
 //    Parse the command line arguments, and set some global flags
 //    to indicate what actions to perform
 //
 void ValidateArgs(int argc, char **argv)
 {
     int i;
     for(i = 1; i < argc; i++)
     {
         if ((argv[i][0] == '-') || (argv[i][0] == '/'))
         {
             switch (tolower(argv[i][1]))
             {
                 case 'p':        // Remote port
                     if (strlen(argv[i]) > 3)
                         iPort = atoi(&argv[i][3]);
                     break;
                 case 's':        // Server
                     if (strlen(argv[i]) > 3)
                     {
                         // Copy at most sizeof(szServer) - 1 characters
                         strncpy(szServer, &argv[i][3], sizeof(szServer) - 1);
                         szServer[sizeof(szServer) - 1] = '\0';
                     }
                     break;
                 case 'n':        // Number of times to send message
                     if (strlen(argv[i]) > 3)
                         dwCount = atol(&argv[i][3]);
                     break;
                 case 'o':        // Only send message; don't receive
                     bSendOnly = TRUE;
                     break;
                 default:
                     usage();
                     break;
             }
         }
     }
 }
 //
 // Function: main
 //
 // Description:
 //    Main thread of execution. Initialize Winsock, parse the
 //    command line arguments, create a socket, connect to the
 //    server, and then send and receive data
 //
 int main(int argc, char **argv)
 {
     WSADATA       wsd;
     SOCKET        sClient;
     char          szBuffer[DEFAULT_BUFFER];
     int           ret;
     DWORD         i;
     struct sockaddr_in server;
     struct hostent    *host = NULL;
     // Parse the command line, and load Winsock
     //
     ValidateArgs(argc, argv);
     if (WSAStartup(MAKEWORD(2,2), &wsd) != 0)
     {
         printf("Failed to load Winsock library!\n");
         return 1;
     }
     strcpy(szMessage, DEFAULT_MESSAGE);
     //
     // Create the socket, and attempt to connect to the server
     //
     sClient = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
     if (sClient == INVALID_SOCKET)
     {
         printf("socket() failed: %d\n", WSAGetLastError());
         return 1;
     }
     memset(&server, 0, sizeof(server));
     server.sin_family = AF_INET;
     server.sin_port = htons((u_short)iPort);
     server.sin_addr.s_addr = inet_addr(szServer);
     //
     // If the supplied server address wasn't in the form
     // "aaa.bbb.ccc.ddd," it's a host name, so try to resolve it
     //
     if (server.sin_addr.s_addr == INADDR_NONE)
     {
         host = gethostbyname(szServer);
         if (host == NULL)
         {
             printf("Unable to resolve server: %s\n", szServer);
             return 1;
         }
         CopyMemory(&server.sin_addr, host->h_addr_list[0],
             host->h_length);
     }
     if (connect(sClient, (struct sockaddr *)&server,
         sizeof(server)) == SOCKET_ERROR)
     {
         printf("connect() failed: %d\n", WSAGetLastError());
         return 1;
     }
     // Send and receive data
     //
     for(i = 0; i < dwCount; i++)
     {
         ret = send(sClient, szMessage, (int)strlen(szMessage), 0);
         if (ret == 0)
             break;
         else if (ret == SOCKET_ERROR)
         {
             printf("send() failed: %d\n", WSAGetLastError());
             break;
         }
         printf("Sent %d bytes\n", ret);
         if (!bSendOnly)
         {
             // Leave room for the terminating null character added below
             ret = recv(sClient, szBuffer, DEFAULT_BUFFER - 1, 0);
             if (ret == 0)        // Graceful close
                 break;
             else if (ret == SOCKET_ERROR)
             {
                 printf("recv() failed: %d\n", WSAGetLastError());
                 break;
             }
             szBuffer[ret] = '\0';
             printf("RECV [%d bytes]: '%s'\n", ret, szBuffer);
         }
     }
     closesocket(sClient);
     WSACleanup();
     return 0;
 }