Socket IO Models | Linux Server Hacks, Volume Two: Tips & Tools for Connecting, Monitoring, and Troubleshooting

Essentially five types of socket I/O models are available that allow Winsock applications to manage I/O: select, WSAAsyncSelect, WSAEventSelect, overlapped, and completion port. This section explains the features of each I/O model and outlines how to use the model to develop an application that can manage one or more socket requests. On the companion CD, you will find one or more sample applications for each I/O model demonstrating how to develop a simple TCP echo server using the principles described in each model.

The select Model

The select model is the most widely available I/O model in Winsock. We call it the select model because it centers on using the select function to manage I/O. The design of this model originated on Unix-based computers featuring Berkeley socket implementations. The select model was incorporated into Winsock 1.1 to allow applications that want to avoid blocking on socket calls the ability to manage multiple sockets in an organized manner. Because Winsock 1.1 is backward-compatible with Berkeley socket implementations, a Berkeley socket application that uses the select function should technically be able to run without modification.

The select function can be used to determine whether there is data on a socket and whether a socket can be written to. The whole reason for having this function is to prevent your application from blocking on an I/O bound call such as send or recv when a socket is in a blocking mode and to prevent the WSAEWOULDBLOCK error when a socket is in nonblocking mode. The select function blocks for I/O operations until the conditions specified as parameters are met. The function prototype for select is as follows:

 int select(
     int nfds,
     fd_set FAR * readfds,
     fd_set FAR * writefds,
     fd_set FAR * exceptfds,
     const struct timeval FAR * timeout
 );

The first parameter, nfds, is ignored and is included only for compatibility with Berkeley socket applications. You'll notice that there are three fd_set parameters: one for checking readability (readfds), one for writability (writefds), and one for out-of-band data (exceptfds). Essentially, the fd_set data type represents a collection of sockets. The readfds set identifies sockets that meet one of the following conditions:

Data is available for reading.

Connection has been closed, reset, or terminated.

If listen has been called and a connection is pending, the accept function will succeed.

The writefds set identifies sockets in which one of the following is true:

Data can be sent.

If a nonblocking connect call is being processed, the connection has succeeded.

Finally, the exceptfds set identifies sockets in which one of the following is true:

If a nonblocking connect call is being processed, the connection attempt failed.

Out-of-band (OOB) data is available for reading.

For example, when you want to test a socket for readability, you must add your socket to the readfds set and wait for the select function to complete. When the select call completes, you have to determine whether your socket is still part of the readfds set. If so, the socket is readable—you can begin to retrieve data from the socket. Any two of the three parameters (readfds, writefds, exceptfds) can be null values (at least one must not be null), and any non-null set must contain at least one socket handle; otherwise, the select function won't have anything to wait for. The final parameter, timeout, is a pointer to a timeval structure that determines how long the select function will wait for I/O to complete. If timeout is a null pointer, select will block indefinitely until at least one descriptor meets the specified criteria. The timeval structure is defined as

 struct timeval
 {
     long tv_sec;
     long tv_usec;
 };

The tv_sec field indicates how long to wait in seconds; the tv_usec field indicates how long to wait in microseconds. The timeout value {0, 0} indicates select will return immediately, allowing an application to poll on the select operation. This should be avoided for performance reasons. When select completes successfully, it returns the total number of socket handles that have I/O operations pending in the fd_set structures. If the timeval limit expires, it returns 0. If select fails for any reason, it returns SOCKET_ERROR.
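Because the two fields use different units (seconds and microseconds), it is easy to fill them in incorrectly when a timeout is naturally expressed in milliseconds. The following is a minimal sketch of one way to do the conversion; the helper name timeout_from_ms is our own and is not part of Winsock.

```c
#include <assert.h>

#ifdef _WIN32
#include <winsock2.h>   /* struct timeval on Windows */
#else
#include <sys/time.h>   /* struct timeval on Berkeley-style systems */
#endif

/* Hypothetical helper (not a Winsock function): convert a timeout in
   milliseconds into the seconds + microseconds pair select() expects. */
static struct timeval timeout_from_ms(long ms)
{
    struct timeval tv;

    assert(ms >= 0);                  /* a negative timeout is a caller bug */
    tv.tv_sec  = ms / 1000;           /* whole seconds                      */
    tv.tv_usec = (ms % 1000) * 1000;  /* remainder, in microseconds         */
    return tv;
}
```

A caller would pass the address of the returned structure as the last argument of select; a value of 1500 milliseconds becomes one second plus 500,000 microseconds.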

Before you can begin to use select to monitor sockets, your application has to set up either one or all of the read, write, and exception fd_set structures by assigning socket handles to a set. When you assign a socket to one of the sets, you are asking select to let you know whether the I/O activities described above have occurred on a socket. Winsock provides the following set of macros to manipulate and check the fd_set sets for I/O activity.

FD_CLR(s, *set) Removes socket s from set

FD_ISSET(s, *set) Checks to see whether s is a member of set and returns TRUE if so

FD_SET(s, *set) Adds socket s to set

FD_ZERO(*set) Initializes set to the empty set

For example, if you want to find out when it is safe to read data from a socket without blocking, simply assign your socket to the readfds set using the FD_SET macro and then call select. To test whether your socket is still part of the readfds set, use the FD_ISSET macro. The following steps describe the basic flow of an application that uses select with one or more socket handles:

1. Initialize each fd_set of interest, using the FD_ZERO macro.

2. Assign socket handles to each of the fd_set sets of interest, using the FD_SET macro.

3. Call the select function, and wait until I/O activity sets one or more of the socket handles in each fd_set set provided. When select completes, it returns the total number of socket handles that are set in all of the fd_set sets and updates each set accordingly.

4. Using the return value of select, your application can determine which application sockets have I/O pending by checking each fd_set set using the FD_ISSET macro.

5. After determining which sockets have I/O pending in each of the sets, process the I/O and go to step 1 to continue the select process.

When select returns, it modifies each of the fd_set structures by removing the socket handles that do not have pending I/O operations. This is why you should use the FD_ISSET macro as in step 4 above to determine whether a particular socket is part of a set. Figure 8-4 outlines the basic steps needed to set up the select model for a single socket. Adding more sockets to this application simply involves maintaining a list or an array of additional sockets.

Figure 8-4. Managing I/O on a socket using select

 SOCKET  s;
 fd_set  fdread;
 int     ret;

 // Create a socket, and accept a connection

 // Manage I/O on the socket
 while(TRUE)
 {
     // Always clear the read set before calling
     // select()
     FD_ZERO(&fdread);

     // Add socket s to the read set
     FD_SET(s, &fdread);

     if ((ret = select(0, &fdread, NULL, NULL, NULL))
         == SOCKET_ERROR)
     {
         // Error condition
     }

     if (ret > 0)
     {
         // For this simple case, select() should return
         // the value 1. An application dealing with
         // more than one socket could get a value
         // greater than 1. At this point, your
         // application should check to see whether the
         // socket is part of a set.
         if (FD_ISSET(s, &fdread))
         {
             // A read event has occurred on socket s
         }
     }
 }
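Extending the single-socket case to an array of sockets mostly means rebuilding the read set on every pass and then testing each handle after select returns. The sketch below shows only that bookkeeping; the helper names build_read_set and count_members are our own, and the typedef of SOCKET on non-Windows systems is an assumption made so the fragment also compiles against Berkeley sockets.

```c
#include <assert.h>

#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/select.h>
typedef int SOCKET;          /* assumption: Berkeley descriptors are ints */
#endif

/* Hypothetical helper: empty the set, then add every socket in the array.
   Returns the highest handle value plus one, which Berkeley select() wants
   as nfds. Winsock ignores that argument. */
static int build_read_set(const SOCKET *socks, int count, fd_set *set)
{
    int i;
    int nfds = 0;

    assert(count >= 0 && count <= FD_SETSIZE);  /* a set holds at most FD_SETSIZE */
    FD_ZERO(set);
    for (i = 0; i < count; i++) {
        FD_SET(socks[i], set);
        if ((int)socks[i] + 1 > nfds)
            nfds = (int)socks[i] + 1;
    }
    return nfds;
}

/* Hypothetical helper: after select() returns, count how many sockets from
   the array are still members of the set, that is, have I/O pending. */
static int count_members(const SOCKET *socks, int count, fd_set *set)
{
    int i;
    int members = 0;

    for (i = 0; i < count; i++)
        if (FD_ISSET(socks[i], set))
            members++;
    return members;
}
```

In a real loop the application would call build_read_set, then select, and then walk the array with FD_ISSET to service each socket that is still in the set.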

The WSAAsyncSelect Model

Winsock provides a useful asynchronous I/O model that allows an application to receive Windows message-based notification of network events on a socket. This is accomplished by calling the WSAAsyncSelect function after creating a socket. This model originally existed in Winsock 1.1 implementations to help application programmers cope with the cooperative multitasking message-based environment of 16-bit Windows platforms, such as Windows for Workgroups. Applications can still benefit from this model, especially if they manage window messages in a standard Windows procedure, normally referred to as a winproc. This model is also used by the Microsoft Foundation Class (MFC) CSocket object.

Message notification

To use the WSAAsyncSelect model, your application must first create a window using the CreateWindow function and supply a window procedure (winproc) support function for this window. You can also use a dialog box with a dialog procedure instead of a window because dialog boxes are windows. For our purposes, we will demonstrate this model using a simple window with a supporting window procedure. Once you have set up the window infrastructure, you can begin creating sockets and turning on window message notification by calling the WSAAsyncSelect function, which is defined as

 int WSAAsyncSelect(
     SOCKET s,
     HWND hWnd,
     unsigned int wMsg,
     long lEvent
 );

The s parameter represents the socket we are interested in. The hWnd parameter is a window handle identifying the window or the dialog box that receives a message when a network event occurs. The wMsg parameter identifies the message to be received when a network event occurs. This message is posted to the window that is identified by the hWnd window handle. Normally applications set this message to a value greater than the Windows WM_USER value to avoid confusing a network window message with a predefined standard window message. The last parameter, lEvent, represents a bitmask that specifies a combination of network events—listed in Table 8-3—that the application is interested in. Most applications are typically interested in the FD_READ, FD_WRITE, FD_ACCEPT, FD_CONNECT, and FD_CLOSE network event types. Of course, the use of the FD_ACCEPT or the FD_CONNECT type depends on whether your application is a client or a server. If your application is interested in more than one network event, simply set this field by performing a bitwise OR on the types and assigning them to lEvent. For example:

 WSAAsyncSelect(s, hwnd, WM_SOCKET,
     FD_CONNECT | FD_READ | FD_WRITE | FD_CLOSE);

This allows our application to get connect, send, receive, and socket-closure network event notifications on socket s. It is impossible to register multiple events one at a time on the socket: each call to WSAAsyncSelect replaces any earlier registration for that socket, so every event type of interest must be combined in a single call. Also note that once you turn on event notification on a socket, it remains on unless the socket is closed by a call to closesocket or the application changes the registered network event types by calling WSAAsyncSelect again on the socket. Setting the lEvent parameter to 0 effectively stops all network event notification on the socket.

When your application calls WSAAsyncSelect on a socket, the socket mode is automatically changed from blocking to the nonblocking mode that we described earlier. As a result, if a Winsock I/O call such as WSARecv is called and has to wait for data, it will fail with error WSAEWOULDBLOCK. To avoid this error, applications should rely on the user-defined window message specified in the wMsg parameter of WSAAsyncSelect to indicate when network event types occur on the socket.

Table 8-3. Network event types for the WSAAsyncSelect function

Event Type	Meaning
FD_READ	The application wants to receive notification of readiness for reading.
FD_WRITE	The application wants to receive notification of readiness for writing.
FD_OOB	The application wants to receive notification of the arrival of out-of-band (OOB) data.
FD_ACCEPT	The application wants to receive notification of incoming connections.
FD_CONNECT	The application wants to receive notification of a completed connection or a multipoint join operation.
FD_CLOSE	The application wants to receive notification of socket closure.
FD_QOS	The application wants to receive notification of socket Quality of Service (QOS) changes.
FD_GROUP_QOS	The application wants to receive notification of socket group Quality of Service (QOS) changes (reserved for future use with socket groups).
FD_ROUTING_INTERFACE_CHANGE	The application wants to receive notification of routing interface changes for the specified destination(s).
FD_ADDRESS_LIST_CHANGE	The application wants to receive notification of local address list changes for the socket's protocol family.

After your application successfully calls WSAAsyncSelect on a socket, the application begins to receive network event notification as Windows messages in the window procedure associated with the hWnd parameter window handle. A window procedure is normally defined as

 LRESULT CALLBACK WindowProc(
     HWND hWnd,
     UINT uMsg,
     WPARAM wParam,
     LPARAM lParam
 );

The hWnd parameter is a handle to the window that invoked the window procedure. The uMsg parameter indicates which message needs to be processed. In our case, we will be looking for the message defined in the WSAAsyncSelect call. The wParam parameter identifies the socket on which a network event has occurred. This is important if you have more than one socket assigned to this window procedure. The lParam parameter contains two important pieces of information—the low word of lParam specifies the network event that has occurred, and the high word of lParam contains any error code.

When network event messages arrive at a window procedure, the application should first check the lParam high-word bits to determine whether a network error has occurred on the socket. There is a special macro, WSAGETSELECTERROR, which returns the value of the high-word bits error information. After the application has verified that no error occurred on the socket, the application should determine which network event type caused the Windows message to fire by reading the low-word bits of lParam. Another special macro, WSAGETSELECTEVENT, returns the value of the low-word portion of lParam.
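The check-error-first, then-check-event order described above can be sketched without any Windows headers. In the fragment below, select_event and select_error are stand-ins for WSAGETSELECTEVENT and WSAGETSELECTERROR (which simply extract the low and high words), the SK_ constants mirror the values winsock2.h assigns to the corresponding FD_ names, and make_select_lparam and decide are hypothetical helpers of our own.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirrored from winsock2.h so the sketch builds on any platform. */
#define SK_FD_READ    0x01u
#define SK_FD_WRITE   0x02u
#define SK_FD_ACCEPT  0x08u
#define SK_FD_CLOSE   0x20u

/* Stand-ins for WSAGETSELECTEVENT (low word) and WSAGETSELECTERROR (high word). */
static unsigned select_event(uint32_t lparam) { return lparam & 0xFFFFu; }
static unsigned select_error(uint32_t lparam) { return (lparam >> 16) & 0xFFFFu; }

/* Hypothetical helper: pack an event and error code the same way, which is
   handy for exercising a handler without a live socket. */
static uint32_t make_select_lparam(unsigned event, unsigned error)
{
    assert(event <= 0xFFFFu && error <= 0xFFFFu);
    return ((uint32_t)error << 16) | event;
}

enum action { ACT_CLOSE_ON_ERROR, ACT_ACCEPT, ACT_READ, ACT_WRITE,
              ACT_CLOSE, ACT_IGNORE };

/* Decide what a window procedure should do with one socket message:
   the error word is examined before the event word. */
static enum action decide(uint32_t lparam)
{
    if (select_error(lparam) != 0)
        return ACT_CLOSE_ON_ERROR;

    switch (select_event(lparam)) {
    case SK_FD_ACCEPT: return ACT_ACCEPT;
    case SK_FD_READ:   return ACT_READ;
    case SK_FD_WRITE:  return ACT_WRITE;
    case SK_FD_CLOSE:  return ACT_CLOSE;
    default:           return ACT_IGNORE;
    }
}
```

In a real window procedure the lParam value arrives with the user-defined message, and wParam carries the socket the action applies to.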

Figure 8-5 demonstrates how to manage window messages when using the WSAAsyncSelect I/O model. The figure highlights the steps needed to develop a basic server application and removes the programming details of developing a fully featured Windows application.

Figure 8-5. WSAAsyncSelect server sample code

 #include <winsock2.h>
 #include <windows.h>

 #define WM_SOCKET (WM_USER + 1)

 int WINAPI WinMain(HINSTANCE hInstance,
     HINSTANCE hPrevInstance, LPSTR lpCmdLine,
     int nCmdShow)
 {
     SOCKET Listen;
     SOCKADDR_IN InternetAddr;
     HWND Window;

     // Create a window and assign the ServerWinProc
     // below to it
     Window = CreateWindow(...);

     // Start Winsock and create a socket
     WSAStartup(...);
     Listen = socket(AF_INET, SOCK_STREAM, 0);

     // Bind the socket to port 5150
     // and begin listening for connections
     InternetAddr.sin_family = AF_INET;
     InternetAddr.sin_addr.s_addr = htonl(INADDR_ANY);
     InternetAddr.sin_port = htons(5150);

     bind(Listen, (PSOCKADDR) &InternetAddr,
         sizeof(InternetAddr));

     // Set up window message notification on
     // the new socket using the WM_SOCKET define
     // above
     WSAAsyncSelect(Listen, Window, WM_SOCKET,
         FD_ACCEPT | FD_CLOSE);

     listen(Listen, 5);

     // Translate and dispatch window messages
     // until the application terminates
 }

 BOOL CALLBACK ServerWinProc(HWND hDlg, UINT wMsg,
     WPARAM wParam, LPARAM lParam)
 {
     SOCKET Accept;

     switch(wMsg)
     {
         case WM_PAINT:
             // Process window paint messages
             break;

         case WM_SOCKET:
             // Determine whether an error occurred on the
             // socket by using the WSAGETSELECTERROR() macro
             if (WSAGETSELECTERROR(lParam))
             {
                 // Display the error and close the socket
                 closesocket((SOCKET) wParam);
                 break;
             }

             // Determine what event occurred on the
             // socket
             switch(WSAGETSELECTEVENT(lParam))
             {
                 case FD_ACCEPT:
                     // Accept an incoming connection
                     Accept = accept((SOCKET) wParam, NULL, NULL);

                     // Prepare accepted socket for read,
                     // write, and close notification
                     WSAAsyncSelect(Accept, hDlg, WM_SOCKET,
                         FD_READ | FD_WRITE | FD_CLOSE);
                     break;

                 case FD_READ:
                     // Receive data from the socket in
                     // wParam
                     break;

                 case FD_WRITE:
                     // The socket in wParam is ready
                     // for sending data
                     break;

                 case FD_CLOSE:
                     // The connection is now closed
                     closesocket((SOCKET) wParam);
                     break;
             }
             break;
     }
     return TRUE;
 }

One final detail worth noting is how applications should process FD_WRITE event notifications. FD_WRITE notifications are sent under only three conditions:

After a socket is first connected with connect or WSAConnect

After a socket is accepted with accept or WSAAccept

When a send, WSASend, sendto, or WSASendTo operation fails with WSAEWOULDBLOCK and buffer space becomes available

Therefore, an application should assume that sends are always possible on a socket starting from the first FD_WRITE message and lasting until a send, WSASend, sendto, or WSASendTo returns the socket error WSAEWOULDBLOCK. After such failure, another FD_WRITE message notifies the application that sends are once again possible.
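One way to apply this rule is to keep a per-connection "writable" flag that is set by FD_WRITE and cleared when a send reports WSAEWOULDBLOCK. The following is only a sketch of that bookkeeping under stated assumptions: struct conn, flush_pending, and on_fd_write are names of our own, SK_WSAEWOULDBLOCK mirrors the numeric value of WSAEWOULDBLOCK, and fake_send is a stand-in for send so that the logic can be followed without a network.

```c
#include <assert.h>
#include <stddef.h>

#define SK_WSAEWOULDBLOCK 10035   /* numeric value of WSAEWOULDBLOCK */

/* Hypothetical per-connection state. */
struct conn {
    const char *pending;    /* bytes still waiting to be sent               */
    size_t      remaining;  /* how many of them are left                    */
    int         writable;   /* nonzero from FD_WRITE until WSAEWOULDBLOCK   */
};

/* Shape of a send-like call: returns bytes sent, or -1 and sets *err. */
typedef int (*send_fn)(const char *buf, size_t len, int *err);

/* Send as much queued data as possible. On WSAEWOULDBLOCK, clear the
   flag and stop; the next FD_WRITE message will resume the transfer. */
static void flush_pending(struct conn *c, send_fn do_send)
{
    assert(c != NULL && do_send != NULL);
    while (c->writable && c->remaining > 0) {
        int err = 0;
        int n = do_send(c->pending, c->remaining, &err);

        if (n <= 0) {
            if (n < 0 && err == SK_WSAEWOULDBLOCK)
                c->writable = 0;
            return;          /* other errors are left to the caller */
        }
        c->pending   += n;
        c->remaining -= (size_t)n;
    }
}

/* Call this from the FD_WRITE branch of the window procedure. */
static void on_fd_write(struct conn *c, send_fn do_send)
{
    c->writable = 1;
    flush_pending(c, do_send);
}

/* Stand-in transport: accepts fake_budget more bytes, then would block. */
static size_t fake_budget;

static int fake_send(const char *buf, size_t len, int *err)
{
    size_t n = len < fake_budget ? len : fake_budget;

    (void)buf;
    if (n == 0) {
        *err = SK_WSAEWOULDBLOCK;
        return -1;
    }
    fake_budget -= n;
    return (int)n;
}
```

With this arrangement the application never waits for an FD_WRITE message that will not arrive: it sends freely while the flag is set and queues data otherwise.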

The WSAEventSelect Model

Winsock provides another useful asynchronous I/O model that allows an application to receive event-based notification of network events on one or more sockets. This model is similar to the WSAAsyncSelect model in that your application receives and processes the same network events listed in Table 8-3. The major difference is that network events are posted to an event object handle instead of a window procedure.

Event notification

The event notification model requires your application to create an event object for each socket used by calling the WSACreateEvent function, which is defined as

 WSAEVENT WSACreateEvent(void);

The WSACreateEvent function simply returns an event object handle. Once you have an event object handle, you have to associate it with a socket and register the network event types of interest, as shown in Table 8-3. This is accomplished by calling the WSAEventSelect function, which is defined as

 int WSAEventSelect(
     SOCKET s,
     WSAEVENT hEventObject,
     long lNetworkEvents
 );

The s parameter represents the socket of interest. The hEventObject parameter represents the event object—obtained with WSACreateEvent—to associate with the socket. The last parameter, lNetworkEvents, represents a bitmask that specifies a combination of network event types (listed in Table 8-3) that the application is interested in. For a detailed discussion of these event types, see the WSAAsyncSelect I/O model discussed earlier.

The event created for WSAEventSelect has two operating states and two operating modes. The operating states are known as signaled and nonsignaled. The operating modes are known as manual reset and auto reset. WSACreateEvent initially creates event handles in a nonsignaled operating state with a manual reset operating mode. As network events trigger an event object associated with a socket, the operating state changes from nonsignaled to signaled. Because the event object is created in a manual reset mode, your application is responsible for changing the operating state from signaled to nonsignaled after processing an I/O request. This can be accomplished by calling the WSAResetEvent function, which is defined as

 BOOL WSAResetEvent(WSAEVENT hEvent);

The function takes an event handle as its only parameter and returns TRUE or FALSE based on the success or failure of the call. When an application is finished with an event object, it should call the WSACloseEvent function to free the system resources used by an event handle. The WSACloseEvent function is defined as

 BOOL WSACloseEvent(WSAEVENT hEvent);

This function also takes an event handle as its only parameter and returns TRUE if successful or FALSE if the call fails.

Once a socket is associated with an event object handle, the application can begin processing I/O by waiting for network events to trigger the operating state of the event object handle. The WSAWaitForMultipleEvents function is designed to wait on one or more event object handles and returns either when one or all of the specified handles are in the signaled state or when a specified timeout interval expires. WSAWaitForMultipleEvents is defined as

 DWORD WSAWaitForMultipleEvents(
     DWORD cEvents,
     const WSAEVENT FAR * lphEvents,
     BOOL fWaitAll,
     DWORD dwTimeout,
     BOOL fAlertable
 );

The cEvents and lphEvents parameters define an array of WSAEVENT objects in which cEvents represents the number of event objects in the array and lphEvents is a pointer to the array. WSAWaitForMultipleEvents can support only a maximum of WSA_MAXIMUM_WAIT_EVENTS objects, which is defined as 64. Therefore, this I/O model is capable of supporting only a maximum of 64 sockets at a time for each thread that makes the WSAWaitForMultipleEvents call. If you need to have this model manage more than 64 sockets, you should create additional worker threads to wait on more event objects.

The fWaitAll parameter specifies how WSAWaitForMultipleEvents waits for objects in the event array. If TRUE, the function returns when all event objects in the lphEvents array are signaled. If FALSE, the function returns when any one of the event objects is signaled. In the latter case, the return value indicates which event object caused the function to return. Typically, applications set this parameter to FALSE and service one socket event at a time.

The dwTimeout parameter specifies how long (in milliseconds) WSAWaitForMultipleEvents will wait for a network event to occur. The function returns if the interval expires, even if conditions specified by the fWaitAll parameter are not satisfied. If the timeout value is 0, the function tests the state of the specified event objects and returns immediately, which effectively allows an application to poll on the event objects. Setting the timeout value to 0 should be avoided for performance reasons. If no events are ready for processing, WSAWaitForMultipleEvents returns WSA_WAIT_TIMEOUT. If dwTimeout is set to WSA_INFINITE, the function returns only when a network event signals an event object.

The final parameter, fAlertable, can be ignored when you're using the WSAEventSelect model and should be set to FALSE. It is intended for use in processing completion routines in the overlapped I/O model, which will be described later in this chapter.
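The 64-object limit translates directly into a partitioning problem when a server has more sockets than one thread can wait on. The arithmetic can be sketched as follows; workers_needed and locate are hypothetical helpers, and SK_MAX_WAIT_EVENTS mirrors the value of WSA_MAXIMUM_WAIT_EVENTS. The sketch assumes sockets are simply numbered in the order they were accepted.

```c
#include <assert.h>

#define SK_MAX_WAIT_EVENTS 64   /* value of WSA_MAXIMUM_WAIT_EVENTS */

/* Number of waiting threads needed to cover the given number of sockets. */
static int workers_needed(int sockets)
{
    assert(sockets >= 0);
    return (sockets + SK_MAX_WAIT_EVENTS - 1) / SK_MAX_WAIT_EVENTS;
}

/* For socket number i, report which worker thread owns it and which
   slot it occupies in that worker's event array. */
static void locate(int i, int *worker, int *slot)
{
    assert(i >= 0);
    *worker = i / SK_MAX_WAIT_EVENTS;
    *slot   = i % SK_MAX_WAIT_EVENTS;
}
```

A fixed division like this is the simplest policy; an application that closes connections frequently would also need to rebalance or compact the per-thread arrays.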

When WSAWaitForMultipleEvents receives network event notification of an event object, it returns a value indicating the event object that caused the function to return. As a result, your application can determine which network event type is available on a particular socket by referencing the signaled event in the event array and matching it with the socket associated with the event. When you reference the events in the event array, you should reference them using the return value of WSAWaitForMultipleEvents minus the predefined value WSA_WAIT_EVENT_0. For example:

 Index = WSAWaitForMultipleEvents(...);
 MyEvent = EventArray[Index - WSA_WAIT_EVENT_0];
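Before using the return value as an array index, it is worth ruling out the timeout and failure cases, because neither maps to a valid slot. The sketch below assumes the numeric values that winsock2.h gives to WSA_WAIT_EVENT_0, WSA_WAIT_TIMEOUT, and WSA_WAIT_FAILED; the function name signaled_index is our own.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirrored from winsock2.h so the sketch builds on any platform. */
#define SK_WAIT_EVENT_0  0u
#define SK_WAIT_TIMEOUT  258u
#define SK_WAIT_FAILED   0xFFFFFFFFu

/* Hypothetical helper: translate the return value of a wait on
   event_count events into an array index, or -1 if the wait timed out,
   failed, or returned something outside the array. */
static int signaled_index(uint32_t wait_result, uint32_t event_count)
{
    uint32_t offset;

    assert(event_count >= 1 && event_count <= 64);
    if (wait_result == SK_WAIT_FAILED || wait_result == SK_WAIT_TIMEOUT)
        return -1;

    offset = wait_result - SK_WAIT_EVENT_0;
    if (offset >= event_count)
        return -1;
    return (int)offset;
}
```

The returned index can then be used for both the event array and the parallel socket array.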

Once you have the socket that caused the network event, you can determine which network events are available by calling the WSAEnumNetworkEvents function, which is defined as

 int WSAEnumNetworkEvents(
     SOCKET s,
     WSAEVENT hEventObject,
     LPWSANETWORKEVENTS lpNetworkEvents
 );

The s parameter represents the socket that caused the network event. The hEventObject parameter is an optional parameter representing an event handle identifying an associated event object to be reset. Since our event object is in a signaled state, we can pass it in and it will be set to a nonsignaled state. If you don't want to use the hEventObject parameter for resetting events, you can use the WSAResetEvent function, which we described earlier. The final parameter, lpNetworkEvents, takes a pointer to a WSANETWORKEVENTS structure, which is used to retrieve network event types that occurred on the socket and any associated error codes. The WSANETWORKEVENTS structure is defined as

 typedef struct _WSANETWORKEVENTS
 {
     long lNetworkEvents;
     int  iErrorCode[FD_MAX_EVENTS];
 } WSANETWORKEVENTS, FAR * LPWSANETWORKEVENTS;

The lNetworkEvents parameter is a value that indicates all the network event types (see Table 8-3) that have occurred on the socket.

NOTE
More than one network event type can occur when an event is signaled. For example, a busy server application might receive FD_READ and FD_WRITE notification at the same time.

The iErrorCode parameter is an array of error codes that are associated with the events in lNetworkEvents. For each network event type, there exists a special event index similar to the event type names—except for an additional "_BIT" string appended to the event name. For example, for the FD_READ event type, the index identifier for the iErrorCode array is named FD_READ_BIT. The following code fragment demonstrates this for an FD_READ event:

 // Process FD_READ notification
 if (NetworkEvents.lNetworkEvents & FD_READ)
 {
     if (NetworkEvents.iErrorCode[FD_READ_BIT] != 0)
     {
         printf("FD_READ failed with error %d\n",
             NetworkEvents.iErrorCode[FD_READ_BIT]);
     }
 }
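Because each FD_ event flag is the value 1 shifted left by its _BIT index, the same check can be written once as a loop over all bit positions instead of once per event type. The sketch below uses a local mirror of the structure so that it compiles without Windows headers; struct net_events, SK_FD_MAX_EVENTS, and first_event_error are our own names, and the value 10 for FD_MAX_EVENTS is taken from winsock2.h.

```c
#include <assert.h>
#include <stddef.h>

#define SK_FD_MAX_EVENTS 10   /* value of FD_MAX_EVENTS in winsock2.h */

/* Local mirror of WSANETWORKEVENTS. */
struct net_events {
    long lNetworkEvents;
    int  iErrorCode[SK_FD_MAX_EVENTS];
};

/* Hypothetical helper: return the first nonzero error code among the
   events that actually fired, or 0 if none failed. If bit_out is not
   NULL, it receives the _BIT index of the failing event. */
static int first_event_error(const struct net_events *ev, int *bit_out)
{
    int bit;

    assert(ev != NULL);
    for (bit = 0; bit < SK_FD_MAX_EVENTS; bit++) {
        long mask = 1L << bit;   /* FD_xxx equals 1 << FD_xxx_BIT */

        if ((ev->lNetworkEvents & mask) && ev->iErrorCode[bit] != 0) {
            if (bit_out != NULL)
                *bit_out = bit;
            return ev->iErrorCode[bit];
        }
    }
    return 0;
}
```

Note that an error slot is consulted only when the matching bit is set in lNetworkEvents, which is the same guard the FD_READ fragment applies by hand.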

After you process the events in the WSANETWORKEVENTS structure, your application should continue waiting for more network events on all of the available sockets. Figure 8-6 demonstrates how to develop a server and manage event objects when using the WSAEventSelect I/O model. The figure highlights the steps needed to develop a basic server application capable of managing one or more sockets at a time.

Figure 8-6. WSAEventSelect I/O model server sample code

 SOCKET SocketArray[WSA_MAXIMUM_WAIT_EVENTS];
 WSAEVENT EventArray[WSA_MAXIMUM_WAIT_EVENTS];
 WSAEVENT NewEvent;
 WSANETWORKEVENTS NetworkEvents;
 SOCKADDR_IN InternetAddr;
 SOCKET Accept, Listen;
 DWORD EventTotal = 0;
 DWORD Index;
 char buffer[4096];   // Buffer size is arbitrary

 // Set up a TCP socket for listening on port 5150
 Listen = socket (PF_INET, SOCK_STREAM, 0);

 InternetAddr.sin_family = AF_INET;
 InternetAddr.sin_addr.s_addr = htonl(INADDR_ANY);
 InternetAddr.sin_port = htons(5150);

 bind(Listen, (PSOCKADDR) &InternetAddr,
     sizeof(InternetAddr));

 NewEvent = WSACreateEvent();

 WSAEventSelect(Listen, NewEvent,
     FD_ACCEPT | FD_CLOSE);

 listen(Listen, 5);

 SocketArray[EventTotal] = Listen;
 EventArray[EventTotal] = NewEvent;
 EventTotal++;

 while(TRUE)
 {
     // Wait for network events on all sockets
     Index = WSAWaitForMultipleEvents(EventTotal,
         EventArray, FALSE, WSA_INFINITE, FALSE);

     WSAEnumNetworkEvents(
         SocketArray[Index - WSA_WAIT_EVENT_0],
         EventArray[Index - WSA_WAIT_EVENT_0],
         &NetworkEvents);

     // Check for FD_ACCEPT messages
     if (NetworkEvents.lNetworkEvents & FD_ACCEPT)
     {
         if (NetworkEvents.iErrorCode[FD_ACCEPT_BIT] != 0)
         {
             printf("FD_ACCEPT failed with error %d\n",
                 NetworkEvents.iErrorCode[FD_ACCEPT_BIT]);
             break;
         }

         // Accept a new connection, and add it to the
         // socket and event lists
         Accept = accept(
             SocketArray[Index - WSA_WAIT_EVENT_0],
             NULL, NULL);

         // We cannot process more than
         // WSA_MAXIMUM_WAIT_EVENTS sockets, so close
         // the accepted socket
         if (EventTotal >= WSA_MAXIMUM_WAIT_EVENTS)
         {
             printf("Too many connections");
             closesocket(Accept);
             break;
         }

         NewEvent = WSACreateEvent();

         WSAEventSelect(Accept, NewEvent,
             FD_READ | FD_WRITE | FD_CLOSE);

         EventArray[EventTotal] = NewEvent;
         SocketArray[EventTotal] = Accept;
         EventTotal++;
         printf("Socket %d connected\n", (int) Accept);
     }

     // Process FD_READ notification
     if (NetworkEvents.lNetworkEvents & FD_READ)
     {
         if (NetworkEvents.iErrorCode[FD_READ_BIT] != 0)
         {
             printf("FD_READ failed with error %d\n",
                 NetworkEvents.iErrorCode[FD_READ_BIT]);
             break;
         }

         // Read data from the socket
         recv(SocketArray[Index - WSA_WAIT_EVENT_0],
             buffer, sizeof(buffer), 0);
     }

     // Process FD_WRITE notification
     if (NetworkEvents.lNetworkEvents & FD_WRITE)
     {
         if (NetworkEvents.iErrorCode[FD_WRITE_BIT] != 0)
         {
             printf("FD_WRITE failed with error %d\n",
                 NetworkEvents.iErrorCode[FD_WRITE_BIT]);
             break;
         }

         send(SocketArray[Index - WSA_WAIT_EVENT_0],
             buffer, sizeof(buffer), 0);
     }

     // Process FD_CLOSE notification
     if (NetworkEvents.lNetworkEvents & FD_CLOSE)
     {
         if (NetworkEvents.iErrorCode[FD_CLOSE_BIT] != 0)
         {
             printf("FD_CLOSE failed with error %d\n",
                 NetworkEvents.iErrorCode[FD_CLOSE_BIT]);
             break;
         }

         closesocket(SocketArray[Index - WSA_WAIT_EVENT_0]);

         // Free the event object that was paired with
         // the socket
         WSACloseEvent(EventArray[Index - WSA_WAIT_EVENT_0]);

         // Remove socket and associated event from
         // the socket and event arrays and decrement
         // EventTotal
         CompressArrays(EventArray, SocketArray, &EventTotal);
     }
 }
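The figure calls CompressArrays without showing it. One plausible shape for such a helper is sketched below; it is our own guess rather than the book's implementation. Unlike the call in the figure, this version takes the index of the slot to remove as an explicit argument, and it uses a generic handle type (sk_handle, a name of our own) in place of SOCKET and WSAEVENT so that it builds on any platform.

```c
#include <assert.h>

typedef unsigned long sk_handle;   /* stand-in for SOCKET and WSAEVENT */

/* Hypothetical helper: remove slot `index` from two parallel arrays by
   shifting the later entries down one position, then decrement the
   count. The relative order of the remaining entries is preserved. */
static void remove_slot(sk_handle *events, sk_handle *sockets,
                        unsigned index, unsigned *total)
{
    unsigned i;

    assert(total != 0 && index < *total);
    for (i = index; i + 1 < *total; i++) {
        events[i]  = events[i + 1];
        sockets[i] = sockets[i + 1];
    }
    (*total)--;
}
```

Keeping the two arrays in lockstep matters because the index returned by the wait call is used to look up both the event and its socket.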

The Overlapped Model

The overlapped I/O model in Winsock offers applications better system performance than any of the I/O models explained so far. The basic design of the overlapped model allows your application to post one or more Winsock I/O requests at a time using an overlapped data structure. At a later point, the application can service the submitted requests after they have completed. This model is available on all Windows platforms except Windows CE. The overall design of the model is based on the Win32 overlapped I/O mechanisms available for performing I/O operations on devices using the ReadFile and WriteFile functions.

Originally, the Winsock overlapped I/O model was available only to Winsock 1.1 applications running on Windows NT. Applications could take advantage of the model by calling ReadFile and WriteFile on a socket handle and specifying an overlapped structure that we will describe later. Since the release of Winsock 2, overlapped I/O has been incorporated into new Winsock functions, such as WSASend and WSARecv. As a result, the overlapped I/O model is now available on all Windows platforms that feature Winsock 2.

NOTE
With the release of Winsock 2, overlapped I/O can still be used with the functions ReadFile and WriteFile under Windows NT and Windows 2000. However, this functionality was not added to Windows 95 and Windows 98. For compatibility across platforms and for performance reasons, you should always consider using the WSARecv and WSASend functions instead of the Win32 ReadFile and WriteFile functions. This section will only describe how to use overlapped I/O through the new Winsock 2 functions.

To use the overlapped I/O model on a socket, you must first create a socket by using the flag WSA_FLAG_OVERLAPPED, as follows:

 s = WSASocket(AF_INET, SOCK_STREAM, 0, NULL, 0,
     WSA_FLAG_OVERLAPPED);

If you create a socket using the socket function instead of the WSASocket function, WSA_FLAG_OVERLAPPED is implied. After you successfully create a socket and bind it to a local interface, overlapped I/O operations can commence by calling the Winsock functions listed below and specifying an optional WSAOVERLAPPED structure.

WSASend

WSASendTo

WSARecv

WSARecvFrom

WSAIoctl

AcceptEx

TransmitFile

As you probably already know, each one of these functions is associated with sending data, receiving data, or accepting connections on a socket. As a result, this activity can potentially take a long time to complete. This is why each function can accept a WSAOVERLAPPED structure as a parameter. When these functions are called with a WSAOVERLAPPED structure, they complete immediately, regardless of whether the socket is set to blocking mode (described at the beginning of this chapter). They rely on the WSAOVERLAPPED structure to manage the return of an I/O request. There are essentially two methods for managing the completion of an overlapped I/O request: your application can wait for event object notification, or it can process completed requests through completion routines. The functions listed above (except AcceptEx and TransmitFile) have another parameter in common: lpCompletionRoutine. This parameter is an optional pointer to a completion routine function that gets called when an overlapped request completes. We will explore the event notification method next. Later in this chapter, you will learn how to use optional completion routines instead of events to process completed overlapped requests.

Event notification

The event notification method of overlapped I/O requires associating Win32 event objects with WSAOVERLAPPED structures. When I/O calls such as WSASend and WSARecv are made using a WSAOVERLAPPED structure, they return immediately. Typically you will find that these I/O calls fail with the return value SOCKET_ERROR. The WSAGetLastError function reports a WSA_IO_PENDING error status. This error status simply means that the I/O operation is in progress. At some later time, your application will need to determine when an overlapped I/O request completes by waiting on the event object associated with the WSAOVERLAPPED structure. The WSAOVERLAPPED structure provides the communication medium between the initiation of an overlapped I/O request and its subsequent completion, and is defined as

    typedef struct WSAOVERLAPPED
    {
        DWORD    Internal;
        DWORD    InternalHigh;
        DWORD    Offset;
        DWORD    OffsetHigh;
        WSAEVENT hEvent;
    } WSAOVERLAPPED, FAR * LPWSAOVERLAPPED;

The Internal, InternalHigh, Offset, and OffsetHigh fields are all used internally by the system and should not be manipulated or used directly by an application. The hEvent field, on the other hand, is a special field that allows an application to associate an event object handle with an overlapped operation on a socket. You might be wondering how to get an event object handle to assign to this field. As we described in the WSAEventSelect model, you can use the WSACreateEvent function to create an event object handle. Once an event handle is created, simply assign it to the overlapped structure's hEvent field and begin calling a Winsock function, such as WSASend or WSARecv, using the overlapped structure.

When an overlapped I/O request finally completes, your application is responsible for retrieving the overlapped results. In the event notification method, Winsock will change the event-signaling state of an event object that is associated with a WSAOVERLAPPED structure from nonsignaled to signaled when an overlapped request finally completes. Because an event object is assigned to the WSAOVERLAPPED structure, you can easily determine when an overlapped I/O call completes by calling the WSAWaitForMultipleEvents function, which we also described in the WSAEventSelect I/O model. WSAWaitForMultipleEvents waits a specified amount of time for one or more event objects to become signaled. We can't stress this point enough: remember that WSAWaitForMultipleEvents is capable of waiting on only 64 event objects at a time. Once you determine which overlapped request has completed, you need to determine the success or failure of the overlapped call by calling WSAGetOverlappedResult, which is defined as

    BOOL WSAGetOverlappedResult(
        SOCKET s,
        LPWSAOVERLAPPED lpOverlapped,
        LPDWORD lpcbTransfer,
        BOOL fWait,
        LPDWORD lpdwFlags
    );

The s parameter identifies the socket that was specified when the overlapped operation was started. The lpOverlapped parameter is a pointer to the WSAOVERLAPPED structure that was specified when the overlapped operation was started. The lpcbTransfer parameter is a pointer to a DWORD variable that receives the number of bytes that were actually transferred by an overlapped send or receive operation. The fWait parameter determines whether the function should wait for a pending overlapped operation to complete. If fWait is TRUE, the function does not return until the operation has been completed. If fWait is FALSE and the operation is still pending, WSAGetOverlappedResult returns FALSE with the error WSA_IO_INCOMPLETE. Since in our case we waited on a signaled event for overlapped completion, this parameter has no effect. The final parameter, lpdwFlags, is a pointer to a DWORD that will receive resulting flags if the originating overlapped call was made with the WSARecv or the WSARecvFrom function.

If the WSAGetOverlappedResult function succeeds, the return value is TRUE. This means that your overlapped operation has completed successfully and that the value pointed to by lpcbTransfer has been updated. If the return value is FALSE, one of the following statements is true:

The overlapped I/O operation is still pending (as described above).

The overlapped operation completed, but with errors.

The overlapped operation's completion status could not be determined because of errors in one or more of the parameters supplied to WSAGetOverlappedResult.

Upon failure, the value pointed to by lpcbTransfer will not be updated, and your application should call the WSAGetLastError function to determine the cause of the failure.
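The three failure cases above can be told apart from the error code alone. The following is a minimal sketch, not a definitive implementation: the enum and the function name ClassifyOverlappedResult are hypothetical, and the grouping of WSA_INVALID_HANDLE, WSA_INVALID_PARAMETER, WSAEFAULT, and WSAENOTSOCK as "parameter errors" is our assumption. The stand-in definitions exist only so the sketch compiles without the Windows SDK.

```c
#ifdef _WIN32
#include <winsock2.h>
#else
/* Stand-ins so the sketch compiles without the Windows SDK.
   The numeric values mirror the Windows headers. */
typedef int BOOL;
#define WSA_INVALID_HANDLE     6L
#define WSA_INVALID_PARAMETER  87L
#define WSA_IO_INCOMPLETE      996L
#define WSAEFAULT              10014L
#define WSAENOTSOCK            10038L
#endif

/* Hypothetical names for the outcomes described in the text. */
typedef enum {
    OVL_COMPLETED,      /* TRUE: *lpcbTransfer is valid               */
    OVL_STILL_PENDING,  /* FALSE + WSA_IO_INCOMPLETE                  */
    OVL_BAD_ARGUMENT,   /* FALSE + an error in the supplied arguments */
    OVL_FAILED          /* FALSE + any other error: the I/O failed    */
} OverlappedOutcome;

/* ok:  the return value of WSAGetOverlappedResult.
   err: WSAGetLastError() read immediately afterward (ignored when ok). */
OverlappedOutcome ClassifyOverlappedResult(BOOL ok, long err)
{
    if (ok)
        return OVL_COMPLETED;

    switch (err) {
    case WSA_IO_INCOMPLETE:
        return OVL_STILL_PENDING;
    case WSA_INVALID_HANDLE:
    case WSA_INVALID_PARAMETER:
    case WSAEFAULT:
    case WSAENOTSOCK:
        return OVL_BAD_ARGUMENT;
    default:
        return OVL_FAILED;
    }
}
```

A caller would pass the BOOL returned by WSAGetOverlappedResult together with WSAGetLastError(), and read the byte count only when the outcome is OVL_COMPLETED.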

Figure 8-7 demonstrates how to structure a simple server application that is capable of managing overlapped I/O on one socket, using the event notification described above. The application outlines the following programming steps:

1. Create a socket, and begin listening for a connection on a specified port.

2. Accept an inbound connection.

3. Create a WSAOVERLAPPED structure for the accepted socket, and assign an event object handle to the structure. Also assign the event object handle to an event array to be used later by the WSAWaitForMultipleEvents function.

4. Post an asynchronous WSARecv request on the socket by specifying the WSAOVERLAPPED structure as a parameter.

   NOTE
   This function will normally fail with SOCKET_ERROR and the error status WSA_IO_PENDING.

5. Call WSAWaitForMultipleEvents using the event array, and wait for the event associated with the overlapped call to become signaled.

6. After WSAWaitForMultipleEvents completes, reset the event object by using WSAResetEvent with the event array, and process the completed overlapped request.

7. Determine the return status of the overlapped call by using WSAGetOverlappedResult.

8. Post another overlapped WSARecv request on the socket.

9. Repeat steps 5-8.

This example can easily be expanded to handle more than one socket by moving the overlapped I/O processing portion of the code to a separate thread and allowing the main application thread to service additional connection requests.

Figure 8-7. Simple overlapped example using events

    int main(void)
    {
        WSABUF DataBuf;
        char Buffer[DATA_BUFSIZE];
        DWORD EventTotal = 0, Index;
        DWORD RecvBytes, BytesTransferred, Flags = 0;
        WSAEVENT EventArray[WSA_MAXIMUM_WAIT_EVENTS];
        WSAOVERLAPPED AcceptOverlapped;
        SOCKET ListenSocket, AcceptSocket;

        // Step 1:
        //  Start Winsock and set up a listening socket
        ...

        // Step 2:
        //  Accept an inbound connection
        AcceptSocket = accept(ListenSocket, NULL, NULL);

        // Step 3:
        //  Set up an overlapped structure
        EventArray[EventTotal] = WSACreateEvent();
        ZeroMemory(&AcceptOverlapped,
            sizeof(WSAOVERLAPPED));
        AcceptOverlapped.hEvent = EventArray[EventTotal];
        DataBuf.len = DATA_BUFSIZE;
        DataBuf.buf = Buffer;
        EventTotal++;

        // Step 4:
        //  Post a WSARecv request to begin receiving data
        //  on the socket. WSA_IO_PENDING is the normal
        //  case and is not treated as a failure.
        if (WSARecv(AcceptSocket, &DataBuf, 1, &RecvBytes,
            &Flags, &AcceptOverlapped, NULL) == SOCKET_ERROR
            && WSAGetLastError() != WSA_IO_PENDING)
        {
            return 1;
        }

        // Process overlapped receives on the socket.
        while(TRUE)
        {
            // Step 5:
            //  Wait for the overlapped I/O call to complete
            Index = WSAWaitForMultipleEvents(EventTotal,
                EventArray, FALSE, WSA_INFINITE, FALSE);

            // Index should be 0 because we
            // have only one event handle in EventArray

            // Step 6:
            //  Reset the signaled event
            WSAResetEvent(
                EventArray[Index - WSA_WAIT_EVENT_0]);

            // Step 7:
            //  Determine the status of the overlapped
            //  request. Treat a failed request, or a
            //  zero-byte receive (the peer has closed the
            //  connection), as the end of the connection
            //  and close the socket.
            if (!WSAGetOverlappedResult(AcceptSocket,
                    &AcceptOverlapped, &BytesTransferred,
                    FALSE, &Flags)
                || BytesTransferred == 0)
            {
                printf("Closing socket %d\n", (int) AcceptSocket);
                closesocket(AcceptSocket);
                WSACloseEvent(
                    EventArray[Index - WSA_WAIT_EVENT_0]);
                return 0;
            }

            // Do something with the received data.
            // DataBuf contains the received data.
            ...

            // Step 8:
            //  Post another WSARecv() request on the socket
            Flags = 0;
            ZeroMemory(&AcceptOverlapped,
                sizeof(WSAOVERLAPPED));
            AcceptOverlapped.hEvent = EventArray[Index -
                WSA_WAIT_EVENT_0];
            DataBuf.len = DATA_BUFSIZE;
            DataBuf.buf = Buffer;
            if (WSARecv(AcceptSocket, &DataBuf, 1,
                &RecvBytes, &Flags, &AcceptOverlapped,
                NULL) == SOCKET_ERROR
                && WSAGetLastError() != WSA_IO_PENDING)
            {
                return 1;
            }
        }
    }

On Windows NT and Windows 2000, the overlapped I/O model also allows applications to accept connections in an overlapped fashion by calling the AcceptEx function on a listening socket. AcceptEx is a special Winsock 1.1 extension function that is available in the Mswsock.h header file and the Mswsock.lib library file. This function was originally intended to work with Win32 overlapped I/O on Windows NT and Windows 2000, but it also works with overlapped I/O in Winsock 2. AcceptEx is defined as

    BOOL AcceptEx (
        SOCKET sListenSocket,
        SOCKET sAcceptSocket,
        PVOID lpOutputBuffer,
        DWORD dwReceiveDataLength,
        DWORD dwLocalAddressLength,
        DWORD dwRemoteAddressLength,
        LPDWORD lpdwBytesReceived,
        LPOVERLAPPED lpOverlapped
    );

The sListenSocket parameter represents a listening socket. The sAcceptSocket parameter is a socket to accept an incoming connection. The AcceptEx function is different from the accept function in that you have to supply the accepted socket instead of having the function create it for you. Supplying the socket requires you to call the socket or WSASocket function to create a socket that you can pass to AcceptEx via the sAcceptSocket parameter. The lpOutputBuffer parameter is a special buffer because it receives three pieces of data: the local address of the server, the remote address of the client, and the first block of data sent on a new connection. The dwReceiveDataLength parameter specifies the number of bytes in lpOutputBuffer used for receiving data. If this parameter is specified as 0, no data will be received in conjunction with accepting the connection. The dwLocalAddressLength and dwRemoteAddressLength parameters represent how many bytes in lpOutputBuffer are reserved for storing local and remote address information when a socket is accepted. These buffer sizes must be at least 16 bytes more than the maximum address length for the transport protocol in use. For example, if you are using the TCP/IP protocol, the size should be set to the size of a SOCKADDR_IN structure + 16 bytes. The lpdwBytesReceived parameter returns the number of data bytes received. This parameter is set only if the operation completes synchronously. If AcceptEx returns FALSE and WSAGetLastError reports ERROR_IO_PENDING, this parameter is never set and you must obtain the number of bytes read from the completion notification mechanism. The final parameter, lpOverlapped, is an OVERLAPPED structure that allows AcceptEx to be used in an asynchronous fashion. As we mentioned earlier, this function works with event object notification only in an overlapped application because it does not feature a completion routine parameter.

A Winsock extension function named GetAcceptExSockaddrs parses out the local and remote address elements from lpOutputBuffer. GetAcceptExSockaddrs is defined as

    VOID GetAcceptExSockaddrs(
        PVOID lpOutputBuffer,
        DWORD dwReceiveDataLength,
        DWORD dwLocalAddressLength,
        DWORD dwRemoteAddressLength,
        LPSOCKADDR *LocalSockaddr,
        LPINT LocalSockaddrLength,
        LPSOCKADDR *RemoteSockaddr,
        LPINT RemoteSockaddrLength
    );

The lpOutputBuffer parameter should be set to the lpOutputBuffer returned from AcceptEx. The dwReceiveDataLength, dwLocalAddressLength, and dwRemoteAddressLength parameters should be set to the same values as the dwReceiveDataLength, dwLocalAddressLength, and dwRemoteAddressLength parameters that were passed to AcceptEx. The LocalSockaddr and RemoteSockaddr parameters, which are pointers to SOCKADDR structures with the local and remote address information, receive a pointer offset from the originating lpOutputBuffer parameter. This makes it easy to reference the elements of a SOCKADDR structure from the address information contained in lpOutputBuffer. The LocalSockaddrLength and RemoteSockaddrLength parameters receive the size of the local and remote addresses.
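Because the same three length values must be passed to both AcceptEx and GetAcceptExSockaddrs, it can help to compute them in one place. The sketch below shows only the size arithmetic implied by the description above; the helper names and the ACCEPTEX_ADDR_PADDING macro are hypothetical, and the caller is assumed to pass sizeof(SOCKADDR_IN) (16 bytes) for IPv4.

```c
#include <stddef.h>

/* Extra bytes AcceptEx requires beyond each address, per the text above. */
#define ACCEPTEX_ADDR_PADDING 16

/* Bytes to reserve for one address; use the result for both
   dwLocalAddressLength and dwRemoteAddressLength. */
size_t AcceptExAddressLength(size_t sockaddrSize)
{
    return sockaddrSize + ACCEPTEX_ADDR_PADDING;
}

/* Minimum size of lpOutputBuffer: room for the first block of data
   (dwReceiveDataLength bytes) plus the local and remote addresses. */
size_t AcceptExBufferSize(size_t receiveDataLength, size_t sockaddrSize)
{
    return receiveDataLength + 2 * AcceptExAddressLength(sockaddrSize);
}
```

With a 1,024-byte receive area and IPv4 addresses, each address length comes out at 32 bytes and the whole buffer at 1,088 bytes. Passing 0 as the receive length leaves a 64-byte buffer that holds only the two addresses.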

Completion routines

Completion routines are the other method your application can use to manage completed overlapped I/O requests. Completion routines are simply functions that you optionally pass to an overlapped I/O request and that the system invokes when an overlapped I/O request completes. Their primary role is to service a completed I/O request using the caller's thread. Additionally, applications can continue overlapped I/O processing through the completion routine.

To use completion routines for overlapped I/O requests, your application must specify a completion routine, along with a WSAOVERLAPPED structure, to an I/O bound Winsock function (described earlier). A completion routine must have the following function prototype:

    void CALLBACK CompletionROUTINE(
        DWORD dwError,
        DWORD cbTransferred,
        LPWSAOVERLAPPED lpOverlapped,
        DWORD dwFlags
    );

When an overlapped I/O request completes using a completion routine, the parameters contain the following information:

The parameter dwError specifies the completion status for the overlapped operation as indicated by lpOverlapped.

The cbTransferred parameter specifies the number of bytes that were transferred during the overlapped operation.

The lpOverlapped parameter is the WSAOVERLAPPED structure passed into the originating I/O call.

The dwFlags parameter is not used and will be set to 0.

There is a major difference between overlapped requests submitted with a completion routine and overlapped requests submitted with an event object. The WSAOVERLAPPED structure's event field, hEvent, is not used, which means you cannot associate an event object with the overlapped request. Once you make an overlapped I/O call with a completion routine, your calling thread must eventually service the completion routine once it has completed. This requires you to place your calling thread in an alertable wait state and process the completion routine later, after the I/O operation has completed. The WSAWaitForMultipleEvents function can be used to put your thread in an alertable wait state. The catch is that you must also have at least one event object available for the WSAWaitForMultipleEvents function. If your application handles only overlapped requests with completion routines, you are not likely to have any event objects around for processing. As an alternative, your application can use the Win32 SleepEx function to set your thread in an alertable wait state. Of course, you can also create a dummy event object that is not associated with anything. If your calling thread is always busy and not in an alertable wait state, no posted completion routine will ever get called.

As you saw earlier, WSAWaitForMultipleEvents normally waits for event objects associated with WSAOVERLAPPED structures. This function is also designed to place your thread in an alertable wait state and to process completion routines for completed overlapped I/O requests if you set the parameter fAlertable to TRUE. When overlapped I/O requests complete with a completion routine, the return value is WSA_WAIT_IO_COMPLETION instead of an event object index in the event array. The SleepEx function provides the same behavior as WSAWaitForMultipleEvents except that it does not need any event objects. The SleepEx function is defined as

    DWORD SleepEx(
        DWORD dwMilliseconds,
        BOOL bAlertable
    );

The dwMilliseconds parameter defines how long in milliseconds SleepEx will wait. If dwMilliseconds is set to INFINITE, SleepEx waits indefinitely. The bAlertable parameter determines how a completion routine will execute. If bAlertable is set to FALSE and an I/O completion callback occurs, the I/O completion function is not executed and the function does not return until the wait period specified in dwMilliseconds has elapsed. If it is set to TRUE, the completion routine executes and the SleepEx function returns WAIT_IO_COMPLETION.
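Whichever alertable wait you choose, the caller has to tell a completion-routine wakeup apart from a signaled event. The following is a minimal sketch for the WSAWaitForMultipleEvents case with fWaitAll set to FALSE; the function name DecodeWaitResult and the negative result codes are hypothetical, and the stand-in definitions exist only so the sketch compiles without the Windows SDK.

```c
#ifdef _WIN32
#include <winsock2.h>
#else
/* Stand-ins so the sketch compiles without the Windows SDK.
   The numeric values mirror the Windows headers. */
typedef unsigned long DWORD;
#define WSA_WAIT_EVENT_0        0UL
#define WSA_WAIT_IO_COMPLETION  0xC0UL
#define WSA_WAIT_TIMEOUT        258UL
#define WSA_WAIT_FAILED         0xFFFFFFFFUL
#endif

/* Hypothetical result codes for the non-event cases. */
enum {
    WAIT_GOT_IO_COMPLETION = -1,  /* one or more completion routines ran */
    WAIT_GOT_TIMEOUT       = -2,  /* the wait interval elapsed           */
    WAIT_GOT_ERROR         = -3   /* the wait failed or is out of range  */
};

/* Translate a WSAWaitForMultipleEvents return value into either an
   index into the event array (0 .. eventCount-1) or a negative code. */
int DecodeWaitResult(DWORD result, DWORD eventCount)
{
    if (result == WSA_WAIT_IO_COMPLETION)
        return WAIT_GOT_IO_COMPLETION;
    if (result == WSA_WAIT_TIMEOUT)
        return WAIT_GOT_TIMEOUT;
    if (result == WSA_WAIT_FAILED)
        return WAIT_GOT_ERROR;
    if (result - WSA_WAIT_EVENT_0 < eventCount)
        return (int)(result - WSA_WAIT_EVENT_0);
    return WAIT_GOT_ERROR;
}
```

A thread that services both events and completion routines can loop on the wait, go straight back to waiting on WAIT_GOT_IO_COMPLETION, and handle the event at the returned index otherwise.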

Figure 8-8 outlines how to structure a simple server application that is capable of managing one socket request using completion routines as described above. The application illustrates the following programming steps:

1. Create a socket and begin listening for a connection on a specified port.

2. Accept an inbound connection.

3. Create a WSAOVERLAPPED structure for the accepted socket.

4. Post an asynchronous WSARecv request on the socket by specifying the WSAOVERLAPPED structure as a parameter and supplying a completion routine.

5. Call WSAWaitForMultipleEvents with the fAlertable parameter set to TRUE, and wait for an overlapped request to complete. When an overlapped request completes, the completion routine automatically executes and WSAWaitForMultipleEvents returns WSA_WAIT_IO_COMPLETION. Inside the completion routine, post another overlapped WSARecv request with a completion routine.

6. Verify that WSAWaitForMultipleEvents returns WSA_WAIT_IO_COMPLETION.

7. Repeat steps 5 and 6.

Figure 8-8. Simple overlapped sample using completion routines

    SOCKET AcceptSocket;
    WSABUF DataBuf;
    char Buffer[DATA_BUFSIZE];

    void CALLBACK WorkerRoutine(DWORD Error,
                                DWORD BytesTransferred,
                                LPWSAOVERLAPPED Overlapped,
                                DWORD InFlags);

    int main(void)
    {
        WSAOVERLAPPED Overlapped;
        WSAEVENT EventArray[1];
        DWORD Index, RecvBytes, Flags;
        SOCKET ListenSocket;

        // Step 1:
        //  Start Winsock, and set up a listening socket
        ...

        // Step 2:
        //  Accept a new connection
        AcceptSocket = accept(ListenSocket, NULL, NULL);

        // Step 3:
        //  Now that we have an accepted socket, start
        //  processing I/O using overlapped I/O with a
        //  completion routine. To get the overlapped I/O
        //  processing started, first submit an
        //  overlapped WSARecv() request.
        Flags = 0;
        ZeroMemory(&Overlapped, sizeof(WSAOVERLAPPED));
        DataBuf.len = DATA_BUFSIZE;
        DataBuf.buf = Buffer;

        // Step 4:
        //  Post an asynchronous WSARecv() request
        //  on the socket by specifying the WSAOVERLAPPED
        //  structure as a parameter, and supply
        //  the WorkerRoutine function below as the
        //  completion routine
        if (WSARecv(AcceptSocket, &DataBuf, 1, &RecvBytes,
            &Flags, &Overlapped, WorkerRoutine)
            == SOCKET_ERROR)
        {
            if (WSAGetLastError() != WSA_IO_PENDING)
            {
                printf("WSARecv() failed with error %d\n",
                    WSAGetLastError());
                return 1;
            }
        }

        // Since the WSAWaitForMultipleEvents() API
        // requires waiting on one or more event objects,
        // we will have to create a dummy event object.
        // As an alternative, we can use SleepEx()
        // instead.
        EventArray[0] = WSACreateEvent();

        while(TRUE)
        {
            // Step 5:
            Index = WSAWaitForMultipleEvents(1, EventArray,
                FALSE, WSA_INFINITE, TRUE);

            // Step 6:
            if (Index == WSA_WAIT_IO_COMPLETION)
            {
                // An overlapped request completion routine
                // just completed. Continue servicing
                // more completion routines.
                continue;
            }
            else
            {
                // A bad error occurred--stop processing!
                // If we were also processing an event
                // object, this could be an index to
                // the event array.
                return 1;
            }
        }
    }

    void CALLBACK WorkerRoutine(DWORD Error,
                                DWORD BytesTransferred,
                                LPWSAOVERLAPPED Overlapped,
                                DWORD InFlags)
    {
        DWORD RecvBytes;
        DWORD Flags;

        if (Error != 0 || BytesTransferred == 0)
        {
            // Either a bad error occurred on the socket
            // or the socket was closed by a peer
            closesocket(AcceptSocket);
            return;
        }

        // At this point, an overlapped WSARecv() request
        // completed successfully. Now we can retrieve the
        // received data that is contained in the variable
        // DataBuf. After processing the received data, we
        // need to post another overlapped WSARecv() or
        // WSASend() request. For simplicity, we will post
        // another WSARecv() request.
        //
        // Overlapped is already a pointer to the structure
        // passed to the original call, so it is used
        // directly rather than through its address.
        Flags = 0;
        ZeroMemory(Overlapped, sizeof(WSAOVERLAPPED));
        DataBuf.len = DATA_BUFSIZE;
        DataBuf.buf = Buffer;

        if (WSARecv(AcceptSocket, &DataBuf, 1, &RecvBytes,
            &Flags, Overlapped, WorkerRoutine)
            == SOCKET_ERROR)
        {
            if (WSAGetLastError() != WSA_IO_PENDING )
            {
                printf("WSARecv() failed with error %d\n",
                    WSAGetLastError());
                return;
            }
        }
    }

The Completion Port Model

The completion port model is by far the most complicated I/O model. However, it offers the best system performance possible when an application has to manage many sockets at once. Unfortunately, it's available only on Windows NT and Windows 2000. Because of the complexity of its design, you should consider using the completion port model only if you need your application to manage hundreds or even thousands of sockets simultaneously and you want your application to scale well when more CPUs are added to the system. The most important point to remember is that the I/O completion port model is your best choice if you are developing a high-performance server for Windows NT or Windows 2000 that is expected to service many socket I/O requests (a Web server, for example).

Essentially the completion port model requires you to create a Win32 completion port object that will manage overlapped I/O requests using a specified number of threads to service the completed overlapped I/O requests. Note that a completion port is actually a Win32, Windows NT, and Windows 2000 I/O construct that is capable of accepting more than just socket handles. However, this section will describe only how to take advantage of the completion port model by using socket handles. To begin using this model, you are required to create an I/O completion port object that will be used to manage multiple I/O requests for any number of socket handles. This is accomplished by calling the CreateIoCompletionPort function, which is defined as

    HANDLE CreateIoCompletionPort(
        HANDLE FileHandle,
        HANDLE ExistingCompletionPort,
        ULONG_PTR CompletionKey,
        DWORD NumberOfConcurrentThreads
    );

Before examining the parameters in detail, be aware that this function is actually used for two distinct purposes:

To create a completion port object

To associate a handle with a completion port

When you initially create a completion port, the only parameter of interest is NumberOfConcurrentThreads. For the other three, pass INVALID_HANDLE_VALUE as FileHandle and NULL as ExistingCompletionPort; CompletionKey is ignored. The NumberOfConcurrentThreads parameter is special in that it defines the number of threads that are allowed to execute concurrently on a completion port. Ideally, you want only one thread per processor to service the completion port to avoid thread context switching. The value 0 for this parameter tells the system to allow as many threads as there are processors in the system. You can use the code below to create an I/O completion port.

    CompletionPort = CreateIoCompletionPort(INVALID_HANDLE_VALUE,
        NULL, 0, 0);

This will return a handle that is used to identify the completion port when a socket handle is assigned to it.

Worker threads and completion ports

After a completion port is successfully created, you can begin to associate socket handles with the object. Before associating sockets, though, you have to create one or more worker threads to service the completion port when socket I/O requests are posted to the completion port object. At this point, you might wonder how many threads should be created to service the completion port. This is actually one of the more complicated aspects of the completion port model because the number needed to service I/O requests depends on the overall design of your application. It's important to note the distinction between the number of concurrent threads you specify when calling CreateIoCompletionPort and the number of worker threads you create; they do not represent the same thing. We recommended earlier that you have the CreateIoCompletionPort function specify one thread per processor to avoid thread context switching. The NumberOfConcurrentThreads parameter of CreateIoCompletionPort explicitly tells the system to allow only n threads to operate at a time on the completion port. If you create more than n worker threads on the completion port, only n threads will be allowed to operate at a time. (Actually, the system might exceed this value for a short amount of time, but it will quickly bring the count back down to the value you specify in CreateIoCompletionPort.)

You might be wondering why you would create more worker threads than the number specified by the CreateIoCompletionPort call. As we mentioned earlier, this depends on the overall design of your application. If one of your worker threads calls a function such as Sleep or WaitForSingleObject and becomes suspended, another thread will be allowed to operate in its place. In other words, you always want to have as many threads available for execution as the number of threads you allow to execute in the CreateIoCompletionPort call. Thus, if you expect your worker threads to ever become blocked, it is reasonable to create more worker threads than the value specified in CreateIoCompletionPort's NumberOfConcurrentThreads parameter.

Once you have enough worker threads to service I/O requests on the completion port, you can begin to associate socket handles with the completion port. This requires calling the CreateIoCompletionPort function on an existing completion port and supplying the first three parameters—FileHandle, ExistingCompletionPort, and CompletionKey—with socket information. The FileHandle parameter represents a socket handle to associate with the completion port. The ExistingCompletionPort parameter identifies the completion port. The CompletionKey parameter identifies per-handle data that you can associate with a particular socket handle. Applications are free to store any type of information associated with a socket by using this key. We call it per-handle data because it represents data associated with a socket handle. It is useful to store the socket handle using the key as a pointer to a data structure containing the socket handle and other socket-specific information. As we will see later in this chapter, the thread routines that service the completion port can retrieve socket-handle-specific information using this key.
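The round trip that the completion key makes, from a pointer to your per-handle data to a pointer-sized integer and back, can be sketched on its own. This is an illustration only: the single-field PER_HANDLE_DATA layout, the SocketHandle stand-in for SOCKET, and the three helper names are assumptions, and uintptr_t stands in for the pointer-sized key type so the sketch compiles without the Windows SDK.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the Winsock SOCKET type (an assumption of this sketch). */
typedef uintptr_t SocketHandle;

/* Hypothetical per-handle data: everything a worker thread needs to
   know about one socket. Real applications usually add more fields. */
typedef struct {
    SocketHandle Socket;
} PER_HANDLE_DATA;

/* Allocate per-handle data for a socket and return it as the
   pointer-sized value you would pass as CompletionKey.
   Returns 0 if the allocation fails. */
uintptr_t MakeCompletionKey(SocketHandle s)
{
    PER_HANDLE_DATA *data =
        (PER_HANDLE_DATA *)calloc(1, sizeof(*data));
    if (data == NULL)
        return 0;
    data->Socket = s;
    return (uintptr_t)data;
}

/* Recover the per-handle data from a key, as a worker thread would
   after the completion port hands the key back. */
PER_HANDLE_DATA *FromCompletionKey(uintptr_t key)
{
    return (PER_HANDLE_DATA *)key;
}

/* Release the per-handle data once the socket has been closed. */
void FreeCompletionKey(uintptr_t key)
{
    free((void *)key);
}
```

Because the key is only as wide as a pointer, anything larger than a single handle has to live in a heap-allocated structure like this one, and the application owns its lifetime: free it only after no further completions can arrive for that socket.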

Let's begin to construct a basic application framework from what we've learned so far. Figure 8-9 demonstrates how to start developing an echo server application using the completion port model. In this figure, we take the following preparation steps:

1. Create a completion port. The fourth parameter is left as 0, specifying that only one worker thread per processor will be allowed to execute at a time on the completion port.

2. Determine how many processors exist on the system.

3. Create worker threads to service completed I/O requests on the completion port using processor information in step 2. In the case of this simple example, we create one worker thread per processor because we do not expect our threads to ever get in a suspended condition in which there would not be enough threads to execute for each processor. When the CreateThread function is called, you must supply a worker routine that the thread executes upon creation. We will discuss the worker thread's responsibilities later in this section.

4. Prepare a listening socket to listen for connections on port 5150.

5. Accept inbound connections using the accept function.

6. Create a data structure to represent per-handle data and save the accepted socket handle in the structure.

7. Associate the new socket handle returned from accept with the completion port by calling CreateIoCompletionPort. Pass the per-handle data structure to CreateIoCompletionPort via the completion key parameter.

8. Start processing I/O on the accepted connection. Essentially, you want to post one or more asynchronous WSARecv or WSASend requests on the new socket using the overlapped I/O mechanism. When these I/O requests complete, a worker thread services the I/O requests and continues processing future I/O requests, as we will see later in the worker routine specified in step 3.

9. Repeat steps 5-8 until server terminates.

Figure 8-9. Setting up a completion port

    StartWinsock();

    // Step 1:
    // Create an I/O completion port
    CompletionPort = CreateIoCompletionPort(
        INVALID_HANDLE_VALUE, NULL, 0, 0);

    // Step 2:
    // Determine how many processors are on the system
    GetSystemInfo(&SystemInfo);

    // Step 3:
    // Create worker threads based on the number of
    // processors available on the system. For this
    // simple case, we create one worker thread for each
    // processor.
    for(i = 0; i < SystemInfo.dwNumberOfProcessors;
        i++)
    {
        HANDLE ThreadHandle;

        // Create a server worker thread, and pass the
        // completion port to the thread. NOTE: the
        // ServerWorkerThread procedure is not defined
        // in this listing.
        ThreadHandle = CreateThread(NULL, 0,
            ServerWorkerThread, CompletionPort,
            0, &ThreadID);

        // Close the thread handle
        CloseHandle(ThreadHandle);
    }

    // Step 4:
    // Create a listening socket
    Listen = WSASocket(AF_INET, SOCK_STREAM, 0, NULL, 0,
        WSA_FLAG_OVERLAPPED);

    InternetAddr.sin_family = AF_INET;
    InternetAddr.sin_addr.s_addr = htonl(INADDR_ANY);
    InternetAddr.sin_port = htons(5150);
    bind(Listen, (PSOCKADDR) &InternetAddr,
        sizeof(InternetAddr));

    // Prepare socket for listening
    listen(Listen, 5);

    while(TRUE)
    {
        // Step 5:
        // Accept connections and assign to the completion
        // port
        Accept = WSAAccept(Listen, NULL, NULL, NULL, 0);

        // Step 6:
        // Create per-handle data information structure to
        // associate with the socket
        PerHandleData = (LPPER_HANDLE_DATA)
            GlobalAlloc(GPTR, sizeof(PER_HANDLE_DATA));

        printf("Socket number %d connected\n", (int) Accept);
        PerHandleData->Socket = Accept;

        // Step 7:
        // Associate the accepted socket with the
        // completion port. The key is cast to a
        // pointer-sized integer so that the pointer is
        // not truncated.
        CreateIoCompletionPort((HANDLE) Accept,
            CompletionPort, (ULONG_PTR) PerHandleData, 0);

        // Step 8:
        //  Start processing I/O on the accepted socket.
        //  Post one or more WSASend() or WSARecv() calls
        //  on the socket using overlapped I/O.
        WSARecv(...);
    }

Completion ports and overlapped I/O

After associating a socket handle with a completion port, you can begin processing I/O requests by posting send and receive requests on the socket handle. You can now start to rely on the completion port for I/O completion notification. Essentially, the completion port model takes advantage of the Win32 overlapped I/O mechanism in which Winsock API calls such as WSASend and WSARecv return immediately when called. It is up to your application to retrieve the results of the calls at a later time through an OVERLAPPED structure. In the completion port model, this is accomplished by having one or more worker threads wait on the completion port using the GetQueuedCompletionStatus function, which is defined as

    BOOL GetQueuedCompletionStatus(
        HANDLE CompletionPort,
        LPDWORD lpNumberOfBytesTransferred,
        PULONG_PTR lpCompletionKey,
        LPOVERLAPPED * lpOverlapped,
        DWORD dwMilliseconds
    );

The CompletionPort parameter represents the completion port to wait on. The lpNumberOfBytesTransferred parameter receives the number of bytes transferred after a completed I/O operation, such as WSASend or WSARecv. The lpCompletionKey parameter returns per-handle data for the socket that was originally passed into the CreateIoCompletionPort function. As we mentioned earlier, we recommend saving the socket handle in this key. The lpOverlapped parameter receives a pointer to the OVERLAPPED structure that was specified when the completed I/O operation was started. This is actually an important parameter because it can be used to retrieve per-I/O operation data. The final parameter, dwMilliseconds, specifies the number of milliseconds that the caller is willing to wait for a completion packet to appear on the completion port. If you specify INFINITE, the call waits forever.

Per-handle data and per-I/O operation data

When a worker thread receives I/O completion notification from the GetQueuedCompletionStatus API call, the lpCompletionKey and lpOverlapped parameters contain socket information that can be used to continue processing I/O on a socket through the completion port. Two types of important socket data are available through these parameters: per-handle data and per-I/O operation data.

The lpCompletionKey parameter contains what we call per-handle data because the data is related to a socket handle when a socket is first associated with the completion port. This is the data that is passed as the CompletionKey parameter of the CreateIoCompletionPort API call. As we noted earlier, your application can pass any type of socket information through this parameter. Typically, applications will store the socket handle related to the I/O request here.

The lpOverlapped parameter contains an OVERLAPPED structure followed by what we call per-I/O operation data, which is anything that your worker thread will need to know when processing a completion packet (echo the data back, accept the connection, post another read, and so on). Per-I/O operation data is any number of bytes attached to the end of an OVERLAPPED structure that you pass into a function that expects an OVERLAPPED structure. A simple way to make this work is to define a structure and place an OVERLAPPED structure as the first element of the new structure. For example, we declare the following data structure to manage per-I/O operation data:

    typedef struct
    {
        OVERLAPPED Overlapped;
        WSABUF     DataBuf;
        CHAR       Buffer[DATA_BUFSIZE];
        BOOL       OperationType;
    } PER_IO_OPERATION_DATA;

This structure demonstrates some important data elements you might want to relate to an I/O operation, such as the type of I/O operation (a send or receive request) that just completed. The structure also carries the data buffer for the I/O operation, which is useful once the operation completes. To call a Winsock API function that expects an OVERLAPPED structure, you can either cast a pointer to your structure to an OVERLAPPED pointer or pass the address of the OVERLAPPED member of your structure. For example,

    PER_IO_OPERATION_DATA PerIoData;

    // You would call a function either as
    WSARecv(socket, ..., (OVERLAPPED *)&PerIoData);

    // or as
    WSARecv(socket, ..., &(PerIoData.Overlapped));

Later in the worker thread, when GetQueuedCompletionStatus returns with a pointer to an overlapped structure (and a completion key), you can determine which operation was posted on this handle by examining the OperationType member. (Just cast the returned overlapped pointer to a pointer to your PER_IO_OPERATION_DATA structure.) One of the biggest benefits of per-I/O operation data is that it allows you to manage multiple I/O operations (read/write, multiple reads, multiple writes, and so on) on the same handle. You might ask why you would want to post more than one I/O operation at a time on a socket. The answer is scalability. For example, if you have a multiple-processor machine with a worker thread using each processor, you could potentially have several processors sending and receiving data on a socket at the same time.
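The cast from the returned overlapped pointer back to the per-I/O operation data can be sketched in isolation. The fragment below is a minimal illustration, not code from the companion CD. The helper name RecoverPerIoData, the DATA_BUFSIZE value, and the RECV_POSTED and SEND_POSTED values are assumptions made for the example, and the #else branch only supplies simplified stand-in types so that the fragment also compiles off Windows.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <winsock2.h>   /* real OVERLAPPED, WSABUF, BOOL, CHAR */
#else
/* Stand-in types (assumption): simplified so the sketch compiles
   without the Windows headers. They are not the real layouts. */
typedef struct { void *Internal, *InternalHigh, *Pointer, *hEvent; } OVERLAPPED;
typedef struct { unsigned long len; char *buf; } WSABUF;
typedef int  BOOL;
typedef char CHAR;
#endif

#define DATA_BUFSIZE 8192   /* assumed buffer size */
#define RECV_POSTED  0      /* assumed operation codes */
#define SEND_POSTED  1

typedef struct
{
    OVERLAPPED Overlapped;            /* must remain the first member */
    WSABUF     DataBuf;
    CHAR       Buffer[DATA_BUFSIZE];
    BOOL       OperationType;
} PER_IO_OPERATION_DATA;

/* Hypothetical helper: map the OVERLAPPED pointer handed back by
   GetQueuedCompletionStatus to the structure that contains it. */
static PER_IO_OPERATION_DATA *RecoverPerIoData(OVERLAPPED *Overlapped)
{
    /* The plain cast is valid only while Overlapped sits at offset 0. */
    assert(offsetof(PER_IO_OPERATION_DATA, Overlapped) == 0);
    return (PER_IO_OPERATION_DATA *) Overlapped;
}
```

A worker thread would call RecoverPerIoData on the pointer it receives and then branch on OperationType. If the OVERLAPPED member cannot be kept first, the Windows CONTAINING_RECORD macro performs the same mapping from a member address to its enclosing structure.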

To complete the simple echo server sample from above, we need to supply a ServerWorkerThread function. Figure 8-10 outlines how to develop a worker thread routine that uses per-handle data and per-I/O operation data to service I/O requests.

Figure 8-10. Completion port worker thread

    DWORD WINAPI ServerWorkerThread(
        LPVOID CompletionPortID)
    {
        HANDLE CompletionPort = (HANDLE) CompletionPortID;
        DWORD BytesTransferred;
        LPOVERLAPPED Overlapped;
        LPPER_HANDLE_DATA PerHandleData;
        LPPER_IO_OPERATION_DATA PerIoData;
        DWORD SendBytes, RecvBytes;
        DWORD Flags;

        while(TRUE)
        {
            // Wait for I/O to complete on any socket
            // associated with the completion port
            GetQueuedCompletionStatus(CompletionPort,
                &BytesTransferred,(LPDWORD)&PerHandleData,
                (LPOVERLAPPED *) &PerIoData, INFINITE);

            // First check to see whether an error has occurred
            // on the socket; if so, close the
            // socket and clean up the per-handle data
            // and per-I/O operation data associated with
            // the socket
            if (BytesTransferred == 0 &&
                (PerIoData->OperationType == RECV_POSTED ||
                 PerIoData->OperationType == SEND_POSTED))
            {
                // A zero BytesTransferred indicates that the
                // socket has been closed by the peer, so
                // you should close the socket. Note:
                // Per-handle data was used to reference the
                // socket associated with the I/O operation.
                closesocket(PerHandleData->Socket);

                GlobalFree(PerHandleData);
                GlobalFree(PerIoData);
                continue;
            }

            // Service the completed I/O request. You can
            // determine which I/O request has just
            // completed by looking at the OperationType
            // field contained in the per-I/O operation data.
            if (PerIoData->OperationType == RECV_POSTED)
            {
                // Do something with the received data
                // in PerIoData->Buffer
            }

            // Post another WSASend or WSARecv operation.
            // As an example, we will post another WSARecv()
            // I/O operation.
            Flags = 0;

            // Set up the per-I/O operation data for the next
            // overlapped call
            ZeroMemory(&(PerIoData->Overlapped),
                sizeof(OVERLAPPED));

            PerIoData->DataBuf.len = DATA_BUFSIZE;
            PerIoData->DataBuf.buf = PerIoData->Buffer;
            PerIoData->OperationType = RECV_POSTED;

            WSARecv(PerHandleData->Socket,
                &(PerIoData->DataBuf), 1, &RecvBytes,
                &Flags, &(PerIoData->Overlapped), NULL);
        }
    }

One final detail not outlined in the simple server examples in Figures 8-9 and 8-10 or on the companion CD is how to properly close an I/O completion port, especially if you have one or more threads in progress performing I/O on several sockets. The main thing to avoid is freeing an OVERLAPPED structure when an overlapped I/O operation is in progress. The best way to prevent this is to call closesocket on every socket handle—any overlapped I/O operations pending will complete. Once all socket handles are closed, you need to terminate all worker threads on the completion port. This can be accomplished by sending a special completion packet to each worker thread using the PostQueuedCompletionStatus function, which informs each thread to exit immediately. PostQueuedCompletionStatus is defined as

    BOOL PostQueuedCompletionStatus(
        HANDLE CompletionPort,
        DWORD dwNumberOfBytesTransferred,
        DWORD dwCompletionKey,
        LPOVERLAPPED lpOverlapped
    );

The CompletionPort parameter represents the completion port object to which you want to send a completion packet. The dwNumberOfBytesTransferred, dwCompletionKey, and lpOverlapped parameters each allow you to specify a value that will be sent directly to the corresponding parameter of the GetQueuedCompletionStatus function. Thus, when a worker thread receives these three values from GetQueuedCompletionStatus, it can determine when it should exit based on a special value set in one of them. For example, you could pass the value 0 in the dwCompletionKey parameter, which a worker thread could interpret as an instruction to terminate. Once all the worker threads have exited, you can close the completion port using the CloseHandle function and finally exit your program safely.
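As a rough sketch of how a worker thread might act on such a special value, the fragment below classifies what GetQueuedCompletionStatus hands back. It is an illustration under stated assumptions rather than part of the sample server: the names PACKET_KIND, ClassifyPacket, and EXIT_KEY are hypothetical, plain C types stand in for BOOL, DWORD, and the completion key so that the logic can be read and compiled on its own, and the sketch assumes that the only packet ever posted by hand is the exit packet with a key of 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXIT_KEY ((uintptr_t) 0)   /* assumed sentinel: completion key 0 */

typedef enum
{
    PACKET_NONE,         /* nothing was dequeued (time-out or port failure) */
    PACKET_EXIT,         /* exit packet posted by the main thread */
    PACKET_IO_FAILED,    /* a failed I/O request was dequeued */
    PACKET_PEER_CLOSED,  /* zero bytes transferred on a socket request */
    PACKET_IO_OK         /* a successful I/O request to service */
} PACKET_KIND;

/* ok         : return value of GetQueuedCompletionStatus
   bytes      : value stored through lpNumberOfBytesTransferred
   key        : value stored through lpCompletionKey
   overlapped : value stored through lpOverlapped */
static PACKET_KIND ClassifyPacket(int ok, unsigned long bytes,
                                  uintptr_t key, const void *overlapped)
{
    /* Failure with no OVERLAPPED pointer: no packet was dequeued,
       so the other values must not be used. */
    if (!ok && overlapped == NULL)
        return PACKET_NONE;

    /* Success with the sentinel key: the thread should exit. */
    if (ok && key == EXIT_KEY)
        return PACKET_EXIT;

    /* Failure with an OVERLAPPED pointer: that request failed. */
    if (!ok)
        return PACKET_IO_FAILED;

    /* The worker thread treats zero bytes as a closed connection. */
    if (bytes == 0)
        return PACKET_PEER_CLOSED;

    return PACKET_IO_OK;
}
```

With this arrangement the main thread would post one exit packet per worker thread, for example PostQueuedCompletionStatus(CompletionPort, 0, 0, NULL), and then wait for the threads to finish before calling CloseHandle on the port. A key of 0 is a safe sentinel here because every real key is a pointer to allocated per-handle data and therefore nonzero.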

Other issues

Several techniques can further improve overall I/O performance of a socket application using completion ports. One technique worth considering is experimenting with socket buffer sizes to increase I/O performance and application scalability. For example, if your application uses one large buffer with only one WSARecv request instead of three small buffers with three WSARecv requests, your application will not scale well to multiprocessor machines. This is because a single buffer can be processed by only one thread at a time. Furthermore, the single-buffer approach has performance consequences: you might not be keeping the network protocol driver busy enough if you are doing only one receive operation at a time. That is, if you wait for one WSARecv to complete before you receive more data, you effectively let the protocol rest between the WSARecv completion and the next receive.

Another performance gain worth considering results from using the socket options SO_SNDBUF and SO_RCVBUF to control the size of internal socket buffers. These options allow an application to change the size of the internal data buffer of a socket. If you set this value to 0, Winsock will use your application buffer directly in an overlapped I/O call to transmit data to and from the protocol stack, thereby reducing a buffer copy between your application and Winsock. The following code fragment demonstrates how to call the setsockopt function using the SO_SNDBUF option:

    int nZero = 0;

    setsockopt(socket, SOL_SOCKET, SO_SNDBUF,
        (char *)&nZero, sizeof(nZero));

Note that setting these buffer sizes to 0 has only a positive impact on your application when multiple I/O requests are posted at a given time. Chapter 9 describes these socket options in greater detail.
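The same call can be applied to both options at once. The following is a minimal sketch that assumes a hypothetical helper named SetSocketBuffers and a socket_t typedef introduced only for this example. The non-Windows branch is there so that the fragment compiles elsewhere. The zero-size behavior described above is specific to Winsock, and other socket implementations may round the requested size up to a minimum.

```c
#include <assert.h>

#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET socket_t;      /* hypothetical alias for this sketch */
#else
#include <sys/socket.h>
typedef int socket_t;
#endif

/* Hypothetical helper: request the same size, in bytes, for the
   internal send and receive buffers of socket s. Returns 0 on
   success or -1 if either setsockopt call fails. */
static int SetSocketBuffers(socket_t s, int bytes)
{
    assert(bytes >= 0);

    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF,
            (const char *) &bytes, sizeof(bytes)) != 0)
        return -1;

    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF,
            (const char *) &bytes, sizeof(bytes)) != 0)
        return -1;

    return 0;
}
```

A server would call SetSocketBuffers(Accept, 0) on each accepted socket before posting its overlapped requests. On Windows, Winsock must already have been initialized with WSAStartup. Checking the return value matters because a failed call leaves the default buffer sizes in place.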

A final performance gain worth considering results from using the AcceptEx API call for connection requests that deliver small amounts of data. This allows your application to service an accept request and retrieve data through a single API call, thereby reducing the overhead of separate accept and WSARecv calls. As an added benefit, you can service AcceptEx requests using the completion port because AcceptEx also takes an OVERLAPPED structure. AcceptEx is useful if your server expects to handle a small number of recv-send transactions once a connection is established (as with a Web server). Otherwise, if your application is performing hundreds or thousands of data transfers after accepting a connection, this operation offers no real gain.

On a final note, Winsock applications should not use the ReadFile and WriteFile Win32 functions for processing socket I/O on a completion port. These functions do take an OVERLAPPED structure and can be successfully used on a completion port; however, the WSARecv and WSASend functions are better optimized for processing I/O in Winsock 2. Using ReadFile and WriteFile involves many more unnecessary transitions between kernel mode and user mode, more thread context switches, and more parameter marshaling, resulting in a significant performance penalty.