Socket IO Models | Network Programming for Microsoft Windows (Microsoft Professional Series)

Socket I/O Models

Essentially, six types of socket I/O models are available that allow Winsock applications to manage I/O: blocking, select, WSAAsyncSelect, WSAEventSelect, overlapped, and completion port. This section explains the features of each I/O model and outlines how to use it to develop an application that can manage one or more socket requests. On the companion CD, you will find sample applications for each I/O model demonstrating how to develop a simple TCP echo server using the principles described in each model.

Note that technically speaking, there could be a straight non-blocking I/O model—that is, an application that places all sockets into non-blocking mode with ioctlsocket. However, this soon becomes unmanageable because the application will spend most of its time cycling through socket handles and I/O operations until they succeed.

The blocking Model

Most Winsock programmers begin with the blocking model because it is the easiest and most straightforward model. The Winsock samples in Chapter 1 use this model. As we have mentioned, applications following this model typically use one or two threads per socket connection for handling I/O. Each thread will then issue blocking operations, such as send and recv.

The advantage of the blocking model is its simplicity. For very simple applications and rapid prototyping, this model is very useful. The disadvantage is that it does not scale up to many connections, because each additional thread consumes valuable system resources.

The select Model

The select model is another I/O model widely available in Winsock. We call it the select model because it centers on using the select function to manage I/O. The design of this model originated on UNIX-based computers featuring Berkeley socket implementations. The select model was incorporated into Winsock 1.1 to give applications that want to avoid blocking on socket calls a way to manage multiple sockets in an organized manner. Because Winsock 1.1 is backward-compatible with Berkeley socket implementations, a Berkeley socket application that uses the select function should, in principle, be able to run without modification.

The select function can be used to determine if there is data on a socket and if a socket can be written to. The reason for having this function is to prevent your application from blocking on an I/O bound call such as send or recv when a socket is in a blocking mode and to prevent the WSAEWOULDBLOCK error when a socket is in a non-blocking mode. The select function blocks for I/O operations until the conditions specified as parameters are met. The function prototype for select is as follows:

```c
int select(
    int nfds,
    fd_set FAR * readfds,
    fd_set FAR * writefds,
    fd_set FAR * exceptfds,
    const struct timeval FAR * timeout
);
```

The first parameter, nfds, is ignored and is included only for compatibility with Berkeley socket applications. You'll notice that there are three fd_set parameters: one for checking readability (readfds), one for writability (writefds), and one for out-of-band data (exceptfds). Essentially, the fd_set data type represents a collection of sockets. The readfds set identifies sockets that meet one of the following conditions:

Data is available for reading.
Connection has been closed, reset, or terminated.
If listen has been called and a connection is pending, the accept function will succeed.

The writefds set identifies sockets in which one of the following is true:

Data can be sent.
If a non-blocking connect call is being processed, the connection has succeeded.

Finally, the exceptfds set identifies sockets in which one of the following is true:

If a non-blocking connect call is being processed, the connection attempt failed.
OOB data is available for reading.

For example, when you want to test a socket for readability, you must add it to the readfds set and wait for the select function to complete. When the select call completes, you have to determine if your socket is still part of the readfds set. If so, the socket is readable—you can begin to retrieve data from it. Any two of the three parameters (readfds, writefds, exceptfds) can be null values (at least one must not be null), and any non-null set must contain at least one socket handle; otherwise, the select function won't have anything to wait for. The final parameter, timeout, is a pointer to a timeval structure that determines how long the select function will wait for I/O to complete. If timeout is a null pointer, select will block indefinitely until at least one descriptor meets the specified criteria. The timeval structure is defined as

```c
struct timeval
{
    long tv_sec;
    long tv_usec;
};
```

The tv_sec field indicates how long to wait in seconds; the tv_usec field indicates how long to wait in microseconds, and the total timeout is the sum of the two. The timeout value {0, 0} indicates select will return immediately, allowing an application to poll on the select operation. This should be avoided for performance reasons. When select completes successfully, it returns the total number of socket handles that have I/O operations pending in the fd_set structures. If the timeval limit expires, it returns 0. If select fails for any reason, it returns SOCKET_ERROR.
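
Because the two timeval fields use different units, converting a millisecond timeout is an easy place to make a mistake. The following minimal sketch shows the conversion; the helper name ms_to_timeval is hypothetical, and the conditional include exists only so the sketch also compiles against POSIX headers, where struct timeval comes from sys/time.h rather than WINSOCK2.H.

```c
#include <assert.h>
#ifdef _WIN32
#include <winsock2.h>   // struct timeval is declared here on Windows
#else
#include <sys/time.h>   // and here on POSIX systems
#endif

// Hypothetical helper: converts a timeout in milliseconds into the
// seconds + microseconds pair that select() expects.
struct timeval ms_to_timeval(long ms)
{
    struct timeval tv;

    assert(ms >= 0);
    tv.tv_sec = ms / 1000;              // whole seconds
    tv.tv_usec = (ms % 1000) * 1000;    // leftover milliseconds, in microseconds
    return tv;
}
```

A 2.5-second wait would then be written as `struct timeval tv = ms_to_timeval(2500);` followed by `select(0, &fdread, NULL, NULL, &tv)`.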

Before you can begin to use select to monitor sockets, your application has to set up one or more of the read, write, and exception fd_set structures by assigning socket handles to a set. When you assign a socket to one of the sets, you are asking select to let you know if the I/O activities just described have occurred on a socket. Winsock provides the following set of macros to manipulate and check the fd_set sets for I/O activity.

FD_ZERO(*set) Initializes set to the empty set. A set should always be cleared before use.
FD_CLR(s, *set) Removes socket s from set.
FD_ISSET(s, *set) Checks to see if s is a member of set and returns TRUE if so.
FD_SET(s, *set) Adds socket s to set.

For example, if you want to find out when it is safe to read data from a socket without blocking, simply assign your socket to the readfds set using the FD_SET macro and then call select. To test whether your socket is still part of the readfds set, use the FD_ISSET macro. The following five steps describe the basic flow of an application that uses select with one or more socket handles:

1. Initialize each fd_set of interest by using the FD_ZERO macro.
2. Assign socket handles to each of the fd_set sets of interest by using the FD_SET macro.
3. Call the select function and wait until I/O activity occurs on at least one of the socket handles in the fd_set sets provided. When select completes, it returns the total number of socket handles that are set in all of the fd_set sets and updates each set accordingly.
4. Using the return value of select, your application can determine which sockets have I/O pending by checking each fd_set set using the FD_ISSET macro.
5. After determining which sockets have I/O pending in each of the sets, process the I/O and go to step 1 to continue the select process.

When select returns, it modifies each of the fd_set structures by removing the socket handles that do not have pending I/O operations. This is why you should use the FD_ISSET macro as in step 4 to determine if a particular socket is part of a set. The following code sample outlines the basic steps needed to set up the select model for a single socket. Adding more sockets to this application simply involves maintaining a list or an array of additional sockets.

```c
SOCKET  s;
fd_set  fdread;
int     ret;

// Create a socket, and accept a connection

// Manage I/O on the socket
while(TRUE)
{
    // Always clear the read set before calling
    // select()
    FD_ZERO(&fdread);

    // Add socket s to the read set
    FD_SET(s, &fdread);

    if ((ret = select(0, &fdread, NULL, NULL, NULL))
        == SOCKET_ERROR)
    {
        // Error condition
    }

    if (ret > 0)
    {
        // For this simple case, select() should return
        // the value 1. An application dealing with
        // more than one socket could get a value
        // greater than 1. At this point, your
        // application should check to see whether the
        // socket is part of a set.
        if (FD_ISSET(s, &fdread))
        {
            // A read event has occurred on socket s
        }
    }
}
```

The advantage of using select is the capability to multiplex connections and I/O on many sockets from a single thread. This prevents the explosion of threads associated with blocking sockets and multiple connections. The disadvantage is the limit on the number of sockets that may be added to the fd_set structures. By default, the maximum is FD_SETSIZE, which is defined in WINSOCK2.H as 64. To increase this limit, an application can define FD_SETSIZE as a larger value; this define must appear before WINSOCK2.H is included. Also, the underlying provider imposes its own maximum fd_set size, which typically is 1024 but is not guaranteed to be. Finally, for a large FD_SETSIZE, consider the performance cost of adding 1000 sockets to a set before every call to select and then checking each of those 1000 sockets with FD_ISSET after the call returns.

The WSAAsyncSelect Model

Winsock provides a useful asynchronous I/O model that allows an application to receive Windows message-based notification of network events on a socket. This is accomplished by calling the WSAAsyncSelect function after creating a socket. Before we continue, however, we need to make one subtle distinction. The WSAAsyncSelect and WSAEventSelect models provide asynchronous notification of the capability to read or write data. They do not provide asynchronous data transfer like the overlapped and completion port models.

This model originally existed in Winsock 1.1 implementations to help application programmers cope with the cooperative multitasking message-based environment of 16-bit Windows platforms, such as Windows for Workgroups. Applications can still benefit from this model, especially if they manage window messages in a standard Windows procedure, usually referred to as a winproc. This model is also used by the Microsoft Foundation Class (MFC) CSocket object.

Message Notification

To use the WSAAsyncSelect model, your application must first create a window using the CreateWindow function and supply a window procedure (winproc) support function for it. You can also use a dialog box with a dialog procedure instead of a window because dialog boxes are windows. For our purposes, we will demonstrate this model using a simple window with a supporting window procedure. Once you have set up the window infrastructure, you can begin creating sockets and turning on window message notification by calling the WSAAsyncSelect function, which is defined as

```c
int WSAAsyncSelect(
    SOCKET s,
    HWND hWnd,
    unsigned int wMsg,
    long lEvent
);
```

The s parameter represents the socket we are interested in. The hWnd parameter is a window handle identifying the window or the dialog box that receives a message when a network event occurs. The wMsg parameter identifies the message to be received when a network event occurs. This message is posted to the window that is identified by the hWnd window handle. Applications usually set this message to a value greater than the Windows WM_USER value to avoid confusing a network window message with a predefined standard window message. The last parameter, lEvent, represents a bitmask that specifies a combination of network events—listed in Table 5-3—that the application is interested in. Most applications are typically interested in the FD_READ, FD_WRITE, FD_ACCEPT, FD_CONNECT, and FD_CLOSE network event types. Of course, the use of the FD_ACCEPT or the FD_CONNECT type depends on whether your application is a client or a server. If your application is interested in more than one network event, simply set this field by performing a bitwise OR on the types and assigning them to lEvent. For example:

```c
WSAAsyncSelect(s, hwnd, WM_SOCKET,
    FD_CONNECT | FD_READ | FD_WRITE | FD_CLOSE);
```

This allows our application to get connect, send, receive, and socket-closure network event notifications on socket s. Events cannot be registered one at a time on a socket: each call to WSAAsyncSelect replaces the events registered by the previous call, so all events of interest must be specified in a single call. Also note that once you turn on event notification on a socket, it remains on until the socket is closed by a call to closesocket or the application changes the registered network event types by calling WSAAsyncSelect again on the socket. Setting the lEvent parameter to 0 effectively stops all network event notification on the socket.

When your application calls WSAAsyncSelect on a socket, the socket mode is automatically changed from blocking to the non-blocking mode that we described previously. As a result, if a Winsock I/O call such as WSARecv is called and has to wait for data, it will fail with error WSAEWOULDBLOCK. To avoid this error, applications should rely on the user-defined window message specified in the wMsg parameter of WSAAsyncSelect to indicate when network event types occur on the socket.

**Table 5-3** *Network Event Types for the WSAAsyncSelect Function*
| Event Type | Meaning |
|---|---|
| FD_READ | The application wants to receive notification of readiness for reading. |
| FD_WRITE | The application wants to receive notification of readiness for writing. |
| FD_OOB | The application wants to receive notification of the arrival of OOB data. |
| FD_ACCEPT | The application wants to receive notification of incoming connections. |
| FD_CONNECT | The application wants to receive notification of a completed connection or a multipoint join operation. |
| FD_CLOSE | The application wants to receive notification of socket closure. |
| FD_QOS | The application wants to receive notification of socket QOS changes. |
| FD_GROUP_QOS | The application wants to receive notification of socket group QOS changes (reserved for future use with socket groups). |
| FD_ROUTING_INTERFACE_CHANGE | The application wants to receive notification of routing interface changes for the specified destination(s). |
| FD_ADDRESS_LIST_CHANGE | The application wants to receive notification of local address list changes for the socket's protocol family. |

After your application successfully calls WSAAsyncSelect on a socket, the application begins to receive network event notification as Windows messages in the window procedure associated with the hWnd parameter window handle. A window procedure is normally defined as

```c
LRESULT CALLBACK WindowProc(
    HWND hWnd,
    UINT uMsg,
    WPARAM wParam,
    LPARAM lParam
);
```

The hWnd parameter is a handle to the window whose procedure is being called. The uMsg parameter indicates which message needs to be processed; here, you will be looking for the message defined in the WSAAsyncSelect call. The wParam parameter identifies the socket on which a network event has occurred. This is important if you have more than one socket assigned to this window procedure. The lParam parameter contains two important pieces of information: the low word of lParam specifies the network event that has occurred, and the high word of lParam contains any error code.

When network event messages arrive at a window procedure, the application should first check the lParam high-word bits to determine whether a network error has occurred on the socket. There is a special macro, WSAGETSELECTERROR, that returns the value of the high-word bits error information. After the application has verified that no error occurred on the socket, the application should determine which network event type caused the Windows message to fire by reading the low-word bits of lParam. Another special macro, WSAGETSELECTEVENT, returns the value of the low-word portion of lParam.
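
Both macros simply split lParam into its two 16-bit halves. The sketch below reproduces that split with its own SK_-prefixed names so that it compiles without the Windows headers; the names are hypothetical stand-ins for WSAGETSELECTEVENT, WSAGETSELECTERROR, and the packing macro WSAMAKESELECTREPLY, and the flag values are the ones WINSOCK2.H assigns. The helper checks the error word before the event word, in the order described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Event flags with the values WINSOCK2.H gives FD_READ, FD_ACCEPT, and
// FD_CLOSE. The SK_ prefix avoids clashing with the real names.
#define SK_FD_READ    0x01
#define SK_FD_ACCEPT  0x08
#define SK_FD_CLOSE   0x20

// Stand-ins for WSAGETSELECTEVENT (low word) and WSAGETSELECTERROR (high word)
#define SK_GETSELECTEVENT(lp) ((uint16_t) ((uint32_t) (lp) & 0xFFFFu))
#define SK_GETSELECTERROR(lp) ((uint16_t) (((uint32_t) (lp) >> 16) & 0xFFFFu))

// Stand-in for WSAMAKESELECTREPLY: event in the low word, error in the high word
#define SK_MAKESELECTREPLY(ev, err) \
    ((((uint32_t) (err) & 0xFFFFu) << 16) | ((uint32_t) (ev) & 0xFFFFu))

// Hypothetical helper: returns the network event in lParam, or 0 if the
// high word reports an error (the error code is stored in *error).
unsigned event_from_lparam(uint32_t lParam, unsigned *error)
{
    assert(error != NULL);
    *error = SK_GETSELECTERROR(lParam);
    if (*error != 0)
        return 0;
    return SK_GETSELECTEVENT(lParam);
}
```

For example, a connection reset reported with FD_CLOSE would arrive with WSAECONNRESET (10054) in the high word, and the helper would return 0 with that code in *error.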

The following example demonstrates how to manage window messages when using the WSAAsyncSelect I/O model. The code highlights the steps needed to develop a basic server application and removes the programming details of developing a fully featured Windows application.

```c
#include <winsock2.h>
#include <windows.h>

#define WM_SOCKET (WM_USER + 1)

LRESULT CALLBACK ServerWinProc(HWND hWnd, UINT wMsg,
    WPARAM wParam, LPARAM lParam);

int WINAPI WinMain(HINSTANCE hInstance,
    HINSTANCE hPrevInstance, LPSTR lpCmdLine,
    int nCmdShow)
{
    WSADATA wsd;
    SOCKET Listen;
    SOCKADDR_IN InternetAddr;
    HWND Window;
    MSG msg;

    // Create a window and assign the ServerWinProc
    // below to it (window class registration and the
    // CreateWindow arguments are omitted here)
    Window = CreateWindow(...);

    // Start Winsock and create a socket
    WSAStartup(MAKEWORD(2,2), &wsd);
    Listen = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    // Bind the socket to port 5150
    // and begin listening for connections
    InternetAddr.sin_family = AF_INET;
    InternetAddr.sin_addr.s_addr = htonl(INADDR_ANY);
    InternetAddr.sin_port = htons(5150);

    bind(Listen, (PSOCKADDR) &InternetAddr,
        sizeof(InternetAddr));

    // Set up window message notification on
    // the new socket using the WM_SOCKET define
    // above
    WSAAsyncSelect(Listen, Window, WM_SOCKET,
        FD_ACCEPT | FD_CLOSE);

    listen(Listen, 5);

    // Translate and dispatch window messages
    // until the application terminates
    while (GetMessage(&msg, NULL, 0, 0) > 0)
    {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }

    WSACleanup();
    return 0;
}

LRESULT CALLBACK ServerWinProc(HWND hWnd, UINT wMsg,
    WPARAM wParam, LPARAM lParam)
{
    SOCKET Accept;

    switch(wMsg)
    {
        case WM_SOCKET:
            // Determine whether an error occurred on the
            // socket by using the WSAGETSELECTERROR() macro
            if (WSAGETSELECTERROR(lParam))
            {
                // Display the error and close the socket
                closesocket((SOCKET) wParam);
                break;
            }

            // Determine what event occurred on the
            // socket
            switch(WSAGETSELECTEVENT(lParam))
            {
                case FD_ACCEPT:
                    // Accept an incoming connection
                    Accept = accept((SOCKET) wParam, NULL, NULL);

                    // Prepare accepted socket for read,
                    // write, and close notification
                    WSAAsyncSelect(Accept, hWnd, WM_SOCKET,
                        FD_READ | FD_WRITE | FD_CLOSE);
                    break;

                case FD_READ:
                    // Receive data from the socket in
                    // wParam
                    break;

                case FD_WRITE:
                    // The socket in wParam is ready
                    // for sending data
                    break;

                case FD_CLOSE:
                    // The connection is now closed
                    closesocket((SOCKET) wParam);
                    break;
            }
            break;

        default:
            // Let Windows handle all other messages,
            // including WM_PAINT
            return DefWindowProc(hWnd, wMsg, wParam, lParam);
    }
    return 0;
}
```

One final detail worth noting is how applications should process FD_WRITE event notifications. FD_WRITE notifications are sent under only three conditions:

After a socket is first connected with connect or WSAConnect
After a socket is accepted with accept or WSAAccept
When a send, WSASend, sendto, or WSASendTo operation fails with WSAEWOULDBLOCK and buffer space becomes available

Therefore, an application should assume that sends are always possible on a socket starting from the first FD_WRITE message and lasting until a send, WSASend, sendto, or WSASendTo returns the socket error WSAEWOULDBLOCK. After such failure, another FD_WRITE message notifies the application that sends are once again possible.
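
The FD_WRITE rule amounts to a small per-socket state machine. The following sketch models it in portable C; struct conn_state and its two handler functions are hypothetical, and the would_block flag stands in for a send, WSASend, sendto, or WSASendTo call failing with WSAEWOULDBLOCK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical per-socket state for the FD_WRITE rule: sends are allowed
// from the first FD_WRITE until a send fails with WSAEWOULDBLOCK, and
// allowed again when the next FD_WRITE arrives.
struct conn_state {
    bool can_send;
};

// Call from the FD_WRITE case of the window procedure.
void on_fd_write(struct conn_state *c)
{
    assert(c != NULL);
    c->can_send = true;
}

// Call after each send attempt; would_block is true when send, WSASend,
// sendto, or WSASendTo failed with WSAEWOULDBLOCK.
void on_send_result(struct conn_state *c, bool would_block)
{
    assert(c != NULL);
    if (would_block)
        c->can_send = false;   // stop sending until the next FD_WRITE
}
```

In a real window procedure, any data queued while can_send was false would be sent from the FD_WRITE case, right after on_fd_write runs.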

The WSAAsyncSelect model offers many advantages; foremost is the capability to handle many connections simultaneously without much overhead, unlike the select model's requirement of setting up the fd_set structures. The disadvantage is that the application needs a window even if it otherwise requires none (such as a service or console application). Also, having a single window procedure to service all the events on thousands of socket handles can become a performance bottleneck, which means this model does not scale very well.

The WSAEventSelect Model

Winsock provides another useful asynchronous I/O model that allows an application to receive event-based notification of network events on one or more sockets. This model is similar to the WSAAsyncSelect model because your application receives and processes the same network events listed in Table 5-3 that the WSAAsyncSelect model uses. The major difference is that network events are signaled through an event object handle instead of being posted to a window procedure.

Event Notification

The event notification model requires your application to create an event object for each socket used by calling the WSACreateEvent function, which is defined as

WSAEVENT WSACreateEvent(void);

The WSACreateEvent function simply returns a manual reset event object handle. Once you have an event object handle, you have to associate it with a socket and register the network event types of interest, as shown in Table 5-3. This is accomplished by calling the WSAEventSelect function, which is defined as

```c
int WSAEventSelect(
    SOCKET s,
    WSAEVENT hEventObject,
    long lNetworkEvents
);
```

The s parameter represents the socket of interest. The hEventObject parameter represents the event object—obtained with WSACreateEvent—to associate with the socket. The last parameter, lNetworkEvents, represents a bitmask that specifies a combination of network event types (listed in Table 5-3) that the application is interested in. For a detailed discussion of these event types, see the WSAAsyncSelect I/O model discussed previously.

The event created for WSAEventSelect has two operating states and two operating modes. The operating states are known as signaled and non-signaled. The operating modes are known as manual reset and auto reset. WSACreateEvent initially creates event handles in a non-signaled operating state with a manual reset operating mode. As network events trigger an event object associated with a socket, the operating state changes from non-signaled to signaled. Because the event object is created in a manual reset mode, your application is responsible for changing the operating state from signaled to non-signaled after processing an I/O request. This can be accomplished by calling the WSAResetEvent function, which is defined as

BOOL WSAResetEvent(WSAEVENT hEvent);

The function takes an event handle as its only parameter and returns TRUE or FALSE based on the success or failure of the call. When an application is finished with an event object, it should call the WSACloseEvent function to free the system resources used by an event handle. The WSACloseEvent function is defined as

BOOL WSACloseEvent(WSAEVENT hEvent);

This function also takes an event handle as its only parameter and returns TRUE if successful or FALSE if the call fails.
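
The difference between the two operating modes matters because WSACreateEvent always produces a manual-reset event. The following model is a sketch only, not Winsock code, and struct event_model and its functions are hypothetical. It shows that a manual-reset event stays signaled across successive waits until it is reset (the job of WSAResetEvent), whereas an auto-reset event returns to non-signaled as soon as one wait succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical model of an event object (not a Winsock type).
struct event_model {
    bool signaled;
    bool manual_reset;   // WSACreateEvent always creates manual-reset events
};

// A network event fires: the state becomes signaled.
void event_set(struct event_model *e)   { e->signaled = true; }

// What WSAResetEvent does: back to non-signaled.
void event_reset(struct event_model *e) { e->signaled = false; }

// Models a zero-timeout wait. Returns whether the event was signaled; an
// auto-reset event is cleared by the successful wait, a manual-reset one is not.
bool event_try_wait(struct event_model *e)
{
    bool was_signaled;

    assert(e != NULL);
    was_signaled = e->signaled;
    if (was_signaled && !e->manual_reset)
        e->signaled = false;
    return was_signaled;
}
```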

Once a socket is associated with an event object handle, the application can begin processing I/O by waiting for network events to trigger the operating state of the event object handle. The WSAWaitForMultipleEvents function is designed to wait on one or more event object handles and returns either when one or all of the specified handles are in the signaled state or when a specified timeout interval expires. WSAWaitForMultipleEvents is defined as

```c
DWORD WSAWaitForMultipleEvents(
    DWORD cEvents,
    const WSAEVENT FAR * lphEvents,
    BOOL fWaitAll,
    DWORD dwTimeout,
    BOOL fAlertable
);
```

The cEvents and lphEvents parameters define an array of WSAEVENT objects in which cEvents is the number of event objects in the array and lphEvents is a pointer to the array. WSAWaitForMultipleEvents can wait on at most WSA_MAXIMUM_WAIT_EVENTS objects, which is defined as 64. Therefore, this I/O model can support at most 64 sockets at a time for each thread that makes the WSAWaitForMultipleEvents call. If you need to have this model manage more than 64 sockets, you should create additional worker threads to wait on more event objects.

The fWaitAll parameter specifies how WSAWaitForMultipleEvents waits for objects in the event array. If TRUE, the function returns when all event objects in the lphEvents array are signaled. If FALSE, the function returns when any one of the event objects is signaled. In the latter case, the return value indicates which event object caused the function to return. Typically, applications set this parameter to FALSE and service one socket event at a time.

The dwTimeout parameter specifies how long (in milliseconds) WSAWaitForMultipleEvents will wait for a network event to occur. The function returns if the interval expires, even if conditions specified by the fWaitAll parameter are not satisfied. If the timeout value is 0, the function tests the state of the specified event objects and returns immediately, which effectively allows an application to poll on the event objects. If no events are ready for processing, WSAWaitForMultipleEvents returns WSA_WAIT_TIMEOUT. If dwTimeout is set to WSA_INFINITE, the function returns only when a network event signals an event object.

The final parameter, fAlertable, can be ignored when you're using the WSAEventSelect model and should be set to FALSE. It is intended for use in processing completion routines in the overlapped I/O model, which will be described later in this chapter.
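
The 64-handle limit turns into simple arithmetic when you plan worker threads. The following sketch assumes each thread waits on a full array of up to WSA_MAXIMUM_WAIT_EVENTS handles, mirrored here as SK_MAX_WAIT_EVENTS so that it compiles without Winsock; threads_needed and assign_slot are hypothetical helpers, and a real server would also have to account for the listening socket's event and for rebalancing as connections close.

```c
#include <assert.h>

// Mirrors WSA_MAXIMUM_WAIT_EVENTS (64) so the sketch builds without Winsock.
#define SK_MAX_WAIT_EVENTS 64

// Number of waiting threads needed so that no thread waits on more than
// SK_MAX_WAIT_EVENTS event objects (integer ceiling division).
int threads_needed(int sockets)
{
    assert(sockets >= 0);
    return (sockets + SK_MAX_WAIT_EVENTS - 1) / SK_MAX_WAIT_EVENTS;
}

// Hypothetical assignment: which worker thread, and which slot in that
// thread's event array, the socket with the given index would use.
void assign_slot(int socket_index, int *thread, int *slot)
{
    assert(socket_index >= 0);
    *thread = socket_index / SK_MAX_WAIT_EVENTS;
    *slot = socket_index % SK_MAX_WAIT_EVENTS;
}
```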

Note that by servicing signaled events one at a time (by setting the fWaitAll parameter to FALSE), it is possible to starve sockets toward the end of the event array. Consider the following code:

```c
WSAEVENT HandleArray[WSA_MAXIMUM_WAIT_EVENTS];
int WaitCount = 0, ret, index;

// Assign event handles into HandleArray

while (1)
{
    ret = WSAWaitForMultipleEvents(
        WaitCount,
        HandleArray,
        FALSE,
        WSA_INFINITE,
        FALSE);

    if ((ret != WSA_WAIT_FAILED) && (ret != WSA_WAIT_TIMEOUT))
    {
        index = ret - WSA_WAIT_EVENT_0;

        // Service event signaled on HandleArray[index]
        WSAResetEvent(HandleArray[index]);
    }
}
```

If the socket connection at index 0 of the event array is continually receiving data, so that additional data arrives after each reset and signals the event again, the rest of the events in the array are starved. This is clearly undesirable. Once an event within the loop is signaled and handled, all events in the array should be checked to see if they are signaled as well. This can be accomplished by calling WSAWaitForMultipleEvents on each individual event handle after the first signaled event, specifying a dwTimeout of zero.
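
The fix can be modeled without any sockets. When fWaitAll is FALSE, WSAWaitForMultipleEvents reports the lowest index among the signaled events, so a busy event at a low index wins every time. In the sketch below, a boolean array stands in for the event objects and the hypothetical poll_event function stands in for a zero-timeout WSAWaitForMultipleEvents call on a single handle; starting from the reported index, every later event is tested as well.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical stand-in for WSAWaitForMultipleEvents(1, &EventArray[i],
// TRUE, 0, FALSE): with a zero timeout it only tests the current state.
static bool poll_event(const bool *signaled, int i)
{
    return signaled[i];
}

// After a wait reports `first` as the lowest signaled index, test every
// later event too, so a busy socket at a low index cannot hide the rest.
// Stores the indices to service in out[] and returns how many.
int collect_signaled(const bool *signaled, int count, int first, int *out)
{
    int i, n = 0;

    assert(first >= 0 && first < count);
    for (i = first; i < count; i++) {
        if (poll_event(signaled, i))
            out[n++] = i;
    }
    return n;
}
```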

When WSAWaitForMultipleEvents receives network event notification of an event object, it returns a value indicating the event object that caused the function to return. As a result, your application can determine which network event type is available on a particular socket by referencing the signaled event in the event array and matching it with the socket associated with the event. When you reference the events in the event array, you should reference them using the return value of WSAWaitForMultipleEvents minus the predefined value WSA_WAIT_EVENT_0. For example:

```c
Index = WSAWaitForMultipleEvents(...);
MyEvent = EventArray[Index - WSA_WAIT_EVENT_0];
```

Once you have the socket that caused the network event, you can determine which network events are available by calling the WSAEnumNetworkEvents function, which is defined as

```c
int WSAEnumNetworkEvents(
    SOCKET s,
    WSAEVENT hEventObject,
    LPWSANETWORKEVENTS lpNetworkEvents
);
```

The s parameter represents the socket that caused the network event, and the hEventObject parameter is an optional parameter representing an event handle identifying an associated event object to be reset. Because our event object is in a signaled state, we can pass it in and it will be set to a non-signaled state. The hEventObject parameter is optional in case you wish to reset the event manually via the WSAResetEvent function. The final parameter, lpNetworkEvents, takes a pointer to a WSANETWORKEVENTS structure, which is used to retrieve network event types that occurred on the socket and any associated error codes. The WSANETWORKEVENTS structure is defined as

```c
typedef struct _WSANETWORKEVENTS
{
    long lNetworkEvents;
    int  iErrorCode[FD_MAX_EVENTS];
} WSANETWORKEVENTS, FAR * LPWSANETWORKEVENTS;
```

The lNetworkEvents field is a value that indicates all the network event types (see Table 5-3) that have occurred on the socket.


	More than one network event type can occur whenever an event is signaled. For example, a busy server application might receive FD_READ and FD_WRITE notification at the same time.

The iErrorCode field is an array of error codes associated with the events in lNetworkEvents. For each network event type, there is a special event index whose name is the event type name with “_BIT” appended. For example, for the FD_READ event type, the index identifier for the iErrorCode array is named FD_READ_BIT. The following code fragment demonstrates this for an FD_READ event:

```c
// Process FD_READ notification
if (NetworkEvents.lNetworkEvents & FD_READ)
{
    if (NetworkEvents.iErrorCode[FD_READ_BIT] != 0)
    {
        printf("FD_READ failed with error %d\n",
            NetworkEvents.iErrorCode[FD_READ_BIT]);
    }
}
```
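
Because each FD_xxx flag is defined in WINSOCK2.H as 1 << FD_xxx_BIT, an application can also walk all reported events in a loop instead of testing them one by one. The following sketch does that using its own mirror of the WSANETWORKEVENTS layout (FD_MAX_EVENTS is 10) so that it compiles without Winsock; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// Mirrors FD_MAX_EVENTS from WINSOCK2.H; FD_READ_BIT is 0, FD_WRITE_BIT 1, ...
#define SK_FD_MAX_EVENTS 10

struct sk_network_events {          // same shape as WSANETWORKEVENTS
    long lNetworkEvents;
    int  iErrorCode[SK_FD_MAX_EVENTS];
};

// Hypothetical helper: visits every event bit that is set and counts how
// many of those events carry a nonzero error code. Error slots for events
// that were not reported are ignored.
int count_failed_events(const struct sk_network_events *ne)
{
    int bit, failed = 0;

    assert(ne != NULL);
    for (bit = 0; bit < SK_FD_MAX_EVENTS; bit++) {
        if ((ne->lNetworkEvents & (1L << bit)) && ne->iErrorCode[bit] != 0)
            failed++;
    }
    return failed;
}
```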

After you process the events in the WSANETWORKEVENTS structure, your application should continue waiting for more network events on all of the available sockets. The following example demonstrates how to develop a server and manage event objects when using the WSAEventSelect I/O model. The code highlights the steps needed to develop a basic server application capable of managing one or more sockets at a time.

SOCKET SocketArray[WSA_MAXIMUM_WAIT_EVENTS];
WSAEVENT EventArray[WSA_MAXIMUM_WAIT_EVENTS],
         NewEvent;
WSANETWORKEVENTS NetworkEvents;
SOCKADDR_IN InternetAddr;
SOCKET Accept, Listen;
DWORD EventTotal = 0;
DWORD Index, i;
char buffer[4096];

// Set up a TCP socket for listening on port 5150
Listen = socket(PF_INET, SOCK_STREAM, 0);
InternetAddr.sin_family = AF_INET;
InternetAddr.sin_addr.s_addr = htonl(INADDR_ANY);
InternetAddr.sin_port = htons(5150);
bind(Listen, (PSOCKADDR) &InternetAddr,
    sizeof(InternetAddr));

NewEvent = WSACreateEvent();
WSAEventSelect(Listen, NewEvent,
    FD_ACCEPT | FD_CLOSE);

listen(Listen, 5);

SocketArray[EventTotal] = Listen;
EventArray[EventTotal] = NewEvent;
EventTotal++;

while(TRUE)
{
    // Wait for network events on all sockets
    Index = WSAWaitForMultipleEvents(EventTotal,
        EventArray, FALSE, WSA_INFINITE, FALSE);
    Index = Index - WSA_WAIT_EVENT_0;

    // Iterate through all events to see if more than one is signaled
    for(i = Index; i < EventTotal; i++)
    {
        Index = WSAWaitForMultipleEvents(1, &EventArray[i], TRUE, 1000,
            FALSE);
        if ((Index == WSA_WAIT_FAILED) || (Index == WSA_WAIT_TIMEOUT))
            continue;
        else
        {
            Index = i;
            WSAEnumNetworkEvents(
                SocketArray[Index],
                EventArray[Index],
                &NetworkEvents);

            // Check for FD_ACCEPT messages
            if (NetworkEvents.lNetworkEvents & FD_ACCEPT)
            {
                if (NetworkEvents.iErrorCode[FD_ACCEPT_BIT] != 0)
                {
                    printf("FD_ACCEPT failed with error %d\n",
                        NetworkEvents.iErrorCode[FD_ACCEPT_BIT]);
                    break;
                }

                // Accept a new connection, and add it to the
                // socket and event lists
                Accept = accept(
                    SocketArray[Index],
                    NULL, NULL);

                // We cannot process more than
                // WSA_MAXIMUM_WAIT_EVENTS sockets, so close
                // the accepted socket
                if (EventTotal >= WSA_MAXIMUM_WAIT_EVENTS)
                {
                    printf("Too many connections\n");
                    closesocket(Accept);
                    break;
                }

                NewEvent = WSACreateEvent();
                WSAEventSelect(Accept, NewEvent,
                    FD_READ | FD_WRITE | FD_CLOSE);

                EventArray[EventTotal] = NewEvent;
                SocketArray[EventTotal] = Accept;
                EventTotal++;
                printf("Socket %d connected\n", Accept);
            }

            // Process FD_READ notification
            if (NetworkEvents.lNetworkEvents & FD_READ)
            {
                if (NetworkEvents.iErrorCode[FD_READ_BIT] != 0)
                {
                    printf("FD_READ failed with error %d\n",
                        NetworkEvents.iErrorCode[FD_READ_BIT]);
                    break;
                }

                // Read data from the socket
                recv(SocketArray[Index],
                    buffer, sizeof(buffer), 0);
            }

            // Process FD_WRITE notification
            if (NetworkEvents.lNetworkEvents & FD_WRITE)
            {
                if (NetworkEvents.iErrorCode[FD_WRITE_BIT] != 0)
                {
                    printf("FD_WRITE failed with error %d\n",
                        NetworkEvents.iErrorCode[FD_WRITE_BIT]);
                    break;
                }

                send(SocketArray[Index],
                    buffer, sizeof(buffer), 0);
            }

            if (NetworkEvents.lNetworkEvents & FD_CLOSE)
            {
                if (NetworkEvents.iErrorCode[FD_CLOSE_BIT] != 0)
                {
                    printf("FD_CLOSE failed with error %d\n",
                        NetworkEvents.iErrorCode[FD_CLOSE_BIT]);
                    break;
                }

                closesocket(SocketArray[Index]);
                WSACloseEvent(EventArray[Index]);

                // Remove socket and associated event from
                // the Socket and Event arrays and decrement
                // EventTotal (CompressArrays is not defined
                // in this listing)
                CompressArrays(EventArray, SocketArray, &EventTotal);
            }
        }
    }
}
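The listing calls CompressArrays without defining it, and as written the call does not say which slot to remove. One possible helper is sketched below, assuming it receives the index of the closed socket as an extra argument. The name remove_slot and the STANDIN_ typedefs are hypothetical; the stand-ins only let the sketch compile without winsock2.h, where you would use SOCKET and WSAEVENT instead. It shifts the later entries down so that EventArray[k] still belongs to SocketArray[k]:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the Winsock types so this sketch compiles anywhere;
   a real program would use SOCKET and WSAEVENT from <winsock2.h>. */
typedef size_t STANDIN_SOCKET;
typedef void  *STANDIN_WSAEVENT;

/* Hypothetical helper: remove slot `index` from the parallel event and
   socket arrays by shifting every later entry down one position, then
   decrement the total. The caller is assumed to have already closed
   the socket and its event object. */
void remove_slot(STANDIN_WSAEVENT *events, STANDIN_SOCKET *sockets,
                 unsigned long *total, unsigned long index)
{
    assert(index < *total);
    unsigned long tail = *total - index - 1;   /* entries after the gap */
    memmove(&events[index], &events[index + 1], tail * sizeof events[0]);
    memmove(&sockets[index], &sockets[index + 1], tail * sizeof sockets[0]);
    (*total)--;
}
```

Shifting keeps the arrays in their original order. The trade-off is that a loop such as the for loop above must not advance i after a removal, because the entry that moved into slot i would otherwise be skipped. Swapping the last entry into the gap is a cheaper alternative, at the cost of reordering the arrays.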

The WSAEventSelect model offers several advantages. It is conceptually simple, and it does not require a windowed environment. Its main drawback is that WSAWaitForMultipleEvents can wait on only 64 events (WSA_MAXIMUM_WAIT_EVENTS) at a time, so an application that handles many sockets must manage a pool of threads, each waiting on its own set of events. Because a large number of socket connections therefore requires many threads, this model does not scale as well as the overlapped models discussed next.
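To make the 64-event limit concrete, the following sketch computes how many waiting threads a given number of sockets requires and where each socket would live, assuming a simple hypothetical scheme in which sockets are assigned to threads in order, 64 per thread. The function names are illustrative, and a real server would also have to reuse the slots freed when sockets close:

```c
#include <assert.h>

/* Same value as WSA_MAXIMUM_WAIT_EVENTS (MAXIMUM_WAIT_OBJECTS). */
#define EVENTS_PER_THREAD 64u

/* Number of waiting threads needed so that no thread waits on more
   than EVENTS_PER_THREAD events (integer division rounded up). */
unsigned threads_needed(unsigned socket_count)
{
    return (socket_count + EVENTS_PER_THREAD - 1) / EVENTS_PER_THREAD;
}

/* Where the n-th socket (0-based) lives: which thread waits on it,
   and at which slot of that thread's event and socket arrays. */
void socket_placement(unsigned n, unsigned *thread, unsigned *slot)
{
    assert(thread && slot);
    *thread = n / EVENTS_PER_THREAD;
    *slot   = n % EVENTS_PER_THREAD;
}
```

For example, 1,000 sockets need 16 waiting threads, and the socket with index 130 is handled by thread 2 at slot 2.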

The Overlapped Model

The overlapped I/O model in Winsock offers applications better system performance than any of the I/O models explained so far. The overlapped model's basic design allows your application to post one or more asynchronous I/O requests at a time using an overlapped data structure. At a later point, the application can service the submitted requests after they have completed. This model is available on all Windows platforms except Windows CE. The model's overall design is based on the Windows overlapped I/O mechanisms available for performing I/O operations on devices using the ReadFile and WriteFile functions.

Originally, the Winsock overlapped I/O model was available only to Winsock 1.1 applications running on Windows NT. Applications could take advantage of the model by calling ReadFile and WriteFile on a socket handle and specifying an overlapped structure. Since the release of Winsock 2, overlapped I/O has been incorporated into new Winsock functions, such as WSASend and WSARecv. As a result, the overlapped I/O model is now available on all Windows platforms that feature Winsock 2.


	With the release of Winsock 2, overlapped I/O can still be used with the functions ReadFile and WriteFile under Windows NT and Windows 2000. However, this functionality was not added to Windows 95, Windows 98, and Windows Me. For compatibility across platforms, use the WSARecv and WSASend functions instead of the Windows ReadFile and WriteFile functions. This section describes only how to use overlapped I/O through the Winsock 2 functions.

To use the overlapped I/O model on a socket, you must first create a socket that has the overlapped flag set. See Chapter 2 for more information on creating overlapped-enabled sockets.

After you successfully create a socket and bind it to a local interface, overlapped I/O operations can commence by calling the Winsock functions listed below and specifying an optional WSAOVERLAPPED structure.

WSASend
WSASendTo
WSARecv
WSARecvFrom
WSAIoctl
WSARecvMsg
AcceptEx
ConnectEx
TransmitFile
TransmitPackets
DisconnectEx
WSANSPIoctl

To use overlapped I/O, you pass a WSAOVERLAPPED structure to each of these functions. When called with a WSAOVERLAPPED structure, they return immediately, regardless of the socket's mode (described at the beginning of this chapter), and rely on the WSAOVERLAPPED structure to manage the completion of the I/O request. There are essentially two methods for managing the completion of an overlapped I/O request: your application can wait for event object notification, or it can process completed requests through completion routines. The first six functions in the list have another parameter in common: a WSAOVERLAPPED_COMPLETION_ROUTINE. This parameter is an optional pointer to a completion routine function that gets called when an overlapped request completes. We will explore the event notification method next. Later in this chapter, you will learn how to use optional completion routines instead of events to process completed overlapped requests.

Event Notification

The event notification method of overlapped I/O requires associating Windows event objects with WSAOVERLAPPED structures. When I/O calls such as WSASend and WSARecv are made using a WSAOVERLAPPED structure, they return immediately. Typically, you will find that these I/O calls fail with the return value SOCKET_ERROR and that WSAGetLastError reports a WSA_IO_PENDING error status. This error status simply means that the I/O operation is in progress. At a later time, your application will need to determine when an overlapped I/O request completes by waiting on the event object associated with the WSAOVERLAPPED structure. The WSAOVERLAPPED structure provides the communication medium between the initiation of an overlapped I/O request and its subsequent completion, and is defined as

typedef struct WSAOVERLAPPED {
    DWORD    Internal;
    DWORD    InternalHigh;
    DWORD    Offset;
    DWORD    OffsetHigh;
    WSAEVENT hEvent;
} WSAOVERLAPPED, FAR * LPWSAOVERLAPPED;

The Internal, InternalHigh, Offset, and OffsetHigh fields are all used internally by the system and an application should not manipulate or directly use them. The hEvent field, on the other hand, allows an application to associate an event object handle with this operation.
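Because the system owns Internal, InternalHigh, Offset, and OffsetHigh, the samples in this section clear the whole structure with ZeroMemory and then set hEvent before posting each request. That pattern can be expressed as a small helper. The sketch below uses a hypothetical stand-in structure with the same fields as the definition above (so it compiles without winsock2.h) and memset in place of ZeroMemory; reset_overlapped is an illustrative name, not a Winsock function:

```c
#include <assert.h>
#include <string.h>

typedef void *STANDIN_WSAEVENT;

/* Stand-in with the same fields as WSAOVERLAPPED above. */
typedef struct {
    unsigned long    Internal;
    unsigned long    InternalHigh;
    unsigned long    Offset;
    unsigned long    OffsetHigh;
    STANDIN_WSAEVENT hEvent;
} STANDIN_WSAOVERLAPPED;

/* Clear the system-owned fields left over from a previous request,
   then attach the event to signal when the next request completes. */
void reset_overlapped(STANDIN_WSAOVERLAPPED *ov, STANDIN_WSAEVENT ev)
{
    assert(ov != NULL);
    memset(ov, 0, sizeof *ov);   /* what ZeroMemory does */
    ov->hEvent = ev;             /* must come after the clear */
}
```

The order matters: zeroing after assigning hEvent would wipe out the event handle, so WSAWaitForMultipleEvents would never see that request's event become signaled.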

When an overlapped I/O request finally completes, your application is responsible for retrieving the overlapped results. In the event notification method, Winsock will change the event-signaling state of an event object that is associated with a WSAOVERLAPPED structure from non-signaled to signaled when an overlapped request finally completes. Because an event object is assigned to the WSAOVERLAPPED structure, you can easily determine when an overlapped I/O call completes by calling the WSAWaitForMultipleEvents function, which we also described in the WSAEventSelect I/O model. WSAWaitForMultipleEvents waits a specified amount of time for one or more event objects to become signaled. We can't stress this point enough: remember that WSAWaitForMultipleEvents is capable of waiting on only 64 event objects at a time. Once you determine which overlapped request has completed, you need to determine the success or failure of the overlapped call by calling WSAGetOverlappedResult, which is defined as

BOOL WSAGetOverlappedResult(
    SOCKET s,
    LPWSAOVERLAPPED lpOverlapped,
    LPDWORD lpcbTransfer,
    BOOL fWait,
    LPDWORD lpdwFlags
);

The s parameter identifies the socket that was specified when the overlapped operation was started. The lpOverlapped parameter is a pointer to the WSAOVERLAPPED structure that was specified when the overlapped operation was started. The lpcbTransfer parameter is a pointer to a DWORD variable that receives the number of bytes that were actually transferred by an overlapped send or receive operation. The fWait parameter determines whether the function should wait for a pending overlapped operation to complete. If fWait is TRUE, the function does not return until the operation has been completed. If fWait is FALSE and the operation is still pending, WSAGetOverlappedResult returns FALSE with the error WSA_IO_INCOMPLETE. Because in our case we waited on a signaled event for overlapped completion, this parameter has no effect. The final parameter, lpdwFlags, is a pointer to a DWORD that will receive resulting flags if the originating overlapped call was made with the WSARecv or the WSARecvFrom function.

If the WSAGetOverlappedResult function succeeds, the return value is TRUE. This means that your overlapped operation has completed successfully and that the value pointed to by lpcbTransfer has been updated. If the return value is FALSE, one of the following statements is true:

The overlapped I/O operation is still pending (as we previously described).
The overlapped operation completed, but with errors.
The overlapped operation's completion status could not be determined because of errors in one or more of the parameters supplied to WSAGetOverlappedResult.

Upon failure, the value pointed to by lpcbTransfer will not be updated, and your application should call the WSAGetLastError function to determine the cause of the failure.
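The possible outcomes can be summarized in a small decision function. The following sketch, for an overlapped receive, is written in portable C with hypothetical names. It treats a successful completion with zero bytes as a closed connection (as the sample below does) and uses the numeric value of WSA_IO_INCOMPLETE (996) so that it compiles without winsock2.h:

```c
/* Numeric value of WSA_IO_INCOMPLETE (equal to ERROR_IO_INCOMPLETE). */
#define STANDIN_WSA_IO_INCOMPLETE 996

typedef enum {
    RECV_GOT_DATA,       /* succeeded, bytes > 0: process the buffer     */
    RECV_PEER_CLOSED,    /* succeeded, 0 bytes: peer closed connection   */
    RECV_STILL_PENDING,  /* FALSE with WSA_IO_INCOMPLETE (fWait FALSE)   */
    RECV_FAILED          /* FALSE for any other reason: check the error  */
} recv_outcome;

/* ok:         return value of WSAGetOverlappedResult
   bytes:      value stored through lpcbTransfer (valid only when ok)
   last_error: WSAGetLastError() read after a FALSE return */
recv_outcome classify_recv_result(int ok, unsigned long bytes, int last_error)
{
    if (ok)
        return bytes == 0 ? RECV_PEER_CLOSED : RECV_GOT_DATA;
    if (last_error == STANDIN_WSA_IO_INCOMPLETE)
        return RECV_STILL_PENDING;
    return RECV_FAILED;
}
```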

The following sample of code demonstrates how to structure a simple server application that is capable of managing overlapped I/O on one socket using the event notification described above.

#define DATA_BUFSIZE     4096

int main(void)
{
    WSABUF DataBuf;
    char buffer[DATA_BUFSIZE];
    DWORD EventTotal = 0,
          RecvBytes = 0,
          BytesTransferred = 0,
          Flags = 0;
    WSAEVENT EventArray[WSA_MAXIMUM_WAIT_EVENTS];
    WSAOVERLAPPED AcceptOverlapped;
    SOCKET ListenSocket, AcceptSocket;

    // Step 1:
    //  Start Winsock and set up a listening socket
    ...

    // Step 2:
    //  Accept an inbound connection
    AcceptSocket = accept(ListenSocket, NULL, NULL);

    // Step 3:
    //  Set up an overlapped structure
    EventArray[EventTotal] = WSACreateEvent();
    ZeroMemory(&AcceptOverlapped,
        sizeof(WSAOVERLAPPED));
    AcceptOverlapped.hEvent = EventArray[EventTotal];
    DataBuf.len = DATA_BUFSIZE;
    DataBuf.buf = buffer;
    EventTotal++;

    // Step 4:
    //  Post a WSARecv request to begin receiving data
    //  on the socket
    if (WSARecv(AcceptSocket, &DataBuf, 1, &RecvBytes,
        &Flags, &AcceptOverlapped, NULL) == SOCKET_ERROR)
    {
        if (WSAGetLastError() != WSA_IO_PENDING)
        {
            // Error occurred
        }
    }

    // Process overlapped receives on the socket
    while(TRUE)
    {
        DWORD Index;

        // Step 5:
        //  Wait for the overlapped I/O call to complete
        Index = WSAWaitForMultipleEvents(EventTotal,
            EventArray, FALSE, WSA_INFINITE, FALSE);
        if (Index == WSA_WAIT_FAILED)
        {
            printf("WSAWaitForMultipleEvents failed with error %d\n",
                WSAGetLastError());
            return 1;
        }

        // Index should be 0 because we
        // have only one event handle in EventArray

        // Step 6:
        //  Reset the signaled event
        WSAResetEvent(
            EventArray[Index - WSA_WAIT_EVENT_0]);

        // Step 7:
        //  Determine the status of the overlapped
        //  request. BytesTransferred is valid only if
        //  the call succeeds, so a failed receive is
        //  treated like a closed connection.
        if (WSAGetOverlappedResult(AcceptSocket,
            &AcceptOverlapped, &BytesTransferred,
            FALSE, &Flags) == FALSE)
        {
            printf("Overlapped receive failed with error %d\n",
                WSAGetLastError());
            BytesTransferred = 0;
        }

        // First check to see whether the peer has closed
        // the connection, and if so, close the
        // socket
        if (BytesTransferred == 0)
        {
            printf("Closing socket %d\n", AcceptSocket);
            closesocket(AcceptSocket);
            WSACloseEvent(
                EventArray[Index - WSA_WAIT_EVENT_0]);
            return 0;
        }

        // Do something with the received data
        // DataBuf contains the received data
        ...

        // Step 8:
        //  Post another WSARecv() request on the socket
        Flags = 0;
        ZeroMemory(&AcceptOverlapped,
            sizeof(WSAOVERLAPPED));
        AcceptOverlapped.hEvent = EventArray[Index -
            WSA_WAIT_EVENT_0];
        DataBuf.len = DATA_BUFSIZE;
        DataBuf.buf = buffer;
        if (WSARecv(AcceptSocket, &DataBuf, 1,
            &RecvBytes, &Flags, &AcceptOverlapped,
            NULL) == SOCKET_ERROR)
        {
            if (WSAGetLastError() != WSA_IO_PENDING)
            {
                // Unexpected error
            }
        }
    }
}

The application outlines the following programming steps:

1. Create a socket and begin listening for a connection on a specified port.
2. Accept an inbound connection.
3. Create a WSAOVERLAPPED structure for the accepted socket and assign an event object handle to the structure. Also assign the event object handle to an event array to be used later by the WSAWaitForMultipleEvents function.
4. Post an asynchronous WSARecv request on the socket by specifying the WSAOVERLAPPED structure as a parameter.
5. Call WSAWaitForMultipleEvents using the event array and wait for the event associated with the overlapped call to become signaled.
6. Reset the event object by using WSAResetEvent with the event array.
7. Determine the return status of the overlapped call by using WSAGetOverlappedResult, and process the completed overlapped request.
8. Post another overlapped WSARecv request on the socket.
9. Repeat steps 5–8.

This example can easily be expanded to handle more than one socket by moving the overlapped I/O processing portion of the code to a separate thread and allowing the main application thread to service additional connection requests.


	If a Winsock function is called in an overlapped fashion (either by specifying an event within the WSAOVERLAPPED structure or with a completion routine), the operation might complete immediately. For example, calling WSARecv when data has already been received and buffered causes WSARecv to return NO_ERROR. If any overlapped function fails with WSA_IO_PENDING or immediately succeeds, the completion event will always be signaled and the completion routine will be scheduled to run (if specified). For overlapped I/O with a completion port, this means that completion notification will be posted to the completion port for servicing.
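Put differently, an overlapped call has only two successful results, and both lead to the same completion handling. The sketch below captures that check in portable C. The function name is hypothetical, and the two constants are the numeric values of SOCKET_ERROR (-1) and WSA_IO_PENDING (997), used so that it compiles without winsock2.h:

```c
#include <stdbool.h>

#define STANDIN_SOCKET_ERROR   (-1)  /* value of SOCKET_ERROR   */
#define STANDIN_WSA_IO_PENDING 997   /* value of WSA_IO_PENDING */

/* rc:         return value of an overlapped WSARecv or WSASend call
   last_error: WSAGetLastError() read immediately after the call
   Returns true when a completion notification (event, completion
   routine, or completion port packet) will follow, whether the call
   returned 0 or failed with WSA_IO_PENDING; false for a real failure. */
bool completion_will_be_posted(int rc, int last_error)
{
    if (rc == 0)
        return true;   /* completed at once; notification still posted */
    return rc == STANDIN_SOCKET_ERROR && last_error == STANDIN_WSA_IO_PENDING;
}
```

A caller that follows this rule processes data only in its completion path. Handling an immediate success inline as well would process the same request twice.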

Completion Routines

Completion routines are the other method your application can use to manage completed overlapped I/O requests. Completion routines are simply functions that you optionally pass to an overlapped I/O request and that the system invokes when an overlapped I/O request completes. Their primary role is to service a completed I/O request using the caller's thread. In addition, applications can continue overlapped I/O processing through the completion routine.

To use completion routines for overlapped I/O requests, your application must pass a completion routine, along with a WSAOVERLAPPED structure, to one of the Winsock I/O functions that accept a completion routine (the first six functions listed earlier). A completion routine must have the following function prototype:

void CALLBACK CompletionROUTINE(
    DWORD dwError,
    DWORD cbTransferred,
    LPWSAOVERLAPPED lpOverlapped,
    DWORD dwFlags
);

When an overlapped I/O request completes using a completion routine, the parameters contain the following information:

The parameter dwError specifies the completion status for the overlapped operation as indicated by lpOverlapped.
The cbTransferred parameter specifies the number of bytes that were transferred during the overlapped operation.
The lpOverlapped parameter is the same as the WSAOVERLAPPED structure passed into the originating I/O call.
The dwFlags parameter returns any flags that the operation may have completed with (such as from WSARecv).

There is a major difference between overlapped requests submitted with a completion routine and overlapped requests submitted with an event object. The WSAOVERLAPPED structure's event field, hEvent, is not used, which means you cannot associate an event object with the overlapped request. Once you make an overlapped I/O call with a completion routine, your calling thread must eventually service the completion routine after the I/O request has completed. This requires you to place your calling thread in an alertable wait state so that the completion routine can run once the I/O operation has completed. The WSAWaitForMultipleEvents function can be used to put your thread in an alertable wait state. The catch is that you must also have at least one event object available for the WSAWaitForMultipleEvents function. If your application handles only overlapped requests with completion routines, you are not likely to have any event objects around for processing. As an alternative, your application can use the Windows SleepEx function to put your thread in an alertable wait state. Of course, you can also create a dummy event object that is not associated with anything. If your calling thread is always busy and never enters an alertable wait state, no posted completion routine will ever get called.

As you saw earlier, WSAWaitForMultipleEvents normally waits for event objects associated with WSAOVERLAPPED structures. This function is also designed to place your thread in an alertable wait state and to process completion routines for completed overlapped I/O requests if you set the parameter fAlertable to TRUE. When overlapped I/O requests complete with a completion routine, the return value is WSA_IO_COMPLETION instead of an event object index in the event array. The SleepEx function provides the same behavior as WSAWaitForMultipleEvents except that it does not need any event objects. The SleepEx function is defined as

DWORD SleepEx(
    DWORD dwMilliseconds,
    BOOL bAlertable
);

The dwMilliseconds parameter defines how long in milliseconds SleepEx will wait. If dwMilliseconds is set to INFINITE, SleepEx waits indefinitely. The bAlertable parameter determines how a completion routine will execute. If bAlertable is set to FALSE and an I/O completion callback occurs, the I/O completion function is not executed and the function does not return until the wait period specified in dwMilliseconds has elapsed. If it is set to TRUE, the completion routine executes and the SleepEx function returns WAIT_IO_COMPLETION.
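The rule that completion routines run only while the issuing thread is in an alertable wait can be hard to picture. The following toy model, written in portable C with hypothetical names (toy_apc_queue, toy_sleep_ex, and so on) rather than the Windows API, keeps a per-thread queue of finished requests whose routines have not run yet and mimics the bAlertable behavior described above: a non-alertable wait leaves the queue untouched, while an alertable wait runs every queued routine and returns a code standing in for WAIT_IO_COMPLETION. It is only a model; in Windows the queuing is done by the operating system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_WAIT_TIMEOUT       0u
#define TOY_WAIT_IO_COMPLETION 1u   /* plays the role of WAIT_IO_COMPLETION */
#define TOY_QUEUE_MAX          16

typedef void (*toy_routine)(void *context);

/* Per-thread queue of completion routines whose I/O has finished but
   which have not been run yet. */
typedef struct {
    toy_routine routines[TOY_QUEUE_MAX];
    void       *contexts[TOY_QUEUE_MAX];
    size_t      count;
} toy_apc_queue;

/* Called when an I/O request finishes: the routine is only queued. */
void toy_queue_completion(toy_apc_queue *q, toy_routine fn, void *ctx)
{
    assert(q->count < TOY_QUEUE_MAX);
    q->routines[q->count] = fn;
    q->contexts[q->count] = ctx;
    q->count++;
}

/* Models one SleepEx-style wait. A non-alertable wait leaves queued
   routines alone and simply times out; an alertable wait with work
   queued runs every routine and reports an I/O completion. */
unsigned toy_sleep_ex(toy_apc_queue *q, bool alertable)
{
    if (!alertable || q->count == 0)
        return TOY_WAIT_TIMEOUT;
    for (size_t i = 0; i < q->count; i++)
        q->routines[i](q->contexts[i]);
    q->count = 0;
    return TOY_WAIT_IO_COMPLETION;
}

/* Example completion routine: counts how many completions it serviced. */
void toy_count_routine(void *context)
{
    ++*(int *)context;
}
```

In a real program, reposting a request from inside a completion routine (as WorkerRoutine does below) only starts a new request; its routine is queued again only when that request completes, which is why the server keeps looping on the alertable wait.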

The following code outlines how to structure a simple server application that is capable of managing one socket request using completion routines as described earlier.

#define DATA_BUFSIZE    4096

SOCKET AcceptSocket,
       ListenSocket;
WSABUF DataBuf;
WSAEVENT EventArray[MAXIMUM_WAIT_OBJECTS];
DWORD Flags,
      RecvBytes,
      Index;
char buffer[DATA_BUFSIZE];

// The completion routine is defined after main()
void CALLBACK WorkerRoutine(DWORD Error,
                            DWORD BytesTransferred,
                            LPWSAOVERLAPPED Overlapped,
                            DWORD InFlags);

int main(void)
{
    WSAOVERLAPPED Overlapped;

    // Step 1:
    //  Start Winsock, and set up a listening socket
    ...

    // Step 2:
    //  Accept a new connection
    AcceptSocket = accept(ListenSocket, NULL, NULL);

    // Step 3:
    //  Now that we have an accepted socket, start
    //  processing I/O using overlapped I/O with a
    //  completion routine. To get the overlapped I/O
    //  processing started, first submit an
    //  overlapped WSARecv() request.
    Flags = 0;
    ZeroMemory(&Overlapped, sizeof(WSAOVERLAPPED));
    DataBuf.len = DATA_BUFSIZE;
    DataBuf.buf = buffer;

    // Step 4:
    //  Post an asynchronous WSARecv() request
    //  on the socket by specifying the WSAOVERLAPPED
    //  structure as a parameter, and supply
    //  the WorkerRoutine function below as the
    //  completion routine
    if (WSARecv(AcceptSocket, &DataBuf, 1, &RecvBytes,
        &Flags, &Overlapped, WorkerRoutine)
        == SOCKET_ERROR)
    {
        if (WSAGetLastError() != WSA_IO_PENDING)
        {
            printf("WSARecv() failed with error %d\n",
                WSAGetLastError());
            return 1;
        }
    }

    // Because the WSAWaitForMultipleEvents() API
    // requires waiting on one or more event objects,
    // we will have to create a dummy event object.
    // As an alternative, we can use SleepEx()
    // instead.
    EventArray[0] = WSACreateEvent();

    while(TRUE)
    {
        // Step 5:
        //  Wait in an alertable state (fAlertable = TRUE)
        Index = WSAWaitForMultipleEvents(1, EventArray,
            FALSE, WSA_INFINITE, TRUE);

        // Step 6:
        if (Index == WSA_IO_COMPLETION)
        {
            // An overlapped request completion routine
            // just completed. Continue servicing
            // more completion routines.
            continue;
        }
        else
        {
            // A bad error occurred: stop processing!
            // If we were also processing an event
            // object, this could be an index to
            // the event array.
            return 1;
        }
    }
}

void CALLBACK WorkerRoutine(DWORD Error,
                            DWORD BytesTransferred,
                            LPWSAOVERLAPPED Overlapped,
                            DWORD InFlags)
{
    DWORD RecvBytes;
    DWORD Flags;

    if (Error != 0 || BytesTransferred == 0)
    {
        // Either a bad error occurred on the socket
        // or the socket was closed by a peer
        closesocket(AcceptSocket);
        return;
    }

    // At this point, an overlapped WSARecv() request
    // completed successfully. Now we can retrieve the
    // received data that is contained in the variable
    // DataBuf. After processing the received data, we
    // need to post another overlapped WSARecv() or
    // WSASend() request. For simplicity, we will post
    // another WSARecv() request.
    Flags = 0;

    // Overlapped already points to the structure that
    // main() posted, so reuse it directly rather than
    // taking the address of the pointer variable
    ZeroMemory(Overlapped, sizeof(WSAOVERLAPPED));
    DataBuf.len = DATA_BUFSIZE;
    DataBuf.buf = buffer;
    if (WSARecv(AcceptSocket, &DataBuf, 1, &RecvBytes,
        &Flags, Overlapped, WorkerRoutine)
        == SOCKET_ERROR)
    {
        if (WSAGetLastError() != WSA_IO_PENDING)
        {
            printf("WSARecv() failed with error %d\n",
                WSAGetLastError());
            return;
        }
    }
}

The application illustrates the following programming steps:

1. Create a socket and begin listening for a connection on a specified port.
2. Accept an inbound connection.
3. Create a WSAOVERLAPPED structure for the accepted socket.
4. Post an asynchronous WSARecv request on the socket by specifying the WSAOVERLAPPED structure as a parameter and supplying a completion routine.
5. Call WSAWaitForMultipleEvents with the fAlertable parameter set to TRUE and wait for an overlapped request to complete. When an overlapped request completes, the completion routine automatically executes and WSAWaitForMultipleEvents returns WSA_IO_COMPLETION. The completion routine itself posts another overlapped WSARecv request with a completion routine.
6. Verify that WSAWaitForMultipleEvents returns WSA_IO_COMPLETION.
7. Repeat steps 5 and 6.

The overlapped model provides high-performance socket I/O. It differs from all the previous models because an application posts buffers to send and receive data that the system uses directly. That is, if an application posts an overlapped receive with a 10 KB buffer and data arrives on the socket, the data is copied directly into this posted buffer. In the previous models, data arrives and is copied to the per-socket receive buffers, at which point the application is notified that it can read. After the application calls a receive function, the data is copied a second time, from the per-socket buffer to the application's buffer. Chapter 6 will discuss strategies for developing high-performance, scalable Winsock applications. Chapter 6 will also discuss the WSARecvMsg, AcceptEx, ConnectEx, TransmitFile, TransmitPackets, and DisconnectEx API functions in more detail.

The disadvantage of using overlapped I/O with events is, again, the limitation of being able to wait on a maximum of 64 events at a time. Completion routines are a good alternative, but you must make sure that the thread that posted the operation enters an alertable wait state; otherwise, the completion routine never runs. Also, keep completion routines short: they run on that same thread, so a routine that performs excessive computation delays every other completed request waiting to be serviced, which matters most under heavy load.

The Completion Port Model

For newcomers, the completion port model seems overwhelmingly complicated because adding sockets to a completion port requires more work than the initialization steps for the other I/O models. However, as you will see, these steps are not that complicated once you understand them. The completion port model also offers the best system performance possible when an application has to manage many sockets at once. Unfortunately, it's available only on Windows NT, Windows 2000, and Windows XP; on those platforms, however, it offers the best scalability of all the models discussed so far and is well suited to handling hundreds or thousands of sockets.

Essentially, the completion port model requires you to create a Windows completion port object that will manage overlapped I/O requests using a specified number of threads to service the completed overlapped I/O requests. Note that a completion port is actually a Windows I/O construct that is capable of accepting more than just socket handles. However, this section will describe only how to take advantage of the completion port model by using socket handles. To begin using this model, you are required to create an I/O completion port object that will be used to manage multiple I/O requests for any number of socket handles. This is accomplished by calling the CreateIoCompletionPort function, which is defined as

HANDLE CreateIoCompletionPort(
    HANDLE FileHandle,
    HANDLE ExistingCompletionPort,
    ULONG_PTR CompletionKey,
    DWORD NumberOfConcurrentThreads
);

Before examining the parameters in detail, be aware that this function is actually used for two distinct purposes:

To create a completion port object
To associate a handle with a completion port

When you initially create a completion port object, the only parameter of interest is NumberOfConcurrentThreads: FileHandle is set to INVALID_HANDLE_VALUE, ExistingCompletionPort is NULL, and CompletionKey is ignored. The NumberOfConcurrentThreads parameter is special because it defines the number of threads that are allowed to execute concurrently on a completion port. Ideally, you want only one thread per processor to service the completion port to avoid thread context switching. The value 0 for this parameter tells the system to allow as many threads as there are processors in the system. The following code creates an I/O completion port.

 CompletionPort = CreateIoCompletionPort(INVALID_HANDLE_VALUE,     NULL, 0, 0);

This will return a handle that is used to identify the completion port when a socket handle is assigned to it.

Worker Threads and Completion Ports

After a completion port is successfully created, you can begin to associate socket handles with the object. Before associating sockets, though, you have to create one or more worker threads to service the completion port when socket I/O requests are posted to the completion port object. At this point, you might wonder how many threads should be created to service the completion port. This is actually one of the more complicated aspects of the completion port model because the number needed to service I/O requests depends on the overall design of your application. It's important to note the distinction between the number of concurrent threads specified when calling CreateIoCompletionPort and the number of worker threads to create; they do not represent the same thing. We recommended earlier that you specify one thread per processor in the CreateIoCompletionPort call to avoid thread context switching. The NumberOfConcurrentThreads parameter of CreateIoCompletionPort explicitly tells the system to allow only n threads to operate at a time on the completion port. If you create more than n worker threads on the completion port, only n threads will be allowed to operate at a time. (Actually, the system might exceed this value for a short amount of time, but it will quickly bring the count back down to the value you specified in CreateIoCompletionPort.) You might be wondering why you would create more worker threads than the number specified in the CreateIoCompletionPort call. As we mentioned previously, this depends on the overall design of your application. If one of your worker threads calls a function, such as Sleep or WaitForSingleObject, and becomes suspended, another thread will be allowed to operate in its place. In other words, you always want to have as many threads available for execution as the number of threads you allow to execute in the CreateIoCompletionPort call.
Thus, if you expect your worker threads ever to become blocked, it is reasonable to create more worker threads than the value specified in CreateIoCompletionPort's NumberOfConcurrentThreads parameter.
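The reasoning above reduces to simple arithmetic. In the sketch below, the function names and the expected_blocked input are hypothetical. The idea is that the number of worker threads equals the effective concurrency (the processor count when NumberOfConcurrentThreads is 0) plus the number of workers you expect to be suspended at any one time, so that enough runnable threads remain:

```c
#include <assert.h>

/* NumberOfConcurrentThreads == 0 means "one per processor". */
unsigned effective_concurrency(unsigned requested, unsigned processors)
{
    assert(processors > 0);
    return requested == 0 ? processors : requested;
}

/* Hypothetical sizing rule: keep `concurrency` runnable workers even
   while `expected_blocked` workers are suspended in calls such as
   Sleep or WaitForSingleObject. */
unsigned workers_to_create(unsigned requested, unsigned processors,
                           unsigned expected_blocked)
{
    return effective_concurrency(requested, processors) + expected_blocked;
}
```

With four processors and a concurrency value of 0, a design in which up to two workers may block would create six workers, while a design whose workers never block would create four.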

Once you have enough worker threads to service I/O requests on the completion port, you can begin to associate socket handles with the completion port. This requires calling the CreateIoCompletionPort function on an existing completion port and supplying the first three parameters (FileHandle, ExistingCompletionPort, and CompletionKey) with socket information. The FileHandle parameter represents a socket handle to associate with the completion port. The ExistingCompletionPort parameter identifies the completion port with which the socket handle is to be associated. The CompletionKey parameter identifies per-handle data that you can associate with a particular socket handle. Applications are free to store any type of information associated with a socket by using this key. We call it per-handle data because it represents data associated with a socket handle. A useful approach is to pass, as the key, a pointer to a data structure that contains the socket handle and other socket-specific information. As we will see later in this chapter, the thread routines that service the completion port can retrieve socket-handle–specific information using this key.
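Because the completion key is a pointer-sized integer (ULONG_PTR), a pointer to a per-handle structure can be stored in it and recovered unchanged by the worker threads; casting the pointer to a 32-bit DWORD instead would truncate it on 64-bit Windows. The portable sketch below shows the round trip, with uintptr_t standing in for ULONG_PTR and hypothetical names for the structure and helpers:

```c
#include <assert.h>
#include <stdint.h>

/* ULONG_PTR is an unsigned integer as wide as a pointer;
   uintptr_t plays the same role here. */
typedef uintptr_t STANDIN_ULONG_PTR;

/* Hypothetical per-handle data, modeled on PER_HANDLE_DATA. */
typedef struct {
    uint64_t socket;   /* stand-in for the SOCKET handle */
    /* other socket-specific information */
} PER_HANDLE_SKETCH;

/* Before CreateIoCompletionPort: turn the pointer into a key. */
STANDIN_ULONG_PTR key_from_handle_data(PER_HANDLE_SKETCH *data)
{
    assert(data);
    return (STANDIN_ULONG_PTR)data;
}

/* After GetQueuedCompletionStatus: turn the key back into the pointer. */
PER_HANDLE_SKETCH *handle_data_from_key(STANDIN_ULONG_PTR key)
{
    return (PER_HANDLE_SKETCH *)key;
}
```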

Let's begin to construct a basic application framework from what we've described so far. The following example demonstrates how to start developing an echo server application using the completion port model. In this code, we take the following preparation steps:

1. Create a completion port. The fourth parameter is left as 0, specifying that only one worker thread per processor will be allowed to execute at a time on the completion port.
2. Determine how many processors exist on the system.
3. Create worker threads to service completed I/O requests on the completion port, using the processor information from step 2. In the case of this simple example, we create one worker thread per processor because we do not expect our threads ever to become suspended, which would leave too few threads to execute on each processor. When the CreateThread function is called, you must supply a worker routine that the thread executes upon creation. We will discuss the worker thread's responsibilities later in this section.
4. Prepare a listening socket to listen for connections on port 5150.
5. Accept inbound connections using the WSAAccept function.
6. Create a data structure to represent per-handle data and save the accepted socket handle in the structure.
7. Associate the new socket handle returned from WSAAccept with the completion port by calling CreateIoCompletionPort. Pass the per-handle data structure to CreateIoCompletionPort via the completion key parameter.
8. Start processing I/O on the accepted connection. Essentially, you want to post one or more asynchronous WSARecv or WSASend requests on the new socket using the overlapped I/O mechanism. When these I/O requests complete, a worker thread services the I/O requests and continues processing future I/O requests, as we will see later in the worker routine specified in step 3.

Repeat steps 5–8 until the server terminates.

HANDLE CompletionPort;
WSADATA wsd;
SYSTEM_INFO SystemInfo;
SOCKADDR_IN InternetAddr;
SOCKET Listen;
DWORD i;

typedef struct _PER_HANDLE_DATA
{
    SOCKET Socket;
    SOCKADDR_STORAGE ClientAddr;
    // Other information useful to be associated with the handle
} PER_HANDLE_DATA, * LPPER_HANDLE_DATA;

// Worker routine; only stubbed at the end of this listing
DWORD WINAPI ServerWorkerThread(LPVOID lpParam);

// Load Winsock
WSAStartup(MAKEWORD(2,2), &wsd);

// Step 1:
// Create an I/O completion port
CompletionPort = CreateIoCompletionPort(
    INVALID_HANDLE_VALUE, NULL, 0, 0);

// Step 2:
// Determine how many processors are on the system
GetSystemInfo(&SystemInfo);

// Step 3:
// Create worker threads based on the number of
// processors available on the system. For this
// simple case, we create one worker thread for each
// processor.
for(i = 0; i < SystemInfo.dwNumberOfProcessors; i++)
{
    HANDLE ThreadHandle;

    // Create a server worker thread, and pass the
    // completion port to the thread. NOTE: the
    // ServerWorkerThread procedure is only a stub
    // in this listing.
    ThreadHandle = CreateThread(NULL, 0,
        ServerWorkerThread, CompletionPort,
        0, NULL);

    // Close the thread handle
    CloseHandle(ThreadHandle);
}

// Step 4:
// Create a listening socket
Listen = WSASocket(AF_INET, SOCK_STREAM, 0, NULL, 0,
    WSA_FLAG_OVERLAPPED);

InternetAddr.sin_family = AF_INET;
InternetAddr.sin_addr.s_addr = htonl(INADDR_ANY);
InternetAddr.sin_port = htons(5150);
bind(Listen, (PSOCKADDR) &InternetAddr,
    sizeof(InternetAddr));

// Prepare socket for listening
listen(Listen, 5);

while(TRUE)
{
    PER_HANDLE_DATA *PerHandleData = NULL;
    SOCKADDR_IN saRemote;
    SOCKET Accept;
    int RemoteLen;

    // Step 5:
    // Accept connections and assign to the completion
    // port
    RemoteLen = sizeof(saRemote);
    Accept = WSAAccept(Listen, (SOCKADDR *)&saRemote,
        &RemoteLen, NULL, 0);

    // Step 6:
    // Create per-handle data information structure to
    // associate with the socket
    PerHandleData = (LPPER_HANDLE_DATA)
        GlobalAlloc(GPTR, sizeof(PER_HANDLE_DATA));

    printf("Socket number %d connected\n", Accept);
    PerHandleData->Socket = Accept;
    memcpy(&PerHandleData->ClientAddr, &saRemote, RemoteLen);

    // Step 7:
    // Associate the accepted socket with the
    // completion port. The key is pointer-sized
    // (ULONG_PTR), so the pointer is not truncated
    // on 64-bit Windows.
    CreateIoCompletionPort((HANDLE) Accept,
        CompletionPort, (ULONG_PTR) PerHandleData, 0);

    // Step 8:
    //  Start processing I/O on the accepted socket.
    //  Post one or more WSASend() or WSARecv() calls
    //  on the socket using overlapped I/O.
    WSARecv(...);
}

DWORD WINAPI ServerWorkerThread(LPVOID lpParam)
{
    // The requirements for the worker thread will be
    // discussed later.
    return 0;
}

Completion Ports and Overlapped I/O

After associating a socket handle with a completion port, you can begin processing I/O requests by posting overlapped send and receive requests on the socket handle. You can now start to rely on the completion port for I/O completion notification. Basically, the completion port model takes advantage of the Windows overlapped I/O mechanism in which Winsock API calls such as WSASend and WSARecv return immediately when called. It is up to your application to retrieve the results of the calls at a later time through an OVERLAPPED structure. In the completion port model, this is accomplished by having one or more worker threads wait on the completion port using the GetQueuedCompletionStatus function, which is defined as

BOOL GetQueuedCompletionStatus(
    HANDLE CompletionPort, 
    LPDWORD lpNumberOfBytesTransferred, 
    PULONG_PTR lpCompletionKey, 
    LPOVERLAPPED * lpOverlapped, 
    DWORD dwMilliseconds
);

The CompletionPort parameter represents the completion port to wait on. The lpNumberOfBytesTransferred parameter receives the number of bytes transferred by a completed I/O operation, such as WSASend or WSARecv. The lpCompletionKey parameter receives the per-handle data that was originally passed as the CompletionKey argument of CreateIoCompletionPort when the socket was associated with the port. As we already mentioned, we recommend saving the socket handle in this key. The lpOverlapped parameter receives a pointer to the WSAOVERLAPPED structure of the completed I/O operation. This is an important parameter because it can be used to retrieve per-I/O operation data, which we will describe shortly. The final parameter, dwMilliseconds, specifies the number of milliseconds that the caller is willing to wait for a completion packet to appear on the completion port. If you specify INFINITE, the call waits forever; if a finite interval expires first, the function returns FALSE, sets *lpOverlapped to NULL, and GetLastError returns WAIT_TIMEOUT.
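The return value and the pointer stored through lpOverlapped together tell a worker what happened, and the cases are easy to confuse. The following is a minimal sketch that encodes the documented cases as a plain C function, so the decision logic can be read apart from the API call. The enum and the ClassifyCompletion function are hypothetical names, and WAIT_TIMEOUT is defined locally (with the Win32 value 258) only when <windows.h> has not already defined it.

```c
#include <stddef.h>

/* Stand-in for the Win32 WAIT_TIMEOUT value; <windows.h> defines the real one. */
#ifndef WAIT_TIMEOUT
#define WAIT_TIMEOUT 258UL
#endif

typedef enum {
    GQCS_IO_SUCCEEDED,  /* TRUE,  *lpOverlapped != NULL: an I/O completed successfully */
    GQCS_POSTED_PACKET, /* TRUE,  *lpOverlapped == NULL: a packet posted with a NULL overlapped */
    GQCS_IO_FAILED,     /* FALSE, *lpOverlapped != NULL: a packet for a failed I/O was dequeued */
    GQCS_TIMED_OUT,     /* FALSE, *lpOverlapped == NULL, last error WAIT_TIMEOUT */
    GQCS_NO_PACKET      /* FALSE, *lpOverlapped == NULL, any other error: nothing dequeued */
} GqcsOutcome;

/* ret: value returned by GetQueuedCompletionStatus
   overlapped: value it stored through lpOverlapped
   lastError: GetLastError() captured immediately after the call */
GqcsOutcome ClassifyCompletion(int ret, const void *overlapped,
                               unsigned long lastError)
{
    if (ret)
        return overlapped != NULL ? GQCS_IO_SUCCEEDED : GQCS_POSTED_PACKET;
    if (overlapped != NULL)
        return GQCS_IO_FAILED;
    return lastError == WAIT_TIMEOUT ? GQCS_TIMED_OUT : GQCS_NO_PACKET;
}
```

On Windows, a worker would call ClassifyCompletion(ret, lpOverlapped, GetLastError()) right after GetQueuedCompletionStatus returns, before any other call can overwrite the last-error value. Only the GQCS_IO_SUCCEEDED and GQCS_IO_FAILED cases carry an OVERLAPPED pointer that may be used to reach per-I/O data.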

Per-handle Data and Per-I/O Operation Data

When a worker thread receives I/O completion notification from the GetQueuedCompletionStatus API call, the lpCompletionKey and lpOverlapped parameters contain socket information that can be used to continue processing I/O on a socket through the completion port. Two types of important socket data are available through these parameters: per-handle data and per-I/O operation data.

The lpCompletionKey parameter contains what we call per-handle data because the data is related to a socket handle when a socket is first associated with the completion port. This is the data that is passed as the CompletionKey parameter of the CreateIoCompletionPort API call. As we noted earlier, your application can pass any type of socket information through this parameter. Typically, applications will store the socket handle related to the I/O request here.
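Because the CompletionKey argument of CreateIoCompletionPort is a ULONG_PTR, a pointer to a per-handle structure can be stored in it and recovered unchanged; casting the pointer to a 32-bit DWORD instead would truncate it on 64-bit Windows. The sketch below models that round trip in portable C. It assumes uintptr_t as a stand-in for ULONG_PTR and an int in place of SOCKET, and the HANDLE_DATA type and the two helper functions are hypothetical.

```c
#include <stdint.h>

/* Stand-in for ULONG_PTR: an unsigned integer wide enough to hold a pointer
   on both 32-bit and 64-bit builds. */
typedef uintptr_t COMPLETION_KEY;

/* Trimmed per-handle record (hypothetical). */
typedef struct {
    int Socket;   /* a SOCKET on Windows; int here so the sketch compiles anywhere */
} HANDLE_DATA;

/* The value you would pass as CreateIoCompletionPort's CompletionKey. */
COMPLETION_KEY KeyFromHandleData(HANDLE_DATA *hd)
{
    return (COMPLETION_KEY)hd;
}

/* What a worker does with the key returned by GetQueuedCompletionStatus. */
HANDLE_DATA *HandleDataFromKey(COMPLETION_KEY key)
{
    return (HANDLE_DATA *)key;
}
```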

The lpOverlapped parameter points to an OVERLAPPED structure that is embedded in a larger structure holding what we call per-I/O operation data, which is anything that your worker thread will need to know when processing a completion packet (echo the data back, accept the connection, post another read, and so on). Because functions such as WSASend and WSARecv accept only a pointer to an OVERLAPPED structure, the simplest way to carry extra per-I/O data is to define your own structure and place an OVERLAPPED structure as one of its fields; the OVERLAPPED pointer you pass to the Winsock call is the same pointer GetQueuedCompletionStatus hands back when the operation completes. For example, we declare the following data structure to manage per-I/O operation data:

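#define DATA_BUFSIZE 8192    // buffer size chosen for illustration

// Values stored in OperationType (illustrative)
#define RECV_POSTED 1
#define SEND_POSTED 2

typedef struct _PER_IO_DATA
{
    OVERLAPPED Overlapped;
    WSABUF     DataBuf;      // describes Buffer for WSASend/WSARecv
    char       Buffer[DATA_BUFSIZE];
    int        BufferLen;
    int        OperationType;
} PER_IO_DATA, * LPPER_IO_DATA;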

This structure demonstrates some important data elements you might want to relate to an I/O operation, such as the type of I/O operation (a send or receive request) that just completed and the data buffer used by that operation. To call a Winsock API function that expects an OVERLAPPED structure, you pass the address of the OVERLAPPED member of your structure. For example,

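PER_IO_DATA PerIoData;   // in a real server, heap-allocate this so it
                         // outlives the call until the I/O completes
SOCKET s;                // a connected, overlapped socket
WSABUF wbuf;
DWORD Bytes, Flags;

// Initialize wbuf ...

ZeroMemory(&(PerIoData.Overlapped), sizeof(OVERLAPPED));
Flags = 0;               // WSARecv reads this value on input
WSARecv(s, &wbuf, 1, &Bytes, &Flags, &(PerIoData.Overlapped), NULL);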

Later, in the worker thread, GetQueuedCompletionStatus returns the completion key along with a pointer to the OVERLAPPED structure. To retrieve the per-I/O data, use the CONTAINING_RECORD macro. For example,

PER_IO_DATA  *PerIoData=NULL;
OVERLAPPED   *lpOverlapped=NULL;
ULONG_PTR     CompletionKey;
DWORD         Transferred;
BOOL          ret;

ret = GetQueuedCompletionStatus(
         CompPortHandle,
         &Transferred,
         &CompletionKey,
         &lpOverlapped, INFINITE);

// Check for successful return

PerIoData = CONTAINING_RECORD(lpOverlapped, PER_IO_DATA, Overlapped);

Use this macro rather than a plain cast. Casting the OVERLAPPED pointer directly to a PER_IO_DATA pointer works only if the OVERLAPPED member is the first field of the structure, which can be a dangerous assumption to make (especially with multiple developers working on the same code).
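CONTAINING_RECORD works by subtracting the offset of the named field from the address it is given. A portable re-creation with offsetof makes the arithmetic visible. This is a sketch only: FAKE_OVERLAPPED, PER_IO_DATA_SKETCH, and MY_CONTAINING_RECORD are hypothetical stand-ins so the example compiles without <windows.h>; real code should use OVERLAPPED and the CONTAINING_RECORD macro from the Windows headers. The overlapped member is deliberately not the first field, which is exactly the case a plain cast would get wrong.

```c
#include <stddef.h>   /* offsetof */

/* Hypothetical stand-in for the Win32 OVERLAPPED structure. */
typedef struct {
    unsigned long Internal, InternalHigh, Offset, OffsetHigh;
    void *hEvent;
} FAKE_OVERLAPPED;

/* Portable equivalent of CONTAINING_RECORD: step back from the address of a
   member to the start of the structure that contains it. */
#define MY_CONTAINING_RECORD(address, type, field) \
    ((type *)((char *)(address) - offsetof(type, field)))

typedef struct {
    int             OperationType;  /* placed BEFORE the overlapped member */
    char            Buffer[64];
    FAKE_OVERLAPPED Overlapped;
} PER_IO_DATA_SKETCH;

/* What a worker would do with the pointer returned by
   GetQueuedCompletionStatus. */
PER_IO_DATA_SKETCH *IoDataFromOverlapped(FAKE_OVERLAPPED *ov)
{
    return MY_CONTAINING_RECORD(ov, PER_IO_DATA_SKETCH, Overlapped);
}
```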

You can determine which operation was posted on this handle by using a field of the per-I/O structure to indicate the type of operation posted. In our example, the OperationType member would be set to indicate a read, write, etc., operation. One of the biggest benefits of per-I/O operation data is that it allows you to manage multiple I/O operations (such as read/write, multiple reads, and multiple writes) on the same handle. You might ask why you would want to post more than one I/O operation at a time on a socket. The answer is scalability. For example, if you have a multiple-processor machine with a worker thread using each processor, you could potentially have several processors sending and receiving data on a socket at the same time.
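To make the OperationType idea concrete, here is a minimal sketch of the decision an echo server's worker might make after each completion: a finished receive becomes a send of the same bytes, a send that completed with fewer bytes than requested is defensively continued, a finished send becomes a new receive, and a zero-byte completion means the peer closed the connection. It is plain C so the logic can be tested without Winsock; the operation codes, the NextAction enum, and the trimmed IO_STATE record are hypothetical, and the actual WSASend/WSARecv calls are left to the caller.

```c
#include <stddef.h>

/* Hypothetical operation codes stored in the per-I/O OperationType field. */
enum { RECV_POSTED = 1, SEND_POSTED = 2 };

/* What the worker should do next after a completion. */
typedef enum { ACTION_CLOSE, ACTION_POST_SEND, ACTION_POST_RECV } NextAction;

/* Trimmed per-I/O record holding only what this sketch needs. */
typedef struct {
    int    OperationType;
    size_t BytesToSend;  /* bytes received that must be echoed back */
    size_t BytesSent;    /* bytes of that echo already sent */
} IO_STATE;

NextAction OnCompletion(IO_STATE *io, size_t bytesTransferred)
{
    if (bytesTransferred == 0)
        return ACTION_CLOSE;               /* peer closed the connection */

    switch (io->OperationType) {
    case RECV_POSTED:
        /* Echo whatever arrived, starting at offset 0 of the buffer. */
        io->BytesToSend = bytesTransferred;
        io->BytesSent = 0;
        io->OperationType = SEND_POSTED;
        return ACTION_POST_SEND;
    case SEND_POSTED:
        io->BytesSent += bytesTransferred;
        if (io->BytesSent < io->BytesToSend)
            return ACTION_POST_SEND;       /* send the remaining bytes */
        io->OperationType = RECV_POSTED;
        return ACTION_POST_RECV;
    default:
        return ACTION_CLOSE;               /* unknown operation: treat as fatal */
    }
}
```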

Before continuing, there is one other important aspect about Windows completion ports that needs to be stressed. All overlapped operations are guaranteed to be executed in the order that the application issued them. However, the completion notifications returned from a completion port are not guaranteed to be in that same order. That is, if an application posts two overlapped WSARecv operations, one with a 10 KB buffer and the next with a 12 KB buffer, the 10 KB buffer is filled first, followed by the 12 KB buffer. The application's worker thread may receive notification from GetQueuedCompletionStatus for the 12 KB WSARecv before the completion event for the 10 KB operation. Of course, this is only an issue when multiple operations are posted on a socket.
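One common way to cope with out-of-order notifications is to stamp each posted receive with an increasing sequence number in its per-I/O data and release completed buffers only in sequence order. The sketch below shows such a reorder window in plain C. The REORDER structure, the WINDOW size, and the int payload standing in for a buffer are assumptions of this example. It also assumes no more than WINDOW receives are outstanding, so every sequence number falls inside the window, and in a real server the structure would live in the per-handle data and need a lock, because several worker threads can dequeue completions for the same socket at once.

```c
#include <stdbool.h>
#include <stddef.h>

#define WINDOW 8   /* hypothetical maximum of outstanding receives per socket */

typedef struct {
    unsigned nextToDeliver;    /* sequence number expected next */
    bool     ready[WINDOW];    /* has the completion for this slot arrived? */
    int      payload[WINDOW];  /* stand-in for the completed buffer */
} REORDER;

/* Record the completion of receive number seq; copy every completion that is
   now deliverable in posting order into out[] (room for WINDOW entries) and
   return how many were delivered. */
size_t OnRecvCompleted(REORDER *r, unsigned seq, int payload, int out[])
{
    size_t n = 0;

    r->ready[seq % WINDOW] = true;
    r->payload[seq % WINDOW] = payload;

    while (r->ready[r->nextToDeliver % WINDOW]) {
        r->ready[r->nextToDeliver % WINDOW] = false;
        out[n++] = r->payload[r->nextToDeliver % WINDOW];
        r->nextToDeliver++;
    }
    return n;
}
```

In the scenario from the text, the 12 KB receive (sequence 1) completes first and is held back; when the 10 KB receive (sequence 0) completes, both are released in the order they were posted.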

To complete this simple echo server sample, we need to supply a ServerWorkerThread function. The following code outlines how to develop a worker thread routine that uses per-handle data and per-I/O operation data to service I/O requests.

DWORD WINAPI ServerWorkerThread(
    LPVOID CompletionPortID)
{
    HANDLE CompletionPort = (HANDLE) CompletionPortID;
    DWORD BytesTransferred;
    LPOVERLAPPED lpOverlapped;
    LPPER_HANDLE_DATA PerHandleData;
    LPPER_IO_DATA PerIoData;
    DWORD RecvBytes;
    DWORD Flags;
    BOOL ret;

    while(TRUE)
    {
        // Wait for I/O to complete on any socket
        // associated with the completion port
        ret = GetQueuedCompletionStatus(CompletionPort,
            &BytesTransferred, (PULONG_PTR)&PerHandleData,
            &lpOverlapped, INFINITE);

        // A packet without an OVERLAPPED pointer is either
        // a shutdown packet (posted with a completion key
        // of 0 and a NULL lpOverlapped) or a failed wait on
        // the port itself; either way, exit the thread.
        if (lpOverlapped == NULL)
            return 0;

        // Recover the per-I/O data without assuming the
        // OVERLAPPED member is the first field
        PerIoData = CONTAINING_RECORD(lpOverlapped,
            PER_IO_DATA, Overlapped);

        // First check to see if an error has occurred
        // on the socket; if so, close the 
        // socket and clean up the per-handle data
        // and per-I/O operation data associated with
        // the socket
        if (ret == FALSE || (BytesTransferred == 0 &&
            (PerIoData->OperationType == RECV_POSTED ||
             PerIoData->OperationType == SEND_POSTED)))
        {
            // The operation failed, or a zero BytesTransferred
            // indicates that the socket has been closed by the
            // peer, so you should close the socket. Note: 
            // Per-handle data was used to reference the
            // socket associated with the I/O operation.
            closesocket(PerHandleData->Socket);
            GlobalFree(PerHandleData);
            GlobalFree(PerIoData);
            continue;
        }

        // Service the completed I/O request. You can
        // determine which I/O request has just
        // completed by looking at the OperationType
        // field contained in the per-I/O operation data.
        if (PerIoData->OperationType == RECV_POSTED)
        {
            // Do something with the received data
            // in PerIoData->Buffer
        }

        // Post another WSASend or WSARecv operation.
        // As an example, we will post another WSARecv()
        // I/O operation.
        Flags = 0;

        // Set up the per-I/O operation data for the next
        // overlapped call
        ZeroMemory(&(PerIoData->Overlapped),
            sizeof(OVERLAPPED));
        PerIoData->DataBuf.len = DATA_BUFSIZE;
        PerIoData->DataBuf.buf = PerIoData->Buffer;
        PerIoData->OperationType = RECV_POSTED;

        WSARecv(PerHandleData->Socket,
            &(PerIoData->DataBuf), 1, &RecvBytes,
            &Flags, &(PerIoData->Overlapped), NULL);
    }
}

If an overlapped operation fails, GetQueuedCompletionStatus returns FALSE but still sets lpOverlapped to point to that operation's OVERLAPPED structure; if it returns FALSE with lpOverlapped set to NULL, no completion packet was dequeued at all. Because completion ports are a Windows I/O construct, if you call GetLastError or WSAGetLastError, the error code is likely to be a Windows error code and not a Winsock error code. To retrieve the equivalent Winsock error code, call WSAGetOverlappedResult with the socket handle and the WSAOVERLAPPED structure of the completed operation; WSAGetLastError will then return the translated Winsock error code.

One final detail not outlined in the last two examples we have presented is how to properly close an I/O completion port, especially if one or more threads are still performing I/O on several sockets. The main thing to avoid is freeing an OVERLAPPED structure while the overlapped I/O operation that uses it is still in progress. The best way to prevent this is to call closesocket on every socket handle, which cancels any overlapped I/O operations pending on it so that they complete. Once all socket handles are closed, you need to terminate all worker threads on the completion port. You can do this by posting one special completion packet per worker thread with the PostQueuedCompletionStatus function; each packet tells the thread that dequeues it to exit immediately. PostQueuedCompletionStatus is defined as

BOOL PostQueuedCompletionStatus(
    HANDLE CompletionPort,
    DWORD dwNumberOfBytesTransferred, 
    ULONG_PTR dwCompletionKey, 
    LPOVERLAPPED lpOverlapped
);

The CompletionPort parameter represents the completion port object to which you want to send a completion packet. The dwNumberOfBytesTransferred, dwCompletionKey, and lpOverlapped parameters each let you specify a value that is passed directly to the corresponding parameter of the GetQueuedCompletionStatus function. Thus, when a worker thread dequeues the packet, it can decide whether to exit based on a special value set in one of those three parameters. For example, you could pass the value 0 in the dwCompletionKey parameter, which a worker thread could interpret as an instruction to terminate. Once all the worker threads have exited (to wait for this, keep the thread handles instead of closing them immediately after CreateThread, and wait on them), you can close the completion port using the CloseHandle function and finally exit your program safely.
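The "one exit packet per worker" convention can be illustrated without Windows by a single-threaded model of the port's packet queue. This is only a model under stated assumptions: PACKET_QUEUE, Post, RunWorker, and PostShutdown are hypothetical names, SHUTDOWN_KEY follows the suggestion above of a completion key of 0, and a real server would instead call PostQueuedCompletionStatus(CompletionPort, 0, 0, NULL) once per worker thread and let each blocked GetQueuedCompletionStatus call pick up one packet.

```c
#include <stddef.h>

#define SHUTDOWN_KEY 0UL   /* completion key reserved to mean "exit" */
#define MAX_PACKETS  32

/* Hypothetical FIFO standing in for the completion port's packet queue. */
typedef struct {
    unsigned long keys[MAX_PACKETS];
    size_t head, tail;
} PACKET_QUEUE;

void Post(PACKET_QUEUE *q, unsigned long key)
{
    q->keys[q->tail++] = key;   /* no overflow check: sketch only */
}

/* Models one worker's loop: service packets until the shutdown key arrives.
   Returns 1 if it exited on the shutdown key, 0 if the queue ran dry
   (a real GetQueuedCompletionStatus call would block instead). */
int RunWorker(PACKET_QUEUE *q, size_t *serviced)
{
    while (q->head < q->tail) {
        unsigned long key = q->keys[q->head++];
        if (key == SHUTDOWN_KEY)
            return 1;           /* this worker exits; other exit packets stay queued */
        (*serviced)++;          /* stand-in for handling a real completion */
    }
    return 0;
}

/* One shutdown packet per worker, mirroring a loop of
   PostQueuedCompletionStatus calls. */
void PostShutdown(PACKET_QUEUE *q, size_t workers)
{
    for (size_t i = 0; i < workers; i++)
        Post(q, SHUTDOWN_KEY);
}
```

Because each worker consumes exactly one exit packet and then stops dequeuing, posting fewer packets than there are workers would leave some threads blocked forever.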

The completion port I/O model is by far the best in terms of performance and scalability. The model imposes no fixed limit on the number of sockets that can be associated with a completion port, and only a small number of threads are required to service the completed I/O. For more information on using completion ports to develop scalable, high-performance servers, see Chapter 6.