Scalable Server Architecture

Now that we've introduced the Microsoft-specific extensions, we'll get into the details of implementing a scalable server. Because this chapter focuses on connection-oriented protocols such as TCP/IP, we will first discuss accepting connections followed by managing data transfers. The last section will discuss resource management in more detail.

Accepting Connections

The most common action a server performs is accepting connections. The Microsoft extension AcceptEx is the only Winsock function capable of accepting a client connection via overlapped I/O. As we mentioned previously, the AcceptEx function requires that the client socket be created beforehand by calling socket. The socket must be unbound and unconnected, although it is possible to re-use socket handles after calling TransmitFile, TransmitPackets, or DisconnectEx.
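As a rough illustration of the mechanics involved, the following sketch obtains the AcceptEx function pointer and posts a single accept, assuming an already created and bound listening socket; the ACCEPT_CONTEXT structure and the LoadAcceptEx and PostAccept helpers are illustrative names rather than part of the Winsock API.

#include <winsock2.h>
#include <mswsock.h>
#pragma comment(lib, "ws2_32.lib")

// Hypothetical per-accept context: the pre-created client socket, the
// OVERLAPPED used for this AcceptEx call, and the address buffer that
// AcceptEx requires (local + remote address, each padded by 16 bytes).
typedef struct ACCEPT_CONTEXT {
    OVERLAPPED ov;
    SOCKET     clientSocket;
    char       addrBuffer[2 * (sizeof(SOCKADDR_IN) + 16)];
} ACCEPT_CONTEXT;

// Retrieve the AcceptEx extension function pointer from the provider.
LPFN_ACCEPTEX LoadAcceptEx(SOCKET listenSocket)
{
    LPFN_ACCEPTEX fnAcceptEx = NULL;
    GUID guidAcceptEx = WSAID_ACCEPTEX;
    DWORD bytes = 0;

    if (WSAIoctl(listenSocket, SIO_GET_EXTENSION_FUNCTION_POINTER,
                 &guidAcceptEx, sizeof(guidAcceptEx),
                 &fnAcceptEx, sizeof(fnAcceptEx),
                 &bytes, NULL, NULL) == SOCKET_ERROR)
        return NULL;
    return fnAcceptEx;
}

// Create an unbound, unconnected client socket and post one AcceptEx.
BOOL PostAccept(SOCKET listenSocket, LPFN_ACCEPTEX fnAcceptEx,
                ACCEPT_CONTEXT *ctx)
{
    DWORD bytesReceived = 0;

    ZeroMemory(&ctx->ov, sizeof(ctx->ov));
    ctx->clientSocket = WSASocket(AF_INET, SOCK_STREAM, IPPROTO_TCP,
                                  NULL, 0, WSA_FLAG_OVERLAPPED);
    if (ctx->clientSocket == INVALID_SOCKET)
        return FALSE;

    // A receive length of zero makes the call complete as soon as the
    // connection arrives rather than waiting for the client's first data.
    if (!fnAcceptEx(listenSocket, ctx->clientSocket, ctx->addrBuffer, 0,
                    sizeof(SOCKADDR_IN) + 16, sizeof(SOCKADDR_IN) + 16,
                    &bytesReceived, &ctx->ov))
    {
        if (WSAGetLastError() != ERROR_IO_PENDING)
            return FALSE;               // a real failure, not just pending
    }
    return TRUE;
}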

A responsive server must always have enough AcceptEx calls outstanding so that incoming client connections can be handled immediately. However, there is no magic number of outstanding AcceptEx calls that guarantees the server will always be able to accept a connection right away. Remember that the TCP/IP stack will automatically accept connections on behalf of the listening application, up to the backlog limit. For Windows NT Server, the maximum backlog value is currently 200. If a server posts 15 AcceptEx calls and a burst of 50 clients then connect, none of the clients' connections will be rejected. The server's accept calls satisfy the first 15 connections, and the system silently accepts the remainder, which dips into the backlog so that the system can queue only 165 additional connections before refusing new ones. When the server later posts additional AcceptEx calls, they succeed immediately because one of the system-queued connections is returned.

The nature of the server plays an important role in determining how many AcceptEx operations to post. For example, a server that is expected to handle many short-lived connections from a great number of clients may want to post more concurrent AcceptEx operations than a server that handles fewer connections with longer lifetimes. A good strategy is to allow the number of outstanding AcceptEx calls to vary between a low and a high watermark. The application keeps track of how many AcceptEx operations are pending; when one or more of them completes and the outstanding count drops below the low watermark, additional AcceptEx calls are posted. Conversely, if an AcceptEx completes while the number of outstanding accepts is at or above the high watermark, no additional calls should be posted when handling that completion.
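A minimal sketch of that bookkeeping might look like the following, assuming a PostNewAccept helper that creates a client socket and posts one AcceptEx; the watermark values and the g_outstandingAccepts counter are illustrative, and a production server would tune and protect them to suit its workload.

#include <winsock2.h>
#include <windows.h>
#pragma comment(lib, "ws2_32.lib")

#define LOW_WATERMARK    50     // illustrative values; tune per workload
#define HIGH_WATERMARK  100

static volatile LONG g_outstandingAccepts = 0;

// Assumed helper: creates a client socket and posts one AcceptEx.
BOOL PostNewAccept(SOCKET listenSocket);

// Called from the code path that handles a completed AcceptEx.
void OnAcceptCompleted(SOCKET listenSocket)
{
    LONG outstanding = InterlockedDecrement(&g_outstandingAccepts);

    if (outstanding >= HIGH_WATERMARK)
        return;                         // plenty outstanding; post nothing

    if (outstanding < LOW_WATERMARK)
    {
        // Replenish the pool back up toward the high watermark.
        while (outstanding < HIGH_WATERMARK && PostNewAccept(listenSocket))
            outstanding = InterlockedIncrement(&g_outstandingAccepts);
    }
}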

On Windows 2000 and later versions, Winsock provides a mechanism for determining whether an application is falling behind in posting adequate AcceptEx calls. When creating the listening socket, associate it with an event by using the WSAEventSelect API call and registering for FD_ACCEPT notification. If there are no pending AcceptEx operations but there are incoming client connections (accepted by the system according to the backlog value), the event will be signaled. This can be used as an indication that the server should post additional AcceptEx operations.
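One possible way to wire this up is sketched below; CreateAcceptWatch, WatchForAcceptShortage, and the ReplenishAccepts helper are hypothetical names, and a dedicated thread is assumed to run the watch loop.

#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

// Assumed helper that tops up the pool of outstanding AcceptEx calls.
void ReplenishAccepts(SOCKET listenSocket);

// Associate an event with the listening socket and register for FD_ACCEPT.
WSAEVENT CreateAcceptWatch(SOCKET listenSocket)
{
    WSAEVENT acceptEvent = WSACreateEvent();
    if (acceptEvent == WSA_INVALID_EVENT)
        return WSA_INVALID_EVENT;

    if (WSAEventSelect(listenSocket, acceptEvent, FD_ACCEPT) == SOCKET_ERROR)
    {
        WSACloseEvent(acceptEvent);
        return WSA_INVALID_EVENT;
    }
    return acceptEvent;
}

// A dedicated thread can wait on the event; it signals only when the
// system has accepted a connection that no pending AcceptEx picked up.
void WatchForAcceptShortage(SOCKET listenSocket, WSAEVENT acceptEvent)
{
    WSANETWORKEVENTS events;

    for (;;)
    {
        if (WSAWaitForMultipleEvents(1, &acceptEvent, FALSE,
                                     WSA_INFINITE, FALSE) != WSA_WAIT_EVENT_0)
            break;

        // WSAEnumNetworkEvents resets the event and reports what fired.
        if (WSAEnumNetworkEvents(listenSocket, acceptEvent, &events) == 0 &&
            (events.lNetworkEvents & FD_ACCEPT))
        {
            ReplenishAccepts(listenSocket);
        }
    }
}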

One significant benefit of using AcceptEx is the capability to receive data in addition to accepting the client connection. For servers whose clients send an initial request, this is ideal. However, as we mentioned in Chapter 5, an AcceptEx operation issued with a receive buffer will not complete until at least one byte of data has been received. To prevent malicious attacks or stale connections from tying up accepts indefinitely, a server should periodically cycle through the client socket handles of its outstanding AcceptEx operations and call getsockopt with the SO_CONNECT_TIME option, which succeeds whether or not the socket is actually connected. The value returned is the number of seconds the socket has been connected, or -1 (0xFFFFFFFF) if no connection has arrived yet; a connection that has been established for some time without sending any data is a candidate for being closed. If the WSAEventSelect suggestion is implemented, the signaling of the event is a good time to perform this check. Note that once an AcceptEx call accepts an incoming connection, it then waits to receive data, and at that point there is one less outstanding accept call; once no accepts remain, the event will be signaled on the next incoming client connection. As a word of warning, applications should not under any circumstances close a client socket handle used in an AcceptEx call that has not yet accepted a connection, because this can lead to memory leaks. For performance reasons, the kernel-mode structures associated with the AcceptEx call are not cleaned up when the unconnected client handle is closed until a new client connection is established or the listening socket itself is closed.
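A stale-connection sweep might rely on a helper such as the following; GetConnectSeconds is an illustrative name, and how long a silent connection is tolerated before the server closes it is left to the application.

#include <winsock2.h>
#include <mswsock.h>
#pragma comment(lib, "ws2_32.lib")

// Returns how many seconds the accepting socket has been connected, or
// -1 if no connection has arrived yet. A socket that has been connected
// for a long time without completing its AcceptEx (that is, without
// sending any data) is a candidate for being closed.
int GetConnectSeconds(SOCKET acceptSocket)
{
    DWORD seconds = 0;
    int   optLen  = sizeof(seconds);

    if (getsockopt(acceptSocket, SOL_SOCKET, SO_CONNECT_TIME,
                   (char *)&seconds, &optLen) == SOCKET_ERROR)
        return -1;

    return (seconds == 0xFFFFFFFF) ? -1 : (int)seconds;
}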

Although it may seem logical and simpler to post AcceptEx requests from one of the worker threads handling notifications from the completion port, you should avoid this because the socket creation process is expensive. In addition, any complex computations should be avoided within the worker threads so the server can process completion notifications as quickly as possible. One reason socket creation is expensive is the layered architecture of Winsock 2.0: when the server creates a socket, the call may be routed through multiple providers, each performing its own tasks, before the socket is created and returned to the application. Chapter 12 discusses layered providers in detail. Instead, a server should create client sockets and post AcceptEx operations from a separate thread. When an overlapped AcceptEx completes in a worker thread, an event can be used to signal the accept-issuing thread.
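The arrangement might look roughly like this, assuming an auto-reset event shared between the worker threads and the accept-issuing thread; g_hPostAcceptEvent and PostNewAccept are assumptions carried over from the earlier sketches.

#include <winsock2.h>
#include <windows.h>
#pragma comment(lib, "ws2_32.lib")

// Assumed to be created elsewhere with CreateEvent(NULL, FALSE, FALSE, NULL).
extern HANDLE g_hPostAcceptEvent;

// Assumed helper: creates a client socket and posts one AcceptEx.
BOOL PostNewAccept(SOCKET listenSocket);

// Dedicated accept-issuing thread: all socket creation and AcceptEx
// posting happens here, keeping the completion-port workers lean.
DWORD WINAPI AcceptIssuingThread(LPVOID param)
{
    SOCKET listenSocket = (SOCKET)(ULONG_PTR)param;

    for (;;)
    {
        WaitForSingleObject(g_hPostAcceptEvent, INFINITE);
        PostNewAccept(listenSocket);
    }
    return 0;
}

// A worker thread that has just dequeued an AcceptEx completion simply
// calls SetEvent(g_hPostAcceptEvent) instead of posting the accept itself.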

Data Transfers

Once clients are connected, the server will need to transfer data. This process is fairly straightforward, and once again, all data sent or received should be performed with overlapped I/O. By default, each socket has an associated send and receive buffer that is used to buffer outgoing and incoming data, respectively. In most cases these buffers should be left alone, but it is possible to change them or set them to zero by calling setsockopt with the SO_SNDBUF or SO_RCVBUF options.
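If an application does decide to change them, the calls are straightforward; the SetSocketBuffers wrapper below is purely illustrative.

#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

// A value of zero disables the corresponding per-socket buffer. As the
// discussion below explains, think twice before zeroing SO_RCVBUF.
int SetSocketBuffers(SOCKET s, int sendBytes, int recvBytes)
{
    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF,
                   (const char *)&sendBytes, sizeof(sendBytes)) == SOCKET_ERROR)
        return SOCKET_ERROR;

    return setsockopt(s, SOL_SOCKET, SO_RCVBUF,
                      (const char *)&recvBytes, sizeof(recvBytes));
}

// Example: disable the send buffer but keep an 8 KB receive buffer.
//     SetSocketBuffers(clientSocket, 0, 8192);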

Let's look at how the system handles a typical send call when the send buffer size is non-zero. When an application makes a send call, if there is sufficient space, the data is copied into the socket's send buffer, the call completes immediately with success, and a completion is posted. On the other hand, if the socket's send buffer is full, the application's send buffer is locked and the send call returns with WSA_IO_PENDING, indicating that the operation is pending. After the data in the socket's send buffer is processed (for example, handed down to TCP), Winsock processes the locked buffer directly. That is, the data is handed to TCP straight from the application's buffer, and the socket's send buffer is bypassed entirely.

The receive path works in a similar way. When an overlapped receive call is performed and data has already been received on the connection, that data will be buffered in the socket's receive buffer; it is copied directly into the application's buffer (as much as will fit), the receive call returns success, and a completion is posted. However, if the socket's receive buffer is empty when the overlapped receive call is made, the application's buffer is locked and the call returns with WSA_IO_PENDING. Once data arrives on the connection, it is copied directly into the application's buffer, bypassing the socket's receive buffer altogether.
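For reference, a minimal sketch of posting such an overlapped receive follows; the IO_CONTEXT structure and the PostRecv helper are illustrative names, and the socket is assumed to be associated with a completion port.

#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

// The buffer and OVERLAPPED must remain valid until the corresponding
// completion has been dequeued from the completion port.
typedef struct IO_CONTEXT {
    OVERLAPPED ov;
    WSABUF     wsabuf;
    char       buffer[4096];
} IO_CONTEXT;

BOOL PostRecv(SOCKET s, IO_CONTEXT *ctx)
{
    DWORD bytes = 0, flags = 0;

    ZeroMemory(&ctx->ov, sizeof(ctx->ov));
    ctx->wsabuf.buf = ctx->buffer;
    ctx->wsabuf.len = sizeof(ctx->buffer);

    if (WSARecv(s, &ctx->wsabuf, 1, &bytes, &flags,
                &ctx->ov, NULL) == SOCKET_ERROR)
    {
        // WSA_IO_PENDING simply means the operation will complete later.
        if (WSAGetLastError() != WSA_IO_PENDING)
            return FALSE;
    }
    return TRUE;
}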

Setting the per-socket buffers to zero generally will not increase performance, because the extra memory copy can already be avoided as long as there are always enough overlapped send and receive operations posted. Disabling the socket's send buffer has less of a performance impact than disabling the receive buffer, because the application's send buffer will always be locked until it can be passed down to TCP for processing. However, if the receive buffer is set to zero and there are no outstanding overlapped receive calls, any incoming data can be buffered only at the TCP level. The TCP driver will buffer only up to the receive window size, which is 17 KB; TCP grows these buffers as needed up to that limit, and normally they are much smaller. These TCP buffers (one per connection) are allocated out of non-paged pool, which means that a server with 1,000 connections and no receives posted at all could consume 17 MB of non-paged pool! The non-paged pool is a limited resource, and unless the server can guarantee there are always receives posted for each connection, the per-socket receive buffer should be left intact.

Only in a few specific cases does leaving the receive buffer intact lead to decreased performance. Consider a server that handles many thousands of connections and cannot keep a receive posted on each one (this can become very expensive, as you'll see in the next section), and whose clients send data only sporadically. Incoming data is buffered in the per-socket receive buffer, so when the server does issue an overlapped receive it performs unnecessary work: the overlapped operation issues an I/O request packet (IRP) that completes immediately, after which a notification is sent to the completion port. In this case, because the server cannot keep enough receives posted, it is better off performing simple non-blocking receive calls.
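A rough sketch of that fallback is shown below, assuming the socket has been placed in non-blocking mode; DrainSocket is a hypothetical helper that simply pulls whatever the per-socket receive buffer already holds.

#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

// Returns the number of bytes drained into buf, or SOCKET_ERROR.
int DrainSocket(SOCKET s, char *buf, int bufLen)
{
    u_long nonBlocking = 1;
    int    total = 0;

    if (ioctlsocket(s, FIONBIO, &nonBlocking) == SOCKET_ERROR)
        return SOCKET_ERROR;

    while (total < bufLen)
    {
        int rc = recv(s, buf + total, bufLen - total, 0);
        if (rc > 0)
            total += rc;                              // got some buffered data
        else if (rc == 0)
            break;                                    // connection closed
        else if (WSAGetLastError() == WSAEWOULDBLOCK)
            break;                                    // nothing more buffered
        else
            return SOCKET_ERROR;                      // a real error
    }
    return total;
}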

TransmitFile and TransmitPackets

For sending data, servers should consider using the TransmitFile and TransmitPackets API functions where applicable. The benefit of these functions is that a great deal of data can be queued for sending on a connection while incurring just a single user-to-kernel mode transition. For example, if the server is sending file data to a client, it simply opens a handle to that file and issues a single TransmitFile instead of calling ReadFile followed by WSASend repeatedly, which would incur many user-to-kernel mode transitions. Likewise, if a server needs to send several memory buffers, it can build an array of TRANSMIT_PACKETS_ELEMENT structures and use the TransmitPackets API. As we mentioned, these APIs also allow you to disconnect and reuse the socket handles in subsequent AcceptEx calls.
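As a minimal sketch, sending an entire file and recycling the socket might look like the following; SendWholeFile is an illustrative name, and for brevity TransmitFile is called through mswsock.lib directly rather than through a function pointer obtained with WSAIoctl.

#include <winsock2.h>
#include <mswsock.h>
#include <windows.h>
#pragma comment(lib, "ws2_32.lib")
#pragma comment(lib, "mswsock.lib")

// Queue an entire file for sending with a single call, then disconnect
// and mark the socket reusable for a later AcceptEx.
BOOL SendWholeFile(SOCKET s, const wchar_t *path, OVERLAPPED *ov)
{
    HANDLE hFile = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                               OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return FALSE;

    // A byte count of zero means "transmit the whole file".
    if (!TransmitFile(s, hFile, 0, 0, ov, NULL,
                      TF_DISCONNECT | TF_REUSE_SOCKET) &&
        WSAGetLastError() != ERROR_IO_PENDING)
    {
        CloseHandle(hFile);
        return FALSE;
    }

    // The file handle must stay open until the overlapped operation's
    // completion has been dequeued; it is the caller's job to close it then.
    return TRUE;
}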


