22.2 Server Architectures

Chapter 18 introduced three models of client-server communication: the serial server (Example 18.2), the parent server (Example 18.3), and the threaded server (Example 18.6). Because the parent-server strategy creates a new child process to handle each client request, it is sometimes called the process-per-request strategy. Similarly, the threaded-server strategy creates a separate thread to handle each incoming request, so it is often called the thread-per-request strategy.

An alternative strategy is to create processes or threads to form a worker pool before accepting requests. The workers block at a synchronization point, waiting for requests to arrive. An arriving request activates one thread or process while the rest remain blocked. Worker pools eliminate creation overhead, but may incur extra synchronization costs. Also, performance is critically tied to the size of the pool. Flexible implementations may dynamically adjust the number of threads or processes in the pool to maintain system balance.

Example 22.1

In the simplest worker-pool implementation, each worker thread or process blocks on the accept function, similar to a simple serial server.

   for ( ; ; ) {
      accept request
      process request
   }
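In C, such a worker might look like the following sketch. Here listenfd is assumed to be a bound, listening socket created elsewhere, and handle_request is a hypothetical function that carries out the exchange with the client; the main program would form the pool by starting N threads running worker.

   #include <pthread.h>
   #include <sys/socket.h>
   #include <unistd.h>

   void handle_request(int fd);           /* hypothetical handler */

   /* Each worker blocks directly on accept, as in Example 22.1. */
   void *worker(void *arg) {
      int listenfd = *(int *)arg;         /* listening socket, created elsewhere */
      int commfd;

      for ( ; ; ) {
         if ((commfd = accept(listenfd, NULL, NULL)) == -1)
            continue;                     /* retry on transient error */
         handle_request(commfd);          /* process the request */
         close(commfd);
      }
      return NULL;
   }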

Although POSIX specifies that accept be thread-safe, not all operating systems currently provide a thread-safe accept. Alternatively, workers can block on a lock that provides exclusive access to accept, as the next example shows.

Example 22.2

The following worker-pool implementation places the accept function in a protected critical section so that only one worker thread or process blocks on accept at a time. The remaining workers block at the lock or are processing a request.

   for ( ; ; ) {
      obtain lock (semaphore or mutex)
      accept request
      release lock
      process request
   }

POSIX provides semaphores for interprocess synchronization and mutex locks for synchronization within a process.
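The following sketch protects accept with a mutex for a pool of threads, under the same assumptions as before (listenfd and a hypothetical handle_request). For a pool of child processes, a POSIX semaphore in shared memory could serve as the lock instead.

   #include <pthread.h>
   #include <sys/socket.h>
   #include <unistd.h>

   void handle_request(int fd);               /* hypothetical handler */

   pthread_mutex_t accept_lock = PTHREAD_MUTEX_INITIALIZER;

   /* Workers serialize their calls to accept, as in Example 22.2. */
   void *worker(void *arg) {
      int listenfd = *(int *)arg;
      int commfd;

      for ( ; ; ) {
         pthread_mutex_lock(&accept_lock);    /* obtain lock */
         commfd = accept(listenfd, NULL, NULL);
         pthread_mutex_unlock(&accept_lock);  /* release lock */
         if (commfd == -1)
            continue;
         handle_request(commfd);              /* process request */
         close(commfd);
      }
      return NULL;
   }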

Exercise 22.3

If a server uses N workers, how many simultaneous requests can it process? What is the maximum number of simultaneous client connections?

Answer:

The server can process N requests simultaneously. However, additional client connections can be queued by the network subsystem. The backlog parameter of the listen function provides a hint to the network subsystem on the maximum number of client requests to queue. Some systems multiply this hint by a fudge factor. If the network subsystem sets its maximum backlog value to B, a maximum of N + B clients can be connected to the server at any one time, although only N clients may be processed at any one time.

Another worker-pool approach for threaded servers uses a standard producer-consumer configuration in which the workers block on a bounded buffer. A master thread blocks on accept while waiting for a connection. The accept function returns a communication file descriptor. Acting as the producer, the master thread places the communication file descriptor for the client connection in the bounded buffer. The worker threads are consumers that remove file descriptors and complete the client communication.

The buffer implementation of the worker pool introduces some interesting measurement issues and additional parameters. If connection requests come in bursts and service time is short, buffering can smooth out responses by accepting more connections ahead than would be provided by the underlying network subsystem. On the other hand, if service time is long, accepted connections languish in the buffer, possibly triggering timeouts at the clients. The number of additional connections that can be accepted ahead depends on the buffer size and the order of the statements synchronizing communication between the master producer and the worker consumers.
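A minimal C sketch of this producer-consumer arrangement using POSIX semaphores and a mutex follows; it adopts the organization of Exercise 22.4 below. The buffer size M, the listenfd parameter, and the handle_request function are illustrative assumptions, not fixed by the discussion above.

   #include <pthread.h>
   #include <semaphore.h>
   #include <sys/socket.h>
   #include <unistd.h>

   #define M 8                                /* buffer size (an assumption) */

   void handle_request(int fd);               /* hypothetical handler */

   static int buf[M];                         /* communication file descriptors */
   static int in = 0, out = 0;                /* deposit and removal positions */
   static sem_t slots, items;
   static pthread_mutex_t buflock = PTHREAD_MUTEX_INITIALIZER;

   void buffer_init(void) {                   /* call once before starting threads */
      sem_init(&slots, 0, M);                 /* M empty slots */
      sem_init(&items, 0, 0);                 /* no items yet */
   }

   /* Master (producer): obtain a slot, accept, deposit, signal item. */
   void *master(void *arg) {
      int listenfd = *(int *)arg;
      int commfd;

      for ( ; ; ) {
         sem_wait(&slots);                    /* obtain a slot */
         if ((commfd = accept(listenfd, NULL, NULL)) == -1) {
            sem_post(&slots);                 /* return the slot on error */
            continue;
         }
         pthread_mutex_lock(&buflock);
         buf[in] = commfd;                    /* copy the descriptor to the slot */
         in = (in + 1) % M;
         pthread_mutex_unlock(&buflock);
         sem_post(&items);                    /* signal item */
      }
      return NULL;
   }

   /* Worker (consumer): obtain an item, process it, then signal slot. */
   void *worker(void *arg) {
      int commfd;

      (void)arg;                              /* unused */
      for ( ; ; ) {
         sem_wait(&items);                    /* obtain an item */
         pthread_mutex_lock(&buflock);
         commfd = buf[out];
         out = (out + 1) % M;
         pthread_mutex_unlock(&buflock);
         handle_request(commfd);              /* process the communication */
         close(commfd);
         sem_post(&slots);                    /* signal slot only when done */
      }
      return NULL;
   }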

Exercise 22.4

How many connections ahead can be accepted for a buffer of size M with a master and N workers organized as follows?

Master:
   for ( ; ; ) {
      obtain a slot
      accept connection
      copy the file descriptor to slot
      signal item
   }

Worker:
   for ( ; ; ) {
      obtain an item (the file descriptor)
      process the communication
      signal slot
   }

Answer:

If N ≥ M, then each worker holds a slot while processing the request, and the master cannot accept any connections ahead. For N < M, the master can accept M - N connections ahead.

Exercise 22.5

How does the following strategy differ from that of Exercise 22.4? How many connections ahead can be accepted for a buffer of size M with a master and N workers organized as follows?

Master:
   for ( ; ; ) {
      accept connection
      obtain a slot
      copy the file descriptor to slot
      signal item
   }

Worker:
   for ( ; ; ) {
      obtain an item (a file descriptor)
      signal slot
      process the communication
   }

Answer:

The strategy here differs from that of Exercise 22.4 in two respects. First, the master accepts a connection before obtaining a slot. Second, each worker immediately releases the slot (signal slot) after copying the communication file descriptor, rather than holding it during processing. In this case, the master can accept up to M + 1 connections ahead: M in the buffer plus the one it has accepted but not yet deposited.
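In terms of the earlier semaphore sketch, the change amounts to moving two operations: the master calls accept before sem_wait(&slots), and the worker moves sem_post(&slots) ahead of the processing. Under the same assumptions as before, the worker loop becomes the following.

   /* Worker loop under the Exercise 22.5 organization: the slot is
      released before the (possibly long) processing begins. */
   for ( ; ; ) {
      sem_wait(&items);                       /* obtain an item */
      pthread_mutex_lock(&buflock);
      commfd = buf[out];
      out = (out + 1) % M;
      pthread_mutex_unlock(&buflock);
      sem_post(&slots);                       /* signal slot immediately */
      handle_request(commfd);                 /* process the communication */
      close(commfd);
   }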

Exercise 22.6

In what way do system parameters affect the number of connections that are made before the server accepts them?

Answer:

The backlog parameter set by listen determines how many connections the network subsystem queues. The TCP flow control mechanisms limit the amount that the client can send before the server calls accept for that connection. The backlog parameter is typically set to 100 or more for a busy server, in contrast to the old default value of 5 [115].
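For example, a server might request a deeper queue when it starts listening. This fragment assumes listenfd is an already-bound socket and uses the value 100 from the discussion above.

   #include <stdio.h>
   #include <sys/socket.h>

   /* Request a deeper listen queue. The backlog is only a hint: some
      systems scale it by a fudge factor or cap it at a system maximum. */
   int start_listening(int listenfd) {
      if (listen(listenfd, 100) == -1) {
         perror("Failed to listen on socket");
         return -1;
      }
      return 0;
   }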

Exercise 22.7

What a priori advantages and disadvantages do worker-pool implementations have over thread-per-request implementations?

Answer:

For short requests, the overhead of thread creation and buffer allocation can be significant in thread-per-request implementations. Also, these implementations do not degrade gracefully when the number of simultaneous connections exceeds system capacity: they usually just keep accepting additional connections, which can result in system failure or thrashing. Worker-pool implementations save the overhead of thread creation. By setting the worker-pool size appropriately, a system administrator can prevent the thrashing and crashing that might occur during busy times or during a denial-of-service attack. Unfortunately, if the worker-pool size is too low, the server will not run to full capacity. Hence, good worker-pool deployments need the support of performance measurements.

Exercise 22.8

Can the buffer-pool approach be implemented with a pool of child processes?

Answer:

The communication file descriptors are small integer values that specify a position in the file descriptor table. These integers have meaning only in the context of the same process, so a buffer-pool implementation with child processes would not be possible.

In thread-per-request architectures, the master thread blocks on accept and creates a new thread to handle each request. While the size of the pool limits the number of concurrent threads competing for resources in worker-pool approaches, thread-per-request designs are prone to overallocation if not carefully monitored.

Exercise 22.9

What is a process-per-request strategy and how might it be implemented?

Answer:

A process-per-request strategy is analogous to a thread-per-request strategy. The server accepts a request and forks a child (rather than creating a thread) to handle it. Since the main thread does not fork a child to handle the communication until the communication file descriptor is available, the child inherits a copy of the file descriptor table in which the communication file descriptor is valid.
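A minimal sketch of such a loop, again assuming a listening descriptor listenfd and a hypothetical handle_request function:

   #include <sys/types.h>
   #include <sys/socket.h>
   #include <sys/wait.h>
   #include <unistd.h>

   void handle_request(int fd);               /* hypothetical handler */

   /* Process-per-request loop: fork a child after accept returns, so
      the child's inherited descriptor table contains a valid commfd. */
   void serve(int listenfd) {
      int commfd;

      for ( ; ; ) {
         if ((commfd = accept(listenfd, NULL, NULL)) == -1)
            continue;
         if (fork() == 0) {                   /* child handles the request */
            close(listenfd);                  /* child does not accept */
            handle_request(commfd);
            close(commfd);
            _exit(0);
         }
         close(commfd);                       /* parent's copy not needed;
                                                 if fork failed, the
                                                 connection is dropped */
         while (waitpid(-1, NULL, WNOHANG) > 0)
            ;                                 /* reap finished children */
      }
   }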

The designs thus far have focused on the communication file descriptor as the principal resource. However, heavily used web servers are often limited by their disks, I/O subsystems, and memory caches. Once a thread receives a communication file descriptor and is charged with handling the request, it must locate the requested resource on disk. This process may require a chain of disk accesses.

Example 22.10

The client request to retrieve /usp/exercises/home.html may require several disk accesses by the OS file subsystem. First, the file subsystem locates the inode corresponding to usp by reading the contents of the web server's root directory and parsing the information to find usp. Once the file subsystem has retrieved the inode for usp, it reads and parses data blocks from usp to locate exercises. The process continues until the file subsystem has retrieved the actual data for home.html. To eliminate some of these disk accesses, the operating system may cache inodes indexed by pathname.

To avoid extensive disk accesses to locate a resource, servers often cache the inode numbers of the most popular resources. Such a cache might be effectively managed by a single thread or be controlled by a monitor.

Disk accesses are usually performed through the I/O subsystem of the operating system. The operating system provides caching and prefetching of blocks. To eliminate the inefficiency of extra copying and blocking through the I/O subsystem, web servers sometimes cache their most popular pages in memory or in a disk area that bypasses the operating system file subsystem.
