Control Codes and Status Reporting | Programming Server-Side Applications for Microsoft Windows 2000 (Microsoft Programming)

The HandlerEx function is responsible for handling all requested actions of the service and all notifications. HandlerEx's first parameter is a code indicating the action request or notification. Table 3-2 describes the codes that indicate an action request. An action request tells the service to perform some action to alter its execution state.

Table 3-2. Control codes that indicate an action request

Control Code	Description
SERVICE_CONTROL_STOP	Requests the service to stop.
SERVICE_CONTROL_PAUSE	Requests the service to pause.
SERVICE_CONTROL_CONTINUE	Requests the paused service to resume.
SERVICE_CONTROL_INTERROGATE	Requests the service to immediately report its current status information to the SCM. This is the only control code that all services must respond to.

Table 3-3 describes the codes that indicate notifications. Notifications inform services of "interesting" events in the system. However, services usually do not alter their execution state in response to notifications (although a service might choose to alter its execution state).

Table 3-3. Control codes that indicate a notification

Control Code	Description
SERVICE_CONTROL_PARAMCHANGE	Notifies the service that configuration parameters have changed. A service can ignore this or reconfigure itself while running.
SERVICE_CONTROL_DEVICEEVENT	Notifies the service of a device event. The service must call RegisterDeviceNotification to receive these notifications.
SERVICE_CONTROL_HARDWAREPROFILECHANGE	Notifies the service of a hardware profile change.
SERVICE_CONTROL_POWEREVENT	Notifies the service of a power event.
A number between 128 and 255, inclusive	Notifies the service of a user-defined event.
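
As a rough illustration of how the two tables partition the first HandlerEx parameter, here is a minimal sketch. The constants are local stand-ins that mirror the winsvc.h values so the code compiles without the Windows headers. ControlKind and ClassifyControl are hypothetical names, and placing SERVICE_CONTROL_SHUTDOWN with the action requests is an assumption, since it appears in neither table.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins for the winsvc.h constants so the sketch compiles anywhere.
// In a real service, delete these and include <windows.h> instead.
constexpr std::uint32_t SERVICE_CONTROL_STOP                  = 0x00000001;
constexpr std::uint32_t SERVICE_CONTROL_PAUSE                 = 0x00000002;
constexpr std::uint32_t SERVICE_CONTROL_CONTINUE              = 0x00000003;
constexpr std::uint32_t SERVICE_CONTROL_INTERROGATE           = 0x00000004;
constexpr std::uint32_t SERVICE_CONTROL_SHUTDOWN              = 0x00000005;
constexpr std::uint32_t SERVICE_CONTROL_PARAMCHANGE           = 0x00000006;
constexpr std::uint32_t SERVICE_CONTROL_DEVICEEVENT           = 0x0000000B;
constexpr std::uint32_t SERVICE_CONTROL_HARDWAREPROFILECHANGE = 0x0000000C;
constexpr std::uint32_t SERVICE_CONTROL_POWEREVENT            = 0x0000000D;

// Hypothetical categories; the names are not part of the Win32 API.
enum class ControlKind { ActionRequest, Notification, UserDefined, Unknown };

// Sorts a control code into the categories described by Tables 3-2 and 3-3.
ControlKind ClassifyControl(std::uint32_t code) {
    switch (code) {
    case SERVICE_CONTROL_STOP:
    case SERVICE_CONTROL_PAUSE:
    case SERVICE_CONTROL_CONTINUE:
    case SERVICE_CONTROL_INTERROGATE:
    case SERVICE_CONTROL_SHUTDOWN:  // assumption: not listed in either table
        return ControlKind::ActionRequest;
    case SERVICE_CONTROL_PARAMCHANGE:
    case SERVICE_CONTROL_DEVICEEVENT:
    case SERVICE_CONTROL_HARDWAREPROFILECHANGE:
    case SERVICE_CONTROL_POWEREVENT:
        return ControlKind::Notification;
    default:
        break;
    }
    // Codes 128 through 255, inclusive, are reserved for user-defined events.
    if (code >= 128 && code <= 255) {
        return ControlKind::UserDefined;
    }
    return ControlKind::Unknown;
}
```

A handler could switch on the returned category first and only then decide whether any status reporting is needed.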

Codes Requiring Status Reporting

The work performed by the HandlerEx function differs dramatically depending on the control code it receives. In particular, the action request codes require special attention in your code. When the HandlerEx function receives a SERVICE_CONTROL_STOP, SERVICE_CONTROL_SHUTDOWN, SERVICE_CONTROL_PAUSE, or SERVICE_CONTROL_CONTINUE control code, SetServiceStatus must be called to acknowledge receipt of the code and to specify how long the service thinks it will take to process the state change. You acknowledge receipt of the control code by setting the SERVICE_STATUS structure's dwCurrentState member to SERVICE_STOP_PENDING, SERVICE_PAUSE_PENDING, or SERVICE_CONTINUE_PENDING. In addition, the HandlerEx function must return within 30 seconds, or the SCM will again think that the service has stopped responding. If the SCM thinks that the service has stopped responding, it doesn't kill the service; it just returns failure to the SCP that initiated the service control code.
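
The acknowledgment described above boils down to a mapping from control code to pending state. The following is a minimal sketch of that mapping, not a complete handler. StatusSketch is a reduced, hypothetical mirror of SERVICE_STATUS that holds only the three members discussed here, PendingStateFor and MakeAcknowledgment are illustrative names, and starting dwCheckPoint at 1 is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins for the winsvc.h values; use <windows.h> in a real service.
constexpr std::uint32_t SERVICE_CONTROL_STOP     = 1;
constexpr std::uint32_t SERVICE_CONTROL_PAUSE    = 2;
constexpr std::uint32_t SERVICE_CONTROL_CONTINUE = 3;
constexpr std::uint32_t SERVICE_CONTROL_SHUTDOWN = 5;
constexpr std::uint32_t SERVICE_STOP_PENDING     = 3;
constexpr std::uint32_t SERVICE_CONTINUE_PENDING = 5;
constexpr std::uint32_t SERVICE_PAUSE_PENDING    = 6;

// Reduced, hypothetical mirror of SERVICE_STATUS (the real structure has
// more members, such as dwServiceType and dwControlsAccepted).
struct StatusSketch {
    std::uint32_t dwCurrentState;
    std::uint32_t dwCheckPoint;
    std::uint32_t dwWaitHint;  // milliseconds
};

// Returns the pending state that acknowledges an action request, or 0 when
// the control code does not call for a pending acknowledgment.
std::uint32_t PendingStateFor(std::uint32_t controlCode) {
    switch (controlCode) {
    case SERVICE_CONTROL_STOP:
    case SERVICE_CONTROL_SHUTDOWN:
        return SERVICE_STOP_PENDING;
    case SERVICE_CONTROL_PAUSE:
        return SERVICE_PAUSE_PENDING;
    case SERVICE_CONTROL_CONTINUE:
        return SERVICE_CONTINUE_PENDING;
    default:
        return 0;
    }
}

// Builds the status that would be passed to SetServiceStatus to acknowledge
// the request before the handler returns.
StatusSketch MakeAcknowledgment(std::uint32_t controlCode,
                                std::uint32_t waitHintMs) {
    const std::uint32_t pending = PendingStateFor(controlCode);
    assert(pending != 0 && "only action requests are acknowledged this way");
    return StatusSketch{pending, 1, waitHintMs};
}
```

For example, MakeAcknowledgment(SERVICE_CONTROL_PAUSE, 5000) yields a SERVICE_PAUSE_PENDING status with a 5-second wait hint.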

While a stop, shutdown, pause, or continue operation is pending, you must also specify how long you think the operation will take to complete. Specifying the duration is useful because a service might not be able to change its state immediately—it might have to wait for a network request to complete or for data to be flushed to a drive. You indicate how long the state change will take to complete by using the dwCheckPoint and dwWaitHint members of the SERVICE_STATUS structure, just as you did when you reported that the service was first starting. If you want, you can report periodic progress by incrementing the dwCheckPoint member and setting the dwWaitHint member to indicate how long you expect the service to take to get to the next step.

After the service has performed all the actions required to stop, shut down, pause, or continue itself, SetServiceStatus should be called again. This time you set the dwCurrentState member to SERVICE_STOPPED, SERVICE_PAUSED, or SERVICE_RUNNING. When you report any of these three status codes, both the dwCheckPoint and dwWaitHint members should be 0 because the service has completed its state change.
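
The full reporting cycle (periodic pending reports with an increasing checkpoint, then a final report with both members zeroed) can be sketched as follows. StatusReporter is a hypothetical stand-in that records what would be passed to SetServiceStatus. It does not call the real API, and StatusSketch again mirrors only three members of SERVICE_STATUS.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local stand-ins for the winsvc.h values; use <windows.h> in a real service.
constexpr std::uint32_t SERVICE_STOPPED      = 1;
constexpr std::uint32_t SERVICE_STOP_PENDING = 3;

// Reduced, hypothetical mirror of SERVICE_STATUS.
struct StatusSketch {
    std::uint32_t dwCurrentState;
    std::uint32_t dwCheckPoint;
    std::uint32_t dwWaitHint;  // milliseconds
};

// Records each status that a real service would hand to SetServiceStatus.
class StatusReporter {
public:
    // Reports progress on a pending operation. The checkpoint grows with
    // every call for the same pending state and restarts for a new one.
    void ReportPending(std::uint32_t pendingState, std::uint32_t waitHintMs) {
        if (current_.dwCurrentState != pendingState) {
            current_.dwCheckPoint = 0;  // new operation: restart the counter
        }
        current_.dwCurrentState = pendingState;
        ++current_.dwCheckPoint;
        current_.dwWaitHint = waitHintMs;
        history_.push_back(current_);
    }

    // Reports a completed state change: checkpoint and wait hint are both 0.
    void ReportFinal(std::uint32_t finalState) {
        current_ = StatusSketch{finalState, 0, 0};
        history_.push_back(current_);
    }

    const std::vector<StatusSketch>& History() const { return history_; }

private:
    StatusSketch current_{};
    std::vector<StatusSketch> history_;
};
```

A stop that takes two steps would call ReportPending(SERVICE_STOP_PENDING, ...) twice and then ReportFinal(SERVICE_STOPPED), producing checkpoints 1, 2, and finally 0.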

NOTE
After a service calls SetServiceStatus to report SERVICE_STOPPED, the SCM allows that service to run for up to 30 seconds more. If the service is still running after 30 seconds, the SCM terminates the service's process if there are no other services currently running in that process.

When the HandlerEx function receives a SERVICE_CONTROL_INTERROGATE control code, the service should simply acknowledge receipt by setting dwCurrentState to the service's current state and calling SetServiceStatus. (Again, set both dwCheckPoint and dwWaitHint to 0 before making this call to SetServiceStatus.)

When the system is shutting down, the HandlerEx function receives a SERVICE_CONTROL_SHUTDOWN notification code. The service should perform the minimal set of actions necessary to save any data and should ultimately call SetServiceStatus to report SERVICE_STOPPED. To ensure that a machine shuts down in a timely fashion, a service should process this control code only if it absolutely has to. By default, the system gives just 20 seconds for all services to shut down. After 20 seconds, the SCM process (Services.exe) is killed and the machine continues to shut down. This 20-second period is set by the WaitToKillServiceTimeout value, which is contained in the following registry subkey:

 HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control

NOTE
When the system is shutting down, the SCM notifies all services that accept the SERVICE_CONTROL_SHUTDOWN notification code. Some services might ignore the code, some might save data to disk, and some might stop themselves and terminate. You must be very careful not to execute any actions in your service that require the assistance of other services. These other services might be in a "bad" state or might even have terminated. The system completely ignores service dependencies when shutting down. Microsoft's goal was to make the system shut down as quickly as possible. In fact, other services might even receive their shutdown notifications before your service receives its notification. The problem with this shutdown notification order is that services you depend on might stop running at any time, and your service must handle this situation gracefully.

For the notification codes listed in Table 3-3, your HandlerEx function should handle the notification and return. Do not call the SetServiceStatus function unless the notification response forces the service to change its execution state. If the service is going to change its execution state, SetServiceStatus should be called to set the dwCurrentState, dwCheckPoint, and dwWaitHint members to the appropriate values as previously discussed.

Dealing with Interthread Communication Issues

A service is awkward to write because although the primary thread executing the HandlerEx function receives the action request, often the service thread needs to do the actual work to process the request. For example, you might be writing a service that processes client requests that come in over a named pipe. Your service's thread suspends itself waiting for a client to connect. If your HandlerEx thread receives a SERVICE_CONTROL_STOP code, how do you stop the service? I've seen many developers simply call TerminateThread from the HandlerEx function to kill the service thread forcibly. By now, you should know that TerminateThread is one of the worst functions you can possibly call because the thread doesn't get a chance to clean up: the thread's stack is not destroyed, the thread can't release any kernel objects that it might have waited on, DLLs are not notified that the thread has been destroyed, and so on.

The proper way for the service to stop is to somehow wake up, see that it is supposed to stop, clean up properly, and then return from its ServiceMain function. To make the service do this, you must implement some form of interthread communication between your HandlerEx function and your ServiceMain function. You can use any queuing interthread communication mechanism you like, including APC queues, sockets, and window messages. I always use I/O completion ports.
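
To make the idea of queuing control codes between the two threads concrete, here is a minimal sketch. It deliberately swaps in a portable mutex and condition-variable queue instead of an I/O completion port, so it shows the shape of the mechanism rather than the approach described above. ControlQueue, Post, and Wait are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>

// A small thread-safe FIFO of control codes. This is a portable stand-in
// for whichever queuing mechanism the service actually uses.
class ControlQueue {
public:
    // Called from the handler: holds the lock only long enough to enqueue,
    // so the handler can return promptly.
    void Post(std::uint32_t controlCode) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            codes_.push(controlCode);
        }
        ready_.notify_one();
    }

    // Called from the ServiceMain thread: sleeps until a code is available,
    // then returns the oldest one.
    std::uint32_t Wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !codes_.empty(); });
        const std::uint32_t code = codes_.front();
        codes_.pop();
        return code;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::uint32_t> codes_;
};
```

In a real service the ServiceMain thread usually waits on client work as well, so whatever mechanism is chosen has to wake the thread for both kinds of event. A single completion port can do that, which this simplified queue cannot.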

To update its current status, a service must frequently call SetServiceStatus. All this status reporting can be another difficult aspect of coding a service. Service implementers often debate about where to place the calls to SetServiceStatus. Here are some of the possibilities:

Have the HandlerEx function make the initial call to SetServiceStatus to report the pending action, and then use interthread communication to get the code to the ServiceMain thread. The ServiceMain thread does the work and then uses interthread communication to let the HandlerEx function know that the action is complete. At that point, the HandlerEx function calls SetServiceStatus again to report the service's new execution state.

Have the HandlerEx function use interthread communication to get the code to the ServiceMain thread. The ServiceMain thread makes the initial call to SetServiceStatus to report the pending action, does the work, and then calls SetServiceStatus again to report the service's new execution state.

Have the HandlerEx function make the initial call to SetServiceStatus to report the pending action, and then use interthread communication to get the code to the ServiceMain thread. The ServiceMain thread does the work and also calls SetServiceStatus again to report the service's new execution state.

All of the above scenarios have pros and cons. I have experimented at great length with all of these possibilities and feel very confident in recommending the last option. Here are my reasons.

First, the SCP calls a function to control a service, and the SCM passes this control to the service. At this point, the SCP is suspended, waiting for the service to call SetServiceStatus to indicate that the service has received the control code. If the service's HandlerEx function doesn't return within 30 seconds, the SCM allows the SCP to wake, and the SCP's function call to control the service returns failure.

Second, the HandlerEx function is executed by the service process's primary thread. (All services in a single process have their HandlerEx functions executed by the primary thread.) If the HandlerEx function waited for the ServiceMain thread to complete the action before returning, any other services in the same process would not be able to receive action requests or notifications. This would make all the other services appear nonresponsive, which is unacceptable (in my opinion).

So I prefer the third method—the HandlerEx function makes the initial call to SetServiceStatus, interthread communication is used to get the code to the ServiceMain thread, and the ServiceMain thread performs the work and calls SetServiceStatus to report the new execution state. However, this method has a problem: a potential race condition exists. Imagine a service's HandlerEx receives a SERVICE_CONTROL_PAUSE code, responds with a SERVICE_PAUSE_PENDING, and then passes the code to the ServiceMain thread. The ServiceMain thread starts processing the code when, all of a sudden, the HandlerEx thread preempts the ServiceMain thread and receives a SERVICE_CONTROL_STOP code. The HandlerEx function now responds with a SERVICE_STOP_PENDING code and queues the new code to the ServiceMain thread. When the ServiceMain thread gets CPU time again, it completes its processing of the SERVICE_CONTROL_PAUSE code and reports SERVICE_PAUSED. Then the thread sees the queued SERVICE_CONTROL_STOP code, stops the service, and reports SERVICE_STOPPED. After all of this, the SCM receives the following state updates:

 SERVICE_PAUSE_PENDING
 SERVICE_STOP_PENDING
 SERVICE_PAUSED
 SERVICE_STOPPED

As you can see, these updates are gibberish, and an administrator could become quite confused. Note, however, that the service is running fine. You'd be surprised how many services I've seen that can actually report this sequence. The developers of these services never fix the problem, because it is quite unlikely that an administrator will issue action requests to the service so quickly—but it can happen! To solve this sequence problem, you must use a thread synchronization mechanism. The TimeService sample application at the end of this chapter uses a CGate C++ class to solve this problem efficiently.
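
One way to serialize the reports is to let only one action request be in flight at a time. The sketch below is an assumption about how such a gate could look. It is not the CGate class from the TimeService sample, and ActionGate, Enter, and Leave are hypothetical names. The handler would call Enter before reporting a pending state, and the ServiceMain thread would call Leave after reporting the final state.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Allows a single action request through at a time.
class ActionGate {
public:
    // Waits until no other action is in flight, then claims the gate.
    // Returns false if the previous action has not finished within timeout.
    bool Enter(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!opened_.wait_for(lock, timeout, [this] { return !inFlight_; })) {
            return false;  // previous action is still being processed
        }
        inFlight_ = true;
        return true;
    }

    // Releases the gate once the final state has been reported.
    void Leave() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(inFlight_ && "Leave called without a matching Enter");
            inFlight_ = false;
        }
        opened_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable opened_;
    bool inFlight_ = false;
};
```

With a gate like this, the stop request in the scenario above would wait until SERVICE_PAUSED had been reported, so the SCM would see SERVICE_PAUSE_PENDING, SERVICE_PAUSED, SERVICE_STOP_PENDING, SERVICE_STOPPED. The timeout needs to stay well under the 30-second limit on the handler, and while the handler waits, other services in the same process receive no control codes, so the wait should be short.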

When I first started working with services, I thought that the SCM would be responsible for preventing race conditions from occurring. But my experiments show that the SCM does absolutely nothing to time the sending of control codes. In fact, the SCM does absolutely nothing to ensure that a service receives control codes properly, either. Here's what I mean: While a service is already paused, try sending the service a SERVICE_CONTROL_PAUSE code. You won't be able to do this with the Services snap-in because the snap-in will see that the service is paused and thus disable the Pause button. But if you use the SC.exe command-line utility, nothing is stopping you from sending a pause code to a service that is already paused. I would have expected the SCM to report failure to the SC.exe utility, but the SCM simply calls the service's HandlerEx function, passing it the SERVICE_CONTROL_PAUSE code. Your service must be able to handle these erroneous control codes gracefully.

I have seen many services written that don't deal with the possibility of the same control code being sent to the service multiple times in a row. For example, I know of a service that closed the handle to a named pipe when the service was paused. The service then proceeded to create another kernel object that, coincidentally, got the same handle value as the handle of the original named pipe. Then the service received another pause control code and called CloseHandle, passing the handle value of the old pipe. Since this value happened to be the same as another kernel object's handle, the new kernel object was destroyed and the rest of the service started failing in strange and mysterious ways! I can't tell you how much of a pleasure this mess was to debug.

To fix this multiple stop, pause, or continue control code problem, check first to see whether your service is already in the desired state. If it is, don't call SetServiceStatus, and don't execute your code to change states—just return.
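
The check described above can be written as a small predicate that runs before any state-changing code. This is a minimal sketch under stated assumptions: the constants are local stand-ins for the winsvc.h values, NeedsStateChange is a hypothetical name, and the policy for pending states (for example, ignoring a pause while a pause is already pending) is one reasonable choice rather than a rule.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins for the winsvc.h values; use <windows.h> in a real service.
constexpr std::uint32_t SERVICE_CONTROL_STOP     = 1;
constexpr std::uint32_t SERVICE_CONTROL_PAUSE    = 2;
constexpr std::uint32_t SERVICE_CONTROL_CONTINUE = 3;
constexpr std::uint32_t SERVICE_STOPPED          = 1;
constexpr std::uint32_t SERVICE_STOP_PENDING     = 3;
constexpr std::uint32_t SERVICE_RUNNING          = 4;
constexpr std::uint32_t SERVICE_PAUSED           = 7;

// Returns true only when the control code would move the service to a
// state it is not already in (or already heading toward).
bool NeedsStateChange(std::uint32_t currentState, std::uint32_t controlCode) {
    switch (controlCode) {
    case SERVICE_CONTROL_PAUSE:
        return currentState == SERVICE_RUNNING;
    case SERVICE_CONTROL_CONTINUE:
        return currentState == SERVICE_PAUSED;
    case SERVICE_CONTROL_STOP:
        return currentState != SERVICE_STOPPED &&
               currentState != SERVICE_STOP_PENDING;
    default:
        return false;  // other codes do not change the execution state here
    }
}
```

A handler that calls this first and simply returns on false never reaches the cleanup code twice, which avoids the double CloseHandle problem described earlier.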

Here is some logic I've seen used in services quite frequently. When the HandlerEx function receives a SERVICE_CONTROL_PAUSE code, HandlerEx calls SetServiceStatus to report SERVICE_PAUSE_PENDING, calls SuspendThread to "pause" the service's thread, and then calls SetServiceStatus again to report SERVICE_PAUSED. This series of calls does avoid the race conditions, because all the work is being done by one thread, but what is this code doing? Does suspending the service thread pause the service? Well, I guess I have to answer "Yes" to that. However, what does it mean to pause a service? The answer depends on the service.

If I am writing a service that processes client requests over the network, to me, pause means that I'll stop accepting new requests. But what about the request I might be in the middle of processing right now? Maybe I should finish it so that my client doesn't hang indefinitely. If my HandlerEx function simply calls SuspendThread, the service thread might be in the middle of who knows what. Maybe the thread's inside a call to malloc, trying to allocate some memory, and holds the heap's lock. If another service running inside the same process also calls malloc, this other service's thread blocks too (since access to the heap is serialized). This is certainly not what we want happening!

Oh—and what about this: do you think you should be allowed to stop a service that is paused? I do, and apparently Microsoft thinks so too, because the Services snap-in allows me to click on the Stop button even when a service is paused. But how can I stop a service that is paused because its thread has been suspended? Please don't say TerminateThread.
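
An alternative to SuspendThread is a cooperative pause, where the service thread checks flags between requests. The sketch below shows one possible shape for this and is not a complete service: RequestWorker, Step, and the method names are hypothetical, the request queue is deliberately unsynchronized to keep the example short, and a real service thread would block while paused or idle instead of polling.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <string>
#include <utility>

// Outcome of one pass through the service thread's loop.
enum class Step { Processed, Idle, Paused, Stopped };

class RequestWorker {
public:
    // Set from the handler; cheap enough to call without blocking.
    void RequestPause()    { paused_ = true; }
    void RequestContinue() { paused_ = false; }
    void RequestStop()     { stopping_ = true; }

    // Simplification: a real service needs a synchronized queue here.
    void Enqueue(std::string request) { pending_.push_back(std::move(request)); }

    // One iteration of the service thread's loop. The flags are examined
    // only between requests, so a request in progress always completes.
    Step RunOnce() {
        if (stopping_) return Step::Stopped;  // stop wins, even while paused
        if (paused_)   return Step::Paused;   // accept no new work
        if (pending_.empty()) return Step::Idle;
        pending_.pop_front();                 // "process" the request
        ++processed_;
        return Step::Processed;
    }

    int Processed() const { return processed_; }

private:
    std::atomic<bool> paused_{false};
    std::atomic<bool> stopping_{false};
    std::deque<std::string> pending_;
    int processed_ = 0;
};
```

Because the thread is never frozen, it cannot be caught holding a lock such as the heap's, and a stop request is honored even while the service is paused.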

These are some of the issues that make service development challenging.