Specifying Escalation Policy Using the CLR Hosting Interfaces

CLR hosts specify an escalation policy using the failure policy manager from the CLR hosting interfaces. The failure policy manager has two interfaces: ICLRPolicyManager and IHostPolicyManager. As their names imply, ICLRPolicyManager is implemented by the CLR and IHostPolicyManager is implemented by the host.

Hosts obtain a pointer of type ICLRPolicyManager by calling the GetCLRManager method on ICLRControl. The pointer to an ICLRPolicyManager can then be used to call methods that specify timeouts for various operations, the actions to take when failures occur, and so on. The IHostPolicyManager interface is used by the CLR to notify the host of actions taken as a result of escalation policy. The CLR obtains a pointer to the host's implementation of IHostPolicyManager using IHostControl::GetHostManager. The actions taken on failures are then reported to the host by calling methods on IHostPolicyManager.

The next few sections describe how to specify escalation policy using ICLRPolicyManager and how to receive notifications by providing an implementation of IHostPolicyManager.

Setting Policy Using ICLRPolicyManager

Escalation policy consists of three primary concepts: failures, actions, and operations. These concepts were described in general terms earlier in the chapter when I described how escalation policy fits into the CLR's overall infrastructure to support hosts with requirements for long process lifetimes. Failures refer to exceptional conditions such as the failure to allocate a resource. Actions describe the behavior that the CLR should take when a failure occurs. For example, a host can specify an action of thread abort given a failure to allocate a resource. Operations serve two purposes in the failure policy manager. First, they identify the operations for which timeouts can be specified, such as thread aborts, application domain unloads, and process exit. Second, a host can use operations to escalate the actions taken on failures. For example, a host can specify that a rude thread abort in a critical region of code should always be escalated to an application domain unload.

In terms of the hosting interfaces, these concepts are represented by three enumerations: EClrFailure, EPolicyAction, and EClrOperation. Values from these enumerations are passed to the methods of ICLRPolicyManager to define escalation policy. The methods on ICLRPolicyManager are described in Table 11-1.

Table 11-1. The Methods on ICLRPolicyManager
Method	Description
SetActionOnFailure	Enables a host to specify the action to take for a given failure.
SetTimeout	Enables a host to specify a timeout value (in milliseconds) for given operations such as a thread abort or application domain unload.
SetActionOnTimeout	Enables a host to specify which action should be taken when a timeout for a particular operation occurs.
SetTimeoutAndAction	Enables a host to specify both a timeout value and a subsequent action in a single method call. SetTimeoutAndAction is a convenience method that combines the capabilities of SetTimeout and SetActionOnTimeout.
SetDefaultAction	Enables a host to specify the default action to take for a particular operation. SetDefaultAction is primarily used to override the CLR default behavior for a given operation.
SetUnhandledExceptionPolicy	As described earlier in the chapter, the behavior of unhandled exceptions in .NET Framework 2.0 differs from that in .NET Framework 1.0 and .NET Framework 1.1. In particular, unhandled exceptions in .NET Framework 2.0 result in process termination. This behavior isn't desirable for hosts with requirements for long process lifetimes, so SetUnhandledExceptionPolicy enables a host to turn off this behavior so that unhandled exceptions do not cause the process to exit.
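To make the relationship among SetTimeout, SetActionOnTimeout, and SetTimeoutAndAction concrete, here is a minimal C++ sketch. It is not the CLR's implementation: MockTimeoutPolicy is a hypothetical in-memory class that only records what a host asks for, the enumerations are local copies of the mscoree.idl definitions shown later in this chapter (so the code compiles without the Windows SDK), and std::uint32_t stands in for the DWORD millisecond value. It shows that the convenience method leaves the same recorded state as the two separate calls.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Local copies of two mscoree.idl enumerations, as listed in this chapter.
enum EClrOperation {
    OPR_ThreadAbort, OPR_ThreadRudeAbortInNonCriticalRegion,
    OPR_ThreadRudeAbortInCriticalRegion, OPR_AppDomainUnload,
    OPR_AppDomainRudeUnload, OPR_ProcessExit, OPR_FinalizerRun,
    MaxClrOperation
};
enum EPolicyAction {
    eNoAction, eThrowException, eAbortThread, eRudeAbortThread,
    eUnloadAppDomain, eRudeUnloadAppDomain, eExitProcess,
    eFastExitProcess, eRudeExitProcess, eDisableRuntime, MaxPolicyAction
};

// Hypothetical stand-in for the timeout portion of ICLRPolicyManager.
// It records requests; it does not enforce anything.
class MockTimeoutPolicy {
public:
    void SetTimeout(EClrOperation op, std::uint32_t ms) { timeouts_[op] = ms; }
    void SetActionOnTimeout(EClrOperation op, EPolicyAction action) { actions_[op] = action; }

    // Convenience form: equivalent to calling the two methods above.
    void SetTimeoutAndAction(EClrOperation op, std::uint32_t ms, EPolicyAction action) {
        SetTimeout(op, ms);
        SetActionOnTimeout(op, action);
    }

    // True if both objects recorded identical timeouts and actions.
    bool SameAs(const MockTimeoutPolicy &other) const {
        return timeouts_ == other.timeouts_ && actions_ == other.actions_;
    }

private:
    std::map<EClrOperation, std::uint32_t> timeouts_;
    std::map<EClrOperation, EPolicyAction> actions_;
};
```

Keeping the two underlying setters separate in the sketch mirrors Table 11-1: a host can change only the escalation action for an operation without restating its timeout.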

CLR hosts typically use the following steps to specify escalation policy using ICLRPolicyManager:

1.	Obtain an ICLRPolicyManager interface pointer.
2.	Set the actions to take when failures occur.
3.	Set timeout values and the actions to take when a timeout occurs.
4.	Set any default actions.
5.	Specify unhandled exception behavior.

Step 1: Obtain an ICLRPolicyManager Interface Pointer

As described in Chapter 2, CLR hosts obtain interface pointers to the CLR-implemented managers using the ICLRControl interface. The ICLRControl interface is obtained by calling the GetCLRControl method on the ICLRRuntimeHost pointer returned from the call to CorBindToRuntimeEx. Given a pointer of type ICLRControl, simply call its GetCLRManager method, passing the IID corresponding to ICLRPolicyManager (IID_ICLRPolicyManager), as shown in the following code sample:

#include <mscoree.h>

// Error handling assumes the enclosing function returns an HRESULT.

// Initialize the CLR and get a pointer of type ICLRRuntimeHost.
ICLRRuntimeHost *pCLR = NULL;
HRESULT hr = CorBindToRuntimeEx(
      L"v2.0.41013",
      L"wks",
      STARTUP_CONCURRENT_GC,
      CLSID_CLRRuntimeHost,
      IID_ICLRRuntimeHost,
      (PVOID*) &pCLR);
if (FAILED(hr)) return hr;

// Use ICLRRuntimeHost to get the CLR control interface.
ICLRControl *pCLRControl = NULL;
hr = pCLR->GetCLRControl(&pCLRControl);
if (FAILED(hr)) return hr;

// Use ICLRControl to get a pointer to the failure policy manager.
ICLRPolicyManager *pCLRPolicyManager = NULL;
hr = pCLRControl->GetCLRManager(IID_ICLRPolicyManager,
                                (PVOID*)&pCLRPolicyManager);
if (FAILED(hr)) return hr;

Step 2: Set Actions to Take on Failures

Given an interface pointer of type ICLRPolicyManager, a host can now specify which actions the CLR should take given specific failures by calling SetActionOnFailure. As can be seen from the following method signature, SetActionOnFailure takes two parameters. The first parameter is a value from the EClrFailure enumeration describing the failure for which an action is specified. The second parameter is a value from the EPolicyAction enumeration describing the action itself.

interface ICLRPolicyManager : IUnknown
{
    HRESULT SetActionOnFailure(
        [in] EClrFailure failure,
        [in] EPolicyAction action);
    // Other methods omitted...
}

The values for EClrFailure correspond directly to the failures described earlier in the overview of escalation policy. Specifically, the failures for which an action can be specified are the failure to allocate a resource, the failure to allocate a resource in a critical region of code, a fatal error internal to the CLR, and an abandoned synchronization primitive. Here's the definition of EClrFailure from mscoree.idl:

typedef enum
{
    FAIL_NonCriticalResource,
    FAIL_CriticalResource,
    FAIL_FatalRuntime,
    FAIL_OrphanedLock,
    MaxClrFailure
} EClrFailure;

The meaning of many of the values from EPolicyAction should also be clear from the overview of escalation policy presented earlier in the chapter. Here's the definition of EPolicyAction from mscoree.idl:

typedef enum
{
    eNoAction,
    eThrowException,
    eAbortThread,
    eRudeAbortThread,
    eUnloadAppDomain,
    eRudeUnloadAppDomain,
    eExitProcess,
    eFastExitProcess,
    eRudeExitProcess,
    eDisableRuntime,
    MaxPolicyAction
} EPolicyAction;

Values from EPolicyAction that likely require additional explanation are these:

eNoAction and eThrowException: These values are the defaults for various operations that can be customized through escalation policy. For example, eThrowException is the default action for all resource failures. Because these values are defaults, they are primarily used to reestablish default behavior if you changed it earlier.
eFastExitProcess: In addition to the graceful and rude methods for terminating a process, escalation policy enables a host to specify a third alternative called a fast process exit. The CLR does not run any object finalizers during a fast exit, but it does run other cleanup code such as that in finally blocks. A fast exit is a compromise between a graceful exit and a rude exit, both in terms of how quickly the process terminates and in the amount of add-in cleanup code the CLR attempts to run.

With the exception of FAIL_FatalRuntime, all of the values in EPolicyAction are valid actions for any of the failures indicated by EClrFailure. FAIL_FatalRuntime is a special case: when a fatal error occurs, the only actions that can be taken are to exit the process or disable the CLR. Specifically, the following values from EPolicyAction are valid when calling SetActionOnFailure for a fatal runtime failure:

eExitProcess
eFastExitProcess
eRudeExitProcess
eDisableRuntime
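A host that builds its policy from configuration might want to reject invalid combinations before calling SetActionOnFailure. The following C++ sketch encodes the rule just stated. IsValidActionOnFailure is a hypothetical host-side helper, not part of the hosting API, and the enumerations are local copies of the mscoree.idl definitions shown above. Whether the real CLR rejects other combinations is not modeled here.

```cpp
#include <cassert>

// Local copies of the mscoree.idl enumerations shown above.
enum EClrFailure {
    FAIL_NonCriticalResource, FAIL_CriticalResource,
    FAIL_FatalRuntime, FAIL_OrphanedLock, MaxClrFailure
};
enum EPolicyAction {
    eNoAction, eThrowException, eAbortThread, eRudeAbortThread,
    eUnloadAppDomain, eRudeUnloadAppDomain, eExitProcess,
    eFastExitProcess, eRudeExitProcess, eDisableRuntime, MaxPolicyAction
};

// Hypothetical helper: every action is accepted for the non-fatal failures;
// FAIL_FatalRuntime accepts only the process-exit actions or eDisableRuntime.
bool IsValidActionOnFailure(EClrFailure failure, EPolicyAction action) {
    if (failure != FAIL_FatalRuntime)
        return true;
    switch (action) {
    case eExitProcess:
    case eFastExitProcess:
    case eRudeExitProcess:
    case eDisableRuntime:
        return true;
    default:
        return false;
    }
}
```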

Now that you understand the parameters to SetActionOnFailure, let's look at some example calls to see how particular actions can be specified for given failures. The following series of calls to SetActionOnFailure specifies a portion (minus the timeout aspects) of the escalation policy shown in Figure 11-1.

hr = pCLRPolicyManager->SetActionOnFailure(FAIL_NonCriticalResource,
                                           eAbortThread);
hr = pCLRPolicyManager->SetActionOnFailure(FAIL_CriticalResource,
                                           eUnloadAppDomain);
hr = pCLRPolicyManager->SetActionOnFailure(FAIL_OrphanedLock,
                                           eUnloadAppDomain);
hr = pCLRPolicyManager->SetActionOnFailure(FAIL_FatalRuntime,
                                           eDisableRuntime);

In particular, these lines of code specify the following actions:

A failure to allocate a resource outside a critical region of code causes the thread on which the failure occurred to be aborted.
If the failure to allocate a resource occurs in a critical region of code, the failure is escalated to an application domain unload. Recall that a resource failure in a critical region has the potential to leave the application domain in an inconsistent state. Hence, a conservative approach is to unload the entire application domain.
The detection of an abandoned lock also causes the application domain to be unloaded. As with a resource failure in a critical region, an orphaned lock is a strong indication that application domain state may be inconsistent.
Instead of exiting the process if a fatal internal CLR error is detected, the CLR is disabled, thereby enabling the host to continue any processing unrelated to managed code.
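One way to picture the effect of these four calls is as a lookup table from failure to action. The C++ sketch below models that table; FailurePolicyTable is hypothetical and only approximates what the CLR tracks internally. Unconfigured resource failures fall back to eThrowException, the default this chapter gives for resource failures; defaults for the other failures are not stated here, so the sketch reports them as unknown rather than guessing.

```cpp
#include <cassert>
#include <map>
#include <optional>

// Local copies of the mscoree.idl enumerations shown above.
enum EClrFailure {
    FAIL_NonCriticalResource, FAIL_CriticalResource,
    FAIL_FatalRuntime, FAIL_OrphanedLock, MaxClrFailure
};
enum EPolicyAction {
    eNoAction, eThrowException, eAbortThread, eRudeAbortThread,
    eUnloadAppDomain, eRudeUnloadAppDomain, eExitProcess,
    eFastExitProcess, eRudeExitProcess, eDisableRuntime, MaxPolicyAction
};

// Hypothetical model of the failure-to-action table built by SetActionOnFailure.
class FailurePolicyTable {
public:
    void SetActionOnFailure(EClrFailure failure, EPolicyAction action) {
        actions_[failure] = action;
    }

    // Returns the configured action, the documented default for resource
    // failures, or std::nullopt when the CLR default is not modeled.
    std::optional<EPolicyAction> ActionFor(EClrFailure failure) const {
        auto it = actions_.find(failure);
        if (it != actions_.end())
            return it->second;
        if (failure == FAIL_NonCriticalResource || failure == FAIL_CriticalResource)
            return eThrowException;
        return std::nullopt;
    }

private:
    std::map<EClrFailure, EPolicyAction> actions_;
};
```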

Step 3: Set Timeouts and the Actions to Take for Various Operations

The ability to specify timeout values for operations such as thread aborts and application domain unloads is a key element in the host's ability to describe an escalation policy that will keep a process responsive over time. As shown in Table 11-1, ICLRPolicyManager has three methods that hosts can use to work with timeouts: SetTimeout, SetActionOnTimeout, and SetTimeoutAndAction. Each of these methods takes an operation described by the EClrOperation enumeration; SetTimeout and SetTimeoutAndAction also take a timeout value in milliseconds, and SetActionOnTimeout and SetTimeoutAndAction take the action to perform when that timeout expires. The definition of EClrOperation is as follows:

typedef enum
{
    OPR_ThreadAbort,
    OPR_ThreadRudeAbortInNonCriticalRegion,
    OPR_ThreadRudeAbortInCriticalRegion,
    OPR_AppDomainUnload,
    OPR_AppDomainRudeUnload,
    OPR_ProcessExit,
    OPR_FinalizerRun,
    MaxClrOperation
} EClrOperation;

Timeouts can be specified only for a subset of the values in EClrOperation. For example, it wouldn't make sense to specify a timeout for any of the operations that represent rude thread aborts or application domain unloads because those operations occur immediately. The values from EClrOperation for which a timeout can be specified are as follows:

OPR_ThreadAbort
OPR_AppDomainUnload
OPR_ProcessExit
OPR_FinalizerRun

In general, hosts that require a process to live for a long time are likely to want to specify timeout values for at least OPR_ThreadAbort, OPR_AppDomainUnload, and OPR_FinalizerRun because there are no default timeout values for these operations. That is, unless a host specifies a timeout, an attempt to abort a thread or unload an application domain could wait indefinitely. The CLR does have a default timeout for OPR_ProcessExit, however: if a process doesn't exit gracefully within approximately 40 seconds, it is forcefully terminated.
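The two rules above, which operations accept a timeout and what happens when none is set, can be captured in a short C++ sketch. CanSetTimeout and EffectiveTimeoutMs are hypothetical host-side helpers, not hosting APIs. kInfinite reuses the value of the Win32 INFINITE constant (0xFFFFFFFF) to mean "wait indefinitely", and the 40-second process-exit default is the approximate figure from the text, not an exact CLR constant.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Local copy of the mscoree.idl enumeration shown above.
enum EClrOperation {
    OPR_ThreadAbort, OPR_ThreadRudeAbortInNonCriticalRegion,
    OPR_ThreadRudeAbortInCriticalRegion, OPR_AppDomainUnload,
    OPR_AppDomainRudeUnload, OPR_ProcessExit, OPR_FinalizerRun,
    MaxClrOperation
};

// "No timeout"; same value as the Win32 INFINITE constant.
constexpr std::uint32_t kInfinite = 0xFFFFFFFFu;

// The operations for which the text says a timeout can be specified.
bool CanSetTimeout(EClrOperation op) {
    switch (op) {
    case OPR_ThreadAbort:
    case OPR_AppDomainUnload:
    case OPR_ProcessExit:
    case OPR_FinalizerRun:
        return true;
    default:
        return false;  // rude operations happen immediately
    }
}

// Effective timeout given the values a host has set, or std::nullopt for
// operations that take no timeout at all.
std::optional<std::uint32_t> EffectiveTimeoutMs(
        EClrOperation op, const std::map<EClrOperation, std::uint32_t> &hostValues) {
    if (!CanSetTimeout(op))
        return std::nullopt;
    auto it = hostValues.find(op);
    if (it != hostValues.end())
        return it->second;
    if (op == OPR_ProcessExit)
        return 40u * 1000u;  // approximate CLR default, per the text
    return kInfinite;        // no default for the other operations
}
```

The sketch makes the practical point explicit: with an empty host configuration, three of the four timeout-capable operations resolve to an unbounded wait.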

The following series of calls to SetTimeoutAndAction specifies timeout values and the actions to take for the operations indicated by the escalation policy specified in Figure 11-1.

hr = pCLRPolicyManager->SetTimeoutAndAction(OPR_ThreadAbort, 10*1000,
                                            eRudeAbortThread);
hr = pCLRPolicyManager->SetTimeoutAndAction(OPR_AppDomainUnload, 20*1000,
                                            eRudeUnloadAppDomain);

In particular, these statements specify the following:

If an attempt to abort a thread doesn't complete in 10 seconds, the thread abort is escalated to a rude thread abort.
If an attempt to unload an application domain doesn't complete in 20 seconds, the unload is escalated to a rude application domain unload.
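The escalation these two calls describe depends only on how long the graceful operation has been running. The following C++ sketch models that decision. TimeoutRule and ActionAfter are hypothetical names used for illustration; the sketch treats reaching the timeout exactly as expiry, which is a modeling choice rather than documented CLR behavior.

```cpp
#include <cassert>
#include <cstdint>

// Local copy of the mscoree.idl enumeration shown earlier.
enum EPolicyAction {
    eNoAction, eThrowException, eAbortThread, eRudeAbortThread,
    eUnloadAppDomain, eRudeUnloadAppDomain, eExitProcess,
    eFastExitProcess, eRudeExitProcess, eDisableRuntime, MaxPolicyAction
};

// One timeout rule: the graceful action in progress, how long it may run,
// and the action configured with SetTimeoutAndAction for when it expires.
struct TimeoutRule {
    EPolicyAction graceful;
    std::uint32_t timeoutMs;
    EPolicyAction onTimeout;
};

// Hypothetical helper: the action in effect after elapsedMs of waiting.
EPolicyAction ActionAfter(const TimeoutRule &rule, std::uint32_t elapsedMs) {
    return elapsedMs < rule.timeoutMs ? rule.graceful : rule.onTimeout;
}
```

For the policy above, a thread abort still running after 9.999 seconds remains a graceful abort, while one still running at 10 seconds becomes a rude abort.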

Step 4: Set Any Default Actions

The SetDefaultAction method of ICLRPolicyManager can be used to specify a default action to take in response to a particular operation. In general, this method is used less than the other methods on ICLRPolicyManager because the CLR's default actions are generally sufficient and escalation is usually specified in terms of either particular failures or timeouts. However, if you'd like to change the defaults, you can do so by calling SetDefaultAction with a value from EPolicyAction describing the action to take in response to an operation defined by a value from EClrOperation. For example, the following call to SetDefaultAction specifies that the entire application domain should always be unloaded whenever a rude thread abort occurs in a critical region of code:

hr = pCLRPolicyManager->SetDefaultAction(OPR_ThreadRudeAbortInCriticalRegion,
                                         eUnloadAppDomain);

SetDefaultAction can be used only to escalate failure behavior; it cannot be used to downgrade the action to take for a given operation. For example, it's not valid to use SetDefaultAction to request an action of eAbortThread for an operation of OPR_AppDomainUnload. Table 11-2 describes which values from EPolicyAction represent valid actions for each operation from EClrOperation.

Table 11-2. Valid Combinations of Actions and Operations for SetDefaultAction
Value from EClrOperation	Valid Values from EPolicyAction
OPR_ThreadAbort	eAbortThread, eRudeAbortThread, eUnloadAppDomain, eRudeUnloadAppDomain, eExitProcess, eFastExitProcess, eRudeExitProcess, eDisableRuntime
OPR_ThreadRudeAbortInNonCriticalRegion, OPR_ThreadRudeAbortInCriticalRegion	eRudeAbortThread, eUnloadAppDomain, eRudeUnloadAppDomain, eExitProcess, eFastExitProcess, eRudeExitProcess, eDisableRuntime
OPR_AppDomainUnload	eUnloadAppDomain, eRudeUnloadAppDomain, eExitProcess, eFastExitProcess, eRudeExitProcess, eDisableRuntime
OPR_AppDomainRudeUnload	eRudeUnloadAppDomain, eExitProcess, eFastExitProcess, eRudeExitProcess, eDisableRuntime
OPR_ProcessExit	eExitProcess, eFastExitProcess, eRudeExitProcess, eDisableRuntime
OPR_FinalizerRun	eNoAction, eAbortThread, eRudeAbortThread, eUnloadAppDomain, eRudeUnloadAppDomain, eExitProcess, eFastExitProcess, eRudeExitProcess, eDisableRuntime
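Because EPolicyAction lists its values roughly from least to most drastic, most rows of Table 11-2 reduce to "the weakest allowed action or anything after it." The C++ sketch below encodes the table that way. IsValidDefaultAction is a hypothetical host-side check, not a hosting API, and it depends on the enumeration order copied from mscoree.idl; OPR_FinalizerRun is handled separately because its row also allows eNoAction.

```cpp
#include <cassert>

// Local copies of the mscoree.idl enumerations shown earlier.
enum EClrOperation {
    OPR_ThreadAbort, OPR_ThreadRudeAbortInNonCriticalRegion,
    OPR_ThreadRudeAbortInCriticalRegion, OPR_AppDomainUnload,
    OPR_AppDomainRudeUnload, OPR_ProcessExit, OPR_FinalizerRun,
    MaxClrOperation
};
enum EPolicyAction {
    eNoAction, eThrowException, eAbortThread, eRudeAbortThread,
    eUnloadAppDomain, eRudeUnloadAppDomain, eExitProcess,
    eFastExitProcess, eRudeExitProcess, eDisableRuntime, MaxPolicyAction
};

// Hypothetical check of Table 11-2: the action must be at least as drastic
// as the weakest action listed for the operation.
bool IsValidDefaultAction(EClrOperation op, EPolicyAction action) {
    EPolicyAction weakest = eNoAction;
    switch (op) {
    case OPR_ThreadAbort:
        weakest = eAbortThread;
        break;
    case OPR_ThreadRudeAbortInNonCriticalRegion:
    case OPR_ThreadRudeAbortInCriticalRegion:
        weakest = eRudeAbortThread;
        break;
    case OPR_AppDomainUnload:
        weakest = eUnloadAppDomain;
        break;
    case OPR_AppDomainRudeUnload:
        weakest = eRudeUnloadAppDomain;
        break;
    case OPR_ProcessExit:
        weakest = eExitProcess;
        break;
    case OPR_FinalizerRun:
        if (action == eNoAction)
            return true;  // the one non-escalating entry in the table
        weakest = eAbortThread;
        break;
    default:
        return false;
    }
    return action >= weakest && action < MaxPolicyAction;
}
```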

Step 5: Specify Unhandled Exception Behavior

The change in the way unhandled exceptions are treated in .NET Framework 2.0 was implemented to make application exceptions more obvious and easier to debug. Previously, the CLR caught, and thereby hid, unhandled exceptions on many threads (thread pool and finalizer threads, for example); often this masked the problem so completely that the end user or developer wasn't even aware the error occurred. Making unhandled exceptions always visible enables these errors to be discovered and fixed. Although this is a positive step in general, hosts that require the process to live for a long time would rather have unhandled exceptions masked in these cases than have the process terminated. As I've shown, escalation policy provides the host with the means to unload any code that might be in a questionable state because of an unhandled exception. The SetUnhandledExceptionPolicy method on ICLRPolicyManager enables the host to revert to the way versions of the .NET Framework earlier than 2.0 treated unhandled exceptions. The definition of SetUnhandledExceptionPolicy is shown here:

interface ICLRPolicyManager : IUnknown
{
    HRESULT SetUnhandledExceptionPolicy(
        [in] EClrUnhandledException policy);
    // Other methods omitted...
}

As you can see, hosts specify their desired behavior with respect to unhandled exceptions by passing a value from the EClrUnhandledException enumeration to SetUnhandledExceptionPolicy. EClrUnhandledException has two values as shown in the following definition from mscoree.idl:

typedef enum
{
    eRuntimeDeterminedPolicy,
    eHostDeterminedPolicy,
} EClrUnhandledException;

The eRuntimeDeterminedPolicy value specifies the .NET Framework 2.0 behavior, whereas the eHostDeterminedPolicy value specifies that unhandled exceptions should not be allowed to proceed to the point where the process will be terminated.

The following call to SetUnhandledExceptionPolicy demonstrates how a host would use the API to specify that unhandled exceptions should not result in process termination:

hr = pCLRPolicyManager->SetUnhandledExceptionPolicy(eHostDeterminedPolicy);
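In practice, a host usually applies Steps 2 through 5 in one place and stops at the first call that fails. The C++ sketch below shows that shape using the example calls from this section. Several names are assumptions made so the code builds without the Windows SDK: PolicyManager is a hypothetical COM-free stand-in for ICLRPolicyManager, Hr, kOk, and kFail stand in for HRESULT, S_OK, and a failure code, and RecordingPolicyManager is a test double that fails a chosen call.

```cpp
#include <cassert>
#include <cstdint>

// Local copies of the mscoree.idl enumerations shown in this chapter.
enum EClrFailure {
    FAIL_NonCriticalResource, FAIL_CriticalResource,
    FAIL_FatalRuntime, FAIL_OrphanedLock, MaxClrFailure
};
enum EClrOperation {
    OPR_ThreadAbort, OPR_ThreadRudeAbortInNonCriticalRegion,
    OPR_ThreadRudeAbortInCriticalRegion, OPR_AppDomainUnload,
    OPR_AppDomainRudeUnload, OPR_ProcessExit, OPR_FinalizerRun,
    MaxClrOperation
};
enum EPolicyAction {
    eNoAction, eThrowException, eAbortThread, eRudeAbortThread,
    eUnloadAppDomain, eRudeUnloadAppDomain, eExitProcess,
    eFastExitProcess, eRudeExitProcess, eDisableRuntime, MaxPolicyAction
};
enum EClrUnhandledException { eRuntimeDeterminedPolicy, eHostDeterminedPolicy };

// Hypothetical stand-ins for HRESULT, S_OK, and a failure code.
using Hr = long;
constexpr Hr kOk = 0;
constexpr Hr kFail = -1;

// Hypothetical COM-free stand-in for the ICLRPolicyManager methods used here.
struct PolicyManager {
    virtual ~PolicyManager() = default;
    virtual Hr SetActionOnFailure(EClrFailure failure, EPolicyAction action) = 0;
    virtual Hr SetTimeoutAndAction(EClrOperation op, std::uint32_t ms,
                                   EPolicyAction action) = 0;
    virtual Hr SetDefaultAction(EClrOperation op, EPolicyAction action) = 0;
    virtual Hr SetUnhandledExceptionPolicy(EClrUnhandledException policy) = 0;
};

// Applies the example policy (Steps 2-5); returns the first failing result.
Hr ConfigureEscalationPolicy(PolicyManager &pm) {
    Hr hr = kOk;
    // Step 2: actions on failures.
    if ((hr = pm.SetActionOnFailure(FAIL_NonCriticalResource, eAbortThread)) != kOk) return hr;
    if ((hr = pm.SetActionOnFailure(FAIL_CriticalResource, eUnloadAppDomain)) != kOk) return hr;
    if ((hr = pm.SetActionOnFailure(FAIL_OrphanedLock, eUnloadAppDomain)) != kOk) return hr;
    if ((hr = pm.SetActionOnFailure(FAIL_FatalRuntime, eDisableRuntime)) != kOk) return hr;
    // Step 3: timeouts and the actions taken when they expire.
    if ((hr = pm.SetTimeoutAndAction(OPR_ThreadAbort, 10 * 1000, eRudeAbortThread)) != kOk) return hr;
    if ((hr = pm.SetTimeoutAndAction(OPR_AppDomainUnload, 20 * 1000, eRudeUnloadAppDomain)) != kOk) return hr;
    // Step 4: default actions.
    if ((hr = pm.SetDefaultAction(OPR_ThreadRudeAbortInCriticalRegion, eUnloadAppDomain)) != kOk) return hr;
    // Step 5: unhandled exception behavior.
    return pm.SetUnhandledExceptionPolicy(eHostDeterminedPolicy);
}

// Test double: counts calls and fails the call whose zero-based index is failAt.
struct RecordingPolicyManager : PolicyManager {
    int calls = 0;
    int failAt = -1;
    Hr Next() { return calls++ == failAt ? kFail : kOk; }
    Hr SetActionOnFailure(EClrFailure, EPolicyAction) override { return Next(); }
    Hr SetTimeoutAndAction(EClrOperation, std::uint32_t, EPolicyAction) override { return Next(); }
    Hr SetDefaultAction(EClrOperation, EPolicyAction) override { return Next(); }
    Hr SetUnhandledExceptionPolicy(EClrUnhandledException) override { return Next(); }
};
```

Stopping at the first failure keeps the host from running with a half-applied policy it never intended; a real host would then decide whether to continue without managed code or abort startup.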

Receiving Notifications Through IHostPolicyManager

Hosts that specify an escalation policy using ICLRPolicyManager can receive a notification whenever an action is taken as a result of that policy. These notifications are sent to the host through the IHostPolicyManager interface. Table 11-3 describes the methods the CLR calls to notify the host of actions taken as a result of escalation policy.

Table 11-3. The Methods on IHostPolicyManager
Method	Description
OnDefaultAction	Notifies the host that an action was taken in response to a particular operation. The action taken and the operation to which it applied are passed as parameters.
OnTimeout	Notifies the host that a timeout has occurred. The operation that timed out and the action taken as a result are passed as parameters.
OnFailure	Notifies the host that a failure has occurred. The particular failure that occurred and the action taken as a result are passed as parameters.

To receive these notifications, a host must complete the following two steps:

1.	Provide an implementation of IHostPolicyManager.
2.	Register that implementation with the CLR.

These steps are described in more detail in the next two sections.

Step 1: Provide an Implementation of IHostPolicyManager

To provide an implementation of IHostPolicyManager, simply write a class that derives from the interface and implement the OnDefaultAction, OnTimeout, and OnFailure methods. As described earlier, the CLR calls these methods for informational purposes only. Although the host cannot change the action the CLR takes from within these methods, it is useful to see when your escalation policy is being used by the CLR. This information can be helpful in tuning your policy over time. The following code snippet provides a sample definition for a class that implements IHostPolicyManager:

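#include <mscoree.h>

class CHostPolicyManager : public IHostPolicyManager
{
public:
    // A new object starts with one reference, owned by its creator.
    CHostPolicyManager() : m_cRef(1) {}

    // IHostPolicyManager
    virtual HRESULT STDMETHODCALLTYPE OnDefaultAction(EClrOperation operation,
                                                      EPolicyAction action);
    virtual HRESULT STDMETHODCALLTYPE OnTimeout(EClrOperation operation,
                                                EPolicyAction action);
    virtual HRESULT STDMETHODCALLTYPE OnFailure(EClrFailure failure,
                                                EPolicyAction action);

    // IUnknown
    virtual HRESULT STDMETHODCALLTYPE QueryInterface(const IID &iid,
                                                     void **ppv);
    virtual ULONG STDMETHODCALLTYPE AddRef();
    virtual ULONG STDMETHODCALLTYPE Release();

private:
    // Reference count, intended to be updated with InterlockedIncrement
    // and InterlockedDecrement in AddRef and Release.
    LONG m_cRef;
};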
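Because the notifications are informational, a natural use is to count them and review the counts when tuning the policy. The following C++ sketch shows that idea without COM. PolicyNotifications is a hypothetical abstract class mirroring the three IHostPolicyManager methods (the real methods return HRESULT and use STDMETHODCALLTYPE), and PolicyStatistics is an illustrative implementation. This sketch is not thread-safe; a real implementation would need to consider whether notifications can arrive on several threads at once.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Local copies of the mscoree.idl enumerations shown in this chapter.
enum EClrFailure {
    FAIL_NonCriticalResource, FAIL_CriticalResource,
    FAIL_FatalRuntime, FAIL_OrphanedLock, MaxClrFailure
};
enum EClrOperation {
    OPR_ThreadAbort, OPR_ThreadRudeAbortInNonCriticalRegion,
    OPR_ThreadRudeAbortInCriticalRegion, OPR_AppDomainUnload,
    OPR_AppDomainRudeUnload, OPR_ProcessExit, OPR_FinalizerRun,
    MaxClrOperation
};
enum EPolicyAction {
    eNoAction, eThrowException, eAbortThread, eRudeAbortThread,
    eUnloadAppDomain, eRudeUnloadAppDomain, eExitProcess,
    eFastExitProcess, eRudeExitProcess, eDisableRuntime, MaxPolicyAction
};

// Hypothetical COM-free mirror of the IHostPolicyManager notifications.
struct PolicyNotifications {
    virtual ~PolicyNotifications() = default;
    virtual void OnDefaultAction(EClrOperation operation, EPolicyAction action) = 0;
    virtual void OnTimeout(EClrOperation operation, EPolicyAction action) = 0;
    virtual void OnFailure(EClrFailure failure, EPolicyAction action) = 0;
};

// Counts how often each (event, action) pair is reported.
class PolicyStatistics : public PolicyNotifications {
public:
    void OnDefaultAction(EClrOperation op, EPolicyAction a) override { ++defaults_[{op, a}]; }
    void OnTimeout(EClrOperation op, EPolicyAction a) override { ++timeouts_[{op, a}]; }
    void OnFailure(EClrFailure f, EPolicyAction a) override { ++failures_[{f, a}]; }

    int DefaultCount(EClrOperation op, EPolicyAction a) const { return Count(defaults_, {op, a}); }
    int TimeoutCount(EClrOperation op, EPolicyAction a) const { return Count(timeouts_, {op, a}); }
    int FailureCount(EClrFailure f, EPolicyAction a) const { return Count(failures_, {f, a}); }

private:
    using OpKey = std::pair<EClrOperation, EPolicyAction>;
    using FailKey = std::pair<EClrFailure, EPolicyAction>;

    // Returns the stored count for k, or 0 if the pair was never reported.
    template <typename Key>
    static int Count(const std::map<Key, int> &m, const Key &k) {
        auto it = m.find(k);
        return it == m.end() ? 0 : it->second;
    }

    std::map<OpKey, int> defaults_;
    std::map<OpKey, int> timeouts_;
    std::map<FailKey, int> failures_;
};
```

A host seeing frequent OnTimeout reports for OPR_ThreadAbort, for example, might revisit whether its thread abort timeout is too short for its workload.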

Step 2: Notify the CLR of Your Implementation Using IHostControl

After you've written a class that implements IHostPolicyManager, you must notify the CLR of your intent to receive notifications related to escalation policy. As with all host-implemented interfaces, the host returns its implementation when the CLR calls the GetHostManager method on IHostControl (refer to Chapter 2 for a refresher on how the CLR discovers which managers a host implements). The following partial implementation of IHostControl::GetHostManager creates an instance of the class defined earlier when asked for an implementation of IHostPolicyManager:

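#include <new>   // std::nothrow

HRESULT STDMETHODCALLTYPE CHostControl::GetHostManager(REFIID riid, void **ppv)
{
    if (ppv == NULL)
        return E_POINTER;
    *ppv = NULL;

    if (riid == IID_IHostPolicyManager)
    {
        // Create a new instance of the class that implements
        // IHostPolicyManager. This assumes the constructor sets the
        // reference count to 1; that reference is handed to the CLR.
        CHostPolicyManager *pHostPolicyManager =
            new (std::nothrow) CHostPolicyManager();
        if (pHostPolicyManager == NULL)
            return E_OUTOFMEMORY;
        *ppv = static_cast<IHostPolicyManager *>(pHostPolicyManager);
        return S_OK;
    }

    // Checks for other interfaces omitted...
    return E_NOINTERFACE;
}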

Now that I've notified the CLR of the implementation of IHostPolicyManager, it will call the implementation each time an action is taken in response to the escalation policy defined using ICLRPolicyManager.