Section 9.6. Delivery Failures | Programming WCF Services

9.6. Delivery Failures

As discussed in Chapter 6, a connected call may fail both due to communication failures and to service-side errors. Similarly, a queued call can fail due to delivery failure or to service-side playback errors. WCF provides dedicated error-handling mechanisms for both types of errors, and understanding them as well as integrating your error-handling logic with them is an intrinsic part of using queued services.

While MSMQ can guarantee delivery of the message if it is technically possible to do so, there are multiple examples of when it is not possible to deliver the message. These include but are not limited to:

Timeout and expiration: As you will see shortly, each message carries a timestamp and a time-to-live timeout, and the message has to be delivered and processed within that timeout. Failing to do so will cause the delivery to fail.
Security mismatch: If the security credentials in the message (or the chosen authentication mechanism itself) do not match up with what the service expects, the service will reject the message.
Transactional mismatch: The client cannot use a local nontransactional queue while posting a message to a transactional service-side queue.
Network problems: If the underlying network fails or is simply unreliable, the message may never reach the service.
Machine crashes: The service machine may crash due to software or hardware failures and will not be able to accept the message to its queue.
Purges: Even if the message is delivered successfully, the administrator (or any application, programmatically) can purge the messages out of the queue and avoid having the service process them.
Quota breach: Each queue has a quota controlling the maximum size of data it can hold. If the quota is exceeded, future messages are rejected.

After every delivery failure, the message goes back to the client's queue, where MSMQ will continuously retry to deliver it. While in some cases, such as intermittent network failures or quota issues, the retries may eventually succeed, there are many cases where MSMQ will never succeed in delivering the message. In fact, in practical terms, even a large number of retry attempts may be unacceptable and may create a dangerous amount of thrashing. Delivery-failure handling deals with how MSMQ knows it should not retry forever, after how many attempts it should give up, after how long it should give up, and what it should do with the failed messages.

MsmqBindingBase offers a number of properties governing handling of delivery failures:

    public abstract class MsmqBindingBase : Binding,...
    {
       public TimeSpan TimeToLive
       {get;set;}

       //DLQ settings
       public Uri CustomDeadLetterQueue
       {get;set;}
       public DeadLetterQueue DeadLetterQueue
       {get;set;}
       //More members
    }

9.6.1. The Dead-Letter Queue

In messaging systems, after an evident failure to deliver, the message goes to a special queue called the dead-letter queue (DLQ). The DLQ is somewhat analogous to a classic dead-letter mailbox at the main post office. In the context of this discussion, failure to deliver constitutes not only failure to reach the service-side queue, but also failure to commit the playback transaction. Note that the service may still fail processing the playback and still commit the playback transaction. MSMQ on the client and on the service side constantly acknowledge to each other receiving and processing messages. If the service-side MSMQ successfully received and retrieved the message from the service-side queue (that is, the playback transaction committed), it sends a positive acknowledgement (ACK) to the client-side MSMQ. The service-side MSMQ can also send a negative acknowledgement (NACK) to the client. When the client-side MSMQ receives a NACK, it posts the message to the DLQ. If the client-side MSMQ receives neither ACK nor NACK, the message is considered in-doubt.

With MSMQ 3.0 (that is, on Windows XP and Windows Server 2003), the dead-letter queue is a system-wide queue. All failed messages from any application go to this single repository. With MSMQ 4.0 (that is, on Windows Vista), you can configure an application-specific DLQ where only messages destined to that specific service go. Application-specific dead-letter queues greatly simplify both the administrator's and the developer's work.

When dealing with a nondurable queue, failed nontransactional messages go to a special system-wide durable DLQ.

9.6.2. Time to Live

With MSMQ, each message carries a timestamp initialized when the message is first posted to the client-side queue. In addition, every queued WCF message has a timeout, controlled by the TimeToLive property of MsmqBindingBase. After posting a message to the client-side queue, WCF mandates that the message must be delivered and processed within the configured timeout. Note that successful delivery to the service-side queue is not good enough; the call must be processed as well. The TimeToLive property is therefore somewhat analogous to the SendTimeout property of the connected bindings. The TimeToLive property is only relevant to the posting client; it has no effect on the service side, nor can the service change it. TimeToLive defaults to one day. After continuously trying and failing to deliver (and process) for as long as TimeToLive allows, MSMQ stops trying and moves the message to the configured DLQ.

You can configure the time-to-live value either programmatically or administratively. For example, using a config file, here is how to configure a time to live of five minutes:

    <bindings>
       <netMsmqBinding>
          <binding name = "ShortTimeout" timeToLive = "00:05:00">
          </binding>
       </netMsmqBinding>
    </bindings>
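The same five-minute value can also be set programmatically. The following is a minimal sketch, assuming the client builds its binding in code; the class and method names (TimeToLiveExample, CreateShortTimeoutBinding) are illustrative and not part of WCF:

```csharp
using System;
using System.ServiceModel;

static class TimeToLiveExample
{
   // Builds a NetMsmqBinding with the same five-minute time to live
   // as the "ShortTimeout" binding section (hypothetical helper).
   public static NetMsmqBinding CreateShortTimeoutBinding()
   {
      NetMsmqBinding binding = new NetMsmqBinding();
      binding.TimeToLive = TimeSpan.FromMinutes(5);
      return binding;
   }
}
```

The resulting binding can then be passed to any proxy or channel factory constructor that accepts a binding object.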

The main motivation for configuring a short timeout is dealing with time-sensitive calls that must be processed in a timely manner. However, time-sensitive queued calls go against the grain of disconnected queued calls in general, because the more time-sensitive the calls are, the more questionable the use of queued services is in the first place. The correct way of viewing time to live is as a last-resort heuristic used to eventually bring to the attention of the administrator the fact that the message was not delivered, not as a way to enforce business-level interpretation of the message sensitivity.

9.6.3. Configuring the Dead-Letter Queue

MsmqBindingBase offers the DeadLetterQueue property of the enum type DeadLetterQueue:

    public enum DeadLetterQueue
    {
       None,
       System,
       Custom
    }

When set to DeadLetterQueue.None, WCF makes no use of a dead-letter queue. After a failure to deliver, WCF silently discards the message as if the call never happened. DeadLetterQueue.System is the default value of the property. As its name implies, it uses the system-wide DLQ, and after a delivery failure WCF moves the message from the client-side queue to the system-wide DLQ.

When set to DeadLetterQueue.Custom, the application can take advantage of a dedicated DLQ. DeadLetterQueue.Custom requires the use of MSMQ 4.0, and WCF verifies that at call time. In addition, WCF requires that the application specify the address of the custom DLQ in the CustomDeadLetterQueue property of the binding. The default value of CustomDeadLetterQueue is null, but when DeadLetterQueue.Custom is employed, CustomDeadLetterQueue cannot be null:

    <netMsmqBinding>
       <binding name = "CustomDLQ"
          deadLetterQueue = "Custom"
          customDeadLetterQueue = "net.msmq://localhost/private/MyCustomDLQ">
       </binding>
    </netMsmqBinding>

Conversely, when the DeadLetterQueue is set to any other value besides DeadLetterQueue.Custom, then CustomDeadLetterQueue must be null.
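For a client that configures its binding in code rather than in a config file, the two properties can be set together. This is a sketch only; the helper class CustomDlqExample is hypothetical, and the queue address is simply the sample address used in this section:

```csharp
using System;
using System.ServiceModel;

static class CustomDlqExample
{
   // Builds a binding that routes failed messages to an
   // application-specific DLQ (hypothetical helper).
   public static NetMsmqBinding CreateCustomDlqBinding()
   {
      NetMsmqBinding binding = new NetMsmqBinding();
      // Both properties must be set together: Custom requires a non-null address
      binding.DeadLetterQueue = DeadLetterQueue.Custom;
      binding.CustomDeadLetterQueue =
         new Uri("net.msmq://localhost/private/MyCustomDLQ");
      return binding;
   }
}
```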

It is important to realize that the custom DLQ is just another MSMQ queue. It is up to the client-side developer to also deploy a DLQ service that processes its messages. All WCF does on MSMQ 4.0 is automate the act of moving the message to the DLQ once a failure is detected.

9.6.3.1. Custom DLQ verification

If a custom DLQ is required, then, like any other queue, it is up to the client to verify at runtime, before issuing queued calls, that the custom DLQ exists and, if necessary, to create it. Following the pattern presented previously, you can automate and encapsulate this with my QueuedServiceHelper.VerifyQueue( ) method, shown in Example 9-15.

Example 9-15. Verifying a custom DLQ

    public static class QueuedServiceHelper
    {
       public static void VerifyQueue(ServiceEndpoint endpoint)
       {
          if(endpoint.Binding is NetMsmqBinding)
          {
             string queue = GetQueueFromUri(endpoint.Address.Uri);
             if(MessageQueue.Exists(queue) == false)
             {
                MessageQueue.Create(queue,true);
             }
             NetMsmqBinding binding = endpoint.Binding as NetMsmqBinding;
             if(binding.DeadLetterQueue == DeadLetterQueue.Custom)
             {
                Debug.Assert(binding.CustomDeadLetterQueue != null);
                string DLQ = GetQueueFromUri(binding.CustomDeadLetterQueue);
                if(MessageQueue.Exists(DLQ) == false)
                {
                   MessageQueue.Create(DLQ,true);
                }
             }
          }
       }
       //More members
    }

9.6.4. Processing the Dead-Letter Queue

The client needs to somehow process the accumulated messages in the DLQ. In the case of the system-wide DLQ, the client can provide a mega-service that supports all contracts of all queued endpoints on the system to enable it to process all failed messages. This is clearly an impractical idea, because that service could not possibly know about all queued contracts, let alone have meaningful processing for all applications. The only feasible way to make this work is to restrict the client side to at most a single queued service per system. Alternatively, you can write a custom application for direct administration and manipulation of the system DLQ using System.Messaging. That application will parse and extract the relevant messages and process them. The problem with that approach (besides the inordinate amount of work involved) is that if the messages are protected and encrypted (as they should be), the application will have a hard time dealing with and distinguishing between them. In practical terms, the only possible solution for a general client-side environment is the one offered by MSMQ 4.0; that is, a custom DLQ. When using a custom DLQ, you also provide a client-side service whose queue is the application's custom DLQ. That service will process the failed messages according to the application-specific requirements.

9.6.4.1. Defining the DLQ service

Implementing the DLQ service is done like any other queued service. The only requirement is that the DLQ service be polymorphic with the original service's contract. If multiple queued endpoints are involved, you will need a DLQ per contract per endpoint. Example 9-16 shows a possible setup.

Example 9-16. DLQ service config file

    ////////////////// Client side /////////////////////
    <system.serviceModel>
       <client>
          <endpoint
             address  = "net.msmq://localhost/private/MyServiceQueue"
             binding  = "netMsmqBinding"
             bindingConfiguration = "MyCustomDLQ"
             contract = "IMyContract"
          />
       </client>
       <bindings>
          <netMsmqBinding>
             <binding name = "MyCustomDLQ" deadLetterQueue = "Custom"
                customDeadLetterQueue = "net.msmq://localhost/private/MyCustomDLQ">
             </binding>
          </netMsmqBinding>
       </bindings>
    </system.serviceModel>

    ////////////////// DLQ service side /////////////////////
    <system.serviceModel>
       <services>
          <service name  = "MyDLQService">
             <endpoint
                address  = "net.msmq://localhost/private/MyCustomDLQ"
                binding  = "netMsmqBinding"
                contract = "IMyContract"
             />
          </service>
       </services>
    </system.serviceModel>

The client config file defines a queued endpoint with the IMyContract contract. The client uses a custom binding section to define the address of the custom DLQ. A separate queued service (potentially on a separate machine) also supports the IMyContract contract. The DLQ service uses as its address the DLQ defined by the client.

9.6.4.2. Failure properties

The DLQ service typically needs to know why the queued call delivery failed. For that, WCF offers the MsmqMessageProperty class, used to find out the cause of the failure and the current status of the message. MsmqMessageProperty is defined in the System.ServiceModel.Channels namespace:

    public sealed class MsmqMessageProperty
    {
       public const string Name = "MsmqMessageProperty";

       public int AbortCount
       {get;internal set;}
       public DeliveryFailure? DeliveryFailure
       {get;}
       public DeliveryStatus? DeliveryStatus
       {get;}
       public int MoveCount
       {get;internal set;}
       //More members
    }

The DLQ service needs to obtain the MsmqMessageProperty from the operation context's incoming message properties:

    public sealed class OperationContext : ...
    {
       public MessageProperties IncomingMessageProperties
       {get;}
       //More members
    }
    public sealed class MessageProperties : IDictionary<string,object>,...
    {
       public object this[string name]
       {get;set;}
       //More members
    }

When a message is passed to the DLQ, WCF will add to its properties an instance of MsmqMessageProperty detailing the failure. MessageProperties is merely a collection of message properties that you can access using a string as a key. To obtain the MsmqMessageProperty, use the constant MsmqMessageProperty.Name, as shown in Example 9-17.

Example 9-17. Obtaining the MsmqMessageProperty

    [ServiceContract(SessionMode = SessionMode.NotAllowed)]
    interface IMyContract
    {
       [OperationContract(IsOneWay = true)]
       void MyMethod( );
    }

    [ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall)]
    class MyDLQService : IMyContract
    {
       [OperationBehavior(TransactionScopeRequired = true)]
       public void MyMethod( )
       {
          MsmqMessageProperty msmqProperty = OperationContext.Current.
             IncomingMessageProperties[MsmqMessageProperty.Name] as MsmqMessageProperty;

          Debug.Assert(msmqProperty != null);
          //Process msmqProperty
       }
    }

Note in Example 9-17 the practices discussed so far of session mode, instance management, and transactions; the DLQ service is, after all, just another queued service.
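Because MessageProperties is a dictionary, the indexer used in Example 9-17 assumes the key is present. A more defensive lookup is sketched below; the helper class MsmqPropertyHelper is hypothetical and not part of WCF, and it assumes that returning null is an acceptable way to signal that the property is absent:

```csharp
using System.ServiceModel.Channels;

static class MsmqPropertyHelper
{
   // Returns the MsmqMessageProperty stored in the given properties,
   // or null if no such property is present (hypothetical helper).
   public static MsmqMessageProperty GetMsmqProperty(MessageProperties properties)
   {
      object value;
      if(properties.TryGetValue(MsmqMessageProperty.Name,out value))
      {
         return value as MsmqMessageProperty;
      }
      return null;
   }
}
```

Inside a DLQ service operation, the helper would be called with OperationContext.Current.IncomingMessageProperties as its argument.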

The properties of MsmqMessageProperty detail the reasons for failure and offer some contextual information. MoveCount is the number of attempts made to play the message to the service. AbortCount is the number of attempts made to read the message from the queue. AbortCount is less relevant to recovery attempts, because it falls under the responsibility of MSMQ and usually is of no concern. DeliveryStatus is a nullable enum of the type DeliveryStatus defined as:

    public enum DeliveryStatus
    {
       InDoubt,
       NotDelivered
    }

DeliveryStatus will be set to DeliveryStatus.InDoubt unless the message was positively not delivered (a NACK was received). For example, expired messages are considered in-doubt because their time to live elapsed before the service could acknowledge them one way or the other.

The DeliveryFailure property is a nullable enum of the type DeliveryFailure defined as follows (without the specific numerical values):

    public enum DeliveryFailure
    {
       AccessDenied,
       NotTransactionalMessage,
       Purged,
       QueueExceedMaximumSize,
       ReachQueueTimeout,
       ReceiveTimeout,
       Unknown
       //More members
    }
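A DLQ service will often branch on the failure reason. The sketch below shows one possible classification, assuming that timeouts and quota breaches are transient and may succeed on a later resend, while the other reasons will not. That policy, and the names DeliveryFailureHelper and IsWorthRetrying, are illustrative assumptions rather than anything WCF prescribes:

```csharp
using System.ServiceModel.Channels;

static class DeliveryFailureHelper
{
   // Hypothetical policy: true for failures that may be transient.
   // A null value (no failure information) is treated as not retryable.
   public static bool IsWorthRetrying(DeliveryFailure? failure)
   {
      switch(failure)
      {
         case DeliveryFailure.ReachQueueTimeout:
         case DeliveryFailure.ReceiveTimeout:
         case DeliveryFailure.QueueExceedMaximumSize:
            return true;

         default:
            return false;
      }
   }
}
```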

9.6.4.3. Implementing a DLQ service

The DLQ service cannot affect the message properties, such as extending its time to live. Handling of delivery failures typically involves some kind of compensating transaction: notifying the administrator; trying to resend a new message, or resending a new request with extended timeout; logging the error; or perhaps doing nothing, merely processing the failed call and returning, thus discarding the message.

Example 9-18 demonstrates one such implementation.

Example 9-18. Implementing a DLQ service

    [ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall)]
    class MyDLQService : IMyContract
    {
       [OperationBehavior(TransactionScopeRequired = true)]
       public void MyMethod(string someValue)
       {
          MsmqMessageProperty msmqProperty = OperationContext.Current.
             IncomingMessageProperties[MsmqMessageProperty.Name] as MsmqMessageProperty;

          //If tried 25 times or more: discard message
          if(msmqProperty.MoveCount >= 25)
          {
             return;
          }
          //If timed out: try again
          if(msmqProperty.DeliveryStatus == DeliveryStatus.InDoubt)
          {
             if(msmqProperty.DeliveryFailure == DeliveryFailure.ReceiveTimeout)
             {
                MyContractClient proxy = new MyContractClient( );
                proxy.MyMethod(someValue);
                proxy.Close( );
                return;
             }
          }
          //In doubt for any other reason, or unknown failure: notify the administrator
          if(msmqProperty.DeliveryStatus == DeliveryStatus.InDoubt ||
             msmqProperty.DeliveryFailure == DeliveryFailure.Unknown)
          {
             NotifyAdmin( );
          }
       }
       void NotifyAdmin( )
       {...}
    }

The DLQ service in Example 9-18 examines the cause of the failure. If WCF has tried 25 times or more to deliver the message, the DLQ service simply drops the message and gives up. If the cause of the failure was a timeout, the DLQ service tries again by creating a proxy to the queued service and calling it, passing the same arguments from the original call (the in parameters to the DLQ service operation). If the message is in-doubt or an unknown failure took place, the service notifies the application administrator.
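Since the DLQ service cannot extend the time to live of the failed message itself, resending with an extended timeout means posting a new message through a binding with a larger TimeToLive. The following is a rough sketch under stated assumptions: the two-day value, the ResendHelper class, and the use of the sample queue address from this section are all illustrative.

```csharp
using System;
using System.ServiceModel;

[ServiceContract(SessionMode = SessionMode.NotAllowed)]
interface IMyContract
{
   [OperationContract(IsOneWay = true)]
   void MyMethod(string someValue);
}

static class ResendHelper
{
   // Creates a channel factory whose messages get a longer time to live
   // than the one-day default (hypothetical helper, illustrative values).
   public static ChannelFactory<IMyContract> CreateExtendedFactory()
   {
      NetMsmqBinding binding = new NetMsmqBinding();
      binding.TimeToLive = TimeSpan.FromDays(2);

      EndpointAddress address =
         new EndpointAddress("net.msmq://localhost/private/MyServiceQueue");

      return new ChannelFactory<IMyContract>(binding,address);
   }
}
```

The DLQ service operation would then call CreateChannel( ) on the returned factory, invoke MyMethod( ) with the original arguments, and close the factory, instead of using the config-based MyContractClient proxy.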