Troubleshooting Notification Distribution


This section covers problems in the distribution part of the SQL-NS pipeline. Problems in this area usually manifest themselves as notifications not being delivered, or not being delivered correctly.

The Distributor or the Components It Hosts Are Not Running

This section covers problems that prevent the distributor, or the content formatters and delivery protocols it hosts, from running.

The SQL-NS Service Hosting the Distributor Is Not Started

Refer to the earlier section "The SQL-NS Service Hosting the Event Provider Is Not Started" (p. 514) and apply the information given there to the SQL-NS Service hosting the distributor.

The Distributor Is Not Enabled

Symptom: The distributor is not running, and its status is reported as Disabled or Disable Pending.

Cause: Either the distributor was never enabled after the instance was created or updated, or it was explicitly disabled.

Action: Use the nscontrol enable command to enable the distributor from the command line. If you want to enable only the distributor without enabling other components, you can use the -distributor option. Alternatively, use the Instance Properties dialog box to enable the distributor from Management Studio. The section "Enabling and Disabling Components" (p. 492) in Chapter 14 provides details on the use of these tools.

The Distributor Is Unable to Start

Symptom: The distributor is not running, and the Application Event Log on the distributor machine contains entries in the 6000 range.

Cause: The distributor encountered errors during its startup processing.

Action: Errors with event IDs in the 6000 range indicate errors encountered by the distributor. Examine the descriptions in the event log messages to determine the causes of the errors.

A Custom Content Formatter or Delivery Protocol Is Unable to Start

Refer to the section, "A Custom Event Provider Is Unable to Start" (p. 511). That section describes various startup problems with custom event providers, but its recommendations also apply to custom content formatters and delivery protocols.

The Distributor Does Not Process New Notification Batches

If the generator produces new notification batches, but those batches do not get processed by the distributor, one of the problems described in this section might be the cause.

Notification Batch Generation Does Not Complete Successfully

Symptom: The notifications table contains notifications, but they are never processed by the distributor, and their delivery status codes remain at 0 (meaning that delivery has never been attempted). The notification batches table shows that the batches for these notifications have a status code of 5 (indicating that generation failed).

Cause: The generator began generating notifications for the notification batch but was unable to complete the batch successfully. Although there are notifications in the notifications table, the notification batch reflects an unsuccessful generation status, so the notifications are ignored by the distributor.

Action: Examine the Application Event Log for error messages indicating generation problems. Refer to the "Troubleshooting Notification Generation" section (p. 514) for possible causes of generator failures.
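To confirm this symptom, it can help to list the undelivered notifications that belong to failed batches. The following is a minimal sketch only: it uses an in-memory SQLite database with hypothetical, simplified tables (Batches and Notifications) and hypothetical column names rather than the real SQL-NS application schema. Only the status codes 0 (delivery never attempted) and 5 (generation failed) come from the description above.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the notification batches table and the
# notifications table. The real SQL-NS tables have other names and more columns.
SCHEMA = """
CREATE TABLE Batches (
    NotificationBatchId INTEGER PRIMARY KEY,
    StatusCode INTEGER
);
CREATE TABLE Notifications (
    NotificationId INTEGER PRIMARY KEY,
    NotificationBatchId INTEGER,
    DeliveryStatusCode INTEGER
);
"""


def make_database(batches, notifications):
    """Create an in-memory database and load the given sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Batches VALUES (?, ?)", batches)
    conn.executemany("INSERT INTO Notifications VALUES (?, ?, ?)", notifications)
    return conn


def ignored_notification_counts(conn):
    """Return (batch id, count) pairs for undelivered notifications in failed batches."""
    cursor = conn.execute(
        """
        SELECT n.NotificationBatchId, COUNT(*)
        FROM Notifications AS n
        JOIN Batches AS b ON b.NotificationBatchId = n.NotificationBatchId
        WHERE n.DeliveryStatusCode = 0   -- delivery never attempted
          AND b.StatusCode = 5           -- generation failed
        GROUP BY n.NotificationBatchId
        ORDER BY n.NotificationBatchId
        """
    )
    return cursor.fetchall()
```

Any batch this query returns is one the distributor will skip, so the matching generator errors in the Application Event Log are the place to look next.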

A Distributor Work Item Remains in an Incomplete State

Symptom: The distributor appears to be stuck processing one incomplete work item and does not attempt to process new notification batches. The row in the work items table (NSDistributorWorkItems, located in the application schema) for the work item being processed shows a status code of 1 (indicating it is being processed), a valid start time, and an end time value of NULL.

Note

All work items will be in this state for a short time while they are being processed. However, a work item remaining in this state for an extended period of time usually indicates a problem.


Cause: While processing one work item, the distributor encountered a situation from which its processing cannot advance. It might be in an infinite loop or waiting for an event or signal (which may never arrive). Because the distributor is stuck in this state, its work scheduling algorithm never runs, so new notification batches are not seen.

Action: The problem is most likely in the implementation of a custom delivery protocol or content formatter running in the distributor. The custom component probably has not returned control to the distributor from one of its methods. Stopping and restarting the SQL-NS engine will alleviate the condition and allow the distributor to proceed, but if a bug does exist in the delivery protocol or content formatter implementation, it is likely that the distributor will get into this state again. Examine the custom component code to determine the problem. Refer to the "Debugging a Custom Component" topic in the SQL-NS Books Online for information on how to debug custom content formatters and delivery protocols.
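The symptom described above (status code 1, a valid start time, and a NULL end time for an extended period) can be expressed as a simple check. This is a sketch under stated assumptions: the dictionary keys are hypothetical stand-ins for the work item columns, and the 30-minute threshold is an arbitrary example value, not a SQL-NS default.

```python
from datetime import datetime, timedelta


def find_stuck_work_items(work_items, now, max_age=timedelta(minutes=30)):
    """Return the ids of work items that have been in progress longer than max_age.

    Each work item is a dict with the hypothetical keys "id", "status",
    "start_time", and "end_time" (None plays the role of SQL NULL).
    """
    stuck = []
    for item in work_items:
        # Status code 1 with no end time means the item is still being processed.
        in_progress = item["status"] == 1 and item["end_time"] is None
        if not in_progress or item["start_time"] is None:
            continue
        if now - item["start_time"] > max_age:
            stuck.append(item["id"])
    return stuck
```

Because every work item passes through this state briefly, the threshold should be comfortably longer than the time your largest work items normally take.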

Notifications Are Not Delivered as Expected

This section describes a range of problem situations that cause the distributor to deliver notifications incorrectly. In these cases, the distributor might deliver the wrong number of notifications, send them to the wrong destinations, or send them at unexpected times.

The Subscriber Device Names Specified in the Notification Data Are Invalid

Symptom: Notification batches are generated successfully, the distributor creates work items from them, and these work items appear to be processed, but no notifications are actually delivered. Events with Event ID 4048 are written to the Application Event Log, indicating that some notifications cannot be delivered.

Cause: The notification data contains invalid subscriber device names. Specifically, the subscriber devices identified by the SubscriberId and DeviceName columns in the notifications table do not exist in the subscriber devices table (NSSubscriberDevices) in the instance schema. This can happen because the match rule inserted invalid device names into the notifications view, or the subscriber devices were deleted or renamed after the notifications were generated.

Because the distributor cannot locate the subscriber device records for the notifications, it is unable to determine which delivery channels should be used for their delivery and therefore cannot assign them to distributor work items (recall that notifications are assigned to work items on the basis of their target delivery channels). Thus, the distributor work items created from the notification batch all appear to be empty. They are marked as successfully processed but no notifications are delivered.

Action: Verify that the device names used in the match rules are correct and that the subscriber device records have not been deleted or renamed. The combination of the subscriber ID and device name in the notification data must match one of the rows in the NSSubscriberDevices table in the instance schema.
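The matching requirement amounts to an anti-join between the notification data and the subscriber devices. The sketch below illustrates it against an in-memory SQLite database. The Notifications table is a hypothetical, simplified stand-in for your notification class's table, and the NSSubscriberDevices table here carries only the columns needed for the illustration.

```python
import sqlite3

# Simplified stand-ins: only the columns needed to show the matching rule.
SCHEMA = """
CREATE TABLE NSSubscriberDevices (
    SubscriberId TEXT,
    DeviceName TEXT,
    DeliveryChannelName TEXT
);
CREATE TABLE Notifications (
    NotificationId INTEGER PRIMARY KEY,
    SubscriberId TEXT,
    DeviceName TEXT
);
"""


def make_database(devices, notifications):
    """Create an in-memory database and load the given sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO NSSubscriberDevices VALUES (?, ?, ?)", devices)
    conn.executemany("INSERT INTO Notifications VALUES (?, ?, ?)", notifications)
    return conn


def notifications_without_device(conn):
    """Return notifications whose (SubscriberId, DeviceName) pair has no device record."""
    cursor = conn.execute(
        """
        SELECT n.NotificationId, n.SubscriberId, n.DeviceName
        FROM Notifications AS n
        LEFT JOIN NSSubscriberDevices AS d
          ON d.SubscriberId = n.SubscriberId
         AND d.DeviceName = n.DeviceName
        WHERE d.SubscriberId IS NULL      -- no matching device row was found
        ORDER BY n.NotificationId
        """
    )
    return cursor.fetchall()
```

Every row returned identifies a notification that the distributor cannot assign to a work item.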

Subscriber Devices Are Configured for Incompatible Delivery Channels

Symptom: Notification batches are generated successfully, but some notifications are not included in any distributor work item and are never delivered.

Cause: Each notification is associated with a particular delivery channel based on a subscriber device. The notifications not delivered are targeted at delivery channels that use protocols the notification class does not support: The protocols used by those delivery channels do not appear in the list of supported protocols specified in the <Protocols> element of the notification class declaration in the ADF.

Action: Verify that the subscriber devices are configured to the correct delivery channels and that the delivery channel definitions in the ICF specify the correct protocols. If it is your intention that some protocols are not supported by the notification class, your SMI should prevent users from entering subscriptions associated with incompatible devices.

If the notification class should support all protocols, you have to change the notification class declaration to include the missing protocol support. Add a <Protocol> element to the notification class's <Protocols> element for each protocol and supply the appropriate protocol fields and configuration settings. See the "Notification Delivery" section (p. 149) in Chapter 5, "Designing and Prototyping an Application," for more information on declaring protocol support. After adding the required protocol configuration information, you must update the instance.
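The compatibility rule can be summarized as a chain of lookups: device to delivery channel, delivery channel to protocol, and protocol to the notification class's supported list. The following sketch models that chain with plain Python dictionaries. All names and data structures are hypothetical and exist only to illustrate the rule; they are not how SQL-NS stores this information.

```python
def unsupported_notifications(notifications, device_channels, channel_protocols,
                              supported_protocols):
    """Return ids of notifications whose delivery channel uses an unsupported protocol.

    notifications:       iterable of (notification_id, subscriber_id, device_name)
    device_channels:     dict mapping (subscriber_id, device_name) to a channel name
    channel_protocols:   dict mapping a channel name to its protocol name
    supported_protocols: set of protocol names the notification class supports
    """
    skipped = []
    for notification_id, subscriber_id, device_name in notifications:
        channel = device_channels.get((subscriber_id, device_name))
        if channel is None:
            # A missing device record is a different problem (invalid device names).
            continue
        protocol = channel_protocols.get(channel)
        if protocol not in supported_protocols:
            skipped.append(notification_id)
    return skipped
```

Notifications reported by a check like this are the ones that never appear in any distributor work item.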

A Timeout Occurs While Reading Notification Data

Symptom: Notifications are not delivered, and the Application Event Log contains event messages from the NotificationServices event source with event ID 6074. The error message indicates that a SQL timeout occurred.

Cause: The SQL command that reads the set of notifications in a distributor work item is taking too long to execute.

Action: By default, the SQL command that reads the notifications in a distributor work item times out after a period of 15 minutes. Depending on the volume of data in your application, this command might reasonably require more time to execute. If this is the case, you can adjust the timeout by means of the <WorkItemTimeout> element in the ADF. Refer to the SQL-NS Books Online for more information on this element.

If you believe the default timeout should be adequate, or you've adjusted the timeout to a reasonable value and the command still takes too long, the following are possible explanations:

  • The notification batches are too large. On SQL-NS Enterprise Edition, you can control the notification batch size by means of the <NotificationBatchSize> element in the ADF. For more information on setting notification batch sizes, see the "Notification Batch Size" section (p. 439) in Chapter 12.

  • Distributor logging is putting too much load on the SQL Server. You can reduce this load by adjusting the options in the <DistributorLogging> element in the ADF. For more information on controlling distributor logging, see the "Distributor Logging Options" section (p. 445) in Chapter 12.

  • Your database system is not configured optimally. See the "Configuring a Database System for Deployment" section (p. 456) in Chapter 13 for information on configuring a database system for best performance.

Timeouts Occur While Updating Notification Delivery Status

Symptom: The delivery status of some notifications is not updated, and the Application Event Log contains event messages from the NotificationServices event source with event ID 6076. The event descriptions indicate that SQL timeouts occurred.

Cause: A timeout occurred on the SQL command that updates the notification delivery status.

Action: The SQL command that updates the notification delivery status uses a fixed timeout of 5 minutes. This cannot be adjusted.

If this command times out, the following are possible explanations:

  • Your custom delivery protocol is reporting delivery status for too many notifications at once. Change the delivery protocol implementation to pass fewer NotificationStatus objects in each invocation of the distributor's notification status callback.

  • Distributor logging is putting too much load on the SQL Server. You can reduce this load by adjusting the options in the <DistributorLogging> element in the ADF. For more information on controlling distributor logging, see the "Distributor Logging Options" section (p. 445) in Chapter 12.

  • Your database system is not configured optimally. See the "Configuring a Database System for Deployment" section (p. 456) in Chapter 13 for information on configuring a database system for best performance.
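The first suggestion in the list, passing fewer status objects per callback invocation, is a matter of chunking. Real delivery protocols are .NET classes, so the Python below is only a language-neutral sketch of the batching idea. The callback parameter is a hypothetical stand-in for the distributor's notification status callback, and the chunk size of 100 is an arbitrary example value.

```python
def chunked(items, size):
    """Yield consecutive slices of items, each containing at most size elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def report_in_chunks(statuses, callback, chunk_size=100):
    """Invoke callback once per chunk of statuses and return the number of calls made."""
    calls = 0
    for chunk in chunked(statuses, chunk_size):
        callback(chunk)   # stand-in for the notification status callback
        calls += 1
    return calls
```

Smaller chunks mean more callback invocations, each of which produces a shorter status update, so the right size depends on how close your current updates come to the fixed timeout.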

The Distributor Delivers Duplicate Notifications

This section deals with problems in which subscribers receive duplicate notifications from a SQL-NS application.

Duplicate Notifications Are Being Generated

Symptom: The notifications table shows the same notification data generated multiple times for the same subscriber.

Cause: There are several possible reasons for duplicate notifications being generated:

  • The event provider submitted the same events multiple times.

  • The event chronicles are not maintained correctly, causing old data to generate repeated matches.

  • Duplicate subscriptions exist (either because a subscriber entered the same subscription more than once or because a bug in the SMI causes duplicate subscription records to be created).

  • The match rule contains a logic error that results in multiple rows being inserted into the notifications view for a single match.

Action: Examine the events tables to determine whether duplicate event data is being submitted. This might indicate an event provider problem, or the event provider may have obtained duplicate event data from the event source. If you control the event source, you can ensure that it does not communicate duplicate events to the event provider. Otherwise, you can filter out duplicate events in the event provider code (in the case of a custom event provider) or in the event chronicle rules.

If the problem is not duplicate events, examine your event chronicle tables and rules to make sure that they are working as you expect. If old data is not deleted from the chronicles, or filtered out by the match rule logic, it can cause the same notification to be generated more than once.

If the events and event chronicles appear correct, examine the subscriptions. Duplicate subscriptions will, in general, cause duplicate notifications to be generated. Duplicate subscriptions may have been entered by users (accidentally or intentionally) or might result from bugs in the SMI. If you want to prevent users from entering duplicate subscriptions, you can add validation logic to your SMI that checks for and eliminates duplicates.

Finally, if the events, event chronicles, and subscriptions look correct, the problem might be in the match rules. If a match rule inserts more than one row into the notifications view for each match, duplicate notifications can result. Examine the match rule logic and use the debugging stored procedures (described in Chapter 11) to find such problems.
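Each of these checks (events, subscriptions, and notifications) comes down to finding rows that share the same identifying values. The sketch below shows the idea on rows held as Python dictionaries. The field names in the usage are hypothetical, so substitute the columns that define a duplicate in your own application.

```python
from collections import Counter


def find_duplicates(rows, key_fields):
    """Return a dict mapping each repeated key tuple to the number of times it occurs.

    rows:       iterable of dicts (for example, rows read from a subscriptions table)
    key_fields: names of the fields that together identify a duplicate
    """
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return {key: count for key, count in counts.items() if count > 1}
```

For example, calling find_duplicates on subscription rows with the key fields ("SubscriberId", "StockSymbol") would report each subscriber who holds more than one subscription for the same stock symbol.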

The Same Notifications Are Being Delivered More Than Once

Symptom: The notifications table does not contain unexpected duplicates, but the same notifications are delivered more than once. It is likely that the notification distribution views will indicate multiple delivery attempts for each notification. (For information about using the distribution views, see the section "The Notification Distribution Views," p. 360, in Chapter 10, "Delivery Protocols.")

Cause: There are several possible explanations for a single notification (a single row of data in the notifications table) being delivered multiple times:

  • The notification was successfully delivered once, but its delivery status was not recorded. This made the notification appear as though it had not been sent, and a subsequent retry attempt delivered the notification again. Failure to record a notification's delivery status can result from an error or timeout while executing one of the distributor's SQL commands, or from a custom delivery protocol failing to report delivery status via the notification status callback.

  • A custom delivery protocol incorrectly performed multiple delivery operations when it handled the notification data.

  • The external delivery system, to which the delivery protocol sent the notification, delivered it to its final recipient more than once.

Action: If the Application Event Log contains event messages from the NotificationServices event source with event ID 6076, the distributor encountered errors while recording the delivery status of some notifications. Retry attempts may have delivered those notifications again, even if they were successfully delivered on the first attempt. Examine the descriptions in the event messages to determine the causes of the errors. If the failures to update delivery status were due to SQL timeouts, refer to the "Timeouts Occur While Updating Notification Delivery Status" section (p. 521) for information on how to correct the problem.

If the Application Event Log contains no errors related to recording delivery status and your application uses a custom delivery protocol, check the delivery protocol's implementation to make sure it reports notification status for all notifications it handles. Refer to the section, "The Custom Delivery Protocol Interface," (p. 373) in Chapter 10 for information on how custom delivery protocols should report notification status.

Also, make sure that the custom delivery protocol performs only one delivery operation on each notification passed to it by the distributor. For information on how to debug your custom delivery protocol, see the "Debugging a Custom Component" topic in the SQL-NS Books Online.

If your application delivers notifications to an external delivery system (such as a web service that provides the routing and delivery to the final recipient), this delivery system might be responsible for sending the notification more than once. If this is the case, the distribution log in your SQL-NS application will only show one delivery attempt, but the final recipient may have received the notification more than once. There might be a problem with the delivery instructions your SQL-NS delivery protocol sends to the external delivery system, or the external delivery system may not be operating correctly.

Notifications Are Delivered at Unexpected Times

This section deals with problems that cause notifications to be delivered long after they should have been, or at times that don't match the schedules users intended for their scheduled subscriptions.

Generator Fall-Behind Is Causing Notifications to Be Generated Late

Symptom: The generator quantum clock has fallen behind and as a result, the generator is producing notifications from event batches that arrived a long time in the past. The distributor is delivering the resulting notification batches as soon as they are ready.

Cause: Notifications are generated later than they should be, because the generator's quantum clock has fallen behind. This can happen if quantum processing takes longer than the quantum duration, or if the generator restarts after a period of downtime.

Action: You can get the quantum clock back on schedule by specifying quantum limits in the ADF. These control how far behind the quantum clock is allowed to fall before the generator skips quantums. For information on using quantum limits, see the "Quantum Limits" section (p. 437) in Chapter 12.

In addition to getting the generator back on schedule, it's even more important to understand why the generator fell behind in the first place, if downtime was not the cause. Most likely, the application's rules are performing poorly, causing quantum processing to take longer than it should. The rules might be expressed in an inefficient way, or the tables against which they operate may not be indexed properly. Also, the database system may not be optimally configured. Refer to the "Indexes and Query Optimization" section (p. 425) in Chapter 12 for information on rule optimization. The "Configuring a Database System for Deployment" section (p. 456) in Chapter 13 provides information on configuring a database system for best performance.

The Distributor Is Delivering Old Notifications After Downtime

Symptom: After a prolonged period of downtime, the distributor starts delivering notifications that were generated much earlier.

Cause: While the distributor component was not running, the generator was still generating notifications. When the distributor was restarted, it began processing this backlog of old notifications and delivering them late.

Action: If the semantics of your application dictate that notifications older than a certain age should not be delivered, you should specify a notification expiration age in the <ExpirationAge> element in the ADF. See the "Delivery Failure: Retry and Notification Expiration" section (p. 359) in Chapter 10 for more information about setting notification expiration ages. If you do not specify an expiration age, the notifications never expire, and the distributor attempts to send them no matter how old they are.
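The expiration rule described above reduces to a small age comparison. The following sketch is a conceptual model only, with hypothetical names. It does not reproduce the distributor's actual implementation or the format of the <ExpirationAge> value in the ADF.

```python
from datetime import datetime, timedelta


def is_expired(generated_at, now, expiration_age=None):
    """Return True if a notification is older than the configured expiration age.

    With expiration_age set to None (no expiration age configured), the
    notification never expires, however old it is.
    """
    if expiration_age is None:
        return False
    return now - generated_at > expiration_age
```

With a two-hour expiration age, a notification generated before a six-hour outage would be treated as expired once the distributor resumes. With no expiration age, the same notification would still be sent.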

A Scheduled Subscription Specifies the Wrong Time Zone

Symptom: Notifications are being generated at the wrong time for a scheduled subscription.

Cause: The subscription definition specifies the wrong time zone. Either the user entered the subscription incorrectly, or the SMI has a bug that caused the wrong time zone information to be recorded.

Action: Examine your subscription data to make sure that the time zones on the subscription schedules are correct. You can use the NSScheduledSubscriptionList and NSScheduledSubscriptionDetails stored procedures to determine which subscriptions are scheduled to be evaluated at a given time (refer to the "Using the Scheduled Subscription Debugging Stored Procedures" section, p. 411, in Chapter 11 for more information on these stored procedures). You might have to debug your subscription management application to ensure that it records the correct time zone information based on users' input.
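To see why a wrong time zone shifts delivery, consider the same local schedule time interpreted in two different zones. The sketch below uses fixed UTC offsets for simplicity. Real time zones also involve daylight saving rules, which this illustration ignores, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def schedule_time_in_utc(local_time, utc_offset_hours):
    """Interpret a naive local schedule time at a fixed UTC offset and convert it to UTC."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return local_time.replace(tzinfo=zone).astimezone(timezone.utc)


# A subscription scheduled for 8:00 AM, recorded with two different offsets.
intended = schedule_time_in_utc(datetime(2006, 3, 1, 8, 0), -8)   # UTC-8
recorded = schedule_time_in_utc(datetime(2006, 3, 1, 8, 0), -5)   # UTC-5
skew = intended - recorded
```

In this example the subscription recorded at UTC-5 fires three hours earlier than the subscriber at UTC-8 intended, which is the kind of consistent offset that points to a time zone error rather than a generator delay.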

Notifications Are Delivered via the Wrong Delivery Channel

This section deals with problems that cause the distributor to use the wrong delivery channel when delivering notifications.

Subscriber Devices Are Not Configured Correctly

Symptom: Notifications are generated as expected, but delivered via the wrong delivery channel.

Cause: Subscriber device records in the NSSubscriberDevices table in the instance schema specify the wrong delivery channel names. Either invalid user input or a bug in the SMI caused the subscriber devices to be configured to the wrong delivery channels.

Action: Examine the NSSubscriberDevices table in the instance schema to see the delivery channel specified for each device. If this information is incorrect, you will have to update the subscriber device records. Verify that your SMI is recording subscriber device information correctly.

The Match Rule Inserts the Wrong Subscriber Device Name into the Notifications View

Symptom: Each individual subscriber device record is correctly configured, but notifications are being sent to the wrong delivery channels.

Cause: The match rule may be inserting the wrong subscriber device names into the notifications view. Check the data in the notifications table to see the values inserted.

Action: Examine your match rules to determine where the device name is coming from. A logic error in the rule might cause the wrong name to be inserted, or, if your rule obtains the device name from the subscriptions table, the wrong name may have been recorded in that table.

Getting Help From the SQL-NS Development Community

As SQL-NS adoption has grown, the community of people knowledgeable about the product has grown too. There are already several SQL-NS experts, both within Microsoft and at other companies, willing to help others solve SQL-NS problems. The best way to reach these people is through the SQL-NS public newsgroup and the MSDN discussion forum.

The SQL-NS public newsgroup name is microsoft.public.sqlserver.notificationsvcs. You can access the newsgroup using any news reader. Alternatively, you can use the web-based MSDN discussion forum dedicated to SQL-NS topics, located at http://forums.microsoft.com/msdn/ShowForum.aspx?ForumID=97.

Generally, any question about SQL-NS is fair game on the newsgroup or discussion forum. Typical posts range from questions about applicability of SQL-NS to a particular scenario, to specific coding or design questions. To make it easy for others to answer your questions, always be as specific as possible and include code snippets where applicable.

Remember that participation in the newsgroup can be a two-way endeavor. You can use the newsgroup to get information from others, but you can also benefit the whole SQL-NS development community by sharing the knowledge and experience you have. Don't worry if you don't think you're an expert; if you know the answer to someone's question, don't hesitate to post it. The value of the newsgroup and discussion forum lies in their interactive nature and the open information exchange they facilitate. As you learn about SQL-NS and use it in your applications, I encourage you to participate in the community as much as you can.





Microsoft SQL Server 2005 Notification Services
ISBN: 0672327791
EAN: 2147483647
Year: 2006
Pages: 166
Authors: Shyam Pather