If your automated jobs, alerts, or notifications are not working properly, use the guidelines found in this lesson to isolate and solve the problem.
After this lesson, you will be able to
- Troubleshoot potential problems when jobs or alerts do not execute as anticipated
Estimated lesson time: 30 minutes
Each of the following items is discussed in turn in:
If SQL Server Agent has stopped for any reason, it is not able to execute jobs or fire alerts.
You should consider having the SQLServerAgent service start automatically whenever the Windows NT Server is started.
The SQLServerAgent Monitor provides self-checking for SQL Server Agent. If the SQLServerAgent service stops unexpectedly, SQLServerAgent Monitor can attempt to restart the service. Enable SQLServerAgent Monitor through SQL Server Enterprise Manager as shown in Figure 13.10, or by running the xp_sqlagent_monitor extended stored procedure. When SQL Server Agent Monitor restarts the SQL Server service, an error is written to the Windows NT application log, making it possible to configure an alert to fire when the service is restarted.
Figure 13.10 The Advanced tab of the SQL Server Agent Properties dialog box, showing the Restart Services options
If jobs are not executing as scheduled, alerts are not being fired, or operators are not receiving notifications, verify that the job, schedule, alert, or operator is enabled.
The fact that a job, schedule, alert, or operator is disabled is not recorded in the SQL Server Agent error log or the job history.
An operating system command or Active Scripting job step in a job owned by a user who is not a member of the sysadmin role cannot execute under the following conditions:
NOTE
If a job step fails for this reason, you will probably not see an entry in the SQL Server Agent Error Log. The error will be recorded in the job step history.
Reviewing error messages in the Windows NT application log and the SQL Server Agent and SQL Server error logs may help you troubleshoot the source of your problem.
If the maximum size of the Windows NT application log is too small or is defined to be overwritten frequently, events may not exist for SQL Server to process. To avoid losing event information about SQL Server, increase the maximum log size for the Windows NT application log. Use Windows NT Event Viewer to check that events are being recorded for SQL Server. SQL Server Agent fires alerts only for events that are recorded in the Windows NT application log.
SQL Server Agent errors are recorded in the SQL Server Agent error log. By default, all SQL Server Agent errors and warnings are recorded in the error log. You can also record execution trace messages when troubleshooting specific problems; this can cause the log to become large, however, so it should not be enabled during normal operation. Note the following:
You should also review the SQL Server error log. By comparing the dates and times for events in the SQL Server error log, the SQL Server Agent error log, and the Windows NT application log, you can narrow down the list of probable causes of problems.
After you review the SQL Server Agent error log, you also may want to review history information on an alert, an operator, or a job. The date and time of the single most recent action is recorded for alerts and operators. Full job history is captured in the msdb database.
The maximum amount of job history information can be resized. Having a full sysjobhistory system table can cause alerts and jobs to fail. If you must keep large amounts of job history information, you should expand the msdb database to a size that is sufficient to accommodate the growth.
TIP
Some job errors are not written in the SQL Server Agent error log but do get written in the job history, so you should check the job history even if the SQL Server Agent error log shows no error.
In this exercise, you will rename the data transfer file to correct the simulated failure from a previous exercise.
What information is displayed in the output file?
Answer
If e-mail or pager notifications are not working, verify that the mail client is working properly. To do so, log on to the mail client, using the SQL Server Agent domain user account, and send an e-mail or pager notification to an operator.
Because SQL Server Agent both depends on and monitors SQL Server events, it can become caught in an endless loop, firing the same alert repeatedly. This generally occurs when SQL Server runs out of an essential global resource and an alert has been defined on this event.
A looping condition occurs when SQL Server Agent fires an alert that attempts to execute a job. Executing the job in turn causes the same error that originally fired the alert. This causes the job to be executed again, and so on.
Signs that looping alerts may be occurring are that the Windows NT application log rapidly fills with the same error, the CPU use is unusually high, or the number of alert responses is high.
Under these conditions, the delay increases between when the event appears in the Windows NT application log and when SQL Server Agent responds to that event. This causes a backlog of alerts.
When looping alerts occur, resolve the error condition and clear the backlog of events from the Windows NT application log. You can do this in one of the following ways:
TIP
The default delay between responses is 0 for alerts added with the Create Alert Wizard or the sp_add_alert system stored procedure. In most cases, it is recommended that you change this to 1 minute after using the wizard to create alerts.
To create an error that is non-alert-generating, you must modify the Windows NT registry. The result of this modification is that SQL Server Agent will not fire the alert when the error occurs.
Use this solution as a last resort only. Refer to SQL Server Books Online for information on how to configure non-alert-generating errors.
Most problems with automated administration are caused by accounts or services that are configured incorrectly or by services that are not running. When errors occur, you can check the SQL Server and SQL Server Agent error logs as well as the job histories to determine the source of the problem. Planning your alerts carefully will prevent looping alerts from occurring.