Lesson 4: Troubleshooting SQL Server Automated Administration | Microsoft SQL Server 7.0 System Administration Training Kit

[Previous] [Next]

If your automated jobs, alerts, or notifications are not working properly, use the guidelines found in this lesson to isolate and solve the problem.

After this lesson, you will be able to

Troubleshoot potential problems when jobs or alerts do not execute as anticipated

Estimated lesson time: 30 minutes

Automated Administration Troubleshooting Checklist

Each of the following items is discussed in turn in:

Verify that SQL Server Agent has been started.

Verify that the job, schedule, alert, or operator is enabled.

Ensure that the SQLAgentCmdExec account is correctly configured.

Review error logs.

Review job history.

Verify that the mail client is working properly.

Verify That SQL Server Agent Has Been Started

If SQL Server Agent has stopped for any reason, it is not able to execute jobs or fire alerts.

You should consider having the SQLServerAgent service start automatically whenever the Windows NT Server is started.

The SQLServerAgent Monitor provides self-checking for SQL Server Agent. If the SQLServerAgent service stops unexpectedly, SQLServerAgent Monitor can attempt to restart the service. Enable SQLServerAgent Monitor through SQL Server Enterprise Manager as shown in Figure 13.10, or by running the xp_sqlagent_monitor extended stored procedure. When SQL Server Agent Monitor restarts the SQL Server service, an error is written to the Windows NT application log, making it possible to configure an alert to fire when the service is restarted.

Figure 13.10 The Advanced tab of the SQL Server Agent Properties dialog box, showing the Restart Services options

Verify That the Job, Schedule, Alert, or Operator Is Enabled

If jobs are not executing as scheduled, alerts are not being fired, or operators are not receiving notifications, verify that the job, schedule, alert, or operator is enabled.

The fact that a job, schedule, alert, or operator is disabled is not recorded in the SQL Server Agent error log or the job history.

Ensure That the SQLAgentCmdExec Account Is Correctly Configured

An operating system command or Active Scripting job step in a job owned by a user who is not a member of the sysadmin role cannot execute under the following conditions:

If the SQLAgentCmdExec account has been improperly installed

If the SQLAgentCmdExec account has been removed

If changes to the SQLAgentCmdExec account password have been made outside of the SQL Server Agent Properties dialog box

If SQL Server Agent has been set to allow only members of sysadmin to execute CmdExec and Active Scripting job steps

NOTE
If a job step fails for this reason, you will probably not see an entry in the SQL Server Agent Error Log. The error will be recorded in the job step history.

Review Error Logs

Reviewing error messages in the Windows NT application log and the SQL Server Agent and SQL Server error logs may help you troubleshoot the source of your problem.

The Windows NT Application Log

If the maximum size of the Windows NT application log is too small or is defined to be overwritten frequently, events may not exist for SQL Server to process. To avoid losing event information about SQL Server, increase the maximum log size for the Windows NT application log. Use Windows NT Event Viewer to check that events are being recorded for SQL Server. SQL Server Agent fires alerts only for events that are recorded in the Windows NT application log.

The SQL Server Agent Error Log

SQL Server Agent errors are recorded in the SQL Server Agent error log. By default, all SQL Server Agent errors and warnings are recorded in the error log. You can also record execution trace messages when troubleshooting specific problems; this can cause the log to become large, however, so it should not be enabled during normal operation. Note the following:

Each time SQL Server Agent is stopped and restarted, a new error log is created.

You can view the current error log with SQL Server Enterprise Manager or any text editor. Up to nine previous versions of the error log are saved in the C:\Mssql7\Log directory.

You can define an error message pop-up recipient to be sent a net send command when errors are logged into the SQL Server Agent error log.

The SQL Server Error Log

You should also review the SQL Server error log. By comparing the dates and times for events in the SQL Server error log, the SQL Server Agent error log, and the Windows NT application log, you can narrow down the list of probable causes of problems.

Review Job History

After you review the SQL Server Agent error log, you also may want to review history information on an alert, an operator, or a job. The date and time of the single most recent action is recorded for alerts and operators. Full job history is captured in the msdb database.

The maximum amount of job history information can be resized. Having a full sysjobhistory system table can cause alerts and jobs to fail. If you must keep large amounts of job history information, you should expand the msdb database to a size that is sufficient to accommodate the growth.

TIP
Some job errors are not written in the SQL Server Agent error log but do get written in the job history, so you should check the job history even if the SQL Server Agent error log shows no error.

Exercise: Correcting a Problem

In this exercise, you will rename the data transfer file to correct the simulated failure from a previous exercise.

To correct a problem and verify that a job works

Open Windows NT Explorer.

Rename the C:\Sqladmin\Exercise\Ch13\Prods.sav file to C:\Sqladmin\Exercise \Ch13\Prods.txt

Switch to SQL Server Enterprise Manager.

Right-click the StudyNwind Monthly Data Transfer job, and then click Start Job. The Start Job dialog box allows you to start a job at any step. Click Start to have the job execute starting with the first step.

In the details pane, right-click the StudyNwind Monthly Data Transfer job, and then click View Job History. Verify that the job completed successfully.

Open Notepad, and then open C:\Temp\Prodcopy.txt.

What information is displayed in the output file?

Answer

Verify That the Mail Client Is Working Properly

If e-mail or pager notifications are not working, verify that the mail client is working properly. To do so, log on to the mail client, using the SQL Server Agent domain user account, and send an e-mail or pager notification to an operator.

Troubleshooting Alerts

Because SQL Server Agent both depends on and monitors SQL Server events, it can become caught in an endless loop, firing the same alert repeatedly. This generally occurs when SQL Server runs out of an essential global resource and an alert has been defined on this event.

A looping condition occurs when SQL Server Agent fires an alert that attempts to execute a job. Executing the job in turn causes the same error that originally fired the alert. This causes the job to be executed again, and so on.

Signs that looping alerts may be occurring are that the Windows NT application log rapidly fills with the same error, the CPU use is unusually high, or the number of alert responses is high.

Under these conditions, the delay increases between when the event appears in the Windows NT application log and when SQL Server Agent responds to that event. This causes a backlog of alerts.

Resolving Looping Alerts

When looping alerts occur, resolve the error condition and clear the backlog of events from the Windows NT application log. You can do this in one of the following ways:

Use Windows NT Event Viewer to clear the Windows NT application log. If you do this, all events, including those not generated by SQL Server, are cleared. You should try to resolve the alert backlog by other means if you do not want to lose your current Windows NT application log.

SQL Server Agent processes an alert only once within a defined period of time. This time defaults to one minute, so SQL Server Agent will process the same alert only once every minute, no matter how many times the error on which the alert is based occurs. Use the Delay Between Responses option for each alert to set this time. A longer delay reduces the number of alerts that can fire due to any one condition and will decrease the number of alerts fired when looping alerts are occurring.

TIP
The default delay between responses is 0 for alerts added with the Create Alert Wizard or the sp_add_alert system stored procedure. In most cases, it is recommended that you change this to 1 minute after using the wizard to create alerts.

To prevent recurring alerts on specific error numbers from consuming all of your resources, you can define them as non-alert-generating.

To create an error that is non-alert-generating, you must modify the Windows NT registry. The result of this modification is that SQL Server Agent will not fire the alert when the error occurs.

Use this solution as a last resort only. Refer to SQL Server Books Online for information on how to configure non-alert-generating errors.

Lesson Summary

Most problems with automated administration are caused by accounts or services that are configured incorrectly or by services that are not running. When errors occur, you can check the SQL Server and SQL Server Agent error logs as well as the job histories to determine the source of the problem. Planning your alerts carefully will prevent looping alerts from occurring.