Troubleshooting Clusters

Before you begin troubleshooting the cluster, make sure that at least Service Pack 3 is installed for both Windows 2000 and SQL Server. (See 'Service Pack Installations' later in this chapter.) Most clustering issues have nothing to do with SQL Server, so troubleshooting needs to begin at the hardware level. The basic order of items to troubleshoot in a cluster is:

  1. Hardware

  2. Operating system

  3. Network

  4. Security and permissions

  5. MSCS

  6. SQL Server

  7. Any other application
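The ordering above can be sketched as a simple checklist runner. This is only an illustration in Python: the check functions are hypothetical placeholders (nothing in this chapter defines them), and you would replace each one with whatever probe makes sense in your environment.

```python
from typing import Callable, List, Optional, Tuple

# The layers in the order listed above.
LAYERS = [
    "Hardware",
    "Operating system",
    "Network",
    "Security and permissions",
    "MSCS",
    "SQL Server",
    "Any other application",
]


def first_failing_layer(checks: List[Tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Run the checks in order and return the name of the first layer that fails.

    Returns None when every check passes.
    """
    for name, check in checks:
        if not check():
            return name
    return None


if __name__ == "__main__":
    # Hypothetical results: pretend everything passes except the network check.
    results = {name: True for name in LAYERS}
    results["Network"] = False
    checks = [(name, (lambda ok=results[name]: ok)) for name in LAYERS]
    print(first_failing_layer(checks))
```

Stopping at the first failure reflects the point of the list: there is little value in examining SQL Server while a lower layer is still broken.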

To diagnose the cluster itself, start with the Event Viewer and look for problem events. One event you can ignore is Event ID 2506, which appears as an error in the Application Event Log. It is reported as an error because of a bug and should actually be listed as informational. If nothing stands out there, move on to the cluster logs, which are typically located in the \winnt\cluster directory:

  • Cluster.log: The main cluster log, which records nearly every event in the cluster.

  • Sqlstpn.log: The SQL Server setup log, where n in the filename is a sequential number that increases with each setup attempt.

  • Sqlclstr.log: The log of clustered SQL Servers.

Tip 

You can also go to the command prompt and type SET CLUSTERLOG to determine where the cluster logs are located.
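If you would rather script this first pass than read the log by eye, a minimal Python sketch follows. It reads the same CLUSTERLOG environment variable that the SET command displays and prints lines that contain a keyword. The keywords are an assumption chosen for illustration, not a documented log format, so adjust them to whatever marks a problem in your own logs.

```python
import os
from typing import Iterable, List, Mapping, Optional, Sequence

# Assumption: illustrative keywords only, not a documented cluster log format.
DEFAULT_KEYWORDS = ("error", "fail")


def cluster_log_path(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the value of the CLUSTERLOG environment variable, or None if unset."""
    env = os.environ if env is None else env
    value = env.get("CLUSTERLOG")
    return value if value else None


def suspicious_lines(lines: Iterable[str],
                     keywords: Sequence[str] = DEFAULT_KEYWORDS) -> List[str]:
    """Return the lines that contain any keyword, compared case-insensitively."""
    lowered = [k.lower() for k in keywords]
    return [
        line.rstrip("\n")
        for line in lines
        if any(k in line.lower() for k in lowered)
    ]


if __name__ == "__main__":
    path = cluster_log_path()
    if path is None:
        print("CLUSTERLOG is not set on this machine.")
    else:
        # errors="replace" keeps the scan going if the file has odd bytes.
        with open(path, "r", errors="replace") as handle:
            for hit in suspicious_lines(handle):
                print(hit)
```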

One common problem I see occurs when an administrator removes the cluster account from the login list on the SQL Server. If you remove the login for the account that starts the cluster service, the cluster can no longer connect to the SQL Server to perform its IsAlive checks. Because MSCS can't connect, it concludes that the SQL Server service has failed. The only way around this quandary is to start SQL Server outside Cluster Administrator and add the login back to the SQL Server.
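The reason a missing login looks like a failed service can be shown with a toy model. The Python sketch below is not how MSCS is implemented; the exception classes and connect functions are hypothetical stand-ins. It only illustrates that a probe that judges health by whether it can connect cannot tell a rejected login from a stopped service.

```python
from typing import Callable


class LoginDenied(Exception):
    """Hypothetical error: the server is running but rejects the login."""


class ServiceDown(Exception):
    """Hypothetical error: nothing is accepting connections at all."""


def is_alive(connect: Callable[[], None]) -> bool:
    """Toy health probe: report healthy only if a connection can be opened."""
    try:
        connect()
    except (LoginDenied, ServiceDown):
        # Either way the probe cannot connect, so both cases look identical.
        return False
    return True


def healthy() -> None:
    """Stand-in for a connection attempt that succeeds."""


def login_removed() -> None:
    """Stand-in for a running server whose cluster login was removed."""
    raise LoginDenied("login failed for the cluster account")


def service_stopped() -> None:
    """Stand-in for a server that is not running."""
    raise ServiceDown("no response from the server")


if __name__ == "__main__":
    for attempt in (healthy, login_removed, service_stopped):
        print(attempt.__name__, is_alive(attempt))
```

In this model the last two cases return the same result, which is why the fix has to happen outside the cluster's control: the probe gives the cluster no way to learn that SQL Server is actually running.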

Rebuilding the Master Database

A common question I see on newsgroups is how to rebuild the master database in a cluster. To do so, have the SQL Server CD or shared installation files handy and follow these steps:

  1. Go to the node that currently owns SQL Server.

  2. Stop the SQL Server resource.

  3. If you're using the SQL Server CD, copy all the install files from the CD to the local hard drive and remove the read-only attribute for the files in Windows.

  4. Execute rebuildm.exe and point it to the installation files on the hard drive.

  5. Choose the collation type for the rebuilt instance by clicking Windows Collation or SQL Collation.

  6. After the program completes, bring the SQL Server resources online to confirm that they start.

  7. Restore any user databases.

    Note 

    For more information on this topic, you can read Microsoft KB article Q298568.
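Step 3 above (copying the install files and removing the read-only attribute) is the part most easily scripted. The following Python sketch shows one way to do it with the standard library. The source and destination paths in the usage section are hypothetical, and the function name is mine rather than anything from the SQL Server tools.

```python
import os
import shutil
import stat


def copy_and_make_writable(source: str, destination: str) -> int:
    """Copy a directory tree, then clear the read-only flag on every copied file.

    Returns the number of files whose permissions were updated.
    The destination directory must not already exist.
    """
    # copytree preserves permission bits, so files copied from a CD
    # stay read-only until the flag is cleared below.
    shutil.copytree(source, destination)
    count = 0
    for root, _dirs, files in os.walk(destination):
        for name in files:
            path = os.path.join(root, name)
            mode = os.stat(path).st_mode
            # On Windows, setting S_IWRITE clears the read-only attribute.
            os.chmod(path, mode | stat.S_IWRITE)
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical paths: a CD drive and a local working folder.
    copied = copy_and_make_writable("D:\\", "C:\\sqlinstall")
    print(copied, "files copied and made writable")
```

You would then point rebuildm.exe at the local copy, as described in step 4.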




SQL Server 2000 for Experienced DBAs