7.1. SQL Server Enterprise Manager

To start SQL Enterprise Manager on the MOM database server, select Start → Programs → Microsoft SQL Server → Enterprise Manager. Surprisingly enough, the first time you launch this tool, it will not have a connection to a database server: you have to register a SQL Server in the console to manage it. To register a server:
Once a server has been successfully registered in the console, you can expand it to the folder level and select the Databases folder. In here you will see the OnePoint database as well as the master, msdb, model, and tempdb databases. These last four are the system databases and should not be tampered with. Figure 7-2 shows the taskpad view of the OnePoint database in SQL Enterprise Manager. To open the taskpad view, right-click the OnePoint database and select View → Taskpad.

Figure 7-2. Taskpad view of the OnePoint database

Many of the administrative tasks are accessible in the taskpad view. The only other folder that you really need to be concerned about is the Management folder, which contains the SQL jobs that are used to maintain OnePoint, the backup utility, and the database maintenance plans. These will be covered in the "Backing Up SQL Databases" and "Restoring SQL Databases" sections later in this chapter. In the taskpad view, you are immediately given valuable information regarding the size of the database, the space available, and the date and time of the last backups. In Figure 7-2, the database size is listed as 1,200 MB, which is a combination of the database file and the transaction logfile.

7.1.1. Basic SQL Database Concepts

The OnePoint database actually consists of two files that are located in the SQL install directory, C:\Program Files\Microsoft SQL Server\MSSQL\Data. The database file is EeaData.mdf and its transaction logfile is EeaLog.ldf. Both of these files should be the size specified at setup. When you start planning your backups, you must include the transaction logfile. To understand why, you need to understand a little about how transactional databases work. The OnePoint database file, EeaData.mdf, consists of about 375 tables. For the sake of this discussion, you can think of a table as a spreadsheet. A row in the table is one record and a column in the table is a value that is a member of all the records.
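The size and space-available figures shown in the taskpad can also be retrieved with a query. A minimal sketch using the standard sp_spaceused system procedure, run from SQL Query Analyzer against the MOM database server:

```sql
-- Switch context to the OnePoint database and report its total
-- size, unallocated space, and space used by data and indexes.
USE OnePoint
EXEC sp_spaceused
```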
The relational database engine (SQL Server) is constantly reading and writing and working with this data as processes that are outside of SQL interact with the database. If these processes were allowed to directly read, write, or delete records in the database and for some reason the connection was interrupted during the operation, the record that process was working with would become corrupted. This does not create a very stable database. So, instead of allowing direct access to the database tables, all interactions are first written to the transaction log. These transactions are then committed to the tables by the database engine itself. In this way, the transaction log proxies and buffers communications between the outside world and the database. The way SQL Server uses the data in the transaction logs provides an additional layer of protection and reliability. Basically, each transaction is made up of multiple steps, which all need to be completed successfully for the whole transaction to be considered successful. This is called an atomic transaction, which means that every step needed to complete a transaction is contained in the transaction itself. Here's an example that, while not database-related, is useful for understanding the concept of self-containment or atomicity. Say you want to move a file from one drive to another in Windows Explorer. First you select the file, then you select Cut, then you navigate to the destination directory and select Paste. The OS then performs several separate actions to complete the move. The first step for the OS is to check the destination directory to see if there is an existing file with the same name and return an error if the file already exists. Assuming the filename is not found, the OS reads the file and writes it to the destination directory (which is actually a copy action), and then deletes it from the source directory. If any one of these steps fails, the operation fails. 
To protect the original document, it is not deleted from the source directory unless all the preceding steps have completed successfully. So, if any of those steps fail, it is as if the entire process never occurred at all. SQL Server tracks the success or failure of the individual steps in committing a transaction to the database, and if any of them fail, the transaction as a whole fails. SQL Server will then undo each of the steps, thereby rolling back the database to its original state before the transaction started. This is how the database is kept in a non-corrupted, or consistent, state. What all of this means is that at any given time, the data that represents the OnePoint database is actually split between the database file and its transaction logfile. This means you have to back them both up. Because these are physically separate files, for purposes of performance and robustness it is wise to distribute them across separate physical drives, preferably drives that participate in one of the common fault-tolerant RAID array configurations.

7.1.2. System Databases

There are other databases created by the default installation of SQL Server, as shown in Figure 7-2. These are the four system databases. Of these four, the master and msdb databases and their transaction logs should be included in your backup plans. Briefly, here is what they do:
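In Transact SQL, the all-or-nothing behavior described above is expressed with an explicit transaction. The sketch below is generic and not part of the OnePoint schema; the table and column names are hypothetical, and error handling follows the SQL Server 2000 idiom of checking @@ERROR after each statement:

```sql
DECLARE @err int

BEGIN TRANSACTION
-- Step 1: write the record to the destination (the "copy").
INSERT INTO DestTable (DocName) VALUES ('budget.xls')
SET @err = @@ERROR

-- Step 2: remove the record from the source (the "delete"),
-- but only if step 1 succeeded.
IF @err = 0
BEGIN
    DELETE FROM SourceTable WHERE DocName = 'budget.xls'
    SET @err = @@ERROR
END

-- If any step failed, roll the whole transaction back; the
-- database returns to its state before the transaction began.
IF @err <> 0
    ROLLBACK TRANSACTION
ELSE
    COMMIT TRANSACTION
```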
The order in which the system and MOM 2005 databases need to be restored for a complete system restore is covered in the "Restoring SQL Databases" section later in this chapter.

7.1.3. OnePoint SQL Maintenance Jobs

When the OnePoint database is installed, nine MOM-specific SQL jobs are created by default. They are located in the Management folder → SQL Server Agent → Jobs container (Figure 7-3). If you look in this container on a MOM database server that also has MOM Reporting installed, you will find an additional grooming job for the data warehouse, bringing the total to 10. Of these nine default jobs, the MOMX Partitioning And Grooming, the OnePoint - Reindex, and the OnePoint - Update Database jobs are the most important to you. SQL jobs are basically wrappers around code that you want executed under certain conditions. Those conditions could be in response to a SQL alert (which is entirely different from a MOM alert and not relevant to this discussion), on a schedule, or when the SQL Server processor reaches a certain user-defined level of utilization. The OnePoint jobs are all scheduled, although they can also be run manually, much as a task in the Windows Task Scheduler can. A single job can consist of multiple steps (with each step consisting of a different set of instructions) that occur in a specific order. The execution of one step can be made conditional on the success (or failure, depending on the logic built into the job) of a prior step. At the completion of each step, the job reports the success or failure to a SQL operator. The OnePoint jobs are all configured to write an event to the Windows event log. As you can probably guess, these specific events are watched for by the MOM agent on the database server, and an alert is generated if a failure is reported.

Figure 7-3. SQL jobs for maintaining OnePoint

The steps in these jobs all call SQL stored procedures, which are located back under the OnePoint database object.
For example, the MOMX Partitioning And Grooming job consists of two steps: partitioning and grooming. Each of these steps calls a SQL stored procedure with a simple command like EXEC dbo.MOMXPartitioning. This Transact SQL statement is very straightforward and tells the SQL Server Agent to execute the MOMXPartitioning stored procedure. The stored procedure itself is a much longer set of instructions, also written in Transact SQL syntax.

7.1.3.1. OnePoint - Update Database

By default, the OnePoint - Update Database job runs every hour on the half hour, and it automatically resolves alerts that have aged beyond the time limit defined in the grooming global settings, which by default is four days. Once an alert's resolution state is set to resolved, it can be deleted from the OnePoint database by the MOMX Partitioning And Grooming job.

7.1.3.2. MOMX Partitioning And Grooming

This is the job that actually deletes data from the OnePoint database, the yin to the yang of the influx of event, alert, and performance dataflow. If the database grooming is configured correctly, there will be a harmonious balance and your database will achieve a Zen-like state of serenity. The job first partitions the data in the database by date, then calculates the cutoff time, and drops the data older than the cutoff point by partitions. This makes for very fast grooming because SQL does not need to search for individual records to delete; it just drops whole partitions. By default, this job runs at midnight daily. If it fails, an event ID 208 is written to the application event log and is picked up by MOM. If for some reason you find that your database has exceeded 60% utilized space and that the weekly reindex job is failing due to lack of free space, you can run this job manually by right-clicking it and selecting Start Job.
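Starting the job from the context menu simply asks the SQL Server Agent to run it; the same request can be made from SQL Query Analyzer with the standard sp_start_job procedure in msdb. A sketch:

```sql
-- Start the grooming job immediately, equivalent to
-- right-clicking the job and selecting Start Job.
EXEC msdb.dbo.sp_start_job
    @job_name = 'MOMX Partitioning And Grooming'
```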
In an emergency grooming situation, before you run this job you should reset the "Groom data older than the following number of days" value in the Administrator console from the default of four days to some lesser amount, depending on how much data you wish to delete and how much you wish to retain. Don't forget to commit the configuration change and then reset the value back to its pre-emergency state.

7.1.3.3. OnePoint - Reindex

To speed searching, SQL builds indexes for the tables in the OnePoint database. These indexes are rebuilt weekly by this job. This job requires 40% free space in the database, and if it fails there is no great harm done to the functionality of the OnePoint database in the short run. However, if it continually fails, database response time will increase. This job is scheduled for 3:00 a.m. on Sundays, so look for alerts in the Operator console referring to application event ID 208 that cite the failure of this job when you get into work on Monday mornings. This job refers to the database maintenance plan OnePointReindex. It is unlikely that you will have to touch or run the other jobs manually. Briefly, here is what they do:
7.1.4. Tuning OnePoint Database Grooming

Chapter 5 introduced the database grooming controls that are available in the global settings of the Administrator console. The default setting is to groom out data that is more than four days old. To figure out whether this setting is correct for your organization, you will need to perform a few tasks on a daily basis. The first thing is to define what "correct" is, and you should already have a good idea based on your database sizing calculations. In those calculations, you estimated the amount of data that would flow into the database on a daily basis and stated the amount of time you wanted to retain that data before deleting it. The database size is an absolute limit because the database is configured not to allow automatic growth, so monitoring the total database size will not tell you how much of the database is actually used. The value that you need to watch is the percentage used. If your grooming values are correctly set for your environment, the percentage used will remain fairly constant over time. Of course, there will be spikes, especially if you deploy a large number of new agents or deploy a new management pack. You may need to make adjustments to the database size for these deployments. The MOM 2005 management pack includes performance rules and event rules that monitor the database free space, the status of the database, and the success/failure status of the SQL maintenance jobs; see Table 7-1.
These monitoring rules are valuable to have, and you should never ignore an alert they generate, but rules can only respond to a condition that has already occurred. They do not give you the information needed to proactively tune your grooming settings; for that, you need the database free space measurement. Obviously, the quickest way to find your database free space is to look it up in the taskpad view of SQL Enterprise Manager, but that only gives you a single point-in-time view. Preferable is the preconfigured performance view (see Figure 7-4) in the Operator console, which is found in the Performance view group under Microsoft Operations Manager → Operations Manager 2005 → Database → Performance → Operational Database Free Space.

Figure 7-4. A view over the past two hours of the OnePoint database free space

This performance chart displays the last two hours of data, but you can display more by modifying the time filter (see Figure 7-5). You can go back as far as the data is retained in the OnePoint database. This control (point 1 in Figure 7-5) is available on the view toolbar. With this tool you can set the time range as the interval between two points, or simply as the past number of hours up to the limit of the retained data. The past 24 hours of data is viewed here (point 2 in Figure 7-5). You can copy these graphs to the clipboard (right-click the graph and select "copy to clipboard") if you want to keep a historical record of database free space to establish a baseline to measure against.

Figure 7-5. Adjust the time range of the data you want to see with this filter

If you find that over time the amount of free space is steadily shrinking and it is starting to approach the 40% free space threshold, this is an indication that more data is flowing in than is being groomed out on a nightly basis. Start by reducing the grooming setting by one day (to three) and carefully watch the results.
In addition to monitoring the status of the database, the success or failure of jobs, and the amount of free space in the database, you should know what types of data are waiting to be groomed out and how much data there is. With this knowledge, you can decide whether to adjust the auto-resolve values for alerts in the database grooming settings. You do this by interacting directly with the database via the SQL Query Analyzer. This tool allows the targeting of Transact SQL (TSQL) statements against specific databases or tables. You can access this tool either through the Start menu or through the wizards page of the taskpad view of the database in SQL Enterprise Manager. If you launch the SQL Query Analyzer from the OnePoint taskpad view (see Figure 7-6), the focus of the analyzer is automatically set to the OnePoint database. Using this method ensures that the analyzer is connected to the desired database on the desired server, which is an advantage in an environment with multiple SQL Servers, each with many databases.

Figure 7-6. Launching the SQL Query Analyzer from the SQL Enterprise Manager guarantees correct focus

The SQL Query Analyzer is a sophisticated tool with powerful features, but as a MOM administrator, there are five basic parts you need to be familiar with (see Figure 7-7):
In the MOM 2005 Operations Guide and in the MOM 2005 Resource Kit, Microsoft has included a few SQL queries that you can use to find out the number of alerts and events, as well as the individual alerts, that are waiting to be groomed. In Figure 7-7, the TSQL statement shows how many alerts are ready to be groomed out from the OnePoint database on homemomserver3.

Figure 7-7. The SQL Query Analyzer interface

This chapter does not teach you TSQL, but it is not too hard to decipher a simple query like the one in point 2:
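The exact Resource Kit text of the point 2 query is not reproduced here, but judging from the enumeration query shown later in this section, a count of alerts awaiting grooming can be written along these lines (a sketch; the column alias is illustrative):

```sql
-- Count the alerts that have been resolved (resolution state 255)
-- and are waiting for the nightly grooming job to delete them.
SELECT COUNT(*) AS AlertsAwaitingGrooming
FROM dbo.Alert
WHERE (ResolutionState = 255)
```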
This query happens to find that there are 132 alerts waiting to be groomed out (point 3 in Figure 7-7). You can run this type of query at any time, or you could even set up a SQL job to execute these on a regular basis. Collecting this type of information over time is useful because you can use it as a baseline for spotting trends in the type of data being groomed out. A sudden spike in the number of AlertsAwaitingGrooming can be a tip-off that one of two things has occurred: there was a spike in the number of alerts generated, or something has gone wrong in the grooming job and further investigation is required. All queries can be saved as text files with a .sql extension so you don't have to type the TSQL statements every time you want to run them. By default, they are saved to the My Documents folder of the machine you are running the SQL Query Analyzer on. The "Enumeration of all alerts awaiting grooming" query is another one that can give you further insight into the types and volume of data waiting to be groomed out. This query outputs the name, description, alert level, repeat count, who resolved the alert, and the resolution time for each individual alert that is waiting to be groomed:

SELECT Name, Description, AlertLevel, RepeatCount, ResolvedBy, TimeResolved
FROM dbo.Alert
WHERE (ResolutionState = 255)

Once you have your grooming settings tuned, you only need to keep tabs on these values regularly for trending purposes. You shouldn't adjust the grooming settings unless the data you are collecting indicates that you should. The OnePoint database and other critical information on your MOM database server should be backed up on a daily basis. The next section outlines what you need to back up and why, how to perform the backups, and how to perform a restore.

7.1.5. Backing Up SQL Databases

Everyone knows it is important to have a backup of your data/OS/system configuration in case of hardware failure, data corruption, etc.
The backup restores the system to its pre-failure state so that you can hopefully pick up exactly where you left off. So, consider this the standard "do your backups" admonishment. Telling an IT professional that he needs to do backups is like having to tell a builder that she needs to put a roof on a house. The curious thing about recovery with MOM is that it is more important to preserve the configuration of the management group and all of its components than it is to preserve the operational data. This is because all of the operational data was generated by the monitored environment, so if the OnePoint database is lost, all of the conditions that caused the alerts would still exist and the alerts would be generated again once MOM is running. On the other hand, the configuration data, such as the tuning of the management packs, and all of the historical data exist only in the OnePoint database. If you lose it and you have no backups saved anywhere else (such as in a synchronized preproduction environment), then you will have to start your MOM implementation from scratch. That being said, since the two types of data are in the same database, you get both for the cost of a single database backup. In addition to backing up the OnePoint database, you must also capture backups of your management packs and report definitions (if you have reporting installed), the master and msdb system databases, custom files like ManualMC.txt, and any file transfer server files. Critically important, but seldom remembered, is capturing the OS configuration of the management servers and database servers: specifically, the physical and logical drive configuration, the accounts and permissions with local rights, and the accounts used for the DAS account and the management server action account. You also need to track the permissions granted to SQL logins.
The next section covers the planning steps for a backup and how to actually perform one using a combination of SQL and system backup tools. The restore process is then discussed and the necessary steps outlined. Now, for the standard backup/restore disclaimer: you have to adapt the following information for your environment. Test your backup plan by taking your production backups and using them to re-create your management groups in an isolated environment. If you don't do this for practice, then the only time you will test these procedures is in production during an outage, when you are under quite a bit of stress.

7.1.5.1. Backup tools

To get a complete backup of all components that will be needed for recovery, you need to use both the SQL backup utility and an OS-level backup utility. For the purposes of this discussion, the utilities that are built into the products will be used. There are many third-party SQL/OS backup solutions from companies such as BMC, VERITAS (recently purchased by Symantec), CA, IBM (Tivoli), and so on. You should consider these tools when you need to back up databases in addition to the OnePoint and reporting databases.

7.1.5.2. What to back up

The first step in planning your MOM backups is to identify what you need to back up:
7.1.5.3. When not to back up

In general, you want to avoid scheduling your database backups during times when the database is especially busy, or when a process puts a lock on a database table, such as when the grooming SQL jobs run. The following are the jobs, and their schedules, that you want to avoid. These are the default times, so if any of them have changed in your environment, be sure to check the job schedules in SQL Enterprise Manager → Management → SQL Server Agent → Jobs.
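Rather than opening each job's properties in turn, you can list every SQL Server Agent job on the server from SQL Query Analyzer with the standard sp_help_job procedure and review the schedule-related columns in its output:

```sql
-- List all jobs defined on this SQL Server, including the
-- OnePoint maintenance jobs, so their schedules can be checked.
EXEC msdb.dbo.sp_help_job
```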
7.1.5.4. Creating a SQL backup job

Looking at the times listed in the previous section, it seems that the best time for the SQL database backups to start is around 8:00 p.m. For the sake of speed, the SQL backups will be configured to write to disk-based backup devices on the SQL Server itself. This means that there must be enough local disk space to accommodate them. It is a best practice to write the SQL backups to a different logical disk than the ones the database and transaction logs are on. After the backups complete successfully, create and schedule an OS-level backup that captures the SQL backups along with any other custom files that may exist on the SQL Server or management servers. This OS-level backup will write to a network share, thus copying the SQL backups off of the SQL Server itself. Before creating and scheduling the SQL backup jobs, some preparation work needs to be done. This involves defining the backup routine and creating SQL backup devices. In SQL, a backup device refers to a physical object like a tape drive or a file on the local hard drive. Regardless of whether you are backing up to disk (a file) or tape (an actual physical device), it is useful to create backup devices because they are recorded in the master database and are, therefore, persistent and reusable. You will be able to refer to them when you create the backup jobs rather than having to supply a fully qualified path and filename as the destination for each backup every time. Table 7-2 summarizes what will be backed up, the backup name, the frequency of the backup, and the backup device name. The plan is to take a full backup of the OnePoint database nightly and, on Sunday nights, to take a full backup of the OnePoint, master, and msdb databases.
Each nightly backup will have its own SQL job. This pattern will be repeated weekly, with each job overwriting the previous week's file. That means there should never be more than seven OnePoint backup files (plus the Sunday master and msdb files) in the destination directory.
To create a backup device, open SQL Enterprise Manager and navigate to the SQL Server → Management → Backup container. Open the context menu for the Backup container and select the option to create a new backup device. This brings up the backup device creation page (see Figure 7-8). Here, enter the name of the backup device as specified in Table 7-2. For this example, the Mon1PointFullBackupDevice is created. The default path for all backups, whether to a file or to a device on disk, is the SQL backup directory, C:\Program Files\Microsoft SQL Server\MSSQL\BACKUP\<file or device name>. For a device on disk, a file is created consisting of the device name with the .bak extension appended.

Figure 7-8. Creating a reusable SQL backup device on disk

Repeat this process for each of the backups defined in Table 7-2, resulting in nine separate backup devices. Now that the devices are defined, you can create the backup jobs. In SQL, the definition of what to back up, where to back it up to, and how often to perform the backup is all part of the same process. There are several different ways to create and schedule a backup. You can create the backup from the context menu of the database object, from the Database Taskpad View → Wizards → Backup a Database, or from Tools → Wizards → Management → Backup a Database. For the sake of this example, the backup is created by bringing up the context menu for the Backup container and selecting the Backup a Database option (see Figure 7-9). This brings up a two-tab page in which you define all of the backup parameters. On the General tab (see Figure 7-10), make selections at the following points:
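Behind the Enterprise Manager pages, a device-and-backup pair reduces to two Transact SQL statements. A sketch for the Monday OnePoint backup follows; the physical path here is hypothetical, so substitute the logical disk chosen for your backups:

```sql
-- Register a reusable disk backup device; the device name and
-- its physical file are recorded in the master database.
EXEC sp_addumpdevice 'disk', 'Mon1PointFullBackupDevice',
    'E:\SQLBackups\Mon1PointFullBackupDevice.bak'

-- Take a full backup of OnePoint to that device. INIT overwrites
-- last week's file, matching the weekly rotation described above.
BACKUP DATABASE OnePoint
    TO Mon1PointFullBackupDevice
    WITH INIT
```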
Once finished with the General tab, the values on the Options tab need to be edited (see Figure 7-14). To be sure that this backup can be read and written successfully, select the option to verify the backup. In this example, the additional amount of time required for this is not a concern because the file size will be consistent and this is all happening on a fast disk.

Figure 7-10. The General tab for creating a backup

Figure 7-11. Selecting the backup device for the Monday night OnePoint full backup

To help keep the OnePoint transaction log clean of transactions that have already been committed to the database, select the option to remove them from the transaction log.

Figure 7-12. Name the schedule on this page, select recurring, and click Next

Figure 7-13. Define the recurring schedule here

The other settings that were not selected are not that relevant to this backup scenario; they are more useful when backing up to a tape device. But briefly, here is what they do:
When the configuration process is completed, SQL will create a job and place it in the Jobs container along with all the other SQL jobs (see Figure 7-15). The next step is to create the same jobs for the rest of the week's OnePoint backups. To speed this process up, and to avoid manually re-creating these backup definitions should they be deleted or lost, SQL can generate a script that will re-create the same backup job when run from the SQL Query Analyzer against this server.

Figure 7-15. The completed Mon1PointFullBackup job

To do this, open the context menu for the Mon1PointFullBackup object in the Jobs container and select All Tasks → Generate SQL Script. This brings up the Generate SQL Script page (see Figure 7-16), where all you need to enter is a path and filename to save the script as.

Figure 7-16. Saving the backup script

Click the ellipsis and accept the default path. The file is named Mon1PointFullBackup and the .sql extension is automatically appended. Once the script is generated, edit it with Notepad (you can also use the SQL Query Analyzer tool if that is your preference). Perform a find on Mon1PointFullBackup and replace it with Tues1PointFullBackup, replace the Monday in the description with Tuesday, and so on for the rest of the days in the week, saving one copy per day. Open and run the Tues, Weds, Thurs, Fri, Sat, and Sun versions of the files in the SQL Query Analyzer and the other jobs are created automatically. The last step is to open each job object and set the scheduled day of the week to the desired setting. This feature is a real time saver. To round out the backup job creation, create the backups for the master and msdb databases using the process shown in Figures 7-10 through 7-14. These jobs will run on schedule and output the <jobname>.bak files on the local disk.
If you look at these files, you will see that they are significantly smaller than the total database size, which is a good thing because the next step is to create an OS-level backup job that grabs these files and backs them up across the network to a share.

7.1.5.5. Creating OS-level backups

This portion of the backup scheme uses the Windows Backup utility to back up the .bak files (the SQL backup devices) from the SQL Server to a network share. The use of this interface is already well documented, so this example will just go through the creation of one of these tasks. To start, on the MOM database server, open the Windows Backup utility in Backup wizard mode.
Completing this process will create a job in the scheduled tasks. When this runs (and you can kick it off manually by right-clicking the task and selecting Run), the backup cycle for the databases (and any other files that have been included) will be complete.

7.1.6. Restoring SQL Databases

Restoring a database in SQL, when SQL is healthy, is about as painless a process as you can imagine. By using the method outlined in the last section, SQL has recorded every backup action taken. This includes the name and location of the backup file/device, the type of backup, when it was done, and under what credentials it executed. SQL tracks this information for each database in the msdb system database. Thus, when you start the restore process for any given database, SQL will tell you which backup set or sets to use, where those backups are, and in what order to apply them. Being able to restore the OnePoint database this easily is only possible if both the msdb and master system databases are up and healthy. As a MOM administrator, you must be prepared to recover from both a OnePoint-only failure and a failure that impacts the SQL system databases.

7.1.6.1. Restoring OnePoint

The first step in restoring the OnePoint database and its transaction log is to stop all communications with the OnePoint database while the restore is in progress. You can do this by stopping the MOM service on all management servers and on the database server. You should also make sure that the reporting DTS scheduled task is disabled if the restore is performed around the time that the operations-to-warehouse data copy is supposed to take place. Keep in mind that none of the agents on the managed computers are stopped. They will continue to process rules and responses, as well as cache operations data locally up to a configurable limit (3 MB by default), until they are able to communicate with their management servers again.
Another way to isolate the OnePoint database from the rest of the management group is to take it offline. You do this by bringing up the context menu for the OnePoint database in SQL Enterprise Manager and selecting All Tasks → Take Offline. The database icon will then be grayed out and marked as offline (see Figure 7-17).

Figure 7-17. The OnePoint database taken offline

This method of database isolation allows the management servers to continue receiving operational data from their agents. In addition, if there are multiple management servers in the management group, agent failover is configured automatically. So, if both management servers become unavailable at the same time (as happens when you stop the MOM service on each of them), the agents will continually try to fail over to the alternate management server. This generates a great deal of event log data on the agent-managed machines as the agents ping-pong between the management servers trying to find one that is available. By keeping both of them up, communications between the agents and management servers are never interrupted. Once the OnePoint database is offline, bring up its context menu and select All Tasks → Restore Database. SQL will then retrieve the backup information from the msdb database, and the list of backups that represents the quickest path to a fully restored and functioning database will automatically be listed and selected for the restore operation (see Figure 7-18). There are no options to be configured on the General tab, but on the Options tab, select the "Force restore over existing database" option to ensure that the old, damaged database is overwritten (see Figure 7-19).

Figure 7-18. Restoring the OnePoint database

Figure 7-19. Options for a restore job

By selecting the "Leave database operational" option in the "Recovery completion state" box, the OnePoint database will automatically be brought back online when the restore completes successfully. To start the restore, click OK.
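The same restore can be issued from SQL Query Analyzer. A sketch, assuming the full backup sits on the Monday device defined earlier in this chapter; REPLACE corresponds to the "Force restore over existing database" option and RECOVERY to "Leave database operational":

```sql
-- Restore OnePoint over the damaged copy and bring it back
-- online as soon as the restore completes.
RESTORE DATABASE OnePoint
    FROM Mon1PointFullBackupDevice
    WITH REPLACE, RECOVERY
```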
SQL Enterprise Manager will display a progress box for the restore process and notify you when it has completed successfully.

7.1.6.2. Restoring system databases

Restoring the system databases is only slightly more complex than restoring user databases like OnePoint. This type of restore is required if the msdb and/or master databases become corrupted or if you are recovering from the loss of a whole server. Detailed discussion of the steps for whole-server recovery is beyond the scope of this book, but basically the steps include:
You can restore the system databases once you have reached the point in the rebuild process at which you can start SQL, or if the master and msdb are corrupted but SQL never went down. When you install SQL on the recovery server, new copies of the master and msdb databases are created in their default configurations. The master will therefore be lacking any SQL logins that had been defined and will have no record of the other user databases, and the msdb will only contain records of the default jobs. So, even though they are there, they are not really in a usable state. The restore process is as follows:
Repeat this process for the other system databases if required, and then perform the restore of the OnePoint database following the previously outlined procedures.
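For the master database itself, SQL Server 2000 must be started in single-user mode (sqlservr.exe -c -m from the instance's Binn directory) before a restore of master will be accepted, and the server stops itself when that restore finishes. A sketch, with a hypothetical device name since the system database device names from Table 7-2 are not reproduced here:

```sql
-- Run from SQL Query Analyzer while SQL Server is in
-- single-user mode; REPLACE overwrites the freshly
-- installed default copy of master.
RESTORE DATABASE master
    FROM MasterFullBackupDevice
    WITH REPLACE
```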