Chapters 9 and 10, Database Environment Basics for Recovery and Implementing Backup and Restore, covered backup and restore from a high availability standpoint. One topic not covered, however, was the restore of the system databases, which are critical for the successful recovery of your user databases in a disaster recovery scenario.
|More Info|| |
For specific disaster recovery scenarios with clusters, see Chapter 5, Designing Highly Available Microsoft Windows Servers, Chapter 6, Microsoft SQL Server 2000 Failover Clustering, and Chapter 8, Replication. Use that information in conjunction with the information presented here to help recover your systems completely. This section deals mainly with the general SQL Server failure condition.
Again, it cannot be stressed enough that unless yours is a very experienced shop and you have seen this type of failure before and know with certainty how to handle it, it is crucial to know what procedures to follow. Ultimately, the process is best led by a support professional or senior staff member who knows what he or she is doing. Performing any of the work in the following list improperly could worsen your situation.
You should be aware of some important characteristics of system databases:
The default recovery model on system databases is Simple. Although this can be changed, you should leave the default unless you have a good reason to change it. Because the recovery model is Simple, you cannot do point-in-time recovery. This is one of the main reasons that anytime you make a system change (for example, adding new logins or adding or growing databases), you should back them up.
Tempdb is rebuilt every time SQL Server is started, so not only can it not be backed up, but it is not necessary. It will be regenerated to the size specified on configuration.
It is possible to rebuild master, but not any other database. When you are rebuilding the master database, users cannot be on the system. This is not the case when user databases are being restored.
There are further distinctions, but this list is a brief summary of the basic factors that make a rebuild different from other restores .
As important as your user databases are, the system databases are just as ”if not more ”important to protect. If you lose your master database, SQL Server ceases to operate correctly. This is one of the most basic premises of proper design in architecting your high availability solution. Just as you should separate your data volumes from your transaction log, you should also separate your system databases (master, msdb) onto a more secure area of your system protected by RAID. Prevention is much easier than a rebuild from the ground up.
System databases are small in comparison to user databases. They range in size depending on system usage. Due to their smaller size, but their importance to SQL Server availability, you should back them up every night along with your user databases. Additionally, scheduling a DBCC CHECKDB on them periodically helps confirm the overall health of SQL Server, too. The work involved in doing this test is a very small price to pay for the long- term benefits.
Many times when a system crashes, everything is not completely lost. There are reasonable starting places other than a total flatten and rebuild. It is in your best interests to survey the damage first and then put together a plan of attack.
The five categories that most issues fall into are as follows :
Hardware failures This includes CPU, memory, and a media failure (that is, a disk drive failing, even if you have RAID).
Logical failures This is the human error in which bad data would be inserted or a DBA does something wrong, like accidentally dropping a critical table.
When making large or critical system changes, make sure you not only make a backup prior to starting, but when executing the Transact -SQL statement, begin the SQL command with a BEGIN TRAN. If all is successful, then you can commit the transaction; if something incorrect happened (for example, the DBA forgot to put a WHERE clause on a DELETE statement), you can always rollback the transaction without damage.
Operating system failures This is a problem that you cannot do much about, because it is external to SQL Server. The kernel fails to complete a function. You can only keep up with service packs and appropriate hotfixes for your environment.
Device driver failures A driver for a particular piece of hardware malfunctions. Like an operating system failure, this is external to SQL Server, but affects it. A good example is a bad SAN driver on a cluster.
Low-level failures A low-level failure occurs when something abnormal happens to a SQL Server database. It can be situational and is usually rooted in hardware failure. For example, a power outage could cause a torn page or corruption. You would then need to perform the appropriate action, such as issuing a DBCC CHECKDB or restoring a full, good backup.
Use battery- backed-up controllers to help protect your data from power outages. Many controllers have extensive amounts of cache on them to increase performance. If you lose power, you have lost the data if there is no battery backup. Additionally, an interrupted write due to power loss can cause a torn page.
As part of any good game plan, you should make a backup of the server if possible. This could be a flat file operating system backup and a SQL Server backup. If PSS needs to be engaged, a before image of a corrupt database can offer valuable clues for rescuing data. If you destroy the original state of the data by restoring over it before backing it up, you have lost a good starting point for PSS.
In the same vein, people often find out the hard way that the backup they thought was good really was not, so they were in worse shape after the restore than before. If you made a backup at the operating system level with automatic system reconfiguration (ASR) before starting the restore process, you can always restart. Remember, the last good backup you made was the last one that was successfully restored.
Restore your backups often to help ensure their integrity and make yourself familiar with the restore process. It also helps to know how long a database restore takes. Users often want to know, Is it finished yet?
If you have any doubts about the process or do not feel comfortable with recovering your system, stop and engage PSS. Not performing any work can actually be a safer alternative than haphazardly beginning an unsuccessful rebuild.
Any ground up recovery starts with a good backup of the master and msdb databases. Make sure you always have current, known, good backups of your system databases and run health checks against them.
Before starting any reconstruction of a server, whether minor or full-blown, there are certain steps that need to be taken and items that need to be in place. First and foremost, ensure that you have the right person in place to execute any actions. Do not leave an important task to a rookie system engineer or DBA who does not know your systems well enough. The following steps offer fallback positions if you need to start over or engage other support personnel. They also help if you need to reconstruct at an off-site recovery location.
Refer to the documentation in your run book for your hardware configuration, which should include model numbers , the physical layout (including which card goes in what slot), configuration settings for each component, firmware revisions, driver versions, block size of the file system, and so on.
If your server is up, make an operating system-level backup of the server with SQL Server down, including ASR. This allows you to go back to a known state before attempting any rebuilds and gives you the ability to send the damaged data to PSS if needed.
After making the operating system backup, back up what you can using the standard SQL Server backup processes. Try and run a full backup, and also grab the tail of the transaction log. These allow you to recover to the point of failure.
If the database is accessible, run DBCC CHECKDB to assess the damage.
Running DBCC CHECKDB can take a long time depending on which options you select and the size of your database. This might not be appropriate for all environments and should be done at the recommendation of a support professional.
When you want to rebuild SQL Server, as with the server itself, a build sheet should be accessible in your run book. The information that should be included is listed in the section Run Book earlier in this chapter.
If you are restoring at a remote disaster recovery site, ensure you have compatible tape media hardware and the proper version of the restore software at that location, too. For example, you do not want to find out that your site uses Linear Tape- Open (LTO) technology and the disaster site only has Super DLT technology.
Your run book should also include software and license keys for all components (Windows, SQL Server, third-party applications, and so on) that will be rebuilt.
A connection to the Internet will help you gather service packs and hotfixes, as well as information such as the Knowledge Base articles on http://support.microsoft.com . You should probably pull all relevant service packs and hotfixes and burn them to a CD or DVD that can be easily accessed in a disaster recovery situation.
Reference your vendor call list and contract numbers in the run book. If your support contracts have expired , you might need a credit card handy in case you need to call in other vendors using phone support.
The key to success is having all this in place prior to a disaster.
The following are the basic steps that should be followed if you need to do a full recovery, or flatten the system, when all components need to be rebuilt:
Rebuild the system, starting with the operating system and any service packs, hotfixes, and so on. This should be done with the proper tools (such as NTBACKUP or a third-party utility, such as Veritas Backup Exec or any number of enterprise backup tools). Depending on the scope of damage, you might even need to low-level format your disks.
Test the operating system to ensure that it is configured properly.
Follow the steps detailed later to restore or recover system and user databases.
Restore full-text indexes (if necessary or possible).
Synchronize logins, and perform any other tasks (such as restoring custom objects with scripts) to bring your SQL Server instance to a usable point.
Test everything extensively.
|On the CD|| |
You can find a flowchart that illustrates the disaster recovery process in the file Disaster_Recovery_Process.pdf.
You do not need to do anything if tempdb is damaged because tempdb is regenerated each time SQL Server restarts. What you might want to do, for example, if your disk is damaged, is use the ALTER DATABASE statement to move the location of tempdb.
To put it bluntly, without master, there is no SQL Server, so ensuring that master is up and functioning is paramount to any recovery operation. This is why you must go to great lengths during the planning phases of the system to use techniques like RAID and put the proper backup scheme in place. However, if you lose master, it does not necessarily mean that your user databases are damaged. You can rebuild or reinitialize the server, including the master database only, and then restore the system database backups from your last verified backup files. However, there are some very important requirements to make this work successfully. If planned for properly, this is easy!
Remember to back up your master database any time you make a change to it.
You cannot restore a backup of a system database, especially master, from a different build onto another SQL Server with a different build version because there are schema or metadata changes. The following is an example of an error message:
Server: Msg 3168, Level 16, State 1, Line 1
The backup of the system database on device d:\temp\master.bak cannot be restored because it was created by a different version of the server (134217904) than this server (134217920).
Server: Msg 3013, Level 16, State 1, Line 1 RESTORE DATABASE is terminating abnormally. http://support.microsoft.com
You need to reinstall and apply all service packs, hotfixes, and so on, to be able to use the backup. Go to and consult Knowledge Base article 264474 for information and workarounds. Remember to back up system databases both before and after an application of a SQL Server 2000 hotfix or service pack so you can restore to the proper point in time.
The tool used for rebuilding the master database is called Rebuildm.exe, a binary executable located in the SQL Server installation s Tools\Binn directory on your hard disk.
Do not rebuild your master database without knowing the consequences and making absolutely sure you need to do this. Follow the text in this entire section relating to master for more information.
To run REBUILDM, you also need to access the SQL Server installation CD-ROM or installation point because it includes various data files needed for execution of the tool. Because the CD-ROM is read-only, you must copy the CDROM:\x86\Data directory on the CD locally to the server before starting the REBUILDM utility, and change the attribute of all files to be non-read-only. This can be done with the command line attrib -r *.* in the directory, or by right-clicking the files, selecting Properties, and clearing the Read-Only check box.
If you have access to a network share installation of SQL Server, that would also be appropriate. However, ensure that the files do not have a read-only attribute (as they would if it was just a copy of the CD-ROM).
Use the following steps to run REBUILDM.
If you are running REBUILDM on a clustered instance of SQL Server 2000, make sure you are on the node that currently owns the SQL Server resources. You can verify this in Cluster Administrator.
Ensure all files are not read-only.
Take SQL Server offline using SQL Server Service Manager.
From a command line, run Rebuildm.exe. When the main dialog box opens, as shown in Figure 12-1, select or enter the name of the SQL Server in the Server drop-down list box, select the directory of the files that you copied from the CD-ROM, and select the proper collation. Click Rebuild to begin the rebuilding process.
Make sure you select the right collation for your SQL Server. It should match the previous setting for this particular instance. If you do not select the right one, you might have some incompatibilities with your user databases.
The process then copies the data files for master to the right location and configures the server. A successful completion results in the dialog box shown in Figure 12-2.
Figure 12-2: A successful Rebuild Master completion.
Under the 64-bit edition of SQL Server 2000, you cannot use REBUILDM to rebuild your master database. For a stand-alone instance, follow the instructions outlined in the SQL Server 64-bit Books Online topic Repairing a SQL Server Installation (64-bit). On a SQL Server 2000 64-bit virtual server, you must follow the steps outlined here, as the standard process does not work.
To rebuild the master database, you need the product Product ID (PID) used to install SQL Server 2000 (64-Bit). The PID is a globally unique identifier (GUID) that is stored in the registry. For a default instance of SQL Server 2000 (64-Bit), the PID is in the registry under HKEY_LOCAL_MACHINE\Software\Microsoft\MSSQLServer\Setup\productID. For a named instance of SQL Server 2000 (64-Bit), the PID is in the registry under HKEY_LOCAL_MACHINE\Software\Microsoft\ Microsoft SQL Server\< instance name >\Setup\productID.
When you rebuild the master database, you can configure the sa password, collation, cross-database ownership chaining, and error reporting. If you make changes to the authentication mode, they are not reflected after the server is reinstalled. You can change the authentication mode manually after the reinstallation is complete.
You must complete all steps from the cluster node that owns the SQL Server resource.
To rebuild the master database of a default instance of Microsoft SQL Server 2000 (64-Bit), follow these steps:
Take the SQL Server virtual server offline.
In Registry Editor, under HKEY_LOCAL_MACHINE\Software\MicrosoftWindows\CurrentVersion\Uninstall\< PID >, where < PID > is the PID of the Windows Installer package used to install SQL Server, change the SqlCluster value to 0 and note the value of SqlClusterSec. Then change the value of SqlClusterSec to 0. You will need to restore the value of SqlClusterSec after you rebuild the master database.
Run Cliconfg.exe to launch the SQL Server Client Network Utility.
In the SQL Server Client Network Utility, on the Alias tab, click Add. The Add Network Library Configuration window appears.
Under Network Libraries, click Other.
In the Server Alias text box, type < computer name >. For a named instance, type < computer name > \ < instance name >, where < computer name > is the name of the local computer and < instance name > is the name of the SQL Server instance. The local computer must be the cluster node that owns the SQL Server resource.
In the File Name text box, type DBNETLIB .
In the Parameters text box, type LPC: < virtual server name >. For a named instance, type LPC: < virtual server name > \ < virtual server instance name >, where < virtual server name > is the name of the SQL Server virtual server, and < virtual server instance name > is the name of the named instance of the SQL Server 2000 (64-bit) virtual server.
Run Msiexec .exe to rebuild the master database. Type
msiexec.exe /i < PID > REINSTALL=ALL REINSTALLMODE=amus SQLAUTOSTART=0 /L*v < path and filename.txt >
where < PID > is the PID of the Windows Installer package used to install SQL Server, and < path and filename.txt > is the location and file name for the verbose log file. For example, type d:\rebuild_master_verbose_log.txt .
In the Reinstallation dialog box that appears, make any necessary changes to the sa password, collation, cross-database ownership chaining, and error reporting. If you make changes to the authentication mode, they are not reflected after the server is reinstalled. You can change the authentication mode after the reinstallation is complete.
In some situations, the computer might display a message stating that file operations are pending and ask you if you want to reboot the computer. Click No.
In Registry Editor, under HKEY_LOCAL_MACHINE\Software\ Microsoft\Windows\CurrentVersion\Uninstall\< PID >, change the SqlCluster value to 1. Restore the value of SqlClusterSec to the value you noted in Step 2.
Run Cliconfg.exe to launch the SQL Server Client Network Utility. On the Alias tab, select the server alias that you created in Step 6, and click Remove.
If the computer requests a restart, restart the computer now.
Bring the virtual server online.
As noted earlier, once the master database has been rebuilt, it has no recognition of the user databases or user logins. To finish the recovery process, you need to restore the master database from the most recent available backup by following these steps:
Restoring master without a valid backup There is no way to restore master in this case other than by using REBUILDM and then going through the steps for the other databases.
Restoring master with a valid backup The way to restore master in this case is as follows:
Ensure that your SQL Server 2000 installation is at the same version as your backup, including hotfixes and service packs.
Optionally, create a logical backup device to reference the location where the backup of master is located, or you can just point the restore to the physical file. To create the device, use, for example:
sp_addumpdevice 'disk', 'master_backup_device', 'c:\backup\master.bak'
Take SQL Server offline using SQL Server Service Manager.
If you have changed the recovery model for msdb to Full, save the transaction log before attempting any rebuild process for master; this allows you to back up the tail of the log and have up-to-the-minute recovery. To do this, after SQL Server is offline, you have to go to the location of the transaction log for msdb and rename it in a command window or in Windows Explorer because executing REBUILDM overwrites it.
Rebuild the master database (see the previous section).
In a command window, restart SQL Server in single-user mode from the command line using Sqlservr .exe. This is located in the \80\Binn directory under your SQL Server 2000 installation. The following syntax is an example of a default instance and its resulting output:
D:\Program Files\Microsoft SQL Server\MSSQL\Binn> sqlservr.exe -c -m 2003-02-02 15:44:08.43 spid3 SQL Server started in single user mode. Updates allowed to system catalogs. 2003-02-02 15:44:08.46 spid3 Starting up database 'master'.
This is the syntax for a named instance:
D:\Program Files\Microsoft SQL Server\MSSQL$ instancename \Binn> sqlservr.exe -c -m -S instancename
If you do not bring up SQL Server in single-user mode, you will see the following error when you attempt to restore master:
Server: Msg 3108, Level 16, State 1, Line 1 RESTORE DATABASE must be used in single user mode when trying to restore the master database. Server: Msg 3013, Level 16, State 1, Line 1 RESTORE DATABASE is terminating abnormally.
Do not shut this command window until Step 8 is completed because you are now running SQL Server as a process in that command window.
You can now start Enterprise Manager or log into a query tool such as Query Analyzer or OSQL, and log into your instance as system administrator or as a trusted administrator to begin the restore process.
Restore the master database from a backup using either the backup device or with disk= syntax. For example:
restore database master from master_backup_device
When complete, you will see the following output:
The master database has been successfully restored. Shutting down SQL Server. SQL Server is terminating this process.
If you are running Enterprise Manager, you might have to manually terminate it with Task Manager.
Bring SQL Server online using SQL Server Service Manager.
Back up master.
If you have the ability to back up the tail of the log for msdb, follow these steps:
Create a database similar to msdb, with the same number of data and log files.
Take SQL Server offline.
Delete all of the data files from your newly created database so that it will fail recovery. Replace the log file with the log file you renamed from your original msdb and rename it appropriately.
Restart SQL Server.
Back up the tail of the transaction log of this database with Enterprise Manager or in a query tool with Transact-SQL. A sample syntax follows:
backup log db_name to disk = device_name WITH NO_TRUNCATE
Use sp_dbremove to remove the temporary database from your new SQL Server. Back up master.
You can now go about restoring other databases, such as msdb, model, and finally, user databases. Then you can worry about tasks such as synchronizing logins and ensuring all objects are there for your use.
As you go through the rest of your recovery process, make frequent backups of master and any other databases as they are restored or at key points of the recovery process.
Restoring msdb is the next step in rebuilding your SQL Server instance. If master is the brains and heart of SQL Server, msdb is the soul. It contains much of the status information and other information (such as replication information and definition of all SQL Server Agent jobs) used by SQL Server. It is doubtful that you do not use msdb at all. There are two scenarios for recovering msdb:
If you did not change the recovery model for msdb, you can only restore the last full backup that is good. This means that you might have lost quite a bit of information if it is a very old backup.
If you changed the recovery model of msdb to Full and you were able to back up the tail of the transaction log as described in the last section (as long as master was not damaged), you can take your point-in-time backup and restore all the transaction logs, meaning you could have an up-to-the-minute recovery.
It is strongly recommended that you change the recovery of msdb to Full. However, when you stop and restart SQL Server Agent, the recovery model is reset to Simple. You need to create a job that runs on each SQL Server Agent restart that executes two commands:
ALTER DATABASE msdb SET RECOVERY FULL
and BACKUP DATABASE msdb TO DISK = 'location 'You should back up msdb s transaction log on a frequent basis.
This is optional, but you might want to stop and restart SQL Server after restoring msdb.
The model database is used as the basis for all new databases that are created, so you do need to back it up and have a valid backup for a possible disaster recovery scenario. However, unless you make customizations to it that are specific to your environment, you should not need to back up the model database on a very frequent basis. There are two main scenarios for restoring model:
You have a valid backup of model from that server. If that is the case, you can stop and then restart SQL Server with trace flag 3608. This trace flag has only SQL Server recover master in the startup process. You can then restore model like any other database. Remove the trace flag from the startup parameters in Enterprise Manager, and stop and restart SQL Server.
You do not have a backup, but do have another SQL Server with the same code page/collation in your environment with a valid model database (modified or not modified).
If the latter situation is yours, perform the following steps to restore model:
Stop and start the instance you are recovering with trace flag 3608.
On the SQL Server instance that has the valid model database, execute the following syntax in a query window:
use master go sp_detach_db 'model' go
Copy the Model.mdf and Modellog.ldf files to the proper location for your damaged instance.
Open a query window that is connected to your damaged instance, and execute syntax similar to the following example to attach model:
use master go sp_attach_db 'model','E:\Sqldata\model.mdf', 'E:\Sqldata\modellog.ldf' go
Remove the trace flag from the startup parameters in Enterprise Manager, and stop and restart SQL Server.
When it is time to restore user databases, you have two main options: a traditional restore using backup and restore, or sp_attach_db . Restore, as you know, is a straightforward process that can take considerable time depending on the size of your database. However, you might want to consider using sp_attach_db if your server has crashed but your disk array is still intact. In this case, you might be able to recover your data in a few minutes instead of hours. This stored procedure allows you to attach database data and log files that exist at the operating system level, but are not yet part of the SQL Server. The corresponding stored procedure to detach a database from SQL Server is sp_detach_db . Because the time could be much shorter than doing a full restore, it might be worth trying to attach and then let the database attempt recovery. You should run a DBCC CHECKDB after the sp_attach_db to ensure that the database is in proper condition for usage by your users. Additionally, make sure the user logins are mapped properly.
The following is an example of the usage of sp_attach_db :
EXEC sp_attach_db @dbname = N'PRD', @filename1 = N'H:\mssql\data\PRD_data1.MDF', @filename2 = N'U:\mssql\log\PRD_log1.LDF', @filename3 = N'J:\mssql\data\PRD_data3.NDF', @filename4 = N'L:\mssql\data\PRD_data5.NDF', @filename5 = N'I:\mssql\data\PRD_data2.NDF', @filename6 = N'K:\mssql\data\PRD_data4.NDF', @filename7 = N'M:\mssql\data\PRD_data6.NDF', @filename8 = N'N:\mssql\data\PRD_data7.NDF', @filename9 = N'O:\mssql\data\PRD_data8.NDF', @filename10 = N'P:\mssql\data\PRD_data9.NDF', @filename11 = N'Q:\mssql\data\PRD_data10.NDF', @filename12 = N'R:\mssql\data\PRD_data11.NDF', @filename13 = N'S:\mssql\data\PRD_data12.NDF'
Restoring pubs and Northwind is exactly like restoring a user database.
|More Info|| |
For details on the differences between doing a database restore with RESTORE and using sp_attach_db , consult Chapter 13, Highly Available Microsoft SQL Server Upgrades.
If you are using full-text indexing with your user databases, you need to think about how you will restore your indexes if there is a catastrophic failure, as mentioned in earlier chapters such as Chapter 7, Log Shipping.
|More Info|| |
Knowledge Base article 240867, INF: How to Move, Copy, and Back Up Full-Text Catalog Folders and Files, is a good reference on backing up, copying, and moving full-text indexes.
If you have successfully restored your master database, you should not need to synchronize logins unless new ones were added after the date and time of the backup that was restored. If that is the case, consult the section Synchronizing Logins in Chapter 14, Administrative Tasks to Increase Availability.
After the full rebuilding process, back up all system databases because you know they are in a good state.