Chapter 13: Highly Available Upgrades | Microsoft SQL Server 2000 High Availability

Upgrades present a challenge that all organizations face at some point. Upgrades fall into different categories, such as upgrading versions of Microsoft SQL Server, server consolidation, applying a SQL Server or Microsoft Windows service pack or hotfix, or applying third-party patches for the software you run in your environment. The challenge with upgrades is not so much the upgrade process itself (that is fairly mechanical) as ensuring that you minimize downtime and the impact on users. If you have specific service level agreements (SLAs) in place, you understand the complexities of maintaining a highly available environment when you need to apply changes to your production environment. This chapter guides you through planning and executing highly available upgrades and the considerations you need to take into account.

Important

Regardless of any discussion that follows, please keep in mind that only qualified personnel should perform upgrades of any type. Leaving your upgrade to someone who is unfamiliar with your environment could prove costly.

General Upgrade, Consolidation, and Migration Tips

An upgrade or migration is all about having a solid plan, testing it, and developing contingency plans to deal with common problems or possibilities. From a high availability perspective, your upgrade process is likely to incur some downtime at some point. Remember that there are application-specific patches and upgrades as well as ones for operating systems, and that the availability of the operating system directly affects the availability of the applications running on the server. Whether you are upgrading for security reasons or to fix other functionality, take into account the following rules of thumb as you try to minimize the impact. They apply to all the concepts presented in this chapter:

Everything affects everything. Think about your body. When a doctor prescribes a new medicine for you to take, it is absorbed by your blood and might solve a problem, but it might also have some undesirable side effects that affect you daily. Or it might not have any negative interactions with your internal systems, and you will be just fine.

From a technology standpoint, if you apply a Windows service pack, assume it will affect SQL Server in some way. There can be no 100 percent guarantee that whatever you do is transparent, and it might even subtly change the behavior of your application or operating system. The more you have running on one server, the greater the chance of unexpected interactions. Even seemingly innocuous changes can alter your software's behavior or break it completely. Examples include upgrading the BIOS on your system or installing the latest driver or firmware for a specific piece of hardware (such as a SAN), even though these seem unrelated to your application software.

You should consult the hardware or software manufacturer for any and all available information on their update and what it might do to your systems. However, manufacturers cannot test all permutations of the effects of their patches (and probably do not have your application code to do so even if they wanted to), so it is your responsibility to ensure that upgrades behave correctly in your environment.
Try not to combine multiple updates, especially large ones. For example, do not upgrade to a higher-bandwidth network card and then install various driver updates to existing components in the same maintenance window. It is even more important not to perform multiple updates if they were not tested both alone and together. Troubleshooting after simultaneous upgrades is far more difficult than after just one. When several changes go in at once, it is hard to determine which of them is causing the problem.

Another example would be applying a Windows service pack and a SQL Server service pack at the same time. You probably save some downtime during the upgrades. However, if you experience some form of regression, you will more than lose that savings in the time it takes you to troubleshoot the problem.
Test any patches, driver updates, hotfixes, service packs, and so on prior to rolling them out. If availability is one of your company's concerns, there is no better time than the present to start testing. The best way to test a hotfix you want to apply is to install it on a dedicated testing or staging environment first. That shows you how it will affect your applications, the server itself, and so on. You do not want to cause a potentially larger outage by having to roll back your installation or reinstall from scratch should you encounter a worst-case scenario. You need additional hardware for these tests, but think about the business case: is it better to spend some money up front to ensure minimal downtime, or to pay an expensive price for being down?
Test the patches under your production load. Do not test on a server with no or minimal usage. Many companies are great about testing the functionality of what they are upgrading to but neglect to test it under stress. The testing environment must have some method of simulating load on the servers so that you will know there are no issues waiting for you around a performance bend in the upcoming road.

This is especially true with SQL Server because of its self-tuning nature. This internal tuning is incredibly complex, but it can also be somewhat fragile, because the optimizer picks a particular plan based on its statistical information and the underlying mathematics. You might be right on the borderline for a particular plan, and the smallest change could give you a completely different plan for that subset of queries. The performance difference is probably not noticeable unless two things are true: this is a plan you use very often, and it is also the biggest load point (bottleneck) in your application at the time. The solution might be as simple as a two-word hint for that query to slide things back to the original plan. Finding the problem during testing allows you to implement this hint during your upgrade. Otherwise, you discover the problem during peak load the next day and cannot fix it quickly enough to prevent a negative user experience.
How do you build your test platform? Your test platforms are what prove your deployment. If they are not built in the same manner as the production servers, even a small delta could skew the results.
Always read the documentation that comes with the upgrade or patch carefully. It contains information about the fixes and behaviors, as well as how to install the upgrade or patch. Too often support calls are generated because people skip this step and just click on Setup.exe or Install.bat (or whatever the installation mechanism is) without thinking. If you have any questions or concerns that are not addressed in the documentation, especially in relation to your configurations, call a support professional. This is your production environment, not some system that no one is using (if it is, you might want to make it your test server). Although some might see it as a wasted support call, it is better to ensure that the process will go smoothly and properly than to skip that call but spend hours on the phone later, after an unsuccessful installation. Also, keep an eye on newsgroups, magazine articles, and other resources, because your peers might point you to helpful information they have learned through their implementation experiences.
Notify anyone who will be affected by the upgrade and give them plenty of lead time. The point is to inconvenience others as little as possible. Notifying them five minutes beforehand or killing their session or connection without any warning will not create goodwill. Remember to notify them again once the system is available for use. Give them a solid point of contact, someone who knows what was changed, to notify if they do experience difficulties. You might have been up all night doing the upgrade. The next day, while you catch up on sleep, somebody else might be trying to fix a problem without knowing what you were working on or where you left off.
Do not allow user connections to the server while you are applying the upgrade or fix. The upgrade instructions should include any specific implications and details, but it is best to ensure that nothing and no one can interfere with the installation process. For example, a Microsoft SQL Server 2000 service pack installation puts the server into single-user mode, so only one connection is allowed. If a user or application takes that connection at the wrong moment, it can ruin the installation.
Prepare a detailed implementation plan and ensure your disaster recovery plans are up to date prior to the upgrade or update. You do not want surprises in the middle of your implementation. As noted earlier in this book, your plan and the way to back out of it will be crucial to your success. Unfortunately, many people still attempt to apply an upgrade or fix to their systems without thinking through the consequences. When availability is one of your main goals, going in without a workable plan is asking for trouble.
Consider the time frame within which you will execute your plan. Suppose your business does its monthly sales forecasting the last week of the month and users need access to the sales database at that time. In that case, do not perform the upgrade that week, and perhaps not the week before or after it either. In a four-week month, that leaves you a one-week window to execute the plan. That window has the least impact on the business and gives you the most time to recover should something catastrophic happen.
Make backups of all of your user and system databases prior to performing an update or upgrade. This is one of the most important things you should do. It ensures (assuming the backups are good) that in a worst-case scenario you can get back to the point in time prior to the bad update or upgrade. If the process goes well, remember to back up the databases afterward so you have a new baseline of good backups for use in a disaster recovery scenario. This post-upgrade set is especially important if the upgrade involved SQL Server in any way. It lets you restore to this point in time without having to return the server itself to its previous state separately from the database.

For SQL Server backups, consider your backup strategy carefully. You should also calculate how both your backup strategy and your current checkpoint settings affect the size of your SQL Server transaction log. If you will be backing up the database twice, it might be worth a little extra work to quiesce it, or at least partially quiesce it. Think of the savings in backup size, backup time, and disk or tape space. Then think also of the restore time you save if things do go wrong in spite of all your careful planning.
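You can turn the advice against combining updates into a quick review of a change calendar. The following Python sketch is an illustration only. The `Change` and `MaintenanceWindow` classes and the `review_window` function are hypothetical names, and the two boolean flags stand in for whatever test records you actually keep. The sketch flags any window that bundles several changes, and any change that was not tested on its own or together with the other changes in its window.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    name: str
    tested_alone: bool = False  # passed testing by itself on the staging servers


@dataclass
class MaintenanceWindow:
    label: str
    changes: list = field(default_factory=list)
    tested_together: bool = False  # the combination was tested as one unit


def review_window(window):
    """Return human-readable warnings for one planned maintenance window."""
    warnings = []
    if len(window.changes) > 1:
        warnings.append(f"{window.label}: {len(window.changes)} changes in one window")
        if not window.tested_together:
            warnings.append(f"{window.label}: changes were never tested together")
    for change in window.changes:
        if not change.tested_alone:
            warnings.append(f"{window.label}: '{change.name}' was never tested alone")
    return warnings


# Example: a Windows and a SQL Server service pack planned for the same night.
saturday = MaintenanceWindow(
    "Saturday",
    [Change("Windows service pack", tested_alone=True),
     Change("SQL Server service pack", tested_alone=True)],
)
for warning in review_window(saturday):
    print(warning)
```

An empty warning list does not mean a window is safe. It only means the plan does not break these two rules of thumb.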
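To act on the load-testing advice, and on the plan-regression risk described earlier in this list, compare timings from the same workload replayed under load against the test server before and after the patch. The Python sketch below assumes you have already captured per-query durations in milliseconds by some method of your choice. The query names, sample numbers, and 25 percent tolerance are made up for illustration. The sketch compares nearest-rank 95th-percentile durations per query and reports the queries that slowed down beyond the tolerance.

```python
import math


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of durations."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def find_regressions(baseline, upgraded, tolerance=1.25):
    """Map query -> (baseline p95, upgraded p95) for queries that slowed down
    by more than `tolerance` (1.25 means more than 25 percent slower)."""
    regressions = {}
    for query, old_samples in baseline.items():
        new_samples = upgraded.get(query)
        if not new_samples:
            continue  # query was not exercised in the post-patch run
        old, new = p95(old_samples), p95(new_samples)
        if new > old * tolerance:
            regressions[query] = (old, new)
    return regressions


# Hypothetical timings (ms) from replaying the same workload under load.
baseline = {"GetOrders": [10, 12, 11, 13, 12], "MonthlyForecast": [200, 210, 205, 220, 215]}
upgraded = {"GetOrders": [11, 12, 12, 13, 12], "MonthlyForecast": [600, 620, 610, 640, 630]}
print(find_regressions(baseline, upgraded))  # {'MonthlyForecast': (220, 640)}
```

A flagged query is the first place to look for a changed execution plan.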
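To find small deltas between test and production servers, record each server's relevant settings (service pack levels, driver and firmware versions, memory, and so on) and compare them. This chapter does not prescribe how to collect those settings. In the Python sketch below, the `config_delta` function and all setting names and values are hypothetical placeholders.

```python
def config_delta(production, test):
    """Return sorted (setting, production value, test value) tuples for every
    setting that is missing from one server or differs between them."""
    settings = set(production) | set(test)
    return sorted(
        (name, production.get(name), test.get(name))
        for name in settings
        if production.get(name) != test.get(name)
    )


# Hypothetical inventories; None in the output means a setting was not recorded.
production = {"os_service_pack": "SP4", "hba_driver": "9.1.2", "memory_gb": 8, "raid_level": 10}
test = {"os_service_pack": "SP4", "hba_driver": "9.0.7", "memory_gb": 4}

for name, prod_value, test_value in config_delta(production, test):
    print(f"{name}: production={prod_value} test={test_value}")
```

Every line this prints is either a difference to eliminate before testing or a caveat to attach to the test results.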
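Closing the server to users usually combines stopping application services with restricting the databases themselves. One database-level option is `ALTER DATABASE ... SET RESTRICTED_USER`. In SQL Server 2000, this admits only members of db_owner, dbcreator, and sysadmin, and it can roll back other sessions' open transactions after a grace period. The Python sketch below only builds the T-SQL text, and its function names are hypothetical. It does not connect to a server, and it does not replace advance notice to users. The service pack setup still manages the server's own single-user mode.

```python
def quote_name(name):
    """Bracket-quote a database name for T-SQL, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def restrict_access_sql(databases, grace_seconds=60):
    """T-SQL that admits only privileged users, rolling back other sessions'
    open transactions after `grace_seconds`."""
    return "\n".join(
        f"ALTER DATABASE {quote_name(db)} SET RESTRICTED_USER "
        f"WITH ROLLBACK AFTER {grace_seconds} SECONDS;"
        for db in databases
    )


def reopen_access_sql(databases):
    """T-SQL that returns the databases to normal multi-user access."""
    return "\n".join(f"ALTER DATABASE {quote_name(db)} SET MULTI_USER;" for db in databases)


# Hypothetical database names.
print(restrict_access_sql(["Sales", "Inventory"]))
print(reopen_access_sql(["Sales", "Inventory"]))
```

Choose a grace period that matches what you told users in advance, so no session is cut off earlier than announced.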
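You can keep the implementation plan and its backout path together by writing each step next to the action that undoes it. This Python sketch models that idea with hypothetical names. Each step is a (name, do, undo) triple of callables that stand in for the real manual or scripted actions. Steps run in order. If one fails, the steps that already completed are undone in reverse order. The failed step itself is not undone automatically, because it may have been only partly applied and usually needs a person to inspect it.

```python
def run_plan(steps, log=print):
    """Run (name, do, undo) steps in order. On failure, undo completed steps
    in reverse order and return False; return True if every step succeeds."""
    completed = []
    for name, do, undo in steps:
        try:
            do()
        except Exception as exc:
            log(f"'{name}' failed ({exc}); backing out")
            for done_name, done_undo in reversed(completed):
                log(f"undoing '{done_name}'")
                done_undo()
            return False
        completed.append((name, undo))
        log(f"'{name}' completed")
    return True


actions = []


def apply_service_pack():
    raise RuntimeError("setup reported an error")  # simulated failure


plan = [
    ("stop application services",
     lambda: actions.append("services stopped"),
     lambda: actions.append("services started")),
    ("restrict database access",
     lambda: actions.append("access restricted"),
     lambda: actions.append("access reopened")),
    ("apply service pack", apply_service_pack,
     lambda: actions.append("service pack removed")),
]

succeeded = run_plan(plan)
print(succeeded, actions)
```

Writing the undo action while you write each step forces you to decide, before the maintenance window, how every step can be reversed.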
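The month-end scheduling example is simple date arithmetic. The Python sketch below makes a hypothetical assumption: forecasting occupies the last seven days of every month, and seven days on either side of it are also off limits. The week after the previous month's forecast therefore blocks days 1 through 7. In a 28-day month, this leaves exactly a one-week window. Longer months leave a few extra days. The function name and parameters are illustrative.

```python
import calendar
from datetime import date, timedelta


def upgrade_window(year, month, forecast_days=7, buffer_days=7):
    """Return (first_day, last_day) of the safe window in a month, or None.

    Assumes forecasting runs during the last `forecast_days` days of every
    month and that `buffer_days` on either side of it are also off limits.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    # The buffer after the previous month's forecast falls at the start of this month.
    first_safe = date(year, month, 1) + timedelta(days=buffer_days)
    # This month's forecast period and the buffer before it are off limits.
    last_safe = date(year, month, days_in_month) - timedelta(days=forecast_days + buffer_days)
    if first_safe > last_safe:
        return None
    return first_safe, last_safe


for year, month in [(2021, 2), (2021, 1)]:
    start, end = upgrade_window(year, month)
    print(f"{year}-{month:02d}: {start} to {end}")
```

If your business has other busy periods, such as quarter-end closes or seasonal peaks, treat them as additional blackout ranges in the same way.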
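For the full set of backups taken before the upgrade (and again afterward), a script that covers every system and user database reduces the chance of forgetting one. This Python sketch only generates T-SQL `BACKUP DATABASE` statements. The folder, file-naming scheme, function name, and user database names are hypothetical. The sketch includes master, model, and msdb (tempdb cannot be backed up). It uses `WITH INIT`, so each run overwrites any existing file of the same name. To produce the new baseline, run the same function with a different label after the upgrade. Checking each file with `RESTORE VERIFYONLY` supports the "assuming the backups are good" caveat, although only a test restore fully proves a backup.

```python
# tempdb is rebuilt at startup and cannot be backed up; add "distribution"
# if this server is a replication distributor.
SYSTEM_DATABASES = ["master", "model", "msdb"]


def backup_script(user_databases, folder, label):
    """Return one BACKUP DATABASE statement per system and user database."""
    statements = []
    for name in SYSTEM_DATABASES + list(user_databases):
        path = f"{folder}\\{name}_{label}.bak"
        quoted_name = "[" + name.replace("]", "]]") + "]"
        quoted_path = path.replace("'", "''")  # escape quotes inside the T-SQL string
        # WITH INIT overwrites any backup sets already in a file of the same name.
        statements.append(f"BACKUP DATABASE {quoted_name} TO DISK = N'{quoted_path}' WITH INIT;")
    return "\n".join(statements)


print(backup_script(["Sales", "Inventory"], r"E:\Backups", "pre_upgrade"))
```

Using different labels, such as "pre_upgrade" and "post_upgrade", keeps the two baselines in separate files, so the second run cannot overwrite the first.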
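Backing up twice also has to fit in the maintenance window, so do the arithmetic before you commit to a time frame. The sketch below is a back-of-the-envelope model with hypothetical inputs and a hypothetical function name. It treats each backup or restore as the database size divided by a throughput you have measured on your own hardware. It ignores log backups, verification, and the time needed to roll back the server itself. Any savings from quiescing the database show up as smaller or faster inputs.

```python
def window_budget(database_gb, backup_gb_per_min, restore_gb_per_min, upgrade_min):
    """Estimate minutes for a planned run and for a failed run that needs a restore."""
    per_backup = database_gb / backup_gb_per_min
    restore = database_gb / restore_gb_per_min
    # Planned: back up, upgrade, then back up again for the new baseline.
    planned = 2 * per_backup + upgrade_min
    # Worst case: back up, the upgrade fails, restore the pre-upgrade backup.
    worst_case = per_backup + upgrade_min + restore
    return {"planned": planned, "worst_case": worst_case}


# Hypothetical figures: 120 GB of data, 2 GB/min backups, 1.5 GB/min restores,
# and 45 minutes for the upgrade itself.
print(window_budget(120, 2, 1.5, 45))  # {'planned': 165.0, 'worst_case': 185.0}
```

Plan the window around the worst-case figure, not the planned one.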

Note

Good planning cannot protect you from every problem, but it can give you a solid plan for dealing with most problems.

More Info

For more information on putting plans together and good background information in general for this chapter, consult Chapter 1, "Preparing for High Availability"; Chapter 2, "The Basics of Achieving High Availability"; Chapter 9, "Database Environment Basics for Recovery"; Chapter 10, "Implementing Backup and Restore"; and Chapter 12, "Disaster Recovery Techniques for Microsoft SQL Server."