Agreeing on a Solution | Microsoft SQL Server 2000 High Availability

Before starting on the design, purchase, and eventual rollout of any systems, the key players must meet to agree on the specifics of the solution that will be deployed. Do this for every deployment and solution, whether or not it is considered mission-critical.

The Project Team

Assemble a project team that will own overall responsibility for the availability of the solution. The leader should be the business sponsor: a person from the management team who will ultimately answer for the success or failure of the operation and who has the greatest influence on its budget. Representatives from all parts of the organization, from management to end users, should be part of the team, because each of them will be affected in one way or another by the availability of the systems or solution that will be put in place.

Guiding Principles for High Availability

Once the project team is assembled, it should meet to decide the principles that will govern how the solution will be designed and supported. The following are some sample questions to ask; there might be more for your environment.

What type of application is being designed or purchased?
How many users are expected to be supported concurrently by this solution in the short term? In the long term?
How long is this solution, with its systems, supposed to stay in production?
How much will the data grow over time? What is projected versus actual growth (if known)?
What is acceptable performance from both an end-user and an administrative or management perspective? Keep in mind that performance can be defined in various ways: it might mean throughput, response time, or something else.
What is the availability goal for the individual system? The entire solution?
How is maintenance going to be executed on this system? Like performance, how you maintain your systems is specific to your particular environment.
What are the security requirements for both the application and the systems for the solution? Are they in line with corporate policies?
What is the short-term cost of developing, implementing, and supporting the solution? The long-term cost?
How much money is available for the hardware?
What is the actual cost of downtime for any one system? The entire solution?
What are the dependencies of the components in the solution? For example, are there external data feeds into SQL Server that might fail as the result of an availability problem?
What technologies will be used in the construction of the solution?
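Two of the questions above, the availability goal and the actual cost of downtime, are easier to debate once they are turned into numbers. The following Python sketch (the book itself contains no code) converts a percentage availability goal into the minutes of downtime per year it permits and multiplies that by an hourly cost of downtime. It assumes a 365-day year and a worst case in which all permitted downtime is actually used; the function names and the $10,000-per-hour figure are hypothetical placeholders that the project team would replace with its own numbers.

```python
# Minimal sketch: translate an availability goal into permitted downtime
# and a rough worst-case yearly cost. Function names and the hourly cost
# used in the demo are hypothetical illustrations, not real figures.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year permitted by an availability goal."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1.0 - availability_pct / 100.0) * HOURS_PER_YEAR * 60


def annual_downtime_cost(availability_pct: float, cost_per_hour: float) -> float:
    """Worst-case yearly cost, assuming all permitted downtime is used."""
    return allowed_downtime_minutes(availability_pct) / 60 * cost_per_hour


if __name__ == "__main__":
    # Hypothetical cost of downtime: $10,000 per hour.
    for goal in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(goal)
        cost = annual_downtime_cost(goal, cost_per_hour=10_000)
        print(f"{goal}%: {minutes:,.1f} min/year, about ${cost:,.0f}")
```

Each additional "nine" cuts the permitted downtime by a factor of ten (from roughly 5,256 minutes a year at 99 percent to roughly 5 minutes at 99.999 percent), which is why the availability goal and the budget have to be decided together.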

Some of the questions you ask might not have answers until other pieces of the puzzle are in place, such as the specifications for each application, because those specifications will drive how SQL Server and other parts of the solution (for example, Microsoft Windows 2000 Server) are used and deployed. Others might be answered right away. Ensure that both the business drivers and the more detailed what-if scenarios are well documented, because they are crucial to every other aspect of planning for high availability. When documenting, it might even be a good idea to divide the questions into two lists: one for the business requirements, independent of technology, and one for the technology-dependent requirements.

Once the questions are listed, it is safe to assume that each person in the room will have a different answer to each one. The goal is to get everyone on the same page from the start of the project; otherwise, the proverbial iceberg might start ripping holes in the hull of your solution. Compromise will always be involved, and it can only be achieved if a business sponsor is driving and ultimately owning the solution at all levels. This person is responsible for building consensus and making decisions that produce a compromise everyone can live with. As long as all parties agree on the compromise, the planning, implementation, and support of the solution will go much more smoothly than if the voices of those further down the road are not heard or are ignored.

Making Trade-Offs

High availability is not synonymous with other vital aspects of a production system, such as performance, security, feature sets, and graphical user interfaces (GUIs). Achieving high availability ultimately involves some form of trade-off among availability, performance, and usability. All of these aspects need to be considered in overall system, infrastructure, and application design. A highly available system that is not usable will not satisfy anyone. This is where the trade-offs come into play.

Is buying a single 32-processor server to support a larger number of concurrent users for a database that is used 24 hours a day the most important business factor, or is it more important to ensure that the server stays up 24 hours a day to support the continuous business? Chances are people will say both are equally important, but in reality the budget dictates that some sort of trade-off must occur. Accepting slightly lower performance to ensure availability might be a reasonable trade-off if the cost of downtime outweighs the benefit of having 10 additional users on the system. The same can be said for security: if the system or solution is so secure that no one can use it, is it really available? Conversely, if a developer hard-coded the database server's security administrator account and password into the application for convenience, this could compromise security as well as the application's ability to work with certain high-availability technologies.

Think of it another way: If money were no object and you had to purchase a car, would you buy a fast, sleek sports car or a sensible four-door sedan with airbags for all passengers? Many would choose style over substance; that is human nature. High availability is like buying the sedan: it might not be the best-looking car on the block, but it is a solid, reliable investment. The airbags and sturdy roll bar, among other safety features, will make you as prepared as you can be for a possible accident.

For a clear example of a trade-off, briefly consider the Titanic again and that the White Star Line valued luxury over lifeboats. That was their trade-off, and at the time it made sense to the people funding the ship as well as the designers. That decision ultimately proved to be fatal. You need to determine what the acceptable trade-offs are for each situation so that the solution will meet the needs of everyone, especially those responsible for administering it and, most important, the end user or customer.

Identifying Risks

Once the basic principles governing the solution have been put into place, it is time to identify and mitigate the known and unknown risks by asking the what-if questions to the best of the group's ability. You might know what a risk and its associated questions are, but not how to mitigate it. More risks will become apparent as the solution moves from conception through planning to implementation, and as more technology and application decisions are made. Whenever a risk is identified, document it, even if there is no answer at the time. Continually review the documented list of risks to confirm that each one has a recorded answer. By the time the solution reaches production, every identified risk should have a response, even if that response is that nothing can be done to mitigate it.

Although there are many more possibilities, here are some common questions to jump-start the risk management process:

What will you do if one disk fails? The entire disk subsystem?
What will you do if a network card fails?
What will you do if network connectivity is lost?
What will you do if the entire system goes down or stops responding? Is loss of life involved (for example, a health system)? Although this is related to the overall cost of downtime, the specific result of downtime should be known.
How does the application handle a hardware (including network) or software failure?
Is there a corporate standard (what do we do now for availability on other systems), and is that plan working well?
How will the proper people be notified in an emergency?
How will a problem be detected?
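The practice of recording every risk as soon as it is identified, and checking before production that each one has a response, can be pictured as a simple risk log. The following Python sketch is a hypothetical illustration, not a tool the book describes: the `Risk` class, its field names, and the helper functions are assumptions, and a real team might keep the same information in a spreadsheet or tracking system.

```python
# Minimal sketch of a risk log. Every risk is recorded, even without an
# answer, and the log is checked before going to production. The Risk
# class and helper names are hypothetical illustrations.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Risk:
    question: str                   # the what-if question
    response: Optional[str] = None  # mitigation, or "nothing can be done"


def unanswered(risks: List[Risk]) -> List[Risk]:
    """Return the risks that still lack a documented response."""
    return [r for r in risks if not (r.response and r.response.strip())]


def ready_for_production(risks: List[Risk]) -> bool:
    """True only when every identified risk has some recorded response."""
    return not unanswered(risks)


if __name__ == "__main__":
    log = [
        Risk("What will you do if a network card fails?"),
        Risk("What will you do if network connectivity is lost?"),
    ]
    print(ready_for_production(log))  # False: no responses recorded yet
    for risk in log:
        # Placeholder text; a real log would hold the agreed response.
        risk.response = "Documented: nothing can be done to mitigate"
    print(ready_for_production(log))  # True
```

Note that "nothing can be done" still counts as a response: what matters is that no identified risk is left without a recorded answer.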

Next Steps

Your guiding principles are now documented, along with some risks that might or might not have answers. At this point, review and debate the principles to ensure that they are correct. If something is not right, now is the time to fix it, because these principles will live through the entire project life cycle. They can be modified and reassessed if necessary, but the initial principles should be the yardstick against which the success or failure of the solution is judged. The entire team should formally sign off on them, because availability is the responsibility of everyone involved.