Like the data center best practices already presented, this section covers a topic that could fill a book by itself: database staff. The general principles apply to other types of IT groups as well.
Before getting into the specifics of what a DBA needs to do in a high availability environment, the company needs to determine whether it is properly staffed. Too often, operational database administration work and other tasks, such as managing networks, disk backup and maintenance, and so on, are left to application developers and network operations people who are not only splitting their time but are also not experts. It can be argued that if a mission-critical database system needing four nines is deployed with no DBA or dedicated database staff onsite, the commitment to a highly available database system does not exist. Do not let staffing become a barrier to availability.
Whether your team will support individual, highly available systems or an entire end-to-end solution, all team members must be able to function cohesively. The entire group should be responsible for the quality of the service it provides, with a focus on sharing information freely throughout the team. When issues arise, find root causes and identify process improvements that address the problems rather than worrying about who made each particular mistake. Shifting blame (or praise) from the individual to the group fosters an atmosphere of trust. Building such a team requires cycles of constant improvement with people who are willing to do what is right for the good of the team, not just for individual gain. Regularly evaluate ways to improve the quality of your group's service, the interactions within your team, and the team's interactions with the rest of the company.
Team members should collectively own the database servers so that anyone can respond to issues for any server. If you have a system that only one person can administer, then that person becomes the single point of failure in the system. This is why it is so important for the DBA team to act as an integrated unit.
If you are currently the only DBA in the company, find another person (or several) to act as extra support for various aspects of the system when you will be absent.
Shared ownership in the team can be facilitated by rotation, effective use of spare time, lunchtime presentations, and reliable communications within the team. Rotation encourages cross-training, provides breadth of experience, encourages common practices, and leads to better documentation. Here are some examples of how your team can implement rotation practices:
Assign a primary and at least one secondary DBA for each system, so that there are at least two people supporting each system.
Rotate staff among projects, classes of servers, and different environments (development, test, and production).
Rotate at intervals long enough for team members to become comfortable in their new roles and to build individual knowledge and growth.
Proper training is crucial for the success of both the individual and the team. Technical skills should be maintained and extended. Whether someone is a junior administrator fresh out of college or a senior administrator, a person who is new to your environment should become acclimated to your systems before assuming full access and duties, ideally by shadowing someone who knows the ins and outs. Too small a training budget (or none at all) can be a barrier to availability.
In addition to technical training, team members should acquire soft skills. Communication skills, both oral and written, enhance employees' long-term success in dealing with people at all levels. Writing and documentation are necessary for mentoring and knowledge transfer, and they also have historical value. Acquiring these communication skills might involve formal training or courses in presentation skills and technical writing, among other things.
The DBA's responsibilities are constantly evolving. A DBA is expected not only to know how to twist knobs and optimize settings, but also to understand the culture and the business processes of the company. This includes understanding what he or she is doing and how it affects each person, the team, and the company as a whole. Understanding the big picture also helps the DBA understand the small picture. Things are not always simple. A successful DBA demonstrates good leadership when he or she aligns personal goals with the goals of the company and then takes others along for the ride.
By establishing a high level of communication and trust among team members, you will improve the response time of the team overall. As each DBA becomes familiar with other systems and environments, the number of issues that only one person can handle diminishes. As a result, the DBAs do not need to call each other as frequently for answers, which can save valuable minutes when a server needs extra attention or is unavailable. This also increases the availability of your DBA resources.
DBAs must be instantly reachable by each other through agreed-on methods (for example, cell phone, e-mail, pager, or two-way radio) during working hours, during on-call hours (if your company has an on-call policy), and whenever an emergency arises and you need to be contacted. Communication devices issued to DBAs must be tested on a regular basis, and contingency plans must be in place in case conventional methods fail.
Paramount to succeeding as a DBA is getting along with the other IT staff. Unfortunately, this can become a highly political push-and-pull situation. The long-term interpersonal goal should be to eliminate such tensions. This is easier said than done, because each group, whether it is the general Windows administrators, the people running the storage area network (SAN), the network gurus, or the build engineers, has its own daily duties. If you make a request, such as needing a domain account for the SQL Server services, your request might be contrary to what they do or take time away from something they need to do. Be sensitive to and understanding of others. Be aware that the DBA's role as the gatekeeper and protector of the data might add stress to existing relationships.
A good example of working together is the planning, building, and deploying of a database system. The Windows administrators and the build engineers have their own ideas of what it takes to build a well-performing system, the network engineer has his or her requirements, and the database administrator has his or her requirements as well. Everyone must ultimately agree on how the system is built; otherwise, it will be hard for each group to maintain the system going forward (and all of them will likely be involved).
Building a well-oiled team takes time: not just the passage of time, but also a commitment of time to team building that might seem better spent in other ways. A properly maintained race car might go from 0 to 60 miles per hour in 4.6 seconds; one with loose lug nuts and no motor oil will not. The principle is the same for a DBA team.
An SLA is a list of system service requirements presented in a formal document or communication, with specific terms, roles, deliverables, responsibilities, and even liabilities (where applicable). It is not just a contract between, say, a company and a third-party hosting vendor. SLAs should exist between any two parties that will be using the system: the IT department and end users, the vendor and the IT department, the hosting company and its vendors, and so on. Without an SLA, service requests can quickly become nearly impossible to fulfill, because no expectations have been set and the parties tend to adjust to whatever the situation happens to be.
The goal of an SLA is to establish basic expectations between the two groups; anything else is negotiable. An SLA guarantees that a system will perform to specifications (this includes defining what performance means), support required growth, and be available to a given standard. Any regularly scheduled downtime windows (for data loads, invasive maintenance, and so on) must be noted in an SLA, as they will affect availability.
SLAs for database systems typically include the following (at minimum):
Availability, expressed as a percentage (such as 99.999 percent).
Maximum number of concurrent users.
Number of transactions to be supported per unit of time (such as 5000 per second).
Method of contacting support personnel, and the maximum number of support calls permitted within a given period.
Response time expected on support issues. This can also be defined at a higher level for the entire IT department.
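An availability percentage is easier to negotiate when both parties can see the downtime it actually permits. The following is a minimal sketch, assuming a 365-day year (leap days ignored) and assuming that whatever the SLA counts as downtime is what gets budgeted; whether scheduled maintenance windows count is a term each SLA must define. The function name `allowed_downtime_minutes` is hypothetical and purely illustrative.

```python
# Hypothetical helper: convert an SLA availability percentage into the
# maximum downtime it allows over a period (default: a 365-day year).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability_percent: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime budget, in minutes, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    # The unavailable fraction of the period is (1 - availability).
    return period_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Print the yearly budget for two, three, four, and five nines.
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}% -> {allowed_downtime_minutes(target):.2f} minutes/year")
```

Running it shows why each additional nine matters: four nines leaves roughly 53 minutes of downtime per year, and five nines only about 5 minutes, which leaves little room for unplanned outages once any counted maintenance is subtracted.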
An SLA can also include a section defining terms, limitations, and exceptions. The more experience the DBA has, the longer this additional section tends to be, which is a polite way of saying that the more trouble you have seen, the more likely you are to protect yourself from future trouble.
An SLA also includes staffing, server configuration, licensing, product support agreements, equipment used by the solution covered by the SLA, and, finally, the cost of the service itself. If you cannot support an existing SLA as is, you need to either renegotiate the SLA or create a budget or a project before accepting it. The DBAs should state the level of support that is possible without causing undue strain on the team or undue risk to other systems in the data center. Of course, this does not mean you will be spared from supporting the system that causes the risk, but you can at least quietly document the risk. Later, it cannot be said that management was uninformed of it. If repeated problems arise from the system as a result, you are more likely to see a revision of the SLA, or a budget increase for more staffing, new hardware, or other provisions.
An SLA must account for the weakest components of the system. For example, if your database vendor does not offer 24/7 technical support, do not guarantee 24/7 support to end users without taking this into account. If you have any doubts about an SLA, detail them immediately. If you develop doubts as time passes, renegotiate immediately, but keep in mind the delicacy of doing so. SLAs should never be renewed without a detailed analysis of their content to determine whether they still meet the requirements for the new period of coverage or, in less common scenarios, whether they provide more than you require.
It is vital to keep SLAs, including vendor support agreements, current in a high availability environment. If you are unable to solve a system issue in a timely manner on your own, the absence of a vendor support agreement will probably threaten your uptime. The support staff of each vendor are troubleshooting professionals trained in that vendor's equipment or software. They have far more resources available to them than anyone else for finding a solution to your problem.
However, all of this is useless if your staff does not know how to access the vendor's support center or does not have the necessary account codes or contract numbers for verification when calling the support center.
Your goal is to manage the people, both internal and external to your company, who are involved in the systems you are supporting, the processes, and the technology with the expectation of meeting the terms of the SLA. Success should be measured in terms of customer satisfaction, as well as numeric values recorded over time and periodically compared to the SLA. If you miss your numbers, identify the points of failure and fix them. If you cannot fix the problem because of resources, processes, or technology, then renegotiate your SLA or create a report justifying the need for more resources. Attach a copy of your purchase orders to the report and send it to management. If you did not meet the agreed-on numbers, but the end users are still happy, also note that when the SLA comes up for renewal. It might be possible to renegotiate terms.
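Comparing "numeric values recorded over time" against the SLA can be as simple as totaling recorded outages for a period and checking the result against the target. The sketch below is illustrative only: the `Outage` record, the `measured_availability` and `meets_sla` functions, and the choice to exclude scheduled maintenance windows by default are all assumptions, and your SLA's own definition of downtime should decide that last point.

```python
# Hypothetical sketch: compute measured availability from recorded outages
# and compare it with the SLA target for the same period.
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Outage:
    minutes: float   # length of the outage
    scheduled: bool  # True if it fell inside an SLA-approved maintenance window


def measured_availability(period_minutes: float, outages: Iterable[Outage],
                          exclude_scheduled: bool = True) -> float:
    """Return availability for the period as a percentage."""
    # Only count outages that the (assumed) SLA terms treat as downtime.
    downtime = sum(o.minutes for o in outages
                   if not (exclude_scheduled and o.scheduled))
    return 100 * (period_minutes - downtime) / period_minutes


def meets_sla(measured_percent: float, target_percent: float) -> bool:
    """True if the measured availability meets or exceeds the SLA target."""
    return measured_percent >= target_percent


if __name__ == "__main__":
    month = 30 * 24 * 60  # 43,200 minutes in a 30-day month
    log = [Outage(20, scheduled=False), Outage(120, scheduled=True)]
    actual = measured_availability(month, log)
    print(f"measured {actual:.4f}%, meets 99.9%: {meets_sla(actual, 99.9)}")
```

Keeping scheduled and unscheduled outages in the same log, flagged separately, lets you report the figure either way when the SLA comes up for renewal.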
On the CD
A sample SLA can be found on the CD-ROM that accompanies this book. The file name is SLA.doc, and it can be customized to your environment or serve as a reference for an SLA you put together. Some SLAs might require the involvement of your company's legal counsel, so please check before sending such a document to someone else.