The MOF is one of the three frameworks that form the Microsoft Enterprise Services frameworks. Each framework provides detailed information on the people, processes, and technologies required for success in the different phases of the IT life cycle. The three Enterprise Services frameworks are as follows:
The Microsoft Readiness Framework provides guidance to help prepare the organization to use Microsoft products.
The Microsoft Solutions Framework provides guidance in the planning, building, and deployment phases of the project life cycle.
The MOF provides comprehensive operational guidance for managing environments based on Microsoft technologies.
Microsoft is well established as a software giant. However, a framework is not software. The three frameworks include a variety of assessment tools, best practices, case studies, courseware, deployment guides, operations guides, planning tools, solution kits, support tools, training roadmaps, and white papers.
MOF includes a set of best practices, principles, and models that promote mainframe-quality reliability, availability, supportability, and manageability for environments built on Microsoft products and technology.
Microsoft designed MOF to help IT departments design IT services to meet business goals and priorities while reducing downtime, risks, and the total cost of ownership for production systems.
MOF has obvious benefits for IT organizations, but it has equally important benefits for Microsoft. Microsoft wants to compete more effectively at the enterprise level for mission-critical production systems. To accomplish this, the company needs to negate the perception that Microsoft platforms do not have the reliability necessary for mission-critical services. During the past few years, Microsoft has focused on improving software quality and has added cluster support and Microsoft Windows Datacenter to improve availability. Hardware improvement during the same period has produced Intel-based servers with RAID controllers; storage area networks; redundant power supplies, fans, and controllers; hot-swappable fans, disk drives, and power supplies; and other features to improve reliability and availability. However, as shown in Figures 2.1 and 2.2, a major cause of downtime is poor operational procedures. Microsoft cannot improve its reliability image without addressing the people and processes used to manage Microsoft environments.
The development of MOF was an important investment for Microsoft, so the company set several design goals to ensure its success, including the following:
MOF needed to use ideas that were proven to be successful in existing production environments, leveraging industry best practices, rather than inventing new ones. MOF needed to incorporate input from customers, partners, Microsoft ITG, and Microsoft product and service organizations.
Microsoft knew that its employees could not anticipate every possibility, so the company chose to provide an extensible foundation for operations knowledge.
MOF needed to integrate with frameworks that manage other parts of the IT life cycle, such as planning and deployment.
MOF needed to address managing end-to-end services, including processes and procedures, rather than just managing servers and technology. At the same time, MOF needed to increase the IT department's ability to help business units rapidly adjust to changing conditions.
MOF needed to address people as well as processes, procedures, and technology.
MOF combines the ideas in ITIL with specific guidelines for using Microsoft technologies. MOF also extends ITIL to support distributed IT environments and current industry trends, such as application hosting and web-based systems. MOF is composed of three models: the process model, the team model, and the risk model. These models provide guidance about people, processes, and risk management for IT service management. Each model focuses on the technologies and best practices for achieving high availability, reliability, supportability, and manageability for the Microsoft environment and provides guidance on interoperability with non-Microsoft environments.
Because IT operations include so many processes, procedures, and communications, all occurring simultaneously for a large collection of systems, applications, and platforms, it is impossible to create a model that captures all of the intricacies. Instead of trying to create an exact model, MOF simplifies this complexity into a framework that is more easily understood and more easily applied. The MOF Process Model is a functional model of the processes that operations teams perform to manage and maintain IT services. As such, it provides a simplified, generalized way to think about complex IT environments. The Process Model rests on a few key concepts, which are as follows:
IT Service Management has a life cycle that consists of distinct logical phases.
The life cycle needs review-driven management at specific points during the life cycle. The life cycle needs some reviews when moving from one phase to another and needs other reviews periodically.
IT operations are continually becoming more important and more complex, and problems are very visible. Risk management is important to ensure that IT department failures do not impact the business to the extent that the overall company fails.
IT infrastructures are not static; they are constantly changing. One of the primary responsibilities of the operations team is to manage these changes in a way that ensures the continued availability of critical services. A common and effective way to deal with change is to group related changes together into a series of releases, which allows for planning and managing of each group of related changes as a unit. The MOF Process Model recognizes that applications or services follow a life cycle of distinct, integrated phases. Examples of these phases include the following:
For Exchange 5.5 implementations, there was a period when you were preparing for and then implementing Exchange 5.5. The MOF model refers to this as the changing phase.
Once you placed Exchange 5.5 into production, the Exchange service entered the next phase of the life cycle: the operating phase, where the primary mission was to effectively and efficiently execute the day-to-day tasks of making Exchange services available to users.
The next phase in the life cycle is the supporting phase, where the mission is to quickly resolve incidents, problems, and inquiries about the Exchange service.
The final phase is optimization, where you drive changes to optimize the Exchange service delivery cost, performance, capacity, or availability. At some point in the optimization phase, you may decide that the best way to optimize the service is to implement the next release of the product, and the life cycle enters the changing phase in preparation for implementing Exchange 2000/2003.
These phases, also known as quadrants, form an iterative life cycle that can be applied to any release, and they describe the processes or activities that make up each part of that life cycle. There are also four reviews described by the model:
The Release Approved Review is the final review before a proposed change is released into the production environment.
When the release is complete, the Release Readiness Review evaluates the effectiveness of the Service Management Functions.
The Operations Review happens periodically once a service has been released into the production environment. It is a review of the IT staff's ability to maintain the service.
The SLA Review happens periodically and evaluates the staff's ability to meet the requirements defined in SLAs.
Microsoft based the Process Model on the best practices documented in ITIL, with the addition of some Microsoft-specific content. Most of the Microsoft-specific content is in the operating quadrant of the Process Model. Because ITIL is platform independent, it does not cover these items. Where applicable, MOF also references specific Microsoft products and features that either automate or improve the delivery of the service management functions.
Figure 2.4 illustrates the MOF process model, showing the relationship between the life cycle phases and the reviews associated with each phase.
Figure 2.4: Microsoft Operations Framework Process Model
The changing quadrant follows a Release Approved Review. This is the final review before a proposed change is released into the production environment. It reviews the readiness of the release itself, the readiness of the staff, and the potential impact of the release on other systems. If the release passes this review, then the following service management functions perform the release:
Change management. To mitigate or eliminate adverse effects, the change management function identifies affected processes and systems.
Configuration management. Configuration management identifies, tracks, and reports on key IT assets.
Release management. Release management ensures that you carefully plan, test, and implement software and hardware releases.
The change, configuration, and release management functions work closely with each other to ensure that the shared configuration management database is always accurate and up to date. When the release is complete, the Release Readiness Review evaluates the effectiveness of the service management functions.
Once you have completed the deployment, the service management functions in the operating quadrant are responsible for effectively and efficiently performing the daily operational tasks.
Directory services administration. The directory services administration function is responsible for daily operations, maintenance, and support of the enterprise directory.
Job scheduling. The job scheduling function is responsible for scheduling batch processing jobs at times when the additional system resources required for the batch jobs will not affect business and system operations.
Network administration. Network administration is responsible for design and maintenance of the physical network components, such as firewalls, routers, servers, and switches.
Print/output management. The print/output management function is responsible for managing the components associated with business output.
Security administration. Security administration is responsible for maintaining a secure computing environment.
Service monitoring and control. Service monitoring and control is responsible for monitoring IT service health.
Storage management. The storage management function is responsible for data storage, including off-site backups and historical archiving.
System administration. System administration is responsible for the day-to-day tasks of keeping systems running and for assessing the impact of planned releases.
Periodically, you should perform an Operations Review, which is an inwardly focused review of the operations group's ability to maintain the service.
No system is perfect, and problems will occur after a service is put into daily operations. The objective of the functions in the supporting quadrant is to resolve incidents, problems, and inquiries in a timely manner.
Incident management. The incident management function is responsible for resolving all incidents and quickly restoring the IT service.
Problem management. Problem management is responsible for investigating and correcting the root causes of problems that affect the IT service.
Service desk. The service desk provides first-line support to the user community for incidents, problems, and inquiries associated with IT services.
Periodically, you should perform an SLA Review to evaluate the support staff's ability to meet the requirements defined in the SLAs. The SLA Review often results in changes to the support staff procedures. It also often influences changes to other operational processes, tools, and procedures.
The service management functions in the optimizing quadrant focus on future needs rather than the day-to-day management of the current environment.
Availability management. The availability management function is responsible for maintaining the availability of IT services and information to meet SLA requirements.
Capacity management. This function plans and controls service capacity to meet SLA requirements.
Financial management. The financial management function is responsible for budgeting, cost accounting, cost recovery from business units, and all other tasks that ensure that you are providing IT services in the most cost-effective manner.
Service continuity management. Service continuity management (often referred to as contingency planning) is responsible for developing and testing plans to recover from an IT disaster.
Service level management. This function is responsible for negotiating SLAs with the business units. It also monitors the IT organization's compliance with the SLAs.
Workforce management. The workforce management function is responsible for recruiting, retaining, training, and motivating the IT workforce.
The functions in the optimizing quadrant often identify changes that the IT department should implement to improve delivery of the IT services. The Release Approved Review is the final review for these proposed changes.
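The iterative life cycle described above can be sketched as a small state machine. The following Python fragment is only an illustration under stated assumptions: the names Quadrant, REVIEWS, next_quadrant, and walk_life_cycle are hypothetical and are not part of MOF, and the pairing of each review with a quadrant follows the descriptions in this section rather than any official schema.

```python
from enum import Enum


class Quadrant(Enum):
    """The four phases (quadrants) of the life cycle, in iteration order."""
    CHANGING = "changing"
    OPERATING = "operating"
    SUPPORTING = "supporting"
    OPTIMIZING = "optimizing"


# Review associated with each quadrant, following the descriptions above.
# The boolean marks reviews that the text describes as periodic.
REVIEWS = {
    Quadrant.CHANGING: ("Release Readiness Review", False),
    Quadrant.OPERATING: ("Operations Review", True),
    Quadrant.SUPPORTING: ("SLA Review", True),
    Quadrant.OPTIMIZING: ("Release Approved Review", False),
}


def next_quadrant(current: Quadrant) -> Quadrant:
    """Return the quadrant that follows `current`, wrapping around."""
    order = list(Quadrant)
    return order[(order.index(current) + 1) % len(order)]


def walk_life_cycle(start: Quadrant, steps: int) -> list:
    """List the quadrants visited over `steps` transitions from `start`."""
    visited = [start]
    for _ in range(steps):
        visited.append(next_quadrant(visited[-1]))
    return visited
```

Calling walk_life_cycle(Quadrant.CHANGING, 4) visits changing, operating, supporting, and optimizing, and then returns to changing, which mirrors the way an optimization decision can start the next release.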
Microsoft created the Team Model on the basis of ITIL's best practice for organizational structure and process ownership, augmented by best practices used by organizations with successful IT operations. By examining the practices of these successful IT organizations, Microsoft found that these organizations shared many common attributes that were the keys to their success. These attributes drive the team and help define the Team Model.
Accurate inventory tracking of all IT services and systems
Automated, predictable, and repeatable system management
Balancing costs with technology and business needs
Focus on service level management
Management of physical environments and infrastructure tools
Management of services provided through partners and outsourcing vendors
Protection of corporate assets by controlling access to systems and information
Quick problem resolution
Release management and change management
Building successful teams requires shared principles that set guidelines for how the team functions and create a sense of common values. The primary principles and guidelines for the Team Model are:
To build strong, synergistic virtual teams
To leverage IT automation and knowledge management tools
To provide great customer service
To understand the business priorities and add business value
To attract, develop, and retain strong IT staff
Microsoft incorporated these shared attributes and principles into the Team Model to provide examples for how other IT operations teams can improve their own operations and service management practices. The Team Model describes the following:
Best practices to structure operations teams
Key activities and skills required for each of the role functions
Key quality goals of an effective operations team
How to scale the teams for different sizes and organization types
Guidance for operating distributed environments based on the Microsoft platform
The role clusters of the Team Model define six general categories of activities and processes. The role clusters are groups of activities that share common goals. They do not imply any kind of organizational chart and they are not job descriptions. They also do not imply a specific number of people to perform these roles. The number of people will vary for each organization. A small organization may choose to have a single person perform several of the roles. Larger organizations may need a team of people, or possibly a virtual team, to perform a role.
Figure 2.5 shows the MOF Team Model with these six role clusters. The Team Model shows communication at the center. Clear, effective, accurate, and timely communication is important for all roles.
Figure 2.5: Microsoft Operations Framework Team Model
The tasks and activities needed to keep production systems operational are complex. Performing those activities and processes requires organization and coordination, but the complexity of the work makes this hard to accomplish. The Team Model helps simplify the complexity and provides guidance on team roles and ways to effectively organize the team.
The activities in the release role cluster are responsible for identifying and tracking resources, documenting processes, and maintaining the history of all IT environmental changes. This includes activities such as change management, release engineering, configuration control, asset management, software distribution, software licensing, and quality assurance. To meet this responsibility, the people performing these activities typically use a corporate knowledge base to track changes and lessons learned and a configuration management database to track inventory and changes to the environment.
The activities in the infrastructure role cluster are responsible for defining the physical environment standards, managing assets, maintaining the IT infrastructure, and overseeing the evolution of the architecture. This includes activities such as capacity management, IT cost management, enterprise architecture, infrastructure engineering, resource planning, and long-range planning.
The support role cluster is responsible for supporting internal and external customers. This includes activities such as product support, production support, problem management, service desk (or help desk), and service level management.
The activities in the operations role cluster are responsible for reliably performing daily, routine operational tasks. This includes activities such as availability management, archiving and storage management, database operations, file and print server management, messaging operations, system monitoring, and network administration.
Providing IT services requires cooperation with many groups outside the IT organization and outside the enterprise. The partner role cluster manages these partnerships in mutually beneficial and cost-effective ways. The role cluster also includes the external partners who provide critical services, including environmental support groups, hardware suppliers, managed services groups, maintenance vendors, software suppliers, and trading partners.
The security role cluster is responsible for ensuring data confidentiality, data integrity, and data availability. This includes activities such as audit administration, compliance administration, contingency planning, intellectual property protection, intrusion detection, network security, system security, and virus protection.
Even with the best processes and the best IT operations staff, you will still encounter unexpected problems. Many IT operations teams are unprepared for them. They perform their daily tasks with the naïve assumption that everything will work as planned. These IT groups are usually easy to identify by the fear, panic, and finger pointing that appear when the unexpected disrupts their daily routine.
Successful IT operations teams plan for an uncertain future. They view the unexpected as a normal part of operations and proactively work to identify and control the risk. They view the risk management process as a continuous, visible, and important process. They have metrics for measuring their ability to evaluate risks and take actions that address the causes or problems, rather than just the symptoms. Microsoft based the MOF Risk Model on guiding principles that are common to these successful IT operations teams.
Continuously assess risks. Assessing risks is not a one-time project; it is a continuous process of searching for new risks and periodically reevaluating existing risks.
Formal, proactive risk management process. Success requires a process that the team understands and uses. The risk management process should be visible. The IT team should view it as an important process, and the process should have visible metrics.
Integrate risk management into every process and role. You should design every IT process with risk management in mind, and every IT role shares part of the responsibility for managing risk.
Risk-based scheduling. Changing an existing production environment often means implementing a set of related and interdependent changes. When planning and testing these proposed changes, it is easy to postpone testing the difficult pieces. In risk-based scheduling, the team focuses on the most difficult (and riskiest) changes first to avoid wasting time on changes that they will not be able to release.
Treat risk identification positively. Team members must be willing to identify potential problems without fear of criticism.
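The risk-based scheduling principle above amounts to ordering the work so that the riskiest changes come first. A minimal Python sketch follows; the PlannedChange type, the risk_based_order function, and the numeric risk score are assumptions made for this illustration and do not come from MOF.

```python
from dataclasses import dataclass


@dataclass
class PlannedChange:
    name: str
    risk: int  # hypothetical score; a higher value means a riskier change


def risk_based_order(changes):
    """Return the changes with the riskiest first.

    sorted() is stable, so changes with equal risk keep their original order.
    """
    return sorted(changes, key=lambda change: change.risk, reverse=True)


# Hypothetical release made up of three interdependent changes.
release = [
    PlannedChange("Update client templates", risk=2),
    PlannedChange("Migrate mailbox stores", risk=9),
    PlannedChange("Reconfigure connectors", risk=6),
]
test_order = [change.name for change in risk_based_order(release)]
```

Here test_order begins with the mailbox store migration, so if that change turns out to be unreleasable, the team finds out before spending time on the easier pieces.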
The risk model applies a structured, repeatable, five-step process to the daily problems that IT operations face. Figure 2.6 shows the five steps of the risk management process. Each risk goes through the complete process at least once and often cycles through several times. Because each risk goes through the process on its own schedule, it is common for multiple risks to be in each step simultaneously. The five steps in the process are as follows:
Figure 2.6: Microsoft Operations Framework risk model
Identify the risk. The purpose of this step is to determine the source of the risk (technology, people, process, or external), the mode of failure (performance, cost, agility, or security), the conditions that cause the failure (e.g., server's sole power supply fails), the operational consequences of the failure (i.e., what impact will the failure have on the operations team), and the business consequences (i.e., how will the business as a whole be hurt).
Analyze the risk. This step determines the risk's probability and the impact to the business (on a scale from 1 to 10) if failure occurs. The risk exposure is calculated by multiplying the probability by the impact. You can use the exposure value to prioritize risks.
Plan. The purpose of this step is to define mitigations to reduce the probability and/or impact, identify trigger conditions that indicate the failure is imminent but has not yet occurred, and define contingencies to execute if you detect the trigger condition.
Track. Continually gather information about how elements of the risk are changing over time.
Control. Continually manage the risks. Execute the contingency plan if you detect a trigger condition. Retire the risk when it is no longer relevant. If risk factors change (e.g., impact or probability), restart the cycle at the analyze step to reevaluate the risk.
The risk process maintains the following documents and lists:
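The exposure calculation in the analyze step is simple arithmetic, and a short Python sketch may make it concrete. The Risk type and the prioritize function are hypothetical names, the sample risks and their numbers are invented for illustration, and the sketch assumes that probability is expressed as a fraction between 0 and 1 while impact uses the 1-to-10 scale.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # assumption: a fraction between 0 and 1
    impact: int         # business impact on the 1-to-10 scale

    @property
    def exposure(self) -> float:
        """Risk exposure is the probability multiplied by the impact."""
        return self.probability * self.impact


def prioritize(risks):
    """Rank risks from highest to lowest exposure."""
    return sorted(risks, key=lambda risk: risk.exposure, reverse=True)


# Invented sample values, used only to show the arithmetic.
risks = [
    Risk("Server's sole power supply fails", probability=0.25, impact=8),
    Risk("Batch job overruns its window", probability=0.5, impact=3),
    Risk("Directory replication stalls", probability=0.5, impact=10),
]
ranked = prioritize(risks)
```

With these sample values the exposures are 2.0, 1.5, and 5.0, so the replication risk ranks first even though the power supply failure has a high impact on its own.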
Risk assessment document. The identify, analyze, and plan steps gather information about a particular risk, and the track and control steps use the collected information as input for decision making. The risk assessment document includes all information from each of the five steps. This includes the source of risk, mode of failure, condition, operations consequence, business consequence, probability, impact, exposure, mitigation, triggers, and contingency.
Top risks list. This list is a ranked list of a small number of major risks that have the greatest exposure and warrant the most attention.
Retired risks list. It is important to keep information about retired risks for historical reference. Whenever a risk becomes irrelevant, you should move it to the retired risks list.
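The risk assessment document and the two lists can be pictured as a small data structure. The following Python sketch is one possible arrangement, not a format that MOF prescribes: the RiskAssessment and RiskRegister names are hypothetical, the fields simply mirror the items listed above, and probability is assumed to be a fraction between 0 and 1.

```python
from dataclasses import dataclass, field


@dataclass
class RiskAssessment:
    # Fields gathered in the identify step
    source: str
    mode_of_failure: str
    condition: str
    operations_consequence: str
    business_consequence: str
    # Fields gathered in the analyze step
    probability: float  # assumption: a fraction between 0 and 1
    impact: int         # 1-to-10 scale
    # Fields gathered in the plan step
    mitigation: str = ""
    triggers: list = field(default_factory=list)
    contingency: str = ""

    @property
    def exposure(self) -> float:
        return self.probability * self.impact


class RiskRegister:
    """Holds active risks plus the retired risks kept for reference."""

    def __init__(self):
        self.active = []
        self.retired = []

    def add(self, risk):
        self.active.append(risk)

    def top_risks(self, count=3):
        """Return the `count` active risks with the greatest exposure."""
        ranked = sorted(self.active, key=lambda r: r.exposure, reverse=True)
        return ranked[:count]

    def retire(self, risk):
        """Move a risk that is no longer relevant to the retired list."""
        self.active.remove(risk)
        self.retired.append(risk)
```

Keeping retired entries in a separate list, rather than deleting them, preserves the historical reference that the retired risks list is meant to provide.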