You can gather data to help show the quality and increase the acceptance of changes, and you can work through measurable requirements that define the desired future state. However, you must not focus only on the production environment.
Implementing a service is rarely a one-time process. The service is deployed and debugged in a development environment before it is put into a test environment. After it has been load tested and certified as ready, it is deployed into the production environment. In addition to changes in the lower layers (for instance, operating system changes, patches, or new, upgraded, or expanded hardware), the service itself often changes (for instance, patches, configuration and tuning changes, and adjustments that compensate for or take advantage of activity in the lower layers of the stack). Gather metrics on the number and frequency of services and changes, on the time spent at each stage of the life cycle, and on the activities within each stage. The N1 Grid software enables, and businesses require, measurements that can be used throughout the enterprise.
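As a minimal sketch of the stage-level metrics described above, the following Python fragment totals the time a service spends in each life-cycle stage. The record layout (a timestamp paired with the stage entered), the function name `time_per_stage`, and the sample history are assumptions made for illustration; they are not part of the N1 Grid software.

```python
from collections import defaultdict
from datetime import datetime

def time_per_stage(transitions):
    """Sum the hours a service spent in each stage.

    `transitions` is a chronologically ordered list of (timestamp, stage)
    pairs; each entry marks the moment the service entered that stage.
    The last entry only closes the stage before it.
    """
    totals = defaultdict(float)
    for (start, stage), (end, _next_stage) in zip(transitions, transitions[1:]):
        totals[stage] += (end - start).total_seconds() / 3600.0
    return dict(totals)

# Hypothetical history for one service, including a return to test after a patch.
history = [
    (datetime(2004, 3, 1, 9), "development"),
    (datetime(2004, 3, 3, 9), "test"),
    (datetime(2004, 3, 4, 9), "production"),
    (datetime(2004, 3, 10, 9), "test"),
    (datetime(2004, 3, 10, 21), "production"),
    (datetime(2004, 3, 12, 9), "retired"),
]

hours = time_per_stage(history)
changes = len(history) - 1  # number of stage transitions recorded
```

Because a service can revisit a stage, the totals accumulate across visits rather than assuming a single pass through development, test, and production.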
The N1 Grid service life cycle model encompasses a number of high-level stages that are presented in this section using UML state diagrams (FIGURE 5-6). A state represents a specific phase in the overall life cycle, and a transition between states represents a shift from one phase of a service's life cycle to the next. The transitions between phases in the life cycle are critically important for the following reasons:
Figure 5-6. Example Service Lifecycle Phases
Testing Structure to Prove SLRs
Each SLR and requirement is accompanied by the method used to show that it has been satisfied and the owner to whom this must be shown. Although many requirements are binary in nature (the architecture is either an ISP with web services or it is not), many SLRs require testing harnesses, data gathering, and roll-up activities (for instance, to show that the service supports 8000 web connections per second under a given distribution of use cases). These activities must be included when estimating the project effort and cost.
The data from a load test (for example, it only reached 7000 web connections per second) or the operational maturity analysis might result in changes to the original architecture. This iterative activity is a natural part of most complex architectural design methodologies.
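The comparison between an SLR and load test data can be sketched as follows. This is an illustration only: the `SLR` record, its field names, the `evaluate` function, and the owner label are hypothetical, and the sketch assumes a "higher is better" measurement such as connections per second.

```python
from dataclasses import dataclass

@dataclass
class SLR:
    name: str
    target: float  # required value, for example connections per second
    owner: str     # party to whom satisfaction must be shown

def evaluate(slr, measured):
    """Return an evidence record comparing a measurement with its SLR."""
    return {
        "slr": slr.name,
        "owner": slr.owner,
        "target": slr.target,
        "measured": measured,
        "satisfied": measured >= slr.target,
        "shortfall": max(0, slr.target - measured),
    }

# Figures taken from the example in the text: 8000 required, 7000 reached.
web_slr = SLR("web connections per second", 8000, "service owner")
result = evaluate(web_slr, 7000)
```

A record with `satisfied` set to false and a nonzero `shortfall` is the kind of evidence that can send the design back for another iteration.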
The collection and presentation of this data takes time and disk space, but it is the only way to unambiguously demonstrate that requirements have been met. Make sure you are collecting data for all of your needs:
Operational Maturity Measurements
This section describes how to measure the people and process aspects of your organization's operational capability. The ideas, definitions, and methodology are based on Sun's Operations Management Capabilities Model (OMCM), which can describe the current state of an organization's realization of the SunTone Management Framework. The tools aspects of the OMCM have been introduced previously. The authors would like to again thank Michael Moore and Edward Wustenhoff for allowing us to include an overview of their "Operations Management Capabilities Model" white paper.
The different levels of the OMCM are categorizations of an organization's service delivery capabilities. A degree of implementation description is used to characterize the extent to which an individual component of capability has been realized by an organization. There are five potential scores for degree of implementation:
Though this scoring mechanism creates a consistent terminology and simplifies the application of the model to real situations, the definition of each characteristic applies only to the component being analyzed. For example, the characteristics of a functional IT operational process are described differently from the characteristics of a functional monitoring infrastructure. Likewise, the characteristics that make a monitoring process optimized are very different from those that make asset management optimized.
The characteristics adhoc, emerging, functional, effective and optimized are fully defined and described for each management practice and subpractice in "Operations Management Capabilities Model." After being described, the degree of implementation for a management process can be mapped to the OMCM levels. This mapping allows the creation of a capabilities profile that describes the degree of implementation for every component at a given OMCM level. You use this profile to determine an organization's OMCM level.
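The use of a capabilities profile to determine a level can be sketched as follows. The five degree names come from the text; everything else is an assumption made for illustration, including the ordering of the degrees from adhoc (lowest) to optimized (highest), the numeric level labels, the component names, and the required degrees in the sample profile. The actual profile is defined in the white paper, not here.

```python
# Degrees of implementation, assumed to be ordered from lowest to highest.
DEGREES = ["adhoc", "emerging", "functional", "effective", "optimized"]
RANK = {name: i for i, name in enumerate(DEGREES)}

def omcm_level(profile, assessed):
    """Return the highest level whose required degrees are all met.

    `profile` maps level -> {component: minimum degree}.
    `assessed` maps component -> observed degree; a component that was
    not assessed is treated as adhoc.
    Levels are checked in ascending order and must be met cumulatively.
    Returns None if even the lowest level is not met.
    """
    achieved = None
    for level in sorted(profile):
        required = profile[level]
        met = all(
            RANK[assessed.get(component, "adhoc")] >= RANK[degree]
            for component, degree in required.items()
        )
        if not met:
            break
        achieved = level
    return achieved

# Hypothetical profile and assessment, for illustration only.
profile = {
    1: {"organizing": "emerging", "release management": "adhoc"},
    2: {"organizing": "functional", "release management": "functional"},
    3: {"organizing": "effective", "release management": "effective"},
}
assessed = {"organizing": "effective", "release management": "functional"}

level = omcm_level(profile, assessed)
```

In this sketch a single lagging component holds the organization at the lower level, which reflects the statement that the profile covers every component at a given level.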
A major part of delivering IT services is managing the organizations that have responsibility for executing the various IT management processes. People management describes a set of practices necessary to ensure that the IT infrastructure is staffed in an appropriate fashion and that people have the necessary skill sets. The people management practice should be a process-oriented improvement model in which the IT organization is matured through the institutionalization of different workforce management processes. The more integrated into this organization these activities become, the more effective and efficient the organization will be.
The OMCM measures capabilities that describe the degree to which people management practices have been implemented within an organization. The measuring is performed by evaluating the subpractices that comprise the five people management practices:
The individual people management practices and their subpractices are described below. The characteristics adhoc, emerging, functional, effective and optimized are fully described for each practice and subpractice in "Operations Management Capabilities Model."
Organizing IT services refers to activities that are related to the design of the organization's structure. These would include items such as identifying organizational groups, developing specific roles and responsibilities for each group, and describing the interfaces among groups.
The following practices are part of the organizing activity grouping:
Skills development is the set of activities that helps individuals acquire the knowledge and practical abilities necessary to perform current jobs or prepare them for future assignments:
Resourcing is the set of activities necessary to acquire the individuals to meet the goals of the organization. This would include activities to identify required skill sets, determine how many of each type is required, develop a timeline for acquiring them, and identify sources to fill the requirements. The following practices support resourcing activities:
Knowledge management is the set of activities related to the capture, documentation, maintenance, and dissemination of organizational learning. Knowledge management activities enable the creation and maintenance of competency-based practices. Through the execution of knowledge management, organizations can take successful solutions and institutionalize them for reuse. This set of practices ensures that organizations take effective processes and make them repeatable:
Workforce management is the set of activities performed to control and support individuals as they perform their tasks. This includes management of individual performance and compensation and activities necessary to provide the workforce with the infrastructure to successfully perform their job functions:
Business process management, that is, the existence and management of processes for creating, deploying, and managing IT and business services, is required to support the business service life cycles. The OMCM measures capabilities that describe the degree to which each process management practice has been implemented within an organization. The measuring is performed by evaluating the IT service subpractices that comprise the six process management practices:
The individual process management practices and their subpractices are described below. The characteristics adhoc, emerging, functional, effective and optimized are fully described for each practice and subpractice in "Operations Management Capabilities Model."
Creating IT Services
This category describes all processes related to the creation of new services, which includes activities necessary to identify, quantify, architect, and design IT services:
Implementing IT Services
This category describes all aspects relating to the physical realization of the IT service as it is defined and created in the previous category. It addresses all aspects that ensure proper rollout of a new or updated service.
The degree of implementation is assessed by analyzing the ITIL release management process. This process protects the live environment (or IT service delivery environment) and its services through the use of formal procedures and checks. Release management works closely with the change management and configuration management processes.
Delivering IT Services
Delivering IT services is the most visible part of an IT organization's activities. This category addresses activities for the proper delivery and ongoing operation of the IT services. It is often referred to as "IT operations" or "data center operations." To assess the degree of implementation of this category, the OMCM looks at the following ITIL defined processes:
Improving IT Services
This category addresses all activities surrounding the measurement and optimization of IT service activities with the goal of continuously improving service levels. To assess the level of operational capability in this category, the OMCM looks at the following processes:
Controlling IT Services
This category addresses activities to deliver the IT service within the constraints identified by the governing body, including the processes that facilitate the IT governing activities. Examples of governing functions are financial controls, audit, and alignment with business objectives.
To assess the level of operational capability in this category, the OMCM looks at the following processes:
Protecting IT Services
This category addresses all activities that ensure that IT services are still available under extraordinary conditions such as catastrophic failures, security breaches, or unexpected heavy loads. As businesses depend more and more on IT services, this area becomes more and more important to address.
To assess the degree of implementation of this category, the OMCM looks at the following ITIL defined processes:
This chapter has discussed metrics and useful data for provisioning and observability. Policy combines those functions into a feedback loop that anticipates, corrects, and improves your business. There are many opportunities to observe and react to events in your data center. Examples of measurements to consider include:
Planning for a policy can start immediately by solidifying the measurements that will guide the N1 Grid software policy. It is worthwhile to organize a policy model, begin populating the information model, and start to connect the business and system viewpoints. The N1 Grid vision, architecture, and products can help inform this effort.
The fact that the N1 Grid software flawlessly automates the installation of your approved and hardened golden services does not mean that no one can log onto one of your servers and make undesired changes. N1 Grid solutions require the same vigilance, defense in depth, and change control that your environment deserves today, but they offer several additional capabilities that make some of that work easier.
Many N1 Grid software products support security processes by making it possible to compare the current environment against the approved and hardened golden image that is expected to be running in a particular location. Clearly, it is beneficial to be able to easily identify unexpected deltas to your builds. The roles used by the N1 Grid software can also segment access and entitlements. They divide not only along lines of operation (for instance, only certain roles can create services and only certain roles can install them), but also along access and control within business groups or within services in a given group. You are best placed to decide how to parse out the identity and entitlements needed to control and run your business, but the N1 Grid software can help you get the value and safety out of a strong identity infrastructure.
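The idea of identifying unexpected deltas against a golden image can be sketched as a manifest comparison. This is not the mechanism used by any N1 Grid product; the manifest format (file path mapped to a SHA-256 checksum), the function names, and the sample file contents are assumptions made for illustration.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()

def compare_to_golden(golden, current):
    """Compare two manifests that map file path -> content checksum."""
    golden_paths, current_paths = set(golden), set(current)
    return {
        "added": sorted(current_paths - golden_paths),
        "removed": sorted(golden_paths - current_paths),
        "changed": sorted(
            path for path in golden_paths & current_paths
            if golden[path] != current[path]
        ),
    }

# Hypothetical manifests: the approved image and what is found on a server.
golden = {
    "/etc/passwd": checksum(b"root:x:0:0\n"),
    "/etc/inetd.conf": checksum(b"# all services disabled\n"),
    "/opt/app/app.conf": checksum(b"port=8080\n"),
}
current = {
    "/etc/passwd": checksum(b"root:x:0:0\nguest:x:500:500\n"),
    "/etc/inetd.conf": checksum(b"# all services disabled\n"),
    "/tmp/backdoor.sh": checksum(b"#!/bin/sh\n"),
}

deltas = compare_to_golden(golden, current)
```

Any nonempty entry in `deltas` is a change that did not arrive through the approved build and is a candidate for investigation under change control.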
Mobility is another area where policy, security, and a common information model combine to provide safety and efficiency. Having identity and rules to guide which entities can (and cannot) reside in common locations or in particular combinations speeds up deployment and the reaction to a business need or to an observed and reported situation. A secure environment runs on data. Information must be collected and presented to enable command and control of who takes an action, when, and under what conditions in the N1 Grid operating environment. Your security officer should have clear requirements for the confidentiality, integrity, and availability of the data in your data center and a clear picture of the security architecture and security operations that combine to minimize risk. Sun's layered security architecture approach elevates security to a systemic quality that must be viewed holistically. Data comes from many individual layers. FIGURE 5-7 shows an example of a possible layered security architecture.
Figure 5-7. Layered Security Architecture Example
Each layer represents a required area of discussion, action, and instrumentation, but equally important are the connections between layers and the capability of the stack as a whole. Security is a systemic quality and must be viewed holistically, and its CTQs must be identified, measured, and rolled up appropriately for reporting.