CTQ Measurements

People are often not prepared for the work required to collect, roll up, analyze, and report CTQs, much less for the effort to react preemptively to maintain CTQ levels against events or other impacting situations. Beyond these common gaps, IT has typically not been run with business-level CTQs, nor has it worked out how system-level events and information from probes and agents relate to the components that help a business unit make business decisions. The IT organization has been in the business of keeping the disks spinning and the servers running, and the CTQs for those types of activities are often granular enough to require no analysis at a higher level than disk metering and an occasional ping(1M) command to the server. Integration is occasionally done, but only for a few combinations of components. Typical questions are:

  • How is the disk stripe doing in the array?

  • Was there excessive network traffic at a specific time (for instance, during a denial-of-service attack that finally brought down some of the servers)?

The ability of the N1 Grid software to enable operational agility further up the stack also presents an opportunity to observe and use information from the layers further up the stack, the layers more closely connected to the business applications and services. Although people and processes must be in place to use this information after it is collected, the basis of observation and the information that can be used should come from an organized framework that includes all of the needed layers. The next section illustrates a framework that can organize the collection and correlation of business information.

OMCM Tool Framework

Chapter 4 introduced the Operational Management Capabilities Model (OMCM) as a means to help define key process areas that provide a measurement of the people, processes, and tools in a given organization. For visibility into the enterprise stack, the tools component of the OMCM details the functional components and their relationships to enterprise management technology. The tools are used to measure and report the performance of the system and to provide visibility into and reporting of the operational systemic qualities of the system.

The management tools framework (FIGURE 5-4) consists of a layered combination of management applications that are tied together, when appropriate, through the integration of specific components. This section contains a brief description of the management tools framework. For more information, refer to the Sun white paper "Operational Management Capabilities Model" by Michael Moore and Edward Wustenhoff.

Figure 5-4. Management Tools Framework

  • The instrumentation layer consists of all management elements that enable the various management tools to gain access to managed resources. Instrumentation is generally implemented within the context of the execution framework where managed resources reside, through the appropriate agents, probes, or other ad hoc scripts and executables.

  • The element and resource management layer consists of management applications that directly interact with the execution environment to query or modify managed resources.

  • The event and information layer consists of applications that manage events and information generated by the lower layers of the framework. The focus of the applications at this layer shifts from the measurement and modification of technical metrics to the management of data and alarms.

  • The management data repository is a logical representation of the storage and management of operational data.

  • Service-level managers are applications that provide the tie-in between business requirements as defined by SLAs and the technical status of the execution environment as determined by the lower layers of the framework.

Workflow technology is used to automate the management processes described in the IT Management Framework (ITMF). Examples of this type of technology are a trouble ticket system to support the problem management process or the automation of a change approval system used to support the change management process.

The management portal is a collection of applications that provides external entities access to selected portions of the management framework. Examples would be a web interface for reviewing service level management (SLM) reports, web or other types of user interfaces for various tools, or an application used by end users to submit requests for a service. It should also be possible, and even desirable, to use this portal to expose management information and facilities to people outside of the IT organization.
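
The flow through the lower layers of the framework can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the `Sample` and `Event` shapes, the metric names, the thresholds, and the service-to-resource map are all hypothetical and do not come from any N1 Grid or OMCM product interface.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """A raw reading from the instrumentation layer (hypothetical shape)."""
    resource: str
    metric: str
    value: float


@dataclass
class Event:
    """An alarm raised at the event and information layer (hypothetical shape)."""
    resource: str
    metric: str
    value: float
    threshold: float


def raise_events(samples, thresholds):
    """Event layer: turn technical metrics into alarms when a threshold is exceeded."""
    events = []
    for s in samples:
        limit = thresholds.get(s.metric)
        if limit is not None and s.value > limit:
            events.append(Event(s.resource, s.metric, s.value, limit))
    return events


def service_status(events, service_map):
    """Service-level view: mark a service 'at risk' if any resource it uses has an alarm."""
    status = {}
    for service, resources in service_map.items():
        affected = any(e.resource in resources for e in events)
        status[service] = "at risk" if affected else "ok"
    return status


# Illustrative data only; every name and number here is made up.
samples = [
    Sample("db01", "disk_busy_pct", 97.0),
    Sample("web01", "cpu_pct", 35.0),
]
thresholds = {"disk_busy_pct": 90.0, "cpu_pct": 85.0}
service_map = {"order entry": {"db01", "web01"}, "reporting": {"rpt01"}}

events = raise_events(samples, thresholds)
print(service_status(events, service_map))
```

The point of the sketch is the shift in focus between layers: the first function deals in technical metrics, while the second answers a question a business unit can act on.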

Example CTQs

The N1 Grid architecture enables businesses to consider new types of activities gauged by new types of CTQs. This section provides an overview of some of those business CTQs and a discussion of the ways that N1 Grid solutions can begin to help you focus on and deliver value to the business represented in those CTQs.

New Systems

In this book's Foreword, Greg Papadopoulos emphasized the changes and opportunities coming to business organizations: an Internet of things, where trillions of devices (as well as sub-IP devices like RFID tags) are connected to the Internet. N1 Grid architectures and products position you to be prepared for the new distributed computing paradigms, where business services are created on the fly from loosely coupled, coarse-grained shared services that live on the Internet. Service Oriented Architectures (SOAs) and other distributed computing ideas are gaining a lot of attention as a way to solve the challenges of:

  • Hyper-frequent change: Driven into the data center by competitive pressures and unrelenting customer demand for new services

  • Lots of users: The exploding and unpredictable nature of web access coupled with the distributed computing world where other services will now be interacting with your services

  • Lots of devices: The challenges of filtering and aggregating useful information from the signals of trillions of active devices

  • Lots of data: Whether harvesting from designed loads (for example, data grids between collaborating sites) or organizing oceans of data from things like customer self-service access, negotiation and filtering for localized services, and streaming video and other content to a variety of devices

  • Lots of calculations: Driven by both the type and amount of data (in which interesting things reside) and the ubiquity of compute resources that can be ganged together in many different ways (for example, grids, vertical scaling, horizontal scaling, distributed object-oriented computing)

Businesses can now undertake calculations never before contemplated. Simulations that save research and development dollars, data mining to uncover new business information, genome and nanotech investigation, and support for realtime enterprises are no longer just possibilities; they are regularly undertaken.

Time to Market

How many services do you release each year? What is the distribution of those release times? Exactly how have these distributions changed over the last five years? How long does it take to provision a new service into production? Is that time shorter or longer in the test environment?

Knowing the answers to these types of questions enables you to start treating IT like a business and to make more accurate projections of resources and timing. You can also break these times down even further and look for additional waste. With this measurement infrastructure in place, you can address business issues because you know your provisioning is solid and the mobility of the applications and services you support enables you to react to issues or opportunities that arise. Think of the implications of being able to confidently tell your boss that you will be able to have six releases of a product this year versus the four released last year. Efficient design and implementation using N1 Grid architectures and products can make the IT department the trigger of bold business-changing initiatives.
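
As a rough sketch of how the release-time questions above might be answered once the raw timings are collected, the following Python fragment summarizes provisioning times per year. The years and day counts are invented for illustration (chosen only to mirror the four-versus-six releases example), and the `summarize` helper is hypothetical.

```python
from statistics import mean, median


def summarize(durations_by_year):
    """Summarize release counts and provisioning times (in days) for each year."""
    summary = {}
    for year, days in sorted(durations_by_year.items()):
        summary[year] = {
            "releases": len(days),
            "mean_days": mean(days),
            "median_days": median(days),
            "max_days": max(days),
        }
    return summary


# Made-up sample data: days from request to production for each release.
provisioning_days = {
    2003: [30, 45, 60, 90],
    2004: [10, 12, 20, 25, 30, 41],
}

for year, stats in summarize(provisioning_days).items():
    print(year, stats)
```

Tracking the median and the worst case alongside the mean helps show whether an improvement is broad or whether a few slow releases still dominate.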

Increased Availability

How many times was your environment impacted by an avoidable human error? Has that distribution changed over the last five years? How many different ways are the same servers hosting the same applications configured in your environment? How much time is spent figuring out exactly what is on a server before actual upgrade or repair work is started? Is your production environment configuration exactly equal to your test environment, and is your test environment exactly equal to your development environment? Can you guarantee that you can recreate the test environment that was in existence on a certain date? Can there be cost reduction (or better service for the same cost) in your budget due to a better distribution of skill sets if the N1 Grid architecture helps reduce complexity?

Do you expect more or less complexity in the next few years compared to the last few years? What are you doing to prepare for your answer? Fewer errors and outages let you focus on what matters in your environment: getting the environment up and keeping it up. Reducing complexity also enables you to enlist skilled resources to support new or better business initiatives because you require fewer people and less costly skill sets to handle the automated deployment of well-understood deployable entities. In addition, the simplification is achieved with solid builds and configurations that can be well understood for updating or servicing. Besides the reduced outages and escalations, stability helps to reduce your help desk call volume.
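
One of the questions above, how many different ways the same application is configured across servers, lends itself to a short sketch. The inventory records below are hypothetical (host names, application names, and patch labels are invented), and a real inventory would come from whatever discovery or provisioning tooling you use.

```python
from collections import defaultdict


def distinct_configurations(servers):
    """Count the distinct configurations in use for each application."""
    variants = defaultdict(set)
    for server in servers:
        # A frozenset of (key, value) pairs is hashable and ignores key order.
        variants[server["app"]].add(frozenset(server["config"].items()))
    return {app: len(configs) for app, configs in variants.items()}


# Made-up inventory; "A" and "B" stand in for patch levels.
inventory = [
    {"host": "web01", "app": "storefront", "config": {"os": "Solaris 9", "patch": "A"}},
    {"host": "web02", "app": "storefront", "config": {"os": "Solaris 9", "patch": "B"}},
    {"host": "web03", "app": "storefront", "config": {"patch": "A", "os": "Solaris 9"}},
    {"host": "db01", "app": "orders", "config": {"os": "Solaris 9", "patch": "A"}},
]

print(distinct_configurations(inventory))
```

A count above one for an application is a signal of configuration drift worth investigating before the next upgrade or repair.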

Cost Measurements

Who uses what services? What is growing, and what is not growing? Have you tried to understand your costs and users? How have those costs and users changed over the last five years? Even if you do not yet use it for charge back, this utility computing type of data is useful to know and track. You can use the N1 Grid software to ease the installation of a utility computing measurement infrastructure, or to increase the utilization of your datacenter resources.
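
A minimal sketch of the "who uses what" roll-up follows, assuming usage records have already been metered. The record fields (`business_unit`, `service`, `cpu_hours`) and all values are hypothetical; the same aggregation would apply to whatever units your metering produces.

```python
from collections import Counter


def usage_by(records, key):
    """Total metered CPU hours grouped by the given record field."""
    totals = Counter()
    for record in records:
        totals[record[key]] += record["cpu_hours"]
    return dict(totals)


# Made-up metering records for illustration.
records = [
    {"business_unit": "sales", "service": "storefront", "cpu_hours": 120},
    {"business_unit": "sales", "service": "reporting", "cpu_hours": 30},
    {"business_unit": "finance", "service": "reporting", "cpu_hours": 50},
]

print(usage_by(records, "business_unit"))
print(usage_by(records, "service"))
```

Even without charge back, comparing these totals period over period shows which services and users are growing.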

The N1 Grid vision motivates you to use good architecture and service decomposition activities. Leveraging your efforts to meter and capture use cases that influence those decomposed entities is a small change to add into the provisioning automation efforts you choose to undertake. Because the tools solution framework also maps to deployable entities in all layers of the stack, leveraging the framework for the utility computing agents and reporting structure enables you to start your efforts as small or as large as you wish. Solid builds and automated deployments reduce errors and outages, which reduce help desk call volume. The N1 Grid software products Sun is shipping today can deploy service components into a variety of operating systems (Solaris OS, Linux, IBM-AIX, and Windows), enabling your staff to leverage N1 Grid investment in architecture and process across your enterprise.

Measuring Utilization

What is your current utilization of resources? What would the business impact be if you could double, triple, or quadruple it? How many different versions of the Solaris OS, Netscape Navigator™ browser, or Oracle database do you have to deploy and support? Do you have different deployment life cycles and provisioning processes for different types of compute, storage, and network resources?

Is your business better served by you supplying a solid standard environment with enough capacity for people's services or by the application development teams spending time learning about and specifying the servers you should buy for them? Is your business better served by your team learning about and practicing with the new application's configuration files and installation mechanisms or by the developers using your standard provisioning framework and common information model naming conventions to turn over the reins to your life cycle management team when their service is smoothly deployed?

People and processes are the largest sources of cost in your data center. The N1 Grid software gives you the opportunity to install frameworks that reduce cost and complexity. The naming conventions ensure that collisions will not occur on the same compute, storage, or network resources. Sun offers several technologies that keep disparate applications and processes protected, separated, and able to count on their share of needed resources. The mobility frameworks are easy to use and simplify the movement of entities into and out of denser standardized environments. Heterogeneous environments can be provisioned using the same provisioning framework and user interfaces to the N1 Grid SPS tools.

Regulatory Measurements

How many combinations of operating systems, patch levels, and applications do you currently run? Does your testing life cycle take so long that the standard build you have been approved to use has layers dangerously out of date (no patches for six months, old firmware, unable to gain the benefits of new hardware)? Can you go back in time and recreate the exact build used in your modeling research environment on a certain date when you made your company's groundbreaking discovery?

New regulatory constraints often change processes and require standard and approved builds, along with other types of record keeping. Legal and intellectual property issues might require you to reproduce a research or compute environment that existed at your company on a particular day. By creating a build that the N1 Grid software deploys, you automatically provide the means to store each build. The N1 Grid architecture's separation of the stack layers means that you can maintain an application build without having to tie it to a particular operating system build. You can upgrade or work on the layers below the application (for instance, adding more compute power, storage, or network resources), while maintaining regulatory compliance by continuing to run the approved and certified application layer.
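
The idea of reproducing the environment that existed on a particular day can be sketched as a dated history of approved builds. This is an assumption-laden illustration, not a description of how the N1 Grid software stores builds: the `BuildHistory` class, the build identifiers, and the dates are all hypothetical.

```python
from bisect import bisect_right
from datetime import date


class BuildHistory:
    """Keeps a date-ordered record of which build was approved when."""

    def __init__(self):
        self._dates = []
        self._builds = []

    def record(self, effective, build_id):
        """Record that build_id became the approved build on the given date."""
        i = bisect_right(self._dates, effective)
        self._dates.insert(i, effective)
        self._builds.insert(i, build_id)

    def build_on(self, day):
        """Return the build in effect on the given day, or None if none existed yet."""
        i = bisect_right(self._dates, day)
        return self._builds[i - 1] if i else None


# Made-up history for illustration.
history = BuildHistory()
history.record(date(2004, 1, 15), "app-1.0")
history.record(date(2004, 6, 1), "app-1.1")

print(history.build_on(date(2004, 3, 10)))
```

Because the stack layers are separated, a history like this could be kept per layer, so the application build in effect on a date can be looked up independently of the operating system build beneath it.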

Disaster Recovery and Business Continuity

Can you redeploy your business-critical applications and keep delivering services if a disaster happens? If you fail things over to a business continuity site, can you easily redeploy in the original data center after things are fixed? How long does it take to move services to the business continuity site? Can you guarantee that what is running in the new site will be exactly what was running in the original data center? Can you use the servers in the business continuity site for your current needs until they are needed for business continuity?

N1 Grid solutions enable you to rapidly deploy services, regardless of the data center in which those services are deployed. The automated mobility you use every day to provide business value can also help you provide business continuity.

Building N1 Grid Solutions: Preparing, Architecting, and Implementing Service-Centric Data Centers
Year: 2003
Pages: 144