Availability Metrics

Measuring services within the storage network requires some type of metric. The most encompassing is the service level. Service levels exist on several levels. There are specific agreements with end users that define the level of service the IT department provides. There are internal service levels that are managed to provide a working measurement of systems and networking performance. Existing service levels for storage, if there are any, may be either too specific or lack sufficient depth to cover performance beyond pure capacity measurements. The key is to establish meaningful measurements for the SAN infrastructure that encompass both storage systems and network specifications.

Establishing service levels for storage network configurations requires the integration of dependent system functions, related network activities, and specific storage elements as they come together to form the storage network infrastructure. First and foremost in the development of these metrics is the association of the workloads. A discussion of estimating workload I/Os can be found in Chapter 17. This is an important exercise because it imposes discipline in establishing the user's side of the equation and assists in integrating storage configuration service levels into supported applications.

As discussed, two forms of service levels should be developed: an external articulation of services to the supporting business units and end users, and an internal service level that supports both the data center infrastructure and your application development colleagues (DBAs, other systems administrators, and business continuity teams). The external service to the business units is explained in the following section.

External Service Level for Storage Networks

The following can provide a data availability framework for working with external service levels.

Data Availability

This component defines which data is referred to with the service and when it will be available. In addition, there should be an understanding of when the data will be unavailable. All data becomes unavailable at some point because of the need to back up files and volumes periodically, and because of the occasional system upgrade that makes all data unavailable while hardware, operating system, or application software is upgraded. These are typical of general service levels and should be included within those commitments. For storage service levels, however, the data center should note that availability will also be subject to similar instances in which the hardware, switch OS, or a management application is upgraded. These may require the user data to be unavailable during those times.

Data availability is extremely important to the data center. The ability to meet external service levels depends on the IT activities that both define and support those service levels.

  • Uptime: This metric is usually expressed as a percentage of the data availability period that the data center has committed to. It is driven by the need to keep data available for particular applications, user transactions, or general-purpose access (as with network file systems), and reflects the sensitivity of the business application to the company's operations. A typical example is 99.999 percent, the "five nines," meaning the data is committed to be available for that percentage of the overall committed time period. In other words, if an OLTP application requires 99.999 percent uptime during the transaction period of 6 A.M. to 6 P.M. PST to support the operations of both coasts, then the percentage is applied to that 12-hour period. Therefore, in this example, the OLTP application requires the data to be available for 719.9928 minutes of that 12-hour (720-minute) time period, effectively rendering this a 100-percent uptime for the data.

  • Scheduled Downtime: This metric defines the known time required to perform periodic maintenance on the data. Knowing this up front provides additional credibility to the storage infrastructure and reflects the time the data is unavailable to end users and applications. The important distinction is that the same data used in the transaction system very likely will be the same data that is accessed in batch mode later that night. Although transparent to the end users, this can be critically important to internal service levels.

    Note 

    It's important to note the distinction between data availability uptime and response time. The main purpose of the storage infrastructure is to keep the data available for the required applications. Given that actual application response time is made up of a series of components that include the operating system, database, and network, providing metrics beyond data availability is outside the storage infrastructure's scope. However, the commitment to I/O performance must be reflected in the internal service levels, where it is leveraged as a component of the overall response time.
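
The uptime arithmetic in the OLTP example can be checked with a short calculation. The following is a minimal sketch rather than anything prescribed by the text: the function names are illustrative assumptions, and it assumes the committed window starts and ends on the same calendar day.

```python
from datetime import time


def window_minutes(start: time, end: time) -> int:
    """Length of the committed availability window in minutes.

    Assumes the window starts and ends on the same calendar day.
    """
    return (end.hour * 60 + end.minute) - (start.hour * 60 + start.minute)


def required_uptime_minutes(window: float, uptime_percent: float) -> float:
    """Minutes the data must be available within the committed window."""
    return window * uptime_percent / 100.0


def allowed_downtime_seconds(window: float, uptime_percent: float) -> float:
    """Downtime the commitment tolerates within the window, in seconds."""
    return (window - required_uptime_minutes(window, uptime_percent)) * 60.0


# The OLTP example from the text: 6 A.M. to 6 P.M. at 99.999 percent.
oltp_window = window_minutes(time(6, 0), time(18, 0))           # 720 minutes
oltp_required = required_uptime_minutes(oltp_window, 99.999)    # about 719.9928
oltp_downtime = allowed_downtime_seconds(oltp_window, 99.999)   # about 0.43 seconds
```

Under these assumptions, the five-nines commitment leaves less than half a second of downtime in each 12-hour window, which is why the example treats it as effectively 100 percent.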

Data Services

This component of the service level defines the services that the storage administration offers.

Although the services listed next are traditionally accepted services of the data center, the storage infrastructure will accept increasing responsibility in meeting service levels surrounding these items.

  • Backup Service: Operations to copy specific sets or groups of data for recovery purposes due to data corruption or service interruptions.

  • Recovery Service: The corresponding recovery operation that restores data compromised due to data corruption or service interruption.

  • Replication Services: Services provided by the storage infrastructure that send copies of user data to other processing infrastructures throughout a company. Generally used to copy data to remote offices for local processing requirements.

  • Archival Services: Services providing periodic copies of data for necessary legal, local policy, or governmental archival purposes. Used in conjunction with the backup service, they provide a duality of service.

Disaster Recovery Commitments

As part of the business continuity plan, these services and service levels offer data availability during the execution of the disaster plan.

Consider the key storage items listed next when developing service levels for business continuity and disaster recovery plans.

  • Data Recoverability Matrix: It's unlikely that all data will be covered under a disaster site recovery plan. Therefore, a matrix, list, or report should be available regarding what data is available at any point in time prior to executing the disaster site recovery plan. The service levels and services here are to some degree a subset of services available during normal operations.

  • Data Uptime: The same metric as the normal operating data uptime percentage and predefined processing time period.

  • Data Scheduled Downtime: The same metric as the normal operating percentage, but using a predefined processing time period.

  • Restoration of Services: A new level of service driven by the schedule for total restoration of services for a disaster site recovery plan.
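
A data recoverability matrix can be as simple as a list of data sets with their coverage status and the time of the most recent copy held at the disaster site. The sketch below is one possible shape for such a matrix, not a prescribed format: the class, the field names, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class RecoverabilityEntry:
    data_set: str                   # hypothetical data set name
    covered: bool                   # included in the disaster site recovery plan?
    last_copy: Optional[datetime]   # most recent copy held at the disaster site


def recoverable_as_of(matrix: List[RecoverabilityEntry], when: datetime) -> List[str]:
    """Names of covered data sets with a disaster-site copy taken at or before `when`."""
    return [
        entry.data_set
        for entry in matrix
        if entry.covered and entry.last_copy is not None and entry.last_copy <= when
    ]


# Illustrative entries only; the names and timestamps are made up.
matrix = [
    RecoverabilityEntry("orders_db", True, datetime(2003, 1, 10, 2, 0)),
    RecoverabilityEntry("hr_files", True, datetime(2003, 1, 11, 2, 0)),
    RecoverabilityEntry("test_scratch", False, None),
]

available = recoverable_as_of(matrix, datetime(2003, 1, 10, 12, 0))
```

Querying the matrix for a point in time before the plan is executed shows which data the disaster site could actually serve, and makes explicit which data sets are not covered at all.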

User Non-Aggression Pact

It is always helpful to reach an agreement with a user community that accounts for the unexpected. This allows both parties to agree that unexpected circumstances do happen and should they occur, they should be resolved with mutual understanding and changes in the existing plan. This can be critical given the volatility of SAN installations and NAS data-volume requirements.

The following are a few of the unexpected circumstances that are worth discussing with end-user clients. These can form the basis for a non-aggression pact that will benefit both parties in managing to the agreed-upon service levels.

  • Unexpected Requirements: The most common set of unexpected circumstances is the unforeseen application that requires support but which hasn't been planned for. In terms of affecting the SAN installation, this can be costly as well as disruptive, given the scope of the new application storage requirements. In the NAS environment, handling such requirements is one of the strengths NAS brings to the data center. If the requirements can be handled through file access and NAS performance and capacities, then the NAS solution can be used effectively during these circumstances.

  • Unforeseen Technology Enhancements: This circumstance most commonly arises when the initial SAN design proves insufficient to handle the workload. The requirement to retrofit the SAN configuration with enhanced components means additional cost and disruption. This can at least be addressed with a shared understanding that new technology installations are available.

  • Mid-term Corrections: It is likely that any enterprise storage installation will experience one or both of the preceding conditions. Consequently, it is extremely important to build into the user agreements a provision for mid-term corrections, based on an evaluation of the current services.

Internal Service Levels for Storage Networks

Within the storage infrastructure, and especially the storage networking areas, support is provided both to end users and to other individuals in the data center. Therefore, there will be an internal set of service levels that support the other infrastructures within the data center. At a macro level, these are the systems organizations, of which storage may be a component, including the systems administrators, web masters, systems programmers, and system-level database administrators (DBAs), whose support will be critical. On the applications side, storage remains an integral part of the requirements of applications programmers, analysts, and maintenance programmers. Certainly, we can't forget the network administrators, help desk, and network maintenance people.

The following is a more detailed discussion on best practices for negotiating and determining service levels for internal IT co-workers and management.

  • Storage Capacity: The most common requirement for the data center is the raw storage capacity necessary for applications, support processing, and database support, just to mention a few. A service level here will prove very productive in staying ahead of user requirements and system upgrades that require additional raw storage capacity. This is critical with SAN and NAS configurations, given that each has attributes that make upgrades more support-intensive. With the manipulation of switch configurations, zoning, and LUN management upgrades, adding storage capacity in the SAN environment is not a snap-on activity. Although NAS provides a much easier way of increasing storage, the bundling of the storage device may provide more storage than required, and external network performance effects may be more intense and challenging than they first appear.

  • Data Availability: This service level is similar to the end-user service level, and in many ways forms the foundation for the application service level with the same end user. Over and above the commitments for production data, which should be addressed with the supporting applications personnel, is the need for data to be available for testing, quality assurance functions, and code development.

  • Data Services: Again similar to the end-user services specified in the external functions, these are the services provided to assure backup/recovery and data archiving. These services include operating system copies, code archives, and backups for network configuration data, to name a few. Relating these to the SAN infrastructure increases the complexity of recovery operations. Given that these installations are new and require additional procedural recovery operations to be developed, a learning curve is expected.

  • Storage Reporting: Key to internal constituencies, this provides the basis for monitoring data usage and activity within the storage network configurations. It sets the stage for effective housekeeping activities by establishing either new or preexisting quotas, archival policies, and ownership tracking. In addition to these administrative tasks, it establishes a tracking mechanism to help fine-tune the storage configuration. This is key to the SAN environment, where volume allocation, physical data placement, and systems/application access are controlled through the storage support personnel and your configuration.
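
As an illustration of the storage reporting item, a usage report that tracks ownership and quotas can be reduced to a list of volumes that need attention. This is only a sketch under stated assumptions: the record layout, the volume and owner names, and the 80 percent threshold are hypothetical and not drawn from any particular reporting tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VolumeUsage:
    volume: str       # hypothetical volume or LUN label
    owner: str        # group tracked as the owner of the data
    used_gb: float
    quota_gb: float


def over_threshold(report: List[VolumeUsage], threshold: float = 0.8) -> List[VolumeUsage]:
    """Volumes at or above `threshold` of their quota, fullest first.

    The 0.8 default is an assumed housekeeping threshold, not a standard.
    """
    flagged = [
        v for v in report
        if v.quota_gb > 0 and v.used_gb / v.quota_gb >= threshold
    ]
    return sorted(flagged, key=lambda v: v.used_gb / v.quota_gb, reverse=True)


# Illustrative report rows only.
report = [
    VolumeUsage("vol_oltp", "dba", 450.0, 500.0),
    VolumeUsage("vol_web", "webmasters", 120.0, 400.0),
    VolumeUsage("vol_qa", "app_dev", 196.0, 200.0),
]

flagged = over_threshold(report)
```

Because each row carries an owner, the flagged list doubles as a contact list for the internal groups whose quotas or archival policies need review.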

 

Storage Networks
Storage Networks: The Complete Reference
ISBN: 0072224762
EAN: 2147483647
Year: 2003
Pages: 192
