Potential resource utilization and traffic arrival rates play a very important role in integrating the workload into an existing infrastructure, or in building a new infrastructure, for that matter. As discussed previously, the categorization and accumulation of workload attributes provided the definition and requirements for the workload. Continuing with our banking application example, Figure 17-6 illustrates the final inventory, categorization, and definition of the workloads. This allows us to accumulate a set of estimates of resource requirements for each workload and provide an estimated sum of the entire set of workloads. In order to accomplish this, we must look closely at each category of workload.
The resource requirement details are contained in the major areas of I/O activities, data organization, data paths, and user access. Putting these areas together forms a picture of the required infrastructure for the workload and, as we will see, a picture of the total infrastructure.
The most important aspect of the workload is the data organizational model it uses. In today's inventory of applications, whether developed internally by IT application groups or implemented through application packages, the majority are based upon a relational model. The use of relational database technology (RDBMSs) defines the major I/O attributes for commercial applications. This is a good thing because it provides those workloads that use RDBMSs with a set of processing metrics for estimating I/O behavior and utilization. The use of the relational database has become so accepted and widespread that its macro behavior is very predictable. Additional resource utilization is indicated by the workload characteristics that define internal processing requirements such as caching, temporary workspace, and partitioning.
Don't make the mistake of overlaying the storage infrastructure too quickly: recovery scenarios and requirements need to be considered at the macro level first, and only then should decisions be made to handle specifics of the workload or a particular subset of the workload (for example, a specific application program). Therefore, we will take all our workload considerations into context before we draw any conclusions about storage system features such as RAID, cache sizes, and recovery strategies.
The data organizational model provides the following sets of information to the workload behavior:
Block Size The size of the block of data moved by the application transaction is dictated by the setup of the database. This can be defined by either file or database attributes.
Partitioning This attribute defines behavior regarding user access patterns in relation to the data itself. It influences decisions regarding the type of fault resiliency strategy for the workload (for example, RAID levels) and software recovery mechanisms.
Physical Design The physical design of the database drives how a supporting file system is used. This is perhaps one of the most important attributes to consider given the type of database and its performance when using a file system.
Relational databases continue to prefer the use of raw disk partitions. This is important given the performance penalties one may encounter when performing I/O operations that are duplicated through the file system and passed off to the RDBMS system for further read/write operations. Referred to as the double-write penalty, this will be covered in more detail in Part VI.
Maintenance Probably the most overlooked attribute in today's implementation-crazy environments, this defines workload behavior itself and includes requirements for backup/recovery, archival, and disaster recovery.
Regarding the use of RDBMS technology, the backup/recovery group not only includes basic backup operations but, more importantly, recovery operations that must occur on a transactional basis. They must therefore include the necessary log files and synchronize the database to a predefined state.
This articulates a macro view of workload considerations and attributes supported by the analysis of the data organizational model.
User traffic defines the data highway attributes for the workload. Notice that we address this set of information prior to our topic on data paths. Obviously, we need to know the estimated traffic to understand the type, number, and behavior of the data highways before associating that with the workload expectations. We basically need three types of information from the end users or their representatives, the application systems analysts. This information is shown in Figure 17-7 and consists of the number of transactions, the time period within which the transactions need to execute, and the expected service level.
This activity is the most challenging. Getting either end users or their application representatives (for example, application systems analysts) to estimate the number and type of transactions they will execute is difficult and, although important, can and should be balanced with a set of empirical information. Let's look at our banking application example and the deposit transactions. If we query the user community, we find that each teller handles approximately 100 deposit transactions per day. There are ten branch locations and, on average, five tellers per branch working an eight-hour shift. This results in an estimated 5,000 deposit transactions per day that need processing between the hours of 9 A.M. and 5 P.M. Tellers expect their deposit transactions to be available during the entire eight-hour shift and have transactional response times ranging from subsecond to no more than five seconds.
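The arithmetic behind that daily estimate can be sketched in a few lines. All numbers are the example figures quoted above for the banking application, not measured data:

```python
# Back-of-the-envelope estimate of daily deposit-transaction volume,
# using the figures gathered from the banking-application user community.
TELLERS_PER_BRANCH = 5
BRANCHES = 10
DEPOSITS_PER_TELLER_PER_DAY = 100
SHIFT_HOURS = 8

daily_deposits = TELLERS_PER_BRANCH * BRANCHES * DEPOSITS_PER_TELLER_PER_DAY
per_hour = daily_deposits / SHIFT_HOURS

print(daily_deposits)  # 5000 deposit transactions per day
print(per_hour)        # 625.0 transactions per hour across the 9-to-5 shift
```

The same skeleton scales to the other transaction types in the workload inventory once their per-teller counts are gathered.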
However, the deposit transaction comes in three forms: simple, authorized, and complex. Simple requires an update to a customer table. Authorized needs an authorized write to the permanent system of record for the account, which requires a double write to access more than one database table. Complex, meanwhile, requires the authorized processing but adds a calculation on the fly to deposit a portion into an equity account. Each of these has a different set of data access characteristics, yet they all belong to the same OLTP workload.
This is important given the potential for performance degradation if the appropriate data access points are not taken into consideration. Not all of this information can be expected to come from the end users, although we can estimate by analyzing the transaction history to determine the mix of subtransactions and plan accordingly. Consequently, a full understanding of the transactional estimates generally goes beyond the end users' or even the application developers' estimates, providing critical information for decisions about the I/O configuration.
However, it's also important because it defines the I/O content of the transactions. I/O content is defined as the amount of user data transferred during an I/O operation. We discussed previously that bus and network transfer rates differ in the number of bytes transferred. However, this depends on several variables, including the operating system, the file system, and data partitioning in RDBMSs. Consequently, the bus is not always full when executing an I/O transaction. Therefore, the higher the I/O content, the greater the throughput, and the less time it takes to complete a transaction once the required amount of data has been obtained.
An example is the deposit transaction set, where the simple transaction only requires access to a single record within a database table. Even though this customer record consists of only a small amount of data, it still requires the server OS to execute an I/O operation. This design hardly makes the trip productive, given the system overhead of an I/O operation in SCSI/PCI configurations or the larger amount of system process overhead necessary to leverage the tremendous payload of Fibre Channel. However, if the application design requires that each separate transaction be carried in a single transfer packet or frame by the I/O operation, then the efficiency of the I/O must be considered to understand the system requirements necessary to support the I/O workload. Although this may provide an extremely fast I/O operation and subsequent response time, the amount of system resources dedicated to accomplishing it limits the number of transactions supported.
Using this example analysis of a single I/O per transaction, the number of transactions processed within the stated time period becomes very important. In the case of the example deposit application, the simple transactions make up the bulk of the workload. This leads to the conclusion that the capacity of the system is simply based upon the number of I/Os required for processing. Nevertheless, we know that such an I/O system is highly inefficient and subject to non-linear response times should any anomalous increase in transactions occur. Consequently, the number of I/Os a system is capable of is not the only metric required for balancing or building an effective and scalable I/O infrastructure.
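The non-linear response behavior mentioned above can be illustrated with a standard single-server queueing approximation (this formula, R = S / (1 − U), is a general queueing-theory rule of thumb, not a calculation from the text): as utilization climbs toward saturation, response time grows far faster than the transaction rate.

```python
# Illustration of non-linear response time under rising utilization,
# using the M/M/1 approximation R = S / (1 - U), where S is the service
# time of one I/O and U is the fraction of time the I/O system is busy.
def response_time(service_time_ms: float, utilization: float) -> float:
    """Approximate response time at a given utilization (0 <= U < 1)."""
    return service_time_ms / (1.0 - utilization)

for u in (0.5, 0.8, 0.9, 0.95):
    print(u, round(response_time(10.0, u), 1))
# A 10 ms service takes ~20 ms at 50% utilization but ~200 ms at 95%,
# which is why sizing on raw I/O counts alone is risky.
```

This is exactly the anomaly described above: a modest surge in transactions near capacity produces a disproportionate degradation in service.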
The final piece of critical information regarding user access is the expected service level. This places our eventual, albeit simple, calculations into a framework that defines the resources needed to sustain the amount of operations for the OLTP workload. From our initial information, we find that there are two goals for the I/O system. First, the banking application's data needs to be available from 9 A.M. to 5 P.M. each workday. Second, the transactions should complete within a time frame ranging from subsecond response times to those lasting no more than five seconds. For the sake of our example, we will not address the networking issues until later in Part VI. However, it is important to note that although the I/O can, and will, account for a great deal of the response time, network latency issues also need to be considered.
By adding our service-level expectations to the mix, comparing these to user transactional traffic estimates, and considering the detail of the subset of deposit transactions, we find that the deposit transactions require a system that enables a transaction rate of 5,000 transactions per 8 hours, or approximately 10 transactions per minute. That adds to our cumulative amount, which provides the capacity for the entire banking application. The cumulative analysis further concludes that a 500-MBps transfer rate is needed to successfully meet user expectations.
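The per-minute rate above falls out directly from the daily estimate and the shift length:

```python
# Converting the daily deposit estimate into the sustained transaction
# rate the I/O system must support during the 9 A.M.-5 P.M. window.
daily_transactions = 5000
shift_minutes = 8 * 60   # eight-hour shift

tx_per_minute = daily_transactions / shift_minutes
print(round(tx_per_minute, 1))  # 10.4, i.e. roughly 10 transactions per minute
```

Note this is an average; the concurrency analysis that follows is what guards against peak-period bursts above it.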
In addition, the infrastructure must support the 100 percent uptime that users expect in terms of data availability. This is a foundational requirement for meeting response time (for example, the data must be available to process the transactions). However, this begins to define the type of storage partitioning and structure necessary to provide it. Consider also that in meeting these goals the system must continue to process in the event of error, an area handled by RAID. In particular is the decision between level-1 and level-5 solutions necessary for efficient response time even during recovery processing.
Now, let's look at the data highway required for this workload. From an analysis of our first two categories, we create a picture of the logical infrastructure necessary for the workload. By comparing the data organizational model (for example, the type of database and its characteristics) and byte transfer requirements with something called the concurrent factor, we can begin to formulate the number of data paths needed to meet workload service levels. The concurrent factor, as mentioned previously during the user access discussion, determines the minimum logical set of paths required to sustain our service level, given the probability that all tellers may at some point execute deposit transactions simultaneously.
This calculation provides a more accurate picture of the resources needed to sustain the service level in real time. In reality, the probability that all the tellers will execute a deposit simultaneously is actually quite high and is calculated at 90 percent. Therefore, for each time period, 90 percent of the total tellers would be executing a deposit transaction. From our previous calculation, we estimate a mix of simple, authorized, and complex deposit transactions of something like 80, 15, and 5 percent, respectively. This calculation provides the average number of bytes transferred while taking into account the different I/O content of each transaction.
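A minimal sketch of that weighted-average calculation follows. The 80/15/5 mix and the 90 percent concurrency factor come from the text; the per-transaction byte counts are hypothetical placeholders and would be replaced with values measured from the actual RDBMS:

```python
# Weighted-average I/O content per deposit transaction, combined with the
# concurrency factor to size the data paths. Byte counts are assumed
# example values, not figures from the workload analysis.
mix = {"simple": 0.80, "authorized": 0.15, "complex": 0.05}
bytes_per_tx = {"simple": 512, "authorized": 2048, "complex": 4096}  # assumed

avg_bytes = sum(mix[t] * bytes_per_tx[t] for t in mix)

total_tellers = 5 * 10      # five tellers per branch, ten branches
concurrency = 0.90          # 90 percent active simultaneously
concurrent_tellers = total_tellers * concurrency

print(avg_bytes)            # average payload per deposit transaction
print(concurrent_tellers)   # concurrent users the paths must sustain (45)
```

Multiplying the average payload by the concurrent user count and the transaction rate yields the sustained byte-transfer demand the data paths must carry.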
Figure 17-8 illustrates the accurate requirement for our workload. With this I/O workload analysis information, we can evaluate existing configurations to see if any of them will sustain the load, or develop a new model to configure a system that will. Most likely you will want to do both. As in our sample case, we can see that we need a large amount of sustained workload I/Os for the entire business application. If we overlay the existing solution of direct-attached SCSI storage systems, with capacities of no more than 50 MBps and arbitration-based device execution, it is likely this will be completely deficient in meeting workload service goals.
However, if we develop our own model, we find that a simple FC SAN solution with a 100-MBps rate through a frame-based transport will likely sustain the workload and support the concurrent transactional workload I/Os. If we add storage controller requirements of RAID, data maintenance, and recovery applications, we can estimate three data paths totaling a 300-MBps burst rate. An estimated 240-MBps sustained rate thus not only provides sufficient transfer rates but also a safe zone to compensate for peak utilization and growth factors before additional paths are required. Figure 17-9 illustrates a logical model built from our calculations.
The 240-MBps transfer rate referred to previously is calculated at 80 percent of the total burst capacity (300 MBps × 80% = 240 MBps), which provides headroom for switch and device latency.
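The derating arithmetic is trivial but worth making explicit, since the same 80 percent rule of thumb applies as paths are added:

```python
# Derating the aggregate burst rate to a sustained planning figure:
# 20 percent of capacity is held back as headroom for switch and
# device latency, per the rule of thumb in the text.
paths = 3
burst_per_path_mbps = 100              # one FC path at 100 MBps

aggregate_burst = paths * burst_per_path_mbps   # 300 MBps burst
sustained = round(aggregate_burst * 0.80)       # 240 MBps sustained
print(sustained)  # 240
```

When projected growth pushes demand past the sustained figure, rerunning this with `paths + 1` shows the next capacity step.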
From this model, we can begin to assign specific technologies to find the most appropriate fit. Cost will certainly be a factor in determining the best solution for the workload. However, cost notwithstanding, the value of workload identification, definition, and characterization starts to become evident when moving the workload analysis into real implementations.