10.3 Sizing the Link Bandwidth for Long-Distance Storage Networking Applications | IP Storage Networking: Straight to the Core

As described previously in this chapter, there are two primary types of data mirroring: synchronous and asynchronous. In determining which to choose, IT managers first must answer the question, "Should the storage in our primary data center become completely inaccessible, how much data can we afford to lose?" If the answer is none, then synchronous mirroring must be used; otherwise, either asynchronous or synchronous mirroring can be used.

Synchronous mirroring requires that each write be completed successfully on both the primary and the secondary disk array before the servers or initiators can write more data. That ensures that both sets of data are identical at all times. In general, increasing the network bandwidth available to synchronous mirroring applications increases the write performance of both the primary and the secondary disk arrays, as the pacing factor is the time it takes to transmit the backup data across the network and receive a response from the secondary array. But even with ample network bandwidth, other factors may limit performance. Those factors are explained in the next section of this chapter.

Given enough network bandwidth, disk arrays that are mirrored asynchronously also can be kept in lockstep, but the asynchronous mirroring application provides no guarantee of that. Instead, only a best effort is made to keep the secondary disk array updated by the primary array. Some IT managers may elect to perform batch updates only periodically (say, once per hour or once per day) rather than allow continuous updates.

For continuous updates, increasing the network bandwidth available to asynchronous mirroring applications reduces the amount of lag between the updates to the primary and the secondary arrays, so less data is lost should the primary array go offline. For periodic updates, an increase in the amount of bandwidth available simply shortens the backup time. That has become increasingly important as overnight backup time windows shrink due to globalization of commerce and the resulting need for around-the-clock access to data.

There are many ways to estimate the amount of bandwidth needed for metropolitan- and wide-area asynchronous mirroring, but a fairly good starting point is to size the network links for the peak hour. At times within that hour, there may be insufficient bandwidth to handle sporadic bursts of data, but for most applications, the bursts last only a few seconds, after which there is ample bandwidth again and the secondary array can catch up.

Assuming that the bursts are not sustained, the peak hour estimate can be used. This approach also can be used for applications in which there are sustained bursts, but which can tolerate the loss of a few minutes of data. In this case, it is okay for the secondary array to fall behind for extended periods of time.
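
When historical traffic data is available, picking out the peak hour is straightforward. The following Python sketch assumes a hypothetical list, hourly_bytes, holding the number of bytes written to the primary array in each consecutive one-hour interval. The function name and input layout are illustrative only and do not correspond to any particular array's reporting tools.

```python
def peak_hour_rate(hourly_bytes):
    """Return (hour_index, average bytes per second) for the busiest hour.

    hourly_bytes is a hypothetical list of bytes written to the primary
    array in each consecutive one-hour interval.
    """
    if not hourly_bytes:
        raise ValueError("need at least one hourly sample")
    # Index of the hour with the largest write volume.
    peak_index = max(range(len(hourly_bytes)), key=lambda i: hourly_bytes[i])
    # Average rate over that hour (3,600 seconds).
    return peak_index, hourly_bytes[peak_index] / 3600.0


if __name__ == "__main__":
    # Illustrative volumes only, not measurements from a real array.
    hour, rate = peak_hour_rate([36_000, 72_000, 18_000])
    print(f"peak hour index {hour}, average {rate} bytes/s")
```

The result is an average over the whole hour, so short bursts within that hour can still exceed it.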

If historical traffic patterns are not available for the primary array, then the activity rate for the peak hour can be estimated by using the procedure shown in Figure 10-3.

Figure 10-3. Estimating the amount of data to be mirrored during the peak hour.


Figure 10-4 shows a numerical example of an estimate of the activity rate for the peak hour.

Figure 10-4. Numerical example of a peak hour data activity estimate.


Once the data rate for the peak hour has been measured or estimated, the network bandwidth requirements can be calculated. As mentioned in the previous chapter, network bandwidth is measured in bits per second, not bytes per second, so care is needed to ensure the correct units are used. Also, some framing overhead is required to transport the data over the network, and that must be added to the bandwidth calculation, as shown in Figure 10-5.

Figure 10-5. Estimating the amount of traffic on the network links.

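
The unit conversion and overhead adjustment can be sketched in a few lines of Python. The function name and the default overhead of 10 percent are assumptions made for illustration, not values taken from Figure 10-5; the real framing overhead depends on the protocols and frame sizes in use.

```python
def link_traffic_mbps(peak_bytes_per_second, overhead_fraction=0.10):
    """Estimate link traffic in Mbps from a peak-hour data rate in bytes/s.

    overhead_fraction is an assumed allowance for framing overhead
    (0.10 means 10 percent); substitute a value measured for the
    actual protocol stack.
    """
    if peak_bytes_per_second < 0 or overhead_fraction < 0:
        raise ValueError("inputs must be non-negative")
    bits_per_second = peak_bytes_per_second * 8          # bytes -> bits
    with_overhead = bits_per_second * (1 + overhead_fraction)
    return with_overhead / 1_000_000                      # bits/s -> Mbps


if __name__ == "__main__":
    # Illustrative input: 10 million bytes per second during the peak hour.
    print(round(link_traffic_mbps(10_000_000), 3), "Mbps")
```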

Extending the results of the numerical example for calculating the peak hour, the network bandwidth estimate is shown in Figure 10-6.

Figure 10-6. Numerical example of a network link traffic estimate.


Some of the new multiprotocol storage switches and IP routers have a data compression facility, which reduces the amount of network bandwidth needed. For highly repetitive data, the compression ratio can be better than 10:1. Ratios for mixed storage data typically range from about 2:1 to 5:1, so an estimate of 3:1 is fairly conservative.

For the example shown in Figure 10-6, the assumption of a 3:1 data compression capability would reduce the bandwidth requirement from 384 Mbps to 128 Mbps, which could be handled safely by a standard OC-3c (155 Mbps) network link. Without data compression, a much more costly OC-12c (622 Mbps) network link would have been required for this application.
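
The effect of compression on link selection can be checked with a short Python sketch. It uses only the nominal link rates quoted above and ignores transport overhead and any headroom a designer might want, so it should be read as a rough illustration. The function and table names are hypothetical.

```python
# Nominal link rates in Mbps, as quoted in the text.
LINKS_MBPS = [("OC-3c", 155), ("OC-12c", 622)]


def smallest_link(required_mbps, compression_ratio=1.0, links=LINKS_MBPS):
    """Return (link_name, effective_mbps) for the smallest adequate link.

    compression_ratio of 3.0 represents 3:1 compression. link_name is
    None when no listed link is large enough.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    effective = required_mbps / compression_ratio
    # Links are listed from slowest to fastest, so the first fit is the smallest.
    for name, capacity in links:
        if effective <= capacity:
            return name, effective
    return None, effective


if __name__ == "__main__":
    print(smallest_link(384))        # no compression
    print(smallest_link(384, 3.0))   # 3:1 compression
```

With the 384 Mbps requirement from the example, the uncompressed case selects the OC-12c link and the 3:1 case selects the OC-3c link, matching the figures in the preceding paragraph.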

If, during the peak hour, there are sustained bursts of data, and the loss of a few minutes of data is unacceptable, then a better network bandwidth estimate may be needed. Fortunately, in situations where network designers know that there are sustained bursts, the knowledge most likely was derived from historical traffic patterns on the primary disk array, so that same traffic pattern information also can be manipulated to produce a better bandwidth requirement estimate.

For example, if the goal were to limit the data loss to no more than 10 seconds, the historical traffic pattern observations could be sliced into 10-second samples and the network link could be sized to handle the largest observed value. That may call for a link that has much more bandwidth and costs much more than the one that was derived from the peak hour calculations.
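
A minimal Python sketch of that slicing step follows. It assumes the historical data is available as a hypothetical list of bytes written per second; real monitoring tools may report at a different granularity. The function name is illustrative.

```python
def peak_window_mbps(bytes_per_second, window_seconds=10):
    """Return the highest average rate, in Mbps, over fixed-size windows.

    bytes_per_second is a hypothetical list with one entry per second.
    The samples are cut into consecutive, non-overlapping windows; a
    partial window at the end of the data is ignored.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    peak = 0.0
    for start in range(0, len(bytes_per_second), window_seconds):
        window = bytes_per_second[start:start + window_seconds]
        if len(window) < window_seconds:
            break  # skip the incomplete trailing window
        rate_mbps = sum(window) * 8 / window_seconds / 1_000_000
        peak = max(peak, rate_mbps)
    return peak


if __name__ == "__main__":
    # Illustrative trace: a 10-second burst between two quieter periods.
    trace = [1_000_000] * 10 + [5_000_000] * 10 + [2_000_000] * 10
    print(peak_window_mbps(trace), "Mbps")
```

The result excludes framing overhead and compression, which would be applied afterward in the same way as for the peak hour estimate.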

If the cost of the faster link is too great, a calculated risk can be taken as a compromise. More sophisticated statistical analysis could be used to determine, say, the 95th percentile value of the peak 10-second observations, and the necessary network bandwidth could be derived from that. In that case, the link would be expected to handle all but 5 percent of the peak data rates for any 10-second interval. Depending on the variability of the data rate, it's possible that the 95th percentile of the 10-second interval data rates could be covered by the bandwidth needed for the peak hour.
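
One simple way to obtain such a percentile is the nearest-rank method, sketched below in Python. The input is assumed to be a list of data rates already computed for each 10-second interval; the function name is hypothetical, and other percentile definitions that interpolate between values would give slightly different answers.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method.

    values is a non-empty list of per-interval data rates (any unit);
    pct is a number greater than 0 and at most 100.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    # Nearest rank: the smallest value with at least pct percent of
    # the observations at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Illustrative rates in Mbps for twenty 10-second intervals.
    rates = list(range(1, 21))
    print(percentile_nearest_rank(rates, 95), "Mbps")
```

Sizing the link to this value accepts that roughly 5 percent of the intervals will exceed the link's capacity.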

Other analysis tools can be used to produce bandwidth requirements estimates. For example, if the raw traffic data is available for the primary array, it can be relatively easy to use time series analysis to produce an estimate of the peak loading for various intervals of time. A basic moving average model can be used, adjusting the averaging period to correspond to the maximum amount of data loss that can be tolerated and taking the peak values produced by the model.
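
As an illustration, a basic moving average over per-second samples can be computed with a running sum. The sketch below is in Python and assumes a hypothetical list of bytes written per second; the function name is illustrative. Unlike fixed, non-overlapping slices, the window slides one sample at a time, so a burst that straddles a slice boundary is still captured.

```python
def peak_moving_average(bytes_per_second, period_seconds):
    """Return the peak of a simple moving average, in bytes per second.

    bytes_per_second is a hypothetical list with one entry per second;
    period_seconds is the averaging period, chosen to match the maximum
    tolerable data loss.
    """
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    if len(bytes_per_second) < period_seconds:
        raise ValueError("not enough samples for one averaging period")
    # Sum of the first window, then slide one sample at a time.
    window_sum = sum(bytes_per_second[:period_seconds])
    peak_sum = window_sum
    for i in range(period_seconds, len(bytes_per_second)):
        window_sum += bytes_per_second[i] - bytes_per_second[i - period_seconds]
        peak_sum = max(peak_sum, window_sum)
    return peak_sum / period_seconds


if __name__ == "__main__":
    # A two-second burst that straddles a boundary between fixed slices.
    print(peak_moving_average([0, 10, 10, 0], 2))
```

In the small example, fixed two-second slices would each average 5 bytes per second, while the moving average finds the true two-second peak of 10.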

In general, as the sample intervals become smaller (one second or subsecond), bandwidth models for asynchronous mirroring become virtually identical to bandwidth models for synchronous mirroring. However, since there is no possibility of data loss for synchronous mirroring, the amount of bandwidth allocated to those applications affects instead the amount of time that the servers have to wait for the data to be written remotely before they can proceed to the next write operation. That waiting time also is affected by the latency of the network, which is examined in the next section.