7.6 Optimizing storage

For the last two decades storage space has been both limited and relatively expensive; hard disks, for example, have been largely restricted to the MB range on all but very large mainframes. However, memory and media costs have continued to fall, to the point where today's laptop computers are available with over 80 GB of disk storage and over 64 MB of main memory. With front- and back-office applications generating huge amounts of critical data, storage management has become a high priority for enterprise administrators and network designers.

Traditionally, data have been stored locally on application, file, and database servers. On these systems, there is a loose hierarchy of storage components, in terms of expense, speed, and the average volume of data stored. These components can be ranked in descending order of both speed and cost per MB, as follows:

CPU registers
Cache memory
Main memory
Extended memory
Fast hard disk
Slow (or compressed) hard disk
Tape or optical library
Vaulted tape (offline)

Clearly, the storage components with the fastest access time are typically the most expensive. To optimize data access time, the general approach is to place the most volatile (frequently accessed) data into the higher-speed storage components. However, cost constraints make it highly unlikely that all useful data could be placed in memory, so the current focus is on high-speed bulk disk storage solutions for the majority of applications. In fact, a combination of fast disks, compressed disks, and an automated tape library is commonly employed to provide a complete storage solution.
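The placement principle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tier names are taken from the list above, but the capacities, dataset names, sizes, and access rates are all hypothetical, and the greedy strategy shown is one simple possibility rather than a prescribed algorithm.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    capacity_mb: int  # hypothetical capacity, for illustration only


def place_by_access_rate(datasets, tiers):
    """Greedy placement: the hottest data goes to the fastest tier with room.

    datasets: list of (name, size_mb, accesses_per_day) tuples
    tiers:    list of Tier objects, ordered fastest first
    """
    free = [t.capacity_mb for t in tiers]
    placement = {}
    # Consider the most frequently accessed data sets first.
    for name, size_mb, _rate in sorted(datasets, key=lambda d: d[2], reverse=True):
        for i, tier in enumerate(tiers):
            if size_mb <= free[i]:
                free[i] -= size_mb
                placement[name] = tier.name
                break
        else:
            placement[name] = None  # does not fit in any tier
    return placement


# Hypothetical tiers and data sets.
tiers = [Tier("main memory", 64), Tier("fast disk", 1000), Tier("tape library", 100000)]
datasets = [("orders", 50, 900), ("catalog", 40, 500), ("logs-1999", 5000, 1)]
placement = place_by_access_rate(datasets, tiers)
```

With these example figures, the busiest data set fits in main memory, the next one spills over to fast disk, and the large, rarely read log archive ends up in the tape library.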

7.6.1 Disk compression techniques

Disk compression techniques are effective, useful, and widely available. Compression increases the apparent capacity of disks by encoding the data to take up less physical space. Disk compression may be performed in hardware (by a disk controller) or in software. In the latter case compression consumes system CPU resources and is normally done in the background. Current techniques achieve an average compression ratio of 2:1, and up to 4:1 depending on the type of data. Although compression and decompression are transparent to applications, compressed disks are generally somewhat slower than noncompressed disks, especially for applications that perform sequential writes. In effect, disk compression allows the network designer to trade performance for storage at a lower cost per megabyte. When deploying disk compression, it is important to establish the performance bounds through thorough testing.
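The dependence of the compression ratio on the type of data is easy to observe. The sketch below uses Python's standard zlib module (DEFLATE) purely as a stand-in; it is not the algorithm used by any particular disk controller or compression driver, and the sample data are invented for the purpose of the example.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Return original size divided by compressed size."""
    return len(data) / len(zlib.compress(data))


# Two invented samples: highly repetitive records and pseudo-random bytes.
rng = random.Random(0)  # fixed seed so the example is repeatable
text_like = b"customer record: name, address, balance\n" * 1000
random_like = bytes(rng.getrandbits(8) for _ in range(40000))

text_ratio = compression_ratio(text_like)
random_ratio = compression_ratio(random_like)

print(f"repetitive data: {text_ratio:.1f}:1")
print(f"random data:     {random_ratio:.2f}:1")
```

Repetitive data compress very well, whereas data that are already random (or already compressed, such as many image and archive formats) gain essentially nothing. Real workloads fall somewhere in between, which is why testing with representative data is advisable.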

7.6.2 Hierarchical Storage Management (HSM)

In practice the great majority of online data is used infrequently. Given that storage space is at a premium and archiving is a lengthy process that often renders the network unusable for several hours, it makes sense to organize data in such a way that space is optimized and to automate backup procedures so that they are both transparent and unobtrusive to network users. Any new methodology must also address the basic user requirements for data: critical data need to be available quickly, data that are required to be available for extended periods must be archived, and mechanisms must be in place to prevent data loss. Hierarchical Storage Management (HSM) is a powerful method of classifying and managing data according to use patterns in order to better optimize storage. In broad terms three classes of data can be defined, as follows:

Level 1: Online material—where data must be immediately accessible on a 24/7 basis.
Level 2: Near-line material—which must be accessed periodically, but not necessarily immediately or 24/7.
Level 3: Offline material—historical archives that are accessed infrequently but may be required to comply with legal, business, or security policy rules.
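As a rough illustration of how such a classification might be automated, the following Python sketch assigns a level based on the time since a file was last accessed. The function name and both thresholds are assumptions made for this example; in a real deployment the policy would reflect business and legal requirements rather than file age alone.

```python
def hsm_level(days_since_access: int,
              online_days: int = 30,
              nearline_days: int = 365) -> int:
    """Map a file's age since last access to an HSM level (1, 2, or 3).

    The thresholds are hypothetical defaults, not recommended values.
    """
    if days_since_access < 0:
        raise ValueError("days_since_access must be non-negative")
    if days_since_access <= online_days:
        return 1  # online: keep on the server
    if days_since_access <= nearline_days:
        return 2  # near-line: optical jukebox or tape library
    return 3      # offline: vaulted archive


levels = [hsm_level(d) for d in (0, 30, 31, 366)]
```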

Level 1 material is normally held directly on the server. Level 2 material is migrated to a storage device such as a Magneto-Optical (MO) disk jukebox or tape library. Level 3 material is typically held in a tape storage library and stored offline in a vault (ideally fireproof). In practice both tape and optical libraries offer data recovery on the order of a few minutes, although data transfer rates for tape tend to be higher than for optical at present. An HSM server uses policy rules set by the network administrator to decide how data should be made available, and data are migrated transparently to the appropriate storage mechanism. The network administrator should ideally be able to set high watermarks so that critical online storage is managed dynamically, with less frequently accessed data migrated off the server in peak periods. Data may be statistically monitored to allow frequently used data to automatically move to faster storage and seldom used data to move to slower storage. As a general rule, data are not migrated until they are backed up.
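The watermark mechanism can be sketched as follows. This is a minimal Python illustration, not the behavior of any specific HSM product: the FileInfo fields, the 90 percent high watermark, and the 70 percent low watermark (the level at which migration stops) are all assumptions. The sketch does honor the rule that data are not migrated until they have been backed up.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    size_mb: int
    days_since_access: int
    backed_up: bool


def select_for_migration(files, capacity_mb, high_pct=90, low_pct=70):
    """Pick files to migrate once usage exceeds the high watermark.

    The least recently accessed files are chosen first, files without a
    backup are skipped, and selection stops when projected usage falls
    to the (assumed) low watermark.
    """
    used = sum(f.size_mb for f in files)
    if used * 100 <= capacity_mb * high_pct:
        return []  # below the high watermark: nothing to do
    target = capacity_mb * low_pct / 100
    chosen = []
    for f in sorted(files, key=lambda f: f.days_since_access, reverse=True):
        if used <= target:
            break
        if not f.backed_up:
            continue  # never migrate data that have not been backed up
        chosen.append(f.name)
        used -= f.size_mb
    return chosen


# Hypothetical server volume: 950 MB used out of 1,000 MB.
files = [
    FileInfo("q1-report.doc", 300, 200, True),
    FileInfo("old-scan.tif", 400, 400, False),
    FileInfo("payroll.db", 200, 1, True),
    FileInfo("archive.zip", 50, 90, True),
]
to_migrate = select_for_migration(files, capacity_mb=1000)
```

In this example the oldest file is skipped because it has no backup, and migrating the next oldest file is enough to bring usage below the low watermark.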

Some HSM systems offer file tracking, which is integrated with directory services and operating system services (such as Novell's NDS). This enables file pointers to be updated so that they remain consistent regardless of the data location. Other systems provide file stubs, which are managed in a separate database. IBM OS/400, for example, supports Media and Storage Extensions (MSE) and dynamic data retrieval [IBM]. From the user's perspective all files appear to be available; some just take longer to access than others, since they are migrated from different storage devices to the server on demand. In effect, HSM operates like an intelligent caching system. Using HSM it is not unusual to reduce server storage demands to less than 5 percent of previous storage requirements. Aside from optimizing and consolidating precious server storage space, this data management technique can dramatically reduce backup times, which in turn increases network availability.
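The stub approach and its cache-like behavior can be illustrated with a small in-memory model. The class and method names below are hypothetical and do not correspond to any vendor's interface; dictionaries stand in for the server disk, the near-line device, and the stub database.

```python
class StubbedStore:
    """Toy model of HSM file stubs with transparent recall on demand."""

    def __init__(self):
        self.online = {}    # path -> data held on the server
        self.nearline = {}  # path -> data held on slower storage
        self.stubs = {}     # stub database: path -> current location
        self.recalls = 0    # number of on-demand recalls performed

    def write(self, path, data):
        self.online[path] = data
        self.stubs[path] = "online"

    def migrate(self, path):
        """Move a file to near-line storage, leaving only its stub behind."""
        self.nearline[path] = self.online.pop(path)
        self.stubs[path] = "nearline"

    def read(self, path):
        """Return file data, recalling it to the server first if necessary."""
        if self.stubs[path] == "nearline":
            self.online[path] = self.nearline.pop(path)
            self.stubs[path] = "online"
            self.recalls += 1  # this access was slower for the user
        return self.online[path]


store = StubbedStore()
store.write("/reports/1998.doc", b"annual report")
store.migrate("/reports/1998.doc")
first = store.read("/reports/1998.doc")   # triggers a recall
second = store.read("/reports/1998.doc")  # served directly from the server
```

The caller uses the same path throughout; only the first read after migration pays the recall cost.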

7.6.3 NAS and SAN storage strategies

Currently there are three main ways in which bulk storage is implemented, as follows:

Server attached storage—Disk drives and other persistent storage media are installed directly in application, file, and database servers, typically via interfaces such as IDE and SCSI. The traditional way of increasing storage is to add another disk or a larger server. This legacy solution has served the industry well for many years but does not scale and increases the Total Cost of Ownership (TCO). Since storage is not managed holistically at the enterprise level, there is likely to be substantial spare capacity that is effectively wasted.
Network Attached Storage (NAS)—NAS systems connect to the network using traditional LAN protocols such as Ethernet and TCP/IP. The system typically has an IP and/or other network address. NAS systems include intelligent disk arrays, online tape backup devices, or Just a Bunch of Disks (JBOD) connected to a network controller. See Figure 7.16(a).
Storage Area Networks (SANs)—SANs separate storage devices from processing devices (servers or mainframes) and connect them via dedicated, high-speed links such as Enterprise Systems Connection (ESCON), Fibre Connection (FICON) [45], Fibre Channel, and SCSI. See Figure 7.16(b).

Figure 7.16: (a) Network Attached Storage (NAS), and (b) Storage Area Network (SAN) topologies.

Note in Figure 7.16 that although the NAS network is traditionally easier to manage and deploy, all storage-related traffic shares the LAN with user traffic, consuming valuable bandwidth and degrading overall network performance. In the SAN topology all storage-related traffic is offloaded to a private network, improving both data and network access speeds.

With the increased use of multimedia and the reliance on mission- and business-critical applications, such as e-business, Customer Relationship Management (CRM), Sales Force Automation (SFA), and Enterprise Resource Planning (ERP), large enterprises are now deploying bulk storage in the terabyte range. Storage requirements for these large organizations are predicted to increase to tens and even hundreds of terabytes over the next few years. In order to facilitate high availability at this level, application and database servers are typically located at different sites, with all critical data mirrored between them. To provide scalability, load balancing is configured across the cluster, with high-speed optical fiber between sites.

7.6.4 Optical Storage Networks (OSN)

Current wide area network solutions provided by legacy operators are perceived to be bandwidth poor, very expensive, and slow to deploy, particularly as viewed by large enterprises. Companies are either suffering these inadequacies or installing their own private fiber links between sites. Neither situation is sustainable, because the demands for bandwidth are escalating, and the need to manage, optimize, and consolidate storage over this infrastructure represents a significant cost and maintenance burden for many organizations.

The potential of optical networks has been predicted for years, and with the emergence of new Dense Wavelength Division Multiplexing (DWDM) equipment that is both protocol and bit rate independent and the substantial increase in fiber-optic cable deployments, there is now a real foundation for a high-performance network infrastructure. Optical Storage Networks (OSN) employ DWDM technology as a wide area transport backbone to connect high-speed servers and storage components. This enables organizations to consolidate ESCON, Fibre Channel, and 100/1,000-Mbps Ethernet circuits onto the same optical fiber (by multiplexing feeds at different wavelengths). OSNs also offer the potential to aggregate voice, data, and storage networking traffic on a single, high-bandwidth, fiber-optic infrastructure that spans campus, metropolitan, and intercity networks. It is likely that this will be offered as a managed service through a combination of Storage Service Providers (SSPs) and Optical Service Providers (OSPs).
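The consolidation idea, in which each client circuit is carried on its own wavelength over a shared fiber, can be pictured with a short Python sketch. The channel labels are hypothetical, and the circuit rates are nominal, rounded figures included only to make the example concrete; because DWDM transport is protocol and bit rate independent, the assignment itself does not depend on them.

```python
def assign_wavelengths(circuits, channels):
    """Give each client circuit its own wavelength channel on one fiber.

    circuits: list of (name, nominal_rate_mbps) tuples
    channels: list of available wavelength channel labels
    """
    if len(circuits) > len(channels):
        raise ValueError("not enough wavelength channels for all circuits")
    return {name: channel for (name, _rate), channel in zip(circuits, channels)}


# Nominal, rounded rates in Mbps (illustrative assumptions).
circuits = [
    ("ESCON", 200),
    ("Fibre Channel", 1000),
    ("Fast Ethernet", 100),
    ("Gigabit Ethernet", 1000),
]
channels = [f"lambda-{i}" for i in range(1, 9)]  # hypothetical 8-channel system

plan = assign_wavelengths(circuits, channels)
total_mbps = sum(rate for _name, rate in circuits)
```

Four otherwise separate circuits share one fiber pair in this example, leaving four channels free for future growth.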

The evolution of OSNs and increased deployment of Fibre Channel are likely to have a significant impact on enterprise storage strategies over the next decade. As such, network planners should pay particular attention to the design of Fibre Channel SANs to ensure that they have a clear migration path to OSN in the future. Once OSNs are deployed, organizations may begin to consider the possibilities of extending the advantages of optical networking to traditional data network applications. Since an OSN is effectively a high-speed private WAN backbone, it could, for example, be used to support applications such as distributed server clustering (server-to-server protocols could be directed across the OSN rather than in-band). OSNs might also be useful for out-of-band management applications; this would, for example, enable the network to be monitored and configured even if the main backbone is offline.