Virtualization Implementations: Host, Storage Subsystem, and Network


Virtualization is perhaps the most abused buzzword in the storage industry. It is often vaguely defined in trade magazine articles, touted by vendors as the storage cure-all and, not surprisingly, seldom understood by consumers. Even SNIA's published definition is so abstract that it seems not to be about storage at all. That said, SNIA has published two storage virtualization tutorials that dispel some of the vagueness around this topic. A simple, down-to-earth definition of virtualization is the act of abstracting physical resources into logical resources. This definition might not be considered accurate or comprehensive by all who read this book, but it is useful as an introduction to the concept of storage virtualization.

Storage can be virtualized in two ways: in-band or out-of-band. In-band techniques insert a virtualization engine into the data path. Possible insertion points are an HBA, a switch, a specialized server, a storage array port, or a storage array controller. All I/O traffic passes through the in-band virtualization engine. Out-of-band techniques involve proprietary host agents responsible for redirecting initial I/O requests to a metadata/mapping engine, which is not in the data path. If an I/O request is granted, all subsequent I/O traffic associated with that request goes directly from the host to the storage device. Opinions about which approach is best are strongly held, but each has its pros and cons. Hybrid solutions are also available. For more information about the various types of virtualization solutions, consult SNIA's storage virtualization tutorials.
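The difference between the two data paths can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any real product: the class names (Disk, InBandEngine, MetadataEngine, HostAgent), the volume name "vol0", and the host name "host-a" are all hypothetical, and the host agent is assumed to cache a granted mapping per volume, which simplifies the per-request grant described above.

```python
class Disk:
    """Toy block device: maps block numbers to data."""

    def __init__(self):
        self.blocks = {}

    def write(self, block, data):
        self.blocks[block] = data

    def read(self, block):
        return self.blocks.get(block)


class InBandEngine:
    """Sits in the data path, so every read and write passes through it."""

    def __init__(self, mapping):
        self.mapping = mapping  # logical volume name -> Disk
        self.ios_seen = 0

    def write(self, volume, block, data):
        self.ios_seen += 1
        self.mapping[volume].write(block, data)

    def read(self, volume, block):
        self.ios_seen += 1
        return self.mapping[volume].read(block)


class MetadataEngine:
    """Stays out of the data path and only answers mapping requests."""

    def __init__(self, mapping, allowed_hosts):
        self.mapping = mapping
        self.allowed_hosts = allowed_hosts
        self.requests_seen = 0

    def resolve(self, host, volume):
        self.requests_seen += 1
        if host not in self.allowed_hosts:
            raise PermissionError(f"{host} may not access {volume}")
        return self.mapping[volume]


class HostAgent:
    """Host-side agent: asks the metadata engine once, then does I/O directly."""

    def __init__(self, host, engine):
        self.host = host
        self.engine = engine
        self._granted = {}  # assumption: a grant is cached per volume

    def _device(self, volume):
        if volume not in self._granted:
            self._granted[volume] = self.engine.resolve(self.host, volume)
        return self._granted[volume]

    def write(self, volume, block, data):
        self._device(volume).write(block, data)

    def read(self, volume, block):
        return self._device(volume).read(block)


# In-band: the engine handles all three writes.
in_band = InBandEngine({"vol0": Disk()})
for block in range(3):
    in_band.write("vol0", block, b"data")
print(in_band.ios_seen)  # 3

# Out-of-band: the engine handles one mapping request for three writes.
engine = MetadataEngine({"vol0": Disk()}, allowed_hosts={"host-a"})
agent = HostAgent("host-a", engine)
for block in range(3):
    agent.write("vol0", block, b"data")
print(engine.requests_seen)  # 1
```

The counters make the trade-off visible: the in-band engine sees every I/O and can become a bottleneck, whereas the out-of-band engine sees only the initial request but depends on an agent installed on each host.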

You can use many techniques to virtualize storage. An example of a file-oriented technique is to present a portion of a local volume to clients on a network. The clients see an independent volume and do not know it is part of a larger volume. Another file-oriented technique grafts multiple volumes together to present a single volume to local applications. All UNIX and Linux file systems use this technique to create a unified file name space spanning all volumes. Virtualization also applies to block-oriented environments. With striping techniques, the individual blocks that compose a logical volume are striped across multiple physical disks. These are sometimes referred to as stripe sets. Alternatively, contiguous blocks can be grouped into disjoint sets. The sets can reside on different physical disks and are called extents when concatenated to create a single logical volume. Extents are written serially so that all blocks in a set are filled before writing to the blocks of another set. Sometimes a combination of striping and extent-based allocation is used to construct a logical volume. Figure 1-10 illustrates the difference between a stripe set and an extent.

Figure 1-10. Stripe Set Versus Extent
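The two block layouts can also be expressed as address-mapping functions. The following Python sketch is illustrative only: the function names stripe_map and extent_map and the stripe_unit parameter are assumptions introduced here, and real volume managers add metadata, alignment, and error handling that are omitted.

```python
def stripe_map(logical_block, num_disks, stripe_unit=1):
    """Map a logical block to (disk, physical block) in a stripe set.

    Consecutive stripe units rotate across the disks, so neighboring
    logical blocks land on different physical disks.
    """
    stripe_number, offset = divmod(logical_block, stripe_unit)
    disk = stripe_number % num_disks
    row = stripe_number // num_disks
    return disk, row * stripe_unit + offset


def extent_map(logical_block, extent_sizes):
    """Map a logical block to (extent, block within extent).

    Extents are concatenated, so the first extent is filled completely
    before the next one is used.
    """
    if logical_block < 0:
        raise ValueError("logical block must be non-negative")
    remaining = logical_block
    for extent, size in enumerate(extent_sizes):
        if remaining < size:
            return extent, remaining
        remaining -= size
    raise ValueError("logical block is beyond the end of the volume")


# Six logical blocks on three striped disks versus two three-block extents.
print([stripe_map(b, 3) for b in range(6)])
# [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
print([extent_map(b, [3, 3]) for b in range(6)])
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

In the striped layout, the first three logical blocks touch all three disks; in the extent layout, they all land in the first extent.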


The benefits of abstraction are many, but the most compelling include reduced host downtime for operational changes, ease of capacity expansion, the ability to introduce new physical device technologies without concern for operating system or application backward compatibility, a broader choice of migration techniques, advanced backup procedures, and new disaster recovery solutions. There are several ways to virtualize physical storage resources, and three general categories of implementations:

  • Host-based

  • Storage subsystem-based

  • Network-based

Each has its pros and cons, though network-based implementations seem to have more pros than cons for large-scale storage environments. Enterprise-class virtualization products have appeared in the market recently after years of delays. However, one could argue that enterprise-class virtualization has been a reality for more than a decade if the definition includes redundant array of inexpensive disks (RAID) technology, virtual arrays that incorporate sophisticated virtualization functionality beyond RAID, or host-based volume management techniques. One can think of the new generation of enterprise-class virtualization as the culmination and integration of previously separate technologies such as hierarchical storage management (HSM), volume management, disk striping, storage protocol conversion, and so on. As the new generation of virtualization products matures, the ability to seamlessly and transparently integrate Fibre Channel and iSCSI networks, SCSI and ATA storage subsystems, disk media, tape media, and so on, will be realized. Whether that integration is used effectively is another story, and might rely heavily on policy-based storage management applications. Automated allocation and recovery of switch ports, logical unit numbers (LUNs), tape media, and the like will become increasingly important as advanced virtualization techniques become possible.

Host Implementations

Host-based virtualization products have been available for a long time. RAID controllers for internal DAS and just-a-bunch-of-disks (JBOD) external chassis are good examples of hardware-based virtualization. RAID can be implemented without a special controller, but the software that performs the striping calculations often places a noticeable burden on the host CPU. Linux now natively supports advanced virtualization functionality such as striping and extending via its logical volume manager (LVM) utility. Nearly every modern operating system on the market natively supports mirroring, which is a very simple form of virtualization. Mirroring involves block duplication on two physical disks (or two sets of physical disks) that appear as one logical disk to applications. The software virtualization market offers many add-on products from non-operating system vendors.
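Mirroring, as described above, can be sketched as a logical disk that duplicates each write to two members and reads from whichever member is still available. This Python model is a simplified illustration: the class name MirroredDisk, its fail method, and the 512-byte default block size are assumptions made for the example, and resynchronization after a failure is not modeled.

```python
class MirroredDisk:
    """One logical disk backed by two in-memory member disks."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        # Each member is an independent list of zero-filled blocks.
        self.members = [
            [bytes(block_size)] * num_blocks,
            [bytes(block_size)] * num_blocks,
        ]
        self.failed = set()

    def fail(self, member_index):
        """Mark one member as failed (simulates a disk failure)."""
        self.failed.add(member_index)

    def write(self, block, data):
        """Duplicate the block on every surviving member."""
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        for index, member in enumerate(self.members):
            if index not in self.failed:
                member[block] = data

    def read(self, block):
        """Return the block from the first surviving member."""
        for index, member in enumerate(self.members):
            if index not in self.failed:
                return member[block]
        raise IOError("all mirror members have failed")


mirror = MirroredDisk(num_blocks=8, block_size=4)
mirror.write(2, b"abcd")
mirror.fail(0)          # lose the first member
print(mirror.read(2))   # b'abcd' (served by the second member)
```

The application sees a single logical disk throughout; only the mirror layer knows that two physical copies exist.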

Host-based implementations really shine in homogeneous operating system environments or companies with a small number of hosts. In large, heterogeneous environments, the number of hosts and variety of operating systems increase the complexity and decrease the efficiency of this model. The host-oriented nature of these solutions often requires different software vendors to be used for different operating systems. Although this prevents storage from being virtualized in a consistent manner across the enterprise, storage can be virtualized in a consistent manner across storage vendor boundaries for each operating system.

Large-scale storage consolidation is often seen in enterprise environments, which results in a one-to-many relationship between each storage subsystem and the hosts. Host-based virtualization fails to exploit the centralized storage management opportunity and imposes a distributed approach to storage management activities such as capacity tracking, LUN allocation and configuration, LUN recovery, and usage trending. On a positive note, host-based implementations make it relatively easy to fail over or migrate from one storage subsystem to another, although some storage vendors restrict this capability through the imposition of proprietary host-based failover software.

Storage Subsystem Implementations

The current generation of storage subsystems implements virtualization techniques that can be described as RAID on steroids. One or more controllers in the subsystem manage the physical devices within the subsystem and perform the striping, mirroring, and parity calculation functions in RAID configurations. These controllers have processing power and memory capacity on par with the servers that use the subsystem. In some cases, the controller actually is a high-end personal computer (PC) running specialized operating system software optimized to perform storage-related functions. The subsystem controls the presentation (masking and mapping) of LUNs.
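LUN presentation can be pictured as a per-initiator lookup table held by the subsystem. In the Python sketch below, mapping associates a host-visible LUN number with an internal volume, and masking means that an initiator sees only the entries created for it. The class name Subsystem, its methods, and the identifiers "host-a", "host-b", "vol-17", and "vol-42" are hypothetical; actual subsystems expose this through vendor-specific management interfaces.

```python
class Subsystem:
    """Toy model of LUN mapping and masking in a storage subsystem."""

    def __init__(self):
        # initiator id -> {host-visible LUN number: internal volume name}
        self._views = {}

    def present(self, initiator, lun, volume):
        """Map an internal volume to a LUN number for one initiator."""
        self._views.setdefault(initiator, {})[lun] = volume

    def report_luns(self, initiator):
        """Return the LUN numbers visible to this initiator (masking)."""
        return sorted(self._views.get(initiator, {}))

    def resolve(self, initiator, lun):
        """Translate an initiator's LUN number to the internal volume."""
        try:
            return self._views[initiator][lun]
        except KeyError:
            raise LookupError(f"LUN {lun} is not presented to {initiator}") from None


array = Subsystem()
array.present("host-a", 0, "vol-17")
array.present("host-b", 0, "vol-42")

print(array.report_luns("host-a"))   # [0]
print(array.resolve("host-b", 0))    # vol-42
print(array.report_luns("host-c"))   # []
```

Both hosts address LUN 0, yet each reaches a different internal volume, and a host with no entries sees nothing at all.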

One of the benefits of this model is independence from host operating systems, which allows storage to be virtualized in a consistent manner across all host operating systems. However, this benefit might not extend across storage vendor boundaries. The proprietary nature of most of these solutions prevents storage from being virtualized in a consistent manner across the enterprise.

Another benefit is improved centralization of various management tasks as compared to host-based implementations. This is because there are typically far fewer subsystems than there are hosts. Subsystem-based implementations also enable advanced mirroring and data backup techniques that can be completely transparent to the attached hosts and can be completed without any effect on host CPU or memory resources. One drawback of these solutions is the acquisition cost as compared to host-based implementations with JBOD. Perhaps the most significant drawback is the proprietary nature of these solutions, which can be very costly to enterprises over the life of the subsystem. Much of the functionality of subsystem-based implementations is confined within each subsystem even in homogeneous environments. For example, it is typically not possible to stripe a RAID volume across physical devices in separate subsystems from a single vendor. However, recent advances in distributed cluster storage architecture and advanced virtualization techniques give new promise to such possibilities.

Network Implementations

The widespread adoption of switched FC-SANs has enabled a new virtualization model. Implementing virtualization in the network offers some advantages over subsystem and host implementations. Relative independence from proprietary subsystem-based solutions and host operating system requirements enables the storage administrator to virtualize storage consistently across the enterprise. A higher level of management centralization is realized because there are fewer switches than storage subsystems or hosts. Logical disks can span multiple physical disks in separate subsystems. Other benefits derive from the transparency of storage as viewed from the host, which drives heterogeneity in the storage subsystem market and precipitates improved interoperability. However, some storage vendors are reluctant to adapt their proprietary host-based failover mechanisms to these new network implementations, which might impede enterprise adoption.




Storage Networking Protocol Fundamentals
Storage Networking Protocol Fundamentals (Vol 2)
ISBN: 1587051605
Year: 2007
Pages: 196
Authors: James Long
