7.4 Storage technologies overview


7.4.1 Small computer systems interface (SCSI)

When vendors introduced the first RAID controllers into the PC server marketplace in 1989, their drive technology of choice was IDE. At the time, IDE was the best choice for several reasons. In the preceding section, I pointed out that the objective of RAID technology was to increase performance and reliability while reducing cost. The SCSI standards were only beginning to solidify (the SCSI-1 specification was available), SCSI drives for the PC server market did not yet offer a good balance of performance, reliability, and cost, and supplies of SCSI drives had not been ramped up. IDE drives, on the other hand, were in plentiful supply because of their popularity in desktop computers, so vendors chose IDE technology for the first RAID controllers. SCSI did not lag behind for long, however. Drive manufacturers and customers quickly pressured vendors to switch to SCSI-based RAID technology, and IDE was relegated to the desktop and portable computer market, where we still see IDE and its evolutions such as Ultra DMA and Ultra ATA. As drive manufacturers ramped up production and customers and vendors demanded SCSI drives, SCSI technology surpassed IDE, and SCSI-based RAID controllers began to emerge from the leading server vendors and third parties. Overall, SCSI provides many benefits over IDE, but these could not be realized until the early 1990s, when vendors like Compaq and IBM first began to transition their RAID controllers to SCSI technology.

SCSI technology, in my opinion, has evolved as a specification rather than as a standard. The difference is that a standard requires complete compatibility and conformance, whereas a specification is more of a menu of technology implementation options. Initially, one of the problems with SCSI technology was the lack of compatibility between different vendors' SCSI products. A drive from vendor A would not work with a controller from vendor B. This was mainly because, when the SCSI specifications emerged, vendors simply selected which features they wanted to include in their products. In my experience as a network manager and IT guy back then, the fact that SCSI products were not interoperable led to slow adoption of SCSI products. Once vendors began to see the impact of their differing implementations of the SCSI specifications, they made attempts to design a bit more commonality into their products. Since an in-depth discussion of SCSI technology is beyond the scope of this book, Table 7.9 provides an overview of the SCSI technology evolution over the last 15-plus years.

Table 7.9: The Evolution of SCSI Specifications

SCSI Specification   | Throughput (MB/s) | Bus Width (bits) | Single-Ended Bus Length (m) | Differential Bus Length (m) | LVD Bus Length (m) | Maximum Number of Devices
SCSI-1               | 5                 | 8                | 6                           | 25                          | N/A                | 8
SCSI-2 (fast SCSI)   | 10                | 8                | 3                           | 25                          | N/A                | 8
Fast wide SCSI       | 20                | 16               | 3                           | 25                          | N/A                | 16
Ultra SCSI           | 20                | 8                | 3                           | 25                          | N/A                | 4 or 8
Wide ultra SCSI      | 40                | 16               | 3                           | 25                          | N/A                | 4, 8, or 16
Ultra2 SCSI          | 40                | 8                | N/A                         | 25                          | 12                 | 8
Wide ultra2 SCSI     | 80                | 16               | N/A                         | 25                          | 12                 | 16
Ultra160 SCSI        | 160               | 16               | N/A                         | 25                          | 12                 | 16
Ultra320 SCSI        | 320               | 16               | N/A                         | 25                          | 12                 | 16
Serial-attached SCSI | 320               | N/A              | N/A                         | N/A                         | N/A                | 4,032

Source: SCSI Trade Association.

The main benefits of SCSI over IDE technology are performance and design flexibility. SCSI technology is essentially a network protocol for disks. SCSI devices are attached to a channel where they can communicate with the host controller independently of one another. This allows for overlapped or interleaved processing of commands and I/O requests between the controller and all devices on the SCSI channel. For applications like Exchange Server and other database applications, this provides maximum I/O performance for the small 2- to 8-KB I/Os that are common for these types of applications. In a RAID environment, SCSI technology also allows vendors more design flexibility than IDE. While SCSI initially did not compete with other drive technologies in terms of reliability and cost, it now measures up well. Most SCSI drives on the market today have MTBF ratings of 800,000 hours or more, well beyond what other technologies are touting. With regard to cost, you need only look in your local newspaper or at an on-line computer shopping Web site to see how cost effective SCSI technology has become. SCSI vendors are now in the midst of a rush to yet another enhancement to the SCSI specification: serial-attached SCSI (SAS). This time the goal is a standard that leverages the history and experience of SCSI but gets beyond the attachment limits of traditional parallel-attached SCSI. SAS promises an end to the parallel SCSI limit of 16 devices per bus, increasing it to 4,032 devices per connection scheme, and it also plans to offer transfer speeds up to 3 Gbps, rivaling current FC technologies. While I want to avoid arguments of SCSI versus IDE or Fibre Channel, I think that the current state of technology and the market will demonstrate that all have a place in the enterprise.
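To make the value of overlapped command processing concrete, the following back-of-the-envelope Python sketch compares a channel that serializes commands with one that lets several disks work concurrently. The service and transfer times are assumed values for illustration only, not measurements of any particular drive or bus.

```python
# A minimal sketch of why overlapped command processing matters.
# Assumption: three disks on one channel, each taking 8 ms to service a request,
# with the shared channel occupied only briefly per request for the data transfer.

SERVICE_MS = 8.0       # per-request service time at the disk (seek + rotation + transfer)
BUS_TRANSFER_MS = 0.5  # time the shared channel is actually occupied per request
REQUESTS_PER_DISK = 100
DISKS = 3

# Serialized: every request holds the channel for the full service time.
serialized_ms = DISKS * REQUESTS_PER_DISK * SERVICE_MS

# Overlapped: disks service requests concurrently; the channel is held only for transfers,
# so total time is bounded by the busiest disk or the bus, whichever saturates first.
per_disk_ms = REQUESTS_PER_DISK * SERVICE_MS
bus_ms = DISKS * REQUESTS_PER_DISK * BUS_TRANSFER_MS
overlapped_ms = max(per_disk_ms, bus_ms)

print(f"serialized: {serialized_ms:.0f} ms, overlapped: {overlapped_ms:.0f} ms")
# serialized: 2400 ms, overlapped: 800 ms -- roughly a 3x improvement with 3 independent disks
```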

7.4.2 Fibre Channel

As SCSI vendors reach for higher data rates in SCSI devices, Fibre (yes, for differentiation, "Fibre" is spelled this way by the ANSI standard) Channel promises features that SCSI will not be able to provide. With Exchange 5.5, we see average information store sizes in the 20- to 50-GB range. With Exchange 2000/2003, we have seen total server storage increase greatly due to enhanced storage allocation techniques and flexibility. As Exchange information stores grow, I am afraid SCSI will soon fall short as a large-scale Exchange Server storage technology. This will be proven on three fronts: performance, capacity, and manageability. While SCSI technology adequately provides for Exchange information stores of less than 100 GB, it will be FC that meets the demands of large-scale Exchange 2000/2003 deployments of the future. Most major computer and storage vendors have selected FC as the next step for enterprise storage. Many of these vendors already provide FC as a standard disk interface. In the near future, storage vendors will not be taken seriously unless they offer FC-based systems.

FC, although discussed in this chapter as a storage technology, is really designed to be much more than that. FC is a network architecture that provides features for connectivity, increased distance, and even protocol multiplexing. While FC works as a channel technology like SCSI, it was also designed to work as a transport mechanism, something that SCSI cannot provide. In using FC as a storage interface, vendors are seeking to eliminate the distance, bandwidth, scalability, and reliability issues that have always plagued SCSI technology. FC is a true integration of both channel and network technologies that supports active and intelligent interconnection between all connected devices. Unlike SCSI, FC is actually an ANSI standard from Technical Committee T11. It is an I/O interface standard (ANSI X3.230-1994) that supports storage, video, networks, and even avionics interconnection. Table 7.10 provides a comparison of SCSI and FC technology. Note that while FC is intended to use fiber-optic technology as the medium, copper implementations are available from some vendors.

Table 7.10: Comparison of SCSI Versus FC As a Storage Technology

Characteristic        | SCSI                      | FC
Market segment        | Low to mid-range server   | High-end server
Cost                  | $$                        | $$$
Maximum transfer rate | 320 MB/s (3 Gb/s for SAS) | 2 Gb/s
Devices               | Disk, tape, optical       | Disk, controller, other
Devices per port      | 128 (4,032 future)        | 128 (4,096 future)
Distance              | 25 m                      | 20 km

Arbitrated loop versus switched Fibre Channel

Another important differentiation about FC storage is that it comes in two flavors: Fibre Channel Switched (FCS) and Fibre Channel Arbitrated Loop (FCAL). Initially, only FCAL implementations were available from most vendors because FCAL's simplicity made its time to market a little faster. Now most vendors are either offering both technologies or phasing out FCAL in favor of FCS products. The difference between FCAL and FCS is very much the same as the difference between a switched and a bus-based LAN architecture. In an FCAL-based storage implementation (Figure 7.4), all FC devices share, and contend for, a common bus for communication and data-transfer purposes. Therefore, the channel is shared by all devices in a similar fashion to a SCSI bus. In addition, FCAL is limited in distance to 30 meters, making it not well suited as a backbone or SAN technology.

Figure 7.4: FCAL-based storage attachment.

FCS, on the other hand, is a switched implementation in which each port on an FC switch provides full bandwidth to devices attached to that port. FCS systems can also be meshed into an FC “fabric” architecture, as illustrated by Figure 7.5. This configuration is well suited for the SAN scenarios that I will discuss in the next section.

Figure 7.5: FCS-based storage attachment.

Storage area networks

In the past, we have managed our data storage on a server-by-server basis. This host-based mentality has placed many limits on the manageability of that storage. We have traditionally married all storage decisions to the server. When configuring an Exchange server, we select how much storage we will attach to that server. The application (in this case, Exchange Server) is then optimally configured to use the storage provided by its host server. Another problem we face is reliably protecting our data. RAID technology does offer some protection against the loss of physical disk drives, and well-planned and well-executed disaster-recovery policies and procedures can provide additional protection. However, human errors, software errors, data corruption, catastrophes, and other losses are still a fact of life in most organizations. While all of the techniques we employ can aid in providing mission-critical services, they sometimes fall short because they are expensive, complex, or cumbersome to manage and implement. One major problem area for Exchange is the time it takes to restore data. This time continues to increase as the volume of data outpaces the technology available to back up and restore that data.

Storage area networks (SANs) were conceived with these issues in mind. The idea behind a SAN is to provide storage to the organization in a manner similar to the way a power or telephone company provides services: as a utility. Raw storage capacity can be added to the network and then virtualized for use by servers attached to the SAN. From the application point of view, storage should be a transparent service. Underlying a SAN can be many different technologies such as SCSI, FC, or even ATM or gigabit Ethernet. With storage as a utility, SANs move the focus from host-based storage to storage as a centralized and unified enterprise data store.

The three characteristics of SAN technology that are especially applicable to our Exchange Server deployments are scalability, manageability, and multiple host attachment. From a scalability point of view, SAN technologies will take us to the hundreds of gigabytes, or even several terabytes, of storage to which we would eventually like to scale our Exchange servers. When discussing scalability here, I am referring to capacity scalability. SCSI-based drive and array technology has served the industry well.

However, in order to build large storage pools for our Exchange servers, simple SCSI arrays will not suffice in the long term. FC-attached, SAN-based storage will be the key to large multiterabyte storage pools. Furthermore, these storage pools will also contain directly SAN-attached devices capable of providing disaster recovery for these large pools of storage. The scalability characteristics of SANs will provide the next step in storage technology that is required for Exchange Server consolidation projects as well. Scalability, in my mind, also includes the ability to access storage over greater distances. With SANs, a server can access storage many kilometers away as if it were locally attached to the server itself.

In terms of manageability, SAN technology offers many options that will benefit Exchange deployments. Storage manageability gives system managers the flexibility to plan, allocate, move, and expand storage within the context of the SAN. For example, if one server attached to the SAN is running low on storage capacity, that server can be allocated more storage from the overall pool. If a server no longer requires an increment of storage, it can be reallocated to other hosts within the SAN. In addition, if the overall storage pool in the SAN has reached capacity, more physical drives can be added to the SAN storage pool dynamically. The SAN storage pool can also be managed in terms of performance. A system manager may even choose to provide different service levels within the SAN. These service levels may be performance- or reliability-based and may depend on the level of each that an application or client device requires.
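To illustrate the allocate/expand/reclaim flexibility described above, here is a minimal Python sketch of a virtualized storage pool. It models the concepts only; the class, method names, and sizes are illustrative assumptions, not any vendor's management API.

```python
# A minimal sketch of SAN-style storage management: raw capacity is pooled, then carved
# into virtual volumes that can be allocated to, expanded for, or reclaimed from hosts.

class StoragePool:
    def __init__(self):
        self.free_gb = 0              # unallocated raw capacity in the pool
        self.volumes = {}             # volume name -> (host, size in GB)

    def add_physical_disks(self, count, size_gb):
        """Dynamically grow the pool by adding physical drives."""
        self.free_gb += count * size_gb

    def allocate(self, volume, host, size_gb):
        """Carve a virtual volume out of the pool and present it to a host."""
        if size_gb > self.free_gb:
            raise RuntimeError("pool exhausted: add physical drives first")
        self.free_gb -= size_gb
        self.volumes[volume] = (host, size_gb)

    def expand(self, volume, extra_gb):
        """Grow an existing volume for a host that is running low on space."""
        host, size_gb = self.volumes[volume]
        if extra_gb > self.free_gb:
            raise RuntimeError("pool exhausted")
        self.free_gb -= extra_gb
        self.volumes[volume] = (host, size_gb + extra_gb)

    def reclaim(self, volume):
        """Return a no-longer-needed volume's capacity to the pool."""
        _, size_gb = self.volumes.pop(volume)
        self.free_gb += size_gb


pool = StoragePool()
pool.add_physical_disks(count=24, size_gb=72)           # 24 x 72-GB drives added to the pool
pool.allocate("EXCH01-SG1-DB", host="EXCH01", size_gb=200)
pool.expand("EXCH01-SG1-DB", extra_gb=50)               # server running low on space
pool.reclaim("EXCH01-SG1-DB")                           # capacity goes back to the pool
```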

The cornerstone of this great management flexibility in SAN technology lies in the concept of storage virtualization. With virtualization, the underlying physical devices are not as important and become simple building blocks. Groups of physical disks are combined into arrays, and these arrays become logical drives defined to host devices attached to the SAN interface. Virtualized storage allows logical disks of differing capacities, performance characteristics, and data-protection levels (RAID) to be created from physical disk resources. Most SAN products on the market offer intelligent front ends to aid the system manager in this process. The virtualization of storage within the SAN allows storage units to be dynamically allocated and assigned to any host attached to the SAN. The management functionality of SANs includes the following:

  • Dynamic and transparent storage pool allocation

  • Redistribution/reallocation of capacity

  • SAN-attached backup and advanced disaster-recovery techniques

  • Improved monitoring for capacity planning, performance, and accounting

Multiple host attachment is an important SAN characteristic that makes this possible. SAN technology is really a paradigm shift from traditional PC-server-based storage. Traditionally, we have looked at storage on a server as that which is attached to a controller (array or nonarray) residing in the server itself. In many cases, storage was also physically located in the server cabinet or at least in an external cabinet nearby. With SAN technology, the array controller moves out of the box to the storage cabinet itself, instead of the server, as illustrated in Figure 7.6.

Figure 7.6: The paradigm shift from host-based storage to network-based storage.

This shift to external SAN/controller-based storage is a key concept that has not completely caught on in our industry yet. However, this concept is important to our ability to take advantage of the capabilities that SAN technologies offer. We have always looked at storage in terms of what is on the server itself. Now storage can be seen as a resource that the server can take advantage of. By moving storage management out of the server, we can truly begin to treat storage as a utility for our Exchange deployments. In fact, I look for the day when storage is managed as a totally separate IT resource from servers. Imagine application server managers who simply request allocations from an enterprise storage pool in which to store Exchange Server data. SAN technology and the detailed discussions around it are much more complex and extensive than I have the time and expertise to cover here; entire books have been written on the subject, and there are several good resources available (a favorite of mine is Tom Clark's Designing Storage Area Networks). To augment this discussion, take some time to do some in-depth research on SAN technologies. Also spend some time with your storage vendor of choice and identify product features that you would find useful for your Exchange deployments. For the purpose of this chapter, I will stick to the specific storage-technology capabilities that SANs can offer to mission-critical Exchange deployments.

7.4.3 Advanced storage functionality

As Exchange information stores hold increasing amounts of data, it will become increasingly difficult to back up, restore, and maintain that data using traditional methods. Beyond the ability to use SAN-based storage as a utility, the capabilities of storage virtualization, and the shift from host-based to SAN-based storage, SANs and other advanced storage technologies offer capabilities that aid in disaster recovery and high availability. This is functionality that provides additional tools beyond the traditional tape-based backup and restore capabilities we are used to. The key technologies that I believe will have the most impact on Exchange deployments are these:

  • Data-replication technologies

  • Snapshot and cloning technologies

  • Network storage technologies

Data replication and mirroring technologies

Data replication and mirroring technologies can provide an additional degree of data protection beyond what is available via traditional techniques such as RAID. With replication and mirroring techniques, mission-critical data is copied to an alternate storage set used for disaster-recovery purposes. In the event that a primary storage set is lost or unavailable, the replicated storage can be used to keep the application up and running. A data-replication scenario for Exchange Server is illustrated in Figure 7.7.

With data replication, there are basically two techniques: synchronous and asynchronous. These two techniques determine how data is copied (mirrored) between a primary dataset and a backup copy set. Synchronous replication forces an update operation to complete at the remote or backup copy set before the I/O is returned as complete to the application. With asynchronous replication, I/O completion is not dependent upon a successful operation to the backup set.

Data-replication technology can also be categorized into two flavors: hardware based and software based. Hardware-based data-replication products are usually made available by vendors of SAN products as an extension in functionality. Examples of hardware-based data-replication products include HP's StorageWorks Data Replication Manager (DRM) and EMC's Symmetrix Remote Data Facility (SRDF). Hardware-based data replication is certainly a superior technology to software-based solutions, but it locks you into a particular vendor's storage technology. This is due to the fact that hardware-based implementations have their functionality embedded in the storage controller and are usually transparent to the operating system. Hardware-based data-replication products also usually take advantage of other advanced technologies, such as FC and ATM, as potential methods for extending the distance of data replication. Hardware-based solutions often rely on a software element to manage the replication activity; however, that software usually runs within the confines of a real-time operating system on the controller or the storage enclosure. The main point about hardware-based solutions is that the functionality is implemented mostly in hardware, is not managed by the operating system, and is transparent to it. Software-based solutions, on the other hand, usually rely on filter drivers. Filter drivers act as a "shim" layer between the operating system disk services layer and the device drivers. Filter drivers typically intercept I/Os destined for the disk subsystem and redirect them or perform some manipulation depending on the functionality configured.

Figure 7.7: Using data replication with Exchange Server.

Software-based data-replication products utilize filter drivers to perform the replication operations. As the software filter driver sees disk requests from the operating system, it can redirect them to meet the needs of the replication activity. For example, if the Exchange database engine sends an I/O request to write 4 KB, this request will proceed down to the disk services layer and be intercepted by the filter driver. The filter driver will then perform a data-replication operation and attempt to ensure that the write occurs at both the primary data location and the replicated data location (see Figure 7.8 for an illustration). Although most vendors of these solutions will tell you that they are completely safe, there is inherently more risk in a software implementation because the operating system must carry the overhead and management of this functionality and because there are more opportunities for things to go wrong (with a hardware-based implementation, the activities would occur at a controller level and be transparent to Windows Server and Exchange Server).
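The following Python sketch models the write path just described at a conceptual level: a replicated volume mirrors every write to a second copy, either synchronously or asynchronously. It illustrates the semantics only (the classes and names are invented for this example) and is not how any real filter driver or controller is implemented.

```python
from queue import Queue

class InMemoryVolume:
    """Stand-in for a disk volume; stores writes in a dict keyed by block offset."""
    def __init__(self):
        self.blocks = {}

    def write(self, offset, data):
        self.blocks[offset] = data

class ReplicatedVolume:
    """Mirrors every write to a remote copy set, synchronously or asynchronously."""
    def __init__(self, local, remote, synchronous=True):
        self.local = local
        self.remote = remote
        self.synchronous = synchronous
        self.pending = Queue()                      # async mode: remote writes not yet shipped

    def write(self, offset, data):
        self.local.write(offset, data)              # e.g., a 4-KB write issued by ESE
        if self.synchronous:
            self.remote.write(offset, data)         # must succeed before completion is returned
        else:
            self.pending.put((offset, data))        # shipped to the remote side later
        return True                                 # completion as seen by the application

# Synchronous mode: both copies are identical when the application sees "complete".
vol = ReplicatedVolume(InMemoryVolume(), InMemoryVolume(), synchronous=True)
vol.write(0x1000, b"transaction log record")
assert vol.local.blocks == vol.remote.blocks
```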

Figure 7.8: Implementation of software-based data-replication technology

Also, as a software component installed with the operating system, software-based implementations are subject to interactions and potential conflicts with other software components. As one would expect, software-based solutions for data replication tend to be less expensive than hardware-based solutions.

One additional variant or implementation of data-replication technology is the concept of a “stretched” cluster. With a stretched cluster, cluster nodes are physically separated by large distances (supposedly unlimited with ATM), and cluster and application data is replicated. When a failure scenario occurs, the remote cluster node can fail over the application and continue client service. When investigating and potentially selecting data-replication technology for your Exchange deployment, the choice usually comes down to a cost/benefit analysis weighed against the risks of either implementation. To give you a head start, Table 7.11 provides a summarized look at some popular hardware- and software-based data-replication products.

Table 7.11: Data-Replication Technology Vendor Survey

Vendor/Product | Implementation | Remarks
HP StorageWorks Data Replication Manager (DRM) | Hardware-based solution that requires some operating system support. Supports both FC and ATM connection. | Supported on high-end HP SAN enclosure products. No specific Exchange Server support.
EMC Symmetrix Remote Data Facility (SRDF) | Hardware-based solution that requires some operating system support. Supports both FC and ATM connection. | No specific Exchange Server support.
Veritas Volume Replicator and Storage Replicator | Software-based implementation in a Windows filter driver. Uses either network or locally attached storage. | No specific Exchange Server support.
NSI DoubleTake | Software-based implementation in a Windows filter driver. Uses either network or locally attached storage. | Excellent WAN and cluster support. No specific Exchange Server support.
Marathon Technologies Endurance | Hybrid hardware/software solution that goes beyond data replication to full server image mirroring. Uses proprietary hardware interconnect. | Specific Exchange Server focus and support, but not supported by Microsoft.

Exchange and data replication

For an application like Exchange Server using data-replication technology, care must be taken to design a data-replication scenario that does not create problems for Exchange's ESE when writing transactions to the database files or transaction logs. To illustrate my point, suppose that Exchange performed a write operation to storage. With data replication configured, that write would need to be replicated to both the local dataset and the remote copy set. In a hardware-based implementation, this would occur transparently to the operating system and Exchange. In the case of a software implementation, the operating system would have a filter driver performing the replication activity, but would otherwise have no exposure to the data-replication function. The problem comes with transactional integrity. Since Exchange's database engine is performing transaction-oriented operations, the transactional states and atomic units that Exchange uses are not exposed to the data-replication function. The data-replication activity just occurs with Windows I/O operations to disk and is none the wiser as to whether these are simple I/O operations or transactional units. The real issue comes when the write that Exchange issues to the storage does not complete to both sides of the replication set or incurs a timing penalty to perform the operation. Since data-replication products may support asynchronous or synchronous modes of operation, the real key is whether they support synchronous replication (asynchronous replication provides no guarantee of successful simultaneous completion). This means that when Exchange Server performs a write operation (regardless of whether it is to the transaction logs or the database files), there must be assurance that the operation completed at both the local dataset and the remote replicated set. Synchronous replication ensures this, since the I/O completion is not returned until the operation has completed at both locations (local and remote). Unfortunately, if the operation takes too much time, Exchange's ESE will fail the transaction and return an I/O error. This is most likely in a scenario in which the connection for replication is via ATM or another medium (such as long distances over FC) that could be subject to latency or timing issues.
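To see why distance matters, here is a rough latency calculation in Python. The figures (roughly 5 microseconds per kilometer of one-way propagation in fiber, a 0.5-ms local log-write service time, and one round trip per replicated write) are illustrative assumptions, not vendor specifications, but they show how synchronous replication over distance stretches every write that ESE issues.

```python
# Back-of-the-envelope sketch of the timing penalty of synchronous replication over distance.

def replicated_write_ms(local_ms, distance_km, per_km_us=5.0, remote_ms=None):
    """Estimated time per synchronously replicated write: local + remote + link round trip."""
    remote_ms = local_ms if remote_ms is None else remote_ms
    round_trip_ms = 2 * distance_km * per_km_us / 1000.0
    return local_ms + remote_ms + round_trip_ms

for km in (0, 10, 100, 500):
    total = replicated_write_ms(local_ms=0.5, distance_km=km)
    print(f"{km:>4} km: {total:.2f} ms per log write -> ~{1000 / total:,.0f} writes/s")
# 0 km: 1.00 ms (~1,000 writes/s) ... 500 km: 6.00 ms (~167 writes/s)
```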

The idea behind using this technology with Exchange is that you could “mirror” or replicate your complete Exchange dataset to an alternate storage location that could be used to restore a downed Exchange server to operation if the primary location were lost or corrupted. Since you could replicate both the transaction logs and the Exchange database files, the complete transaction contents would be available to recover Exchange Server. In the future, as functionality such as log shipping emerges for applications like Exchange, data replication will become even more powerful due to application support from a transactional perspective.

From a Microsoft PSS and Exchange development perspective, I would not expect official support of this technology with Exchange Server, since none of the products guarantee transactional integrity. If you have problems, Microsoft will make a best-effort attempt to help you, but will probably point you in the direction of the vendor from which you purchased the technology. That being said, I believe this technology can be successfully implemented in your Exchange deployments and have seen several organizations do so. My recommendations are the following. First, I would only deploy data-replication technology for Exchange that supports synchronous replication operations. Don't take chances: ensure that your I/Os are complete at both the initiator and target locations before completion is returned. Next, ensure that you thoroughly plan, test, and pilot this technology in your Exchange deployment before putting it into production. There are many nuances, configuration choices, and implementation options that you will need to understand. Also, don't just test the replication; test the recovery. If you are deploying data replication over a wide area using ATM or another network technology, be sure you understand the impact bandwidth and latency problems can have on Exchange Server data availability and data integrity. Finally, I would replicate your entire Exchange dataset (transaction logs and database files). This could be done on a per-storage-group basis or on a per-database basis. This, combined with synchronous operation, ensures that your Exchange data (uncommitted transaction logs and committed database files) is identical at both replication sets and can be recovered by the Exchange database engine. For some, data-replication technology may be a little "bleeding-edge." I am not advocating data replication, but simply presenting it as another option that is complementary to your existing technologies and practices as you endeavor to build mission-critical Exchange deployments.

7.4.4 Cloning and snapshot technology

Also available to Exchange implementers and system managers are storage cloning and snapshot technologies. These technologies, while discussed together here, provide similar but distinct functionality. Rather than use the terms clone and snapshot (the latter of which is actually a trademark owned by StorageTek), I prefer the terms BCVClone and BCVSnap, since they are not owned by anyone and I can avoid hurting anyone's feelings. First is the term Business Continuance Volume (BCV). The idea behind a BCV is a stand-alone data volume that can be used for recovery of an application for the purpose of continuing a business operation or function. The two variants of BCV that help us understand the use of this technology for Exchange across several products on the market are BCVClone and BCVSnap. For clarification, let's approach them separately and then unite the concepts later when we discuss their use with Exchange Server.

BCVClone

As the name implies, a clone is an exact copy of an existing dataset or volume. For vendors of BCVClone technology, implementations may vary slightly, but the idea and functionality are basically the same. To use our discussions on RAID from earlier in this chapter as a basis, a BCVClone is simply an additional RAID0+1 mirror set member. For example, suppose your current SAN-based storage configuration (this technology usually only accompanies SAN product solutions) is a RAID0+1 array for your Exchange database files. This provides a high degree of performance and data protection. Now suppose that, in addition to the two-member mirror set (RAID 0+1), you added a third member to your existing mirror set. Upon creation, data would be mirrored or normalized to that third member. The process of normalization is the most expensive part of using BCVClone technology since it is a complete mirroring of the production volume. Once normalized, the additional member would continue to stay in synch as an additional mirror of your production volume. At this point, this could be classified as a BCV or, more appropriately, a BCVClone. The BCVClone could either continue as a mirror set member or could be split off, or broken, from the existing mirror set. This broken-off member could be used for a variety of purposes including off-line backup or as a readily available business continuance option (shown in Figure 7.9). Using the BCVClone for backup purposes would help offload the primary storage array and provide a quicker and more efficient means of backup. The BCVClone could also be used as a quick recovery option in the event that the production copy of the data becomes corrupt. Additionally, the BCVClone could be merged back into the production dataset as well. Most implementations of BCVClone technology have limits on the number of clones that can be created (a function of the controller and cache size). You could theoretically break off BCVClones at several points during your production periods as an additional means of disaster recovery and high availability for Exchange. Once BCVClones are no longer needed or become updated, they can be deleted or even joined back into the mirror set and renormalized to become a mirror set member once again.
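The following Python sketch models the BCVClone life cycle described above: add a third mirror member, normalize it, then split it off as a stand-alone BCV. It is a conceptual model with invented names; real array controllers implement this in firmware, not in host code.

```python
# A simplified model of the BCVClone life cycle: attach, normalize, split, (re)join.

class MirrorSet:
    def __init__(self, members=2):
        self.members = members          # e.g., 2 for a plain RAID0+1 mirror
        self.normalized = True

    def add_clone_member(self):
        """Attach a third member; it must be normalized (fully copied) before use."""
        self.members += 1
        self.normalized = False         # a full copy of the production volume is required

    def normalize(self):
        """The expensive step: mirror every production block to the new member."""
        self.normalized = True

    def split_clone(self):
        """Break off the extra member as a stand-alone BCV for backup or recovery."""
        if not self.normalized:
            raise RuntimeError("cannot split: clone is still normalizing")
        self.members -= 1
        return "BCVClone"               # placeholder for the point-in-time volume


production = MirrorSet(members=2)       # RAID0+1 production volume
production.add_clone_member()           # now a three-member mirror set
production.normalize()                  # most expensive part of the process
bcv = production.split_clone()          # point-in-time copy, usable for backup or quick recovery
```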

Figure 7.9: An illustration of BCVClone.

BCVSnap

Like the BCVClone, the BCVSnap functions as a point-in-time copy of production data. However, a BCVSnap is never a mirror set member. A BCVSnap is a business continuance volume that has been created from the production data, but the creation process is much different. The BCVSnap (snapshot) creates a virtual metadata map of the volume blocks on the production volume. As the name implies, it is a point-in-time picture of the volume blocks when the BCVSnap was created. Once a BCVSnap is created (which takes only a few seconds in contrast to minutes for a BCVClone), the production volume becomes a different animal. The original blocks of data comprising the BCVSnap stay intact for the life of the BCVSnap. As the data blocks on the production volume change (e.g., when an Exchange database engine transaction commits to the database files), these changed blocks are copied out to a new location allocated from the storage pool, and the production volume map is updated to reflect the copied-out data block. As more and more data on the production volume changes, more blocks are allocated from the pool and are copied out. Meanwhile, the BCVSnap continues to provide a map to the original set of production volume blocks that represent the point in time at which the BCVSnap was created. Thus, the production data volume is a combination of the unchanged blocks in the BCVSnap volume map and the copied-out data blocks resulting from changes to the production data, as illustrated in Figure 7.10.
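Here is a minimal Python model of the block-map behavior just described: the snapshot preserves the original block map, while post-snapshot writes land in newly allocated (copied-out) blocks that only the production view sees. The block contents and names are invented for illustration.

```python
class SnapVolume:
    """Toy model of a BCVSnap: original blocks are preserved; changes are copied out."""
    def __init__(self, blocks):
        self.original = dict(blocks)    # point-in-time block map kept for the snapshot
        self.copied_out = {}            # block number -> data written after the snapshot

    def write(self, block, data):
        # Post-snapshot change: new data lands in a copy-out allocation from the storage pool.
        self.copied_out[block] = data

    def read_production(self, block):
        # Production view = copied-out (changed) blocks plus unchanged original blocks.
        return self.copied_out.get(block, self.original[block])

    def read_snapshot(self, block):
        # Snapshot view = always the original point-in-time blocks.
        return self.original[block]


vol = SnapVolume({0: "A", 1: "B", 2: "C"})   # creating the snapshot builds a map, not a full copy
vol.write(1, "B2")                           # e.g., an ESE transaction commits after the snapshot
assert vol.read_production(1) == "B2"        # production sees the change
assert vol.read_snapshot(1) == "B"           # the BCVSnap still presents the point-in-time image
```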

Figure 7.10: Illustration of BCVSnap.

Like its close brother the BCVClone, a BCVSnap can be used for a variety of purposes that give the system manager enhanced disaster-recovery and high-availability capabilities and options. Unlike a BCVClone, however, the BCVSnap incurs ongoing performance overhead: changed data in the production volume causes copy-out operations to occur, and overhead increases as the production volume comes to span both unchanged blocks in the BCVSnap and changed blocks in the copied-out allocations in the storage pool. Whereas the BCVClone incurs its cost up front when normalization is initiated and when the clone is split off, the BCVSnap's overhead is dependent upon the degree and intensity of changed-data copy-out operations. Both BCVClone and BCVSnap technology can be a useful addition or complement (not a replacement) to your existing disaster-recovery techniques for Exchange. This technology also comes in both hardware and software implementations. As with data replication, I prefer hardware-based solutions, since the software-based choices are also implemented using Windows filter-driver technology. In the case of BCVSnap, you can see the potential issues that could be created with a software-based implementation. Table 7.12 provides a comparison of BCVClone and BCVSnap technology to aid in your assessment of this technology for your own deployment.

Table 7.12: A Comparison of BCVClone Versus BCVSnap Technology

Functionality | BCVSnap | BCVClone
Persistence | Typically short-lived, as the overhead to manage a BCVSnap increases with time and usage | Longer-term; once the BCVClone has been created, no additional overhead is required
RAID support | All RAID levels | RAID0+1 (10) only
Capacity overhead | Equal to the size of the source volume | Equal to two times the size of the source volume, since a minimum of a three-member mirror set is required
Initialization/normalization | Instantaneous; minimal time required for the controller to create the metablock map | Normalization time is dependent on the source volume; BCV break-off is virtually instantaneous
Volume restore | Must copy the BCV to another volume and revert to that volume | Present the BCV as the original volume
Performance impact | Source volume and copy-out snapshot data are both accessed during read and write I/O operations | Source volume and BCVClone are accessed as independent volumes for read and write I/O operations

Using BCVClone/BCVSnap technology with Exchange

When using this technology with Exchange 2000 Server, as in the case of data replication, several caveats exist. The first issue arises when creating the clone or snapshot. When Exchange is running and services are on-line, the database files are open and do not represent the exact transactional state of the database. Remember from earlier discussions that the consistent state of the Exchange databases includes not only the database files, but the transaction logs as well. The creation of a clone or snapshot of the Exchange database volume has no exposure to this. Furthermore, all implementations of BCVClone or BCVSnap products require varying amounts of time for the snapshot or clone to be created: the snapshot requires less time, and the clone requires more. Remember that the cloning, or normalization, will depend on the amount of data and the performance capabilities of the system. To make matters worse, during the time the BCV is being created, Exchange cannot access the disk subsystem. Because of this, Exchange services (in the case of Exchange 5.5) will need to be shut down for the BCV to be created. For Exchange 2000 and later versions (without VSS support), only the databases need to be dismounted, although you will have to dismount all of the databases in a storage group if you want to capture the SG's log files cleanly. Of the vendors that provide this technology, all have done some level of integration work for Microsoft Exchange Server. For Exchange Server 5.5, only HP, Network Appliance, and EMC have developed solutions that provide for the shutdown of services and management of the transaction log files. Typically, only the Exchange database volume is the target of a clone or snapshot, and the transaction logs are managed separately. Table 7.13 provides a reference listing of Microsoft support articles on the subject of using BCVs with Exchange Server 5.5 or Exchange 2000 Server.
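For Exchange 2000 without VSS, the sequence described above can be sketched at a very high level as follows. The helper functions (dismount, mount, create_bcv) are hypothetical placeholders for vendor- and environment-specific tooling, not real APIs; treat this only as an outline of the ordering, not as a supported procedure.

```python
# High-level sketch of a pre-VSS BCV backup: dismount every database in the storage group
# so the files are consistent on disk, create the clone or snapshot, then remount.

def bcv_backup_storage_group(storage_group, databases, create_bcv, dismount, mount):
    """Create a BCV of a storage group's volumes while its databases are offline."""
    for db in databases:
        dismount(storage_group, db)        # all databases in the SG, so logs are captured cleanly
    try:
        bcv = create_bcv(storage_group)    # snapshot: seconds; clone split: fast once normalized
    finally:
        for db in databases:
            mount(storage_group, db)       # bring Exchange back on-line as soon as possible
    return bcv
```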

Table 7.13: Microsoft Support Knowledge Base Article References for Using BCVs with Exchange 5.5/2000

Knowledge Base "Q" Article | Subject
Q311898 | Hot Split Snapshot Backups of Exchange
Q296787 | Offline Backup and Restoration Procedures for Exchange Server 5.5
Q296788 | Offline Backup and Restoration Procedures for Exchange 2000 Server
Q237767 | Understanding Offline and Snapshot Backups

If you are interested in using BCV technology with Exchange Server 2003 with Windows 2003 VSS, this subject was covered in detail in Chapter 4. Storage vendors who provide BCV (snapshot and clone) technology in their hardware need merely leverage the Windows Server 2003 VSS framework and develop a VSS hardware provider. If you have VSS-aware components (Windows Server 2003, Exchange Server 2003, a VSS hardware provider, and a VSS requestor application) included in your disaster-recovery solution for Exchange, a Microsoft-supported BCV solution is within your grasp.

7.4.5 Network-attached storage (NAS)

Another interesting technology that you may want to investigate for use in your Exchange deployments is network-attached storage (NAS), also known as network disk technology. As the name implies, a network disk is a virtual device made available over either a shared or dedicated network connection. Essentially, vendors of network disk products have implemented a disk block protocol over UDP or another networking protocol such as NFS, CIFS, or NCP. Systems that utilize the network disks do so via client software that allows them to see the network service as a disk device. Vendors of this technology for Windows typically either offer a specialized proprietary device (i.e., a network appliance) or use a host-based filter driver that provides other features, such as virtualization and snapshots, combined with a network disk capability (such as HP's Virtual Replicator or Veritas Volume Manager). The use of this technology with Exchange may not be that obvious. The first option is the utilization of network disks as primary storage for Exchange information stores, as shown in Figure 7.11.

In this scenario, an Exchange server runs the network disk client software or redirector and mounts network disk resources over the network for use as storage for the Exchange databases. The network disk software is transparent to applications like Exchange (because it is implemented as a filter driver) and makes the network disks appear to the system as a local disk resource (an oversimplification, but accurate). With network-disk-based storage, storage resources can be centrally managed as a utility similar to the SAN-based approach discussed earlier. In my first look at this technology, its viability was somewhat suspect. This was due to the fact that the capabilities of the disk subsystem are key to Exchange's performance. The ability of the disk subsystem to deliver the I/O requirements that Exchange demands is more important than any other server subsystem (CPU, memory, and so forth) in achieving maximum system performance. With NAS, the I/O requirements of Exchange must be delivered over the network and via several layers of software. For many Exchange servers, it is just not possible to deliver the required performance using network disk technology. This is one reason that Microsoft does not officially support NAS for Exchange. Nevertheless, many organizations have still chosen to deploy network disks anyway.

Figure 7.11: Using NAS with Exchange server.

If you are an organization that has made a decision to use NAS with your Exchange servers, please understand that Microsoft offers strict and limited support for this storage scenario. In general, Microsoft will always tell customers that it prefers direct-attached or SAN-based storage due to several characteristics of physically attached storage that Exchange Server requires. Microsoft is primarily concerned with performance and reliability. The bottom line is that Microsoft will support block-mode NAS devices that are HCL-certified. All others get no support.

NAS performance

Exchange Server, like other enterprise messaging systems, can place an extremely large load on the disk I/O subsystem. In most large database programs, physical I/O configuration and tuning play a significant role in overall system performance. There are three major I/O performance factors to consider:

  1. I/O bandwidth: The aggregate bandwidth, typically measured in megabytes per second, that can be sustained to a database device

  2. I/O latency: The latency, typically measured in milliseconds, between a request for I/O by the database system and the point at which the I/O request completes

  3. CPU utilization: The host CPU cost, typically measured in CPU microseconds, for the database system to complete a single I/O

Any of these factors can become a bottleneck, and all must be considered when designing an I/O system for a database application like Exchange Server. If disk I/O is processed through the client network stack, the I/O is subject to the bandwidth limitations of the network itself. Even when you have enough overall bandwidth, you may see higher latency and increased processing demands on the CPU compared with locally attached storage.
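These three factors can be related with some simple arithmetic. The Python sketch below uses assumed, illustrative numbers (8-KB I/Os, 32 outstanding requests, and round per-I/O latency and CPU costs) rather than benchmark data, but it shows how added network latency and per-I/O CPU cost translate into lower achievable IOPS and bandwidth.

```python
def io_profile(latency_ms, cpu_us_per_io, outstanding_ios=32, io_kb=8):
    """Approximate throughput from latency and concurrency (a Little's-law-style estimate)."""
    iops = outstanding_ios * 1000.0 / latency_ms          # completions per second
    bandwidth_mb_s = iops * io_kb / 1024.0                # aggregate MB/s at 8-KB I/Os
    cpu_pct = iops * cpu_us_per_io / 10_000.0             # percent of one CPU spent issuing I/O
    return iops, bandwidth_mb_s, cpu_pct

# Assumed figures for illustration only: 5 ms per I/O locally vs 15 ms through the network
# stack, with a higher host CPU cost per I/O for the network path.
for name, latency_ms, cpu_us in [("direct-attached", 5.0, 20), ("network-attached", 15.0, 80)]:
    iops, mb_s, cpu = io_profile(latency_ms, cpu_us)
    print(f"{name:>16}: {iops:5.0f} IOPS, {mb_s:5.1f} MB/s, ~{cpu:.0f}% of one CPU")
# direct-attached: ~6400 IOPS at 50 MB/s; network-attached: ~2133 IOPS at about 17 MB/s
```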

NAS reliability

Exchange Server uses a transaction log and associated recovery logic to make sure that there is database consistency if a system failure or an unmanaged shutdown occurs. When the ESE writes to its transaction logs, ESE must depend on the return of a successful completion code from the operating system as a guarantee that the data has been secured to disk, not just to a volatile cache that will be lost if there is a system failure. In addition, the limits of recoverability are determined by the ability of the disk system to make sure that data written to the disk is stored and retrieved reliably. Microsoft recommends that you use disk systems that can detect imminent failures and salvage or relocate affected data when you use Exchange 2000/2003.

The specific concerns over performance and reliability have led Microsoft to establish some stringent guidelines for supporting NAS solutions with Exchange Server. The key to these requirements lies in Exchange Server's dependence on a "block-mode" storage device (typically only physically attached storage meets this criterion). In the past, most NAS devices used a "file-mode" access method and therefore were not supported by Microsoft for use with Exchange Server. In recent times, however, many NAS vendors have begun to develop block-mode access to their NAS solutions. Block-mode storage devices (whether DAS, NAS, or SAN) that have received a "Designed for Windows" logo through submission to Windows Hardware Quality Labs (WHQL) as storage/RAID controllers or storage/RAID systems have been shown to meet the requirements for block storage for the Windows platform and are therefore the most suitable storage devices for use with Exchange Server. Table 7.14 provides a list of Microsoft Knowledge Base articles related to utilizing NAS solutions with Exchange Server.

Table 7.14: Microsoft Knowledge Base NAS Reference Articles

"Q" Article | Title/Description
Q317172 | Exchange Server 5.5 and Network-Attached Storage
Q317173 | Exchange 2000 Server and Network-Attached Storage
Q314916 | Issues That Might Occur If You Place Exchange Data Files on Network Shares
Q314917 | Understanding and Analyzing –1018, –1019, and –1022 Exchange Database Errors

In the not-too-distant future, when DAS, SAN, and NAS begin to converge and new technologies such as iSCSI become available, many of the issues and concerns that Microsoft has regarding the use of NAS solutions with Exchange Server will disappear. In addition, as Microsoft further develops core support for NAS technology within the operating system (for example, making all NAS support utilize block-mode access), most vendor solutions will be fully supported by Microsoft. In my interactions with many Exchange deployments, I have often seen NAS solutions deployed and customers of these solutions completely satisfied with using NAS with Exchange.

7.4.6 Storage designs for Exchange 2000/2003

After a look at the various technologies and solutions available for Exchange storage, it is important that we drill down to an application and best-practices level before concluding this chapter. In this final section, I would like to take the preceding discussions on storage technology and functionality as a foundation on which we can overlay Exchange 2000/2003–specific storage designs. These storage designs for Exchange should be viewed from two perspectives—performance and reliability/disaster recovery. Since Exchange 2000/2003 now allows for some fairly complex storage designs by supporting multiple database engine instances and databases, it would be impossible to discuss the entire matrix of possibilities. To keep the discussion focused, let’s discuss some guidelines for leveraging storage technology and advanced functionality to maximize performance and reliability for Exchange deployments.

Performance 101: Separate random from sequential I/O

In previous versions of Exchange Server, this was an easy rule to follow. Since I/O to the transaction log files is purely sequential in nature (unless recovery is in progress), we can provide a dedicated volume for the transaction log files. Since I/O activity to the database files (only PRIV.EDB and PUB.EDB) in previous versions of Exchange was very random in nature, we could simply allocate a separate array for the database files as well. Now, in Exchange 2000/2003, the same rule still applies. However, it is much more complex to implement. This is due to the innovation of multiple storage groups and databases. Technically, Exchange 2000 was originally designed to support up to 15 configured storage groups (plus a 16th for recovery), each with a maximum of five databases. Because of the overhead required for each additional storage group, Microsoft will only support a maximum of four storage groups for Exchange 2000 and 2003. I don’t expect these numbers to change within the confines of the current Exchange Server 2000/2003 code base. However, in the next generation of Exchange Server (codenamed Kodiak), which will rely on SQL Server’s Yukon release, I would expect the storage configuration of Exchange Server to further increase and previous limits imposed by a 32-bit virtual memory space and other factors to become a thing of the past.

Since, in effect, previous versions of Exchange Server supported only one storage group with two databases, allocating and designing server storage was easier. Figure 7.12 illustrates how this sequential versus random I/O rule may be applied to an Exchange 2000/2003 server with four storage groups.

In Figure 7.12, each of the four storage groups has one database configured. A two-drive RAID1 volume is configured for each set of transaction logs (one set per storage group), and the transaction logs are placed on that volume. For the database files, a separate RAID1 volume is also configured for each of the four databases. Remember that Exchange 2000/2003 database files are now a pair of files: the properties store (*.EDB) and the streaming store (*.STM). If the streaming store files will see high activity (e.g., in the case of Internet protocol clients), an additional performance-enhancing step may be taken by placing the STM files on yet another physical array, since they have a largely random I/O profile. There are also difficult questions to answer about controller cache, for example, which volumes should use the cache and how the cache should be configured. In most cases, I believe we will see the same answers to these questions as in earlier versions of Exchange Server. When configuring the controller cache, choose the setting of 100% write-back (or the largest write-back ratio allowed) in order to maximize performance for expensive write activity and to help cope with the additional overhead that RAID configurations like RAID5 create. Next, configure all volumes to take advantage of the write-back caching provided on the controller (don't forget to ensure that the controller cache is protected). Both transaction log volumes and database volumes will benefit from write-back caching. Finally, I would recommend RAID0+1 for both transaction log volumes and database file volumes. RAID0+1 provides much better overall performance and is not subject to the intense write penalty of RAID5. However, if the best price/performance solution is sought, you can deploy RAID5 for the database files. For the transaction log files, however, stick with RAID1 or RAID0+1 to ensure maximum performance and protection.

Figure 7.12: Optimizing performance: separation of random and sequential I/O.
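As a quick illustration of the separation rule, the following Python snippet sketches a Figure 7.12-style layout, giving each storage group its own dedicated log volume (sequential I/O) and a separate array for its database files (random I/O). The drive counts and RAID levels shown are illustrative assumptions, not a sizing recommendation.

```python
# Sketch of a per-storage-group volume layout that keeps sequential log I/O away from
# random database I/O. Values are placeholders; size each array for your own load.
layout = {}
for sg in range(1, 5):                          # four storage groups (the supported maximum)
    layout[f"SG{sg}"] = {
        "transaction_logs": {"raid": "RAID1",   "drives": 2, "io_profile": "sequential"},
        "database_files":   {"raid": "RAID0+1", "drives": 4, "io_profile": "random"},
        # Heavily used .STM streaming files could go on yet another array.
    }

for sg, volumes in layout.items():
    for purpose, vol in volumes.items():
        print(f"{sg} {purpose:>17}: {vol['raid']:>7} x{vol['drives']} ({vol['io_profile']})")
```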

Using advanced storage technology to increase Exchange reliability

When I discussed storage technology options and advanced capabilities earlier, I pointed to several key features that are of particular usefulness when deploying Exchange Server. Specifically, the most useful technologies are multiple-host attachment, BCVs, and data replication. What is most important from these discussions is not any one individual technology by itself, but the combination and integration of all these technologies for your specific performance, reliability, and disaster-recovery needs. Since every Exchange deployment is different, there are a variety of performance and reliability service levels that Exchange system managers must provide. The great opportunity afforded by all of the storage technologies available is that system managers can pick and choose from them based on their individual needs and risk assessments. In addition, these technologies can be combined with others, like clustering, to provide additional functionality. Using clustering as an example, the features of a clustered Exchange scenario could be combined with a technology like data replication to achieve what is known as the stretched cluster or geocluster. In another example, since clustering requires a shared storage mechanism such as a SAN, the advanced features of SANs, such as BCVs, can be used in conjunction with clustering to add reliability and recovery options for an Exchange server.

Obviously, I speak of these configuration options and choices as if cost were no issue. Bear in mind that configurations that are 100% RAID0+1, with an individual array allocated for every database and transaction log set, can be quite expensive. In real-world practice, I expect the trade-offs will first be made in the area of RAID. Not everyone can afford to purchase a 100-GB RAID0+1 array for every Exchange server. In the event that RAID0+1 is cost prohibitive, make sure you understand the performance and fault-tolerance trade-offs associated with RAID5 before selecting it as your best option. Another cost trade-off will be strict adherence to the sequential-versus-random I/O rule. Must every sequential activity be separated from every random activity? In an ideal scenario, we could allocate an individual array for every transaction log set and database file. In practice, however, I expect that most configurations will combine the transaction logs from multiple storage groups onto one array. Based on data from previous versions of Exchange, I would conclude that this is not a huge performance factor. You will also be faced with manageability trade-offs in your storage design. For example, you may want to achieve maximum performance, but certain provisioning choices or disaster-recovery considerations (such as using Windows VSS) may force you to configure your Exchange Server storage in a less than optimal fashion from a performance point of view. As with the design of any mission-critical system, there will be performance, cost, and manageability trade-offs involved when designing storage for Exchange 2000/2003 servers.
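To help quantify the RAID5 versus RAID0+1 trade-off mentioned above, here is a rough Python calculation. It assumes the conventional small-random-write costs (roughly four back-end disk I/Os per RAID5 write versus two per mirrored write) and equal-sized drives; your controller and workload will differ.

```python
def usable_capacity_gb(drives, drive_gb, raid):
    """Usable space: RAID5 yields N-1 drives' worth; RAID0+1 yields half the raw capacity."""
    return (drives - 1) * drive_gb if raid == "RAID5" else drives * drive_gb / 2

def backend_write_ios(front_end_writes, raid):
    """Back-end disk I/Os generated by small random front-end writes."""
    penalty = 4 if raid == "RAID5" else 2   # read data/parity + write data/parity vs two mirrored writes
    return front_end_writes * penalty

for raid in ("RAID5", "RAID0+1"):
    cap = usable_capacity_gb(drives=8, drive_gb=72, raid=raid)
    ios = backend_write_ios(front_end_writes=500, raid=raid)
    print(f"{raid:>8}: {cap:.0f} GB usable from 8 x 72-GB drives, "
          f"{ios} back-end I/Os for 500 random writes/s")
# RAID5: 504 GB usable but 2000 back-end write I/Os; RAID0+1: 288 GB usable but only 1000
```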

What is most important about all of the storage technologies we have discussed in this chapter is that there is no single solution for every Exchange deployment. System managers must arm themselves with the knowledge of what technologies are available. They must also understand if and how these technologies can be leveraged for their Exchange deployments. Furthermore, technology implementations and their integration with Exchange will vary by vendor. Each solution will need extensive testing and piloting before being put into production. Put into proper perspective and deployed in a conservative fashion, I believe that the tremendous capabilities available in today’s storage technology can be one of your most valuable assets in building mission-critical Exchange deployments.



