7.2 Key server technologies for Exchange Server 2003


Over time, server technologies have evolved significantly. When I entered the industry in the early 1980s, there was no such thing as a PC server. My first “server” was a desktop machine (Intel 80286 CPU) with extra memory and a single “big” (80-MB) disk drive. Since then, the capabilities of servers have grown at a dizzying rate. At around that same time, Intel’s Gordon Moore made his now-famous statement that processing power (actually transistor count, which implies processing power) would double every 18 months. From a CPU perspective, that is exactly what has happened, and it is interesting to note that storage capacity-to-cost ratios have improved even faster than Moore’s law would predict. In this section, I will spend a brief moment touching on some server technology innovations that are important to Exchange deployments. I do not plan to delve deeply into any of these technologies, since that is beyond the scope of this book. My focus is on pointing out key server technologies that make a difference in the building of reliable systems. A final note here is that I will not venture too far down the path of performance and scalability. A colleague of mine, Pierre Bijaoui, has written an excellent book entitled Scaling Microsoft Exchange 2000, which covers this topic in great detail.
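To put an 18-month doubling period in perspective, a quick back-of-the-envelope calculation shows how that growth compounds. The 20-year span below is simply an illustrative example, not a figure from Intel:

```python
# Transistor count (and, roughly, processing power) doubling every 18 months.
# Illustrative figures only; the 20-year span is an example, not a quote from Intel.
years = 20
doublings = years * 12 / 18          # number of 18-month periods in 20 years
growth_factor = 2 ** doublings       # ~2^13.3, roughly a 10,000x increase
print(f"{doublings:.1f} doublings -> ~{growth_factor:,.0f}x growth over {years} years")
```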

7.2.1 How server architecture impacts availability

Server architecture may be a bit of an overused term. Here I mean it simply to refer to processors and buses—the guts of any server. The core components of any server architecture are an assembly of interconnecting buses and devices. Server architecture is very important to the reliability of your Exchange server in that pretty much every piece of data that flows in and out of Exchange must travel across this assembly of buses and devices. If any part of this architecture breaks or slows down, the reliability and service level of your Exchange server is directly impacted. Many would argue that server architecture (illustrated in Figure 7.1) is largely commoditized, and I would have to agree somewhat. However, let’s take a look at the key components of the server architecture and briefly discuss how they can impact mission-critical Exchange servers.

Figure 7.1: Typical simplified server architecture.

Processor and host bus technology

For processor technology in the Microsoft Windows space, there is only one game in town—Intel. Several years back, Digital Equipment Corporation (DEC, later Compaq, now HP) offered the Alpha processor for the Windows platform, but that battle was soon lost, and Intel has emerged as the singular player (Motorola’s PowerPC and the MIPS R4000 processors were also supported at one time). Largely due to Intel’s virtual monopoly in the PC server space, the company has also reached out into the server architecture space and, for all intents and purposes, owns and has commoditized server architectures as well. This does not mean server architectures have no room left for innovation, but it does mean that it is more difficult to achieve innovation and market differentiation. Most of the innovation now comes in the area we care most about—redundancy and fault resiliency. Other innovations, like Intel’s hyper-threading (www.intel.com/technology/hyperthread/), focus on increased performance and can have a significant impact on your Exchange server. For a look at Intel’s core server processor and host bus offerings, see www.intel.com/products/server/processors. Table 7.1 provides an overview of the different Intel processors and associated architectures.

Table 7.1: Intel Processor/Architecture Information (as of June 2003)

| | Pentium III | Xeon | Xeon II | XeonMP |
|---|---|---|---|---|
| Maximum speed | 1.4 GHz | 3.0 GHz | 3.06 GHz | 2.0 GHz |
| Technology | Intel P6 | Intel NetBurst with hyper-threading | Intel NetBurst with hyper-threading | Intel NetBurst with hyper-threading |
| Fab process | 0.13 Micron | 0.13 Micron | 0.13 Micron | 0.13 Micron |
| Host bus speed | 133 MHz | 400 MHz | 533 MHz | 400 MHz |
| Chipset | ServerWorks | Intel E7500 | Intel E7501 | Intel E7500, ServerWorks |
| Memory technology | PC133, DDR200/266 | DDR200 | DDR200/266 | DDR200 |
| Level 1 cache size | 16 KB | Execution trace cache | Execution trace cache | Execution trace cache |
| Level 2 cache size | 512 KB | 512 KB | 512 KB | 512 KB |
| Level 3 cache size | N/A | N/A | N/A | 1 MB/2 MB |
| I/O bus support | PCI 2.2, PCI-X/133 | PCI-X | PCI-X | PCI-X |
| Memory protection technologies | ECC, thermal protection | ECC, ChipKill scrubbing | ECC, ChipKill scrubbing | ECC, ChipKill scrubbing |
| SMP support | 2P | 2P | 2P | 32P |

Source: www.intel.com.

As you can see from Table 7.1, Intel controls almost every aspect of the performance and reliability of your server architecture; most server vendors use Intel’s support chipsets, which gives Intel a further degree of control. You could argue that when your hardware fails, it is Intel’s fault and not your server vendor’s (in many cases, you would be right). However, even though Intel provides much of the core architecture, further innovation in the areas of processor, host bus, I/O bus, memory performance, and fault resiliency is possible for server hardware OEMs like IBM, HP, and Dell. The degree to which each of these chooses to take advantage of that opportunity often sets them apart. This is particularly true for multiprocessor servers leveraging the XeonMP processors for 4-, 8-, 16-, and 32-way servers. At this high end of the server market, many vendors have invested in further innovation to provide extra performance and reliability features in their high-end server systems. An example of this is HP’s off-line processor technology, which allows a system to recover from a failed processor by rebooting and mapping out the bad processor to continue system operation. Fortunately (or unfortunately for server vendors), over time, many of these innovations end up being commoditized in the base Intel chipsets, rendering the differentiation achieved by vendors a short-term win at best. However, if you are looking for any edge against downtime, many of these features may be worth the investment.

Memory and memory bus technology

As you can see from the Intel server processor family shown in Table 7.1, Double Data Rate (DDR) SDRAM has become the current industry-standard memory for servers. DDR SDRAM is similar in function to regular SDRAM but doubles the bandwidth of the memory by transferring data twice per cycle—on both the rising and falling edges of the clock signal. The clock signal transitions from 0 to 1 and back to 0 each cycle; the first transition is called the “rising edge” and the second the “falling edge.” Normally only one of these is used to trigger a data transfer; with DDR SDRAM, both are used. The only variation then comes in the speed of the DDR memory (the Intel chipsets in Table 7.1 support DDR200 and DDR266, alongside older PC133 SDRAM). This certainly matters for Exchange server performance, but if DDR SDRAM is your only choice for servers, how does this impact reliability? The good news is that several server hardware vendors have invested heavily in building more robust memory subsystems into their servers.
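As a rough illustration of what the double-pumped clock buys you, peak memory bandwidth can be estimated from the clock rate, the number of transfers per cycle, and the standard 64-bit (8-byte) memory data path. The figures below are theoretical peaks for common memory types, not measured throughput on any particular server:

```python
# Peak theoretical bandwidth of an SDRAM/DDR SDRAM channel.
# bus_width_bytes of 8 corresponds to a standard 64-bit memory data path.
def peak_bandwidth_mb_s(clock_mhz, transfers_per_cycle, bus_width_bytes=8):
    return clock_mhz * transfers_per_cycle * bus_width_bytes  # MB/s (10^6 bytes)

print("PC133 SDRAM:", peak_bandwidth_mb_s(133, 1), "MB/s")   # ~1,064 MB/s
print("DDR200     :", peak_bandwidth_mb_s(100, 2), "MB/s")   # ~1,600 MB/s
print("DDR266     :", peak_bandwidth_mb_s(133, 2), "MB/s")   # ~2,128 MB/s
```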

Businesses are becoming more dependent on industry-standard servers to run memory-intensive and mission-critical applications. This trend is driving operating systems to support more memory and pushing the memory capacity of servers to new levels. System memory has become more reliable over the years because of better manufacturing processes and memory protection technologies like ECC, first introduced in industry-standard servers by HP.

However, as the density of memory components and server memory capacity increase to meet the demand, there is a higher probability of memory errors occurring. Figure 7.2 illustrates this assertion well and is taken from IBM’s ChipKill Memory white paper (January 1999). The effectiveness of ECC memory protection decreases as memory capacity rises. The IBM research (see www-3.ibm.com/pc/support/site.wss/MCGN-46AMQP.html) indicates that the probability of a server outage over a 1-year period increases from 3% (1 GB of RAM) to 48% (16 GB of RAM) as memory capacity increases. Over a 3-year period, server outages due to memory failures for servers with 1 GB of ECC memory are actually higher than those for servers with 32 MB of parity memory! The reason is that the number of 64-MB DRAM modules required to supply 1 GB of memory has grown to the point that there is a greater potential for a memory chip failure than there is for a single-bit parity error when using 32 MB of parity memory.
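The arithmetic behind this trend is straightforward: the more DRAM devices a server contains, the more opportunities there are for one of them to fail. The sketch below illustrates the compounding effect; the per-chip failure probability and chip counts are made-up illustrative values, not IBM’s measured figures:

```python
# Probability that at least one DRAM device fails grows with device count.
# p_device is purely illustrative; IBM's white paper derives its own failure rates.
def p_any_device_fails(p_device, device_count):
    return 1 - (1 - p_device) ** device_count

p_device = 0.002   # assumed annual failure probability for a single DRAM chip
for gb, chips in [(1, 16), (4, 64), (16, 256)]:   # chip counts are illustrative
    print(f"{gb:>2} GB ({chips} chips): {p_any_device_fails(p_device, chips):.1%}")
```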

Figure 7.2: Server outages due to memory failures.

Several vendors have further improved on ECC memory. HP, for example, supports what it refers to as advanced ECC (actually a Compaq patent). While standard ECC memory corrects only single-bit memory errors (where any single bit in the data word, usually 64 bits in total, is lost or in error), advanced ECC utilizes extra correction bits distributed across multiple ECC devices to allow for multibit error correction (where more than 1 bit is in error). Advanced ECC can even correct for an entire DRAM chip failure, provided the memory architecture is properly designed. Table 7.2 compares traditional and advanced ECC memory-protection schemes. More and more, server hardware vendors are making advanced ECC standard in their midrange and high-end server products, while basic ECC seems to have been relegated to entry-level server products from most manufacturers (Intel’s commoditization of servers wins again).

Table 7.2: Comparing ECC with Advanced ECC Memory Protection

| Error Condition | ECC Protection | Advanced ECC Protection |
|---|---|---|
| Single-bit error | Correct | Correct |
| Double-bit error | Detect | Correct or detect (depends on location in word) |
| DRAM failure | Detect | Correct |
| ECC circuitry detection fault | None | None |
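To see why single-bit correction is possible at all, it helps to look at the Hamming-code idea that ECC is built on: a handful of redundant check bits pinpoint the position of a flipped bit so it can be flipped back. The sketch below uses a toy Hamming(7,4) code for illustration; real ECC DIMMs use wider code words (typically 64 data bits plus 8 check bits) but apply the same principle:

```python
# Toy Hamming(7,4) code: 4 data bits protected by 3 parity bits.
# Real ECC DIMMs use a 72-bit code word (64 data + 8 check bits), but the
# principle of locating and flipping a single bad bit is the same.

def encode(d):                       # d = list of 4 data bits
    c = [0, 0, d[0], 0, d[1], d[2], d[3]]   # positions 1..7 (index 0..6)
    c[0] = c[2] ^ c[4] ^ c[6]        # parity over positions 1,3,5,7
    c[1] = c[2] ^ c[5] ^ c[6]        # parity over positions 2,3,6,7
    c[3] = c[4] ^ c[5] ^ c[6]        # parity over positions 4,5,6,7
    return c

def correct(c):
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 0 = no error, else 1-based bad position
    if syndrome:
        c[syndrome - 1] ^= 1         # flip the single bad bit back
    return c

word = encode([1, 0, 1, 1])
word[4] ^= 1                         # simulate a single-bit memory error
assert correct(word) == encode([1, 0, 1, 1])
print("single-bit error corrected")
```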

Of course, ECC memory cannot take care of every memory failure event, and server vendors realize this fact. To deal with this issue, many have turned to advanced protection technologies built on standard industry memory technologies and leveraging proven innovations in availability and fault tolerance. These technologies include on-line spare memory, mirrored memory, and hot plug RAID memory.

  • On-line spare memory: Enables you to configure a dedicated portion of main memory (thus reducing the total available to the system) as an on-line spare. This is configured on a per-memory-bank basis and allows one memory bank to be designated as the spare while the remaining banks function as standard ECC system memory. In the event of a significant memory failure in one of the primary memory banks (signaled when the number of memory errors reaches a certain threshold), the contents of the failing memory bank are transferred to the on-line spare bank, and server operation continues without an outage (depending on the severity and extent of the memory failure). Many servers let you designate up to half of the system memory banks as spares.

  • Mirrored memory: Provides a higher-availability option (that is also more costly, of course). When a server is configured with mirrored memory, data is written to redundant memory banks (yes, like mirrored disks). If a memory failure occurs in the primary memory bank, data can be read seamlessly from the mirrored bank of memory modules. Because most of these server configurations support hot plug replacement, no downtime is required to replace the bad memory modules. Combined with ECC, mirrored memory can correct for every type of memory error except the unlikely event in which an error occurs in both the primary and mirrored memory banks.

  • RAID5 memory: Like its grandfather (disk RAID5), this configuration is the ultimate in memory protection for servers. It allows a memory subsystem to continue operating even in the event of a memory device failure. A common configuration is a five-bank design that provides a scheme similar to disk-based RAID5 (discussed later in this chapter): four production memory banks contain the main system memory, while a fifth bank stores parity information (in actuality, more like disk RAID4). Data written to memory is run through XOR calculations in a custom ASIC, and parity information is generated for each data word and stored in the parity memory bank. If virtually any memory error occurs, the data can be recreated on the fly from the remaining data and the parity information stored in the memory array (a small sketch of this XOR arithmetic follows this list). What makes this configuration even more compelling is that hot plug replacement allows the failed memory to be replaced and the lost data recreated on the fly.
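The XOR arithmetic behind such a parity scheme is easy to demonstrate. The following minimal sketch models a four-data-banks-plus-parity layout with made-up values; it illustrates the principle, not any vendor’s ASIC implementation:

```python
# XOR parity across four "data banks" plus one "parity bank".
# If any single bank is lost, its contents are rebuilt by XORing the survivors.
from functools import reduce

banks = [0b1011, 0b0110, 0b1110, 0b0001]        # illustrative data words
parity = reduce(lambda a, b: a ^ b, banks)       # stored in the fifth bank

lost_index = 2                                   # pretend bank 2 fails
survivors = [b for i, b in enumerate(banks) if i != lost_index]
rebuilt = reduce(lambda a, b: a ^ b, survivors + [parity])

assert rebuilt == banks[lost_index]
print(f"rebuilt bank {lost_index}: {rebuilt:04b}")
```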

Memory protection schemes will continue to evolve through the investment of server vendors and Intel. If you would like more information on the above memory protection technologies, see www.intel.com, www.hp.com, and www.ibm.com, as well as other sources, for details on how various vendors implement them in their server products. Regardless of the server vendor you favor (assuming you have followed my earlier advice on server vendor selection), you will most likely be able to take advantage of one or all of these technologies (in addition to several that Intel has commoditized into the processor chipsets).

I/O bus technology

For Exchange Server, I do not often hear discussions or debates about which I/O bus is best. However, processor and host bus speeds continue to increase, and I/O buses must continue to improve in order to keep up with them. In addition, as more data travels at higher speeds over I/O buses, additional protection for that data is also required. Most servers today employ as many as three or four I/O buses because a mix of low- and high-bandwidth I/O devices must pump data in and out of a server. As the speeds of I/O devices such as network interface cards (NICs), disk controllers, and host bus adapters (for SAN attachment) increase, I/O bus speeds need to keep up as well so that the buses do not become saturated and can provide hungry processors and system memory with the steady stream of data they require. Table 7.3 provides a look at the evolution of I/O bus technologies.

Table 7.3: I/O Bus Technology and Bandwidth

| I/O Bus Technology | I/O Speed and Bus Width | Bus Protection |
|---|---|---|
| ISA | 6 MHz/16 bit | None |
| EISA | 10 MHz/32 bit | Parity |
| MicroChannel (MCA) | 16 MHz/32 bit | Parity |
| PCI | 33 MHz/32 or 64 bit | Advanced parity |
| PCI-2 | 66 MHz/32 or 64 bit | Advanced parity |
| PCI-X | 100-133 MHz/32 or 64 bit | Advanced parity |
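To see how quickly a shared I/O bus can be consumed, the sketch below compares the theoretical peak throughput of a few of the buses in Table 7.3 with the demand from a plausible mix of I/O devices. The device mix and all figures are illustrative theoretical peaks, not measured numbers:

```python
# Peak theoretical throughput of common I/O buses (MB/s, 10^6 bytes).
def bus_peak_mb_s(clock_mhz, bus_width_bits):
    return clock_mhz * bus_width_bits / 8

print("PCI 33 MHz/32-bit   :", bus_peak_mb_s(33, 32), "MB/s")    # ~132 MB/s
print("PCI 66 MHz/64-bit   :", bus_peak_mb_s(66, 64), "MB/s")    # ~528 MB/s
print("PCI-X 133 MHz/64-bit:", bus_peak_mb_s(133, 64), "MB/s")   # ~1,064 MB/s

# Rough demand from devices sharing one bus (each counted one direction only).
gigabit_nic = 1000 / 8          # ~125 MB/s per Gigabit Ethernet port
fc_2gb_hba = 2000 / 8           # ~250 MB/s per 2-Gbit Fibre Channel HBA
print("2 NICs + 2 HBAs demand:", 2 * gigabit_nic + 2 * fc_2gb_hba, "MB/s")
```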

Exchange administrators do not often care about or spend much time on I/O bus technology for their Exchange servers, especially since you often do not have a choice when buying from a particular vendor. However, it is worth your while to investigate the available technology and to plan your servers with I/O bus performance and reliability in mind. Above all, ensure that you are designing and configuring your servers for optimal usage of the I/O bus. If you are concerned about I/O bus or device reliability, invest in key technologies like hot plug PCI, which enables you to replace PCI boards while the server is operational. Again, consult your favorite server vendor to determine what is offered in specific product models.

7.2.2 Other server redundancy systems

Today’s servers from most vendors also contain many other protection measures to ensure optimal reliability. These ancillary systems cover everything from power supplies to cooling. For cooling, many servers are available with redundant cooling fans, thermal sensors, and hot plug fan replacement. For power, there is a plethora of UPS vendors available, and server vendors tout features such as redundant power supplies, RAID-like power supply and module configurations, and self-healing power supplies that take server power availability to a whole new level. Finally, redundant I/O devices are also becoming popular, with features such as NIC teaming, where two NICs are paired together in a lock-step redundancy mode enabling transparent failover for the server’s network connection—without clustering or network load balancing (a simplified sketch of the failover logic appears below). These ancillary redundancy areas are where many vendors can truly differentiate their server products and fight the constant wave of commoditization from Intel. If you are experiencing pain in these various ancillary areas, first identify the root cause. If that cannot be alleviated, invest in these technologies to ensure that your Exchange server can stand up to a failure in one of these areas.
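Conceptually, NIC teaming boils down to a link health check plus a switchover decision. The sketch below is a highly simplified illustration of that decision logic; the interface names and simulated link states are hypothetical, and real teaming is implemented in the vendor’s NIC driver, not in application code:

```python
# Simulated link states; a real teaming driver reads link status from the NIC hardware.
link_state = {"NIC1": True, "NIC2": True}

def failover_choice(active, primary="NIC1", standby="NIC2"):
    """Return the NIC that should carry traffic, given current link states."""
    if link_state[active]:
        return active                              # active NIC healthy: no change
    other = standby if active == primary else primary
    return other if link_state[other] else active  # fail over only if the peer is up

active = "NIC1"
link_state["NIC1"] = False        # simulate a cable pull on the primary NIC
active = failover_choice(active)
print("traffic now flows via", active)             # -> NIC2, no cluster required
```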

There are other minor factors that should influence your server choice or design. These have to do with how easy a server is to manage and deploy. Microsoft and server vendors have several initiatives under way (such as HP and Intel’s Adaptive Infrastructure initiative—www.hp.com/large/globalsolutions/ai) that seek improvements in rapid deployment, provisioning, and manageability through key technologies and best practices. Many would argue that the server hardware choice or technology employed is irrelevant to Exchange deployments since a high degree of commoditization has already occurred. I won’t argue this point further than to point out that most mission-critical Exchange deployments I know of are not built from commodity pieces and parts, but on server platforms from top-tier vendors that meet the criteria I discussed earlier. How much these technologies affect your Exchange server’s availability depends on which of them you decide to employ and where your failures actually occur. If you are not seeing a high incidence of server failures in a particular area, investment in that area or similar technologies may not be justified. Remember my golden rule of server technology usage: if it doesn’t hurt, don’t spend money making it feel better!



