Understanding Novell Cluster Services

Test Objective Covered:

  1. Identify the purpose and advantages of implementing an NCS solution.

Earlier this year, we asked 100 nerds at BrainShare (Novell's annual engineer conference) to define high availability. The myriad of answers we received were as diverse as the stars in the sky. For some, high availability meant 100 percent uptime (virtually impossible), while others believed that a few hours of downtime each month was acceptable (not by most standards). For the most part, though, everyone agreed that high availability was a critical aspect of any productive network. In addition, all of the engineers we surveyed felt that Novell's networks are among the most highly available in the world.

One of the main reasons for Novell's success in the high-availability realm is NCS. NCS 1.6 is Novell's latest evolution of a three-year-old product. NCS 1.6 is multinode, multiprocessor, eDirectory enabled, and optimized for NetWare 6. In fact, the NetWare 6 operating system includes a license for a two-node NCS 1.6 cluster. And because NCS 1.6 has been optimized for NetWare 6, you cannot mix it with prior versions.

In this section, we will perform a slightly more scientific study of high availability terms and definitions. In addition, we will explore the key factors of computer system outages and learn about all the features and benefits of NCS 1.6.

High Availability Terms and Definitions

Before you can build a high-availability solution, you must first understand what it is. The definition of high availability centers on the term service. A service is the very thing that is being made highly available. From a user's perspective, service availability is the purpose of a network. Typical services include printing, file access, Web services, and e-mail. Of course, network administrators are responsible for the platform that delivers these services (the server), and because servers serve services, you must make sure that servers don't go down.

High availability is analogous to server availability. So what determines server availability? Availability is the percentage of total system time that the service (and server) is accessible for normal use. Ergo, outage is the enemy of availability. Outage is the loss of a computer service. Finally, these three concepts combine to create two important measurements: uptime and downtime. Uptime is the duration of time the service is functioning, while downtime is the duration of any planned or unplanned outage.

High availability is measured by the amount of time a system (and server) is operational; this is known as reliability. Furthermore, reliability is measured by these two metrics:

  • Mean Time Between Failures (MTBF): The average time that a device or system works without failure (usually listed in hours). You can calculate the MTBF by dividing the total number of operating hours by the total number of failures.

  • Mean Time To Recovery (MTTR): The average time that a device takes to recover from a nonterminal failure. MTTR is often part of a maintenance contract, where you would pay more for a system with a 24-hour MTTR than for a system with an MTTR of 7 days. The ultimate goal of high availability is an MTTR of zero. This means that the system has integrated fault-tolerant components that take over the instant the primary components fail.
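
The two metrics can be tied back to the availability percentage defined earlier. The following Python sketch computes MTBF exactly as described above and then applies the common reliability-engineering formula availability = MTBF / (MTBF + MTTR). That formula is an assumption added here for illustration, and the operating hours and failure count are made-up values.

```python
# Minimal sketch: MTBF as defined in the text, plus the common steady-state
# availability formula A = MTBF / (MTBF + MTTR) (an assumption, not from the text).


def mtbf(operating_hours: float, failures: int) -> float:
    """Mean Time Between Failures: total operating hours / total failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined until at least one failure is recorded")
    return operating_hours / failures


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the service is up, given MTBF and MTTR in hours."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical server: 8,760 operating hours (one year) with two failures.
server_mtbf = mtbf(8760, 2)
for mttr in (7 * 24.0, 24.0, 0.0):  # 7-day, 24-hour, and zero MTTR contracts
    print(f"MTTR {mttr:6.1f} h -> availability {availability(server_mtbf, mttr):.4%}")
```

With an MTTR of zero the formula returns exactly 1.0 (100 percent), which is why zero MTTR is the stated goal. Shortening the MTTR from 7 days to 24 hours lifts this hypothetical server from roughly 96.3 percent to roughly 99.5 percent availability.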

With all of this in mind, we can define high availability as 24x7x365 at 100 percent availability of services with zero downtime, high reliability, and an MTTR of zero.

Although you can work toward 100 percent availability 24 hours a day, 7 days a week, 365 days a year, it's practically impossible to achieve because of unforeseen natural and manmade disasters.

If 100 percent availability is not achievable, then what is your goal for high availability? It all depends on your company's tolerance for downtime. For example, a high-availability quotient of "three 9s" (99.9 percent uptime) may be adequate for your employees, customers, and partners. Three 9s high availability equates to 8.7 hours of downtime each year. On the other hand, you may be required to make the high investment necessary to achieve "five 9s" (99.999 percent uptime), which equates to only 5.2 minutes of downtime each year.

To achieve this level of high availability, you will need to recruit help from power vendors, application retailers, and a clustering consultant. Believe us, five 9s high availability does not come cheap. Table 7.1 compares five popular high-availability quotients.

Table 7.1. High-Availability Quotients

HIGH-AVAILABILITY QUOTIENT   UPTIME (PERCENTAGE)   DOWNTIME (PER YEAR)
Five 9s                      99.999                5.2 minutes
Four 9s                      99.99                 52.5 minutes
Three 9s                     99.9                  8.7 hours
Two 9s                       99.0                  87.6 hours
One 9                        98.0                  175.2 hours
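
The downtime column is simple arithmetic: the unavailable fraction of the year multiplied by the 8,760 hours in a non-leap year. The short Python sketch below (the function name is hypothetical) reproduces the table's uptime values.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Convert an uptime percentage into allowed downtime (hours per year)."""
    return HOURS_PER_YEAR * (100.0 - uptime_percent) / 100.0


for uptime in (99.999, 99.99, 99.9, 99.0, 98.0):
    hours = downtime_hours_per_year(uptime)
    if hours < 1:
        print(f"{uptime:>7}%  {hours * 60:7.2f} minutes")
    else:
        print(f"{uptime:>7}%  {hours:7.2f} hours")
```

The sketch prints 5.26 minutes, 52.56 minutes, and 8.76 hours for five, four, and three 9s; Table 7.1 appears to truncate these values to one decimal place.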

TIP

If you are still motivated to achieve 100 percent availability, you may want to compromise by restricting it to a particular time period. For example, some SLAs (Service Level Agreements) define 100 percent availability as zero downtime between the hours of 6:00 a.m. and 11:00 p.m. This is known as "6-11."
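
Under a "6-11" SLA, only the part of an outage that overlaps the covered hours counts against the agreement. The following Python sketch measures that overlap; the function name and the example outage are hypothetical, and it assumes outages are recorded as start and end timestamps on the same local clock the SLA uses.

```python
from datetime import datetime, time, timedelta

# Hypothetical SLA window for "6-11": 6:00 a.m. to 11:00 p.m. every day.
WINDOW_START = time(6, 0)
WINDOW_END = time(23, 0)


def sla_downtime(outage_start: datetime, outage_end: datetime) -> timedelta:
    """Return how much of an outage falls inside the daily SLA window."""
    total = timedelta(0)
    day = outage_start.date()
    while day <= outage_end.date():  # check every calendar day the outage touches
        win_start = datetime.combine(day, WINDOW_START)
        win_end = datetime.combine(day, WINDOW_END)
        overlap_start = max(outage_start, win_start)
        overlap_end = min(outage_end, win_end)
        if overlap_end > overlap_start:
            total += overlap_end - overlap_start
        day += timedelta(days=1)
    return total


# A maintenance outage from 10:30 p.m. to 1:00 a.m. the next morning.
start = datetime(2003, 3, 1, 22, 30)
end = datetime(2003, 3, 2, 1, 0)
print(sla_downtime(start, end))  # only 10:30-11:00 p.m. counts against the SLA
```

This is also why scheduled maintenance under such an SLA is usually pushed into the uncovered overnight hours.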


Computer System Outages

Although NCS provides all the software that you need to configure a high-availability clustering solution, many other factors might impact your high-availability quotient. The following factors can cause computer system outages at any time:

  • Physical: Physical faults are hardware failures in your network system. They can be triggered by conditions such as temperature, air quality, and magnetism near network components. A good rule of thumb is that if you're comfortable in a room, computers are probably comfortable as well. However, certain simple precautions must be taken in the physical environment to protect network components.

  • Design: Design errors in both the hardware and software subsystems can cause a network to fail. You should be particularly sensitive to the design of cluster-enabled components.

  • Operations: Users can be your network's biggest enemy. Errors caused by operations personnel or users themselves can cause computer systems to fail. In this case, education is the key to high availability. For example, you should proactively educate your users that the CD-ROM tray is not a cup holder and e-mail worms are "bad." This will improve the availability of your disk subsystems.

  • Environmental: In addition to the physical environment, you may have to be concerned about electrostatic discharge (ESD), lightning, electromagnetic interference (EMI), and other power anomalies. As if that's not enough, your external network connections can fail because of natural disasters and so on.

  • Reconfiguration: Scheduled maintenance, upgrades, or configuration changes can also bring networks down.

Of course, all of these computer system outage factors are exacerbated when they converge on a single point of failure. Try to avoid this disastrous situation by building fault tolerance and redundancy into all your network components.
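
Why redundancy removes a single point of failure can be shown with two standard reliability rules: components in a chain multiply their availabilities, while redundant copies fail only when every copy fails. The Python sketch below applies both rules; the 99.9 percent figures are hypothetical, and it assumes component failures are independent, which correlated events such as a building-wide power loss can violate.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Components in a chain: the service is up only if every one of them is up."""
    return prod(availabilities)


def parallel(availability: float, copies: int) -> float:
    """Redundant copies: the service is up while at least one copy is up."""
    return 1.0 - (1.0 - availability) ** copies


# Hypothetical figures: each server and the disk system are 99.9 percent available.
single_server = series(0.999, 0.999)           # one server plus its disk system
clustered = series(parallel(0.999, 2), 0.999)  # two servers sharing one disk system
print(f"single server:    {single_server:.4%}")
print(f"two-node cluster: {clustered:.4%}")
```

In this toy model, clustering the servers removes them as a single point of failure, and the shared disk system becomes the limiting component, so it needs its own fault tolerance.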

So how highly available do you want to be? It really depends on what business you're in and how valuable your data is. Table 7.2 shows the thousands (and sometimes millions) of dollars that can be lost for every hour your network is down. In many of these cases, Novell Cluster Services is more than a nice thing to have; it's job security.

Table 7.2. What Does High Availability Mean to You?

SERVICE                OUTAGE COST (PER HOUR)   ANNUAL LOSS (AT 99.999 PERCENT UPTIME)
Brokerage              $5.6-7.3 million         $485-633 thousand
Credit Card            $2.2-3.1 million         $191-269 thousand
Pay-per-View           $67-233 thousand         $6-20 thousand
Home TV Shopping       $87-140 thousand         $8-12 thousand
Catalog Sales          $60-120 thousand         $5-10 thousand
Airline Reservations   $67-112 thousand         $6-10 thousand
ATM Fees               $12-17 thousand          $1-2 thousand
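
The annual-loss column follows from multiplying each hourly outage cost by the yearly downtime allowed at five 9s. Working backward, the figures appear to use the 5.2 minutes listed in Table 7.1. The Python sketch below reproduces a few rows under that assumption; the function name is hypothetical.

```python
# Assumption: yearly downtime at five 9s is the 5.2 minutes listed in Table 7.1.
FIVE_NINES_DOWNTIME_HOURS = 5.2 / 60


def annual_loss(cost_per_hour: float, downtime_hours: float = FIVE_NINES_DOWNTIME_HOURS) -> float:
    """Yearly revenue lost = hourly outage cost x hours of downtime per year."""
    return cost_per_hour * downtime_hours


# Hourly outage-cost ranges (low, high) copied from Table 7.2.
outage_costs = {
    "Brokerage": (5_600_000, 7_300_000),
    "Credit Card": (2_200_000, 3_100_000),
    "Catalog Sales": (60_000, 120_000),
}

for service, (low, high) in outage_costs.items():
    print(f"{service:<14} ${annual_loss(low):>9,.0f} to ${annual_loss(high):>9,.0f} per year")
```

Passing a different `downtime_hours` value (for example, 8.76 hours for three 9s) shows how quickly the loss grows as the availability target drops.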

Novell Cluster Services Features and Benefits

Novell Cluster Services 1.6 is not a magic pill for 100 percent high availability. It is, however, a major leap toward five 9s. NCS helps you avoid network outages caused by a failed NetWare server. In addition, it covers any hardware outages associated with the server's power, internal components, or storage devices. This is accomplished by a zero-MTTR failover from one server to another. And, miraculously enough, all files and applications are maintained because both servers share a common disk system.

The most impressive benefits and features provided by NCS 1.6 are

  • Multinode All-Active Cluster (up to 32 nodes): NCS allows you to configure up to 32 NetWare servers (nodes) into a high-availability cluster, where resources can be dynamically switched or moved to any server at any time. Furthermore, services can be assigned across the cluster to different servers. This means that any NetWare server in the cluster can instantly restart resources from a failed server. This helps NCS achieve an MTTR of zero.

  • Multiprocessor and Multithreading Enabled: Because NCS 1.6 sits on a NetWare 6 platform, it is both multiprocessor-enabled and multithreaded. This means each processor can be used to execute commands faster and more efficiently, providing higher network throughput in support of 24x7x365 availability.

  • Consolidation of Applications and Operations: NCS allows you to tailor a cluster to the specific applications and hardware infrastructure that fit your organization's needs. You can also reduce unplanned and planned outages by offloading services to nonactive nodes. By consolidating services onto a cluster, you can reduce the number of servers that you need to provide your services by 50 percent or more.

  • Flexible Resource Management: You can configure resources to automatically switch to an active node when a server fails, or you can move services manually to troubleshoot hardware or balance the workload. This flexible resource management allows you to optimize the resources you are using to deliver highly available services.

  • Shared Storage Support: NCS provides support for shared SCSI devices or for Fibre Channel SANs. In addition, you can achieve shared disk fault tolerance by implementing RAID Level 5.

  • Single Point of Control: NCS 1.6 enables you to manage a cluster from a single point of control by using ConsoleOne or NetWare Remote Manager. In fact, the browser-based NetWare Remote Manager enables you to load balance network services across the cluster from a remote location.

  • Cluster Event and State Notification: You can configure NCS 1.6 to notify administrators through e-mail when cluster states change. This is a critical component of your high-availability maintenance and notification procedures.
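
The automatic failover, manual migration, and state-change events described in the list above can be illustrated with a toy simulation. The following Python sketch is built entirely on assumptions: the `Cluster` class, the node names NODE1 to NODE3, the resource names FILES and MAIL, and the ordered preferred-node policy are hypothetical. The sketch does not represent the NCS 1.6 implementation, its configuration in ConsoleOne or NetWare Remote Manager, or any Novell API.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of an all-active cluster; NOT how NCS is implemented or configured."""

    nodes: set[str]
    # Hypothetical policy: each resource lists the nodes it may run on, in order.
    preferences: dict[str, list[str]]
    placement: dict[str, str] = field(default_factory=dict)
    events: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        for resource in self.preferences:
            self._place(resource)

    def _place(self, resource: str) -> None:
        # Shared storage is assumed, so any surviving node can restart the resource.
        for node in self.preferences[resource]:
            if node in self.nodes:
                self.placement[resource] = node
                self.events.append(f"{resource} online on {node}")
                return
        self.placement.pop(resource, None)
        self.events.append(f"{resource} offline: no eligible node left")

    def fail_node(self, node: str) -> None:
        """Automatic failover: restart the failed node's resources elsewhere."""
        self.nodes.discard(node)
        self.events.append(f"node {node} left the cluster")
        for resource, owner in list(self.placement.items()):
            if owner == node:
                self._place(resource)

    def migrate(self, resource: str, node: str) -> None:
        """Manual move, e.g. to service hardware or balance the workload."""
        if node not in self.nodes:
            raise ValueError(f"{node} is not an active cluster node")
        self.placement[resource] = node
        self.events.append(f"{resource} migrated to {node}")


cluster = Cluster(
    nodes={"NODE1", "NODE2", "NODE3"},
    preferences={"FILES": ["NODE1", "NODE2"], "MAIL": ["NODE2", "NODE3"]},
)
cluster.fail_node("NODE1")
cluster.migrate("MAIL", "NODE3")
for event in cluster.events:
    print(event)  # each state change is where an e-mail notification could be sent
```

After NODE1 fails, FILES restarts on its next preferred node (NODE2), and the manual migration then moves MAIL to NODE3 to spread the workload.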

Now that you have gained a greater appreciation for the meaning of high availability and have mastered the fundamentals of NCS, let's learn how to design a highly available system of our own.



Novell's CNE Update to NetWare 6 Study Guide
ISBN: 0789729792
Year: 2003
Pages: 128
