Chapter 3: Designing and Planning a Cluster


3.1 Introduction

A cluster can have many benefits over a non-clustered solution, but designing a TruCluster Server cluster involves many configuration options, and the choices you make directly affect how much benefit the cluster delivers over a standalone system. How do you make these configuration choices in order to meet the goals for your solution? This chapter will walk you through them, beginning with basic high-level options and working toward more detailed levels, including:

  • Model, configuration, and number of member systems

  • Whether or not to include a Quorum Disk

  • Type of cluster interconnect

  • Number and type of external network connections

  • Use of Hardware and/or Software RAID solutions for storage

  • Number of storage Host Bus Adaptors and Fabrics

In each of these areas, we will cover the following trade-offs concerning the solution's end goals:

  • Availability

  • Performance

  • Ease of Management and Workload Consolidation
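The quorum disk option listed above comes down to simple vote arithmetic: a cluster keeps operating only while the surviving voters hold a majority of the expected votes. The sketch below is an illustrative model of that arithmetic, not TruCluster code; the actual vote rules and their configuration are covered in detail later in the book.

```python
def quorum(expected_votes: int) -> int:
    """Votes required for the cluster to operate: a strict majority."""
    return expected_votes // 2 + 1

def survives(member_votes: int, quorum_disk_vote: int, expected_votes: int) -> bool:
    """True if the remaining voters can still form a cluster."""
    return member_votes + quorum_disk_vote >= quorum(expected_votes)

# Two-member cluster, one vote each, no quorum disk: losing one member
# drops below quorum (2 of 2 votes are needed), so the cluster halts.
print(survives(member_votes=1, quorum_disk_vote=0, expected_votes=2))  # False

# Same cluster with a one-vote quorum disk (expected votes = 3): one
# member plus the disk still reaches quorum (2 of 3), so work continues.
print(survives(member_votes=1, quorum_disk_vote=1, expected_votes=3))  # True
```

This is why a quorum disk matters most for small (especially two-member) clusters, a trade-off examined later in this chapter.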

3.1.1 Default Options for Most Solutions

Before addressing each area in detail, it is useful to summarize a best-practice, "all other things being equal" configuration that should be your default starting point when evaluating options. Note that these are suggestions only; in each case the alternative is also a configuration supported by HP for TruCluster Server as of this writing (consult the QuickSpecs for the latest support information). That being said, "safe" default configuration choices are as follows:

  • Keep to a moderate size (i.e., something below the currently supported maximum size).

    This is recommended for a number of reasons:

    • Despite TruCluster Server being the easiest cluster to manage due to its Single System Image, the larger the cluster, the more complex it is to manage, especially when problems occur.

    • Choosing the largest possible configuration means choosing a less typical, less field-proven solution. The largest possible cluster is "not the norm" in the TruCluster Server user community and hence puts you closer to the bleeding edge.

    • It is wise to leave room for additional cluster members to be added in the future to accommodate growth in the solution.

  • Stick with homogeneity of members.

    Keep your cluster members of the same model and with the same computing resources. In a heterogeneous cluster, workloads are harder to manage because some members cannot handle as much work, or perhaps the same classes of work, as others in the cluster. This complicates your administration and failover planning. Heterogeneous members can also complicate patching and support, as different hardware models may need different patches, drivers, and kernel builds.

  • Use redundant Memory Channel as the cluster interconnect.

    As of this writing, Memory Channel is the older, more established, and better performing interconnect. Unless distance or cost factors are significant, choose Memory Channel.

  • Use Fibre Channel based storage backed by hardware RAID[1] with dual-redundant fabrics connected by two Host Bus Adapters (HBAs) on each host.

    Your solution is only as available as its data. Give each host at least two HBAs with paths to the storage endpoints to ensure multiple paths for failover. Also implement at least two fabrics connecting all HBAs from the servers to all the storage endpoints, so that a failure of a fabric, or the need to perform maintenance on one (firmware upgrades, addition of more switches, etc.), doesn't leave you high and dry. Finally, most enterprise sites will require a storage solution that includes RAID controllers simply to get the necessary gigabytes or terabytes of storage in a convenient, manageable package.
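To see why two HBAs on two separate fabrics matter, consider a toy enumeration of host-to-storage paths. This is a hypothetical illustration with invented names, not a TruCluster or fabric-management API; it assumes each HBA is cabled into one fabric and each storage controller port is reachable from both fabrics.

```python
from itertools import product

# Hypothetical layout: each HBA sits on one fabric, and each storage
# controller port is reachable from both fabrics.
hbas = {"hba0": "fabric_A", "hba1": "fabric_B"}
controller_ports = {"port0": {"fabric_A", "fabric_B"},
                    "port1": {"fabric_A", "fabric_B"}}

def usable_paths(failed_fabric=None):
    """Enumerate (HBA, port) paths still usable after losing one fabric."""
    return [(hba, port)
            for (hba, fabric), (port, fabrics)
            in product(hbas.items(), controller_ports.items())
            if fabric != failed_fabric and fabric in fabrics]

print(len(usable_paths()))            # 4 paths when everything is up
print(len(usable_paths("fabric_A")))  # 2 paths remain with one fabric down
```

With a single fabric (or a single HBA), the same enumeration collapses to zero surviving paths when that component fails, which is exactly the exposure the dual-redundant configuration avoids.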

  • Use redundant network interface cards (NICs) configured for NetRAIN[2] to interface with public networks.

    On the networks that connect your cluster to client systems, utilize the NetRAIN facility to make sure that your network connections stay up even after the loss of an individual NIC.
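Conceptually, NetRAIN fronts a set of physical NICs with one virtual interface: traffic flows through an active NIC, and a standby takes over if that NIC fails. The sketch below models that failover behavior only; it is not the Tru64 UNIX NetRAIN implementation or its configuration syntax (the interface names are illustrative).

```python
class VirtualInterface:
    """Toy model of a NetRAIN-style virtual interface over physical NICs."""

    def __init__(self, nics):
        self.nics = list(nics)   # physical NICs, in preference order
        self.failed = set()

    @property
    def active(self):
        """The first healthy NIC carries the traffic; None if all failed."""
        for nic in self.nics:
            if nic not in self.failed:
                return nic
        return None

    def fail(self, nic):
        self.failed.add(nic)

vif = VirtualInterface(["tu0", "tu1"])
print(vif.active)   # tu0
vif.fail("tu0")
print(vif.active)   # tu1 (the connection survives the NIC loss)
```

The clients see one interface address throughout; only when every NIC in the set has failed does the connection actually go down, which is why pairing NICs behind NetRAIN is a cheap availability win.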

3.1.2 Example: A TruCluster Server for a Biotech Research Department

To illustrate the trade-offs and decision process covered in this chapter, a common example will be used and revisited as each major concept is presented. The example is a hypothetical Biotech Company planning a new TruCluster Server cluster. The cluster is intended for a research department that has a mixture of custom written and off-the-shelf applications. They currently have a collection of individual standalone systems that host various applications and share data using network protocols such as NFS and FTP.

[1]Redundant Array of Independent (or Inexpensive) Disks.

[2]Redundant Array of Independent NICs.




TruCluster Server Handbook (HP Technologies)
ISBN: 1555582591
Year: 2005
Pages: 273
