4.4 Configuring and Verifying the Cluster Interconnect Hardware

Implementing a cluster requires a dedicated cluster interconnect to which all cluster members are connected. The cluster interconnect provides the foundation for a virtual private network that allows communication among all cluster members and carries internode communications (via the ICS[12]) between cluster nodes, or members. The TruCluster Server software creates this virtual private network within the cluster, and this virtual private network exists side-by-side with the physical communications channel provided by the cluster interconnect.

How does this work? For each member, the cluster software creates a virtual network device for the cluster interconnect. In V5.1A, this device is named ics0. In V5.0A and V5.1, this device was named mc0 as only Memory Channel adapter cards were supported as cluster interconnects. In any event, this device has its own IP name and IP address, which are used when establishing the system's membership in the cluster.
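Once a member has booted into the cluster, this virtual device can be inspected like any other network interface. The following is only a minimal sketch, assuming a V5.1A member whose interconnect device is named ics0; the host name in the comment is the illustrative one used later in this section:

 # ifconfig ics0              # display the virtual cluster interconnect interface
 # grep -i ics0 /etc/hosts    # confirm the ICS IP name (e.g., molari-ics0) and IP address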

The hardware that may be used for the cluster interconnect can be either a Memory Channel (MC) adapter or an Ethernet local area network (LAN) card, but not a mix of both.

The Cluster Interconnect is used to provide the following functions within a cluster:

  • Health and status messaging between cluster members. The Connection Manager uses this information to monitor the cluster members and to coordinate membership. For more information on the Connection Manager subsystem, see Chapter 17.

  • Distributed lock manager (DLM) locking between cluster members. This is used to coordinate access to shared resources. For more detailed information on the DLM, please see Chapter 18.

  • Accessing file systems between cluster members. For storage located on a cluster member's private bus but visible to all members of the cluster, all reads and writes from other cluster members are performed across the cluster interconnect. As Cluster File System uses a client-server model, this is also true for file systems on the shared bus.

  • The cluster interconnect, through the ICSnet, includes a full IP stack. Although this path is not intended for general-purpose network traffic, nothing other than the bandwidth of the interconnect hardware prevents an application from taking advantage of it. All non-cluster-related traffic should be kept off the cluster interconnect; if a private network is needed, one should be added. (A quick way to watch traffic on the interconnect is sketched after this list.)

  • Cluster alias routing. For more information, please see Chapter 16 on Cluster Alias.
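If you suspect that non-cluster traffic is finding its way onto the interconnect, the interface packet counters give a quick indication. This is a minimal sketch, assuming the V5.1A virtual device name ics0 and the standard netstat interface options:

 # netstat -I ics0        # one-line packet counters for the interconnect device
 # netstat -I ics0 5      # repeat every 5 seconds to watch interconnect traffic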

To repeat, as of this writing, there are two approved types of Cluster Interconnect hardware for use with TruCluster Server – the Memory Channel (MC) adapter card and the Ethernet Local Area Network (LAN) card[13]. TruCluster Server supports up to eight members in a cluster regardless of which type of cluster interconnect is used.

In this section we will discuss how to configure and verify these different cluster interconnects, and why you may want to use one interconnect over another.

4.4.1 Memory Channel as a Cluster Interconnect

There are three variants of Memory Channel adapter cards supported: Memory Channel 1, Memory Channel 1.5, and Memory Channel 2. The older Memory Channel 1 adapter card and Memory Channel 1.5 adapter card are collectively referred to as Memory Channel 1 (or MC1). Only the newer Memory Channel 2 (MC2) adapter card is supported on the DS, ES, and GS[14] classes of AlphaServers.

Memory Channel (MC) as a cluster interconnect can be used in either a virtual hub mode configuration or a standard hub mode configuration. A virtual hub mode configuration is only supported on two-node clusters. A standard hub mode configuration may be used for two-node clusters but is required for clusters that have three or more cluster members. A Memory Channel Hub is also a requirement for a cluster in a standard hub mode configuration.

4.4.1.1 Virtual Hub Mode

As stated earlier, using the MC as a cluster interconnect in a virtual hub mode is only supported in two-node cluster configurations. It consists of both cluster members directly connected together via the Memory Channel adapter cards using a MC cable. Each of the Memory Channel adapter cards is physically jumpered for virtual hub mode. The Memory Channel adapter card variant determines the proper jumper settings.

On MC1, there is an adapter jumper J4 that determines whether the configuration is in virtual hub mode or standard hub mode. For virtual hub mode, the Memory Channel adapter jumper J4 on one cluster node is configured to be virtual hub 0 (VH0) and the Memory Channel adapter jumper J4 on the other cluster node is configured to be virtual hub 1 (VH1). Table 4-6 represents the Hub Mode and the Memory Channel adapter jumper J4 pin outs for the MC1 adapter card.

Table 4-6: Memory Channel 1 – Jumper Settings

  Hub Mode                  Jumper Pin Outs
  Standard Hub              J4: Pins 1 to 2
  Virtual Hub 0 (VH0)       J4: Pins 2 to 3
  Virtual Hub 1 (VH1)       J4: All open

Unlike the older MC1 adapter card, the MC2 adapter card has six jumpers – J1, J3, J4, J5, J10, and J11 – see Figure 4-4. Setting the MC2 adapter card for virtual hub mode is similar to setting the MC1 adapter card; all you need to do is set the adapter jumper J1.

Figure 4-4: Memory Channel 2 – Virtual Hub Configuration

Table 4-7 provides a description of the MC2 adapter jumpers - what each is for and the possible jumper settings.

Table 4-7: Memory Channel 2 – Jumper Settings

  Jumper                           Description
  J1: Hub Mode                     Standard Hub: Pins 1 to 2
                                   Virtual Hub 0 (VH0): Pins 2 to 3
                                   Virtual Hub 1 (VH1): All open
  J3: Window Size                  512 MB: Pins 2 to 3
                                   128 MB: Pins 1 to 3
  J4: Page Size                    8 KB Page Size: Pins 1 to 2 (default for UNIX)
  J5: AlphaServer 8x00 Mode        8x00 mode selected: Pins 1 to 2
                                   8x00 mode not selected: Pins 2 to 3
  J10 and J11: Fiber Optic Mode    Fiber Off: Pins 1 to 2
                                   Fiber On: Pins 2 to 3

For more information on the other individual Memory Channel 2 jumper settings, please refer to Chapter 5 in the TruCluster Server Cluster Hardware Configuration Guide.

For the purposes of redundancy of the cluster interconnect, dual-rail MC is supported in virtual hub mode. A dual-rail MC configuration has two Memory Channel adapter cards in one cluster node directly connected to two Memory Channel adapter cards in the other cluster node. How would this be configured from a Memory Channel adapter card perspective? You would identically jumper both Memory Channel adapter cards in cluster member1 as VH0. In the other cluster member, member2, both Memory Channel adapter cards would be jumpered as VH1. Each of the adapter cards with the VH0 setting in member1 would then be cabled to an adapter card with the VH1 setting in member2. See Figure 4-5.

Figure 4-5: Redundant MC Virtual Hub Configuration

4.4.1.2 Standard Hub Mode

In standard hub mode, Memory Channel adapter cards are again installed in each cluster member, but instead of being directly connected to a Memory Channel adapter card in another cluster member, each is connected to a Memory Channel Line card in a central Memory Channel Hub. All internode communication flows between the Memory Channel adapter cards through the Memory Channel Hub. Please see Figure 4-7 for an illustration.

Figure 4-7: Memory Channel 2 – Standard Hub Configuration

Like the Memory Channel adapter cards in a virtual hub mode configuration, the Memory Channel adapter cards in a standard hub mode configuration must be properly jumpered. For an MC1 adapter card, jumper J4 is set for standard hub mode (see Table 4-6). For an MC2 adapter card, jumper J1 is set for standard hub mode (see Table 4-7).

Dual-rail Memory Channel is also supported in a standard hub mode. How would this work? Implementing a dual-rail Memory Channel configuration in standard hub mode requires two Memory Channel adapter cards per cluster member and two separate Memory Channel Hubs. As each primary Memory Channel adapter card is connected to a Memory Channel Line card in the primary Memory Channel Hub, take note of the physical slot of the Memory Channel Line card being used. When the secondary Memory Channel adapter card is connected to a Memory Channel Line card in the secondary Memory Channel Hub, it must use the same numbered slot in the secondary Hub that its primary counterpart uses in the primary Hub.

Let's illustrate how to implement a dual-rail configuration on a three-node cluster as an example. In the primary Memory Channel Hub, the primary Memory Channel card in cluster member1 is physically connected to the Memory Channel Line card in slot 0. The primary Memory Channel card in cluster member2 is connected to the Memory Channel Line card in slot 1. The primary Memory Channel card in cluster member3 is connected to the Memory Channel Line card in slot 2. This same Memory Channel Hub configuration is replicated for the secondary Memory Channel Hub using the secondary Memory Channel adapter cards in each cluster node. See Figure 4-6.

Figure 4-6: Redundant MC Standard Hub Configuration

4.4.1.3 Caveats on Using Memory Channel Adapter Cards

In planning the deployment of a cluster, there are a few caveats to keep in mind when using Memory Channel adapter cards. These apply to the use of Memory Channel adapter cards in both virtual and standard hub modes. They are as follows:

  • The MC cable connectors have 100 pins, and extreme care should be taken in connecting or disconnecting the MC cable. Bending or breaking even one pin can be disastrous to the operation of a cluster.

  • MC1 and MC2 can exist in the same cluster but cannot be intermixed on the same Memory Channel rail.

  • Only one MC2 may be jumpered to a 512 MB memory window per PCI bus (reference Table 4-7).

  • Although not specific to either Memory Channel cards or Ethernet LAN cards, ICS names and ICS IP addresses should never be registered with DNS.

4.4.1.4 Testing and Verifying the Memory Channel Adapter Cards

After the Memory Channel hardware is installed and configured, it should be tested to verify that it's working properly before starting the installation of the TruCluster Server software. To do this, there are two different Memory Channel diagnostic commands that are available at the system console: mc_diag and mc_cable.

4.4.1.4.1 mc_diag diagnostic

The main purpose of the mc_diag diagnostic command is to test out the Memory Channel adapter card. In detail, the mc_diag diagnostic command provides the following:

  • Tests all the Memory Channel adapters on the system.

  • This is executed as part of the initialization sequence when the system is powered up.

  • This can be run on a stand-alone cluster member while the other cluster members are up and available.

Below is an example of the execution of the mc_diag diagnostic command:

 P00>>> mc_diag
 Testing MC-Adapter(s)
 Adapter mca0, Passed
 Adapter mcb0, Passed

As this server is configured for dual-rail Memory Channel, both Memory Channel adapter cards passed this hardware diagnostic test.

The mc_diag command can output detailed diagnostic information when the -d option is used. See Appendix A for more information.

4.4.1.4.2 mc_cable diagnostic

The mc_cable diagnostic command's main purpose is to provide an end-to-end interconnect data flow check. It verifies that data can flow from one cluster member to all the other cluster members and vice versa. In detail, the following must be considered when using the mc_cable diagnostic command:

  • This command should be run on all systems simultaneously, so all cluster members must be down at the system console prompt.

  • This diagnostic is designed to isolate problems among the cables and hardware components that make up the complete Memory Channel interconnect – from the Memory Channel adapter cards, through the Memory Channel cables, to the Memory Channel Line cards, and, to a certain extent, the Memory Channel Hub.

  • Data flow through the Memory Channel interconnect is indicated by response messages.

  • While this diagnostic does not produce error messages, the change in connection state is an indication of the data flow between different points in the Memory Channel interconnect.

  • This diagnostic can be run in either a virtual hub mode or a standard hub mode.

  • Once the mc_cable command is executed on a system, it runs continuously until it is terminated using <CTRL/C>.

Warning

Never execute the mc_cable command on any node of a cluster if a portion of that cluster is still up. This command will crash all members in a running cluster.

The following are examples of the execution of the mc_cable diagnostic command.

  • mc_cable in a virtual hub mode with dual-rail Memory Channel.

    The mc_cable command is executed on System 2. The two Memory Channel cards are online for this system, but there is no response from the MC card in the other system.

     >>> mc_cable
     To exit MC_CABLE, type <Ctrl/C>
     mca0 node id 1 is online
     No response from node 0 on mca0
     mcb0 node id 1 is online
     No response from node 0 on mcb0

    The mc_cable command is then executed on System 1. The Memory Channel cards on this system are online, and a response is received from the other system that is running the mc_cable command.

     >>> mc_cable
     To exit MC_CABLE, type <Ctrl/C>
     mca0 node id 0 is online
     Response from node 1 on mca0
     mcb0 node id 0 is online
     Response from node 1 on mcb0

    System 2 receives a response from the mc_cable command running on System 1.

     Response from node 0 on mca0
     Response from node 0 on mcb0

    On System 1, we then issue <CTRL/C> to abort the mc_cable command.

     <CTRL/C>
     >>>

    On System 2, we receive a response that there is no further communication with System 1. On System 2, we also issue <CTRL/C> to abort the mc_cable command running there.

     mcb0 is offline
     mca0 is offline
     <Ctrl/C>
  • mc_cable in a standard hub mode with two cluster members.

    The mc_cable command is executed on System 1. Notice that the Memory Channel card is online for this system, but there is no response from any others.

     >>> mc_cable
     To exit MC_CABLE, type <CTRL/C>
     mca0 node id 0 is online
     No Response from node 1 on mca0
     No Response from node 2 on mca0
     No Response from node 3 on mca0
     No Response from node 4 on mca0
     No Response from node 5 on mca0
     No Response from node 6 on mca0
     No Response from node 7 on mca0

    The mc_cable command is then executed on System 2. The Memory Channel card on this system is online, and a response is received from the other system that is running the mc_cable command.

     P00>>> mc_cable
     To exit MC_CABLE, type <CTRL/C>
     mca0 node id 2 is online
     Response from node 0 on mca0
     No Response from node 1 on mca0
     No Response from node 3 on mca0
     No Response from node 4 on mca0
     No Response from node 5 on mca0
     No Response from node 6 on mca0
     No Response from node 7 on mca0

    System 1 receives a response from the mc_cable command running on System 2. On System 1, we then issue <CTRL/C> to abort the mc_cable command.

     Response from node 2 on mca0
     <CTRL/C>
     >>>

    On System 2, we receive a response that there is no further communication with System 1. On System 2, we also issue <CTRL/C> to abort the mc_cable command running there.

     No Response from node 0 on mca0
     <CTRL/C>
     >>>

    In this example, notice that although we only have two nodes in this cluster, the mc_cable diagnostic command is referring to these nodes as node 0 and node 2. Why? Because of the connections to the slots for the Memory Channel Line cards in the Memory Channel Hub. In this case, slots 0 and 2 are used to connect to the two cluster members. If we had used slots 0 and 1, the nodes would be identified as node 0 and node 1.

4.4.1.5 Memory Channel Information for the Creation of a Cluster

During the creation of a cluster or the addition of a new member to an existing cluster, certain information is required to identify the cluster interconnect. For a cluster using Memory Channel, the IP name and IP address for the virtual cluster interconnect device on each cluster member is needed. This is also known as the ICS name and ICS IP address.

By default, the cluster installation programs, clu_create(8) and clu_add_member(8), offer an IP address on the 10.0.0 subnet for the virtual cluster interconnect, with the host portion of the IP address the same as the member ID of the cluster member. The IP name is set to the short host name of the cluster member followed by "-ics0"[15]. Neither the IP address nor the IP name of the virtual cluster interconnect should be registered in DNS.

The following shows an example of the cluster interconnect IP names and IP addresses for two members of the babylon5 cluster, molari and sheridan. This information is from the /etc/hosts file of the cluster. This cluster is running on a Memory Channel cluster interconnect:

 10.0.0.1   molari-ics0      # member1's virtual interconnect IP name and address
 10.0.0.2   sheridan-ics0    # member2's virtual interconnect IP name and address
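Once the cluster is up, a quick sanity check is to confirm that each member's ICS name resolves locally and that the interconnect passes traffic by pinging the other member's virtual interconnect address. This is only a minimal sketch run from molari, using the illustrative names above (the -c option limits the packet count; omit it if your ping does not support it):

 # grep ics0 /etc/hosts       # the ICS names should resolve from /etc/hosts, not DNS
 # ping -c 3 sheridan-ics0    # replies should return across the cluster interconnect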

In planning for the implementation of a cluster, determine ahead of time not only what the cluster's Name and IP address will be but also the IP name and address for each of the virtual cluster interconnects. See the Tru64 UNIX/TruCluster Server Planning Worksheet in section 4.8.

4.4.2 Ethernet Local Area Network (LAN) Card as a Cluster Interconnect

With the release of TruCluster Server version 5.1A came support for an additional type of cluster interconnect beyond the usual Memory Channel adapter card: the Ethernet LAN card. The Ethernet LAN card provides a lower-cost alternative to Memory Channel, but does that mean you should go out and replace your cluster interconnect hardware with Ethernet LAN? It depends. In section 4.4.3, we will review the advantages and disadvantages of both cluster interconnects. For the time being, let's see what it takes to deploy Ethernet LAN as a cluster interconnect.

4.4.2.1 Hardware Requirements for the LAN Interconnect

Any supported Ethernet adapter, switch, or hub that operates in a standard LAN environment at 100 Mb/s or 1000 Mb/s should, in theory, work within a LAN cluster interconnect. Fiber Distributed Data Interface (FDDI), ATM LAN Emulation (LANE), and 10 Mb/s Ethernet are not supported in a LAN interconnect. For more detailed information, please see Chapter 22.

The following is required of the Ethernet hardware to operate in a LAN cluster interconnect:

  • The LAN interconnect must be a private LAN accessible only to cluster members.

  • Cluster members in the LAN interconnect must all operate at the same network speed; the transmission mode can be either half-duplex or full-duplex. Half-duplex transmission is not recommended for a LAN interconnect because it may limit cluster performance.

  • A LAN interconnect can be a single direct half-duplex or full-duplex connection between two cluster members, or it can be built on either switches or hubs, but not a mix of both.

  • One or more switches or hubs are required for a cluster of three or more members.

  • No more than two switches are allowed between two cluster members.

  • All cluster members must have at least one point-to-point connection to all other cluster members.

  • The Spanning Tree Protocol (STP) must be disabled on all Ethernet switch ports specifically connected to cluster members. STP should be enabled on ports connecting Ethernet switches together for supporting a highly available LAN interconnect configuration.

  • For the LAN interconnect, link aggregation of Ethernet adapters is not supported as of this writing.

4.4.2.2 Basic Hardware Configurations for LAN Interconnect

There are three basic hardware configurations that support a LAN cluster interconnect. In this section, we will go over these three configurations.

  • For a two-node cluster only, a single crossover network cable directly connects one member's Ethernet card to the Ethernet card of the second cluster member. The crossover network cable provides a direct point-to-point Ethernet connection between the two cluster nodes without the need for a switch or a hub. Dual-redundant crossover cables between two cluster members are not supported because the method used to enable a redundant configuration is NetRAIN, which requires each NIC to be connected to the same physical subnet.

  • A single Ethernet hub or switch could be used for a cluster that has two to eight members. In this type of configuration, a single Ethernet hub or switch would be used to connect to the Ethernet LAN cards in each of the two to eight cluster members. See Figure 4-8 for an example.

    click to expand
    Figure 4-8: LAN Cluster Interconnect

    An Ethernet hub operating at half-duplex transmission mode should not be used in this configuration as it would limit the performance of the cluster.

  • The Ethernet LAN cluster interconnect configuration with the greatest amount of redundancy is one with two switches (joined by two crossover cables) and two or more Ethernet LAN cards in each member configured as a NetRAIN virtual interface, with each LAN card connected to a different switch. This configuration can survive not only the loss of a cluster member or a break in a LAN interconnect connection but also the loss of a switch or a crossover cable. (A quick way to inspect such a NetRAIN set is sketched below.)
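On a running member, the following is a minimal sketch of how you might confirm which physical adapters make up a NetRAIN virtual interface. The device names nr0, alt0, and alt1 are assumptions borrowed from the examples in this section; substitute the names reported on your own system, and note that the niffconfig option shown is assumed to display the interfaces being monitored by NIFF:

 # ifconfig -a       # the NetRAIN device (e.g., nr0) and its physical adapters (e.g., alt0, alt1) are listed here
 # niffconfig -v     # show the NIFF monitoring state of the adapters backing the NetRAIN set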

4.4.2.3 LAN Interconnect Information for the Creation of a Cluster

Before starting the creation and configuration of a cluster using a LAN cluster interconnect, you should have certain basic information about the LAN interconnect. In this section, we will discuss how to have this information ready and available for when it is needed during the creation of the cluster.

To obtain the device name, speed, and transmission mode of the LAN cards on your system, use the following command.

 # hwmgr get attr -cat network -a name -a media_speed -a full_duplex
 137:
    name = alt0
    media_speed = 1000
    full_duplex = 1
 138:
    name = alt1
    media_speed = 1000
    full_duplex = 1

From our example, we see that our two LAN cards (devices alt0 and alt1) are operating at 1000 Mb/sec and at full-duplex.

A cluster that is using a LAN interconnect needs the following information:

  • An IP name and IP address for the virtual cluster interconnect device for each cluster member.

  • The clu_create and clu_add_member programs provide, by default, IP addresses on the 10.0.0 subnet for the virtual cluster interconnect. The host portion of the IP address is set to the memberid of the cluster member being configured, and the IP name is the short form of the member's host name followed by "-ics0". Again, as with the Memory Channel configuration, the IP addresses and the IP names for the virtual cluster interconnect should not be in DNS.

  • For the physical LAN interface for each cluster member, an IP name and address is needed on a different subnet from the virtual cluster interconnect.

  • The cluster creation programs also provide defaults for the physical LAN interface. By default, IP addresses on the 10.1.0 subnet are provided, with the host portion of the IP address set to the member ID of the cluster member being configured; the IP name is the word "member" followed by the member ID and "-icstcp0" (for example, member1-icstcp0). The IP addresses and the IP names for the physical LAN interface should not be in DNS.

The following example provides the cluster interconnect IP names and addresses for two members of the clue cluster, mustard and plum, operating on a LAN interconnect. This information is also contained in the /etc/hosts file of the cluster.

 #
 # member1's cluster interconnect
 #
 10.0.0.1   mustard-ics0       # virtual interface IP name and address
 10.1.0.1   member1-icstcp0    # physical interface IP name and address
 #
 # member2's cluster interconnect
 #
 10.0.0.2   plum-ics0          # virtual interface IP name and address
 10.1.0.2   member2-icstcp0    # physical interface IP name and address
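As with the Memory Channel example earlier, once the cluster is running you can ping both the virtual and the physical interconnect addresses of the other member as a quick check that the LAN interconnect is passing traffic. This sketch is run from mustard and uses the illustrative names above (again, omit -c if your ping does not support it):

 # ping -c 3 plum-ics0           # virtual interconnect address of member2
 # ping -c 3 member2-icstcp0     # physical LAN interconnect address of member2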

As we stated in the section on the Memory Channel interconnect, you should determine ahead of time not only the cluster's Name and the cluster's IP address but also the IP name and IP address for each of the cluster interconnects – both virtual and physical. See the Tru64 UNIX/TruCluster Server Planning Worksheet in section 4.8.

4.4.3 Why Use One Interconnect over Another?

We have had the opportunity to examine both the Memory Channel interconnect and the Ethernet LAN interconnect. We have discussed the hardware requirements for each interconnect, the configuration, and how to obtain interconnect information that is required to create a cluster.

Why use one interconnect over another? Unfortunately, this is not a question we can answer satisfactorily because, well, it depends. It depends on which applications will be run on the cluster, how many users will be on the cluster, how storage will be utilized, and what the cluster's purpose in life is. It depends on many variables, but it does come down to this – the user requirements that are used to do the planning for the deployment of the cluster.

To assist you in selecting which interconnect to use, we provide Table 4-8, which compares Memory Channel to the LAN interconnect.

Table 4-8: Memory Channel vs. LAN Interconnect

  Memory Channel: Higher cost.
    • High bandwidth (100 MB/s).
    • Low latency (3 μs).
  LAN Interconnect: Generally lower cost.
    • Medium bandwidth, medium to high latency for 100 Mb/s.
    • High bandwidth, medium to high latency for 1000 Mb/s.

  Memory Channel: Up to eight members are supported, as this is limited by the capacity of the Memory Channel Hub.
  LAN Interconnect: Up to eight members are supported with the initial release; however, more members may be supported in the future.

  Memory Channel: The distances supported between members are up to 20 meters (65.6 feet) using copper cable, up to 2000 meters (1.2 miles) with fiber-optic cable in virtual hub mode, and up to 6000 meters (3.7 miles) with fiber-optic cable using a physical hub.
  LAN Interconnect: The distances supported between members are determined by the length of a network segment and by the capabilities of, and options allowed for, the LAN hardware in use.
    • Maximum Fast Ethernet distance is ~0.4 km.
    • Maximum Gigabit Ethernet distance is ~1.6 km (SPOF) or ~1 km (NetRAIN).

  Memory Channel: Supports the use of the Memory Channel application programming interface (API) library.
  LAN Interconnect: Does not support the Memory Channel API library.

  Memory Channel: A dual-rail redundant Memory Channel configuration provides for internode communications redundancy.
  LAN Interconnect: Internode communications redundancy is achieved by configuring multiple network adapters on each member as a redundant array of independent network adapters (NetRAIN) virtual interface and distributing their connections across multiple switches.[*]

[*] For more specific information, please review the chapter on NetRAIN and the TruCluster Server Cluster LAN Interconnect guide.

[12]The Internode Communication Subsystem is covered in Chapter 18.

[13]Ethernet Local Area Network card support as a cluster interconnect began in TruCluster Server version 5.1A.

[14]The GS60 and GS140 AlphaServer systems support MC1.

[15]In V5.1A, the name of a member's cluster interconnect virtual device has changed from mc0 to ics0.



