17.6 Configuring the Connection Manager


When a system that has been configured to run in a cluster boots with the Connection Manager (CNX) configured, the CNX performs the following steps:

  • Initialize data structures including the cluster membership list.

  • Create the main CNX transaction thread ("csb event thread").

    • Set a wakeup call for 10 seconds. Try to form a cluster when timeout is reached.

  • Create the remote communication threads ("wait for csb"). There are currently 7 threads.

  • Create the grim reaper thread ("cnx grim reaper").

  • Create the CNX pinger thread ("cnx pinger").

  • If a quorum disk is configured, start the quorum disk thread ("qdisk tick delay").

As you can see, at this point the CNX has several threads running and, in theory, working simultaneously. However, certain steps still need to occur (more or less) in sequence.

17.6.1 The Node Announcement

The CNX pinger thread repeatedly sends out an announcement message until the system becomes a cluster member. Once the system becomes a cluster member, the CNX pinger thread kills itself.

Essentially what happens here is the CNX pinger thread broadcasts an announcement message across the cluster interconnect using the ICS BOOT channel[5]. If there are any other systems that are up to receive the announcement message, they will respond by exchanging information with the booting system. Included in the node announcement is the cluster name, node name, incarnation, member ID, version, and how it can be reached via ICS.

You may observe a message similar to the following:

 CNX MGR: Node molari id 1 incarn 0x914b6 attempting to form or join cluster babylon5 
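To make the contents of the announcement concrete, here is a minimal Python sketch of the fields listed above and of how the console message could be assembled from them. The class name, field names, the version value, and the ICS address string are all assumptions made for illustration; the text does not describe the actual CNX data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeAnnouncement:
    """Hypothetical model of the fields carried in a node announcement."""
    cluster_name: str
    node_name: str
    incarnation: int
    member_id: int
    version: int       # placeholder value; the real version format is not given
    ics_address: str   # placeholder for "how it can be reached via ICS"


def console_line(a: NodeAnnouncement) -> str:
    # Mirrors the layout of the console message shown above.
    return (f"CNX MGR: Node {a.node_name} id {a.member_id} "
            f"incarn {a.incarnation:#x} "
            f"attempting to form or join cluster {a.cluster_name}")


ann = NodeAnnouncement(
    cluster_name="babylon5",
    node_name="molari",
    incarnation=0x914b6,
    member_id=1,
    version=1,
    ics_address="ics-node-1",
)
print(console_line(ann))
```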

Figure 17-5 illustrates the node announcement communication flow.

Figure 17-5: CNX Node Announcement

When the announcement message is received by a cluster member (or potential member), the systems establish a communication channel to each other and exchange more information to flesh out each other's CSB information.

Note that each system contains its own list of cluster members (or potential members) that is used to determine what systems it can communicate with and to calculate the expected vote value and ultimately the quorum vote value.

Since we have been asked many times, "What are all those ICS Server threads used for?", we chose to include Figure 17-5 to illustrate the interaction between the various CNX threads and the ICS threads. As you can see, remote procedure call (RPC) communication between members in a cluster is done using the ICS. The ICS Server threads are kernel threads that send and receive communication to and from many of the cluster subsystems.

For more information on the ICS, see Chapter 18.

17.6.2 Cluster Formation

After ten seconds, if the node has not heard from a cluster, it will attempt to form a cluster.

 CNX MGR: insufficient votes to form cluster: have 1 need 2 

As you can see by the console output, the node initially counts its own vote toward forming the cluster. The system will effectively loop here until enough votes are attained to reach quorum.

If there is a quorum disk configured, then you will see a message similar to the following:

 CNX QDISK: Adding 1 quorum disk vote toward formation. 

Taken together, these two messages show that enough votes have been added to attain quorum. The first message stated "have 1 need 2"; in other words, the CNX determined that it needs 2 votes to reach quorum but has only its own. When the quorum disk's vote is added, the CNX has the necessary second vote.
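The vote arithmetic can be sketched in a few lines of Python. The quorum rule used here, round down (expected votes + 2) / 2, is an assumption: it is not stated in this section, and the function names are hypothetical. It does agree with the "have 1 need 2" message for a node with one vote plus a quorum disk with one vote.

```python
def quorum_votes(expected_votes: int) -> int:
    # Assumed rule, not taken from this section:
    # quorum = round_down((expected_votes + 2) / 2)
    return (expected_votes + 2) // 2


def can_form(votes_seen: int, expected_votes: int) -> tuple[bool, int]:
    """Return (enough votes?, votes needed) for a formation attempt."""
    need = quorum_votes(expected_votes)
    return votes_seen >= need, need


node_vote = 1
qdisk_vote = 1
expected = node_vote + qdisk_vote          # 2 expected votes in total

# First attempt: only the node's own vote is counted.
ok_before, need = can_form(node_vote, expected)

# Second attempt: the quorum disk vote has been added.
ok_after, _ = can_form(node_vote + qdisk_vote, expected)

print(f"have {node_vote} need {need}: can form = {ok_before}")
print(f"have {node_vote + qdisk_vote} need {need}: can form = {ok_after}")
```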

Once enough votes have been added to reach quorum, the cluster will form as shown in the output below.

 CNX MGR: Cluster babylon5 incarnation 0x94a1e has been formed
 CNX MGR: Founding node id is 1 csid is 0x10001
 CNX MGR: membership configuration index: 1 (1 additions, 0 removals)
 CNX MGR: quorum (re)gained, (re)starting cluster operations.
 CNX MGR: Node molari 1 incarn 0x94a1e csid 0x10001 has been added to the cluster

17.6.2.1 Cluster Formation – The Details

You want additional details? Hold on tight.

When a system decides to form a cluster, it does the following:

  • From the list of nodes the system has established communication with, it determines which ones are potential members.

    • If a node in the list is already a cluster member:

      • Verify this system has the same quorum disk (if configured) or panic.

         CNX MGR: quorum disk doesn't match cluster member 
      • Send a "Join Request" – see section 17.6.3.

    • If a node on the list is a potential member, but the quorum disk doesn't match, the cluster will not form and will loop with the following message until the node(s) that don't match are halted.

       CNX MGR: cannot form: existing nodes disagree on quorum disk. 

      This should only occur if a cluster administrator (or someone with super-user access) modified a member's sysconfigtab incorrectly. Administrative changes regarding voting and the quorum disk should only be made with the clu_quorum(8) command.

    • If a node on the list is a potential member with a vote:

      • Count up the number of votes from these potential members.

      • Determine the maximum expected votes count.

  • If the quorum disk is configured, make sure it's not in use by another system or cluster.

    • If the quorum disk is not in use, add its vote to the count of potential votes.

    • If the quorum disk is in use, there is likely a communication problem.

       CNX MGR: cannot form: quorum disk is in use. Unable to establish contact with members using disk. 

      Do not try to form.

  • Check to see that there are enough votes to reach quorum.

    • If not, you will see the following message:

       CNX MGR: insufficient votes to form cluster: have 1 need 2 

  • Become the "Coordinator" for this CNX transaction.

    Any member can be the coordinator for any particular CNX transaction, but there can be only one coordinator at a time. The coordinator calls the shots during a CNX transaction so that it is done in an orderly (or coordinated) manner.

  • Allocate a CSID for each node that has been selected as a potential member.

  • Describe the nodes to each other (see Figure 17-6) and prepare a connectivity matrix (or topology).

    Figure 17-6: Describe Nodes

    Think of the matrix as a topology bitmap that shows node connectivity. The goal is to determine the best-connected (or fully connected) cluster. This is done to prevent a cluster partition.

    If every member is communicating with every other member, then the matrix will be identical for all members. This is the normal operating environment.

    When there is a communication problem, this matrix may not be the same. The CNX will determine the best-connected cluster from the topology bitmap. Systems not communicating will hang or panic. The systems in the matrix will continue as the cluster.

    Figure 17-7 shows an example of a connectivity matrix with a communication problem. Every member can communicate with the quorum disk, but member3 and member5 cannot communicate with other members. Since the CNX will detect this problem, member3 and member5 will not be allowed to join the cluster.

    Figure 17-7: Connectivity Matrix (aka Topology Bitmap)

  • Determine the best-connected cluster from the connectivity matrix and clear nodes from the connectivity matrix that are not selected to be part of the cluster.

  • Calculate quorum from the maximum expected votes value from the best-connected cluster configuration.

  • If there are enough votes to attain quorum, form the cluster.

    Committing the transaction in three phases forms the cluster. If all nodes in the best-connected cluster list agree to the proposed cluster, then each node will acknowledge the proposal (phase 1), prepare to commit the transaction (phase 2), and commit the transaction (phase 3). The reason for three phases is to make sure that every node agrees and, as an additional failsafe, to detect any communication problems not previously detected (see Figure 17-8).

Figure 17-8: Cluster Formation Transaction
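The "best-connected cluster" step described above can be illustrated with a small Python sketch. It treats the connectivity matrix as a mapping from each node to the set of nodes it can reach and looks for the largest group in which every pair can talk to each other. The brute-force search, the tie-breaking order, and the function name are assumptions for illustration; the text does not describe the actual CNX algorithm, which presumably also weighs votes and the quorum disk.

```python
from itertools import combinations


def best_connected(nodes, can_reach):
    """Return the largest fully connected group of nodes.

    can_reach[a] is the set of nodes that node a reports it can talk to.
    Ties are broken by sorted order, which is an arbitrary choice here.
    """
    def mutual(a, b):
        return b in can_reach.get(a, set()) and a in can_reach.get(b, set())

    # Try the largest candidate groups first, then shrink.
    for size in range(len(nodes), 0, -1):
        for group in combinations(sorted(nodes), size):
            if all(mutual(a, b) for a, b in combinations(group, 2)):
                return set(group)
    return set()


# Modeled on the Figure 17-7 scenario: member3 and member5 cannot
# communicate with the other members (quorum disk column omitted).
nodes = {1, 2, 3, 4, 5}
can_reach = {
    1: {2, 4},
    2: {1, 4},
    3: set(),
    4: {1, 2},
    5: set(),
}

selected = best_connected(nodes, can_reach)
cleared = nodes - selected   # nodes cleared from the connectivity matrix
print("cluster:", sorted(selected), "cleared:", sorted(cleared))
```

A brute-force search like this is only reasonable because the number of members in a cluster is small.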

17.6.3 Joining a Cluster

As we learned in section 17.6.1, when the CNX is configured one of the first things to happen is that the CNX pinger thread sends an announcement message. If the system has established communication channels with members of an existing cluster, it will send a request to join the cluster instead of trying to form a cluster.

The boot process for a system joining a cluster is pretty much the same as it is for a system forming a cluster, except that once the system becomes aware that a cluster exists, it sends a join request to the existing cluster members, as shown in Figure 17-9.

Figure 17-9: Send a Join Request

Once the join request is sent, one of the cluster members will temporarily take charge of responding to this request by becoming the "Cluster Coordinator" for the cluster. The job of the coordinator is as follows:

  • Determine whether or not the node requesting to join would cause the cluster to lose quorum. If this is true, then the node cannot be allowed to join the cluster.

     CNX MGR: join request rejected: adding node would cause cluster to lose quorum
     ...
     CNX MGR: halting join rejected 20 times

  • Allocate a CSID for the node and proceed to describe the requestor to each cluster member and describe each cluster member to the requestor (see Figure 17-10). Since each member should already have a CSB allocated for the requestor and the requestor should already have a CSB for each cluster member, this step really involves verification of the CSB and passing along the CSID.

  • Describe the cluster to the new node (see Figure 17-10). This involves passing the requestor the current cluster's information including the incarnation number, the founder, CSIDs of every member, etc.

  • Calculate quorum based on the requestor's addition to the cluster.

  • Prepare a connectivity matrix (or topology).

  • Initiate a three-phase transaction to add the requestor to the cluster (see Figure 17-11). If all members and the requestor agree to the proposal, then each node will acknowledge the proposal (phase 1), prepare to commit the transaction (phase 2), and commit the transaction (phase 3).
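The coordinator's first check, whether admitting the requestor would cost the cluster its quorum, can be sketched as follows. Everything about the calculation is an assumption made for illustration: the text does not say how the CNX recomputes expected votes or quorum on a join. The sketch assumes that expected votes become the largest of the cluster's value, the requestor's value, and the new vote total, and that quorum is round down (expected votes + 2) / 2.

```python
def quorum_votes(expected_votes: int) -> int:
    # Assumed rule, not taken from this section.
    return (expected_votes + 2) // 2


def join_allowed(current_votes: int, cluster_expected: int,
                 joiner_votes: int, joiner_expected: int) -> bool:
    """Hypothetical admission check run by the coordinator."""
    new_votes = current_votes + joiner_votes
    # Assumption: expected votes can only grow when a node joins.
    new_expected = max(cluster_expected, joiner_expected, new_votes)
    return new_votes >= quorum_votes(new_expected)


# A cluster holding 2 votes (one member plus a quorum disk).
# A voting node configured consistently with the cluster is admitted.
print(join_allowed(current_votes=2, cluster_expected=2,
                   joiner_votes=1, joiner_expected=3))

# A non-voting node that expects 5 votes would raise quorum to 3
# while the cluster still holds only 2 votes, so it is rejected.
print(join_allowed(current_votes=2, cluster_expected=2,
                   joiner_votes=0, joiner_expected=5))
```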

Figure 17-10: Describe Member & Cluster

Figure 17-11: Member Join Transaction
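The three-phase transaction used for both formation and joining can be modeled with a short Python sketch. A coordinator walks every participant through the proposal, prepare, and commit phases, and abandons the transaction as soon as any participant fails to respond. The class names, the state strings, and the abort behavior are assumptions for illustration only; the text does not describe what the CNX does when a phase fails.

```python
from enum import Enum


class Phase(Enum):
    PROPOSE = 1   # phase 1: acknowledge the proposal
    PREPARE = 2   # phase 2: prepare to commit
    COMMIT = 3    # phase 3: commit


class Participant:
    """Hypothetical cluster node taking part in a CNX transaction."""

    def __init__(self, name, fails_at=None):
        self.name = name
        self.fails_at = fails_at   # phase at which this node stops responding
        self.state = "idle"

    def handle(self, phase):
        if phase is self.fails_at:
            return False           # simulates a communication problem
        self.state = phase.name.lower()
        return True


def run_transaction(participants):
    """Drive all participants through the three phases in order."""
    for phase in Phase:
        if not all(p.handle(phase) for p in participants):
            for p in participants:
                p.state = "aborted"
            return False, phase
    return True, None


healthy = [Participant("member1"), Participant("member2"), Participant("requestor")]
print(run_transaction(healthy))

flaky = [Participant("member1"), Participant("requestor", fails_at=Phase.PREPARE)]
print(run_transaction(flaky))
```

The second run shows the failsafe role of the extra phases: a node that acknowledged the proposal but cannot be reached afterward is caught before anything is committed.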

[5]See Chapter 18 for more information on the ICS subsystem.




TruCluster Server Handbook
TruCluster Server Handbook (HP Technologies)
ISBN: 1555582591
EAN: 2147483647
Year: 2005
Pages: 273
