Cluster Membership Concepts

What is arbitration? Why is it necessary? When and how is it carried out? To answer these questions, it is necessary to explain a number of clustering concepts that are central to the processes of cluster formation and re-formation. These concepts are membership, quorum, split-brain, and tie-breaking.

Membership

A cluster is a networked collection of nodes. The key to success in controlling the location of applications in the cluster and ensuring there is no inappropriate duplication is maintaining a well-defined cluster node list. When the cluster starts up, all the nodes communicate and build this membership list, a copy of which is in the memory of every node. The list is validated continuously as the cluster runs; this is done by means of heartbeat messages that are transmitted among all the nodes. As nodes enter and leave the cluster, the list is changed in memory. Changes in membership can result from an operator’s issuing a command to run or halt a node, or from system events that cause a node to halt, reboot, or crash. Some of these events are routine, and some may be unexpected. There are frequent cases in cluster operation when cluster membership is changing and when the cluster software must determine which node in the cluster should run an application.

How does the cluster software tell where an application should run? In a running cluster, when one system cannot communicate with the others for a significant amount of time, there can be several possible reasons:

The node has crashed.
The node is experiencing a kernel hang, and processing has stopped.
The cluster is partitioned because of a network problem. Either all the network cards connecting the node to the rest of the cluster have failed, or all the cables connecting the cards to the network have failed, or there has been a failure of the network itself.

It is often impossible for the cluster manager software to distinguish (1) from (2) and (3), and therein lies a problem, because in case (1), it is safe to restart the application on another node in the cluster, but in (2) and (3), it is not safe.

When the cluster is part of a disaster tolerant solution that has nodes located in more than one data center, loss of communication can easily happen unless redundant networking is implemented with different routing for the redundant links.

In all the above cases, the loss of heartbeat communication with other nodes in the cluster causes the re-formation protocol to be carried out. This means that nodes attempt to communicate with one another to rebuild the membership list. In case (1) above, the running nodes choose a coordinator and re-form the cluster with one less node. But in case (3), there are two sets of running nodes, and the nodes in each set attempt to communicate with the other nodes in the same set to rebuild the membership list. The result is that the two sets of nodes build different lists for membership in the new cluster. Now, if both sets of nodes were allowed to re-form the cluster, there would be two instances of the same cluster running in two locations. In this situation, the same application could start up in two different places and modify data inappropriately. This is an example of data corruption.

How does Serviceguard handle cases like the above partitioning of the cluster? The process is called arbitration. In the Serviceguard user’s manual, the process is known as tie-breaking, because it is a means to decide on a definitive cluster membership when different competing groups of cluster nodes are independently trying to re-form a cluster.

At cluster startup time, nodes join the cluster, and a tally of the cluster membership is created and maintained in memory on all cluster nodes. Occasionally, changes in membership occur. For example, when the administrator halts a node, the node leaves the cluster, and the cluster membership data in memory is changed accordingly.

When a node crashes, the other nodes become aware of this by the fact that no cluster heartbeat is received from that node after the expected interval. Thus, the transmission and receipt of heartbeat messages is essential for keeping the membership data continuously up-to-date. Why is this membership data important? In Serviceguard, a basic package, containing an application and its data, can only be allowed to run on one node at a time. Therefore, the cluster needs to know what nodes are running in order to tell whether it is appropriate or not to start a package, and where the packages should be started. A package should not be started if it is already running; it should be started on an alternate node if the primary node is down; and so forth.

Quorum

Cluster re-formation takes place when there is some change in the cluster membership. In general, the algorithm for cluster re-formation requires the new cluster to achieve a cluster quorum of a strict majority (that is, more than 50%) of the nodes previously running. If both halves (exactly 50%) of a previously running cluster were allowed to re-form, there would be a split-brain situation in which two instances of the same cluster were running.

Split-Brain

How could a split-brain situation arise? Suppose a two-node cluster experiences the loss of all network connections between the nodes. This means that cluster heartbeat ceases. Each node will then try to re-form the cluster separately. If this were allowed to occur, it would have the potential to run the same application in two different locations and to corrupt application data. In a split-brain scenario, different incarnations of an application could end up simultaneously accessing the same disks. One incarnation might well be initiating recovery activity while the other is modifying the state of the disks. Serviceguard’s quorum requirement is designed to prevent a split-brain situation.

How likely is a split-brain situation? Partly, the answer to this depends on the types of intra-node communication the cluster is using: some types are more robust than others. For example, the use of the older coaxial cable technology makes communication loss a significant problem. In that technology, the loss of termination would frequently result in the loss of an entire LAN. On the other hand, the use of redundant groups of current Ethernet hubs makes the loss of communication between nodes extremely unlikely, but it is still possible. In general, with mission-critical data, it is worth the cost to eliminate even small risks associated with split-brain scenarios.

A split-brain situation is more likely to occur in a two-node cluster than in a larger local cluster that splits into two even-sized sub-groups. Split-brain is also more likely to occur in a disaster-tolerant cluster where separate groups of nodes are located in different data centers.

Tie-Breaking

Tie-breaking (arbitration) is only required when a failure could result in two equal-sized subsets of cluster nodes each trying to re-form the cluster at the same time. These competitors are essentially tied in the contest for the cluster’s identity. The tie-breaker selects a winner, and the other nodes leave the cluster.Tie-breaking is done using several techniques in Serviceguard clusters:

Through a cluster lock disk, which must be accessed during the arbitration process. This can be used with clusters of up to 4 nodes in size. This type of arbitration is available only on HP-UX systems.
Through a cluster lock LUN, a variant of the lock disk, for clusters of up to 4 nodes.
Through arbitrator nodes, which provide tie-breaking when an entire site fails, as in a disaster scenario. Arbitrator nodes are cluster members located in a separate data center whose main function is to increase the cluster size so that an equal partition of nodes is unlikely between production data centers.
Through a quorum server, for clusters of any size or type. Quorum services are provided by a quorum server process running on a machine outside of the cluster. A single quorum server running on either HP-UX or Linux can manage multiple HP-UX and Linux Serviceguard clusters.

Cluster Membership Concepts

Technical documentation

» Table of Contents

» Index

Membership

Quorum

Split-Brain

Tie-Breaking