5.3. Perform a Failover

Since this is a high-availability cluster, we should test failover of our new resource before moving on.

First, find the node on which the IP address is running, for example by running pcs status on either node.

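Shut down Pacemaker and Corosync on that machine (here, pcmk-1), for example with pcs cluster stop pcmk-1, and then check the cluster’s status from pcmk-2.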

There are two things to notice about the cluster’s current state. The first is that, as expected, pcmk-1 is now offline. However, we can also see that ClusterIP isn’t running anywhere!

5.3.1. Quorum and Two-Node Clusters

ClusterIP stopped because the cluster no longer has quorum, as shown by the text "partition WITHOUT quorum" in the status output. To reduce the possibility of data corruption, Pacemaker’s default behavior is to stop all resources if the cluster does not have quorum.

A cluster is said to have quorum when more than half of the known or expected nodes are online or, for the mathematically inclined, whenever the following inequality holds:

total_nodes < 2 * active_nodes
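The inequality translates directly into code. The following is a minimal Python sketch, assuming every node counts equally; the function name has_quorum is illustrative only and is not part of Pacemaker or Corosync, which determine quorum themselves.

```python
def has_quorum(total_nodes: int, active_nodes: int) -> bool:
    """Return True when more than half of the known nodes are active.

    Implements total_nodes < 2 * active_nodes. Comparing against
    2 * active_nodes keeps everything in integers, so odd cluster
    sizes need no rounding.
    """
    if total_nodes <= 0 or not 0 <= active_nodes <= total_nodes:
        raise ValueError("need total_nodes > 0 and 0 <= active_nodes <= total_nodes")
    return total_nodes < 2 * active_nodes


if __name__ == "__main__":
    # Smallest number of active nodes that keeps quorum, for 2..5 nodes.
    for total in range(2, 6):
        minimum = min(a for a in range(total + 1) if has_quorum(total, a))
        print(f"{total} nodes: quorum needs at least {minimum} active")
```

Running it prints a minimum of 2 for a two-node cluster, so losing either node loses quorum, whereas a three-node cluster keeps quorum with any 2 nodes active (4 and 5 nodes both need 3).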

Therefore, a two-node cluster has quorum only when both nodes are running, which is no longer the case for our cluster. This would normally make creating a two-node cluster pointless^[16], but it is possible to control how Pacemaker behaves when quorum is lost. In particular, we can tell the cluster to simply ignore quorum altogether by setting the no-quorum-policy cluster property to ignore, for example with pcs property set no-quorum-policy=ignore.
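Pacemaker exposes this choice through the no-quorum-policy cluster property: its default, stop, gives the behavior described above, while ignore lets the surviving partition keep managing resources. The sketch below is a toy model of only those two values (the property accepts others, which are deliberately left out); the function name resources_may_run is hypothetical and this is not how Pacemaker's scheduler is implemented.

```python
def has_quorum(total_nodes: int, active_nodes: int) -> bool:
    # Same inequality as in the text: more than half the nodes are active.
    return total_nodes < 2 * active_nodes


def resources_may_run(total_nodes: int, active_nodes: int, policy: str = "stop") -> bool:
    """Toy model: may the active partition run resources under `policy`?

    Only two values of no-quorum-policy are modeled:
      "stop"   (the default): stop all resources when quorum is lost
      "ignore": keep managing resources as if quorum were present
    """
    if policy not in ("stop", "ignore"):
        raise ValueError(f"policy {policy!r} is not modeled here")
    if active_nodes == 0:
        return False  # no node is left to run anything
    return policy == "ignore" or has_quorum(total_nodes, active_nodes)


if __name__ == "__main__":
    # The two-node failover from this section: pcmk-1 is down, pcmk-2 is up.
    print(resources_may_run(2, 1))            # False: ClusterIP runs nowhere
    print(resources_may_run(2, 1, "ignore"))  # True: ClusterIP can start on pcmk-2
```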

After a few moments, the cluster will start the IP address on the remaining node. Note that the cluster still does not have quorum.

Now simulate node recovery by restarting the cluster stack on pcmk-1 (for example, with pcs cluster start pcmk-1), then check the cluster’s status.
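To tie the steps of this section together, the following toy simulation replays the failover with the node names from the text. The function place_cluster_ip and its rule "stay on the current node while it is online, otherwise take the first online node" are assumptions chosen to match the behavior this section describes; they are not Pacemaker's real placement logic, which weighs scores, constraints and stickiness.

```python
from typing import List, Optional

NODES = ["pcmk-1", "pcmk-2"]  # node names used in this section


def has_quorum(total_nodes: int, active_nodes: int) -> bool:
    return total_nodes < 2 * active_nodes


def place_cluster_ip(current: Optional[str], online: List[str], policy: str) -> Optional[str]:
    """Hypothetical placement rule for a single resource (toy model only)."""
    if policy not in ("stop", "ignore"):
        raise ValueError(f"policy {policy!r} is not modeled here")
    if not online:
        return None
    # Default policy: without quorum, nothing runs.
    if policy == "stop" and not has_quorum(len(NODES), len(online)):
        return None
    # Prefer staying put; otherwise fail over to the first online node.
    if current in online:
        return current
    return online[0]


if __name__ == "__main__":
    steps = [
        ("both nodes online", ["pcmk-1", "pcmk-2"], "stop"),
        ("pcmk-1 shut down", ["pcmk-2"], "stop"),
        ("no-quorum-policy=ignore", ["pcmk-2"], "ignore"),
        ("pcmk-1 restarted", ["pcmk-1", "pcmk-2"], "ignore"),
    ]
    location: Optional[str] = None
    for label, online, policy in steps:
        location = place_cluster_ip(location, online, policy)
        print(f"{label}: ClusterIP on {location or 'no node'}")
```

Under these assumptions the IP starts on pcmk-1, runs nowhere once pcmk-1 stops, moves to pcmk-2 when quorum is ignored, and stays on pcmk-2 after pcmk-1 returns.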

Note

In the dark days, the cluster might have moved the IP back to its original location (pcmk-1). Usually, this is no longer the case.

^[16] Actually, some would argue that two-node clusters are always pointless, but that is an argument for another time.