Corosync Cluster with Failover IP


High availability is usually one of the first customer requirements you read. For a long time now, it has been the norm that a project remains accessible even during partial failures and that “single points of failure” are avoided. A Corosync / Pacemaker cluster is often used for this purpose; the technology has been tried and tested for more than a decade. The basic idea is that you create virtual resources that can be started on any connected node.

The following describes how to create a Corosync / Pacemaker cluster on Ubuntu. If you are already familiar with these steps and only want to know how to register the failover IP in OpenStack, you will find the relevant information further down in the article.

Create a Corosync / Pacemaker cluster

Installation

The first step is to install the necessary packages. crmsh offers a shell that can be used to control the cluster.

root@test-node-1:~# apt install corosync pacemaker crmsh

Configuration

Next, create the authkey that the nodes need for communicating with each other. Without it, the service cannot be started.

root@test-node-1:~# corosync-keygen

This can take some time, depending on how busy the server is. On newly created VMs it can take too long. In that case you can run a snippet like the following, which repeatedly writes random data to the hard disk to generate activity. Always keep in mind that you should not fill up the hard disk, so adapt the command to your own needs if necessary!

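# Runs until interrupted with Ctrl+C. ${i} needs braces, otherwise the shell looks up a variable named "i_".
while /bin/true; do dd if=/dev/urandom of=/tmp/entropy bs=1024 count=10000; for i in {1..50};
do cp /tmp/entropy "/tmp/tmp_${i}_${RANDOM}"; done; rm -f /tmp/tmp_* /tmp/entropy; done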

Once the key has been generated, copy it to /etc/corosync/authkey on both servers.

root@test-node-1:~# cp authkey /etc/corosync/authkey
root@test-node-1:~# chown root:root /etc/corosync/authkey
root@test-node-1:~# chmod 0400 /etc/corosync/authkey
root@test-node-1:~# scp /etc/corosync/authkey root@test-node-2:/etc/corosync/authkey
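
Since the key has to be identical and protected on every node, a quick check after copying can be useful. The following is a minimal sketch, assuming GNU stat and sha256sum are available; authkey_mode_ok and authkey_checksum are hypothetical helper names, not part of Corosync.

```shell
# Sketch: succeed only if the key file has mode 400 (readable by its owner only).
authkey_mode_ok() {
  local file=$1
  [ "$(stat -c '%a' "$file")" = "400" ]
}

# Print the SHA-256 checksum of a key file so it can be compared across nodes.
authkey_checksum() {
  sha256sum "$1" | cut -d' ' -f1
}
```

Running authkey_mode_ok /etc/corosync/authkey && authkey_checksum /etc/corosync/authkey on each node should print the same checksum everywhere.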

The cluster is then configured in the file /etc/corosync/corosync.conf, in which, among other things, the private IPs of the cluster nodes are defined. Like the authkey, this file must be identical on all nodes.

totem {
  version: 2
  cluster_name: test-cluster
  transport: udpu
  interface {
    ringnumber: 0
    bindnetaddr: 172.16.0.0
    broadcast: yes
    mcastport: 5407
  }
}

nodelist {
  node {
    ring0_addr: 172.16.0.10
  }
  node {
    ring0_addr: 172.16.0.20
  }
}

quorum {
  provider: corosync_votequorum
}

logging {
  to_logfile: yes
  logfile: /var/log/corosync/corosync.log
  to_syslog: yes
  timestamp: on
}

service {
  name: pacemaker
  ver: 1
}
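
Because the file must be identical on all nodes, it can help to generate the node-specific part instead of typing it on each server. This is only a sketch: render_nodelist is a hypothetical helper that prints a nodelist block in the same form as the one shown above for whatever addresses it is given.

```shell
# Sketch: print a corosync.conf nodelist block for the given ring0 addresses.
# render_nodelist is a hypothetical helper, not a Corosync tool.
render_nodelist() {
  local addr
  echo "nodelist {"
  for addr in "$@"; do
    printf '  node {\n    ring0_addr: %s\n  }\n' "$addr"
  done
  echo "}"
}
```

For the two nodes in this article, the call would be render_nodelist 172.16.0.10 172.16.0.20.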

Corosync and Pacemaker then need to be restarted so that the new configuration is applied. Afterwards, the cluster should already display its status and recognise all nodes, although this may take a few moments initially.

root@test-node-1:~# systemctl restart corosync && systemctl restart pacemaker

root@test-node-1:~# crm status
Stack: corosync
Current DC: test-node-1 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Mon Dec 7 15:40:20 2020
Last change: Mon Dec 7 15:40:20 2020 by hacluster via crmd on test-node-1

2 nodes configured
0 resource configured

Online: [ test-node-1 test-node-2 ]
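
Because the nodes may need a few moments to appear, a script can wait for them instead of checking by hand. The sketch below assumes the "Online: [ ... ]" line format shown in the output above; count_online_nodes is a hypothetical helper that reads the status text from standard input.

```shell
# Sketch: count the node names on the "Online: [ ... ]" line of `crm status`.
# Prints 0 if no such line is found.
count_online_nodes() {
  awk '/^Online: \[/ {
         n = 0
         for (i = 3; i <= NF; i++) if ($i != "]") n++
         print n; found = 1; exit
       }
       END { if (!found) print 0 }'
}

# Possible use on a cluster node (not executed here):
#   until [ "$(crm status | count_online_nodes)" -eq 2 ]; do sleep 2; done
```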

The crm shell can be used to control the cluster and to view its current configuration. The configuration should look very similar to the following:

root@test-node-1:~# crm configure show
node 2886729779: test-node-1
node 2886729826: test-node-2
property cib-bootstrap-options: \
  have-watchdog=false \
  dc-version=1.1.18-2b07d5c5a9 \
  cluster-infrastructure=corosync \
  cluster-name=test-cluster \
  stonith-action=reboot \
  no-quorum-policy=stop \
  stonith-enabled=false \
  last-lrm-refresh=1596896556 \
  maintenance-mode=false
rsc_defaults rsc-options: \
  resource-stickiness=1000

This configuration can now be edited directly to define resources. The same can be done with individual “crm configure” commands, but in this example the configuration is edited as a whole.

root@test-node-1:~# crm configure edit

node 2886729779: test-node-1
node 2886729826: test-node-2
primitive ha-vip IPaddr2 \
  params ip=172.16.0.100 cidr_netmask=32 arp_count=10 arp_count_refresh=5 \
  op monitor interval=10s \
  meta target-role=Started
property cib-bootstrap-options: \
  have-watchdog=false \
  dc-version=1.1.18-2b07d5c5a9 \
  cluster-infrastructure=corosync \
  cluster-name=test-cluster \
  stonith-action=reboot \
  no-quorum-policy=ignore \
  stonith-enabled=false \
  last-lrm-refresh=1596896556 \
  maintenance-mode=false
rsc_defaults rsc-options: \
  resource-stickiness=1000

The attentive reader will notice that the “no-quorum-policy” has also been changed from stop to ignore. This is important for a cluster that consists of only two nodes: if one node fails, the remaining node can no longer form a quorum on its own, and with the policy stop it would stop its resources.
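
If you prefer not to use the editor, the same two changes can be made with individual crmsh commands. This is a sketch under the assumption that the resource and parameter names are the ones used in this article; configure_vip is a hypothetical wrapper function.

```shell
# Sketch: set the quorum policy and create the failover IP resource
# without opening the editor. Takes the failover IP as its only argument.
configure_vip() {
  local vip=$1
  crm configure property no-quorum-policy=ignore &&
  crm configure primitive ha-vip ocf:heartbeat:IPaddr2 \
    params ip="$vip" cidr_netmask=32 arp_count=10 arp_count_refresh=5 \
    op monitor interval=10s \
    meta target-role=Started
}
```

On a cluster node, this would be called as configure_vip 172.16.0.100.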

root@test-node-1:~# crm status
Stack: corosync
Current DC: test-node-1 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Mon Dec 7 15:40:20 2020
Last change: Mon Dec 7 15:45:21 2020 by hacluster via crmd on test-node-1

2 nodes configured
1 resource configured

Online: [ test-node-1 test-node-2 ]

Full list of resources:

ha-vip (ocf::heartbeat:IPaddr2): Started test-node-1
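
To find out in a script which node currently holds the IP, the resource line of the status output can be parsed. The sketch below assumes the line format of the Pacemaker version shown above (newer versions may format this line differently); vip_node is a hypothetical helper that reads the status text from standard input.

```shell
# Sketch: print the node on which the given resource is reported as "Started".
# Prints nothing if the resource is not found or not started.
vip_node() {
  awk -v rsc="$1" '$1 == rsc && $3 == "Started" { print $4; exit }'
}

# Possible use on a cluster node (not executed here):
#   crm status | vip_node ha-vip
```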

 

Configure Failover IP in OpenStack

There are two ways to register the IP in OpenStack. In the web interface, you can navigate to the VM’s port via Network -> Networks -> said network -> Ports and add the desired IP on the “Allowed address pairs” tab. Alternatively, you can use the OpenStack CLI tool. First, look up the port of the server:

openstack port list --server test-node-1
+--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+
| ID                                   | Name | MAC Address       | Fixed IP Addresses                                                         | Status |
+--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+
| 0a7161f5-c2ff-402c-9bf4-976215a95cf3 |      | fa:16:3e:2a:f3:f2 | ip_address='172.16.0.10', subnet_id='9ede2a39-7f99-48c8-a542-85066e30a6f3' | ACTIVE |
+--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+

The additional permitted IP address is then added to the port as follows. An entire network can also be specified here if several IP resources are to be created.

openstack port set 0a7161f5-c2ff-402c-9bf4-976215a95cf3 --allowed-address ip-address=172.16.0.100
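
Because the port of every cluster node has to be updated, the lookup and the update can be combined in a loop. This is a minimal sketch, assuming each server has exactly one relevant port; allow_vip is a hypothetical helper, and only the first port returned for a server is used.

```shell
# Sketch: allow the failover IP on the first port of each given server.
# Usage: allow_vip <failover-ip> <server> [<server> ...]
allow_vip() {
  local vip=$1 server port_id
  shift
  for server in "$@"; do
    # -f value -c ID prints only the port IDs, one per line
    port_id=$(openstack port list --server "$server" -f value -c ID | head -n 1)
    [ -n "$port_id" ] || { echo "no port found for $server" >&2; return 1; }
    openstack port set "$port_id" --allowed-address ip-address="$vip"
  done
}
```

For the example cluster, the call would be allow_vip 172.16.0.100 test-node-1 test-node-2.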

This step must be repeated for both servers. After this, the IP is also reachable within the OpenStack project. Thanks to the ARP settings of the resource defined above, this should work right away. If it does not, it may help to move the IP resource to the other node so that the IP is announced from there:

crm resource migrate ha-vip test-node-2
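
Note that migrate works by adding a location constraint, which stays in the configuration until it is removed with crm resource unmigrate. The following sketch moves the resource, waits until the status output reports it on the target node, and then removes the constraint again. move_vip is a hypothetical helper, and the polling relies on the status line format shown earlier in this article.

```shell
# Sketch: move a resource to a node, then clear the migration constraint.
# Usage: move_vip <resource> <node> [<max-seconds>]
move_vip() {
  local rsc=$1 node=$2 tries=${3:-30}
  crm resource migrate "$rsc" "$node" || return 1
  while [ "$tries" -gt 0 ]; do
    if crm status | grep -q "$rsc.*Started $node"; then
      # Resource has arrived; remove the constraint that migrate created
      crm resource unmigrate "$rsc"
      return
    fi
    tries=$((tries - 1))
    sleep 1
  done
  echo "$rsc did not start on $node in time" >&2
  return 1
}
```

With the resource-stickiness set above, the IP should stay on the new node after the constraint has been removed.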

If something doesn’t work as planned or you still have questions, our MyEngineers are sure to help!