Corosync Cluster with Failover IP

One of the first customer requirements you usually encounter is high availability. For a long time now it has been the norm that a project remains accessible even in the event of partial failures and that “single points of failure” are avoided. A Corosync / Pacemaker cluster is often used for this purpose; the technology behind it has been tried and tested for more than a decade. The basic idea is that you define virtual resources which can be started on any connected node.

The following describes how to create a Corosync / Pacemaker cluster using Ubuntu. If you are already familiar with these steps and would just like to know how to store the failover IP in OpenStack, you will find the relevant information further down in the article.

Create a Corosync / Pacemaker cluster

Installation

The first step is to install the necessary packages. crmsh offers a shell that can be used to control the cluster.

root@test-node-1:~# apt install corosync pacemaker crmsh

Configuration

The authkey is then created, which is needed for communication between the nodes. Without it, the service will not start.

root@test-node-1:~# corosync-keygen

This can take some time, depending on how busy the server is. On newly created VMs it can take very long because hardly any entropy is available. In that case you can use the following snippet, for example, which creates disk activity by writing random data to the hard disk. Please always keep in mind that you should not fill up the hard disk, so adapt the command to your own needs if necessary!

while /bin/true; do
  dd if=/dev/urandom of=/tmp/entropy bs=1024 count=10000
  for i in {1..50}; do cp /tmp/entropy /tmp/tmp_${i}_${RANDOM}; done
  rm -f /tmp/tmp_* /tmp/entropy; done

When this is done, copy the authkey to /etc/corosync/authkey on both servers.

root@test-node-1:~# cp authkey /etc/corosync/authkey
root@test-node-1:~# chown root:root /etc/corosync/authkey
root@test-node-1:~# chmod 0400 /etc/corosync/authkey
root@test-node-1:~# scp /etc/corosync/authkey root@test-node-2:/etc/corosync/authkey

The cluster is then configured in the file /etc/corosync/corosync.conf, in which, among other things, the private IPs of the cluster nodes are defined. This file must be identical on all nodes.

totem {
  version: 2
  cluster_name: test-cluster
  transport: udpu
  interface {
    ringnumber: 0
    bindnetaddr: 172.16.0.0
    broadcast: yes
    mcastport: 5407
  }
}

nodelist {
  node {
    ring0_addr: 172.16.0.10
  }
  node {
    ring0_addr: 172.16.0.20
  }
}

quorum {
  provider: corosync_votequorum
}

logging {
  to_logfile: yes
  logfile: /var/log/corosync/corosync.log
  to_syslog: yes
  timestamp: on
}

service {
  name: pacemaker
  ver: 1
}

The services then need to be restarted so that the new configuration is applied. At this point, the cluster should already display its status and recognise all nodes. Initially, this may take a few moments.

root@test-node-1:~# systemctl restart corosync && systemctl restart pacemaker

root@test-node-1:~# crm status
Stack: corosync
Current DC: test-node-1 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Mon Dec 7 15:40:20 2020
Last change: Mon Dec 7 15:40:20 2020 by hacluster via crmd on test-node-1

2 nodes configured
0 resources configured

Online: [ test-node-1 test-node-2 ]

The crm shell can be used to control the cluster and to display its current configuration, which should look very similar to the following:

root@test-node-1:~# crm configure show
node 2886729779: test-node-1
node 2886729826: test-node-2
property cib-bootstrap-options: \
  have-watchdog=false \
  dc-version=1.1.18-2b07d5c5a9 \
  cluster-infrastructure=corosync \
  cluster-name=test-cluster \
  stonith-action=reboot \
  no-quorum-policy=stop \
  stonith-enabled=false \
  last-lrm-refresh=1596896556 \
  maintenance-mode=false
rsc_defaults rsc-options: \
  resource-stickiness=1000

This configuration can now be edited directly and resources can be defined. Individual commands could also be issued via “crm configure”, but in this example the configuration is edited as a whole.

root@test-node-1:~# crm configure edit

node 2886729779: test-node-1
node 2886729826: test-node-2
primitive ha-vip IPaddr2 \
  params ip=172.16.0.100 cidr_netmask=32 arp_count=10 arp_count_refresh=5 \
  op monitor interval=10s \
  meta target-role=Started
property cib-bootstrap-options: \
  have-watchdog=false \
  dc-version=1.1.18-2b07d5c5a9 \
  cluster-infrastructure=corosync \
  cluster-name=test-cluster \
  stonith-action=reboot \
  no-quorum-policy=ignore \
  stonith-enabled=false \
  last-lrm-refresh=1596896556 \
  maintenance-mode=false
rsc_defaults rsc-options: \
  resource-stickiness=1000

The attentive reader will notice that the “no-quorum-policy” has also been adapted. This is important for the operation of a cluster that consists of only two nodes, as no quorum could be formed if one node fails.
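
A single property like this can also be changed directly from the crm shell without opening the editor, for example:

root@test-node-1:~# crm configure property no-quorum-policy=ignore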

root@test-node-1:~# crm status
Stack: corosync
Current DC: test-node-1 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Mon Dec 7 15:40:20 2020
Last change: Mon Dec 7 15:45:21 2020 by hacluster via crmd on test-node-1

2 nodes configured
1 resource configured

Online: [ test-node-1 test-node-2 ]

Full list of resources:

ha-vip (ocf::heartbeat:IPaddr2): Started test-node-1
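
On the node on which the resource was started, the failover IP should now also be visible on the network interface. A quick check (the exact interface name depends on your setup):

root@test-node-1:~# ip addr show | grep 172.16.0.100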

Configure Failover IP in OpenStack

There are two ways to store the IP in OpenStack. You can either navigate to the VM’s port in the web interface via Network -> Networks -> the network in question -> Ports and add the desired IP in the “Allowed address pairs” tab, or you can use the OpenStack CLI tool:

openstack port list --server test-node-1
+--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+
| ID                                   | Name | MAC Address       | Fixed IP Addresses                                                         | Status |
+--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+
| 0a7161f5-c2ff-402c-9bf4-976215a95cf3 |      | fa:16:3e:2a:f3:f2 | ip_address='172.16.0.10', subnet_id='9ede2a39-7f99-48c8-a542-85066e30a6f3' | ACTIVE |
+--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+

The additionally permitted IP address is added to the port as follows. A complete network can also be defined here if several IP resources are to be created.

openstack port set 0a7161f5-c2ff-402c-9bf4-976215a95cf3 --allowed-address ip-address=172.16.0.100
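
Whether the address pair has been stored can be verified by inspecting the port, for example:

openstack port show 0a7161f5-c2ff-402c-9bf4-976215a95cf3 -c allowed_address_pairs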

This step must be repeated for both servers. After that, the IP is also reachable within the OpenStack project. If this is not the case, it may help to move the IP resource to the other node so that the IP is announced there. However, this should not be necessary thanks to the ARP settings of the resource defined above.

crm resource migrate ha-vip test-node-2
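
Note that migrating a resource creates a location constraint that pins it to the target node. Once the move is no longer needed, this constraint can be removed again with:

crm resource unmigrate ha-vip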

Things don’t work as planned, or do you still have questions? Our MyEngineers are happy to help!

Monitoring for machines with Icinga 2 Master

With our OpenStack Cloud it is very easy to build your own environment according to your own ideas. Quickly and easily start a few machines with Terraform, make the service available to the outside world with an attached floating IP and the associated security group, and the project is up and running.

But no environment runs without errors, and monitoring is a big topic: ideally, you want to know before your own users or customers do when something is not working quite as it should. Every reader of this blog is probably aware of the importance of monitoring, as well as of evaluating performance data. So how can you easily monitor your OpenStack environment, especially if your servers are not reachable from the outside world? We have prepared something!

In addition to our IaaS offer, we also provide various SaaS solutions. Among them is the app Icinga 2 Master, which provides a complete Icinga 2 Master, including Graphite and Grafana, within a few minutes.

Once it has been started and you have logged in, you will find integration scripts for various operating systems under the “Add Agent” tab (the name depends on the browser language).

You simply download the appropriate script to the server as described in the instructions, execute it, and the server is connected to the Icinga 2 master.

Everything important is automated here. Some checks are created directly by default, and with the help of the Icinga Director it is also easy to roll out further checks to your hosts. The Director’s API can also be addressed directly, so there are almost no limits. In addition, graphs of the connected agent’s performance data are shown directly with the respective check, so not only can problems be detected, but trends are also visualised. In our packages this data is kept for one year to ensure a long-term overview.
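
As a rough sketch of what addressing the Director API directly can look like, the following creates a host object via its REST interface. The URL, credentials, template, host name and address are placeholders and depend on your own setup:

curl -s -u director-user:director-pass -H 'Accept: application/json' \
  -X POST 'https://icinga-master.example.com/icingaweb2/director/host' \
  -d '{ "object_name": "test-node-3", "object_type": "object", "imports": [ "generic-host" ], "address": "172.16.0.30" }'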

The first month of Icinga 2 Master is free as well, so it is worth a try. Our MyEngineer is also happy to help with the setup!