Introduction
This process can be used whenever your databases are out of sync, for example when someone reboots all nodes in a cluster without shutting down the databases first.
Resolution
Place all cluster hosts into standby mode and clean up any failed resources.
I suggest making a backup of MariaDB on each controller node – just in case.
root@controller3 # mysqldump --all-databases > mariadb_dump_06152015
root@controller3 # pcs cluster standby --all
root@controller3 # pcs status
root@controller3 # pcs resource cleanup ${resource}
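If pcs status reports a failed resource, the cleanup is run per resource. For example, assuming the Galera resource in your Pacemaker configuration is named galera (take the actual name from the pcs status output):

root@controller3 # pcs resource cleanup galera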
Then, on each controller node, verify that MariaDB (mysql) has stopped, as shown below. If any instance has not been stopped properly via PCS, stop it manually.
# systemctl status mariadb.service
mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; disabled)
   Active: inactive (dead)
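If any instance instead shows as active (running), stop it manually on that node, for example:

root@controller2 # systemctl stop mariadb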
Find the node with the latest (i.e., largest) version of the DB (seqno), or choose any node if all have the same version. In my case, controller3 had the highest seqno, so we will be doing most of our recovery work on that instance.
root@controller3 # cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 2fb3bbe0-eed6-11e4-ac79-4b77a337d598
seqno: 12175047
cert_index:
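To compare positions across all controllers in one pass, a small loop like the following should work, assuming passwordless SSH between the controllers and hostnames controller1 through controller3 (adjust both to match your environment):

root@controller3 # for node in controller1 controller2 controller3; do echo "== $node =="; ssh $node "grep -E 'uuid|seqno' /var/lib/mysql/grastate.dat"; done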
Stop Puppet if it is running, then edit the /etc/my.cnf.d/galera.cnf file. If you are not running Puppet, or if Puppet is not managing your Galera configuration, you can skip stopping it and go straight to editing the file.
root@controller3 # systemctl stop puppet
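To confirm that the agent actually stopped (or was not running to begin with), systemctl is-active should report inactive:

root@controller3 # systemctl is-active puppet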
Set wsrep_cluster_address manually in /etc/my.cnf.d/galera.cnf, making a note of the original value, and then restart MariaDB manually.
For example, your default configuration should look something like what is shown below. Each IP address listed is the address of a MariaDB instance in your cluster.
# Group communication system handle
wsrep_cluster_address="gcomm://172.17.9.23,172.17.9.24,172.17.9.22"
Modify it by commenting out the default string and adding the line shown in the example below. Note that in this example we are working on controller3.
# Group communication system handle
#wsrep_cluster_address="gcomm://172.17.9.23,172.17.9.24,172.17.9.22"
wsrep_cluster_address="gcomm://"
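If you would rather script this change than edit the file by hand, a GNU sed one-liner along these lines should do it; keep a copy of the original file so you can restore it later (the IP list shown above is from this example and will differ in your environment):

root@controller3 # cp -p /etc/my.cnf.d/galera.cnf /etc/my.cnf.d/galera.cnf.orig
root@controller3 # sed -i 's|^wsrep_cluster_address=.*|#&\nwsrep_cluster_address="gcomm://"|' /etc/my.cnf.d/galera.cnf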
Now start MariaDB manually on controller3.
root@controller3 # systemctl start mariadb
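Before bringing up the other nodes, it is worth confirming that controller3 has bootstrapped a new Primary component with a cluster size of 1. Assuming root can connect to MariaDB locally (add credentials if your setup requires them):

root@controller3 # mysql -e "SHOW STATUS LIKE 'wsrep_cluster_status'; SHOW STATUS LIKE 'wsrep_cluster_size'"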
Now start MariaDB on one of the remaining controllers.
root@controller2 # systemctl start mariadb
Below you can see the newly started MariaDB instance recovering its position as it syncs with the cluster.
Jun 15 10:08:02 controller2 mysqld_safe[24999]: 150615 10:08:02 mysqld_safe Logging to '/var/log/mariadb/mariadb.log'.
Jun 15 10:08:02 controller2 mysqld_safe[24999]: 150615 10:08:02 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
Jun 15 10:08:02 controller2 mysqld_safe[24999]: 150615 10:08:02 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsr...ver.pid'
Jun 15 10:08:04 controller2 mysqld_safe[24999]: 150615 10:08:04 mysqld_safe WSREP: Recovered position 2fb3bbe0-eed6-11e4-ac79-4b77a337d598:12175047
Jun 15 10:08:06 controller2 systemd[1]: Started MariaDB database server.
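The same status variables can confirm the join from the database side. Run on either node, wsrep_cluster_size should now be 2 (the third controller is still stopped at this point) and the local state should be Synced, again assuming local root access to MariaDB:

root@controller2 # mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size'; SHOW STATUS LIKE 'wsrep_local_state_comment'"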
Once it is running, you can take the nodes out of standby and monitor the status as the remaining resources start.
root@controller3 # pcs cluster unstandby --all
root@node # pcs status
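Rather than rerunning pcs status by hand, you can watch it refresh while the remaining resources come up (a convenience only; the interval is arbitrary):

root@controller3 # watch -n 5 pcs status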
Finally, restore wsrep_cluster_address to its original value in /etc/my.cnf.d/galera.cnf and restart the service, monitoring that the resource remains active.
root@controller3 # vi /etc/my.cnf.d/galera.cnf
root@controller3 # systemctl restart mariadb
root@controller3 # pcs status
Now check your databases to make sure that they are in sync. The following file should be the same on each controller.
[root@lppcldiuctl01 ~(openstack_admin)]# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 2fb3bbe0-eed6-11e4-ac79-4b77a337d598
seqno: -1
cert_index:
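Note that while MariaDB is running, grastate.dat normally shows a seqno of -1; the important thing is that the uuid matches on every node. If you want to compare the actual committed position across the controllers, the wsrep_last_committed status variable is one way to do it (same SSH and hostname assumptions as earlier):

root@controller3 # for node in controller1 controller2 controller3; do echo "== $node =="; ssh $node "mysql -e \"SHOW STATUS LIKE 'wsrep_last_committed'\""; done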
Reference:
https://access.redhat.com/solutions/1283463