OpenStack: RabbitMQ Cannot Join Cluster, Already a Member

You can run into this error when joining a node to a RabbitMQ cluster that believes the node is already a member. I have run into this issue a few times, usually when recovering from a crash of an OpenStack controller.

Below are the steps to resolve the issue.

The error below is seen when attempting to add a node back into the cluster.

INFO REPORT==== 27-Jan-2017::16:57:22 ===
Already member of cluster: [rabbit@nodectrl2,rabbit@nodectrl1,
rabbit@nodectrl0]

We check the cluster status for confirmation.

[root@nodectrl1 rabbitmq]# rabbitmqctl cluster_status
Cluster status of node rabbit@nodectrl1 ...
[{nodes,[{disc,[rabbit@nodectrl0,rabbit@nodectrl1,
rabbit@nodectrl2]}]},
{running_nodes,[rabbit@nodectrl2,rabbit@nodectrl1]},
{cluster_name,<<"rabbit@nodectrl0.localdomain">>},
{partitions,[]},
{alarms,[{rabbit@nodectrl2,[]},{rabbit@nodectrl1,[]}]}]
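The node to forget is the one that appears under nodes but not under running_nodes. As a rough sketch, that comparison can be scripted. The function name stopped_nodes is my own, and the parsing assumes the Erlang-term output format shown above, which newer RabbitMQ releases no longer print by default.

```shell
# Print cluster members that are listed under {disc,[...]} but are
# missing from {running_nodes,[...]}. Reads cluster_status output on stdin.
# Assumes the Erlang-term output format shown in this post.
stopped_nodes() {
  # Collapse the multi-line term into one line so sed can match it.
  status=$(tr -d ' \n\r')
  all=$(printf '%s\n' "$status" |
    sed -n 's/.*{disc,\[\([^]]*\)].*/\1/p' | tr ',' ' ')
  running=$(printf '%s\n' "$status" |
    sed -n 's/.*{running_nodes,\[\([^]]*\)].*/\1/p' | tr ',' ' ')
  for node in $all; do
    case " $running " in
      *" $node "*) ;;                 # node is running, skip it
      *) printf '%s\n' "$node" ;;     # member of the cluster but not running
    esac
  done
}

# Usage (on a healthy controller):
#   rabbitmqctl cluster_status | stopped_nodes
```

Treat the result as a hint only and confirm it against the full cluster_status output before forgetting anything.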

Now we force the cluster to forget the affected node.

[root@nodectrl1 rabbitmq]# rabbitmqctl forget_cluster_node rabbit@nodectrl0
Removing node rabbit@nodectrl0 from cluster ...

We then check the cluster status to ensure that it has been removed from the cluster.

[root@nodectrl1 rabbitmq]# rabbitmqctl cluster_status

Cluster status of node rabbit@nodectrl1 ...
[{nodes,[{disc,[rabbit@nodectrl1,rabbit@nodectrl2]}]},
{running_nodes,[rabbit@nodectrl2,rabbit@nodectrl1]},
{cluster_name,<<"rabbit@nodectrl0.localdomain">>},
{partitions,[]},
{alarms,[{rabbit@nodectrl2,[]},{rabbit@nodectrl1,[]}]}]

We can now add our node back into the cluster.

[root@nodectrl1 rabbitmq]# rabbitmqctl -n nodectrl1 join_cluster rabbit@nodectrl0.localdomain
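For comparison, the sequence I would expect from the RabbitMQ clustering guide is run on the node being re-added rather than on a healthy member: stop the application, reset the stale state, join, and start again. The sketch below wraps those four rabbitmqctl subcommands. The function name rejoin_cluster and the RABBITMQCTL override variable are hypothetical, added only so the sequence can be dry-run. Note that reset wipes the local node's data, so only run it on the node that was forgotten.

```shell
# Rejoin a forgotten node to the cluster. Run this ON the node being
# re-added (nodectrl0 in this post), not on a healthy member.
# RABBITMQCTL is a hypothetical override for this sketch, e.g.
# RABBITMQCTL=echo prints the commands instead of running them.
rejoin_cluster() {
  peer=$1                            # a running member, e.g. rabbit@nodectrl1
  ctl=${RABBITMQCTL:-rabbitmqctl}
  "$ctl" stop_app &&                 # stop RabbitMQ but keep the Erlang VM up
    "$ctl" reset &&                  # drop stale cluster state (wipes local data)
    "$ctl" join_cluster "$peer" &&   # join via any running member
    "$ctl" start_app
}

# Usage:
#   rejoin_cluster rabbit@nodectrl1
```

Afterwards, rabbitmqctl cluster_status on any member should list all three nodes under running_nodes again.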

OpenStack: How to Remove RabbitMQ Durable Queues

Introduction

You can see this error in more than one place: the RabbitMQ logs and /var/log/cinder/volume.log.

The tail statement below is very helpful for finding errors in OpenStack.

tail -fn0 /var/log/{nova,cinder,glance}/*.log | egrep 'ERROR|TRACE|WARNING'

Documented Errors:

EXAMPLE 1:

oslo.messaging._drivers.impl_rabbit PreconditionFailed: Exchange.declare: (406) PRECONDITION_FAILED - cannot redeclare exchange 'openstack' in vhost '/' with different type, durable, internal or autodelete value

EXAMPLE 2:

2015-06-08 09:52:17.367 8437 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to declare consumer for topic 'cinder-scheduler': Queue.declare: (406) PRECONDITION_FAILED - parameters for queue 'cinder-scheduler' in vhost '/' not equivalent

Warning

Make sure that you only delete OpenStack-specific queues. Other applications sharing the same RabbitMQ cluster will lose any queues you remove by mistake.

Cause

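This is caused by a mismatch in the durable queues setting in the configuration files on the controller nodes. Changing the setting alone does not correct the issue: an existing queue keeps the durability it was originally declared with, so a service that redeclares it with a different value is rejected with the 406 PRECONDITION_FAILED error shown above. You must manually delete the affected queues.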
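To find the mismatch, it helps to print the setting from each service's configuration side by side. The sketch below assumes the option in question is amqp_durable_queues (the oslo.messaging name, as far as I know) and that the files live in the usual /etc/<service>/ locations. The function name durable_setting is my own.

```shell
# Print "<file>: <value>" for each config file given.
# "unset" means the option is absent or commented out in that file.
# Assumes the option is called amqp_durable_queues.
durable_setting() {
  for f in "$@"; do
    # Keep only the value of uncommented assignments; last one wins.
    v=$(sed -n 's/^[[:space:]]*amqp_durable_queues[[:space:]]*=[[:space:]]*//p' "$f" | tail -n 1)
    printf '%s: %s\n' "$f" "${v:-unset}"
  done
}

# Usage (paths are assumptions, adjust to your deployment):
#   durable_setting /etc/nova/nova.conf /etc/cinder/cinder.conf /etc/glance/glance-api.conf
```

Run it on every controller. Any file or node that reports a different value from the rest is a likely source of the error.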

Resolution

# pcs resource unmanage mysqld
# pcs resource unmanage rabbitmq-server
# pcs cluster standby --all
# curl http://localhost:15672/cli/rabbitmqadmin > rabbitmqadmin
# chmod +x rabbitmqadmin
# ./rabbitmqadmin help subcommands
# rabbitmqctl list_queues

The last command above lists all the queues, so you can now search for the affected queue and delete it as shown. In this example I am deleting the queue “notifications.info”.

[root@lppcldiuctl01 ~(openstack_admin)]# ./rabbitmqadmin --username=openstack --password=81d86697132a45a55 delete queue name=notifications.info
queue deleted

If you have multiple queues affected you can run through several at a time as shown below.

# rabbitmqctl list_queues | awk '{print $1}' > queues
# vim queues
# cat /etc/rabbitmq/rabbitmq.config
# ./rabbitmqadmin --username=openstack --password=81d86697132a45d9832d7fb35d168a55 delete queue name=reply_ea13ffaf100de1baca
# for q in $(<queues); do ./rabbitmqadmin --username=openstack --password=81d8669713 delete queue name="$q"; done
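The awk step above also captures header and footer lines such as "Listing queues ...", which is why the file needs a pass through vim. A slightly stricter filter, sketched below, keeps only lines that look like a queue name followed by a message count. The function name queue_names is my own, and the filter assumes the default two-column list_queues output.

```shell
# Read `rabbitmqctl list_queues` output on stdin and print only queue names.
# A line is kept when it has exactly two fields and the second is a number,
# which drops banners like "Listing queues ..." and "...done.".
queue_names() {
  awk 'NF == 2 && $2 ~ /^[0-9]+$/ { print $1 }'
}

# Usage:
#   rabbitmqctl list_queues | queue_names > queues
```

You should still review the queues file by hand before running the delete loop, since the filter does not know which queues belong to OpenStack.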

Now bring everything back online.

# pcs cluster unstandby --all

Now re-manage the services that you unmanaged above.

# pcs resource manage mysqld
# pcs resource manage rabbitmq-server

Note that the steps above can be adapted to delete exchanges as well as queues. More information on exchanges is available at the link below.

https://www.rabbitmq.com/tutorials/amqp-concepts.html
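As a rough illustration of the exchange variant, rabbitmqadmin accepts "delete exchange name=..." in the same way as "delete queue". The wrapper below is only a sketch. The function name delete_exchanges and the RABBITMQADMIN, RABBIT_USER and RABBIT_PASS variables are hypothetical, not part of the original procedure.

```shell
# Delete each exchange named on the command line, stopping at the first failure.
# RABBITMQADMIN, RABBIT_USER and RABBIT_PASS are hypothetical variables for
# this sketch; set RABBITMQADMIN=echo to preview the commands.
delete_exchanges() {
  admin=${RABBITMQADMIN:-./rabbitmqadmin}
  for x in "$@"; do
    "$admin" --username="$RABBIT_USER" --password="$RABBIT_PASS" \
      delete exchange name="$x" || return 1
  done
}

# Usage:
#   RABBIT_USER=openstack RABBIT_PASS=... delete_exchanges openstack
```

The same warning applies here as for queues: only delete exchanges that belong to OpenStack, and do it while the services are in standby.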