OpenStack: Rabbitmq Cannot Join Cluster, Already a Member

You can run into this error when attempting to join a node to a RabbitMQ cluster that already believes the node is a member. I have hit this issue a few times; it is usually seen when recovering from a crash of an OpenStack controller.

Below are the steps to resolve the issue.

The error below is seen when attempting to add a node back into the cluster.

INFO REPORT==== 27-Jan-2017::16:57:22 ===
Already member of cluster: [rabbit@nodectrl2,rabbit@nodectrl1,
rabbit@nodectrl0]

We check the cluster status for confirmation.

[root@nodectrl1 rabbitmq]# rabbitmqctl cluster_status
Cluster status of node rabbit@nodectrl1 …
[{nodes,[{disc,[rabbit@nodectrl0,rabbit@nodectrl1,
rabbit@nodectrl2]}]},
{running_nodes,[rabbit@nodectrl2,rabbit@nodectrl1]},
{cluster_name,<<"rabbit@nodectrl0.localdomain">>},
{partitions,[]},
{alarms,[{rabbit@nodectrl2,[]},{rabbit@nodectrl1,[]}]}]

Now we force the cluster to forget the affected node.

[root@nodectrl1 rabbitmq]# rabbitmqctl forget_cluster_node rabbit@nodectrl0
Removing node rabbit@nodectrl0 from cluster …

We then check the cluster status to ensure that it has been removed from the cluster.

[root@nodectrl1 rabbitmq]# rabbitmqctl cluster_status

Cluster status of node rabbit@nodectrl1 …
[{nodes,[{disc,[rabbit@nodectrl1,rabbit@nodectrl2]}]},
{running_nodes,[rabbit@nodectrl2,rabbit@nodectrl1]},
{cluster_name,<<"rabbit@nodectrl0.localdomain">>},
{partitions,[]},
{alarms,[{rabbit@nodectrl2,[]},{rabbit@nodectrl1,[]}]}]

We can now add our node back into the cluster.

[root@nodectrl1 rabbitmq]# rabbitmqctl -n nodectrl1 join_cluster rabbit@nodectrl0.localdomain
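
If the join still fails, or the recovered node refuses to start because it holds stale cluster metadata, a common recovery sequence (a sketch, assuming it is run on the recovered node, nodectrl0 in this example) is to stop the app, reset the node, and re-join it:

[root@nodectrl0 ~]# rabbitmqctl stop_app
[root@nodectrl0 ~]# rabbitmqctl reset
[root@nodectrl0 ~]# rabbitmqctl join_cluster rabbit@nodectrl1
[root@nodectrl0 ~]# rabbitmqctl start_app

Note that rabbitmqctl reset wipes the node's local RabbitMQ state, which is fine here because the node will resync from the cluster once it joins.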

OpenStack: instackenv.json Format Example

Here is a quick and dirty example of the format of your instackenv.json file. This is the file used to import nodes into Ironic on the undercloud.

Enter your IPMI user ID under "pm_user".

Enter your IPMI password under "pm_password".

[code language="javascript"]
{
"nodes":[
{
"mac":[
"74:E6:E2:FB:71:B0"
],
"cpu":"4",
"memory":"6144",
"disk":"40",
"arch":"x86_64",
"name":"control01",
"pm_type":"pxe_ipmitool",
"pm_user":"admin",
"pm_password":"admin",
"pm_addr":"10.75.99.120"
},
{
"mac":[
"74:E6:E2:FB:71:D6"
],
"cpu":"4",
"memory":"6144",
"disk":"40",
"arch":"x86_64",
"name":"control02",
"pm_type":"pxe_ipmitool",
"pm_user":"admin",
"pm_password":"admin",
"pm_addr":"10.75.99.119"
},
{
"mac":[
"74:E6:E2:FB:73:D0"
],
"cpu":"4",
"memory":"6144",
"disk":"40",
"arch":"x86_64",
"name":"control03",
"pm_type":"pxe_ipmitool",
"pm_user":"admin",
"pm_password":"admin",
"pm_addr":"10.75.99.118"
},
{
"mac":[
"74:E6:E2:FB:27:D4"
],
"cpu":"4",
"memory":"6144",
"disk":"40",
"arch":"x86_64",
"name":"compute01",
"pm_type":"pxe_ipmitool",
"pm_user":"admin",
"pm_password":"admin",
"pm_addr":"10.75.99.117"
}
]
}
[/code]
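
Once your instackenv.json is in place, the nodes are typically imported from the undercloud as the stack user. The commands below are a sketch based on the OSP 8 era tripleo client; the exact syntax can differ between releases.

$ source ~/stackrc
$ openstack baremetal import --json ~/instackenv.json
$ openstack baremetal configure boot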

Red Hat OpenStack 8: Making your Undercloud Immutable

Introduction

This article will show you how to block the overcloud from being deleted.

Blocking Users from Deleting the Overcloud Stack

First, make a backup copy of /etc/heat/policy.json.

$ sudo cp /etc/heat/policy.json /etc/heat/policy.json.orig

Run the command below to see the default stacks:delete policy.

$ sudo grep -m1 stacks:delete /etc/heat/policy.json
"stacks:delete": "rule:deny_stack_user",

Then, change the policy so that nobody, not even an admin, can delete the stack.

Note that this means the policy will have to be reverted to its original configuration before the stack can be deleted in the future. The sed command below makes the change.

$ sudo sed -i '/stacks:delete/ s/rule:.*/rule:deny_everybody",/' /etc/heat/policy.json

Verify your changes.

$ sudo grep -m1 stacks:delete /etc/heat/policy.json
"stacks:delete": "rule:deny_everybody",
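
Since we made a backup earlier, reverting the policy when you legitimately need to delete the stack is just a copy back of the original file:

$ sudo cp /etc/heat/policy.json.orig /etc/heat/policy.json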

Blocking Users from Deleting Nova Instances

In addition to blocking users from accidentally deleting your overcloud from heat, you should also block the accidental deletion of the overcloud nodes from nova.

First, run the command below to make a backup of /etc/nova/policy.json.

$ sudo cp /etc/nova/policy.json /etc/nova/policy.json.orig

Run the command below to see the default compute:delete policy.

$ sudo grep compute:delete /etc/nova/policy.json
"compute:delete": "rule:admin_or_owner",

Now let's change the policy so that it blocks anyone and everyone from deleting a nova instance (an overcloud node).

$ sudo sed -i '/compute:delete/ s/rule:.*/rule:deny_everybody",/' /etc/nova/policy.json

Now we can verify our changes.

$ sudo grep compute:delete /etc/nova/policy.json
"compute:delete": "rule:deny_everybody",
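
As with heat, reverting is simply a matter of restoring the backup we made at the start:

$ sudo cp /etc/nova/policy.json.orig /etc/nova/policy.json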

Deploying Red Hat OpenStack 10 via the Tripleo UI

In this lab, we are going to deploy a functional, yet simple, Overcloud via the Tripleo WebUI using Virtual Machines. Our test deployment will consist of three Overcloud Controller Nodes (configured with HA) and one Overcloud Compute Node.

Hypervisor Details

Interface    IP Address
em1          10.13.32.31
virbr0       192.168.122.1
virbr1       192.168.122.253

Undercloud VM Details

Interface    IP Address
br-ctlplane  172.16.0.1
eth1         192.168.122.253

Prerequisites

Note that we have installed squid on our hypervisor node and are using that node to proxy all web traffic to the undercloud controller. This also requires configuring your web browser to use the hypervisor as its proxy.
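
For reference, a minimal squid.conf on the hypervisor might look like the sketch below. The port and the ACL name/network are assumptions for this lab (the undercloud sits on the 192.168.122.0/24 libvirt network) and should be adjusted to match your environment.

http_port 3128
acl lab_net src 192.168.122.0/24
http_access allow lab_net
http_access deny all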

OpenStack: Configuring SR-IOV in RHEL OSP 8

Introduction

This article documents the steps used to configure SR-IOV in OSP 8 (Liberty) on Dell hardware.

Compute Node Configuration

This section will outline the changes needed to configure SR-IOV on each Compute Node.

Bios Configuration on Dell Compute Nodes

First, you will need to ssh to the iDRAC of each Compute Node. Then, type the command below to enter the racadm command line.

#racadm

Type the command below to enable SRIOV.

#racadm set BIOS.IntegratedDevices.SriovGlobalEnable Enabled
[Key=BIOS.Setup.1-1#IntegratedDevices]
RAC1017: Successfully modified the object value and the change is in
pending state.
To apply modified value, create a configuration job and reboot
the system. To create the commit and reboot jobs, use the "jobqueue"
command. For more information about the "jobqueue" command, see RACADM
help.
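
As the output notes, the pending BIOS change is only applied once a configuration job runs and the server reboots. Something along the lines of the jobqueue command below schedules that job with an immediate power cycle; the exact syntax can vary between iDRAC/racadm versions, so check racadm help jobqueue on your system.

#racadm jobqueue create BIOS.Setup.1-1 -r pwrcycle -s TIME_NOW -e TIME_NA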

Type the command below to verify your configuration.

racadm>> get BIOS.IntegratedDevices.SriovGlobalEnable
[Key=BIOS.Setup.1-1#IntegratedDevices]
SriovGlobalEnable=Enabled

If the server already has an OS installed, reboot it to make these settings stick. If it doesn't, use the racadm commands below to power cycle it.

#racadm serveraction powerdown
#racadm serveraction powerup

Grub Configuration on Compute Nodes

Add "intel_iommu=on" to the GRUB_CMDLINE_LINUX line in /etc/default/grub. The finished file should look like the example below.

# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet intel_iommu=on"
GRUB_DISABLE_RECOVERY="true"
audit=1

Before editing, make a backup of /etc/default/grub.

# cp -p /etc/default/grub /etc/default/grub.$(date +%F_%R)

Edit the line below.

GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet"

Change it to this.

GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet intel_iommu=on"

Now, rebuild GRUB config as shown below.

# grub2-mkconfig -o /boot/grub2/grub.cfg
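
After rebooting the node, it is worth confirming that the kernel actually picked up the flag. For example:

# grep intel_iommu /proc/cmdline
# dmesg | grep -i -e dmar -e iommu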

Specify the number of VFs to Create in rc.local

Add the following line to /etc/rc.d/rc.local, adjusting for your device name and the number of VFs. In this instance, our physical adapters each support 32 VFs.

echo 32 > /sys/class/net/<device>/device/sriov_numvfs

For example:

# echo 32 > /sys/class/net/p1p1/device/sriov_numvfs
# echo 32 > /sys/class/net/p3p1/device/sriov_numvfs

Also, ensure the correct SELinux context is restored.

# restorecon -R -v /etc/rc.d/rc.local

Ensure that /etc/rc.d/rc.local is executable.

#chmod +x /etc/rc.d/rc.local
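
After a reboot (or after running the echo commands by hand), you can confirm that the VFs were actually created. The interface name below is just one of the examples from this environment.

# cat /sys/class/net/p1p1/device/sriov_numvfs
# ip link show p1p1 | grep vf
# lspci | grep -i "virtual function"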

Whitelist PCI Devices for nova-compute (Compute)

Tell nova-compute which pci devices are allowed to be passed through. Edit the file /etc/nova/nova.conf:

[DEFAULT]
pci_passthrough_whitelist = [{"vendor_id":"8086","product_id":"154d"},{"devname":"p1p1","physical_network":"sriov_net1"},{"devname":"p3p1","physical_network":"sriov_net2"}]

This tells nova that all VFs belonging to the physical interface "p1p1" may be passed through to VMs and belong to the neutron provider network "sriov_net1", and that all VFs belonging to the physical interface "p3p1" may be passed through for the network "sriov_net2".

Restart nova-compute on each compute node with the command shown below.

#systemctl restart openstack-nova-compute

Install and Enable Neutron Sriov-Agent (Compute)

Note that the sriov-nic-agent is not strictly required; however, we are going to install and configure it anyway.

Install the following rpm.

# yum -y install openstack-neutron-sriov-nic-agent

Now, on each compute node edit the file /etc/neutron/plugins/ml2/ml2_conf_sriov.ini:

[securitygroup]
firewall_driver = neutron.agent.firewall.NoopFirewallDriver

[sriov_nic]
physical_device_mappings = sriov_net1:p1p1,sriov_net2:p3p1

Now enable and start the nic agent.

# systemctl enable neutron-sriov-nic-agent.service && systemctl start neutron-sriov-nic-agent.service
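
You can confirm that the agent registered itself with the neutron server by running the command below from a node with admin credentials loaded; the output format differs slightly between releases.

# neutron agent-list | grep -i sriov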

Controller Node Configuration

Perform the following steps on each Controller Node. Note we will modify Nova and Neutron config files.

Neutron-Server changes in /etc/neutron/plugins/ml2/ml2_conf.ini (Controller)

The following changes take place in the file /etc/neutron/plugins/ml2/ml2_conf.ini.

Add sriovnicswitch as a mechanism driver.

mechanism_drivers =openvswitch,bsn_ml2,sriovnicswitch

Set type_drivers to vlan as shown below.

type_drivers = vlan

Set tenant_network_types to vlan.

tenant_network_types = vlan

Set flat_networks as shown below, where "sriov_net1" and "sriov_net2" are the physical networks we are going to use for SR-IOV.

flat_networks =datacentre,sriov_net1,sriov_net2

Add VLAN ranges for the SRIOV networks to the network_vlan_ranges line as shown below.

network_vlan_ranges =datacentre:10:100,datacentre:101:122,sriov_net1:200:300,sriov_net2:200:300

Neutron-Server changes in /etc/neutron/plugins/ml2/ml2_conf_sriov.ini (Controller)

The change below needs to be made in /etc/neutron/plugins/ml2/ml2_conf_sriov.ini on each controller. In our case, the vendor_id is 8086 and the product_id is 10ed.

supported_pci_vendor_devs = 8086:10ed
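
If you are unsure of the vendor and product IDs on your hardware, lspci on a compute node will show them in [vendor:product] form; the VF entries only appear once SR-IOV has been enabled as described above.

# lspci -nn | grep -i "virtual function"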

Modify Neutron-Server Startup

Edit /usr/lib/systemd/system/neutron-server.service. Here we add --config-file /etc/neutron/plugins/ml2/ml2_conf_sriov.ini to the ExecStart line. See example below.

# cat neutron-server.service
[Unit]
Description=OpenStack Neutron Server
After=syslog.target network.target

[Service]
Type=notify
User=neutron
ExecStart=/usr/bin/neutron-server --config-file /usr/share/neutron/neutron-dist.conf --config-dir /usr/share/neutron/server --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini --config-file /etc/neutron/plugins/ml2/ml2_conf_sriov.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-server --log-file /var/log/neutron/server.log
PrivateTmp=true
NotifyAccess=all
KillMode=process

[Install]
WantedBy=multi-user.target

Restart neutron on the controllers via pacemaker. See command below.

#pcs resource restart neutron-server-clone

Configure nova-scheduler (Controller)

On every controller node running nova-scheduler, add PciPassthroughFilter to the scheduler_default_filters parameter.

Also add the scheduler_available_filters entries shown below under the [DEFAULT] section in /etc/nova/nova.conf.

[DEFAULT]
scheduler_default_filters = RetryFilter, AvailabilityZoneFilter, RamFilter, ComputeFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, ServerGroupAntiAffinityFilter, ServerGroupAffinityFilter, PciPassthroughFilter
scheduler_available_filters = nova.scheduler.filters.all_filters
scheduler_available_filters = nova.scheduler.filters.pci_passthrough_filter.PciPassthroughFilter

Now restart nova-scheduler via Pacemaker as shown below.

# pcs resource restart openstack-nova-scheduler-clone
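
With the compute and controller changes in place, a typical smoke test is to create a VLAN provider network on one of the SR-IOV physical networks, create a direct-type port on it, and boot an instance using that port. The names, VLAN ID, subnet, flavor, and image below are placeholders; the commands follow the Liberty SR-IOV guide referenced below.

# neutron net-create sriov_test --provider:network_type vlan --provider:physical_network sriov_net1 --provider:segmentation_id 200
# neutron subnet-create sriov_test 192.168.200.0/24 --name sriov_test_subnet
# neutron port-create sriov_test --binding:vnic_type direct --name sriov_test_port
# nova boot --flavor m1.small --image rhel7 --nic port-id=PORT_UUID sriov-test01

Here PORT_UUID is the id reported by the neutron port-create command.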

Reference

http://docs.openstack.org/liberty/networking-guide/adv-config-sriov.html

Creating and Deleting OpenStack Pacemaker IP Addresses

You can use the steps below if you need to change managed IP resources, for example, if you need to re-IP your RHEL OSP Overcloud endpoints.

In this example, we are changing a managed VIP from one IP to another.

First, we take a good look at the resource that we want to delete, ip-99.239.203.25. This resource manages the VIP 99.239.203.25.

# pcs resource show ip-99.239.203.25
Resource: ip-99.239.203.25 (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=99.239.203.25 cidr_netmask=32
Operations: start interval=0s timeout=20s (ip-99.239.203.25-start-interval-0s)
stop interval=0s timeout=20s (ip-99.239.203.25-stop-interval-0s)
monitor interval=10s timeout=20s (ip-99.239.203.25-monitor-interval-10s)

Now let’s actually delete it.

# pcs resource delete ip-99.239.203.25
Attempting to stop: ip-99.239.203.25…Stopped

Now let's create the replacement VIP.

# pcs resource create ip-99.239.203.10 ocf:heartbeat:IPaddr2 ip=99.239.203.10 cidr_netmask=32 op monitor interval=10s

Now, let’s take a good look at it.

# pcs resource show ip-99.239.203.10
Resource: ip-99.239.203.10 (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=99.239.203.10 cidr_netmask=32
Operations: start interval=0s timeout=20s (ip-99.239.203.10-start-interval-0s)
stop interval=0s timeout=20s (ip-99.239.203.10-stop-interval-0s)
monitor interval=10s (ip-99.239.203.10-monitor-interval-10s)

Now we need to check to make sure that the VIP started on one of our OpenStack controllers.

# pcs status | grep 99.239.203.10
ip-99.239.203.10 (ocf::heartbeat:IPaddr2): Started ctrl01

For good measure, let’s make sure we can ping it.

# ping 99.239.203.10
PING 99.239.203.10 (99.239.203.10) 56(84) bytes of data.
64 bytes from 99.239.203.10: icmp_seq=1 ttl=64 time=0.781 ms
64 bytes from 99.239.203.10: icmp_seq=2 ttl=64 time=1.21 ms


Mapping Virtual Networks with plotnetcfg

Plotnetcfg is a Linux utility that you can use to scan the networking configuration on a server and output the configuration hierarchy to a file. Plotnetcfg is most useful when troubleshooting complex virtual networks with all sorts of bonds and bridges, the likes of which you will find on KVM nodes, or OpenStack Controller nodes.

You can install plotnetcfg on RHEL/CentOS as shown below.

# yum -y install plotnetcfg.x86_64

You will also want to install the "dot" command, which ships with graphviz. See below.

# yum -y install graphviz.x86_64

Now that the bits and pieces are installed, we can run the command below, which outputs to a PDF file named file.pdf.

# plotnetcfg | dot -Tpdf > file.pdf

If you want to, you can also use ImageMagick's "convert" utility to turn the PDF into a JPG, for example to embed the diagram in a blog post or document.
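
The command below is a minimal example and assumes ImageMagick is installed; raise the density for a sharper image.

# convert -density 150 file.pdf file.jpg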

Super clean, and super easy to read and understand.