Red Hat OpenStack 13: Containerized Services Operations Guide

Screenshot from 2018-11-10 18-59-16.png

Contain Yourself

With the release of Red Hat OpenStack 13, the move to containerized overcloud services is complete.  Traditional systemd services such as RabbitMQ, Haproxy, Mariadb, etc, are all now running as containers in the overcloud.  This move to containers is meant to provide additional stability, control, and security to the platform. Future upgrades should be easier, and future deploys should be more flexible.

However, the move to containers brings with it a couple of new challenges. Operations.

The average OpenStack administrator no longer restarts services, they restart containers.

They no longer view a rabbit cluster’s status on the controller node, but rather within a container on the controller node.  Log locations have changed. Config file locations have changed.

So let’s retrain ourselves.

Continue reading

Advertisements

How to Manage Libvirt VMs via OpenStack Ironic (OSP10)

Ironic_mascot_color

Bear Metal

 

In this post I will document the steps that I am using to create a fully virtualized OSP 10 environment in my lab. The undercloud node is a VM, as well as the overcloud nodes. We will configure libvirt so that ironic has the ability to boot and shutdown the VMs on the underlying hypervisor via Ironic.

Add the stack user on your hypervisor. In this case my hypervisor’s hostname is virt01, however we will refer to it as hypervisor for clarity.

hypervisor# useradd stack
hypervisor# echo “password” | passwd stack --stdin

Modify polkit to allow stack user to manage libvirt.

hypervisor # cat << EOF > /etc/polkit-1/localauthority/50-local.d/50-libvirt-user-stack.pkla
[libvirt Management Access]
Identity=unix-user:stack
Action=org.libvirt.unix.manage
ResultAny=yes
ResultInactive=yes
ResultActive=yes
EOF

Now attempt to libvirt as stack via a remote session. Here we are just connecting back to the localhost, virt01. In the example below, 10.1.99.112 is the ip of the hypervisor. The undercloud has an ip of 10.1.99.10

undercloud# virsh --connect qemu+ssh://stack@10.1.99.112/system list --all

Now ssh as stack to your undercloud vm

Copy stack’s public key to your hypervisor (virt01 in this case). In the command below you will replace the ip address shown with the ip that your undercloud vm will use to connect to libvirt on the hypervisor

undercloud# ssh-copy-id -i ~/.ssh/id_rsa.pub stack@10.1.99.112

Now we need to create a few Virtual Machines. Specifically I am building an environment with 5 virtual machines to run virtualized Red Hat Openstack 13. My overcloud will consist of 2 computes and three controller nodes

I will use the command below to create 5 qcows.

hypervisor# cd /var/lib/libvirt/images/
hypervisor# for i in {1..5}; do qemu-img create -f qcow2 \
-o preallocation=metadata overcloud-node$i.qcow2 60G; done
Formatting ‘overcloud-node1.qcow2′, fmt=qcow2 size=64424509440 encryption=off cluster_size=65536 preallocation=’metadata’ lazy_refcounts=off
Formatting ‘overcloud-node2.qcow2′, fmt=qcow2 size=64424509440 encryption=off cluster_size=65536 preallocation=’metadata’ lazy_refcounts=off
Formatting ‘overcloud-node3.qcow2′, fmt=qcow2 size=64424509440 encryption=off cluster_size=65536 preallocation=’metadata’ lazy_refcounts=off
Formatting ‘overcloud-node4.qcow2′, fmt=qcow2 size=64424509440 encryption=off cluster_size=65536 preallocation=’metadata’ lazy_refcounts=off
Formatting ‘overcloud-node5.qcow2′, fmt=qcow2 size=64424509440 encryption=off cluster_size=65536 preallocation=’metadata’ lazy_refcounts=off

The command below will create 5 xml files and use those to spawn my 5 VMs.

hypervisor# for i in {1..5}; do \
virt-install --ram 16384 --vcpus 4 --os-variant rhel7 \
--disk path=/var/lib/libvirt/images/overcloud-node$i.qcow2,device=disk,bus=virtio,format=qcow2 \
--noautoconsole --vnc --network network:provisioning --network bridge:br99 \
--network network:default --name overcloud-node$i \
--dry-run --print-xml > /tmp/overcloud-node$i.xml; \
hypervisor# virsh define --file /tmp/overcloud-node$i.xml; done

You should end up with the following virtual machines

 

hypervisor# virsh list --all
Id Name State
----------------------------------------------------
1 undercloud running
-- overcloud-node1 shut off
-- overcloud-node2 shut off
-- overcloud-node3 shut off
-- overcloud-node4 shut off
-- overcloud-node5 shut off

Back on the undercloud we use the command below to grab the provisioning network mac address from each virtual machine running on the hypervisor. We could run this command locally on the hypervisor, but since we need the mac addresses for ironic on the undercloud, we will run it here.

undercloud$ for i in {1..5}; do virsh -c qemu+ssh://stack@10.1.99.112/system domiflist overcloud-node$i | awk ‘$3 == “provisioning” {print $5}’; done> /tmp/nodes.txt

Now we use our temp file above to populate the instackenv.json that we will import into ironic. See gist below

At this point we are ready to import our nodes via Ironic.

Note that I do not claim to be the original author of the steps documented above, rather I wanted to ensure that I could easily consume these steps in the future.

Also, I look forward to experimenting with the vbmc ironic driver and might stop using pxe_ssh altogether.

OpenStack: Introduction to Troubleshooting Heat

keyboard-key-board-melt-broken-computer_p

Introduction to Heat

Heat is the main orchestration engine for OpenStack, and is used my OpenStack director to install an OpenStack Overcloud environment.

When we run the “openstack deploy overcloud” command, we are specifically
telling RHEL OSP director that we want it to use the pre-defined Heat templates from
/usr/share/openstack-tripleo-heat-templates/. OSP director will manage the
deployment of a new overcloud heat stack, using files from this directory.
When RHEL OSP director calls the Heat stack, it needs the following data…

  • A top-level Heat template to use that describes the overall environment and the
    resources required.
  • An environment/resource registry to tell Heat where to find resource
    definitions for non-standard Heat elements, e.g. TripleO components.
  • A set of parameters to declare the deployment-specific options (via -e)

 

The most important files for us to focus on are in our deployment directory, these are the default files that get called by OSP director.

  • The top-level Heat template that OSP director uses for deployment is
    /usr/share/openstack-tripleo-heat-templates/overcloud-without-mergepy.yaml
  • The resource registry, which tells Heat where to find the templates for
    deployment resources is
    /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.
    yaml

Creating a Heat Stack

To create the stack we run the command below. This command instructs heat to use
the templates in ~/my_templates/, as well as the override templates specified
with the ‘-e’ option.

This is just an example of what I am using in my lab environment, your deploy command will be much different. Also note that I have copied the templates from /usr/share/openstack-tripleo-heat-templates to ~/my_templates/.

#openstack overcloud deploy –debug –templates ~/my_templates/ \
–ntp-server 10.1.0.1 –control-scale 3 –compute-scale 2 \
-e ~/my_templates/advanced-networking.yaml

Troubleshooting a Failed Heat Stack

Unfortunately our deploy failed with the following errors.

Exception: Heat Stack create failed.
DEBUG: openstackclient.shell clean_up DeployOvercloud
DEBUG: openstackclient.shell got an error: Heat Stack create failed.
ERROR: openstackclient.shell Traceback (most recent call last):
File “/usr/lib/python2.7/site-packages/openstackclient/shell.py”, line 176, in
run
return super(OpenStackShell, self).run(argv)
File “/usr/lib/python2.7/site-packages/cliff/app.py”, line 230, in run
result = self.run_subcommand(remainder)
File “/usr/lib/python2.7/site-packages/cliff/app.py”, line 295, in
run_subcommand
result = cmd.run(parsed_args)
File “/usr/lib/python2.7/site-packages/cliff/command.py”, line 53, in run
self.take_action(parsed_args)
File
“/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py”,
line 864, in take_action
self._deploy_tripleo_heat_templates(stack, parsed_args)
File
“/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py”,
line 535, in _deploy_tripleo_heat_templates
parsed_args.timeout)
File
“/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py”,
line 478, in _heat_deploy
raise Exception(“Heat Stack create failed.”)
Exception: Heat Stack create failed.

We can verify that the deploy failed with the command below.

stack@undercloud] # heat stack-list
+————————————–+————+—————+—| id | stack_name | stack_status | creation_time |
+————————————–+————+—————+—
| ce993847-b0ee-4ea2-ac15-dc0ddc81825a | overcloud | CREATE_FAILED |
2016-02-29T20:40:54Z |

Since the stack deploy has failed, let’s take a closer look at the stack resources
and see if we can determine which resources failed.

Here we will make things simple by viewing only failed resources.

[stack@undercloud] # heat resource-list overcloud | grep -i failed
| Compute | c032c668-755f-422f-8ad1-4abf46b022ff
| OS::Heat::ResourceGroup | CREATE_FAILED |
2016-02-29T20:40:55Z |
| Controller | 668d27e0-9ab1-4dbe-8445-1d1ee8839265
| OS::Heat::ResourceGroup | CREATE_FAILED |
2016-02-29T20:40:55Z |

The failed resources are named “Compute” and “Controller“. Lets take a closer
look at those using the “resource-show” argument.

#heat resource-show overcloud Compute

| resource_status_reason | ResourceUnknownStatus: Resource failed – Unknown
status FAILED due to “Resource CREATE failed: ResourceUnknownStatus:
Resource failed – Unknown status FAILED due to “Resource CREATE failed:
StackValidationFailed: Property error : OsNetConfigImpl: config The Parameter
(BondInterfaceOvsOptions) was not provided.”” |

Let’s now do the same for Controller.

#heat resource-show overcloud Controller

| resource_status_reason | ResourceUnknownStatus: Resource failed – Unknown
status FAILED due to “Resource CREATE failed: ResourceUnknownStatus:
Resource failed – Unknown status FAILED due to “Resource CREATE failed:
StackValidationFailed: Property error : OsNetConfigImpl: config The Parameter
(BondInterfaceOvsOptions) was not provided.”” |

Apparently I have some issues with my OVS bonding options, so I need to get those straight before I can continue.

Deleting a Failed Heat Stack

Since our last deploy failed, we need to delete the failed stack before we can kick off another stack deploy. Below is an example of that command – note we are using the UUID of the stack.

 

[stack@vz-undercloud] # heat stack-delete 2b0da4f6-e6f8-41cd-89e8-bf070d0e0d15
 +--------------------------------------+------------+-------------------| id | stack_name | stack_status | creation_time |
 +--------------------------------------+------------+-------------------
 | 2b0da4f6-e6f8-41cd-89e8-bf070d0e0d15 | overcloud | DELETE_IN_PROGRESS |
 2016-03-01T17:21:58Z |
 +--------------------------------------+------------+-------------------

then…

[stack@vz-undercloud] # heat stack-list
 +--------------------------------------+------------+---------------+---
 | id | stack_name | stack_status | creation_time |
 +--------------------------------------+------------+---------------+---
 | 2b0da4f6-e6f8-41cd-89e8-bf070d0e0d15 | overcloud | DELETE_FAILED |
 2016-03-01T17:21:58Z |
 +--------------------------------------+------------+---------------+---

Now lets kick off another deploy

#openstack overcloud deploy –debug –templates ~/my_templates/ \
–ntp-server 10.1.0.1 –control-scale 3 –compute-scale 2 \
-e ~/my_templates/advanced-networking.yaml

Unfortunately, this deploy failed as well.

Ok, let’s take a look at /var/log/heat/heat/heat-engine.log for more details. I also suggest opening another ssh session and tailing the log while the delete is attempting to do its thing.

If the output is too verbose to follow, I suggest attempting to thin out the output using the command below

#tail -f /var/log/heat/heat-engine.log | egrep ‘error|fatal’

This lead me to the following error.

2016-03-01 13:46:12.366 18554 ERROR heat.engine.resource [-] Error marking
resource as failed
2016-03-01 13:46:12.366 18554 TRACE heat.engine.resource DBConnectionError:
(_mysql_exceptions.OperationalError) (2003, “Can’t connect to MySQL server on
‘172.16.0.10’ (111)”)

Mysql is down? So now we need to look at the mariadb logs – where we see the following.

160301 12:16:24 [Warning] Failed to setup SSL
160301 12:16:24 [Warning] SSL error: SSL_CTX_set_default_verify_paths failed

Apparently SELinux is blocking the reads for the certificates.
There are two ways to work around this issue. You can run “restorecon -v
/path/to/certs/“, or you can work around by disabling selinux by running
“setenforce 0” or by editing the /etc/selinux/config file and setting ‘SELINUX=DISABLED’.
You may need to rerun the delete, in my case it was stuck in
“DELETE_IN_PROGRESS”.  I restarted all heat releated services to force the delete to error.

#systemctl restart openstack-heat-engine.service openstack-heat-api.service
openstack-heat-api-cloudwatch.service openstack-heat-api-cfn.service

This will cause the delete to error. You can then retry the delete.

If the delete is taking a long time, you can dig a bit deeper into the delete
using the command below.

#heat resource-list overcloud

Now drill down more with the command below.

#heat event-list overcloud

Make note of the resource_name and its id and use them in the next command.
Note that stack name is still overcloud.

heat event-show overcloud Compute d9e13b02-07b0-4beb-8442-f25de0e7ef8b

I have found that rebooting the undercloud will clear out any in-progress
tasks, you can then run the delete again.

You can also try to manually delete each node from Ironic by mimic’ing what the
nova driver in Ironic does. This is shown below for reference.

$ ironic node-set-provision-state <node uuid> deleted

And to remove the instance_uuid

$ ironic node-update <node uuid> remove instance_uuid

Troubleshooting Failed and In Progress Deployments.

Updated 6/28/2018

While your deploy is running, you can watch for stuck or failed deployments.

$ heat deployment-list | grep -vi complete
WARNING (shell) "heat deployment-list" is deprecated, please use "openstack software deployment list" instead
+--------------------------------------+--------------------------------------+--------------------------------------+--------+-------------+----------------------+------------------+
| id | config_id | server_id | action | status | creation_time | status_reason |
+--------------------------------------+--------------------------------------+--------------------------------------+--------+-------------+----------------------+------------------+
| 143b25ce-9c1e-48b0-bd92-2569640b5208 | 5b9843a5-2092-4c75-8eae-2e042d7b34d2 | f08232e7-7a14-4885-86ab-443b4581afe6 | CREATE | IN_PROGRESS | 2017-05-17T19:46:38Z | None |
| 27d2a61e-e5cd-4313-967d-cc7c385320cb | c113ead1-bb9a-46ca-a992-4527c2e9fda1 | 67dc5457-5ae7-4290-ad77-36bf8bc81d56 | CREATE | IN_PROGRESS | 2017-05-17T19:46:38Z | None |
| fabab6ae-8848-4a35-8066-d83f82335e95 | 763c4760-3e8c-43c1-8afd-d1b8d6bbb55b | e51061ed-f96f-43ec-b18a-4dc079d961d8 | CREATE | IN_PROGRESS | 2017-05-17T19:46:38Z | None |
| 0b6993c1-a320-4dc7-980c-e789d87a0c84 | 50f1e303-123a-451b-9d86-f29c32e6ce0b | 8d87818a-943f-4f69-8112-7450dfa26bb7 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| 0e537823-f17d-4009-b2e4-6d11adcdfeb7 | 1d7938ad-a6e5-4d8f-9604-489408295a59 | 98847475-ed1c-44b2-aa4e-bbddfe81a1d1 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| 1d89b9cf-87cf-4050-a00f-496c82fd9432 | 4f378bc3-10a7-4a47-b269-2c7de5d75dd7 | 69257ad8-a3ee-453b-a532-db3c293a34b9 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| 3d5915a6-1470-4ffe-a0f1-a356df33e1f9 | dbc8c62c-f5cc-4cfa-93af-ef7b9f02fd19 | 75b49872-c1fb-4c83-87d1-579dcd027bc8 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| a2e57b50-dc03-42a8-b851-b33ade36d591 | 0c7133d8-20bc-481d-b9c5-2d0867a656e7 | 000794e9-4ce2-4c8a-aa69-02aadfd19ce6 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| ab463b0f-fd6e-4169-9283-3e8bbcc1bb4b | 627f1288-3823-4e21-b61d-4bafcbdc446a | b6cf41e8-e60c-43ca-a3ed-d24c078031c8 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| ad05853f-2e70-4f11-b8c0-26d2d4c324fb | 8cd656e7-129f-40d4-a1bc-ed0181b9e768 | 24017696-102d-43ab-a101-f3a33a091faa | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| b6418aec-385a-4b6f-845b-84c87fa10be7 | 04747e35-f0cc-45a4-b251-78ea61a08c83 | ffe36a72-2293-48eb-93ed-bfe1077a0245 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| e741d37d-6119-4d37-8b91-519ecfad8397 | 271139d7-c29c-47df-ad11-26bbc3373c4c | 167dc5ee-0831-4fa8-a7ca-8db6d2b3d4ea | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| 0910d714-027b-40b0-acf3-140f5ab90837 | 9317c6ca-b767-49f2-a56c-45b61bbee2d5 | 34f6b4fd-fa98-46b6-8ace-b6f76af50162 | CREATE | IN_PROGRESS | 2017-05-17T19:46:40Z | None |
| 508e585d-19ec-41e8-bc1e-049d8206f614 | 0917bf21-54f8-46ce-89c8-6461001840a5 | c8aa01f8-5627-46a4-885c-dc61e0886c09 | CREATE | IN_PROGRESS | 2017-05-17T19:46:40Z | None |
+--------------------------------------+--------------------------------------+--------------------------------------+--------+-------------+----------------------+------------------+

Next I can inspect a stuck/hung deployment using the command below.  Note that server id above directly corresponds to a nova uuid.

$ heat deployment-show 05d3c2b7-104c-47fd-beeb-3819d55aafb7
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
 "status": "COMPLETE", 
 "server_id": "e1389b2e-ae2c-4095-8654-fd1cac1a8563", 
 "config_id": "1bfd85e8-217c-440f-b095-f4a2fcce4bfc", 
 "output_values": {
 "deploy_stdout": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDVWznMhZnwFKR9KH+7ppJr5f7ZI2LsxwZV6+xPkNOhNE5o2DDNGWGVoGE+2SIiENuhk938LRUixg4pj1coXl9johjv8h2PcB+6rUxVT7jSKGUdpwgTlLsNOEugcR5oJjf0mM55BMzz1nr1yZMTy9THGwU/A43RZcOVvuJXFVA3MIlYGXXUidEOQM6NMEloUpVd/hj2BXUzdaUJV83E0aSGPvED02LixV5uorY0n4OIk2EXC8LlG0863YwJEdQi7hd8/rczLwX18CFFAQJ2mOwkcuJB37V/BEWUqBczHO46e0uc1ZmHhtdPFKgFObgP8gSrT4g0HjOds/BVPgBTtQf5 tripleo-validations\n", 
 "deploy_stderr": "", 
 "deploy_status_code": 0
 }, 
 "creation_time": "2017-05-17T19:46:38Z", 
 "updated_time": "2017-05-17T19:49:26Z", 
 "input_values": {}, 
 "action": "CREATE", 
 "status_reason": "Outputs received", 
 "id": "05d3c2b7-104c-47fd-beeb-3819d55aafb7"

Deleting a Zombie Overcloud

Sometimes it is impossible to delete a failed deploy. After attempting a deploy the stack was stuck in the state below.

+--------------------------------------+------------+--------------------+----------------------+----------------------+
| id | stack_name | stack_status | creation_time | updated_time |
+--------------------------------------+------------+--------------------+----------------------+----------------------+
| 34fa8bd8-b80b-4379-925f-26465e366da7 | overcloud | DELETE_IN_PROGRESS | 2018-07-06T15:58:09Z | 2018-07-06T16:15:57Z |

I was given this command to kill it

# openstack stack list --nested --column ID --format value | xargs openstack stack delete --yes

This finally allowed the stack to delete.

[stack@tpavcpp4r1v4uc ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+----+------------+--------------+---------------+--------------+
| id | stack_name | stack_status | creation_time | updated_time |
+----+------------+--------------+---------------+--------------+
+----+------------+--------------+---------------+--------------+

If you run into the error below when attempting to redeploy

Exception creating plan: Unable to create plan. The Mistral environment already exists

You will need to run the command below to delete the mistral plan

#openstack overcloud plan delete overcloud

Additional Resources

https://wiki.openstack.org/wiki/Heat

http://hardysteven.blogspot.com/2015/04/debugging-tripleo-heat-templates.html

 

RHEL OSP 10/11 – OVS+DPDK Tunables

93905-6a00e551c39e1c883401a511e32c88970c-pi

Tunables for Dell R630s for use when deploying OVS+DPDK

# OSP 10/11 DPDK Tunables
#
# R630 NUMA locality – CPUs
# node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22
# 24 26 28 30 32 34 36 38 40 42 44 46
#
# node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23
# 25 27 29 31 33 35 37 39 41 43 45 47
#
#
# R630 NUMA locality – NIC
# node 0 dpdk interface – p3p1
# node 1 dpdk interface – p1p1
#
#
#
# NovaVcpuPinSet (OSP 10+)
# These are the cores that Nova will use for scheduling instances. Pair sibling threads together.
# Using cores from NUMA node 0 only to prevent crossing NUMA boundaries
NovaVcpuPinSet: “‘4,6,8,10,12,14,16,18,20,22,28,30,32,34,36,38,40,42,44,46′”
#
# NeutronDpdkCoreList (OSP 10/11) OvsPmdCoreList (OSP 12+)
# This parameter configures a list of CPU cores to be used by the OVS-DPDK Poll Mode Drivers
# The first core from a CPU, should be reserved for host processes, and should be excluded from this list.
NeutronDpdkCoreList: “‘2,26,3,27′”
#
# HostIsolatedCoreList (OSP 10/11) IsolCpusList (OSP 12+)
# A set list or range of cores (and their sibling threads) to be appended to the tuned cpu-partitioning profile and isolated from the host.
# These cores will be isolated from any host processes
# Assuming you want to isolate nova cores from all system processes, NovaVcpuPinSet + NeutronDpdkCoreList = HostIsolatedCoreList
HostIsolatedCoreList: “‘2,3,4,6,8,10,12,14,16,18,20,22,26,27,28,30,32,34,36,38,40,42,44,46′”
#
# HostCpusList (OSP 10/11) & OvsDpdkCoreList (OSP 12+)
# A list of logical cores used by OVS-DPDK processes for dpdk-lcore-mask for non-datapath operations
# These cores must be mutually exclusive from the list of cores in NeutronDpdkCoreList/OvsPmdCoreList and NovaVcpuPinSet.
# Allocate the first physical core (and sibling thread) from each NUMA node irrespective of DPDK interface NUMA locality.
HostCpusList: “‘0,24,1,25′”
#
# Provide the number of memory channels in the format – [allowed_pattern: “[0-9]+”]:
NeutronDpdkMemoryChannels: “4”
#
# Set the memory allocated for each socket:
NeutronDpdkSocketMemory: “‘2048,2048′”
#
# An array of filters used by Nova to filter a node.These filters will be applied in the order they are listed,
# so place your most restrictive filters first to make the filtering process more efficient.
NovaSchedulerDefaultFilters: “RamFilter,ComputeFilter,AvailabilityZoneFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,PciPassthroughFilter,NUMATopologyFilter”
#
# Kernel arguments for Compute node
ComputeKernelArgs: “default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on”

 

How to disable Cloud-Init in a RHEL Cloud Image

happy_cloud1600

So this one is pretty simple. However, I found a lot of misinformation along the way, so I figured that I would jot the proper (and most simple) process here.

Symptoms: a RHEL (or variant) VM that takes a very long time to boot. On the VM console, you can see the following output while the VM boot process is stalled and waiting for a timeout. Note that the message below has nothing to do with cloud init, but its the output that I have most often seen on the console while waiting for a VM to boot.

[106.325574} random: crng init done

Note that I have run into this issue in both OpenStack (when booting from external provider networks) and in KVM.

Upon initial boot of the VM, run the command below.

touch /etc/cloud/cloud-init.disabled

Seriously, that’s it. No need to disable or remove cloud-init services. See reference.

 

 

How to Install TestPMD on RHEL 7.x

dpdk (1).png

TestPMD is a lightweight application running in user space, utilizing ovs-dpdk, that can be used for testing DPDK in packet forwarding mode.

In this example we want to setup TestPMD on a RHEL VM running in our SR-IOV capable Red Hat OpenStack 10 overcloud. Our passthrough adapters are Intel X520s. Our plan here is to run performance tests via an external load generator.

Before we can get started we need to build a test VM.

VM Details

  • RHEL 7.x
  • Two VFs
    • eth0 – ssh access via admin network
    • eth1 – load generator private network
  • 4 vCPUs
  • 4096 MB Mem
  • 150 GB Disk

Deploy RHEL VM

Your first step is to deploy your RHEL VM, configure your primary network interface (eth0 for ssh) via VM console. Eth1 needs to be up and configured to start at boot, but do not assign it an IP address. Next, register your VM with your local satellite server or with RH CDN.

Download DKDP

Use the link below to download the “Latest Major” version of DPDK. Place the tarfile in /root on the VM and untar.

Install Prerequisites

Before we can compile DPDK, we need to install a few prereqs.

Install gcc

#yum -y install gcc

Install lubnuma-devel

#yum -y install glibc-devel

Install Kernel Headers and Devel

#yum -y install kernel-headers.x86_64 kernel-devel.x86_64

Install NUMA Packages

#yum -y install numad.x86_64 numactl-libs.x86_64 numactl-devel.x86_64

Install libpcap

#yum -y install libpcap.x86_64 libpcap-devel.x86_64

Install Tuned Profiles

#yum -y install tuned-profiles-cpu-partitioning.noarch

Install DPDK Tools Package

#yum -y install dpdk-tools.x86_64

Compile DPDK

Compile using the “Quick Start” guide below
http://dpdk.org/doc/quick-start

Determine the PCI address of your test interface using ethtool.

# ethtool -i eth1
driver: ixgbevf
version: 1.5.10-k
firmware-version: 5.02 0x80002390 1.1313.0
expansion-rom-version:
bus-info: 0000:00:05.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

Here the PCI address for eth1 is 0000:00:05.0

Load DPDK Driver for Testing Interface

In this test we are using the Intel x520 NIC, which is directly accessible to our VM via SR-IOV passthrough. If you are passing through a different NIC, your process will differ.

#modprobe vfio-pci
#dpdk-devbind –bind=vfio-pci 0000:00:05.0

Verify Driver using dpdk-devbind

# dpdk-devbind –s

Network devices using DPDK-compatible driver
============================================
0000:00:05.0 ‘82599 Ethernet Controller Virtual Function’ drv=igb_uio unused=ixgbevf,vfio-pci

Network devices using kernel driver
===================================
0000:00:03.0 ‘Virtio network device’ if=eth0 drv=virtio-pci unused=virtio_pci,igb_uio,vfio-pci *Active*

Example TestPDM Start Script

In the example script below, we are going to start TestPDM. By default, TestPDM will forward any packets recieved on eth1 back to the sending MAC

#!/bin/bash

#VARS
NICADDRESS1=’0000:00:05.0′

/root/dpdk-17.08/build/app/testpmd -l 0,1,2,3 –socket-mem 512 -n 4 –proc-type auto –file-prefix pg -w $NICADDRESS1 — –disable-rss –nb-cores=2 –portmask=1 –rxq=1 –txq=1 –rxd=256 –txd=256 –port-topology=chained –forward-mode=macswap -i –auto-start

The example below works for testing an MTU up to 9200 bytes

#works for 9200 byte packet
/root/dpdk-17.08/x86_64-native-linuxapp-gcc/app/testpmd –log-level 8 –huge-dir=/mnt/huge -l 0,1,2,3 -n 4 –proc-type auto –file-prefix pg -w $NICADDRESS1 — –disable-rss –nb-cores=2 –portmask=1 –rxq=2 –txq=2 –rxd=256 –txd=256 –port-topology=chained –forward-mode=mac –eth-peer=0,00:10:94:00:00:06 –mbuf-size=10240 –total-num-mbufs=32768 –max-pkt-len=9200 -i –auto-start

 

TestPMD has a plethora of options.

For additional information on the options used above, refer to the user guide.

Additional Resources

 

OpenStack: Rabbitmq Cannot Join Cluster, Already a Member

rabbitmq-sh-600x600

You can run into this error when attempting to join a node into a Rabbitmq cluster when the cluster believes that a particular node is already a member of the cluster. I have run into this issue a few times and is usually seen when attempting to recover from a crash of an OpenStack controller.

I have run into this issue a few times and is usually seen when attempting to recover from a crash of an OpenStack controller.

Below are the steps to resolve the issue.

The error below is seen when attempting to add a node back into the cluster.

INFO REPORT==== 27-Jan-2017::16:57:22 ===
Already member of cluster: [rabbit@nodectrl2,rabbit@nodectrl1,
rabbit@nodectrl0]

We check the cluster status for confirmation.

root@nodectrl1 rabbitmq]# rabbitmqctl cluster_status
Cluster status of node rabbit@nodectrl1 …
[{nodes,[{disc,[rabbit@nodectrl0,rabbit@nodectrl1,
rabbit@nodectrl2]}]},
{running_nodes,[rabbit@nodectrl2,rabbit@nodectrl1]},
{cluster_name,<<“rabbit@nodectrl0.localdomain”>>},
{partitions,[]},
{alarms,[{rabbit@nodectrl2,[]},{rabbit@nodectrl1,[]}]}]

Now we force the cluster to forget the affected node.

[root@nodectrl1 rabbitmq]# rabbitmqctl forget_cluster_node rabbit@nodectrl0
Removing node rabbit@nodectrl0 from cluster …

We then check the cluster status to ensure that it has been removed from the cluster.

[root@nodectrl1 rabbitmq]# rabbitmqctl cluster_status

Cluster status of node rabbit@nodectrl1 …
[{nodes,[{disc,[rabbit@nodectrl1,rabbit@nodectrl2]}]},
{running_nodes,[rabbit@nodectrl2,rabbit@nodectrl1]},
{cluster_name,<<“rabbit@nodectrl0.localdomain”>>},
{partitions,[]},
{alarms,[{rabbit@nodectrl2,[]},{rabbit@nodectrl1,[]}]}]

We can now add our node back into the cluster.

[root@nodectrl1 rabbitmq]#  rabbitmqctl -n nodectrl1 join_cluster rabbit@nodectrl0.localdomain