In order to use the staging-ovirt driver, I first needed to configure the undercloud to enable it. See undercloud.conf below.
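The key change, roughly (the option name and default list below are assumptions for a Queens/OSP 13-era undercloud; check the undercloud.conf reference for your release), is adding staging-ovirt to the enabled hardware types:
[DEFAULT]
# ...existing undercloud settings...
# enable the staging-ovirt hardware type alongside the defaults (assumed option name and values)
enabled_hardware_types = ipmi,redfish,ilo,idrac,staging-ovirt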
Then create an instackenv.json. In the example below, pm_addr is the IP of my local RHV manager.
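A minimal single-node sketch of the layout, with every value a placeholder and the field names based on the TripleO staging-ovirt documentation, looks roughly like this:
{
  "nodes": [
    {
      "name": "overcloud-controller-0",
      "pm_type": "staging-ovirt",
      "pm_addr": "<rhv-manager-ip>",
      "pm_user": "admin@internal",
      "pm_password": "<rhv-admin-password>",
      "pm_vm_name": "<vm-name-in-rhv>",
      "mac": ["<provisioning-nic-mac>"],
      "cpu": "4",
      "memory": "16384",
      "disk": "60",
      "arch": "x86_64"
    }
  ]
}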
Note that I ran into an error importing my nodes. Error shown below.
[{u'result': u'Node 09dfefec-e5c3-42c4-93d0-45fb44ce37a8 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 09dfefec-e5c3-42c4-93d0-45fb44ce37a8. Error: global name \'sdk\' is not defined'}, {u'result': u'Node 59dce2eb-3aea-41f9-aec2-3f13deece30b did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 59dce2eb-3aea-41f9-aec2-3f13deece30b. Error: global name \'sdk\' is not defined'}, {u'result': u'Node 0895a6d0-f934-44d0-9c26-25e61b6679cb did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 0895a6d0-f934-44d0-9c26-25e61b6679cb. Error: global name \'sdk\' is not defined'}, {u'result': u'Node 68bdf1cb-fe1f-48ab-b96d-fb5edaf17154 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 68bdf1cb-fe1f-48ab-b96d-fb5edaf17154. Error: global name \'sdk\' is not defined'}]
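The "global name 'sdk' is not defined" message is typically reported when the oVirt Python SDK is not available on the undercloud, since that is what the driver loads as sdk. If that is the case, installing the SDK and restarting ironic-conductor should clear it (the package name below is an assumption for RHEL 7):
undercloud$ sudo yum install -y python-ovirt-engine-sdk4
undercloud$ sudo systemctl restart openstack-ironic-conductor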
With the release of Red Hat OpenStack 13, the move to containerized overcloud services is complete. Traditional systemd services such as RabbitMQ, HAProxy, MariaDB, etc., now run as containers in the overcloud. This move to containers is meant to provide additional stability, control, and security to the platform. Future upgrades should be easier, and future deploys should be more flexible.
However, the move to containers brings with it a new set of challenges, chiefly operational ones.
The average OpenStack administrator no longer restarts services; they restart containers.
They no longer view a rabbit cluster’s status on the controller node, but rather within a container on the controller node. Log locations have changed. Config file locations have changed.
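For example, checking RabbitMQ cluster health on an OSP 13 controller now means finding the right container first (the bundle container name below is typical, but confirm it with docker ps on your own controllers):
controller# docker ps --format '{{.Names}}' | grep rabbitmq
controller# docker exec -it rabbitmq-bundle-docker-0 rabbitmqctl cluster_status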
In this post I will document the steps that I am using to create a fully virtualized OSP 10 environment in my lab. The undercloud node is a VM, as are the overcloud nodes. We will configure libvirt so that Ironic can boot and shut down the VMs on the underlying hypervisor.
Add the stack user on your hypervisor. In this case my hypervisor's hostname is virt01; however, we will refer to it as the hypervisor for clarity.
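Creating the user is the usual useradd/passwd pair; note that stack will also need rights to manage libvirt on the hypervisor (via a polkit rule or libvirt group membership, depending on how the host is configured):
hypervisor# useradd stack
hypervisor# passwd stack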
Now attempt to connect to libvirt as stack via a remote session. Here we are just connecting back to the localhost, virt01. In the example below, 10.1.99.112 is the IP of the hypervisor. The undercloud has an IP of 10.1.99.10.
undercloud# virsh --connect qemu+ssh://stack@10.1.99.112/system list --all
Now ssh to your undercloud VM as the stack user.
Copy stack's public key to your hypervisor (virt01 in this case). In the command below, replace the IP address shown with the IP that your undercloud VM will use to connect to libvirt on the hypervisor.
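Something like the following, run as stack on the undercloud, accomplishes this (substitute your hypervisor IP):
undercloud$ ssh-copy-id stack@10.1.99.112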
Now we need to create a few virtual machines. Specifically, I am building an environment with five virtual machines to run virtualized Red Hat OpenStack 13. My overcloud will consist of two compute nodes and three controller nodes.
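A rough virt-install loop along these lines will define the overcloud nodes without booting them (disk sizes, memory, and the libvirt network names, in particular "provisioning", are examples to adjust for your environment):
hypervisor# for i in {1..5}; do
  qemu-img create -f qcow2 /var/lib/libvirt/images/overcloud-node$i.qcow2 60G
  virt-install --name overcloud-node$i --ram 8192 --vcpus 4 \
    --disk path=/var/lib/libvirt/images/overcloud-node$i.qcow2,format=qcow2 \
    --network network=provisioning --network network=default \
    --os-variant rhel7 --boot network,hd --noautoconsole --print-xml > /tmp/overcloud-node$i.xml
  virsh define /tmp/overcloud-node$i.xml
done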
You should end up with the following virtual machines
hypervisor# virsh list --all
 Id    Name               State
----------------------------------------------------
 1     undercloud         running
 -     overcloud-node1    shut off
 -     overcloud-node2    shut off
 -     overcloud-node3    shut off
 -     overcloud-node4    shut off
 -     overcloud-node5    shut off
Back on the undercloud we use the command below to grab the provisioning network mac address from each virtual machine running on the hypervisor. We could run this command locally on the hypervisor, but since we need the mac addresses for ironic on the undercloud, we will run it here.
undercloud$ for i in {1..5}; do virsh -c qemu+ssh://stack@10.1.99.112/system domiflist overcloud-node$i | awk '$3 == "provisioning" {print $5}'; done > /tmp/nodes.txt
Now we use our temp file above to populate the instackenv.json that we will import into Ironic. See the example below.
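For reference, a single-node sketch of the pxe_ssh layout looks roughly like this; the MAC comes from /tmp/nodes.txt, pm_addr is the hypervisor IP, pm_password carries the contents of stack's private SSH key, and all values are placeholders:
{
  "nodes": [
    {
      "name": "overcloud-node1",
      "pm_type": "pxe_ssh",
      "pm_addr": "10.1.99.112",
      "pm_user": "stack",
      "pm_password": "<contents of stack's private key>",
      "mac": ["<mac-from-/tmp/nodes.txt>"],
      "cpu": "4",
      "memory": "8192",
      "disk": "60",
      "arch": "x86_64"
    }
  ]
}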
At this point we are ready to import our nodes via Ironic.
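On OSP 10 the import looks roughly like the following (check the exact syntax for your release; newer releases use openstack overcloud node import):
undercloud$ source ~/stackrc
undercloud$ openstack baremetal import --json instackenv.json
undercloud$ openstack baremetal configure boot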
Note that I do not claim to be the original author of the steps documented above, rather I wanted to ensure that I could easily consume these steps in the future.
Also, I look forward to experimenting with the vbmc ironic driver and might stop using pxe_ssh altogether.
Heat is the main orchestration engine for OpenStack, and it is used by RHEL OSP director to install an OpenStack overcloud environment.
When we run the "openstack overcloud deploy" command, we are specifically telling RHEL OSP director that we want it to use the pre-defined Heat templates from /usr/share/openstack-tripleo-heat-templates/. OSP director will manage the deployment of a new overcloud Heat stack, using files from this directory.
When RHEL OSP director calls the Heat stack, it needs the following data:
A top-level Heat template to use that describes the overall environment and the resources required.
An environment/resource registry to tell Heat where to find resource definitions for non-standard Heat elements, e.g. TripleO components.
A set of parameters to declare the deployment-specific options (via -e).
The most important files for us to focus on are in our deployment directory; these are the default files that get called by OSP director.
The top-level Heat template that OSP director uses for deployment is /usr/share/openstack-tripleo-heat-templates/overcloud-without-mergepy.yaml
The resource registry, which tells Heat where to find the templates for deployment resources, is /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml
Creating a Heat Stack
To create the stack we run the command below. This command instructs Heat to use the templates in ~/my_templates/, as well as the override templates specified with the '-e' option.
This is just an example of what I am using in my lab environment; your deploy command will be much different. Also note that I have copied the templates from /usr/share/openstack-tripleo-heat-templates to ~/my_templates/.
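For illustration only, the shape of the command is along these lines; the environment files and scale counts here are placeholders, not what I actually ran:
undercloud$ openstack overcloud deploy --templates ~/my_templates/ \
  -e ~/my_templates/environments/network-isolation.yaml \
  -e ~/my_templates/environments/net-bond-with-vlans.yaml \
  --control-scale 3 --compute-scale 2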
Unfortunately our deploy failed with the following errors.
Exception: Heat Stack create failed.
DEBUG: openstackclient.shell clean_up DeployOvercloud
DEBUG: openstackclient.shell got an error: Heat Stack create failed.
ERROR: openstackclient.shell Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/openstackclient/shell.py", line 176, in run
    return super(OpenStackShell, self).run(argv)
  File "/usr/lib/python2.7/site-packages/cliff/app.py", line 230, in run
    result = self.run_subcommand(remainder)
  File "/usr/lib/python2.7/site-packages/cliff/app.py", line 295, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python2.7/site-packages/cliff/command.py", line 53, in run
    self.take_action(parsed_args)
  File "/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py", line 864, in take_action
    self._deploy_tripleo_heat_templates(stack, parsed_args)
  File "/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py", line 535, in _deploy_tripleo_heat_templates
    parsed_args.timeout)
  File "/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py", line 478, in _heat_deploy
    raise Exception("Heat Stack create failed.")
Exception: Heat Stack create failed.
We can verify that the deploy failed with the command below.
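Something along these lines should show the overcloud stack in a CREATE_FAILED state and surface the resources that did not complete:
undercloud$ heat stack-list
undercloud$ heat resource-list overcloud | grep -vi complete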
The failed resources are named "Compute" and "Controller". Let's take a closer look at those using the "resource-show" argument.
#heat resource-show overcloud Compute
| resource_status_reason | ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: StackValidationFailed: Property error : OsNetConfigImpl: config The Parameter (BondInterfaceOvsOptions) was not provided."" |
Let’s now do the same for Controller.
#heat resource-show overcloud Controller
| resource_status_reason | ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: StackValidationFailed: Property error : OsNetConfigImpl: config The Parameter (BondInterfaceOvsOptions) was not provided."" |
Apparently I have some issues with my OVS bonding options, so I need to get those straight before I can continue.
Deleting a Failed Heat Stack
Since our last deploy failed, we need to delete the failed stack before we can kick off another stack deploy. Below is an example of that command; note that we are using the UUID of the stack.
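Roughly, grab the UUID from heat stack-list and pass it to stack-delete:
undercloud$ heat stack-list
undercloud$ heat stack-delete <overcloud-stack-uuid>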
OK, let's take a look at /var/log/heat/heat-engine.log for more details. I also suggest opening another ssh session and tailing the log while the delete runs.
If the output is too verbose to follow, I suggest thinning it out using the command below.
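For example, something like:
undercloud$ tail -f /var/log/heat/heat-engine.log | grep -iE 'error|fail|denied'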
Apparently SELinux is blocking the reads for the certificates.
There are two ways to work around this issue. You can run "restorecon -v /path/to/certs/" to restore the correct SELinux contexts, or you can disable SELinux entirely, either by running "setenforce 0" (permissive until reboot) or by editing /etc/selinux/config and setting SELINUX=disabled.
You may need to rerun the delete; in my case it was stuck in "DELETE_IN_PROGRESS". I restarted all Heat related services to force the delete to error out.
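On the undercloud that amounts to restarting the Heat services via systemd; the unit list below may not be exhaustive for your release:
undercloud$ sudo systemctl restart openstack-heat-engine openstack-heat-api openstack-heat-api-cfn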
Tunables for Dell R630s for use when deploying OVS+DPDK
# OSP 10/11 DPDK Tunables
#
# R630 NUMA locality - CPUs
# node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22
# 24 26 28 30 32 34 36 38 40 42 44 46
#
# node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23
# 25 27 29 31 33 35 37 39 41 43 45 47
#
#
# R630 NUMA locality - NIC
# node 0 dpdk interface - p3p1
# node 1 dpdk interface - p1p1
#
#
#
# NovaVcpuPinSet (OSP 10+)
# These are the cores that Nova will use for scheduling instances. Pair sibling threads together.
# Using cores from NUMA node 0 only to prevent crossing NUMA boundaries
NovaVcpuPinSet: "'4,6,8,10,12,14,16,18,20,22,28,30,32,34,36,38,40,42,44,46'"
#
# NeutronDpdkCoreList (OSP 10/11) OvsPmdCoreList (OSP 12+)
# This parameter configures a list of CPU cores to be used by the OVS-DPDK Poll Mode Drivers
# The first core from each CPU socket should be reserved for host processes, and should be excluded from this list.
NeutronDpdkCoreList: "'2,26,3,27'"
#
# HostIsolatedCoreList (OSP 10/11) IsolCpusList (OSP 12+)
# A set list or range of cores (and their sibling threads) to be appended to the tuned cpu-partitioning profile and isolated from the host.
# These cores will be isolated from any host processes
# Assuming you want to isolate nova cores from all system processes, NovaVcpuPinSet + NeutronDpdkCoreList = HostIsolatedCoreList
HostIsolatedCoreList: "'2,3,4,6,8,10,12,14,16,18,20,22,26,27,28,30,32,34,36,38,40,42,44,46'"
#
# HostCpusList (OSP 10/11) & OvsDpdkCoreList (OSP 12+)
# A list of logical cores used by OVS-DPDK processes for dpdk-lcore-mask for non-datapath operations
# These cores must be mutually exclusive from the list of cores in NeutronDpdkCoreList/OvsPmdCoreList and NovaVcpuPinSet.
# Allocate the first physical core (and sibling thread) from each NUMA node irrespective of DPDK interface NUMA locality.
HostCpusList: "'0,24,1,25'"
#
# Provide the number of memory channels in the format - [allowed_pattern: "[0-9]+"]:
NeutronDpdkMemoryChannels: “4”
#
# Set the memory allocated for each socket:
NeutronDpdkSocketMemory: "'2048,2048'"
#
# An array of filters used by Nova to filter a node. These filters will be applied in the order they are listed,
# so place your most restrictive filters first to make the filtering process more efficient.
NovaSchedulerDefaultFilters: “RamFilter,ComputeFilter,AvailabilityZoneFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,PciPassthroughFilter,NUMATopologyFilter”
#
# Kernel arguments for Compute node
ComputeKernelArgs: “default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on”
So this one is pretty simple. However, I found a lot of misinformation along the way, so I figured that I would jot down the proper (and simplest) process here.
Symptoms: a RHEL (or variant) VM that takes a very long time to boot. On the VM console, you can see the following output while the VM boot process is stalled and waiting for a timeout. Note that the message below has nothing to do with cloud-init, but it's the output that I have most often seen on the console while waiting for a VM to boot.
[  106.325574] random: crng init done
Note that I have run into this issue in both OpenStack (when booting from external provider networks) and in KVM.
Upon initial boot of the VM, run the command below.
touch /etc/cloud/cloud-init.disabled
Seriously, that’s it. No need to disable or remove cloud-init services. See reference.
TestPMD is a lightweight user-space application built on DPDK that can be used for testing DPDK in packet forwarding mode.
In this example we want to set up TestPMD on a RHEL VM running in our SR-IOV capable Red Hat OpenStack 10 overcloud. Our passthrough adapters are Intel X520s. Our plan here is to run performance tests via an external load generator.
Before we can get started we need to build a test VM.
VM Details
RHEL 7.x
Two VFs
eth0 – ssh access via admin network
eth1 – load generator private network
4 vCPUs
4096 MB Mem
150 GB Disk
Deploy RHEL VM
Your first step is to deploy your RHEL VM and configure your primary network interface (eth0, for ssh access) via the VM console. Eth1 needs to be up and configured to start at boot, but do not assign it an IP address. Next, register your VM with your local Satellite server or with the Red Hat CDN.
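For eth1 that means an ifcfg file that brings the interface up at boot with no IP configuration, roughly like this (the interface name inside your guest may differ):
# /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes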
Download DPDK
Use the link below to download the "Latest Major" version of DPDK. Place the tarfile in /root on the VM and untar it.
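For example (the version string is a placeholder for whatever release you downloaded):
vm# cd /root
vm# tar xf dpdk-<version>.tar.xz
vm# cd dpdk-<version>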
Install Prerequisites
Before we can compile DPDK, we need to install a few prereqs.
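A typical prerequisite set for building DPDK on RHEL 7 looks something like the following; trim or extend as needed for your environment:
vm# yum install -y gcc make kernel-devel kernel-headers numactl-devel libpcap-devel pciutils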
In this test we are using the Intel x520 NIC, which is directly accessible to our VM via SR-IOV passthrough. If you are passing through a different NIC, your process will differ.