Tunables for Dell R630s for use when deploying OVS+DPDK
# OSP 10/11 DPDK Tunables
# R630 NUMA locality – CPUs
# node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46
# node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47
# R630 NUMA locality – NIC
# node 0 dpdk interface – p3p1
# node 1 dpdk interface – p1p1
# NovaVcpuPinSet (OSP 10+)
# These are the cores that Nova will use for scheduling instances. Pair sibling threads together.
# Using cores from NUMA node 0 only to prevent crossing NUMA boundaries
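# Illustrative value only (not from the original template), assuming cores 0/24 are kept for the host and 2/26 for the PMD threads, with sibling threads paired:
NovaVcpuPinSet: "'4,28,6,30,8,32,10,34,12,36,14,38,16,40,18,42,20,44,22,46'"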
# NeutronDpdkCoreList (OSP 10/11) OvsPmdCoreList (OSP 12+)
# This parameter configures a list of CPU cores to be used by the OVS-DPDK Poll Mode Drivers
# The first core from each CPU socket should be reserved for host processes and excluded from this list.
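# Illustrative value only: one physical core (and its sibling thread) per NUMA node for the PMDs, since each node hosts a DPDK interface:
NeutronDpdkCoreList: "'2,26,3,27'"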
# HostIsolatedCoreList (OSP 10/11) IsolCpusList (OSP 12+)
# A list or range of cores (and their sibling threads) to be appended to the tuned cpu-partitioning profile and isolated from the host.
# These cores will be isolated from any host processes
# Assuming you want to isolate nova cores from all system processes, NovaVcpuPinSet + NeutronDpdkCoreList = HostIsolatedCoreList
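# Illustrative value only: the union of the NovaVcpuPinSet and NeutronDpdkCoreList examples above:
HostIsolatedCoreList: "'2,26,3,27,4,28,6,30,8,32,10,34,12,36,14,38,16,40,18,42,20,44,22,46'"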
# HostCpusList (OSP 10/11) & OvsDpdkCoreList (OSP 12+)
# A list of logical cores (dpdk-lcore-mask) used by OVS-DPDK processes for non-datapath operations
# These cores must be mutually exclusive from the list of cores in NeutronDpdkCoreList/OvsPmdCoreList and NovaVcpuPinSet.
# Allocate the first physical core (and sibling thread) from each NUMA node irrespective of DPDK interface NUMA locality.
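# Illustrative value only: the first physical core and its sibling thread from each NUMA node:
HostCpusList: "'0,24,1,25'"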
# Provide the number of memory channels in the format – [allowed_pattern: "[0-9]+"]:
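# Illustrative value only; the R630 has four memory channels per socket, but verify against your hardware:
NeutronDpdkMemoryChannels: "4"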
# Set the memory allocated for each socket:
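# Illustrative value only: 2048 MB per NUMA node, since each node hosts a DPDK interface:
NeutronDpdkSocketMemory: "'2048,2048'"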
# An array of filters used by Nova to filter a node. These filters will be applied in the order they are listed,
# so place your most restrictive filters first to make the filtering process more efficient.
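# Illustrative filter list only; include NUMATopologyFilter so instances are scheduled with NUMA awareness (adjust to your deployment):
NovaSchedulerDefaultFilters: "RamFilter,ComputeFilter,AvailabilityZoneFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,PciPassthroughFilter,NUMATopologyFilter"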
# Kernel arguments for Compute node
ComputeKernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on"
This article documents the configuration used to deploy SR-IOV in OSP 8/Liberty on Dell hardware.
Compute Node Configuration
This section will outline the changes needed to configure SR-IOV on each Compute Node.
BIOS Configuration on Dell Compute Nodes
First, you will need to SSH to the iDRAC of each Compute Node; from the iDRAC shell you can run racadm commands directly.
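The iDRAC address below is a placeholder; substitute the address of the node you are configuring.
# ssh root@<idrac-ip-address>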
Type the command below to enable SR-IOV.
#racadm set BIOS.IntegratedDevices.SriovGlobalEnable Enabled
RAC1017: Successfully modified the object value and the change is in
pending state. To apply modified value, create a configuration job and
reboot the system. To create the commit and reboot jobs, use "jobqueue"
command. For more information about the "jobqueue" command, see RACADM
help.
Type the command below to verify your configuration.
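Assuming the same attribute path used above, verification would look something like:
#racadm get BIOS.IntegratedDevices.SriovGlobalEnable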
This tells Nova that all VFs belonging to the physical interface "p1p1" are allowed to be passed through to VMs and belong to the Neutron provider network "sriov_net1", and that all VFs belonging to the physical interface "p3p1" are allowed to be passed through for the network "sriov_net2".
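The nova.conf whitelist entries behind this mapping are not shown here; on Liberty they would look roughly like the following (interface and network names taken from this example):
pci_passthrough_whitelist = {"devname": "p1p1", "physical_network": "sriov_net1"}
pci_passthrough_whitelist = {"devname": "p3p1", "physical_network": "sriov_net2"}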
Restart nova-compute on each compute node with the command shown below.
#systemctl restart openstack-nova-compute
Install and Enable Neutron Sriov-Agent (Compute)
Note that the sriov-agent is not strictly required; however, we are going to install and configure it anyway, as shown below.
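On an OSP 8 compute node the install and enable steps would look roughly like this (package and service names are the RDO/OSP defaults and may vary):
# yum -y install openstack-neutron-sriov-nic-agent
# systemctl enable neutron-sriov-nic-agent.service
# systemctl start neutron-sriov-nic-agent.service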
Using a few simple commands, you can easily map a PCI slot back to its directly connected NUMA node. This information comes in very handy when implementing NFV technologies such as CPU pinning and SR-IOV.
First, you will need to install hwloc and hwloc-gui, if they are not already installed on your system. hwloc-gui provides the lstopo command, so you will need to install the GUI package even if you are going to run the command on a headless system.
# yum -y install hwloc.x86_64 hwloc-gui.x86_64
Now you can run lstopo. Below is the output from one of my dual socket, quad core Xeon systems.
The first 27 lines of output tell you which cores are in each socket.
Lines starting with “HostBridge L#0” list the PCI devices attached to socket 0. On more modern dual-socket systems (think Sandy Bridge) you would have a “HostBridge L#8” section as well.
“The PCI host bridge provides an interconnect between the processor and peripheral components. Through the PCI host bridge, the processor can directly access main memory independent of other PCI bus masters. For example, while the CPU is fetching data from the cache controller in the host bridge, other PCI devices can also access the system memory through the host bridge. The advantage of this architecture lies in its separation of the I/O bus from the processor’s host bus.”
Unfortunately, my lab systems are Nehalem-based machines, which use QPI (QuickPath Interconnect) to share a host bridge between CPU sockets. See the image below.
Nehalem QPI Architecture
Nonetheless, we are able to determine which CPU socket is associated with a specific PCI device. For this example, we will focus on the devices below since they are both directly attached to the PCI Host Bridge and not the PCI Bus.
Net L#0 “enp8s0f0”
Net L#1 “enp8s0f1”
Now using the lspci command I can find the exact devices per NUMA node.
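As a rough sketch, assuming the PCI address inferred from the enp8s0f0 interface name (bus 08, slot 00, function 0); substitute your own device address:
# lspci | grep -i ethernet
# cat /sys/bus/pci/devices/0000:08:00.0/numa_node
The numa_node file reports the NUMA node the device is attached to, or -1 if the platform does not expose that information.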