So this one is pretty simple. However, I found a lot of misinformation along the way, so I figured that I would jot down the proper (and simplest) process here.
Symptoms: a RHEL (or variant) VM that takes a very long time to boot. On the VM console, you can see the following output while the VM boot process is stalled and waiting for a timeout. Note that the message below has nothing to do with cloud-init, but it's the output that I have most often seen on the console while waiting for a VM to boot.
[106.325574] random: crng init done
Note that I have run into this issue in both OpenStack (when booting from external provider networks) and in KVM.
Upon initial boot of the VM, run the command below.
Seriously, that’s it. No need to disable or remove cloud-init services. See reference.
Plotnetcfg is a Linux utility that you can use to scan the networking configuration on a server and output the configuration hierarchy to a file. Plotnetcfg is most useful when troubleshooting complex virtual networks with all sorts of bonds and bridges, the likes of which you will find on KVM nodes, or OpenStack Controller nodes.
You can install plotnetcfg on RHEL/CentOS as shown below.
# yum -y install plotnetcfg.x86_64
You will also want to install the “dot” command, which is installed with Graphviz. See below.
# yum -y install graphviz.x86_64
Now that the bits and pieces are installed, we can run the command below, which outputs to a PDF file named file.pdf.
# plotnetcfg | dot -Tpdf > file.pdf
If you want to, you can also use ImageMagick's “convert” command to convert the PDF to a jpg. For example, I exported to jpg to embed below.
Super clean, and super easy to read and understand.
According to Wikipedia, NUMA is “a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory faster than non-local memory (memory local to another processor or memory shared between processors). The benefits of NUMA are limited to particular workloads, notably on servers where the data are often associated strongly with certain tasks or users.”
So what does this mean for Virtual Machine optimization under KVM/Libvirt? It means that for best performance, you want to configure your multi-vCPU VMs to use only cores from the same physical CPU (or NUMA node).
So how do we do this? See the example below from one of my homelab servers. This machine has two hyperthreaded quad-core Xeons (X5550), for a total of 8 physical cores and 16 logical CPUs.
First we use the “lscpu” command to determine which logical CPUs belong to which physical CPU. The lines to look at are the “NUMA node” entries at the end of the output.
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 4
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model name: Intel(R) Xeon(R) CPU X5550 @ 2.67GHz
CPU MHz: 2668.000
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-3,8-11
NUMA node1 CPU(s): 4-7,12-15
Using the virsh command, we can inspect the CPU pinning for my test VM called “mytestvm”.
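As a rough sketch of how the NUMA lines from lscpu could be turned into pinning commands, the Python below parses the “NUMA nodeN CPU(s)” entries and prints one “virsh vcpupin” command per vCPU, keeping every vCPU on node 0. The function names, the sample text, and the choice of two vCPUs are my own assumptions for illustration. Only the node layout and the VM name come from the output above.

```python
import re


def parse_numa_nodes(lscpu_text):
    """Map NUMA node id -> sorted list of logical CPU ids, from lscpu output."""
    nodes = {}
    # Matches entries such as "NUMA node0 CPU(s): 0-3,8-11". The summary
    # line "NUMA node(s): 2" does not match because it has no node number.
    pattern = r"NUMA node(\d+) CPU\(s\):\s*([\d,\-]+)"
    for node, cpulist in re.findall(pattern, lscpu_text):
        cpus = []
        for part in cpulist.split(","):
            if not part:
                continue
            if "-" in part:
                lo, hi = part.split("-")
                cpus.extend(range(int(lo), int(hi) + 1))
            else:
                cpus.append(int(part))
        nodes[int(node)] = sorted(cpus)
    return nodes


def vcpupin_commands(domain, vcpu_count, cpus):
    """Build one `virsh vcpupin` command per vCPU, all limited to `cpus`."""
    cpulist = ",".join(str(c) for c in cpus)
    return [f"virsh vcpupin {domain} {v} {cpulist}" for v in range(vcpu_count)]


# Sample text copied from the lscpu output above; the 2-vCPU VM size is assumed.
sample = (
    "L3 cache: 8192K\n"
    "NUMA node0 CPU(s): 0-3,8-11\n"
    "NUMA node1 CPU(s): 4-7,12-15\n"
)
nodes = parse_numa_nodes(sample)
for cmd in vcpupin_commands("mytestvm", 2, nodes[0]):
    print(cmd)
```

Running “virsh vcpupin mytestvm” with no further arguments lists the current pinning, which is a handy check after applying the generated commands.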
After a few hours poking around a newly deployed UCS cluster, trying to get some basic profiles created and assigned, I realized that I had no idea how the KVM is actually supposed to work inside the UCS cluster. Which is funny, as this was a subject that we touched on during my DCUDC class. Apparently we did not touch on it enough.
Anyway, before I get ahead of myself, let's review the gear in this cluster.
Now in my network, all lights-out management IPs (iLOs, IPMI, etc.) are on one particular VLAN, which for the purpose of this post we will call VLAN 100. Non-application-related infrastructure equipment (servers, virtual hosts) is on another VLAN, which we will call VLAN 200. So when the Fabric Interconnects were deployed, I gave them each an IP address on VLAN 200. And once UCS Manager was up and running, I created a KVM IP address pool of unused IP addresses on VLAN 100. Well guess what, this is wrong.
Routing for the KVM addresses is done through the management interfaces on the Fabric Interconnects, so unless you are using VLAN tagging, your KVM pool must be on the same VLAN as the IP addresses assigned to your Fabric Interconnects.
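A quick way to sanity check a planned KVM pool against that rule is to test each pool address against the subnet of the Fabric Interconnect management interface. The sketch below is only an illustration: the function name and all of the addresses are made up, and it assumes that each VLAN maps to exactly one IP subnet, which this post does not state.

```python
import ipaddress


def pool_addresses_outside_subnet(mgmt_cidr, pool_addresses):
    """Return the pool addresses that are NOT in the management subnet."""
    # strict=False lets us pass a host address with a prefix length,
    # e.g. the Fabric Interconnect's own management IP.
    subnet = ipaddress.ip_network(mgmt_cidr, strict=False)
    return [a for a in pool_addresses if ipaddress.ip_address(a) not in subnet]


# Hypothetical addressing: 10.0.200.0/24 stands in for VLAN 200 and
# 10.0.100.0/24 for VLAN 100.
fi_mgmt = "10.0.200.10/24"
kvm_pool = ["10.0.200.50", "10.0.200.51", "10.0.100.50"]

for address in pool_addresses_outside_subnet(fi_mgmt, kvm_pool):
    print(f"{address} is not in the Fabric Interconnect management subnet")
```

An empty result means every address in the pool shares the management subnet. Anything that is printed would need to move, unless VLAN tagging is in use.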
But wait, why is this?
I thought that I could even assign private 192.168.x.x IP addresses to the KVMs, as they were only supposed to be managed via the UCS Manager, but this is also incorrect.
Navigate to one of your working KVM ip addresses in a web browser and you can access the KVM of the blade outside of UCS Manager. Nice, which is how I actually expected this to work.
Note that, as a rule, I find it rather dumb to have my KVM management IPs and Fabric Interconnects on the same VLAN. However, since this is how it's supposed to work, I am going to have to let that one go.
Now, the fact that you can navigate to a specific KVM IP address via a web browser also makes the idea of using a pool of ip addresses silly. Would you not want to hard code the KVM ip address in the service profile so that you always know which server's console you are logging into? Dunno, I am still working on figuring that one out.