With the release of Red Hat OpenStack 13, the move to containerized overcloud services is complete. Traditional systemd services such as RabbitMQ, Haproxy, Mariadb, etc, are all now running as containers in the overcloud. This move to containers is meant to provide additional stability, control, and security to the platform. Future upgrades should be easier, and future deploys should be more flexible.
However, the move to containers brings with it a couple of new challenges. Operations.
The average OpenStack administrator no longer restarts services, they restart containers.
They no longer view a rabbit cluster’s status on the controller node, but rather within a container on the controller node. Log locations have changed. Config file locations have changed.
According to Wikipedia, “Link Layer Discovery Protocol (LLDP) is a vendor-neutral link layer protocol in the Internet Protocol Suite used by network devices for advertising their identity, capabilities, and neighbors on an IEEE 802 local area network, principally wired Ethernet”
LLDP is often what you will find running on non-Cisco switches and routers (which usually run CDP). If you want to use tcpdump to capture northbound switch port information, you can use the example below as a guide.
# tcpdump -nn -v -i p4p2 ether proto 0x88cc
tcpdump: WARNING: p4p2: no IPv4 address assigned
tcpdump: listening on p4p2, link-type EN10MB (Ethernet), capture size 65535 bytes
19:00:12.559556 LLDP, length 218
Chassis ID TLV (1), length 7
Subtype MAC address (4): f4:8e:38:28:b6:89
Port ID TLV (2), length 11
Subtype Interface Name (5): ethernet11
Time to Live TLV (3), length 2: TTL 120s
Port Description TLV (4), length 39: BCF Port ethernet11
System Name TLV (5), length 22: Switch01
Wow, today I logged into my first AIX Server in about 4.5 years. It was a horrible experience. I’ve been working with Redhat/CentOS pretty much exculsively for so long, I was mostly helpless to do anything of importance on the CLI other than create a few users and move some files around. None of the common commands that I am so used to using even exist in AIX.
Figured I would do a bit of homework and figure out how to do some basic troubleshooting before I was in a server down situation with no idea how to troubleshoot.
Checking Free Memory
To check free memory on a box use the svmon command.
Overall System Status
For this you will probably want to use topas, which is pretty simiar to top. Topas gives you a quick and dirty overview of what is going on on a system. Here you can find CPU usage, top processes, disk utililization. Check out the fancy screen shot below.
List Volume Groups
Wow, Linux has really confused me on this one. Anyway, use lsvg
So a good while back I posted an article on how to troubleshootSELinux violations and after reviewing that article as part of a troubleshooting exercise, I realized that I left out a few details. Needless to say my original article was not as clear as it should be. Anyway I wanted to use up a few more bytes of the internet to clarify.
When the package setroubleshoot-server is installed, SELinux violations will be sent to /var/log/messages, which makes it fairly easy to troubleshoot SELinux issues.
So first lets install setroubleshoot and all its parts
# yum install setroubleshoot*
In my case on RHEL6, the following packages were installed
Note that you may need to restart auditd if your message does not show up in the messages file.
Aug 11 17:08:39 vfatmin01 setroubleshoot: SELinux is preventing /usr/sbin/httpd from getattr access on the file /var/www/html/file3. For complete SELinux messages. run sealert -l 5a413022-af89-4222-b055-0cc1edc4bbad
Note: You will also find a the same error in /var/log/audit/audit.log, albeit in a bit less friendly format.
GRUB, which stands for the GRand Unified Bootloader is the default boot loader in Linux these days ( it replaced LILO). When your server boots, the system BIOS transfers control to the Master Boot Record of your first boot device which is where Grub is installed. If the removed, damaged, or overwritten, then you will not be able to boot, and in which case you will need to repair/reinstall grub.
The entire process only takes a few minutes if you already have a Redhat/Centos cd to boot of off. Just slap that sucker into the cd drive (or virtual cd drive) and at the Boot Menu type “linux rescue”
Then run grub as shown below.
Next identify your /boot partition
Then install first stage grub into the MBR
# setup (hd0)
All you ever wanted to know about Grub can be found below
So a few weeks ago some of our Centos 5.4 and OEL 5.5 servers started exibiting strange connectivity problems. Monitoring started alerting that hosts were down when they weren't; some boxes could ping target hosts and some couldn't; some boxes became unresponsive when interfaces were failed over, and the strangest of all is that some of the boxes would magically "repair" themselves. Like I said, strange.
Over the next week or so we ran into the issue a few more times and were able to see a pattern emerge. All the affected servers were running Centos 5.4 or Oracle Linux 5.5 and had broadcom (bnx2) adapters that were on the recieving end of some pretty decent traffic. Most importantly, all had a good number of dropped recieved packets that was continuously, albeit slowly, increasing.
A bit of google research led us to this bugzilla, which suggested changing the adapter's coalescense settings…. so a bit on coalescense.
In your network adapter, coalescence is all about interupts. Traditionally interupt coalescense (or IC) is used to reduce the number of interupts generated by the system by delaying the generating of an interrupt by a very short period of time…think less then a milisecond. In turn more traffic will be recieved by the host and the next interupt generated will be larger in size. You can find out more than you would ever want to know about coalesence here
So apparently the Broadcom IC settings were not aggressive enough. Packets would come in, fill up the receive queue, and get dropped before they could be sent off for processing via an interrupt. This takes us back to the bugzilla above and the suggested settings below which you set with the ethtool command
Note that this was not an issue on any in Centos 5.6, any server with Intel adapters, or any server with 10g adapters. As a matter of fact, those servers had IC settings even more agressive then those above. See the Intel 82599EB 10-gigabit settings below
So now that we know the fix we need to make it permanent, which is not as easy as editing a config file for the device as the coalescence config is set at boot and it part of the installed driver for the device. Rather than muck around with trying to modify the driver itself, we decided to set and configure our devices at boottime with a rc script that checks the checks the each network interface on the box and modifys their IC settings if they are using the bnx2 (Broadcom) driver. We dropped the script below into /etc/rc.d and created a symbolic link to it in /etc/rc3.d.
for ETH in $IFACE do if ( ethtool -i $ETH | grep -qw bnx2 ) then echo "$Changing Settings for $ETH" ethtool -C $ETH rx-usecs 8 rx-usecs-irq 8 rx-frames 0 rx-frames-irq 0 else echo "$ETH is not a broadcom"
Dear Reader: Welcome to my third and not final installment on SELinux. The first two can be read here and here. They are exciting reads and are sure to have you on the edge of your seat.
Anyway, the best way to implement SELinux sucessfully is to know how to troubleshoot when things aren’t going your way. If you panic at the first sign of trouble, you are just going to end up turning off SELinux and not reap the rich rewards that it will bring you in life. Now that I have convinced you to run SELinux lets get started.
First install the package setroubleshoot, which will send SELinux messages to our messages file.
yum -y install setroubleshoot-server.x86_64
Now you can search the messages file for SELinux Violations. Use sealert -l UUID to find information on a specific incident, or sealert -a /var/log/audit.log to search an entire log file for violations.
In this specfic example, I created a test file and dropped it in /var/www/html, however I did not set the context to httpd_sys_content_t, then i attempted to view the file in a browser. Obviously access was denied. The output of sealert shows me the error and then tells me how to fix it.
SELinux is preventing /usr/sbin/httpd “getattr” access to /var/www/html/file3.
SELinux denied access requested by httpd. /var/www/html/file3 may be a mislabeled. /var/www/html/file3 default SELinux type is httpd_sys_content_t, but its current type is admin_home_t. Changing this file back to the default type, may fix your problem.
You can restore the default system context to this file by executing the restorecon command. restorecon ‘/var/www/html/file3’, if this file is a directory, you can recursively restore using restorecon -R ‘/var/www/html/file3’.