Red Hat OpenStack 13: Containerized Services Operations Guide


Contain Yourself

With the release of Red Hat OpenStack 13, the move to containerized overcloud services is complete. Traditional systemd services such as RabbitMQ, HAProxy, and MariaDB now all run as containers in the overcloud. This move to containers is meant to provide additional stability, control, and security to the platform. Future upgrades should be easier, and future deploys should be more flexible.

However, the move to containers brings with it a new challenge: operations.

The average OpenStack administrator no longer restarts services; they restart containers.

They no longer view a RabbitMQ cluster’s status on the controller node, but rather within a container on the controller node. Log locations have changed. Config file locations have changed.

So let’s retrain ourselves.
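
To make that concrete, here is a rough sketch of what the new habits might look like on an overcloud node. It assumes the docker CLI that Red Hat OpenStack 13 uses and the usual /var/log/containers and /var/lib/config-data/puppet-generated locations. The helper names are made up for illustration, so check the paths and container names on your own deployment before trusting any of it.

```shell
#!/bin/sh
# Assumed Red Hat OpenStack 13 layout on an overcloud node (verify locally).
container_log_dir()    { echo "/var/log/containers/$1"; }
container_config_dir() { echo "/var/lib/config-data/puppet-generated/$1"; }

# List running containers whose name contains the given fragment.
find_container() { docker ps --format '{{.Names}}' | grep -- "$1"; }

# Typical use on a controller (not executed here):
#   find_container rabbitmq
#   docker exec <name> rabbitmqctl cluster_status
#   docker restart <name>      # containers managed by Pacemaker are better restarted via pcs
#   ls "$(container_log_dir nova)"
#   ls "$(container_config_dir nova)"
```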


Linux: Using Tcpdump to Capture LLDP Info


According to Wikipedia, “Link Layer Discovery Protocol (LLDP) is a vendor-neutral link layer protocol in the Internet Protocol Suite used by network devices for advertising their identity, capabilities, and neighbors on an IEEE 802 local area network, principally wired Ethernet.”

LLDP is what you will often find running on non-Cisco switches and routers (Cisco gear usually runs CDP). If you want to use tcpdump to capture northbound switch port information, you can use the example below as a guide. The filter matches EtherType 0x88cc, which is the EtherType LLDP frames use.

# tcpdump -nn -v -i p4p2 ether proto 0x88cc
tcpdump: WARNING: p4p2: no IPv4 address assigned
tcpdump: listening on p4p2, link-type EN10MB (Ethernet), capture size 65535 bytes
19:00:12.559556 LLDP, length 218
Chassis ID TLV (1), length 7
Subtype MAC address (4): f4:8e:38:28:b6:89
Port ID TLV (2), length 11
Subtype Interface Name (5): ethernet11
Time to Live TLV (3), length 2: TTL 120s
Port Description TLV (4), length 39: BCF Port ethernet11
System Name TLV (5), length 22: Switch01
..trunc..
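
If all you want is the switch name and port, the verbose output above can be boiled down with a little awk. This is just a sketch that keys on the "System Name TLV" and "Subtype Interface Name" lines as they appear in the capture above; lldp_summary is a name I made up, and other switches may use a different Port ID subtype.

```shell
#!/bin/sh
# Reduce `tcpdump -v` LLDP output (read from stdin) to "system-name port-id".
lldp_summary() {
    awk '
        /System Name TLV/        { name = $NF }   # last field is the switch name
        /Subtype Interface Name/ { port = $NF }   # last field is the port name
        END { if (name != "" || port != "") print name, port }
    '
}

# Example (not executed here): capture a single LLDP frame and summarise it.
#   tcpdump -nn -v -c 1 -i p4p2 ether proto 0x88cc | lldp_summary
```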


Basic AIX Performance Troubleshooting Commands

Wow, today I logged into my first AIX server in about 4.5 years. It was a horrible experience. I’ve been working with Red Hat/CentOS pretty much exclusively for so long that I was mostly helpless to do anything of importance on the CLI other than create a few users and move some files around. None of the common commands that I am so used to using even exist in AIX.

Figured I would do a bit of homework and figure out how to do some basic troubleshooting before I was in a server-down situation with no idea how to troubleshoot.

Checking Free Memory

To check free memory on a box use the svmon command.

svmon -G
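
The numbers svmon prints are not the friendliest, so here is a small sketch that turns the free column into megabytes. It assumes the default svmon -G layout, where the "memory" row lists size, inuse, and free in that order, and that the values are 4 KB pages. Both are assumptions worth checking on your AIX level. svmon_free_mb is a made-up helper name.

```shell
#!/bin/sh
# Convert the "free" column of `svmon -G` (read from stdin) to MB.
# Assumes: row starts with "memory", free is the 4th field, units are 4 KB pages.
svmon_free_mb() {
    awk '$1 == "memory" { printf "%d\n", $4 * 4 / 1024 }'
}

# Example (not executed here):
#   svmon -G | svmon_free_mb
```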

Overall System Status

For this you will probably want to use topas, which is pretty similar to top. Topas gives you a quick and dirty overview of what is happening on a system. Here you can find CPU usage, top processes, and disk utilization. Check out the fancy screenshot below.

[Screenshot: topas]

List Volume Groups

Wow, Linux has really confused me on this one. Anyway, use lsvg. The -o flag lists only the active (varied-on) volume groups.

# lsvg -o
rootvg
crsrdb_bin
crsprdb_data
crsprdb_index
crsprdb_arch
crsprdb_rman

List Info About a Volume Group

# lsvg rootvg

Display Names of All Logical Volumes in a Volume Group

# lsvg -l rootvg

Display Physical Memory

# lsattr -El sys0 -a realmem
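
The realmem attribute is reported in kilobytes, so a quick conversion helps. A minimal sketch, assuming the usual lsattr output where the attribute name is the first field and the value is the second; realmem_mb is a hypothetical helper name.

```shell
#!/bin/sh
# Convert the realmem value from `lsattr -El sys0 -a realmem` (stdin) from KB to MB.
realmem_mb() {
    awk '$1 == "realmem" { printf "%d\n", $2 / 1024 }'
}

# Example (not executed here):
#   lsattr -El sys0 -a realmem | realmem_mb
```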

Finding Disk I/O Issues

sar appears to be a fine option here, especially since I am looking for percent busy. The example below takes two samples, one second apart. iostat also exists on AIX, btw.

# sar -d 1 2

Show Network Throughput

The more I poke around the internet trying to figure out how to actually use AIX, the more I keep running into topas. Anyway, this one is a good one.

# topas -E

I plan to have more of these one liners documented here in the future, but for now this is going to have to do.

RHEL6 – SELinux Troubleshooting II: Electric Boogaloo

So a good while back I posted an article on how to troubleshoot SELinux violations, and after reviewing that article as part of a troubleshooting exercise, I realized that I had left out a few details. Needless to say, my original article was not as clear as it should have been. Anyway, I wanted to use up a few more bytes of the internet to clarify.

When the package setroubleshoot-server is installed, SELinux violations will be sent to /var/log/messages, which makes it fairly easy to troubleshoot SELinux issues.

So first let's install setroubleshoot and all its parts.

# yum install setroubleshoot*

In my case on RHEL6, the following packages were installed

setroubleshoot-plugins-3.0.40-1.el6.noarch
setroubleshoot-server-3.0.47-3.el6_3.x86_64
setroubleshoot-3.0.47-3.el6_3.x86_64

Note that the setroubleshoot-server is the one that you need to troubleshoot via the command line.

Now let's generate a violation. In this case I am just dropping a file with the wrong SELinux context into /var/www/html and trying to access it. Note the mv: a moved file keeps the label it was created with, whereas a file created by a plain cp picks up the default label of the destination directory.

# touch /root/file3 && mv /root/file3 /var/www/html/file3

Check the context if you must to make sure that it's not correct for httpd content. In this case you can see that it is not.

# ls -lZ /var/www/html/file3
-rwxrwxrwx. root root system_u:object_r:admin_home_t:s0 /var/www/html/file3
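
The part of the context that matters here is the third colon-separated field, the type. A small sketch for pulling it out and comparing it against what you expect. The helper names are mine, and has_type leans on GNU stat's %C format, which prints a file's SELinux context.

```shell
#!/bin/sh
# Print the type field (user:role:type:level) of an SELinux context string.
selinux_type() {
    echo "$1" | cut -d: -f3
}

# Succeed if FILE ($1) currently carries the expected type ($2).
has_type() {
    [ "$(selinux_type "$(stat -c %C "$1")")" = "$2" ]
}

# Example (not executed here):
#   has_type /var/www/html/file3 httpd_sys_content_t || echo "wrong label"
```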

Now start Apache and try to access the file via elinks or a browser. You will get a Forbidden error, which I have omitted below.

# elinks -dump http://localhost/file3

Note that you may need to restart auditd if your message does not show up in the messages file.

Aug 11 17:08:39 vfatmin01 setroubleshoot: SELinux is preventing /usr/sbin/httpd from getattr access on the file /var/www/html/file3. For complete SELinux messages. run sealert -l 5a413022-af89-4222-b055-0cc1edc4bbad
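
If the messages file has collected a pile of these, you can pull out just the alert IDs. A minimal sketch that relies on the "run sealert -l <UUID>" wording shown in the message above; sealert_ids is a made-up name.

```shell
#!/bin/sh
# Print the unique sealert IDs mentioned in setroubleshoot log lines (stdin).
sealert_ids() {
    grep -oE 'sealert -l [0-9a-f-]+' | awk '{ print $3 }' | sort -u
}

# Example (not executed here): show the full report for every alert logged.
#   sealert_ids < /var/log/messages | while read -r id; do sealert -l "$id"; done
```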

Note: You will also find the same error in /var/log/audit/audit.log, albeit in a somewhat less friendly format.

type=AVC msg=audit(1344719319.890:7196): avc:  denied  { getattr } for  pid=6765 comm="httpd" path="/var/www/html/file3" dev=dm-1 ino=656718 scontext=unconfined_u:system_r:httpd_t:s0 tcontext=unconfined_u:object_r:admin_home_t:s0 tclass=file
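
The raw AVC record is readable once you know which three pieces to look at: the process (comm), the denied permission, and the type of the target (tcontext). Here is a rough sed sketch that prints just those. It assumes the field order seen in the record above and plain double quotes around comm, and avc_summary is a hypothetical helper name.

```shell
#!/bin/sh
# Summarise AVC denials (stdin) as "command permission target-type".
avc_summary() {
    sed -n 's/.*denied  *{ \([a-z_ ]*\) }.*comm="\([^"]*\)".*tcontext=[^:]*:[^:]*:\([^:]*\):.*/\2 \1 \3/p'
}

# Example (not executed here):
#   grep 'type=AVC' /var/log/audit/audit.log | avc_summary | sort | uniq -c
```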

Anyway back to the error from the messages file. At the end of the error you are shown the UUID of the error and the sealert command to run to get more information on the error.

# sealert -l 5a413022-af89-4222-b055-0cc1edc4bbad

Output below:

SELinux is preventing /usr/sbin/httpd from getattr access on the file /var/www/html/file3.

*****  Plugin restorecon (99.5 confidence) suggests  *************************

If you want to fix the label.
/var/www/html/file3 default label should be httpd_sys_content_t.
Then you can run restorecon.
Do
# /sbin/restorecon -v /var/www/html/file3

Wow, sealert actually tells you why the file is being blocked and the commands that you should run to fix the problem. Nice!

RHEL6 – Restore Grub on MBR

GRUB, which stands for the GRand Unified Bootloader, is the default boot loader in Linux these days (it replaced LILO). When your server boots, the system BIOS transfers control to the Master Boot Record of your first boot device, which is where GRUB is installed. If the MBR is removed, damaged, or overwritten, you will not be able to boot, in which case you will need to repair or reinstall GRUB.

The entire process only takes a few minutes if you already have a Red Hat/CentOS CD to boot off of. Just slap that sucker into the CD drive (or virtual CD drive) and at the boot menu type “linux rescue”.

Then run grub as shown below.

# grub

Next identify your /boot partition. Here (hd0,0) means the first partition on the first disk.

grub> root (hd0,0)

Then install first stage grub into the MBR.

grub> setup (hd0)

Then exit the grub shell with quit.
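
If you would rather not type the commands interactively, GRUB legacy can also read them from standard input in batch mode. A minimal sketch, assuming the same (hd0,0) and (hd0) device names used above; grub_mbr_commands is a helper name I made up, and you should double-check the devices before pointing this at a real disk.

```shell
#!/bin/sh
# Print the GRUB legacy commands that reinstall stage1 into the MBR.
#   $1 = partition holding /boot, e.g. '(hd0,0)'
#   $2 = disk whose MBR gets GRUB, e.g. '(hd0)'
grub_mbr_commands() {
    printf 'root %s\nsetup %s\nquit\n' "$1" "$2"
}

# Example (not executed here): feed the commands to grub non-interactively.
#   grub_mbr_commands '(hd0,0)' '(hd0)' | grub --batch
```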

All you ever wanted to know about Grub can be found below

http://en.wikipedia.org/wiki/GNU_GRUB

Broadcom (bnx2) Network Adapters Dropping Received Packets Under Linux

So a few weeks ago some of our CentOS 5.4 and OEL 5.5 servers started exhibiting strange connectivity problems. Monitoring started alerting that hosts were down when they weren't; some boxes could ping target hosts and some couldn't; some boxes became unresponsive when interfaces were failed over; and, strangest of all, some of the boxes would magically "repair" themselves. Like I said, strange.

Over the next week or so we ran into the issue a few more times and were able to see a pattern emerge. All the affected servers were running CentOS 5.4 or Oracle Linux 5.5 and had Broadcom (bnx2) adapters that were on the receiving end of some pretty decent traffic. Most importantly, all had a good number of dropped received packets, a count that was continuously, albeit slowly, increasing.
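
One way to watch for that slow creep is to sample the kernel's per-interface drop counter in sysfs twice and compare. This is only a sketch, not the check we actually ran; the function names are made up, and the optional last argument (an alternate sysfs root) exists purely so the helpers can be exercised without a real NIC.

```shell
#!/bin/sh
# Print the dropped-receive counter for interface $1.
# $2 optionally overrides the sysfs root (hypothetical hook for testing).
rx_dropped() {
    cat "${2:-/sys/class/net}/$1/statistics/rx_dropped"
}

# Print how much the counter grew for interface $1 over $2 seconds (default 10).
rx_drop_delta() {
    before=$(rx_dropped "$1" "$3")
    sleep "${2:-10}"
    after=$(rx_dropped "$1" "$3")
    echo $((after - before))
}

# Example (not executed here):
#   rx_drop_delta eth0 60
```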

A bit of Google research led us to this bugzilla, which suggested changing the adapter's coalescence settings... so a bit on coalescence.

Coalescence

In your network adapter, coalescence is all about interrupts. Traditionally, interrupt coalescence (or IC) is used to reduce the number of interrupts generated by the adapter by delaying each interrupt for a very short period of time... think less than a millisecond. In turn, more packets are received during the delay, and each interrupt hands the host a larger batch of traffic to process. You can find out more than you would ever want to know about coalescence here.
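
To get a feel for the numbers, here is a back-of-the-envelope sketch: at a steady packet rate, the number of packets that pile up during one coalescing delay is roughly rate times delay. The packet rates in the example are invented for illustration, the math ignores any frame-count threshold, and pkts_per_irq is a made-up helper.

```shell
#!/bin/sh
# Rough packets-per-interrupt estimate.
#   $1 = packets per second, $2 = coalescing delay in microseconds
pkts_per_irq() {
    awk -v pps="$1" -v usecs="$2" 'BEGIN { printf "%.1f\n", pps * usecs / 1000000 }'
}

# Example: a hypothetical 100,000 packets/s with a 50 microsecond delay
# batches about five packets per interrupt.
pkts_per_irq 100000 50
```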

The Fix

So apparently the Broadcom IC settings were not aggressive enough. Packets would come in, fill up the receive queue, and get dropped before they could be sent off for processing via an interrupt. This takes us back to the bugzilla above and its suggested settings below, which you set with the ethtool command.

 ethtool -C ethX rx-usecs 8 rx-usecs-irq 8 rx-frames 0 rx-frames-irq 0

Note that this was not an issue on any server running CentOS 5.6, any server with Intel adapters, or any server with 10g adapters. As a matter of fact, those servers had IC settings even more aggressive than those above. See the Intel 82599EB 10-gigabit settings below.

rx-usecs: 1
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0
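
Those values come from ethtool -c (lowercase c), which prints the current coalescing settings as "name: value" lines. A small sketch for pulling a single value out, for example to confirm that a change took effect; coalesce_value is a hypothetical helper name.

```shell
#!/bin/sh
# Print the value of one coalescing parameter from `ethtool -c` output (stdin).
#   $1 = parameter name, e.g. rx-usecs
coalesce_value() {
    awk -F': *' -v key="$1" '$1 == key { print $2 }'
}

# Example (not executed here):
#   ethtool -c eth0 | coalesce_value rx-usecs
```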

Final Configuration

So now that we know the fix, we need to make it permanent. That is not as easy as editing a config file for the device, because the coalescence config is set at boot and is part of the installed driver for the device. Rather than muck around with trying to modify the driver itself, we decided to configure our devices at boot time with an rc script that checks each network interface on the box and modifies its IC settings if it is using the bnx2 (Broadcom) driver. We dropped the script below into /etc/rc.d and created a symbolic link to it in /etc/rc3.d.

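#!/bin/bash
# Set interrupt coalescing on every bnx2 (Broadcom) interface at boot.

case "$1" in
start)

# Interface names come from the ifcfg-ethX file names; backup copies are skipped.
IFACE=$(ls /etc/sysconfig/network-scripts/ifcfg-eth* | grep -v bak | cut -d - -f 3)

for ETH in $IFACE
        do
                # ethtool -i reports which driver the interface is using.
                if ( ethtool -i $ETH | grep -qw bnx2 )
                then
                        echo "Changing settings for $ETH"
                        ethtool -C $ETH rx-usecs 8 rx-usecs-irq 8 rx-frames 0 rx-frames-irq 0
                else
                        echo "$ETH is not a broadcom"

                fi
        done
exit 0
;;

stop)

echo " hammer time"
;;

*)
    echo "usage: $0 (start|stop)"
;;
esac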

RHEL6 — Troubleshooting SELinux Violations

Dear Reader: Welcome to my third and not final installment on SELinux. The first two can be read here and here. They are exciting reads and are sure to have you on the edge of your seat.

Anyway, the best way to implement SELinux successfully is to know how to troubleshoot when things aren’t going your way. If you panic at the first sign of trouble, you are just going to end up turning off SELinux and not reap the rich rewards that it will bring you in life. Now that I have convinced you to run SELinux, let's get started.

First install the package setroubleshoot, which will send SELinux messages to our messages file.

yum -y install setroubleshoot-server.x86_64

Now you can search the messages file for SELinux violations. Use sealert -l UUID to find information on a specific incident, or sealert -a /var/log/audit/audit.log to search an entire log file for violations.

In this specific example, I created a test file and dropped it in /var/www/html; however, I did not set the context to httpd_sys_content_t. Then I attempted to view the file in a browser. Obviously access was denied. The output of sealert shows me the error and then tells me how to fix it.

Summary:

SELinux is preventing /usr/sbin/httpd "getattr" access to /var/www/html/file3.

Detailed Description:

SELinux denied access requested by httpd. /var/www/html/file3 may be a
mislabeled. /var/www/html/file3 default SELinux type is httpd_sys_content_t, but
its current type is admin_home_t. Changing this file back to the default type,
may fix your problem.

…TRUNCATED…

Allowing Access:

You can restore the default system context to this file by executing the
restorecon command. restorecon '/var/www/html/file3', if this file is a
directory, you can recursively restore using restorecon -R
'/var/www/html/file3'.

Fix Command:

/sbin/restorecon '/var/www/html/file3'

Boom goes the dynamite! Problem solved.