Red Hat Satellite 6.x: Restarting Services

Introduction

Red Hat Satellite consists of a number of running services, and restarting each one manually can be painful. Luckily, you can use the commands below to easily restart them all at once.

List Services

Run the command below to view a list of all Satellite services that are started at boot.

# katello-service list
Redirecting to 'foreman-maintain service'
Running Service List
========================================================================
List applicable services:
dynflowd.service enabled
foreman-proxy.service enabled
httpd.service enabled
postgresql.service enabled
pulp_celerybeat.service enabled
pulp_resource_manager.service enabled
pulp_streamer.service enabled
pulp_workers.service enabled
puppetserver.service enabled
qdrouterd.service enabled
qpidd.service enabled
rh-mongodb34-mongod.service enabled
smart_proxy_dynflow_core.service enabled
squid.service enabled
tomcat.service enabled

All services listed [OK]

Check Service Status

The command below checks the status of all Satellite services. The output is similar to running systemctl status on every Satellite-specific service and can be quite verbose.

# katello-service status

Stop Services

Use the command below to stop all Satellite services.

# katello-service stop

Start Services

Use the command below to start all Satellite services.

# katello-service start

Restart Services

The command below will restart all Satellite services.

# katello-service restart
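
As the "Redirecting to 'foreman-maintain service'" message above suggests, recent Satellite 6.x releases simply hand these commands off to foreman-maintain, so you can call it directly if you prefer. A rough equivalent set of commands, assuming foreman-maintain is installed on your Satellite server:

# foreman-maintain service list
# foreman-maintain service status
# foreman-maintain service stop
# foreman-maintain service start
# foreman-maintain service restart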

RHEL7: Install RHV Guest Agent and Drivers

About the Guest Agent

The RHEL 7.x virtual machine guest agent in RHV 4.x provides drivers, additional guest data, and extra functionality once installed on a RHEL virtual machine.

The guest agent includes:

  • virtio-net paravirtualized network driver
  • virtio-scsi paravirtualized HDD driver
  • virtio-balloon driver which improves memory overcommit (currently not used by RHV)
  • rhevm-guest-agent-common, which allows RHV to retrieve guest-internal information such as IP addresses and to gracefully reboot the guest

You can view the entire list here.

When spawning a virtual machine in RHV without the guest agent, a warning will appear as an exclamation mark in RHV.

Register with Satellite

You can skip this step if your guest is already registered.

In order to install the guest agent, I must first register the virtual machine with my local Satellite server. If you are not using a local Satellite server, you can register with RHN.

First we need to grab the katello-ca-consumer RPM from the Satellite server. In this case the Satellite server uses a self-signed certificate, so we pass -k to curl to skip certificate verification.

# curl -Ok https://satellite.lab.localdomain/pub/katello-ca-consumer-latest.noarch.rpm

Then install the rpm.

# rpm -ivh katello-ca-consumer-latest.noarch.rpm

Now register with Satellite. In the example below, we are using a custom activation key and organization.

# subscription-manager register --activationkey="auburn-lab-ak" --org="lab"

Installing the Guest Agent

You will need to ensure that the RHEL 7 RH Common repo is enabled. If the repo is not available to the guest, you will need to enable it.

# yum repolist | grep common
!rhel-7-server-rh-common-rpms/7Server/x86_64 Red Hat Enterprise Linux 234
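
If the repo does not show up in that list, it can usually be enabled with subscription-manager, assuming the subscription attached to your activation key provides it (the repo id below matches the output above):

# subscription-manager repos --enable=rhel-7-server-rh-common-rpms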

If the proper repo is enabled, then run the following command.
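
The original post does not show the install command itself; based on the package named earlier, it should look something like the following (package and service names may vary slightly between RHV versions):

# yum install -y rhevm-guest-agent-common
# systemctl enable ovirt-guest-agent
# systemctl start ovirt-guest-agent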

Once installed, the orange exclamation point will disappear.

Red Hat Satellite: Create and Publish Content Views for RHEL + OpenStack

Overview

In this post I will review the process of creating Content Views (CV), Composite Content Views (CCV), publishing each view, and creating lifecycles.

Note that in this post we are working with Red Hat Satellite 6.4, in which there was a major overhaul of the WebUI. You may have noticed that all menus are now situated in a pane on the left, rather than at the top of each page.

Sync Plans

A sync plan is a recurring, scheduled synchronization between Red Hat Satellite repositories and their upstream source repositories. I suggest syncing either daily or weekly in order to minimize the delta between syncs. The more often you sync, the less change there is to pull each time, so each sync should complete faster than, say, a monthly sync.

Note that this step assumes that you have already setup the correct repositories for RHEL and Red Hat OpenStack.  A list of required repositories can be found in the Red Hat OpenStack Director Installation and Usage Guide.

Navigate to Content > Sync Plans

Here we create a daily sync plan for RHEL 7.

We now add RHEL 7 as the product.

Now we need to create a daily sync plan for Red Hat OpenStack.

Note: you might need to create a sync plan for Ceph as well.  Ensure all plans sync at the same interval.
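
If you prefer the command line, a roughly equivalent sync plan can be created and attached to a product with hammer. The organization, plan name, and product name below are placeholders from my lab, and option names can vary slightly between Satellite versions:

# hammer sync-plan create --organization "lab" --name "daily-rhel7" \
    --interval daily --sync-date "2019-03-18 03:00:00" --enabled true
# hammer product set-sync-plan --organization "lab" \
    --name "Red Hat Enterprise Linux Server" --sync-plan "daily-rhel7"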

Create a Content View

Now we need to create our content views. We will create one for RHEL and one for OSP. If you are using Ceph, you will need to create a content view for it as well.
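
Again, hammer can do the same job from the command line. A minimal sketch for the RHEL content view is shown below; the content view name, organization, and repository name are placeholders and should be adjusted to match your environment:

# hammer content-view create --organization "lab" --name "cv-rhel7"
# hammer content-view add-repository --organization "lab" --name "cv-rhel7" \
    --product "Red Hat Enterprise Linux Server" \
    --repository "Red Hat Enterprise Linux 7 Server RPMs x86_64 7Server"
# hammer content-view publish --organization "lab" --name "cv-rhel7"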

 


Configure RHEL 7/CentOS 7 as a Virtualization Host

This procedure assumes a fresh install of RHEL 7.5.

First install the packages as shown below.


# yum install qemu-kvm libvirt

Now install the additional recommended virtualization packages.


# yum install virt-install libvirt-python virt-manager libvirt-client

Now restart libvirtd


# systemctl restart libvirtd

 
Now you should be able to launch virt-manager from your remote machine and add a connection to your new virtualization host.
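
For example, from your workstation you can point virt-manager (or virsh) at the new host over SSH. The hostname below is just a placeholder for your virtualization host:

$ virt-manager -c qemu+ssh://root@virthost.example.com/system
$ virsh -c qemu+ssh://root@virthost.example.com/system list --all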

Special note: make sure that you disable NetworkManager.


# systemctl stop NetworkManager
# systemctl disable NetworkManager

OpenStack: Introduction to Troubleshooting Heat

Introduction to Heat

Heat is the main orchestration engine for OpenStack, and is used by OpenStack director to install an OpenStack Overcloud environment.

When we run the "openstack overcloud deploy" command, we are specifically
telling RHEL OSP director that we want it to use the pre-defined Heat templates from
/usr/share/openstack-tripleo-heat-templates/. OSP director will manage the
deployment of a new overcloud Heat stack, using files from this directory.
When RHEL OSP director creates the Heat stack, it needs the following data:

  • A top-level Heat template to use that describes the overall environment and the
    resources required.
  • An environment/resource registry to tell Heat where to find resource
    definitions for non-standard Heat elements, e.g. TripleO components.
  • A set of parameters to declare the deployment-specific options (via -e)

 

The most important files for us to focus on are in our deployment directory; these are the default files that get called by OSP director.

  • The top-level Heat template that OSP director uses for deployment is
    /usr/share/openstack-tripleo-heat-templates/overcloud-without-mergepy.yaml
  • The resource registry, which tells Heat where to find the templates for
    deployment resources, is
    /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml

Creating a Heat Stack

To create the stack we run the command below. This command instructs heat to use
the templates in ~/my_templates/, as well as the override templates specified
with the ‘-e’ option.

This is just an example of what I am using in my lab environment; your deploy command will be much different. Also note that I have copied the templates from /usr/share/openstack-tripleo-heat-templates to ~/my_templates/.

# openstack overcloud deploy --debug --templates ~/my_templates/ \
--ntp-server 10.1.0.1 --control-scale 3 --compute-scale 2 \
-e ~/my_templates/advanced-networking.yaml

Troubleshooting a Failed Heat Stack

Unfortunately our deploy failed with the following errors.

Exception: Heat Stack create failed.
DEBUG: openstackclient.shell clean_up DeployOvercloud
DEBUG: openstackclient.shell got an error: Heat Stack create failed.
ERROR: openstackclient.shell Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/openstackclient/shell.py", line 176, in run
    return super(OpenStackShell, self).run(argv)
  File "/usr/lib/python2.7/site-packages/cliff/app.py", line 230, in run
    result = self.run_subcommand(remainder)
  File "/usr/lib/python2.7/site-packages/cliff/app.py", line 295, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python2.7/site-packages/cliff/command.py", line 53, in run
    self.take_action(parsed_args)
  File "/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py", line 864, in take_action
    self._deploy_tripleo_heat_templates(stack, parsed_args)
  File "/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py", line 535, in _deploy_tripleo_heat_templates
    parsed_args.timeout)
  File "/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py", line 478, in _heat_deploy
    raise Exception("Heat Stack create failed.")
Exception: Heat Stack create failed.

We can verify that the deploy failed with the command below.

[stack@undercloud]# heat stack-list
+--------------------------------------+------------+---------------+----------------------+
| id                                   | stack_name | stack_status  | creation_time        |
+--------------------------------------+------------+---------------+----------------------+
| ce993847-b0ee-4ea2-ac15-dc0ddc81825a | overcloud  | CREATE_FAILED | 2016-02-29T20:40:54Z |
+--------------------------------------+------------+---------------+----------------------+

Since the stack deploy has failed, let’s take a closer look at the stack resources
and see if we can determine which resources failed.

Here we will make things simple by viewing only failed resources.

[stack@undercloud]# heat resource-list overcloud | grep -i failed
| Compute    | c032c668-755f-422f-8ad1-4abf46b022ff | OS::Heat::ResourceGroup | CREATE_FAILED | 2016-02-29T20:40:55Z |
| Controller | 668d27e0-9ab1-4dbe-8445-1d1ee8839265 | OS::Heat::ResourceGroup | CREATE_FAILED | 2016-02-29T20:40:55Z |

The failed resources are named "Compute" and "Controller". Let's take a closer
look at those using the "resource-show" argument.

# heat resource-show overcloud Compute

| resource_status_reason | ResourceUnknownStatus: Resource failed - Unknown
status FAILED due to "Resource CREATE failed: ResourceUnknownStatus:
Resource failed - Unknown status FAILED due to "Resource CREATE failed:
StackValidationFailed: Property error : OsNetConfigImpl: config The Parameter
(BondInterfaceOvsOptions) was not provided."" |

Let’s now do the same for Controller.

# heat resource-show overcloud Controller

| resource_status_reason | ResourceUnknownStatus: Resource failed - Unknown
status FAILED due to "Resource CREATE failed: ResourceUnknownStatus:
Resource failed - Unknown status FAILED due to "Resource CREATE failed:
StackValidationFailed: Property error : OsNetConfigImpl: config The Parameter
(BondInterfaceOvsOptions) was not provided."" |

Apparently I have some issues with my OVS bonding options, so I need to get those straight before I can continue.
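
In this case the fix is to define BondInterfaceOvsOptions in the network environment file passed with -e. A minimal sketch is shown below; advanced-networking.yaml is the file from my deploy command, the actual bonding options depend on your switch configuration, and depending on your OSP version the setting may belong under parameters rather than parameter_defaults:

$ cat ~/my_templates/advanced-networking.yaml
...
parameter_defaults:
  BondInterfaceOvsOptions: "bond_mode=balance-slb"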

Deleting a Failed Heat Stack

Since our last deploy failed, we need to delete the failed stack before we can kick off another stack deploy. Below is an example of that command – note we are using the UUID of the stack.

 

[stack@vz-undercloud]# heat stack-delete 2b0da4f6-e6f8-41cd-89e8-bf070d0e0d15
+--------------------------------------+------------+--------------------+----------------------+
| id                                   | stack_name | stack_status       | creation_time        |
+--------------------------------------+------------+--------------------+----------------------+
| 2b0da4f6-e6f8-41cd-89e8-bf070d0e0d15 | overcloud  | DELETE_IN_PROGRESS | 2016-03-01T17:21:58Z |
+--------------------------------------+------------+--------------------+----------------------+

Then, checking the stack list again a little later:

[stack@vz-undercloud]# heat stack-list
+--------------------------------------+------------+---------------+----------------------+
| id                                   | stack_name | stack_status  | creation_time        |
+--------------------------------------+------------+---------------+----------------------+
| 2b0da4f6-e6f8-41cd-89e8-bf070d0e0d15 | overcloud  | DELETE_FAILED | 2016-03-01T17:21:58Z |
+--------------------------------------+------------+---------------+----------------------+

Now let's kick off another deploy.

# openstack overcloud deploy --debug --templates ~/my_templates/ \
--ntp-server 10.1.0.1 --control-scale 3 --compute-scale 2 \
-e ~/my_templates/advanced-networking.yaml

Unfortunately, this deploy failed as well.

OK, let's take a look at /var/log/heat/heat-engine.log for more details. I also suggest opening another ssh session and tailing the log while the stack operation is in progress.

If the output is too verbose to follow, you can thin it out with the command below.

# tail -f /var/log/heat/heat-engine.log | egrep 'error|fatal'

This led me to the following error.

2016-03-01 13:46:12.366 18554 ERROR heat.engine.resource [-] Error marking
resource as failed
2016-03-01 13:46:12.366 18554 TRACE heat.engine.resource DBConnectionError:
(_mysql_exceptions.OperationalError) (2003, "Can't connect to MySQL server on
'172.16.0.10' (111)")

MySQL is down? So now we need to look at the MariaDB logs, where we see the following.

160301 12:16:24 [Warning] Failed to setup SSL
160301 12:16:24 [Warning] SSL error: SSL_CTX_set_default_verify_paths failed

Apparently SELinux is blocking reads of the certificates. There are two ways to
work around this issue: you can run "restorecon -v /path/to/certs/", or you can
disable SELinux entirely by running "setenforce 0" or by editing the
/etc/selinux/config file and setting 'SELINUX=disabled'.
You may need to rerun the delete; in my case it was stuck in
"DELETE_IN_PROGRESS", so I restarted all Heat-related services to force the delete to error out.

# systemctl restart openstack-heat-engine.service openstack-heat-api.service \
openstack-heat-api-cloudwatch.service openstack-heat-api-cfn.service

This will cause the delete to error. You can then retry the delete.

If the delete is taking a long time, you can dig a bit deeper into the delete
using the command below.

# heat resource-list overcloud

Now drill down more with the command below.

# heat event-list overcloud

Make note of the resource_name and its id and use them in the next command.
Note that stack name is still overcloud.

# heat event-show overcloud Compute d9e13b02-07b0-4beb-8442-f25de0e7ef8b

I have found that rebooting the undercloud will clear out any in-progress
tasks, you can then run the delete again.

You can also try to manually delete each node from Ironic by mimicking what the
nova driver in Ironic does. This is shown below for reference.

$ ironic node-set-provision-state <node uuid> deleted

And to remove the instance_uuid

$ ironic node-update <node uuid> remove instance_uuid

Troubleshooting Failed and In-Progress Deployments

Updated 6/28/2018

While your deploy is running, you can watch for stuck or failed deployments.

$ heat deployment-list | grep -vi complete
WARNING (shell) "heat deployment-list" is deprecated, please use "openstack software deployment list" instead
+--------------------------------------+--------------------------------------+--------------------------------------+--------+-------------+----------------------+------------------+
| id | config_id | server_id | action | status | creation_time | status_reason |
+--------------------------------------+--------------------------------------+--------------------------------------+--------+-------------+----------------------+------------------+
| 143b25ce-9c1e-48b0-bd92-2569640b5208 | 5b9843a5-2092-4c75-8eae-2e042d7b34d2 | f08232e7-7a14-4885-86ab-443b4581afe6 | CREATE | IN_PROGRESS | 2017-05-17T19:46:38Z | None |
| 27d2a61e-e5cd-4313-967d-cc7c385320cb | c113ead1-bb9a-46ca-a992-4527c2e9fda1 | 67dc5457-5ae7-4290-ad77-36bf8bc81d56 | CREATE | IN_PROGRESS | 2017-05-17T19:46:38Z | None |
| fabab6ae-8848-4a35-8066-d83f82335e95 | 763c4760-3e8c-43c1-8afd-d1b8d6bbb55b | e51061ed-f96f-43ec-b18a-4dc079d961d8 | CREATE | IN_PROGRESS | 2017-05-17T19:46:38Z | None |
| 0b6993c1-a320-4dc7-980c-e789d87a0c84 | 50f1e303-123a-451b-9d86-f29c32e6ce0b | 8d87818a-943f-4f69-8112-7450dfa26bb7 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| 0e537823-f17d-4009-b2e4-6d11adcdfeb7 | 1d7938ad-a6e5-4d8f-9604-489408295a59 | 98847475-ed1c-44b2-aa4e-bbddfe81a1d1 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| 1d89b9cf-87cf-4050-a00f-496c82fd9432 | 4f378bc3-10a7-4a47-b269-2c7de5d75dd7 | 69257ad8-a3ee-453b-a532-db3c293a34b9 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| 3d5915a6-1470-4ffe-a0f1-a356df33e1f9 | dbc8c62c-f5cc-4cfa-93af-ef7b9f02fd19 | 75b49872-c1fb-4c83-87d1-579dcd027bc8 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| a2e57b50-dc03-42a8-b851-b33ade36d591 | 0c7133d8-20bc-481d-b9c5-2d0867a656e7 | 000794e9-4ce2-4c8a-aa69-02aadfd19ce6 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| ab463b0f-fd6e-4169-9283-3e8bbcc1bb4b | 627f1288-3823-4e21-b61d-4bafcbdc446a | b6cf41e8-e60c-43ca-a3ed-d24c078031c8 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| ad05853f-2e70-4f11-b8c0-26d2d4c324fb | 8cd656e7-129f-40d4-a1bc-ed0181b9e768 | 24017696-102d-43ab-a101-f3a33a091faa | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| b6418aec-385a-4b6f-845b-84c87fa10be7 | 04747e35-f0cc-45a4-b251-78ea61a08c83 | ffe36a72-2293-48eb-93ed-bfe1077a0245 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| e741d37d-6119-4d37-8b91-519ecfad8397 | 271139d7-c29c-47df-ad11-26bbc3373c4c | 167dc5ee-0831-4fa8-a7ca-8db6d2b3d4ea | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| 0910d714-027b-40b0-acf3-140f5ab90837 | 9317c6ca-b767-49f2-a56c-45b61bbee2d5 | 34f6b4fd-fa98-46b6-8ace-b6f76af50162 | CREATE | IN_PROGRESS | 2017-05-17T19:46:40Z | None |
| 508e585d-19ec-41e8-bc1e-049d8206f614 | 0917bf21-54f8-46ce-89c8-6461001840a5 | c8aa01f8-5627-46a4-885c-dc61e0886c09 | CREATE | IN_PROGRESS | 2017-05-17T19:46:40Z | None |
+--------------------------------------+--------------------------------------+--------------------------------------+--------+-------------+----------------------+------------------+

Next I can inspect a stuck or hung deployment using the command below. Note that the server_id above corresponds directly to a nova UUID.

$ heat deployment-show 05d3c2b7-104c-47fd-beeb-3819d55aafb7
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
 "status": "COMPLETE", 
 "server_id": "e1389b2e-ae2c-4095-8654-fd1cac1a8563", 
 "config_id": "1bfd85e8-217c-440f-b095-f4a2fcce4bfc", 
 "output_values": {
 "deploy_stdout": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDVWznMhZnwFKR9KH+7ppJr5f7ZI2LsxwZV6+xPkNOhNE5o2DDNGWGVoGE+2SIiENuhk938LRUixg4pj1coXl9johjv8h2PcB+6rUxVT7jSKGUdpwgTlLsNOEugcR5oJjf0mM55BMzz1nr1yZMTy9THGwU/A43RZcOVvuJXFVA3MIlYGXXUidEOQM6NMEloUpVd/hj2BXUzdaUJV83E0aSGPvED02LixV5uorY0n4OIk2EXC8LlG0863YwJEdQi7hd8/rczLwX18CFFAQJ2mOwkcuJB37V/BEWUqBczHO46e0uc1ZmHhtdPFKgFObgP8gSrT4g0HjOds/BVPgBTtQf5 tripleo-validations\n", 
 "deploy_stderr": "", 
 "deploy_status_code": 0
 }, 
 "creation_time": "2017-05-17T19:46:38Z", 
 "updated_time": "2017-05-17T19:49:26Z", 
 "input_values": {}, 
 "action": "CREATE", 
 "status_reason": "Outputs received", 
 "id": "05d3c2b7-104c-47fd-beeb-3819d55aafb7"

Deleting a Zombie Overcloud

Sometimes it is impossible to delete a failed deploy. After one failed deploy attempt, the stack got stuck in the state below and refused to delete.

+--------------------------------------+------------+--------------------+----------------------+----------------------+
| id                                   | stack_name | stack_status       | creation_time        | updated_time         |
+--------------------------------------+------------+--------------------+----------------------+----------------------+
| 34fa8bd8-b80b-4379-925f-26465e366da7 | overcloud  | DELETE_IN_PROGRESS | 2018-07-06T15:58:09Z | 2018-07-06T16:15:57Z |
+--------------------------------------+------------+--------------------+----------------------+----------------------+

I was given this command to kill it:

# openstack stack list --nested --column ID --format value | xargs openstack stack delete --yes

This finally allowed the stack to delete.

[stack@tpavcpp4r1v4uc ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+----+------------+--------------+---------------+--------------+
| id | stack_name | stack_status | creation_time | updated_time |
+----+------------+--------------+---------------+--------------+
+----+------------+--------------+---------------+--------------+

If you run into the error below when attempting to redeploy:

Exception creating plan: Unable to create plan. The Mistral environment already exists

you will need to run the command below to delete the Mistral plan.

# openstack overcloud plan delete overcloud

Additional Resources

https://wiki.openstack.org/wiki/Heat

http://hardysteven.blogspot.com/2015/04/debugging-tripleo-heat-templates.html

 

Ceph: Troubleshooting Failed OSD Creation

Introduction to Ceph

According to Wikipedia, "Ceph is a free software storage platform designed to present object, block, and file storage from a single distributed computer cluster. Ceph's main goals are to be completely distributed without a single point of failure, scalable to the exabyte level, and freely-available."

More information pertaining to Ceph can be found here.

Lab Buildout

In my homelab I am building out a small Ceph cluster for testing and learning purposes. My small cluster consists of 4 virtual machines, as shown below. I plan to use this cluster primarily as a backend for OpenStack.

Monitor Servers (count: 1)

  • CPU: 2
  • Memory (GB): 2
  • Primary Disk (GB): 16

OSD Servers (count: 3)

  • CPU: 2
  • Memory (GB): 2
  • Primary Disk (GB): 16
  • OSD Disks (GB): 3 x 10
  • SSD Journal (GB): 6

Troubleshooting OSD Creation

On my monitor server, which is also serving as my admin node, I run the following command to remove all partitioning on the disks that I intend to use for Ceph.

# for disk in sdb sdc sdd sde; do ceph-deploy disk zap osd01:/dev/$disk; done

Next I run the command below to prepare each OSD and specify the journal disk to use for each OSD. This command "should" create a partition on each OSD disk, format and label it as a Ceph disk, and then create a journal partition for each OSD on the journal disk (sde in this case).

# ceph-deploy osd prepare osd01:sdb:sde osd01:sdc:sde osd01:sdd:sde

Unfortunately, this command kept failing, stating that it was unable to create some of the partitions, while it did manage to create partitions on some of the disks and mount them locally. This left my OSDs in a bad state, as running the command again would throw all sorts of errors. So I figured I would start over and run the zap command again. However, now that command was also failing with errors, since some of the disks were mounted and Ceph was running.

The next step was to ssh into the OSD server, aptly named osd01, and stop Ceph.

# /etc/init.d/ceph stop

Then unmount any OSDs that were mounted.

# umount /var/lib/ceph/osd/ceph-7 /var/lib/ceph/osd/ceph-8 /var/lib/ceph/osd/ceph-9

Then, using fdisk, delete any existing partitions; this seemed to be necessary to remove the partitions created on the SSD journal disk. Next, run partx to force the OS to re-read the partition table on each disk.

# for disk in sdb sdc sdd sde; do partx -a /dev/$disk; done

At this point I was able to log back into the admin node and re-run the prepare command.
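
If you would rather not delete the partitions interactively with fdisk, something along these lines should also wipe the partition table on the journal disk. This is destructive, so double-check the device name first:

# sgdisk --zap-all /dev/sde
# partprobe /dev/sde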

Additional Troubleshooting

So, apparently this was not the end of all my woes. I ran into the same issue on my second OSD server, osd02. The first thing I did was ssh into the OSD server and run the command below.

[root@osd02 ceph]# /etc/init.d/ceph status
=== osd.3 ===
osd.3: not running.
=== osd.13 ===
osd.13: running {"version":"0.94.1"}
=== osd.14 ===
osd.14: running {"version":"0.94.1"}

So I stopped Ceph.

[root@osd02 ceph]# /etc/init.d/ceph stop
=== osd.14 ===
Stopping Ceph osd.14 on osd02...kill 224396...kill 224396...done
=== osd.13 ===
Stopping Ceph osd.13 on osd02...kill 223838...kill 223838...done
=== osd.3 ===
Stopping Ceph osd.3 on osd02...done

Then I unmounted osd.3.

[root@osd02 ceph]# umount /var/lib/ceph/osd/ceph-3

Then I locally prepared osd.3, where /dev/sdb is the OSD disk and /dev/sde is the journal disk.

[root@osd02 ceph]# ceph-disk -v prepare --fs-type xfs --cluster ceph -- /dev/sdb /dev/sde

I then verified that I had three Ceph journal partitions on my SSD.
[root@osd02 ceph]# fdisk -l /dev/sde
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/sde: 6442 MB, 6442450944 bytes, 12582912 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt

#         Start          End    Size  Type            Name
1         2048      4098047      2G  unknown         ceph journal
2      4098048      8194047      2G  unknown         ceph journal
3      8194048     12290047      2G  unknown         ceph journal

Then I checked my OSDs again. All were running.

[root@osd02 ceph]# /etc/init.d/ceph status
=== osd.13 ===
osd.13: running {"version":"0.94.1"}
=== osd.14 ===
osd.14: running {"version":"0.94.1"}
=== osd.18 ===
osd.18: running {"version":"0.94.1"}

Packstack Installer Failure: "Error: Could not start Service[rabbitmq-server]: Execution of '/usr/bin/systemctl start rabbitmq-server' returned 1"

Sitting in my hotel room today, I kept running into this error while trying to install OpenStack on a RHEL 7.1 VM running on my laptop. Digging through logs was not helping me one bit, and neither was trying to run “puppet apply” on the failing puppet manifests to see if I could get more info with which to troubleshoot.

Below is the specific error that I was running into. Note that my RHEL VM's IP address is 192.168.122.75; this IP address is prepended to the puppet module names. Your output will, obviously, vary. Also note that this output is truncated.

Applying 192.168.122.75_amqp.pp
Applying 192.168.122.75_mariadb.pp
192.168.122.75_amqp.pp: [ ERROR ]
Applying Puppet manifests [ ERROR ]

ERROR : Error appeared during Puppet run: 192.168.122.75_amqp.pp
Error: Could not start Service[rabbitmq-server]: Execution of '/usr/bin/systemctl start rabbitmq-server' returned 1: Job for rabbitmq-server.service failed. See 'systemctl status rabbitmq-server.service' and 'journalctl -xn' for details.
You will find full trace in log /var/tmp/packstack/20150415-183003-mn6Kfx/manifests/192.168.122.75_amqp.pp.log
Please check log file /var/tmp/packstack/20150415-183003-mn6Kfx/openstack-setup.log for more information
Additional information:

Each and every time, the failure occurred when the installer was trying to install and start rabbitmq-server via the puppet module amqp.pp. Attempting to start rabbitmq manually yielded the same result.
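
If you want to reproduce the failure by hand and look at the service's own logs, something like the following should do it (the unit name comes from the error above):

# systemctl start rabbitmq-server
# systemctl status rabbitmq-server.service
# journalctl -u rabbitmq-server -xn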

In this instance, I was trying to be fancy and I had given my VM the hostname packstack01.local (instead of sticking with localhost).

[root@packstack01 20150415-183254-Kv8u6k]# hostnamectl
Static hostname: packstack01.local
Icon name: computer
Chassis: n/a
Machine ID: ca64b7fb0c9d4459a4d313dd17b19d76
Boot ID: fc3397657ed040fca72f3d229d014b74
Virtualization: kvm
Kernel: Linux 3.10.0-229.1.2.el7.x86_64
Architecture: x86_64

Fresh out of any good ideas, I noticed that a simple nslookup on my made up hostname actually returned results. Results that I would not have expected to be valid.

[root@packstack01 20150415-183254-Kv8u6k]# nslookup packstack01.local
Server: 192.168.1.1
Address: 192.168.1.1#53

Name: packstack01.local.local
Address: 198.105.244.104
Name: packstack01.local.local
Address: 198.105.254.104

Despite my never referencing the made-up hostname in my answer file (by default, the answer file is generated with IP addresses only), the RabbitMQ service was attempting to connect to itself via hostname. This obviously failed: the name resolved to a valid, routable IP address that was not my VM's, and since I was working in a hotel room without proper DNS, my server was effectively trying to connect to a machine on the opposite side of the country.

A quick bit of tinkering in the /etc/hosts file resolved this issue, and I was able to complete my install.
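
For reference, the fix amounted to making the hostname resolve locally instead of through the hotel's DNS. An entry along these lines (adjust the IP and hostname to match your VM) is all it takes:

# echo "192.168.122.75  packstack01.local packstack01" >> /etc/hosts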

Note that there are probably many other reasons why one might run into this error during an OpenStack install via Packstack, however this is the one that I ran into, and thankfully it was easy to fix.

Note to self – always use localhost when working without a valid DNS entry.