Red Hat OpenStack Platform 13: five things you need to know about networking

via Red Hat OpenStack Platform 13: five things you need to know about networking


Virtualize your OpenStack control plane with Red Hat Virtualization and Red Hat OpenStack Platform 13

via Virtualize your OpenStack control plane with Red Hat Virtualization and Red Hat OpenStack Platform 13

OpenStack: 9 tips to properly configure your OpenStack Instances


Qcow vs Raw, Performance Tweaks, Cloud-init, and a short guide on Kernel Tuning – courtesy of redhatstackblog.redhat.com

via 9 tips to properly configure your OpenStack Instance

OpenStack: Deleting Zombie Cinder Volumes and VMs


First off let me start by saying that the new Cinder logo is wonderful. Nothing helps me think of backend storage better than the backend of a horse.

In an environment I am working in, we have a large number of cinder volumes that are in error state, due to the backend storage being ripped out. The volumes were not deleted, nor were they detached from the VMs.

The end result: you cannot delete the zombie VM (as it has an attached volume), and you cannot delete the zombie/orphaned volume (as it is attached to a VM).

The following process allows you to work around the chicken-and-egg scenario above.

First we get a list of all volumes in error state.

# openstack volume list --all | grep -i error

Then we take a closer look at the volume to see if it exists/existed on the backend that was removed.

# openstack volume show 05b372ef-ee45-499b-9676-72cc4170e1b3

We check the volume's backend host to confirm it lives on the affected backend; in this case it does.

| os-vol-host-attr:host | hostgroup@dellsc#openstack_dellsc

We also check for any current attachments. Below we see that this volume is attached to a VM with the UUID shown.

| attachments | [{u'server_id': u'd142eb4b-823d-4abd-95a0-3b02a3194c9f',

Now we reset the state of the volume so that it is no longer in an error state.

# cinder reset-state --state available 05b372ef-ee45-499b-9676-72cc4170e1b3

Now we detach the volume via cinder.

# cinder reset-state --attach-status detached 05b372ef-ee45-499b-9676-72cc4170e1b3

Now we are free to delete the volume.

# openstack volume delete 05b372ef-ee45-499b-9676-72cc4170e1b3

Confirm volume deletion

# openstack volume show 05b372ef-ee45-499b-9676-72cc4170e1b3
No volume with a name or ID of '05b372ef-ee45-499b-9676-72cc4170e1b3' exists

Now we can delete the VM.

# openstack server delete d142eb4b-823d-4abd-95a0-3b02a3194c9f

And now we confirm its deletion.

# openstack server show d142eb4b-823d-4abd-95a0-3b02a3194c9f
No server with a name or ID of 'd142eb4b-823d-4abd-95a0-3b02a3194c9f' exists.
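
If you have many volumes stuck in this state, the same steps can be scripted. Below is a minimal sketch that assumes every volume currently in error state should be reset, detached, and deleted; review the list from the first command before running anything like this in your own environment.

# Reset, detach, and delete every volume currently in error state.
for vol in $(openstack volume list --all | grep -i error | awk '{print $2}'); do
    cinder reset-state --state available "$vol"
    cinder reset-state --attach-status detached "$vol"
    openstack volume delete "$vol"
done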

OpenStack: Mapping Ironic Hostnames to Nova Hostnames


The Hostname Problem

When deploying OpenStack via Red Hat OSP director, you configure the hostname of your baremetal (Ironic) nodes at import time. This is done via a JSON file, by default named instackenv.json (but often renamed to nodes.json). Below is an excerpt from that file.

{
  "nodes": [
    {
      "arch": "x86_64",
      "cpu": "4",
      "disk": "40",
      "mac": [
        "58:8a:5a:e6:c0:40"
      ],
      "memory": "6144",
      "name": "fatmin-ctrl0",
      "pm_addr": "10.10.1.100",
      "pm_password": "Mix-A-Lot",
      "pm_type": "pxe_ipmitool",
      "pm_user": "sir"
    }
  ]
}

In the sample above, I am importing a node named "fatmin-ctrl0". This will be the server name as it appears in Ironic. When Heat deploys the overcloud, this node will by default be renamed overcloud-controller-0, and any additional controller nodes will increment that index by 1. The same applies to compute nodes.

It is preferable to configure what are referred to as "Predictable Hostnames". With predictable hostnames, we can do one of two things.

  1. Specify the hostname format to use and allow nova to iterate through nodes on its own.
  2. Specify the exact hostname for nova to use for each baremetal node.

Nova Scheduler Hints

Before we can use either of the two options above, we must first update each baremetal node with a Nova scheduler hint. In the examples below we are tagging one node to build as controller-0 (overcloud-controller-0) and one node to build as compute-0 (overcloud-compute-0).

For Controllers: Repeat for each controller

# ironic node-update <id> replace properties/capabilities="node:controller-0,boot_option:local"

For Compute Node: Repeat for each compute node

# ironic node-update <id> replace properties/capabilities="node:compute-0,boot_option:local"
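
If you have several nodes of the same role, the tagging can be scripted. The rough sketch below assumes your controller node names in Ironic contain the string "ctrl" (as in the import file above); adjust the grep to match your own naming.

# Tag each controller node with an incrementing node index.
i=0
for node in $(ironic node-list | grep ctrl | awk '{print $2}'); do
    ironic node-update "$node" replace properties/capabilities="node:controller-${i},boot_option:local"
    i=$((i+1))
done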

You will then need to set your Nova scheduler hints in an environment file:

parameter_defaults:
  ControllerSchedulerHints:
    'capabilities:node': 'controller-%index%'
  ComputeSchedulerHints:
    'capabilities:node': 'compute-%index%'

FYI, the same process can be used with the following scheduler hint parameters:

  • ControllerSchedulerHints
  • ComputeSchedulerHints
  • BlockStorageSchedulerHints
  • ObjectStorageSchedulerHints
  • CephStorageSchedulerHints

Custom Nova Hostname Format

Referring to option 1 shown above, we can set a specific format to be used for hostnames instead of the default.

ControllerHostnameFormat: 'fatmin-controller-%index%'
ComputeHostnameFormat: 'fatmin-compute-%index%'

Using the method above, the first controller node will be named fatmin-controller-0 and the first compute node will be named fatmin-compute-0. Additional nodes will increment the index.

While this is nice, as it allows us to set a customized hostname format for each type of node, it does not allow us to specify the exact hostname to be used for a specific Ironic node. We can do that with the HostnameMap.

HostnameMap

Now you may want to take this a bit further and use a custom Nova name for each compute/controller node. You can accomplish this using a HostnameMap as shown below.

HostnameMap:
  overcloud-controller-0: fatmin-controller-0
  overcloud-controller-1: fatmin-controller-1
  overcloud-controller-2: fatmin-controller-2
  overcloud-compute-0: fatmin-compute-0
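
Putting the pieces together, the scheduler hints and the HostnameMap can live in a single environment file that you pass to the deploy command with -e. Below is a minimal sketch; the filename hostname-map.yaml is just an example, and if you prefer the HostnameFormat parameters from option 1, they would go in the same kind of file.

# hostname-map.yaml (example filename)
parameter_defaults:
  ControllerSchedulerHints:
    'capabilities:node': 'controller-%index%'
  ComputeSchedulerHints:
    'capabilities:node': 'compute-%index%'
  HostnameMap:
    overcloud-controller-0: fatmin-controller-0
    overcloud-controller-1: fatmin-controller-1
    overcloud-controller-2: fatmin-controller-2
    overcloud-compute-0: fatmin-compute-0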

 

Note: when specifying the flavor profiles in the deploy command for preassigned nodes, they should be specified as "baremetal" instead of "control" and "compute". This means that you will not have to assign a profile to each host; you let the Nova scheduler hints handle the decision.

--control-flavor baremetal \
--compute-flavor baremetal \
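
For context, here is a sketch of how those flags might look in a full deploy command. The template path, scales, NTP server, and the hostname-map.yaml environment file are placeholders from the examples above and will differ in your environment.

# openstack overcloud deploy --templates ~/my_templates/ \
  -e ~/my_templates/hostname-map.yaml \
  --control-flavor baremetal \
  --compute-flavor baremetal \
  --control-scale 3 --compute-scale 1 \
  --ntp-server 10.1.0.1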

So at this point we are able to align the controller or compute index in Ironic with the index in Nova. For example, you can now map your Ironic node name fatmin-ctrl0 to the Nova name fatmin-controller-0.

Special Notes for Special People

  1. I do not suggest setting the Nova name to exactly the same name that you defined for the Ironic name. While the indexes should match, the name formats should vary enough that you can easily tell whether you are looking at a Nova name or an Ironic name.
  2. The use of HostnameMap will easily facilitate the replacement of a failed node, so that you can provision the new node with the same Nova name that was used by the original node before its premature death. Otherwise, Nova will blacklist the Nova name of the failed node. For example, if controller0 dies and you need to replace and redeploy it, it will end up being named controller4, since that is the next number in the index.

 

 

OpenStack Heat and os-collect-config


os-collect-config

os-collect-config is a tool that starts via systemd when a system boots. It initially runs at boot time and then continues to run, watching for changes in Heat metadata. In a nutshell, os-collect-config is responsible for monitoring and downloading metadata from the Heat API.

When data changes, os-collect-config makes a call to os-refresh-config. This data provides the node with all of the information it needs to make configuration changes to the host; the data is node-specific.

os-collect-config polls for data from its sources (nova metadata and Heat) and stores it in /var/lib/os-collect-config/.
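
A quick way to confirm that os-collect-config is running on a node, and to watch it poll, is via systemd. Run these on the overcloud node itself.

# systemctl status os-collect-config
# journalctl -u os-collect-config -f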

Below is an example of the contents of that directory:

-rw-------. 1 root root  42412 Jul 23 14:40 ComputeAllNodesDeployment.json
-rw-------. 1 root root  42412 Jun  4 20:56 ComputeAllNodesDeployment.json.last
-rw-------. 1 root root  35447 Feb 20  2017 ComputeAllNodesDeployment.json.orig
-rw-------. 1 root root  23852 Jul 23 14:40 ComputeHostsDeployment.json
-rw-------. 1 root root  23852 Jun  4 20:07 ComputeHostsDeployment.json.last
-rw-------. 1 root root   8074 Feb 20  2017 ComputeHostsDeployment.json.orig
-rw-------. 1 root root   1071 Jul 23 14:40 ec2.json
-rw-------. 1 root root   1071 Feb 20  2017 ec2.json.last
-rw-------. 1 root root   1071 Feb 20  2017 ec2.json.orig
-rw-------. 1 root root    441 Feb 20  2017 heat_local.json
-rw-------. 1 root root    441 Feb 20  2017 heat_local.json.last
-rw-------. 1 root root    441 Feb 20  2017 heat_local.json.orig
-rw-------. 1 root root   2635 Jul 23 14:40 NetworkDeployment.json
-rw-------. 1 root root   2635 Mar  1  2017 NetworkDeployment.json.last
-rw-------. 1 root root   2636 Feb 20  2017 NetworkDeployment.json.orig
-rw-------. 1 root root  13259 Jul 23 14:40 NovaComputeDeployment.json
-rw-------. 1 root root  13259 Jan 16  2018 NovaComputeDeployment.json.last
-rw-------. 1 root root  11069 Feb 20  2017 NovaComputeDeployment.json.orig
-rw-------. 1 root root    311 Jun  4 22:02 os_config_files.json
-rw-------. 1 root root 252762 Jul 23 14:40 request.json
-rw-------. 1 root root 252762 Jun  4 22:07 request.json.last
-rw-------. 1 root root  23440 Feb 20  2017 request.json.orig

You will usually find three versions of each config: the current file, the original (.orig), and the last (.last, i.e. the previous version).

You can view this metadata using Python, as shown below.

# python -m json.tool ComputeAllNodesDeployment.json
{
    "hiera": {
        "datafiles": {
            "all_nodes": {
                "mapped_data": {
                    "ca_certs_enabled": "true",
                    "ca_certs_short_node_names": [
…trunc…
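
Because the previous version of each file is kept, a simple diff shows exactly what changed in the last metadata update. A sketch using the NetworkDeployment files from the listing above:

# cd /var/lib/os-collect-config
# diff <(python -m json.tool NetworkDeployment.json.last) <(python -m json.tool NetworkDeployment.json)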

os-refresh-config

os-refresh-config is called by os-collect-config once it recognizes that metadata has changed within the Heat API for that specific node.

The important steps that os-refresh-config takes are shown below:

1) Apply sysctl configurables
2) Run os-apply-config (see below)
3) Configure the networking for the host
4) Download and set the hieradata files for puppet parameters
5) Configure /etc/hosts
6) Deploy software configuration with Heat

os-apply-config

os-apply-config is called by os-refresh-config to set up configuration files on specific nodes. It is called via '/usr/libexec/os-refresh-config/configure.d/20-os-apply-config'.

As it does with the undercloud deploy, os-refresh-config executes scripts under /usr/libexec/os-refresh-config/ in a specific order based on their numbering.

First, scripts within the pre-configure.d/ directory are run, then configure.d/ scripts are applied, and finally scripts in post-configure.d/.

It is within these scripts that the metadata downloaded by os-collect-config will be acted upon.

Any call to os-apply-config uses the files in /var/lib/os-collect-config as its configuration source.

The appropriate script files for doing so are as follows:

overcloud$ ll /usr/libexec/os-refresh-config/configure.d/
total 32
-rwxr-xr-x. 1 root root 396 Aug 5 07:31 10-sysctl-apply-config
-rwxr-xr-x. 1 root root 42 Aug 5 07:31 20-os-apply-config
-rwxr-xr-x. 1 root root 189 Aug 5 07:31 20-os-net-config
-rwxr-xr-x. 1 root root 629 Aug 5 07:31 25-set-network-gateway
-rwxr-xr-x. 1 root root 2265 Aug 5 07:31 40-hiera-datafiles
-rwxr-xr-x. 1 root root 1387 Aug 5 07:31 51-hosts
-rwxr-xr-x. 1 root root 5784 Aug 5 07:31 55-heat-config

If we run os-apply-config manually, we can see that it does the following:

overcloud$ sudo sh /usr/libexec/os-refresh-config/configure.d/20-os-apply-config
[2015/08/07 01:17:40 PM] [INFO] writing /etc/os-net-config/config.json
[2015/08/07 01:17:40 PM] [INFO] writing /var/run/heat-config/heat-config
[2015/08/07 01:17:40 PM] [INFO] writing /etc/puppet/hiera.yaml
[2015/08/07 01:17:40 PM] [INFO] writing /etc/os-collect-config.conf
[2015/08/07 01:17:40 PM] [INFO] success

 

os-net-config

The directory /etc/os-net-config/ holds the config.json file that is used to modify the networking configuration on each host. The config found in this file is derived from the os-collect-config data in /var/lib/os-collect-config/.

Again, you can use this file to review your networking configuration and compare it against your templates.
# python -m json.tool config.json
{
    "network_config": [
        {
            "addresses": [
                {
                    "ip_netmask": "172.20.4.113/24"
                }
            ],
            "dns_servers": [
                "96.239.250.57",
                "96.239.250.58"
            ],
            "name": "em3",
            "routes": [
                {
                    "ip_netmask": "169.254.169.254/32",
                    "next_hop": "172.20.4.20"
                }
            ],
            "type": "interface",
            "use_dhcp": false
        },
        {
            "members": [
                {
                    "bonding_options": "mode=4 lacp_rate=1 updelay=1000 miimon=50",
                    "members": [
                        {
                            "mtu": 9216,
                            "name": "em1",
                            "primary": true,
                            "type": "interface"
                        },
                        {
                            "mtu": 9216,
                            "name": "em2",
                            "type": "interface"
                        }
                    ],
                    "mtu": 9216,
                    "name": "bond1",
                    "type": "linux_bond"
                },
                {
                    "addresses": [
                        {
                            "ip_netmask": "172.20.3.30/24"
                        }
                    ],
                    "device": "bond1",
                    "mtu": 9000,
                    "type": "vlan",
                    "vlan_id": 52

…trunc…
OpenStack: Introduction to Troubleshooting Heat


Introduction to Heat

Heat is the main orchestration engine for OpenStack, and it is used by OpenStack director to install an OpenStack overcloud environment.

When we run the "openstack overcloud deploy" command, we are specifically telling RHEL OSP director that we want it to use the pre-defined Heat templates from /usr/share/openstack-tripleo-heat-templates/. OSP director will manage the deployment of a new overcloud Heat stack, using files from this directory. When RHEL OSP director calls the Heat stack, it needs the following data:

  • A top-level Heat template to use that describes the overall environment and the
    resources required.
  • An environment/resource registry to tell Heat where to find resource
    definitions for non-standard Heat elements, e.g. TripleO components.
  • A set of parameters to declare the deployment-specific options (via -e); a minimal sketch of such an environment file follows this list.
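
As a point of reference, an environment file passed with -e typically contains a resource_registry section, a parameter_defaults section, or both. Below is a minimal sketch of what a file such as advanced-networking.yaml (used in the deploy command later in this post) might contain; the NIC config paths and parameter values are placeholders, not the actual contents of my file.

resource_registry:
  OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/my_templates/nic-configs/controller.yaml
  OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/my_templates/nic-configs/compute.yaml

parameter_defaults:
  NtpServer: 10.1.0.1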

 

The most important files for us to focus on are in our deployment directory; these are the default files that get called by OSP director.

  • The top-level Heat template that OSP director uses for deployment is /usr/share/openstack-tripleo-heat-templates/overcloud-without-mergepy.yaml
  • The resource registry, which tells Heat where to find the templates for deployment resources, is /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml

Creating a Heat Stack

To create the stack we run the command below. This command instructs Heat to use the templates in ~/my_templates/, as well as the override templates specified with the '-e' option.

This is just an example of what I am using in my lab environment; your deploy command will likely be much different. Also note that I have copied the templates from /usr/share/openstack-tripleo-heat-templates to ~/my_templates/.

# openstack overcloud deploy --debug --templates ~/my_templates/ \
  --ntp-server 10.1.0.1 --control-scale 3 --compute-scale 2 \
  -e ~/my_templates/advanced-networking.yaml

Troubleshooting a Failed Heat Stack

Unfortunately our deploy failed with the following errors.

Exception: Heat Stack create failed.
DEBUG: openstackclient.shell clean_up DeployOvercloud
DEBUG: openstackclient.shell got an error: Heat Stack create failed.
ERROR: openstackclient.shell Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/openstackclient/shell.py", line 176, in run
    return super(OpenStackShell, self).run(argv)
  File "/usr/lib/python2.7/site-packages/cliff/app.py", line 230, in run
    result = self.run_subcommand(remainder)
  File "/usr/lib/python2.7/site-packages/cliff/app.py", line 295, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python2.7/site-packages/cliff/command.py", line 53, in run
    self.take_action(parsed_args)
  File "/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py", line 864, in take_action
    self._deploy_tripleo_heat_templates(stack, parsed_args)
  File "/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py", line 535, in _deploy_tripleo_heat_templates
    parsed_args.timeout)
  File "/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py", line 478, in _heat_deploy
    raise Exception("Heat Stack create failed.")
Exception: Heat Stack create failed.

We can verify that the deploy failed with the command below.

[stack@undercloud] # heat stack-list
+--------------------------------------+------------+---------------+----------------------+
| id                                   | stack_name | stack_status  | creation_time        |
+--------------------------------------+------------+---------------+----------------------+
| ce993847-b0ee-4ea2-ac15-dc0ddc81825a | overcloud  | CREATE_FAILED | 2016-02-29T20:40:54Z |
+--------------------------------------+------------+---------------+----------------------+

Since the stack deploy has failed, let's take a closer look at the stack resources and see if we can determine which resources failed.

Here we will make things simple by viewing only failed resources.

[stack@undercloud] # heat resource-list overcloud | grep -i failed
| Compute    | c032c668-755f-422f-8ad1-4abf46b022ff | OS::Heat::ResourceGroup | CREATE_FAILED | 2016-02-29T20:40:55Z |
| Controller | 668d27e0-9ab1-4dbe-8445-1d1ee8839265 | OS::Heat::ResourceGroup | CREATE_FAILED | 2016-02-29T20:40:55Z |

The failed resources are named "Compute" and "Controller". Let's take a closer look at those using the "resource-show" argument.

# heat resource-show overcloud Compute

| resource_status_reason | ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: StackValidationFailed: Property error : OsNetConfigImpl: config The Parameter (BondInterfaceOvsOptions) was not provided."" |

Let's now do the same for Controller.

# heat resource-show overcloud Controller

| resource_status_reason | ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: StackValidationFailed: Property error : OsNetConfigImpl: config The Parameter (BondInterfaceOvsOptions) was not provided."" |

Apparently I have some issues with my OVS bonding options, so I need to get those straight before I can continue.
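
The fix is to actually provide the missing parameter in one of the environment files passed with -e. A minimal sketch is below; the right bonding options depend entirely on your NICs and switch configuration, so treat the value as an example only.

parameter_defaults:
  BondInterfaceOvsOptions: "bond_mode=balance-slb"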

Deleting a Failed Heat Stack

Since our last deploy failed, we need to delete the failed stack before we can kick off another stack deploy. Below is an example of that command; note that we are using the UUID of the stack.

 

[stack@vz-undercloud] # heat stack-delete 2b0da4f6-e6f8-41cd-89e8-bf070d0e0d15
+--------------------------------------+------------+--------------------+----------------------+
| id                                   | stack_name | stack_status       | creation_time        |
+--------------------------------------+------------+--------------------+----------------------+
| 2b0da4f6-e6f8-41cd-89e8-bf070d0e0d15 | overcloud  | DELETE_IN_PROGRESS | 2016-03-01T17:21:58Z |
+--------------------------------------+------------+--------------------+----------------------+

then…

[stack@vz-undercloud] # heat stack-list
+--------------------------------------+------------+---------------+----------------------+
| id                                   | stack_name | stack_status  | creation_time        |
+--------------------------------------+------------+---------------+----------------------+
| 2b0da4f6-e6f8-41cd-89e8-bf070d0e0d15 | overcloud  | DELETE_FAILED | 2016-03-01T17:21:58Z |
+--------------------------------------+------------+---------------+----------------------+

Now let's kick off another deploy.

# openstack overcloud deploy --debug --templates ~/my_templates/ \
  --ntp-server 10.1.0.1 --control-scale 3 --compute-scale 2 \
  -e ~/my_templates/advanced-networking.yaml

Unfortunately, this deploy failed as well.

OK, let's take a look at /var/log/heat/heat-engine.log for more details. I also suggest opening another SSH session and tailing the log while the delete is attempting to do its thing.

If the output is too verbose to follow, I suggest thinning it out with the command below:

# tail -f /var/log/heat/heat-engine.log | egrep 'error|fatal'

This led me to the following error.

2016-03-01 13:46:12.366 18554 ERROR heat.engine.resource [-] Error marking resource as failed
2016-03-01 13:46:12.366 18554 TRACE heat.engine.resource DBConnectionError: (_mysql_exceptions.OperationalError) (2003, "Can't connect to MySQL server on '172.16.0.10' (111)")

MySQL is down? So now we need to look at the MariaDB logs, where we see the following.

160301 12:16:24 [Warning] Failed to setup SSL
160301 12:16:24 [Warning] SSL error: SSL_CTX_set_default_verify_paths failed

Apparently SELinux is blocking the reads of the certificates. There are two ways to work around this issue. You can run "restorecon -v /path/to/certs/", or you can disable SELinux entirely by running "setenforce 0" or by editing the /etc/selinux/config file and setting 'SELINUX=disabled'. You may need to rerun the delete; in my case it was stuck in "DELETE_IN_PROGRESS", so I restarted all Heat-related services to force the delete to error.

# systemctl restart openstack-heat-engine.service openstack-heat-api.service \
  openstack-heat-api-cloudwatch.service openstack-heat-api-cfn.service

This will cause the delete to error. You can then retry the delete.

If the delete is taking a long time, you can dig a bit deeper into the delete
using the command below.

# heat resource-list overcloud

Now drill down more with the command below.

# heat event-list overcloud

Make note of the resource_name and its id and use them in the next command. Note that the stack name is still overcloud.

# heat event-show overcloud Compute d9e13b02-07b0-4beb-8442-f25de0e7ef8b

I have found that rebooting the undercloud will clear out any in-progress
tasks, you can then run the delete again.

You can also try to manually delete each node from Ironic by mimicking what Nova's Ironic driver does. This is shown below for reference.

$ ironic node-set-provision-state <node uuid> deleted

And to remove the instance_uuid

$ ironic node-update <node uuid> remove instance_uuid
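
If you need to do this for every node in the environment (for example, after giving up on a stuck overcloud delete), the two commands can be wrapped in a loop. A rough sketch; this touches every registered node, so only use it when you really intend to wipe the whole overcloud.

# Walk every Ironic node: mark it deleted and clear its instance UUID.
for node in $(ironic node-list | awk '{print $2}' | grep -E '^[0-9a-f-]{36}$'); do
    ironic node-set-provision-state "$node" deleted
    ironic node-update "$node" remove instance_uuid
done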

Troubleshooting Failed and In-Progress Deployments

Updated 6/28/2018

While your deploy is running, you can watch for stuck or failed deployments.

$ heat deployment-list | grep -vi complete
WARNING (shell) "heat deployment-list" is deprecated, please use "openstack software deployment list" instead
+--------------------------------------+--------------------------------------+--------------------------------------+--------+-------------+----------------------+------------------+
| id | config_id | server_id | action | status | creation_time | status_reason |
+--------------------------------------+--------------------------------------+--------------------------------------+--------+-------------+----------------------+------------------+
| 143b25ce-9c1e-48b0-bd92-2569640b5208 | 5b9843a5-2092-4c75-8eae-2e042d7b34d2 | f08232e7-7a14-4885-86ab-443b4581afe6 | CREATE | IN_PROGRESS | 2017-05-17T19:46:38Z | None |
| 27d2a61e-e5cd-4313-967d-cc7c385320cb | c113ead1-bb9a-46ca-a992-4527c2e9fda1 | 67dc5457-5ae7-4290-ad77-36bf8bc81d56 | CREATE | IN_PROGRESS | 2017-05-17T19:46:38Z | None |
| fabab6ae-8848-4a35-8066-d83f82335e95 | 763c4760-3e8c-43c1-8afd-d1b8d6bbb55b | e51061ed-f96f-43ec-b18a-4dc079d961d8 | CREATE | IN_PROGRESS | 2017-05-17T19:46:38Z | None |
| 0b6993c1-a320-4dc7-980c-e789d87a0c84 | 50f1e303-123a-451b-9d86-f29c32e6ce0b | 8d87818a-943f-4f69-8112-7450dfa26bb7 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| 0e537823-f17d-4009-b2e4-6d11adcdfeb7 | 1d7938ad-a6e5-4d8f-9604-489408295a59 | 98847475-ed1c-44b2-aa4e-bbddfe81a1d1 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| 1d89b9cf-87cf-4050-a00f-496c82fd9432 | 4f378bc3-10a7-4a47-b269-2c7de5d75dd7 | 69257ad8-a3ee-453b-a532-db3c293a34b9 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| 3d5915a6-1470-4ffe-a0f1-a356df33e1f9 | dbc8c62c-f5cc-4cfa-93af-ef7b9f02fd19 | 75b49872-c1fb-4c83-87d1-579dcd027bc8 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| a2e57b50-dc03-42a8-b851-b33ade36d591 | 0c7133d8-20bc-481d-b9c5-2d0867a656e7 | 000794e9-4ce2-4c8a-aa69-02aadfd19ce6 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| ab463b0f-fd6e-4169-9283-3e8bbcc1bb4b | 627f1288-3823-4e21-b61d-4bafcbdc446a | b6cf41e8-e60c-43ca-a3ed-d24c078031c8 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| ad05853f-2e70-4f11-b8c0-26d2d4c324fb | 8cd656e7-129f-40d4-a1bc-ed0181b9e768 | 24017696-102d-43ab-a101-f3a33a091faa | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| b6418aec-385a-4b6f-845b-84c87fa10be7 | 04747e35-f0cc-45a4-b251-78ea61a08c83 | ffe36a72-2293-48eb-93ed-bfe1077a0245 | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| e741d37d-6119-4d37-8b91-519ecfad8397 | 271139d7-c29c-47df-ad11-26bbc3373c4c | 167dc5ee-0831-4fa8-a7ca-8db6d2b3d4ea | CREATE | IN_PROGRESS | 2017-05-17T19:46:39Z | None |
| 0910d714-027b-40b0-acf3-140f5ab90837 | 9317c6ca-b767-49f2-a56c-45b61bbee2d5 | 34f6b4fd-fa98-46b6-8ace-b6f76af50162 | CREATE | IN_PROGRESS | 2017-05-17T19:46:40Z | None |
| 508e585d-19ec-41e8-bc1e-049d8206f614 | 0917bf21-54f8-46ce-89c8-6461001840a5 | c8aa01f8-5627-46a4-885c-dc61e0886c09 | CREATE | IN_PROGRESS | 2017-05-17T19:46:40Z | None |
+--------------------------------------+--------------------------------------+--------------------------------------+--------+-------------+----------------------+------------------+

Next, I can inspect a stuck/hung deployment using the command below. Note that the server_id above corresponds directly to a Nova instance UUID.

$ heat deployment-show 05d3c2b7-104c-47fd-beeb-3819d55aafb7
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
 "status": "COMPLETE", 
 "server_id": "e1389b2e-ae2c-4095-8654-fd1cac1a8563", 
 "config_id": "1bfd85e8-217c-440f-b095-f4a2fcce4bfc", 
 "output_values": {
 "deploy_stdout": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDVWznMhZnwFKR9KH+7ppJr5f7ZI2LsxwZV6+xPkNOhNE5o2DDNGWGVoGE+2SIiENuhk938LRUixg4pj1coXl9johjv8h2PcB+6rUxVT7jSKGUdpwgTlLsNOEugcR5oJjf0mM55BMzz1nr1yZMTy9THGwU/A43RZcOVvuJXFVA3MIlYGXXUidEOQM6NMEloUpVd/hj2BXUzdaUJV83E0aSGPvED02LixV5uorY0n4OIk2EXC8LlG0863YwJEdQi7hd8/rczLwX18CFFAQJ2mOwkcuJB37V/BEWUqBczHO46e0uc1ZmHhtdPFKgFObgP8gSrT4g0HjOds/BVPgBTtQf5 tripleo-validations\n", 
 "deploy_stderr": "", 
 "deploy_status_code": 0
 }, 
 "creation_time": "2017-05-17T19:46:38Z", 
 "updated_time": "2017-05-17T19:49:26Z", 
 "input_values": {}, 
 "action": "CREATE", 
 "status_reason": "Outputs received", 
 "id": "05d3c2b7-104c-47fd-beeb-3819d55aafb7"

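Since the server_id maps to a Nova instance, you can resolve it back to a node with a quick lookup (a sketch, using the server_id from the output above):

# openstack server show e1389b2e-ae2c-4095-8654-fd1cac1a8563 -c name -c status
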
Deleting a Zombie Overcloud

Sometimes it is impossible to delete a failed deploy. After attempting a deploy, the stack was stuck in the state below.

+--------------------------------------+------------+--------------------+----------------------+----------------------+
| id | stack_name | stack_status | creation_time | updated_time |
+--------------------------------------+------------+--------------------+----------------------+----------------------+
| 34fa8bd8-b80b-4379-925f-26465e366da7 | overcloud | DELETE_IN_PROGRESS | 2018-07-06T15:58:09Z | 2018-07-06T16:15:57Z |

I was given this command to kill it:

# openstack stack list --nested --column ID --format value | xargs openstack stack delete --yes

This finally allowed the stack to delete.

[stack@tpavcpp4r1v4uc ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+----+------------+--------------+---------------+--------------+
| id | stack_name | stack_status | creation_time | updated_time |
+----+------------+--------------+---------------+--------------+
+----+------------+--------------+---------------+--------------+

If you run into the error below when attempting to redeploy

Exception creating plan: Unable to create plan. The Mistral environment already exists

You will need to run the command below to delete the mistral plan

# openstack overcloud plan delete overcloud

Additional Resources

https://wiki.openstack.org/wiki/Heat

http://hardysteven.blogspot.com/2015/04/debugging-tripleo-heat-templates.html