Enabling SNMP in ESXi 4.1 using the Remote CLI


Based on the fact that ESX 4.1 is the last major release of ESX, I decided that I would make myself familiar with managing ESXi hosts.  Since I monitor all my hosts via Zenoss, I figured that I needed to get snmp up and running first.

So I first went out and installed the remote CLI for ESX on my Ubuntu desktop.  The RCLI can be downloaded here. The remote CLI allows you to run administrative commands against ESX/ESXi systems. It's available for Windows or Linux.

Configuring on ESXi 4.1, Licensed

First configure your community string, target, and port:

vicfg-snmp --server <ESXi_ip> -c <communityname> -p 161 -t <destination_host>@161/<communityname>

Then enable it using the command below:

vicfg-snmp --server <ESXi_ip> -E

Next verify your settings:

vicfg-snmp --server <ESXi_ip> -s

Now test your settings:

vicfg-snmp --server <ESXi_ip> -T
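If you have more than one host to set up, the four steps above can be wrapped in a small shell function. This is only a sketch: the `run` helper, the `configure_snmp` function, and the `DRY_RUN` variable are my own names, not part of the remote CLI. By default the commands are just printed so you can check them first. I am also assuming vicfg-snmp will prompt for credentials on each call.

```shell
# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: configure_snmp <ESXi_ip> <communityname> <destination_host>
configure_snmp() {
    host=$1
    community=$2
    target=$3
    # 1. set community, port, and target  2. enable  3. show  4. test
    run vicfg-snmp --server "$host" -c "$community" -p 161 -t "$target@161/$community" &&
    run vicfg-snmp --server "$host" -E &&
    run vicfg-snmp --server "$host" -s &&
    run vicfg-snmp --server "$host" -T
}
```

Set `DRY_RUN=0` before calling `configure_snmp` once the printed commands look right.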

These settings are written out to /etc/vmware/snmp.xml. Sample file below.

/etc/vmware # cat snmp.xml
<config>
<snmpSettings>
<communities>public</communities>
<enable>true</enable>
<port>161</port>
<targets>10.1.xx.xx@161 public</targets>
</snmpSettings>
</config>

Configuring on ESXi 4.0, Free/Foundation

I had a couple of ESXi 4.0 free hosts to configure, but my attempts to configure them using the CLI failed because the SNMP settings were read-only via the CLI. So the first thing that you need to do is enable the unsupported console. Instructions can be found here.

Once you are able to ssh to the ESXi box, you need to edit the following file by hand, /etc/vmware/snmp.xml. Use the sample file above as a template and modify your ip, port, and string as needed. I use vi to edit mine.
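If you would rather not type the file by hand, a rough sketch like the one below prints the same layout as the sample file from three values. The function name `render_snmp_xml` is made up for this example, and I am assuming the trap target uses the same port as the agent, as in the sample.

```shell
# Usage: render_snmp_xml <community> <port> <target_ip>
# Prints an snmp.xml body to stdout; nothing is written to disk here.
render_snmp_xml() {
    community=$1
    port=$2
    target=$3
    cat <<EOF
<config>
<snmpSettings>
<communities>${community}</communities>
<enable>true</enable>
<port>${port}</port>
<targets>${target}@${port} ${community}</targets>
</snmpSettings>
</config>
EOF
}
```

Redirect the output to a scratch file, look it over, and only then copy it to /etc/vmware/snmp.xml.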

Then run the command below

services.sh restart

You can then verify your settings using the remote cli by running the command below against your esxi box.

vicfg-snmp --server <ESXi_ip> -T

Resolving SCSI Reservation Conflicts/Locks in Vsphere 4.0

A few days ago we got hit with a ton of alerts which indicated that a handful of VMs were down, then up, and down again. This cycle continued several times.

At first, after a bit of digging through logs, we thought that the issue was related to scsi reservation errors, but we were already compliant with the best practices for 3PAR mentioned here. So we dug deeper and found that we were in fact suffering from SCSI locks. Go here for more information.

 According to VMware…

"The second
category involves acquisition of locks. These are locks related to VMFS
specific meta-data (called cluster locks) and locks related to files
(including directories). Operations in the second category occur much more frequently than operations in the first category. The following are examples of VMFS operations that require locking metadata:

  • Creating a VMFS datastore
  • Expanding a VMFS datastore onto additional extents
  • Powering on a virtual machine
  • Acquiring a lock on a file
  • Creating or deleting a file
  • Creating a template
  • Deploying a virtual machine from a template
  • Creating a new virtual machine
  • Migrating a virtual machine with VMotion
  • Growing a file, for example, a Snapshot file or a thin provisioned Virtual Disk"

To resolve a SCSI Lock, log into each of your ESX boxes and run the following command. 

# esxcfg-info | egrep -B5 "s Reserved|Pending"

Look for the output below. The host that has a "Pending Reservations" value greater than zero is causing the lock.

|----Pending Reservations................ 1
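Scanning that output by eye on every host gets old fast, so here is a small filter I would pipe esxcfg-info into. It is a sketch under one assumption: the count is the last thing on the "Pending Reservations" line, as in the sample above, and any non-zero count is treated as a hit. The function name `pending_reservations` is mine.

```shell
# Reads esxcfg-info output on stdin. Prints every non-zero
# "Pending Reservations" count; exits 0 if one was found, 1 otherwise.
pending_reservations() {
    awk '/Pending Reservations/ {
             n = $NF                 # last whitespace-separated field
             gsub(/[^0-9]/, "", n)   # keep digits only, in case dots are attached
             if (n + 0 > 0) { print "Pending Reservations: " n; found = 1 }
         }
         END { exit(found ? 0 : 1) }'
}
```

On a host it would be used as `esxcfg-info | pending_reservations`.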

Now reset the LUN.

vmkfstools --lock lunreset /vmfs/devices/disks/vml.02000000006001c230d8abfe000ff76c198ddbc13e50455243


ESX Post Install – Enable NTP and SNMP

This post is the first in what I suspect will be a semi-long list of post-install hints and tips as I go through and start rebuilding my cluster on vSphere 4. Hopefully I will learn a lot along the way, like, for example, the fact that NTP and SNMP traffic is not allowed by default by the ESX firewall.

But before we go there we first need to make sure that our services are starting at boot.

>chkconfig ntpd on, … do the same for snmpd

Then let's fix the firewall. First let's fix NTP.

esxcfg-firewall -e ntpClient

Then let's verify that all is well with…

esxcfg-firewall -q ntpClient

This command returns…

Service ntpClient is enabled

OK, now let's fix SNMP using the same commands as above, but specific to SNMP.

esxcfg-firewall -e snmpd and esxcfg-firewall -q snmpd.

While you are at it, add the following to your snmpd.conf

dlmod SNMPESX /usr/lib/vmware/snmp/libSNMPESX.so

Then restart snmp and ntp and you should be good.
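Since I expect to repeat this on every rebuilt host, the steps can be collected into one function. Treat it as a sketch with assumptions: I am assuming the service console init scripts are named ntpd and snmpd, and the `run` helper, the `post_install` function, and the `DRY_RUN` variable are names I made up. With the default `DRY_RUN=1` it only prints what it would do.

```shell
# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

post_install() {
    # Start both services at boot (service names are an assumption).
    for svc in ntpd snmpd; do
        run chkconfig "$svc" on || return 1
    done
    # Open the ESX firewall for NTP and SNMP.
    run esxcfg-firewall -e ntpClient || return 1
    run esxcfg-firewall -e snmpd || return 1
    # Restart so the new settings take effect.
    for svc in ntpd snmpd; do
        run service "$svc" restart || return 1
    done
}
```

Run `post_install` once to review the printed commands, then set `DRY_RUN=0` and run it again as root.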

How to Configure NTP in a RHEL/CentOS Vmware Guest

Successful timekeeping in a virtual machine can be a bit confusing. At times I have been told to use VMware Tools to sync time between the guest and the host, and at other times I have been advised to avoid this functionality and use NTP. The following information comes directly from a VMware KB article (updated 4/16/2010), so I am going to follow their lead on this and use NTP exclusively.

First off, VMware advises using the NTP service to keep time in sync, but it suggests using an additional kernel parameter that you add to your grub.conf. See the KB article for more info on how to do this.

  • notsc for RHEL/Centos 4.6 64bit
  • notsc divider=10 for  RHEL/Centos 5.3 64bit

Note that there are no additional params needed for 5.4

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427

Also, inside the ntp.conf the following line should be added to the top of the file.

tinker panic 0

This configuration directive instructs NTP not to give up
if it sees a large jump in time. This is important for coping with large
time drifts and also resuming virtual machines from their suspended state.

It is also important not to use the local clock as a time source, often
referred to as the Undisciplined Local Clock. NTP has a tendency to fall
back to this in preference to the remote servers when there is a large
amount of time drift.

An example of such a configuration is below. You should comment out both
lines.

server 127.127.1.0

fudge 127.127.1.0 stratum 10
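Both ntp.conf changes can be scripted if you have a lot of guests. The function below is a sketch, not something from the KB article: `fix_ntp_conf` is my own name, it assumes `mktemp` is available in the guest, and it only adds "tinker panic 0" when the line is missing (it does not move an existing one to the top).

```shell
# Usage: fix_ntp_conf /etc/ntp.conf
# Adds "tinker panic 0" as the first line if absent and comments out
# the local clock "server" and "fudge" lines. Safe to run twice.
fix_ntp_conf() {
    conf=$1
    tmp=$(mktemp) || return 1
    {
        grep -q '^tinker panic 0' "$conf" || echo 'tinker panic 0'
        sed -e 's/^server[[:space:]]\{1,\}127\.127\.1\.0/# &/' \
            -e 's/^fudge[[:space:]]\{1,\}127\.127\.1\.0/# &/' "$conf"
    } > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Try it on a copy of the file first, then restart ntpd after changing the real one.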

Also, if you are using NTP, you want to make sure that you disable
VMware Tools time sync. You can do so with the following command from the
guest OS.

vmware-guestd --cmd "vmx.set_option synctime 1 0"

How to Disable DRS for one VM in a DRS Enabled Cluster

VMware DRS (Distributed Resource Scheduler) is a feature of ESX that balances
computing workloads with available resources in a virtualized
environment. 

When you enable a cluster for DRS, VirtualCenter continuously monitors the distribution
of CPU and memory resources for all hosts and virtual machines in the
cluster. DRS compares these metrics to what resource utilization
ideally should be given the attributes of the resource pools and
virtual machines in the cluster, and the current load. Note that DRS is only available in ESX Enterprise or above.

When DRS is enabled in a cluster, ESX will automagically vMotion guest VMs to other hosts in your cluster in an attempt to balance the load evenly across the cluster. However, this behavior is not always desired, for example if you have a large VM that you want to stay pinned to a particular host.

In order to override the default DRS cluster settings for a vm, you need to do the following.

  1. Right Click on your cluster and then click on "edit settings"
  2. Under DRS, click on "Virtual Machine Options"
  3. Locate the particular VM and the drop down box under "Automation Level"
  4. Change "Default (Fully Automated)" to "Manual"

VMK load_Mod panic failed to load module VMFS2

This past weekend I was doing a few memory upgrades when one of the boxes that I was working on decided not to boot up properly. This was a bummer considering that it was 2am.

Networking did not start properly so I was unable to ssh to the box, however I found the above error messages in the vmware kernel log (/var/log/vmkernel), the hostd log (/var/log/vmware/hostd.log) and on the console.

First I needed to verify which modules had started, using the esxcfg-module command with the -l switch. Below is the output of this command as it should appear. In our case, vmklinux and vmfs3 had loaded fine, but there was no vmfs2 module.

esxcfg-module -l
Device Driver Modules
Module         Enabled Loaded
vmklinux       true    true
bnx2           true    true
cciss          true    true
e1000          true    true
lpfc_740       true    true
lvmdriver      true    true
vmfs3          true    true
etherswitch    true    true
shaper         true    true
tcpip          true    true
cosShadow      true    true
migration      true    true
nfsclient      true    true
deltadisk      true    true

vmfs2          true    true
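A listing like this is easy to check from a script. The helper below is a sketch (the name `module_loaded` is mine) and assumes the three-column layout shown above, with the Loaded flag in the third column.

```shell
# Reads "esxcfg-module -l" output on stdin.
# Exits 0 only if the named module is listed with Loaded = true.
module_loaded() {
    awk -v name="$1" '
        $1 == name && $3 == "true" { ok = 1 }
        END { exit(ok ? 0 : 1) }'
}
```

On the host it would be called as `esxcfg-module -l | module_loaded vmfs2`, which fails when the module is missing from the list or not loaded.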

Then we checked whether vmfs2 was enabled by using the "-g" switch with the same command and found it was not set to load at boot. We then set the boot option to true by using the "-s" switch for "vmfs2". We then performed an esxcfg-boot -b and rebooted the ESX host, and it booted up fine.