This post is the first in what I suspect will be a semi-long list of post-install hints and tips as I rebuild my cluster on vSphere 4. Hopefully I will learn a lot along the way… for example, the fact that NTP and SNMP traffic is not allowed by default by the ESX firewall.
But before we go there, we first need to make sure that our services start at boot.
chkconfig ntpd on
chkconfig snmpd on
Then let's fix the firewall, starting with NTP.
esxcfg-firewall -e ntpClient
Then let's verify that all is well with:
esxcfg-firewall -q ntpClient
This command returns:
Service ntpClient is enabled
OK, now let's fix SNMP using the same commands, this time with the snmpd service:
esxcfg-firewall -e snmpd
esxcfg-firewall -q snmpd
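If you are doing this on several hosts, the enable-then-query steps can be wrapped in a small shell function. This is a minimal sketch: the function name fw_enable_and_check and the FW override variable are my own hypothetical additions (FW lets you point it at a stub for a dry run), and the only output it relies on is the "is enabled" text that esxcfg-firewall -q prints.

```shell
#!/bin/sh
# Open an ESX 4 service-console firewall service, then query it and succeed
# only if the query output contains "is enabled".
# FW defaults to the real esxcfg-firewall command; override it for a dry run.
fw_enable_and_check() {
  svc=$1
  fw=${FW:-esxcfg-firewall}
  "$fw" -e "$svc" || return 1            # enable the named firewall service
  status=$("$fw" -q "$svc") || return 1  # e.g. "Service ntpClient is enabled"
  echo "$status"
  case $status in
    *"is enabled"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

On the service console you would run something like fw_enable_and_check ntpClient && fw_enable_and_check snmpd; a non-zero exit status means the query did not report the service as enabled.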
While you are at it, add the following to your snmp.conf
Successful timekeeping in a virtual machine can be a bit confusing. At times I have been told to use VMware Tools to sync time between the guest and the host, and at other times I have been advised to avoid this functionality and use NTP. The following information comes directly from a VMware KB article (updated 4/16/2010), so I am going to follow their lead on this and use NTP exclusively.
First off, VMware advises using the NTP service to keep time in sync, but it also suggests adding a kernel parameter to your grub.conf. See the KB article for more information on how to do this.
notsc for RHEL/CentOS 4.6 64-bit
notsc divider=10 for RHEL/CentOS 5.3 64-bit
Note that no additional parameters are needed for 5.4.
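For reference, here is one way to add the parameter without hand-editing. It is a sketch that assumes legacy GRUB, where boot entries live in /boot/grub/grub.conf and each entry has a line beginning with kernel; add_kernel_params is a hypothetical helper name. It keeps a .bak copy and skips kernel lines that already contain the first parameter, so running it twice is harmless. Check the KB article for the right parameters for your release before using it.

```shell
#!/bin/sh
# Append kernel parameters to every "kernel" line of a legacy GRUB config.
# Usage: add_kernel_params <grub.conf> "<params>"
# A line is skipped if it already contains the first parameter as a word.
add_kernel_params() {
  conf=$1
  params=$2
  first=${params%% *}                 # e.g. "notsc" from "notsc divider=10"
  cp "$conf" "$conf.bak" || return 1  # keep a backup of the original
  awk -v p="$params" -v f="$first" '
    /^[ \t]*kernel[ \t]/ {
      n = split($0, w, /[ \t]+/)
      found = 0
      for (i = 1; i <= n; i++) if (w[i] == f) found = 1
      if (!found) $0 = $0 " " p
    }
    { print }
  ' "$conf.bak" > "$conf"
}
```

For example, on a 64-bit RHEL/CentOS 5.3 guest: add_kernel_params /boot/grub/grub.conf "notsc divider=10", then reboot for the change to take effect.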
Also, the following line should be added to the top of ntp.conf:
tinker panic 0
This configuration directive instructs NTP not to give up
if it sees a large jump in time. This is important for coping with large
time drifts and also resuming virtual machines from their suspended state.
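A minimal sketch of making that change so it can safely be re-run, assuming the usual /etc/ntp.conf location (add_tinker_panic is a hypothetical helper name, and it keeps a .bak copy):

```shell
#!/bin/sh
# Put "tinker panic 0" on the first line of an ntp.conf unless the file
# already contains the directive. The original is kept as <file>.bak.
add_tinker_panic() {
  conf=${1:-/etc/ntp.conf}
  if grep -Eq '^[[:space:]]*tinker[[:space:]]+panic[[:space:]]+0([[:space:]]|$)' "$conf"; then
    return 0                          # already present, nothing to do
  fi
  cp "$conf" "$conf.bak" || return 1
  { echo "tinker panic 0"; cat "$conf.bak"; } > "$conf"
}
```

Restart ntpd afterwards (service ntpd restart) so the directive is picked up.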
It is also important not to use the local clock as a time source, often
referred to as the Undisciplined Local Clock. NTP has a tendency to fall
back to this in preference to the remote servers when there is a large
amount of time drift.
An example of such a configuration is below. You should comment out both of these lines:
server 127.127.1.0
fudge 127.127.1.0 stratum 10
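Both lines can be commented out with sed. A sketch, assuming the local clock is configured with the standard 127.127.1.0 pseudo-address; disable_local_clock is a hypothetical helper name, it keeps a .bak copy, and lines that are already commented are left alone:

```shell
#!/bin/sh
# Comment out the undisciplined local clock (127.127.1.0) "server" and
# "fudge" lines of an ntp.conf. Lines already commented are not touched.
disable_local_clock() {
  conf=${1:-/etc/ntp.conf}
  cp "$conf" "$conf.bak" || return 1
  sed -e 's/^[[:space:]]*server[[:space:]][[:space:]]*127\.127\.1\.0/#&/' \
      -e 's/^[[:space:]]*fudge[[:space:]][[:space:]]*127\.127\.1\.0/#&/' \
      "$conf.bak" > "$conf"
}
```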
Also, if you are using NTP, make sure that you disable VMware Tools time sync. You can do so with the following command via the
Using NTP to set the time on a Linux server is not hard; however, it can have a trick or two up its sleeve. In this example I was troubleshooting NTP on a Red Hat 8 server (yes, I know it's old).
Before we get started, the basics of NTP can be found here, and a primer on the ntp.conf file can be found here. For most people this is all you will need to get NTP up and running. Unfortunately, I was not one of those people.
Below is the error message that I received when I attempted to start NTP via ‘service ntpd start’:
ntpdate: no server suitable for synchronization found
What is this??? Unfortunately, the server that I am attempting to sync to is behind a firewall and is not pingable, so a simple ping test to verify that I can reach the box is out of the question. So I ask a network guy to check the firewall, and he tells me that he sees requests coming from the box in question, but they are not going to the box that I specified in ntp.conf. The answer can be found in the /etc/ntp/step-tickers file.
The step-tickers file is meant to hold the hostnames or IP addresses to sync with when ntpd starts. On Red Hat, at least, the ntpd init script runs ntpdate against them. The entry in my step-tickers was an external host that was no longer reachable, so I removed it and added one of my NTP hosts.
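Since a stale step-tickers entry is easy to miss, a quick cross-check against ntp.conf can help. This sketch (check_step_tickers is a hypothetical name) assumes the Red Hat paths /etc/ntp.conf and /etc/ntp/step-tickers, treats every whitespace-separated word on a non-comment line of step-tickers as a host, and reports any that do not appear on a server line in ntp.conf. It returns non-zero if it finds one.

```shell
#!/bin/sh
# Report step-tickers hosts that are not "server" entries in ntp.conf.
# Returns 1 if any mismatch is found, 0 otherwise.
check_step_tickers() {
  conf=${1:-/etc/ntp.conf}
  tickers=${2:-/etc/ntp/step-tickers}
  servers=$(awk '$1 == "server" { print $2 }' "$conf")   # hosts ntpd will use
  stale=0
  # Every word on a non-comment line of step-tickers is treated as a host.
  for host in $(grep -v '^[[:space:]]*#' "$tickers"); do
    if ! printf '%s\n' "$servers" | grep -qxF "$host"; then
      echo "step-tickers entry not in ntp.conf: $host"
      stale=1
    fi
  done
  return "$stale"
}
```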
However, the sync still failed. This time I take a close look at the current time on the client box, and sure enough, the date is way off. ntpd will not sync if the client and the server differ by more than 1000 seconds. So I fix this using the date command and try again.
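The arithmetic behind that check is just the absolute difference between the two clocks compared with 1000 seconds. A sketch, assuming you already have the reference time as seconds since the epoch (for instance read off another trusted box with date +%s); exceeds_panic_threshold is a hypothetical helper name, and the optional second argument, which defaults to the local clock, is there mainly for testing:

```shell
#!/bin/sh
# Compare the local clock with a reference time (both in seconds since the
# epoch) and succeed if the offset exceeds ntpd's 1000-second panic threshold.
# Usage: exceeds_panic_threshold <reference_epoch> [local_epoch]
exceeds_panic_threshold() {
  ref=$1
  now=${2:-$(date +%s)}               # default: this host's clock (GNU date)
  off=$((now - ref))
  if [ "$off" -lt 0 ]; then
    off=$((-off))                     # absolute value of the offset
  fi
  echo "offset: ${off}s"
  [ "$off" -gt 1000 ]
}
```

If it reports an offset over 1000 seconds, set the clock first (with GNU date, something like date -s "YYYY-MM-DD HH:MM:SS") and then start ntpd.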
Again, it fails.
So I run the ntpdate -bd command below to get some more information. The transmit lines show that I am not getting a response. This is not news to me, but it's good to verify:
ntpdate -bd <NTP_SERVER>
ntpdate: ntpdate Thu May 4 11:01:34 EDT 2006 (1)
Looking for host <NTP_SERVER> and service ntp
host found : <NTP_SERVER>
transmit <NTP_SERVER>
transmit <NTP_SERVER>
transmit <NTP_SERVER>
transmit <NTP_SERVER>
transmit <NTP_SERVER>
<NTP_SERVER> Server dropped: no data
server <NTP_SERVER> port 123
stratum 0, precision 0, leap 00, trust 000
refid [10.253.82.1], delay 0.00000, dispersion 64.00000
transmitted 4, in filter 4
reference time: 00000000.00000000 Thu, Feb 7 2036 1:28:16.000
originate timestamp: 00000000.00000000 Thu, Feb 7 2036 1:28:16.000
transmit timestamp: ccf1430f.64e5a35d Mon, Dec 15 2008 15:56:47.394
filter delay: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
filter offset: 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
delay 0.00000, dispersion 64.00000
offset 0.000000
OK, so on to the NTP host, where I run the following command to sniff traffic on UDP port 123:
tcpdump dst port 123
There I can see the client communicating with the host.
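A slightly more targeted capture can make the picture clearer. The sketch below (watch_ntp and the TCPDUMP override are hypothetical additions) uses -n to skip DNS lookups and filters on udp port 123 plus the client's address, so both the requests and any replies show up. One thing worth knowing: tcpdump sees inbound packets before the host's iptables rules do, so requests appearing here does not prove ntpd ever received them. If you only ever see requests and no replies, a local firewall on the server is a likely suspect.

```shell
#!/bin/sh
# Watch NTP traffic to and from one client on the NTP server.
# -n disables name resolution; the filter matches requests and replies.
# TCPDUMP defaults to tcpdump and can be overridden (e.g. with a stub).
watch_ntp() {
  client=$1
  ${TCPDUMP:-tcpdump} -n "udp port 123 and host $client"
}
```

For example, run watch_ntp 192.0.2.10 on the NTP server, where 192.0.2.10 stands in for the client's address.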
So another call goes out to the admin of the NTP server to have him verify that NTP is set up properly and running. I provided him with the information above. It turns out he had iptables running and it was blocking NTP. The other admin makes a change, and I am off and running.
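For completeness, here is roughly what such a change can look like on a Red Hat server using the stock iptables service. This is a sketch only: the exact rule the other admin used isn't known, allow_ntp and the IPTABLES/SERVICE overrides are hypothetical additions, and the rule simply inserts an ACCEPT for inbound UDP port 123 at the top of the INPUT chain before saving the rule set so it survives a restart.

```shell
#!/bin/sh
# Allow inbound NTP (UDP port 123) on a Red Hat-style server and save the
# rule set with the stock iptables init script so it persists.
# IPTABLES and SERVICE default to the real commands; override for a dry run.
allow_ntp() {
  ipt=${IPTABLES:-iptables}
  svc=${SERVICE:-service}
  "$ipt" -I INPUT -p udp --dport 123 -j ACCEPT || return 1
  "$svc" iptables save
}
```

On a host with a more involved rule set you may want to restrict the source address as well, for example with -s followed by your client subnet.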