A few days ago we were hit with a flood of alerts indicating that a handful of VMs were going down, coming back up, and going down again. The cycle repeated several times.
At first, after a bit of digging through the logs, we thought the issue was related to SCSI reservation errors, but we were already compliant with the best practices for 3PAR mentioned here. So we dug deeper and found that we were in fact suffering from SCSI locks. Go here for more information.
According to VMware…
"The second category involves acquisition of locks. These are locks related to VMFS specific meta-data (called cluster locks) and locks related to files (including directories). Operations in the second category occur much more frequently than operations in the first category. The following are examples of VMFS operations that require locking metadata:
- Creating a VMFS datastore
- Expanding a VMFS datastore onto additional extents
- Powering on a virtual machine
- Acquiring a lock on a file
- Creating or deleting a file
- Creating a template
- Deploying a virtual machine from a template
- Creating a new virtual machine
- Migrating a virtual machine with VMotion
- Growing a file, for example, a Snapshot file or a thin provisioned Virtual Disk"
To resolve a SCSI lock, first log in to each of your ESX boxes and run the following command.
# esxcfg-info | egrep -B5 "s Reserved|Pending"
Look for output like the line below. The host that reports a "Pending Reservations" value greater than zero is the one causing the lock.
|----Pending Reservations................ 1
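If you have many hosts to check, the filtering step above can be scripted. The following is a minimal sketch, not an official VMware tool: `pending_reservations` is a hypothetical helper name, and it assumes that `awk` is available on the host and that the count is the last thing on each "Pending Reservations" line, as in the sample output above.

```shell
# Hypothetical helper (not part of ESX): reads esxcfg-info output on stdin
# and prints each "Pending Reservations" line whose count is above zero.
# Exit status is 0 if at least one such line was found, 1 otherwise.
pending_reservations() {
    awk '
        /Pending Reservations/ {
            n = $0
            sub(/[ \t\r]+$/, "", n)   # drop trailing whitespace
            sub(/.*[^0-9]/, "", n)    # keep only the trailing digits
            if (n + 0 > 0) { print; found = 1 }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Assumed usage on each host:
#   esxcfg-info | pending_reservations && echo "this host holds a reservation"
```

Because the function only reads standard input, you can also try it against saved `esxcfg-info` output before running it on a live host.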
Now reset the LUN.
# vmkfstools --lock lunreset /vmfs/devices/disks/vml.02000000006001c230d8abfe000ff76c198ddbc13e50455243