Today while working in the Datacenter, I accidentally destroyed a fibre cable running to one of our database servers. All the databases went poof and the DBAs freaked out. After I reminded them that this was a non-production database I went about running a new fibre cable.
Since this box was dual-pathed back to the SAN, the server should have had additional paths back to its disks and the databases should have kept running along just fine. However, as luck would have it, the secondary path was down at the time I destroyed the primary.
Anyway, after reseating the secondary fibre and running a new primary cable, I was able to verify that I had multiple connections back to the disks with the following commands.
# vxdmpadm getdmpnode dmpnodename=3pardata0_4641
NAME                 STATE     ENCLR-TYPE   PATHS  ENBL  DSBL  ENCLR-NAME
==============================================================================
3pardata0_4641       ENABLED   3PARDATA     4      4     0     3pardata0
Where 3pardata0_4641 is the disk name and PATHS is obviously the number of paths back to the SAN disk.
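And if you want to see the individual paths behind that count, I believe getsubpaths will list them (syntax from memory, so double-check it against the man page):

# vxdmpadm getsubpaths dmpnodename=3pardata0_4641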
Even better, you can check DMP (dynamic multipathing) status for all disks with the command below.
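If I remember the syntax right, simply dropping the dmpnodename argument gives you the same summary for every DMP node on the system:

# vxdmpadm getdmpnode

A quick scan of the PATHS, ENBL, and DSBL columns will catch any disk that has lost a path.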
If you have ever evacuated disks in Veritas, you know that every so often the evac will hang. Usually you end up terminating your session, or who knows what. Kinda like Joe Girardi's willingness to sacrifice outs for no good reason every time the Yankees' hottest hitter is at the plate. It happens, you can't explain it, you move on. Back to technology: vxtask list shows no tasks, but you get errors trying to rerun the failed evac.
For example:
Plex %5 in volume rman is locked by another utility
Plex rman-01 in volume rman is locked by another utility
Subdisk rman_7_tmp-01 in plex rman-01 is locked by another utility
vxprint -hf is our best friend here, as it shows you any flags that are set.
We can see that we have flags set on the temporary plex (from the failed evac), the subdisk for the temporary plex, the main plex, the subdisk in the main plex, as well as the volume itself. We need to clear those flags to be able to restart our evac. I will also cut the lines of the vxprint output that don't change, for the purpose of shortening this post.
vxmend -g rman_dg clear all rman %5
So we cleared the volume and temp plex flags; here's the vxprint -htf output afterwards.
Great, now we're down to two flags: the one on the plex and the one on the source disk of our original evac. Clearing flags from subdisks is a lot trickier than clearing flags from volumes and plexes. Because the tutil0 flag is already set, we will need to force the clear, which we do by setting it to "".
vxedit -g rman_dg -f set tutil0="" rman_6_tmp-01
Once again, vxprint -htf
v rman fsgen ENABLED 15625864960 - ACTIVE - -
pl rman-01 rman ENABLED 15625864960 - ACTIVE SDMV1 -
And lastly, we clear the flag on the plex. Why in this order? Because I'm writing this up after I fixed my issues, and in the interest of not editing the vxprint outputs, it stays like this. In retrospect, this flag could have been cleared with the first command we ran in the beginning.
vxmend -g rman_dg clear all rman rman-01
And finally, the way a vxprint -htf should look when all is healthy.
At this point, feel free to proceed with your evac again. If you're wondering what the putil and tutil fields are, here is what I found courtesy of Symantec:
Got a heads-up from a DBA today stating that one of their database servers was running hot from an I/O perspective. So in order to troubleshoot, I jumped on the server and ran an iostat:
>iostat -d -x 5 3
Based on the output below I was able to determine which disks were the most utilized by looking at the last column, %util. Two other columns to take note of are await and svctm: await is the average response time in ms for an I/O request to the device, including any time spent waiting in a queue, while svctm is the average time it took to service a request after it was sent to the disk and out of the queue. In this case service times are low, so I can pretty much rule out SAN issues.
In this case the service times are OK, so I know that the disks are performing well. However, I do know that the hot disks are under Veritas control, so I take a quick look at the disk groups on the box:
>vxdg list
Then I run a vxstat on each disk group individually to look for the volumes with a high number of reads and writes. From there I can advise the DBAs which specific database is causing the load on the box.
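A rough sketch of that step, using a made-up disk group name (vxstat reports read and write counts per volume in the group):

# vxstat -g oracle_dg

The volumes with the biggest read/write numbers point right at the busy database.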
It's been a while since I have done much with Veritas Storage Foundation, so I was at a bit of a loss after firing up the VEA GUI on a fresh install on CentOS 5.4 and not seeing any of the agents I was used to seeing. A quick check on the command line showed that the Storage Agent was in fact running, and a vxdisk list displayed my disks without issue, but the GUI was blank except for the server name.
A quick Google search led me to this Symantec KB article…
While this article was helpful, it did not solve my issue; as I mentioned above, my Storage Agent was in fact running without issue. However, I was able to verify similar errors in my vxsis.log:
Thu Mar 27 12:40:03 2008:3342:get_objects_by_type: Error in GetObjects : 0xc1000039
Thu Mar 27 12:40:03 2008:3342:database::HandleILCacheRequest:Error in get_objects_by_type() : 0xc1000039
Thu Mar 27 12:40:03 2008:3342:rpc_object_fetch: Could not fetch objects
So I started searching on the errors above and ran into a post on the Symantec forum, where one of the posters suggested that the /etc/hosts file might be the culprit.
Sure enough, a quick check of my /etc/hosts revealed that my server's IP and FQDN were not present. I corrected that and then followed the steps below.
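For illustration only, the missing entry looked something like this (the IP and hostname here are placeholders, not the real server):

192.168.1.50   veaserver.example.com   veaserver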
First I stopped the VEA service
> /opt/VRTS/bin/vxsvcctrl stop
Then I reconfigured Veritas
>/opt/VRTS/install/installsf -configure
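If the reconfigure doesn't bring the VEA service back up on its own, starting it manually should do the trick:

> /opt/VRTS/bin/vxsvcctrl start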
I restarted my VEA GUI and reconnected and found all was as it should be.
I was attempting to troubleshoot an issue where a user was complaining about slow performance on a SAN disk. The first thing I did was check to ensure that there were not any performance problems on any of the disks that might have been causing this user's issues.
A quick iostat verified that everything was looking fine:
# iostat -cxzn 1
This box is running Veritas, so let's check out the disks. vxdisk list shows one SUN6140 disk.
# vxdisk list
DEVICE TYPE DISK GROUP STATUS
Disk_0 auto:none - - online invalid
Disk_1 auto:none - - online invalid
SUN6140_0_1 auto:cdsdisk diskname_dg02 diskname_dg online nohotuse
Luxadm is a utility that discovers FC devices (luxadm probe), shuts down devices (luxadm shutdown_device …), runs firmware upgrades (luxadm download_firmware …), and does many other things. In this instance I use luxadm to get the true device name for my disk.
# luxadm probe
No Network Array enclosures found in /dev/es
I then run a luxadm display on the device. Below you can see that I do indeed have two paths to the device: one controller = one path, two controllers = two paths.
# luxadm display /dev/rdsk/c4t600A0B800029A7A000000DC747A8168Ad0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c4t600A0B800029A7A000000DC747A8168Ad0s2
Vendor: SUN
Product ID: CSM200_R
Revision: 0619
Serial Num: SG71009283
Unformatted capacity: 12288.000 MBytes
Write Cache: Enabled
Read Cache: Enabled
Minimum prefetch: 0x1
Maximum prefetch: 0x1
Device Type: Disk device
Path(s):
/dev/rdsk/c4t600A0B800029A7A000000DC747A8168Ad0s2
/devices/scsi_vhci/ssd@g600a0b800029a7a000000dc747a8168a:c,raw
Controller /devices/pci@1f,4000/SUNW,qlc@5,1/fp@0,0
Device Address 203700a0b829a7a0,1
Host controller port WWN 210100e08bb370ab
Class secondary
State STANDBY
Controller /devices/pci@1f,4000/SUNW,qlc@5/fp@0,0
Device Address 203600a0b829a7a0,1
Host controller port WWN 210000e08b9370ab
Class primary
State ONLINE
Had I only had one path, I would have run cfgadm. I would have seen that one of the fc-fabric devices was unconfigured, and I could then have used cfgadm to configure it and enable my multipathing.
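For reference, that recovery would have gone roughly like this: run cfgadm -al to find the fc-fabric attachment point showing up as unconfigured, then configure it (the attachment point name below is only an example):

# cfgadm -al
# cfgadm -c configure c4::203700a0b829a7a0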
Solaris I/O multipathing gives you the ability to set up multiple redundant paths to a storage system and gives you the benefits of load balancing and failover.
Need to enable MPXIO
Solaris 10 is the easiest, because the mpxio capability is built in. You just need to turn it on!
To enable it, edit the /kernel/drv/fp.conf file. At the end it should say:
mpxio-disable="yes";
Just change yes to no and it will be enabled:
mpxio-disable="no";
Before multipathing, you should see two copies of each disk in format. Afterwards, you'll just see the one copy.
It assigns the next available controller ID and makes up some horrendously long target number. For example:
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c6t600C0FF000000000086AB238B2AF0600d0s5
697942398 20825341 670137634 4% /test
Here is a handy list of helpful commands that I put together from various sources:
vxdisksetup: used to set up a disk for use with Veritas Volume Manager
vxdg: usage examples:
vxdg list — shows all Veritas diskgroups
vxdg deport <disk_group> — deport a diskgroup
vxdg import <disk_group> — import a diskgroup
vxdiskadm: menu-driven disk administration command
vxvol:
usage example:
vxvol -g <disk_group> startall
vxdisk list: displays disk listing
vxprint -ht: displays volume manager object listing
vxdg -g <diskgroup> free: displays free space in the specified diskgroup
vxtask list: lists all volume manager tasks currently running on the system
vxdiskadd <diskname>: add a disk to Volume Manager (devicename = cXtXdX)
vxedit set spare=on <diskname>: designate a disk as a hot-relocation spare
vxedit set spare=off <diskname>: remove a disk as a hot-relocation spare
vxedit rename <old_name> <new_name>: rename a disk
vxdisk offline <disk_name>: take a disk offline (first remove the disk from its disk group) (devicename = cXtXdXs2)
vxdg -g <diskgroup> rmdisk <disk_name>: remove a disk from a disk group
vxdisk rm <disk_name>: remove a disk from Veritas control
vxvol -g <diskgroup> startall: start all volumes in a diskgroup
1. Once you have your disks and disk names, you need to run the following command on your host, which will make Veritas aware of any new disks that the OS has access to.
vxdctl enable
2. Running the command vxdisk list will list all the disks that your system has access to. The disk name may or may not be similar to the name on your storage array. Use the following command to cross-reference the Veritas name to the OS name.
vxdisk -e list
3. At this point you can log into the Veritas GUI and initialize your disk. Then you can add it to your existing diskgroup or create a new one (see the sketch below).
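If you would rather skip the GUI, my recollection is that the same thing can be done from the command line with vxdisksetup and vxdg; the device, disk, and group names below are just placeholders:

# vxdisksetup -i c2t1d0
# vxdg -g mydg adddisk mydg02=c2t1d0
# vxdg init newdg newdg01=c2t1d0

The adddisk form drops the disk into an existing diskgroup, while vxdg init creates a new group with it.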