How to Check Multipathing in Veritas Volume Manager

Today, while working in the datacenter, I accidentally destroyed a fibre cable running to one of our database servers. All the databases went poof and the DBAs freaked out. After I reminded them that this was a non-production database, I went about running a new fibre cable.

Since this box was dual-pathed back to the SAN, the server should have had additional paths back to its disks, and the databases should have kept running along just fine. However, as luck would have it, the secondary path was down at the time I destroyed the primary.

Anyway, after reseating the secondary fibre and running a new primary cable, I was able to verify that I had multiple connections back to the disks with the following command.

# vxdmpadm getdmpnode dmpnodename=3pardata0_4641

NAME                 STATE        ENCLR-TYPE   PATHS  ENBL  DSBL  ENCLR-NAME 
==============================================================================
3pardata0_4641       ENABLED      3PARDATA     4      4     0     3pardata0

Here 3pardata0_4641 is the disk (DMP node) name, PATHS is the number of paths back to the SAN disk, and ENBL and DSBL show how many of those paths are enabled and disabled.

Better yet, you can check the DMP (Dynamic Multi-Pathing) status for all disks with the command below.

vxdmpadm getdmpnode
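With a lot of LUNs, scanning that listing by eye gets old fast. Here is a minimal sketch of a filter that reads the getdmpnode output on stdin and prints only the nodes that have disabled paths or too few enabled ones. The function name and the default threshold of 2 are my own choices, and the column positions (ENBL in column 5, DSBL in column 6) are assumed from the sample output above.

```shell
# Print DMP nodes that look degraded.
# Input: `vxdmpadm getdmpnode` style output on stdin.
# Assumption: columns are NAME STATE ENCLR-TYPE PATHS ENBL DSBL ENCLR-NAME.
degraded_dmp_nodes() {
    min_paths="${1:-2}"   # hypothetical threshold: minimum enabled paths wanted
    awk -v min="$min_paths" '
        # Data rows have numeric PATHS, ENBL and DSBL fields; this skips
        # the header line and the ===== separator.
        NF >= 7 && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ {
            if ($6 + 0 > 0 || $5 + 0 < min + 0)
                printf "%s enabled=%s disabled=%s\n", $1, $5, $6
        }'
}
```

Usage would be something like `vxdmpadm getdmpnode | degraded_dmp_nodes 2`. No output means every node has at least two enabled paths and none disabled.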

Recovering from failed vxevac


If you have ever evacuated disks in Veritas, you know that every so often the evacuation hangs. Usually you terminated your session, or who knows what. Kinda like Joe Girardi's willingness to sacrifice outs for no good reason every time the Yankees' hottest hitter is at the plate. It happens, you can't explain it, you move on. Back to technology: vxtask list shows no tasks, but you get errors trying to rerun the failed evac.

 

For example:

Plex %5 in volume rman is locked by another utility

Plex rman-01 in volume rman is locked by another utility

Subdisk rman_7_tmp-01 in plex rman-01 is locked by another utility

vxprint -hf is our best friend, as it shows you any flags that are set (the last two columns, TUTIL0 and PUTIL0):

v  rman           fsgen   ENABLED 15625864960 -           ACTIVE ATT1    -
pl %5             rman    ENABLED 11719399168 -           TEMPRM SDMVTMP -
sd rman_6-01      %5      ENABLED 1953232896  9766166272  -      SDMVDST -
pl rman-01        rman    ENABLED 15625864960 -           ACTIVE SDMV1   -
sd rman_1-01      rman-01 ENABLED 1953234688  0           -      -       -
sd rman_2-01      rman-01 ENABLED 1953232896  1953234688  -      -       -
sd rman_3-01      rman-01 ENABLED 1953232896  3906467584  -      -       -
sd rman_4-01      rman-01 ENABLED 1953232896  5859700480  -      -       -
sd rman_5-01      rman-01 ENABLED 1953232896  7812933376  -      -       -
sd rman_6_tmp-01  rman-01 ENABLED 1953232896  9766166272  -      SDMVSRC -
sd rman_7_tmp-01  rman-01 ENABLED 1953232896  11719399168 -      -       -
sd rman_8-01      rman-01 ENABLED 1953232896  13672632064 -      -       -

We can see that we have flags set on the temporary plex (from the failed evac), the subdisk for the temporary plex, the main plex, a subdisk in the main plex, and the volume itself. We need to clear these flags before we can restart our evac. To keep this post short, I will also cut the lines of the vxprint output that don't change.

vxmend -g rman_dg clear all rman %5

So we cleared the volume and temp plex flags. Here's the vxprint -htf output afterwards:

v  rman           fsgen   ENABLED 15625864960 -           ACTIVE -       -
pl %5             rman    ENABLED 11719399168 -           TEMPRM -       -
sd rman_6-01      %5      ENABLED 1953232896  9766166272  -      SDMVDST -
pl rman-01        rman    ENABLED 15625864960 -           ACTIVE SDMV1   -
sd rman_6_tmp-01  rman-01 ENABLED 1953232896  9766166272  -      SDMVSRC -

 

So now with the flags cleared we can remove the temporary plex

vxplex -g rman_dg -o rm dis %5

 

And once again our new vxprint -htf

v  rman           fsgen   ENABLED 15625864960 -           ACTIVE -       -
pl rman-01        rman    ENABLED 15625864960 -           ACTIVE SDMV1   -
sd rman_6_tmp-01  rman-01 ENABLED 1953232896  9766166272  -      SDMVSRC -

 

Great, now we are down to two flags: the one on the plex and the one on the source subdisk of our original evac. Clearing flags from subdisks is a lot trickier than clearing flags from volumes and plexes. Because the tutil0 flag is already set, we need to force the change with -f. We clear the flag by setting it to "".

vxedit -g rman_dg -f set tutil0="" rman_6_tmp-01

 

Once again, vxprint -htf

v  rman           fsgen   ENABLED 15625864960 -           ACTIVE -       -
pl rman-01        rman    ENABLED 15625864960 -           ACTIVE SDMV1   -
sd rman_6_tmp-01  rman-01 ENABLED 1953232896  9766166272  -      -       -

 

And lastly, we clear the flag on the plex. Why in this order? Because I'm writing this up after I fixed my issues, and in the interest of not editing vxprint outputs, it stays like this. In retrospect, this flag could have been cleared with the first vxmend command we ran at the beginning.

vxmend -g rman_dg clear all rman rman-01

 

And finally, the way a vxprint -htf should look when all is healthy.

v  rman           fsgen   ENABLED 15625864960 -           ACTIVE -       -
pl rman-01        rman    ENABLED 15625864960 -           ACTIVE -       -
sd rman_1-01      rman-01 ENABLED 1953234688  0           -      -       -
sd rman_2-01      rman-01 ENABLED 1953232896  1953234688  -      -       -
sd rman_3-01      rman-01 ENABLED 1953232896  3906467584  -      -       -
sd rman_4-01      rman-01 ENABLED 1953232896  5859700480  -      -       -
sd rman_5-01      rman-01 ENABLED 1953232896  7812933376  -      -       -
sd rman_6_tmp-01  rman-01 ENABLED 1953232896  9766166272  -      -       -
sd rman_7_tmp-01  rman-01 ENABLED 1953232896  11719399168 -      -       -
sd rman_8-01      rman-01 ENABLED 1953232896  13672632064 -      -       -
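If you would rather not eyeball every line to confirm that nothing is left behind, here is a rough sketch of a filter that prints only the records still carrying a flag. It assumes the nine-column layout shown in the listings above (record type, name, association, kstate, length, plex offset, state, tutil0, putil0); the function name is made up for this post.

```shell
# Print volume, plex and subdisk records whose TUTIL0 or PUTIL0 field is set.
# Input: vxprint output in the nine-column layout used in this post, on stdin.
flagged_records() {
    awk '
        NF == 9 && ($1 == "v" || $1 == "pl" || $1 == "sd") {
            # Columns 8 and 9 are TUTIL0 and PUTIL0; "-" means no flag.
            if ($8 != "-" || $9 != "-")
                printf "%s %s tutil0=%s putil0=%s\n", $1, $2, $8, $9
        }'
}
```

Something like `vxprint -g rman_dg -hf rman | flagged_records` should print nothing once everything is clean.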

 

At this point, feel free to proceed with your evac again.  If you're wondering what the putil and tutil fields are, here is what I found courtesy of Symantec:

http://www.symantec.com/business/support/index?page=content&id=TECH15609

 

Guest Authored By: @momkvi

 

Tracking Down I/O Hotspots in Linux w/ Veritas

Got a heads up from a DBA today stating that one of their database servers was running hot from an I/O perspective. So, in order to troubleshoot, I jumped on the server and ran an iostat.

>iostat -d -x 5 3

Based on the output below, I was able to determine which disks were the most utilized by looking at the last column, %util. Two other columns to take note of are await and svctm: await is the average response time in ms for an I/O request to the device, including any time spent waiting in a queue, while svctm is the average time it took to service a request after it was sent to the disk and out of the queue. In this case service times are low, so I can pretty much rule out SAN issues.

VxVM65519     0.00     0.00 230.80 20.80  5753.60   556.80    25.08     1.77    6.94   3.50  99.08
VxVM65516     0.00     0.00 257.60 21.80  6022.40   467.20    23.23     1.68    6.02   3.01  93.22
VxVM65515     0.00     0.00 265.80 18.80  6563.20   364.80    24.34     1.42    4.89   2.69  94.58
VxVM65513     0.00     0.00 233.20 24.00  6032.00   969.60    27.22     1.41    5.47   2.99  88.92
VxVM65493     0.00     0.00 308.80 21.00  7590.40   944.00    25.88     1.74    5.27   2.52  98.14
VxVM65492     0.00     0.00 262.80 20.40  6502.40   716.80    25.49     1.76    6.10   2.94  99.22
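To pull the busy devices out of a longer listing, a small filter like the one below can help. This is only a sketch: it assumes the column layout shown above, where the last three fields are await, svctm and %util, and newer iostat versions may order things differently. The function name and the default threshold of 90 are mine.

```shell
# Print devices whose %util (assumed to be the last column) is at or
# above a threshold. Input: `iostat -d -x` style output on stdin.
hot_devices() {
    threshold="${1:-90}"   # hypothetical default: 90 percent busy
    awk -v t="$threshold" '
        # Keep only rows whose last field is a number, which skips headers.
        NF >= 4 && $NF ~ /^[0-9]+(\.[0-9]+)?$/ && $NF + 0 >= t + 0 {
            printf "%s util=%s await=%s svctm=%s\n", $1, $NF, $(NF-2), $(NF-1)
        }'
}
```

For example, `iostat -d -x 5 3 | hot_devices 95` would list only the devices that are at least 95% utilized.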

In this case the service times are OK, so I know that the disks are performing well. However, I do know that the hot disks are under Veritas control, so I take a quick look at the disk groups on the box.

>vxdg list

Then I run vxstat on each disk group individually to look for the volumes with a high number of reads and writes. Then I can advise the DBAs which specific database is causing the load on the box.

>vxstat -g example_dg -i 1
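On a box with many disk groups, the per-group vxstat runs can be looped. The helper below is a sketch that just extracts the group names from vxdg list output, assuming the first line is the NAME/STATE/ID header. The loop in the comment uses -c 5 to stop each vxstat run after five samples, which is my assumption about your vxstat version, so check the man page before relying on it.

```shell
# Print disk group names from `vxdg list` output on stdin.
# Assumption: line 1 is the header and column 1 is the group name.
dg_names() {
    awk 'NR > 1 && NF >= 2 { print $1 }'
}

# Example loop (not run here because it needs a VxVM host):
#   vxdg list | dg_names | while read -r dg; do
#       echo "== $dg =="
#       vxstat -g "$dg" -i 1 -c 5
#   done
```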

 

Veritas Enterprise Administrator Not Displaying Objects/Agents

It's been a while since I have done much with Veritas Storage Foundation, so I was at a bit of a loss after firing up the VEA GUI on a fresh install on CentOS 5.4 and not seeing any of the agents that I was used to seeing. A quick check on the command line showed that the Storage Agent was in fact running. A vxdisk list displayed my disks without issue, but the GUI was blank except for the server name.

A quick Google search led me to this Symantec KB article:

http://seer.entsupport.symantec.com/docs/302156.htm

While this article was helpful, it did not solve my issue: as I mentioned above, my storage agent was in fact running without issue. However, I was able to verify similar errors in my vxsis.log:

Thu Mar 27 12:40:03 2008:3342:get_objects_by_type: Error in GetObjects : 0xc1000039
Thu Mar 27 12:40:03 2008:3342:database::HandleILCacheRequest:Error in get_objects_by_type() : 0xc1000039
Thu Mar 27 12:40:03 2008:3342:rpc_object_fetch: Could not fetch objects

So I started searching on the errors above and ran into a post on the Symantec forum, where one of the posters suggested that the /etc/hosts file might be the culprit.

Sure enough, a quick check of my /etc/hosts revealed that my server's IP and FQDN were not present. I corrected that and then followed the steps below.
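For what it's worth, the hosts-file check can be scripted so it is easy to repeat on the next fresh install. This is a minimal sketch: the function name is made up, it only looks at the hosts file itself (not DNS), and it matches names case-insensitively.

```shell
# Return success if a host name appears as a name or alias in a hosts file.
# Usage: hosts_has_name NAME [FILE]   (FILE defaults to /etc/hosts)
hosts_has_name() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v want="$name" '
        /^[ \t]*#/ { next }                 # skip comment lines
        {
            sub(/#.*/, "")                  # drop trailing comments
            for (i = 2; i <= NF; i++)       # field 1 is the IP address
                if (tolower($i) == tolower(want)) found = 1
        }
        END { exit(found ? 0 : 1) }' "$file"
}

# Example:
#   hosts_has_name "$(hostname)" || echo "short name missing from /etc/hosts"
#   hosts_has_name "$(hostname -f)" || echo "FQDN missing from /etc/hosts"
```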

First I stopped VEA Service

> /opt/VRTS/bin/vxsvcctrl stop    

Then I reconfigured Veritas

> /opt/VRTS/install/installsf -configure

I restarted my VEA GUI and reconnected and found all was as it should be.

Verify Solaris 10 Multipathing/Configure SAN Disk


I was troubleshooting a user's complaint about slow performance on a SAN disk. The first thing I did was check that there were no performance issues on any disks that might have been causing this user's problems.

A quick iostat verified that everything was looking fine:

iostat -cxzn 1

 

This box is running Veritas, so let's check out the disks. vxdisk list shows one SUN6140 disk.

# vxdisk list
DEVICE       TYPE            DISK           GROUP        STATUS
Disk_0       auto:none       -              -            online invalid
Disk_1       auto:none       -              -            online invalid
SUN6140_0_1  auto:cdsdisk    diskname_dg02  diskname_dg  online nohotuse

luxadm is a utility that discovers FC devices (luxadm probe), shuts down devices, runs firmware upgrades, and does many other things. In this instance I use luxadm to get the true device name for my disk.


# luxadm probe
No Network Array enclosures found in /dev/es

Found Fibre Channel device(s):
Node WWN:200600a0b829a7a0  Device Type:Disk device
Logical Path:/dev/rdsk/c4t600A0B800029A7A000000DC747A8168Ad0s2

I then run luxadm display on the device. Below you can see that I do indeed have two paths to the device: each Controller entry is one path, so one controller means one path and two controllers mean two paths.

# luxadm display /dev/rdsk/c4t600A0B800029A7A000000DC747A8168Ad0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c4t600A0B800029A7A000000DC747A8168Ad0s2
Vendor:               SUN
Product ID:           CSM200_R
Revision:             0619
Serial Num:           SG71009283
Unformatted capacity: 12288.000 MBytes
Write Cache:          Enabled
Read Cache:           Enabled
Minimum prefetch:   0x1
Maximum prefetch:   0x1
Device Type:          Disk device
Path(s):

/dev/rdsk/c4t600A0B800029A7A000000DC747A8168Ad0s2
/devices/scsi_vhci/ssd@g600a0b800029a7a000000dc747a8168a:c,raw
Controller           /devices/pci@1f,4000/SUNW,qlc@5,1/fp@0,0
Device Address              203700a0b829a7a0,1
Host controller port WWN    210100e08bb370ab
Class                       secondary
State                       STANDBY
Controller           /devices/pci@1f,4000/SUNW,qlc@5/fp@0,0
Device Address              203600a0b829a7a0,1
Host controller port WWN    210000e08b9370ab
Class                       primary
State                       ONLINE
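Counting the Controller entries by hand is fine for one disk, but the check is easy to script. The sketch below reads luxadm display output on stdin and prints one line per path, followed by a total. It assumes the Controller / Class / State layout shown above, and the function name is my own.

```shell
# Summarize the paths in `luxadm display <device>` output read from stdin.
# Assumption: each path is a "Controller" line followed later by "Class"
# and "State" lines, as in the listing above.
luxadm_paths() {
    awk '
        $1 == "Controller" { n++; ctrl = $2 }
        $1 == "Class"      { class = $2 }
        $1 == "State"      { printf "%s %s %s\n", ctrl, class, $2 }
        END                { printf "paths=%d\n", n }'
}
```

A call such as `luxadm display /dev/rdsk/c4t600A0B800029A7A000000DC747A8168Ad0s2 | luxadm_paths` should end with paths=2 on a dual-pathed disk like this one.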

Had I only had one path, I would have run cfgadm, and I would have seen that one of the fc-fabric devices was unconfigured. I could then have used cfgadm to configure it and enable my multipathing.

# cfgadm
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c1                             scsi-bus     connected    unconfigured unknown
c2                             fc-fabric    connected    configured   unknown
c3                             fc-fabric    connected    configured   unknown
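To spot the unconfigured case quickly, a one-line filter over the cfgadm listing is enough. This is just a sketch that assumes the five-column layout shown above; the function name is made up.

```shell
# Print the Ap_Id of every fc-fabric attachment point that is unconfigured.
# Input: plain `cfgadm` output on stdin (Ap_Id Type Receptacle Occupant Condition).
unconfigured_fc() {
    awk '$2 == "fc-fabric" && $4 == "unconfigured" { print $1 }'
}
```

Running `cfgadm | unconfigured_fc` on the box above prints nothing, since both c2 and c3 are configured. If it did print an Ap_Id, the fix would be along the lines of `cfgadm -c configure <Ap_Id>`.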

MPXIO Primer

Solaris I/O multipathing gives you the ability to set up multiple
redundant paths to a storage system and gives you the benefits of load
balancing and failover.

Need to enable MPXIO

Solaris 10 is the easiest, because the MPxIO capability is
built in. You just need to turn it on!

To enable it, edit the file /kernel/drv/fp.conf.
At the end it should say:

mpxio-disable="yes";

Just change yes to no and it will be enabled:

mpxio-disable="no";
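Before and after editing, it is handy to confirm what the file actually says. Here is a small sketch that reports the global setting. It only looks at lines that start with mpxio-disable= (so commented-out and per-port entries are ignored) and takes the last such line; the function name is hypothetical.

```shell
# Report the global MPxIO setting from an fp.conf style file.
# Usage: mpxio_state [FILE]   (FILE defaults to /kernel/drv/fp.conf)
mpxio_state() {
    file="${1:-/kernel/drv/fp.conf}"
    # Only lines beginning with mpxio-disable= are considered; the value
    # is the text between the first pair of double quotes.
    value=$(grep '^mpxio-disable=' "$file" | tail -1 | cut -d'"' -f2)
    case "$value" in
        no)  echo "enabled" ;;    # mpxio-disable="no" means MPxIO is on
        yes) echo "disabled" ;;
        *)   echo "unknown" ;;
    esac
}
```

Note the double negative: "enabled" here means mpxio-disable is set to "no". As far as I know the change only takes effect after a reboot, so the file and the running system can disagree until then.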

Before multipathing, you should see two copies of each disk in
format. Afterwards, you’ll just see the one copy.

MPxIO assigns the next available controller ID and makes up some
horrendously long target number. For example:

Filesystem            kbytes     used      avail     capacity  Mounted on
/dev/dsk/c6t600C0FF000000000086AB238B2AF0600d0s5
                      697942398  20825341  670137634 4%        /test

Veritas Cheat Sheet

Here is a handy list of commands that I put together from various sources.

Helpful Commands:

vxdisksetup: used to setup a disk for use with Veritas Volume Manager
vxdg: usage examples:

  • vxdg list: shows all Veritas disk groups
  • vxdg deport <disk_group>: deport a disk group
  • vxdg import <disk_group>: import a disk group

vxdiskadm: Menu driven command

vxvol:
usage example:

  • vxvol -g <disk_group> startall

vxdisk list: displays disk listing
vxprint -ht: displays volume manager object listing
vxdg -g <diskgroup> free: displays free space in the specified diskgroup
vxtask list: list all volume manager tasks currently running on the system
vxdiskadd <diskname>: add a disk to Volume Manager (devicename = cXtXdX)
vxedit set spare=on <diskname>: designate a disk as a hot-relocation spare
vxedit set spare=off <diskname>: remove a disk as a hot-relocation spare
vxedit rename <old_name> <new_name>: rename a disk
vxdisk offline <disk_name>: take a disk offline (first remove the disk from its disk group) (devicename=cXtXdXs2)
vxdg -g <diskgroup> rmdisk <disk_name>: remove a disk from a disk group
vxdisk rm <disk_name>: remove a disk from veritas control
vxvol -g <diskgroup> startall: start all volumes in a diskgroup

Resizing a volume:

# vxassist -g <diskgroup> growto <volumename> <length>
# vxassist -g <diskgroup> growby <volumename> <length>
# vxassist -g <diskgroup> shrinkto <volumename> <length>
# vxassist -g <diskgroup> shrinkby <volumename> <length>

Veritas Clean-Up Procedure:

  • rm /etc/vx/disk.info
  • rm /etc/vx/array.info
  • rm /dev/vx/dmp/*
  • rm /dev/vx/rdmp/*
  • Then reboot

How To Add New Disks to a System

1. Once you have your disks and disk names, you need to
run the following command on your host, which will make Veritas aware of
any new disks that the OS has access to.

vxdctl enable

2. Running the command vxdisk list will list all the disks that your
system has access to. The disk name may or may not be similar to the
name that is on your storage array. Use the following command to
cross-reference the Veritas name to the OS name.

vxdisk -e list

3. At this point you can log into the Veritas GUI and initialize your disk. Then you can add it to your existing disk group or create a new one.
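If you want to see from the command line which disks still need initializing, the status column of vxdisk list is the place to look. The sketch below prints the devices whose status reads "online invalid", which in my experience is how disks without a valid VxVM configuration show up. The function name is made up and the column layout is assumed from the vxdisk list output earlier in this post.

```shell
# Print devices that `vxdisk list` reports as "online invalid".
# Input: `vxdisk list` output on stdin; line 1 is assumed to be the header.
uninitialized_disks() {
    awk 'NR > 1 && NF >= 2 && $(NF-1) == "online" && $NF == "invalid" { print $1 }'
}
```

For example, `vxdctl enable && vxdisk list | uninitialized_disks` gives a short list of candidates to initialize, whether through the GUI or with vxdisksetup.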

GUI

Start Veritas Volume Manager (VVM) in GUI mode

  1. cd /opt/VRTSob/bin/vea