Tracking Down I/O Hotspots in Linux w/ Veritas

Got a heads up from a DBA today that one of their database servers was running hot from an I/O perspective. To troubleshoot, I jumped on the server and ran iostat:

>iostat -d -x 5 3

Based on the output below, I was able to determine which disks were the most utilized by looking at the last column, %util. Two other columns to take note of are await and svctm. await is the average response time in ms for an I/O request to the device, including any time spent waiting in the queue. svctm is the average time it took to service a request once it was out of the queue and sent to the disk. In this case service times are low (roughly 2.5 to 3.5 ms), so I can pretty much rule out SAN issues.

Device:     rrqm/s   wrqm/s    r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
VxVM65519     0.00     0.00 230.80 20.80  5753.60   556.80    25.08     1.77    6.94   3.50  99.08
VxVM65516     0.00     0.00 257.60 21.80  6022.40   467.20    23.23     1.68    6.02   3.01  93.22
VxVM65515     0.00     0.00 265.80 18.80  6563.20   364.80    24.34     1.42    4.89   2.69  94.58
VxVM65513     0.00     0.00 233.20 24.00  6032.00   969.60    27.22     1.41    5.47   2.99  88.92
VxVM65493     0.00     0.00 308.80 21.00  7590.40   944.00    25.88     1.74    5.27   2.52  98.14
VxVM65492     0.00     0.00 262.80 20.40  6502.40   716.80    25.49     1.76    6.10   2.94  99.22
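
If you'd rather not eyeball the columns, here is a rough sketch of a filter that picks out the busy devices from iostat output like the rows above. The function name hot_disks and the default threshold of 90 are my own placeholders, and the column positions assume the older sysstat layout shown in this post, where await, svctm and %util are the last three columns. Newer sysstat releases lay the columns out differently, so check the header on your box first.

```shell
# hot_disks: read `iostat -d -x` output on stdin and print the devices whose
# %util (last column) is at or above a threshold.
# Hypothetical helper; the default threshold of 90 is an arbitrary choice.
hot_disks() {
    threshold="${1:-90}"
    awk -v t="$threshold" '
        # Only data rows end in a plain number, so this skips blank lines,
        # the "Linux ..." banner and the "Device:" header.
        NF >= 4 && $NF ~ /^[0-9]+(\.[0-9]+)?$/ && $NF + 0 >= t {
            # Assumed layout: ... await svctm %util as the last three fields.
            # queue_wait is simply await minus svctm.
            printf "%s util=%s await=%s svctm=%s queue_wait=%.2f\n",
                   $1, $NF, $(NF-2), $(NF-1), $(NF-2) - $(NF-1)
        }'
}
```

Usage would look something like `iostat -d -x 5 3 | hot_disks 95`. Keep in mind that the first report iostat prints is an average since boot, so the later samples are the ones that reflect what is happening right now.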

Since svctm is fine, I know the disks themselves are performing well. I also know that the hot disks are under Veritas control, so I take a quick look at the disk groups on the box:

>vxdg list

Then I run vxstat on each disk group individually to look for the ones with a high number of reads and writes. With that I can advise the DBAs which specific database is causing the load on the box.

>vxstat -g example_dg -i 1
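
On a box with a lot of disk groups, running vxstat by hand for each one gets tedious, so here is a minimal sketch of a loop that does it. The function names dg_names and sample_all_dgs are made up for illustration, the parsing assumes vxdg list prints a header row starting with NAME followed by one row per disk group, and the 1 second interval with 5 samples (-i 1 -c 5) is just an example value.

```shell
# dg_names: read `vxdg list` output on stdin, print one disk group name per line.
# Assumes a header row whose first field is NAME, then one row per group.
dg_names() {
    awk 'NF > 0 && $1 != "NAME" { print $1 }'
}

# sample_all_dgs: take a short vxstat sample from every disk group in turn.
# Interval and count are arbitrary illustration values; adjust to taste.
sample_all_dgs() {
    vxdg list | dg_names | while read -r dg; do
        echo "=== $dg ==="
        vxstat -g "$dg" -i 1 -c 5
    done
}
```

Calling `sample_all_dgs` prints a labelled block per disk group, which makes it easier to spot the group whose volumes are taking most of the reads and writes.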

 
