Differences

This shows you the differences between two versions of the page.

@@ Line 1: / Line 1: @@
+====== DiskStat ======
+^ Author | [[ everett.vernon@gmail.com | Vernon Everett ]] |
+^ Compatibility | Tested on Solaris 10 |
+^ Requirements | Nothing special |
+^ Download | None |
+^ Last Update | 2010-09-21 |
+===== Description =====
+Graphs of iostat output, designed to appear on the trends page.
+Really useful for seeing which disks are being hit hard, and for getting an idea of where your bottlenecks are.
+I called it diskstat, instead of iostat, for two reasons.
+. There was already an iostat graph definition, and I had no idea what it was for.
+. Since it appears in the trends, it really makes no difference what it's called.
+By default, it ignores NFS disks, but you can change that with the following in the appropriate section of clientlocal.cfg (or just hack the code):
+  DISKSTAT:SHOW_NFS=yes
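+A minimal sketch of how such an override can be picked up on the client, assuming the clientlocal.cfg lines arrive in a local file the way the script's source expects. The file contents and the whitelist of accepted names are made up for this illustration; they are not part of the original script.

```shell
#!/bin/sh
# Sketch: apply DISKSTAT: overrides from a (made-up) copy of the client config.
LOGFETCH=$(mktemp)
cat > "$LOGFETCH" <<'EOF'
log:/var/adm/messages:10240
DISKSTAT:SHOW_NFS=yes
DISKSTAT:DURATION=30
EOF

SHOW_NFS=no      # script defaults
DURATION=10

# Keep only the DISKSTAT: lines and drop the prefix. A temporary file is used
# instead of a pipe so the loop runs in the current shell in any POSIX shell.
OVERRIDES=$(mktemp)
grep '^DISKSTAT:' "$LOGFETCH" | cut -d: -f2- > "$OVERRIDES"
while read -r NEW_DEF
do
    case "$NEW_DEF" in
        # Accept known names only (hypothetical whitelist).
        SHOW_NFS=*|DURATION=*) export "$NEW_DEF" ;;
    esac
done < "$OVERRIDES"
rm -f "$LOGFETCH" "$OVERRIDES"

echo "SHOW_NFS=$SHOW_NFS DURATION=$DURATION"
```

+Restricting the loop to known variable names is a design choice of this sketch: it keeps a stray line in clientlocal.cfg from overwriting anything else in the script's environment.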
+===== Installation =====
+=== Client side ===
+. Copy diskstat.ksh to $HOBBITCLIENTHOME/ext (usually ~/client/ext) and make sure it is executable.
+. Edit $HOBBITCLIENTHOME/etc/clientlaunch.cfg and insert the following text:
+  [diskstat]
+        ENVFILE $HOBBITCLIENTHOME/etc/hobbitclient.cfg
+        CMD $HOBBITCLIENTHOME/ext/diskstat.ksh
+        LOGFILE $HOBBITCLIENTHOME/logs/diskstat.log
+        INTERVAL 5m
+=== Server side ===
+. Add this to TEST2RRD= in hobbitserver.cfg
+  diskstat-reads=ncv,diskstat-writes=ncv,diskstat-kreads=ncv,diskstat-kwrites=ncv,diskstat-wait=ncv,diskstat-actv=ncv,diskstat-svct=ncv,diskstat-wsvc=ncv,diskstat-pw=ncv,diskstat-pb=ncv
+. Add this to GRAPHS= in hobbitserver.cfg
+  diskstat-reads::7,diskstat-writes::7,diskstat-kreads::7,diskstat-kwrites::7,diskstat-wait::7,diskstat-actv::7,diskstat-svct::7,diskstat-wsvc::7,diskstat-pw::7,diskstat-pb::7
+  # ::7 indicates the number of lines per graph (default 4). Flavour to taste.
+. Add this to hobbitserver.cfg
+  SPLITNCV_diskstat-pb="*:GAUGE"
+  SPLITNCV_diskstat-reads="*:GAUGE"
+  SPLITNCV_diskstat-writes="*:GAUGE"
+  SPLITNCV_diskstat-kreads="*:GAUGE"
+  SPLITNCV_diskstat-kwrites="*:GAUGE"
+  SPLITNCV_diskstat-wait="*:GAUGE"
+  SPLITNCV_diskstat-actv="*:GAUGE"
+  SPLITNCV_diskstat-wsvc="*:GAUGE"
+  SPLITNCV_diskstat-svct="*:GAUGE"
+  SPLITNCV_diskstat-pw="*:GAUGE"
+. Add this to hobbitgraph.cfg
+  [diskstat-reads]
+    FNPATTERN diskstat-reads,(.*).rrd
+    TITLE Disk Reads per Second
+    YAXIS Reads
+    -l 0
+    DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
+    LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
+    GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
+    GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
+    GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
+    GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n
+  [diskstat-writes]
+    FNPATTERN diskstat-writes,(.*).rrd
+    TITLE Disk Writes per Second
+    YAXIS Writes
+    -l 0
+    DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
+    LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
+    GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
+    GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
+    GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
+    GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n
+  [diskstat-kreads]
+    FNPATTERN diskstat-kreads,(.*).rrd
+    TITLE Disk Reads per Second in KB
+    YAXIS KB
+    -l 0
+    DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
+    LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
+    GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
+    GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
+    GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
+    GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n
+  [diskstat-kwrites]
+    FNPATTERN diskstat-kwrites,(.*).rrd
+    TITLE Disk Writes per Second in KB
+    YAXIS KB
+    -l 0
+    DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
+    LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
+    GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
+    GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
+    GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
+    GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n
+  [diskstat-wait]
+    FNPATTERN diskstat-wait,(.*).rrd
+    TITLE Average Number of Transactions Waiting
+    YAXIS Total
+    -l 0
+    DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
+    LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
+    GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
+    GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
+    GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
+    GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n
+  [diskstat-actv]
+    FNPATTERN diskstat-actv,(.*).rrd
+    TITLE Average Number of Transactions Active
+    YAXIS Total
+    -l 0
+    DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
+    LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
+    GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
+    GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
+    GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
+    GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n
+  [diskstat-svct]
+    FNPATTERN diskstat-svct,(.*).rrd
+    TITLE Average Service Time of Active Transactions
+    YAXIS Milliseconds
+    -l 0
+    DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
+    LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
+    GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
+    GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
+    GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
+    GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n
+  [diskstat-wsvc]
+    FNPATTERN diskstat-wsvc,(.*).rrd
+    TITLE Average Service Time in Wait Queue
+    YAXIS Milliseconds
+    -l 0
+    DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
+    LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
+    GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
+    GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
+    GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
+    GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n
+  [diskstat-pw]
+    FNPATTERN diskstat-pw,(.*).rrd
+    TITLE Percent of Time Waiting
+    YAXIS %
+    -l 0
+    -u 100
+    DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
+    LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
+    GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
+    GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
+    GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
+    GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n
+  [diskstat-pb]
+    FNPATTERN diskstat-pb,(.*).rrd
+    TITLE Percent of Time Disk Busy
+    YAXIS %
+    -l 0
+    -u 100
+    DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
+    LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
+    GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
+    GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
+    GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
+    GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n
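+The three hobbitserver.cfg additions above all repeat the same ten subtest names, which makes them easy to mistype. As a hedged sketch (the variable names are my own, not part of the original instructions), they could be generated from one list:

```shell
#!/bin/sh
# Sketch: build the hobbitserver.cfg additions from a single list of subtests.
SUBTESTS="reads writes kreads kwrites wait actv svct wsvc pw pb"
LINES_PER_GRAPH=7      # the ::7 suffix; adjust to taste

TEST2RRD_ADD=""
GRAPHS_ADD=""
SPLITNCV_ADD=""
for s in $SUBTESTS
do
    # ${VAR:+$VAR,} adds a comma separator only once VAR is non-empty.
    TEST2RRD_ADD="${TEST2RRD_ADD:+$TEST2RRD_ADD,}diskstat-$s=ncv"
    GRAPHS_ADD="${GRAPHS_ADD:+$GRAPHS_ADD,}diskstat-$s::$LINES_PER_GRAPH"
    SPLITNCV_ADD="${SPLITNCV_ADD}SPLITNCV_diskstat-$s=\"*:GAUGE\"
"
done

echo "# append to TEST2RRD="
echo "$TEST2RRD_ADD"
echo "# append to GRAPHS="
echo "$GRAPHS_ADD"
echo "# add as separate lines"
printf '%s' "$SPLITNCV_ADD"
```

+The output is meant to be pasted into hobbitserver.cfg by hand; the sketch does not edit the file itself.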
+===== Source =====
+==== diskstat.ksh ====
+<hidden onHidden="Show Code ⇲" onVisible="Hide Code ⇱">
+<code>
+#!/bin/ksh
+TEMPFILE=$BBTMP/diskstat.tmp
+SHOW_NFS=no   # Set this to yes on server side clientlocal.cfg to change it
+              # DISKSTAT:SHOW_NFS=yes
+DURATION=10   # The duration of the iostat sample
+              # This can be updated in the same way as above
+# Now we redefine some variables, if they are set in clientlocal
+LOGFETCH=${BBTMP}/logfetch.$(uname -n).cfg
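+if [ -f "$LOGFETCH" ]
+then
+   # Each DISKSTAT: line carries a NAME=value override, e.g. SHOW_NFS=yes.
+   # export applies the assignment; expanding $NEW_DEF on its own would only
+   # try to run it as a command. In ksh the loop runs in the current shell.
+   grep "^DISKSTAT:" "$LOGFETCH" | cut -d":" -f2- \
+                                 | while read NEW_DEF
+                                   do
+                                      export "$NEW_DEF"
+                                   done
+fi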
+> $TEMPFILE  # Make sure it's empty
+/usr/bin/iostat -xrn $DURATION 2 > $TEMPFILE.raw  # And collect some data to work with.
+# We have to collect 2 sets, because the first set is the average since boot.
+# Define where the second set of data starts
+LINE=$(cat $TEMPFILE.raw | grep -n ",device$" | tail -1 | cut -d":" -f1)
+# take the second set, and massage it into usable data
+cat $TEMPFILE.raw | awk "NR>$LINE" \
+                  | sed "s/,/ /g" \
+                  | awk '{ print $NF" "$0 }' \
+                  | awk '{ $NF="";print }' > $TEMPFILE.data
+rm $TEMPFILE.raw
+count=1
+# Now we format the data and send it off to the server
+for subtest in reads writes kreads kwrites wait actv wsvc svct pw pb
+do
+   ((count=count+1))
+   echo "" >> $TEMPFILE
+   cat $TEMPFILE.data | cut -d" " -f1,$count \
+                      | while read DEVICE VAL
+                        do
+                           # NFS mounts show up as host:/path
+                           echo "$DEVICE" | grep ":/" > /dev/null
+                           if [ $? -eq 0 -a "$SHOW_NFS" = "no" ]
+                           then
+                              continue   # skip this NFS device, keep the rest
+                           else
+                              DEVICE=$(echo $DEVICE | tr : - )
+                           fi
+                           echo "${DEVICE}:${VAL}" >> $TEMPFILE
+                        done
+   echo "" >> $TEMPFILE
+   $BB $BBDISP "data $MACHINE.diskstat-${subtest} $(echo; cat $TEMPFILE ;echo "" ;echo "ignore this" )"
+   # Without the last echo "ignore this", it seems to not graph the last entry.
+   # Odd really, but that seems to fix it.
+   rm $TEMPFILE
+done
+rm $TEMPFILE.data
+</code>
+</hidden>
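+To make the data massaging in the script easier to follow, here is a small stand-alone sketch of the same idea run against canned input. The iostat figures are invented sample input, not real measurements, and the sketch uses sed and a single awk call rather than the script's exact pipeline.

```shell
#!/bin/sh
# Sketch: take the second iostat sample and put the device name first.
RAW=$(mktemp)
cat > "$RAW" <<'EOF'
extended device statistics
r/s,w/s,kr/s,kw/s,wait,actv,wsvc_t,asvc_t,%w,%b,device
1.0,2.0,8.0,16.0,0.0,0.1,0.0,5.0,0,1,c0t0d0
extended device statistics
r/s,w/s,kr/s,kw/s,wait,actv,wsvc_t,asvc_t,%w,%b,device
3.0,4.0,24.0,32.0,0.0,0.2,0.1,6.0,0,2,c0t0d0
0.5,0.0,4.0,0.0,0.0,0.0,0.0,1.5,0,0,nfshost:/export/home
EOF

# Line number of the last header, i.e. where the second sample starts.
LINE=$(grep -n ',device$' "$RAW" | tail -1 | cut -d: -f1)

# Drop everything up to that header, split on commas, and move the last
# field (the device) to the front, followed by the ten statistics.
DATA=$(sed "1,${LINE}d" "$RAW" | awk -F, '{
           out = $NF
           for (i = 1; i < NF; i++) out = out " " $i
           print out
       }')
rm -f "$RAW"

# Column 2 is r/s (reads). NFS devices, recognisable by ":/", are skipped.
READS=$(echo "$DATA" | while read -r DEVICE VAL REST
do
    case "$DEVICE" in
        *:/*) continue ;;
    esac
    echo "$DEVICE:$VAL"
done)
echo "$READS"
```

+With the sample input above, only the local disk survives the NFS filter, and the value reported for it comes from the second sample rather than the since-boot average.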
+===== Known Bugs and Issues =====
+2010-09-21 - Found and fixed a bug. (Left out the wsvc_t stat.)
+All bugs are currently unknown.
+If you find any, let me know, and I will see what I can do to fix them.
+===== To Do =====
+I was toying with the idea of having some of the values appear as alerts, with standard red/yellow/green thresholds and all the rest, but I'm not sure what the point would be.
+It might be useful to watch the average service time.
+However, to be of concern, high iostat figures need to be sustained. Disk usage is expected to peak from time to time, so is it really suitable for alerts?
+And even if the load is sustained, what exactly can you do about it?
+Your comments on the back of $100 bills only.
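+If alerts were ever added, the "sustained" requirement could be expressed roughly as below. This is only a sketch: the function name, the thresholds, and the sample count are hypothetical, and nothing like it exists in diskstat.ksh today.

```shell
#!/bin/sh
# Sketch: only go yellow/red when a value stays high for several samples.
WARN=20      # hypothetical warning threshold (asvc_t, milliseconds)
CRIT=50      # hypothetical critical threshold
NEEDED=3     # consecutive samples required before alerting

# sustained_colour VALUE...  (oldest first)
# Prints green, yellow or red based on the most recent NEEDED samples.
sustained_colour() {
    # Too little history means no alert.
    [ $# -ge "$NEEDED" ] || { echo green; return; }
    # Drop older samples so only the last NEEDED remain.
    while [ $# -gt "$NEEDED" ]; do shift; done
    colour=red
    for v in "$@"
    do
        # awk does the comparison because the values are not integers.
        level=$(awk -v v="$v" -v w="$WARN" -v c="$CRIT" \
            'BEGIN { print ((v >= c) ? "red" : ((v >= w) ? "yellow" : "green")) }')
        case "$level" in
            green)  colour=green; break ;;   # one quiet sample clears the alert
            yellow) colour=yellow ;;         # never worse than the mildest sample
        esac
    done
    echo "$colour"
}

sustained_colour 5.1 60.2 55.0 71.3
sustained_colour 60.2 4.0 71.3
```

+The first call stays above the critical threshold for the last three samples; the second has a quiet sample in between, so the single peaks are ignored.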
+===== Credits =====
+This all started because a piece of software is crashing on one of my servers every month or so. The application admin is blaming me (and my server).
+I said it's not the server, but after some constructive googling, I found a link which hinted that it might be disk performance.
+I decided to monitor disk performance, and get some graphs for when it crashes again.
+So all credit for this goes to a really poorly written mail server that doesn't do single instancing. (Name of application withheld to protect the guilty.)
+===== Changelog =====
+  * **2010-09-09**
+    * Initial release
+  * **2010-09-21**
+    * Fairly major bug fix. (Left out the wsvc_t stats)