DiskStat
Author | Vernon Everett |
---|---|
Compatibility | Tested on Solaris 10 |
Requirements | Nothing special |
Download | None |
Last Update | 2010-09-21 |
Description
Graphs of iostat output, designed to appear on the trends page. Really useful for seeing which disks are being hit hard and for getting an idea of where your bottlenecks are.
I called it diskstat instead of iostat for two reasons:
1. There was already an iostat graph definition, and I had no idea what it was for.
2. Since it appears in the trends, it really makes no difference what it's called.
By default, it ignores NFS disks, but you can change that by adding the following to the appropriate section of clientlocal.cfg (or just hack the code):

    DISKSTAT:SHOW_NFS=yes
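
For example, to enable NFS statistics for a single host, the clientlocal.cfg entry might look like the following (the hostname is just a placeholder; per-OS or per-class sections work the same way):

```
[myhost.example.com]
DISKSTAT:SHOW_NFS=yes
```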
Installation
Client side
1. Copy diskstat.ksh to $HOME/client/ext (the ext directory of the Hobbit client installation)
2. Edit the client/etc/clientlaunch.cfg and insert the following text:

    [diskstat]
        ENVFILE $HOBBITCLIENTHOME/etc/hobbitclient.cfg
        CMD $HOBBITCLIENTHOME/ext/diskstat.ksh
        LOGFILE $HOBBITCLIENTHOME/logs/diskstat.log
        INTERVAL 5m
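
Roughly, steps 1 and 2 on the client plus picking up the change look like this (paths assume a default client install under $HOME/client; adjust to your installation):

```sh
# as the user the Hobbit client runs as
cp diskstat.ksh $HOME/client/ext/
chmod 755 $HOME/client/ext/diskstat.ksh    # hobbitlaunch must be able to execute it

# after editing etc/clientlaunch.cfg as shown above, restart the client
# so hobbitlaunch picks up the new [diskstat] task
cd $HOME/client
./runclient.sh stop
./runclient.sh start
```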
Server side
3. Add this to TEST2RRD= in hobbitserver.cfg

    diskstat-reads=ncv,diskstat-writes=ncv,diskstat-kreads=ncv,diskstat-kwrites=ncv,diskstat-wait=ncv,diskstat-actv=ncv,diskstat-svct=ncv,diskstat-wsvc=ncv,diskstat-pw=ncv,diskstat-pb=ncv
4. Add this to GRAPHS= in hobbitserver.cfg

    diskstat-reads::7,diskstat-writes::7,diskstat-kreads::7,diskstat-kwrites::7,diskstat-wait::7,diskstat-actv::7,diskstat-svct::7,diskstat-wsvc::7,diskstat-pw::7,diskstat-pb::7

The ::7 indicates the number of lines per graph (default 4). Flavour to taste.
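
Both of these are appended to the existing comma-separated lists rather than replacing them. For example (the existing entries shown here are only illustrative; keep whatever your hobbitserver.cfg already has):

```
TEST2RRD="cpu=la,disk,memory,...,diskstat-reads=ncv,diskstat-writes=ncv,...,diskstat-pb=ncv"
GRAPHS="la,disk,memory,...,diskstat-reads::7,diskstat-writes::7,...,diskstat-pb::7"
```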
5. Add this to hobbitserver.cfg

    SPLITNCV_diskstat-pb="*:GAUGE"
    SPLITNCV_diskstat-reads="*:GAUGE"
    SPLITNCV_diskstat-writes="*:GAUGE"
    SPLITNCV_diskstat-kreads="*:GAUGE"
    SPLITNCV_diskstat-kwrites="*:GAUGE"
    SPLITNCV_diskstat-wait="*:GAUGE"
    SPLITNCV_diskstat-actv="*:GAUGE"
    SPLITNCV_diskstat-wsvc="*:GAUGE"
    SPLITNCV_diskstat-svct="*:GAUGE"
    SPLITNCV_diskstat-pw="*:GAUGE"
6. Add this to hobbitgraph.cfg:

    [diskstat-reads]
        FNPATTERN diskstat-reads,(.*).rrd
        TITLE Disk Reads per Second
        YAXIS Reads
        -l 0
        DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
        LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
        GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
        GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
        GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
        GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n

    [diskstat-writes]
        FNPATTERN diskstat-writes,(.*).rrd
        TITLE Disk Writes per Second
        YAXIS Writes
        -l 0
        DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
        LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
        GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
        GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
        GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
        GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n

    [diskstat-kreads]
        FNPATTERN diskstat-kreads,(.*).rrd
        TITLE Disk Reads per Second in Kb
        YAXIS Kb
        -l 0
        DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
        LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
        GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
        GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
        GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
        GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n

    [diskstat-kwrites]
        FNPATTERN diskstat-kwrites,(.*).rrd
        TITLE Disk Writes per Second in Kb
        YAXIS Kb
        -l 0
        DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
        LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
        GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
        GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
        GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
        GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n

    [diskstat-wait]
        FNPATTERN diskstat-wait,(.*).rrd
        TITLE Average Number of Transactions Waiting
        YAXIS Total
        -l 0
        DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
        LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
        GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
        GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
        GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
        GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n

    [diskstat-actv]
        FNPATTERN diskstat-actv,(.*).rrd
        TITLE Average Number of Transactions Active
        YAXIS Total
        -l 0
        DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
        LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
        GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
        GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
        GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
        GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n

    [diskstat-svct]
        FNPATTERN diskstat-svct,(.*).rrd
        TITLE Average Response Time of Transaction
        YAXIS Milliseconds
        -l 0
        DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
        LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
        GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
        GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
        GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
        GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n

    [diskstat-wsvc]
        FNPATTERN diskstat-wsvc,(.*).rrd
        TITLE Average Wait Queue Service Time
        YAXIS Milliseconds
        -l 0
        DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
        LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
        GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
        GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
        GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
        GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n

    [diskstat-pw]
        FNPATTERN diskstat-pw,(.*).rrd
        TITLE Percent of Time Waiting
        YAXIS %
        -l 0 -u 100
        DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
        LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
        GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
        GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
        GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
        GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n

    [diskstat-pb]
        FNPATTERN diskstat-pb,(.*).rrd
        TITLE Percent of Time Disk Busy
        YAXIS %
        -l 0 -u 100
        DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
        LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
        GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
        GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
        GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
        GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n
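
Note that hobbitd_rrd only reads TEST2RRD and the SPLITNCV settings when it starts, so after editing hobbitserver.cfg the Hobbit server needs a restart before the new RRD files start appearing (hobbitgraph.cfg is re-read on every graph request). On a stock installation that is typically something like the following (path assumed; adjust to your install):

```sh
# as the hobbit user
~hobbit/server/hobbit.sh restart
```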
Source
diskstat.ksh
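
The script itself is not reproduced here, but for reference: a client extension feeding these graphs sends one data message per statistic, with one "name : value" line per disk, and the SPLITNCV settings above then split each message into the diskstat-<stat>,<disk>.rrd files that the FNPATTERN lines match. A purely illustrative sketch of such a message (made-up disk names and values, not necessarily the exact format diskstat.ksh uses):

```sh
$BB $BBDISP "data $MACHINE.diskstat-reads
sd0 : 12.4
sd1 : 0.3
md10 : 45.0
"
```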
Known Bugs and Issues
2010-09-21 - Found and fixed a bug. (Left out the wsvc_t stat.)
All bugs are currently unknown.
If you find any, let me know, and I will see what I can do to fix them.
To Do
I was toying with the idea of having some of the values appear as alerts, with standard red/yellow/green thresholds and all the rest, but I'm not sure why I would.
It might be useful to watch the average service time?
However, to be of concern, high iostat figures need to be sustained. Disk usage is expected to peak from time to time, so is it really suitable for alerting? And even if it does stay high for a sustained period, what exactly can you do about it?
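
For what it's worth, here is a minimal sketch of how such a check could be bolted on, assuming the usual Hobbit client environment ($BB, $BBDISP, $MACHINE) and completely made-up thresholds. It is not part of diskstat.ksh:

```ksh
#!/bin/ksh
# Hypothetical sketch only -- not part of diskstat.ksh.
# Flag disks whose average service time (asvc_t) in the latest iostat sample
# exceeds made-up thresholds, and send a red/yellow/green status.

WARN=50     # ms, example threshold
CRIT=200    # ms, example threshold

COLOR=green
MSG=""

# iostat -xn 5 2: first report is the since-boot average, second is the 5s sample.
# The awk script counts the "device" header lines so only the second report is
# used; column 8 is asvc_t, the last column is the device name.
iostat -xn 5 2 | nawk -v warn=$WARN -v crit=$CRIT '
        $NF == "device" { rep++; next }
        rep == 2 && $8 > crit { print "red", $NF, $8; next }
        rep == 2 && $8 > warn { print "yellow", $NF, $8 }' |
while read LINECOLOR DISK ASVC
do
        # In ksh the last stage of a pipeline runs in the current shell,
        # so COLOR and MSG survive after the loop.
        [ "$LINECOLOR" = "red" ] && COLOR=red
        [ "$LINECOLOR" = "yellow" -a "$COLOR" = "green" ] && COLOR=yellow
        MSG="$MSG
&$LINECOLOR $DISK asvc_t ${ASVC}ms"
done

$BB $BBDISP "status $MACHINE.diskstat $COLOR `date` disk service times
$MSG
"
```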
Your comments on the back of $100 bills only.
Credits
This all started because a piece of software was crashing on one of my servers every month or so, and the application admin was blaming me (and my server).
I said it wasn't the server, but after some constructive googling I found a link which hinted that it might be disk performance.
So I decided to monitor disk performance and get some graphs for when it crashes again.
All credit for this therefore goes to a really poorly written mail server that doesn't do single-instancing. (Name of application withheld to protect the guilty.)
Changelog
- 2010-09-09
  - Initial release
- 2010-09-21
  - Fairly major bug fix. (Left out the wsvc_t stats.)