====== hba.ksh ======

^ Author | [[ everett.vernon@gmail.com | Vernon Everett ]] |
^ Compatibility | Should work on all versions |
^ Requirements | Solaris |
^ Download | None |
^ Last Update | 2010-08-11 |

===== Description =====

A monitoring script to keep tabs on your HBAs. It can check that they are
  * Online
  * Running at optimal speed
  * Not getting errors
  * Correctly multi-pathed

It will also
  * List SCSI devices
  * List remote ports
  * List multi-path info, even if you don't want it to generate alerts.

Alerts are configurable, as is the information displayed, and both can be set per server using client-local.cfg (see the comments above the LOGFETCH block in the script).

Override variables (with default values):
  * CHECKSPEED=true -- Check that HBAs are running at optimal speed and alert if not
  * CHECKONLINE=true -- Check that HBAs are online and alert if not
  * CHECKERRS=true -- Check for link errors and alert if the count reaches a threshold (see ERR_YELLOW and ERR_RED)
  * CHECKMPATH=true -- Show multipath info and alert if a path is down (see MPATHFAILCOL)
  * LIST_MPATH=true -- List the multi-path info only (no alerts; ignored if CHECKMPATH is true)
  * LIST_SCSI=true -- List the SCSI devices
  * LIST_REMOTE=true -- List the remote ports
  * ERR_YELLOW=3 -- Link error count at or above which the status goes yellow
  * ERR_RED=100 -- Link error count at or above which the status goes red
  * MPATHFAILCOL=yellow -- Colour of a multipath fail

Some may notice that this script could probably have been much shorter and simpler in Perl, but that would require me to learn enough Perl to do so. Anybody feeling energetic enough to redo it in Perl - knock yourself out.

===== Installation =====

=== Client side ===

  - Copy hba.ksh to ''~/client/ext'' (''$HOBBITCLIENTHOME/ext'')
  - Edit ''client/etc/clientlaunch.cfg'' and insert the following text:

<code>
[hba]
        ENVFILE $HOBBITCLIENTHOME/etc/hobbitclient.cfg
        CMD $HOBBITCLIENTHOME/ext/hba.ksh
        LOGFILE $HOBBITCLIENTHOME/logs/hba.log
        INTERVAL 5m
</code>

=== Server side ===

If you want to override some of the default variables in the script, add them to client-local.cfg under the appropriate label.

<code>
HBA:export CHECKERRS=false
HBA:export ERR_RED=20
HBA:export LIST_SCSI=false
</code>

See Description above for more.

===== Source =====

==== hba.ksh ====

<code>
#!/bin/ksh
# HBA monitoring script
# Author : Vernon Everett - everett.vernon(at)gmaildotcom
# Development History
# Date       | Author         | Summary
#---------------------------------------------------------------
# 10/08/2010 | Vernon Everett | Initial release.
# 11/08/2010 | Vernon Everett | Added override variables
#            |                | Added multi-path, SCSI details, remote port info
#            |                |
#---------------------------------------------------------------

if [ -x /usr/bin/zonename ]
then
    [ "$(/usr/bin/zonename)" = "global" ] || exit 0   # I only run on global zones
fi

TEMPFILE=$BBTMP/hba.$$
FCINFO="/opt/csw/bin/sudo /usr/sbin/fcinfo"
MPATHADM="/usr/sbin/mpathadm"    # sudo probably not needed
COLOUR=green

# Define what to check and default thresholds for the error counts
CHECKSPEED=true
CHECKONLINE=true
CHECKERRS=true
CHECKMPATH=true
LIST_MPATH=true
LIST_SCSI=true
LIST_REMOTE=true
ERR_YELLOW=3
ERR_RED=100
MPATHFAILCOL=yellow

# Now we define them, let's over-ride them if defined in client-local.cfg
# Add lines like this in client-local.cfg to override the defaults.
# HBA:export CHECKERRS=false
# HBA:export ERR_RED=20

LOGFETCH=${BBTMP}/logfetch.$(uname -n).cfg
if [ -f $LOGFETCH ]
then
    # ksh runs the last stage of a pipeline in the current shell,
    # so the exports executed here survive the loop.
    grep "^HBA:" $LOGFETCH | cut -d":" -f2 \
    | while read NEW_DEF
    do
        $NEW_DEF
    done
fi

date > $TEMPFILE.out
$FCINFO hba-port | grep "No Adapters Found" > /dev/null
if [ $? -eq 0 ]
then
    # There are no adapters to work with.
    echo "No Adapters Found" >> $TEMPFILE.out
    # Let's skip the rest of the crap
else
    # Collect the link statistics for every HBA port
    $FCINFO hba-port | grep "HBA Port WWN:" \
    | cut -d":" -f2 \
    | while read WWN
    do
        $FCINFO hba-port -l $WWN
    done >> $TEMPFILE

    # Colour the "State:" lines; anything other than online is red
    if [ "$CHECKONLINE" = "true" ]
    then
        cat $TEMPFILE | while read LINE
        do
            ONLINE=$(echo "$LINE" | grep "State:" | cut -d":" -f2 | sed 's/^[ ]*//;s/[ ]*$//' )
            if [ -n "$ONLINE" ]
            then
                if [ "$ONLINE" = "online" ]
                then
                    echo "&green $LINE" >> $TEMPFILE.online
                else
                    echo "&red $LINE" >> $TEMPFILE.online
                    COLOUR=red
                fi
            else
                echo "$LINE" >> $TEMPFILE.online
            fi
        done
        [ "$COLOUR" = "red" ] && echo "&red HBA not online" >> $TEMPFILE.out
        mv $TEMPFILE.online $TEMPFILE
    fi

    # Compare the current speed with the highest supported speed
    if [ "$CHECKSPEED" = "true" ]
    then
        cat $TEMPFILE | while read LINE
        do
            echo "$LINE" | grep "^HBA" > /dev/null && MAXSPEED="" && CURRSPEED="" && SPEEDS=""
            SPEEDS=$(echo "$LINE" | grep "Supported Speeds:")
            [ -n "$SPEEDS" ] && MAXSPEED=$(echo "$SPEEDS" | awk '{ print $NF }')
            CURRSPEED=$(echo "$LINE" | grep "Current Speed:" | awk '{ print $NF }')
            if [ -n "$CURRSPEED" -a "$CURRSPEED" != "$MAXSPEED" ]
            then
                [ "$COLOUR" != "red" ] && COLOUR="yellow"
                echo "&yellow Some HBAs not at optimal speed" >> $TEMPFILE.out
                echo "$LINE" | sed "s/Current/\&yellow Current/g" >> $TEMPFILE.speed
                MAXSPEED=""
                SPEEDS=""
                CURRSPEED=""
            else
                echo "$LINE" | sed "s/Current/\&green Current/g" >> $TEMPFILE.speed
            fi
        done
        mv $TEMPFILE.speed $TEMPFILE
    fi

    # Remember the colour so far, then grade the error counters separately
    TCOLOUR=$COLOUR
    COLOUR=green
    if [ "$CHECKERRS" = "true" ]
    then
        cat $TEMPFILE | while read LINE
        do
            LCOL=green
            ERRLINE=$(echo "$LINE" | grep "Count:")
            if [ -n "$ERRLINE" ]
            then
                ERRCOUNT=$(echo "$ERRLINE" | cut -d":" -f2)
                [ $ERRCOUNT -lt $ERR_YELLOW ] && LCOL=green
                [ $ERRCOUNT -ge $ERR_YELLOW ] && LCOL=yellow
                [ $ERRCOUNT -ge $ERR_RED ] && LCOL=red
                echo "&$LCOL $LINE" >> $TEMPFILE.err
            else
                echo "$LINE" >> $TEMPFILE.err
            fi
            [ "$LCOL" = "red" ] && COLOUR=red
            [ "$LCOL" = "yellow" -a "$COLOUR" != "red" ] && COLOUR=yellow
        done
        [ "$COLOUR" = "red" ] && echo "&red Critical error count detected" >> $TEMPFILE.out
        [ "$COLOUR" = "yellow" ] && echo "&yellow High error count detected" >> $TEMPFILE.out
        mv $TEMPFILE.err $TEMPFILE
    fi
    # Merge the earlier colour back in; the worst colour wins
    [ "$TCOLOUR" = "red" ] && COLOUR="red"
    [ "$TCOLOUR" = "yellow" -a "$COLOUR" != "red" ] && COLOUR="yellow"

    if [ "$CHECKMPATH" = "true" -o "$LIST_MPATH" = "true" ]
    then
        rm $TEMPFILE.badpath 2> /dev/null   # Make sure it's not there
        echo "
" >> $TEMPFILE.path.out
        echo "" >> $TEMPFILE.path.out
        echo "Multi-Pathing" >> $TEMPFILE.path.out
        echo "" >> $TEMPFILE.path.out
        $MPATHADM list lu > $TEMPFILE.path
        if [ "$CHECKMPATH" = "true" ]
        then
            cp $TEMPFILE.path $TEMPFILE.path.colours
            # Join each group of three output lines into: device, total paths, operational paths
            cat $TEMPFILE.path | awk '{ FS=":" ; print $NF }' \
            | nawk 'ORS=NR%3?" ":"\n"' \
            | while read DEV TOTPATH OPERPATH
            do
                LCOL=green
                if [ $TOTPATH -ne $OPERPATH ]
                then
                    LCOL=$MPATHFAILCOL
                    MPATH=bad
                    [ $COLOUR != "red" ] && COLOUR=$MPATHFAILCOL
                fi
                SEDDEV=$(echo $DEV | sed "s/\//\\\\\//g")
                sed "s/$SEDDEV/\&$LCOL&/g" $TEMPFILE.path.colours > $TEMPFILE.path.tmp
                mv $TEMPFILE.path.tmp $TEMPFILE.path.colours
            done
            mv $TEMPFILE.path.colours $TEMPFILE.path
            [ "$MPATH" = "bad" ] && echo "&$MPATHFAILCOL Multipath error detected" >> $TEMPFILE.out
        fi
        cat $TEMPFILE.path >> $TEMPFILE.path.out
        mv $TEMPFILE.path.out $TEMPFILE.path
        echo >> $TEMPFILE.path
    fi

    # Copy the HBA details to the report, with a blank line before each port
    cat $TEMPFILE | while read LINE
    do
        echo "$LINE" | grep "HBA Port" > /dev/null
        if [ $? -eq 0 ]
        then
            echo "" >> $TEMPFILE.out
            echo "$LINE" >> $TEMPFILE.out
        else
            echo "$LINE" >> $TEMPFILE.out
        fi
    done
    [ "$CHECKMPATH" = "true" -o "$LIST_MPATH" = "true" ] && cat $TEMPFILE.path >> $TEMPFILE.out
    rm $TEMPFILE.path 2>/dev/null

    if [ "$LIST_REMOTE" = "true" ]
    then
        echo "
" >> $TEMPFILE.out
        echo "" >> $TEMPFILE.out
        echo "Remote Port Listing" >> $TEMPFILE.out
        echo "" >> $TEMPFILE.out
        $FCINFO hba-port | grep "HBA Port WWN:" \
        | cut -d":" -f2 \
        | while read WWN
        do
            $FCINFO remote-port -p $WWN
        done >> $TEMPFILE.out
    fi

    if [ "$LIST_SCSI" = "true" ]
    then
        FIRST=true
        echo "
" >> $TEMPFILE.out
        echo "SCSI Device Information" >> $TEMPFILE.out
        echo "" >> $TEMPFILE.out
        $FCINFO hba-port | grep "HBA Port WWN:" \
        | cut -d":" -f2 \
        | while read WWN
        do
            $FCINFO remote-port -s -p $WWN
        done > $TEMPFILE
        cat $TEMPFILE | while read LINE
        do
            echo $LINE | grep "^Remote Port WWN" >/dev/null
            RES=$?
            [ $RES -eq 0 -a "$FIRST" = "false" ] && echo "
" >> $TEMPFILE.out
            [ $RES -eq 0 -a "$FIRST" = "true" ] && FIRST=false
            echo "$LINE" >> $TEMPFILE.out
        done
    fi
fi

$BB $BBDISP "status $MACHINE.hba $COLOUR $(cat $TEMPFILE.out)"
rm $TEMPFILE $TEMPFILE.out 2> /dev/null
</code>
===== Known Bugs and Issues =====

**PUBLIC SERVICE WARNING**

I have not had the opportunity to test this script under all fail conditions! This means that there could be bugs that I am not aware of. If you spot any, please let me know, and I will see what I can do about it.

No bugs known - at the moment.

===== To Do =====

Fix any bugs that come along.

Features? No. I am done with this one for now. I leave it to the rest of the community to add any more features.

===== Credits =====

I am going to take the blame for this one. However, I am indebted to Kevin Kelly for giving me the idea. http://www.xymon.com/archive/2010/08/msg00016.html

It was a great way to keep myself from going crazy on what was panning out to be a very slow and boring day. :-)

===== Changelog =====

  * **2010-08-11**
    * Initial release
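===== Appendix: Override Parsing Sketch =====

The override step in hba.ksh executes whatever follows ''HBA:'' in the logfetch file. That is simple, but it will run any command that ends up there. What follows is a minimal sketch of a stricter variant, assuming you only want the variables listed in the Description to be settable. The function name ''apply_hba_overrides'', the demo file name and the allow-list approach are illustrative assumptions and not part of the original script. The loop reads the file through a redirect rather than a pipeline, so the exported values also survive in shells such as bash, which run the last pipeline stage in a subshell.

```shell
#!/bin/sh
# Hypothetical helper, not part of hba.ksh: apply "HBA:export NAME=value"
# lines from a file, accepting only the variables the script documents.
apply_hba_overrides() {
    cfg=$1
    [ -f "$cfg" ] || return 0
    while IFS= read -r line
    do
        case $line in
            HBA:export\ *=*) ;;          # looks like an override line
            *) continue ;;               # anything else is skipped
        esac
        def=${line#"HBA:export "}        # NAME=value
        name=${def%%=*}
        value=${def#*=}
        case $name in
            CHECKSPEED|CHECKONLINE|CHECKERRS|CHECKMPATH|LIST_MPATH|LIST_SCSI|LIST_REMOTE|ERR_YELLOW|ERR_RED|MPATHFAILCOL)
                export "$name=$value" ;;
            *)
                echo "ignoring unknown override: $name" >&2 ;;
        esac
    done < "$cfg"
}

# Demo with a throwaway file standing in for the logfetch cfg (assumed path).
CHECKERRS=true
ERR_RED=100
demo=${TMPDIR:-/tmp}/hba-overrides.$$
printf '%s\n' \
    'HBA:export CHECKERRS=false' \
    'HBA:export ERR_RED=20' \
    'HBA:export PATH=/tmp' \
    'DISK:export ERR_RED=1' > "$demo"
apply_hba_overrides "$demo"
echo "CHECKERRS=$CHECKERRS ERR_RED=$ERR_RED"
rm -f "$demo"
```

In the demo, the two documented variables are overridden, the ''PATH'' line is rejected with a message on stderr, and the line carrying a different label is skipped. Whether the extra strictness is worth it depends on who can edit client-local.cfg on your server.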