monitors:hba.ksh [2010/08/24 01:43] (current)
 +====== hba.ksh ======
 +
 +^ Author | [[ everett.vernon@gmail.com | Vernon Everett ]] |
 +^ Compatibility | Should work on all versions |
 +^ Requirements | Solaris |
 +^ Download | None |
 +^ Last Update | 2010-08-11 |
 +
 +===== Description =====
 +A monitoring script to keep tabs on your HBAs.
 +It can check that they are:
 +  * Online
 +  * Running at optimal speed
 +  * Not getting errors
 +  * Correctly multi-pathed
 +It will also
 +  * List SCSI devices
 +  * List remote ports
 +  * List multi-path info, even if you don't want it to generate alerts.
 +Both the alerts and the information displayed are configurable, and can be set per server using client-local.cfg (see the comments in the script, just above the LOGFETCH definition).
 +The override variables, with their default values, are:
 +  * CHECKSPEED=true     -- Check that HBAs run at optimal speed and alert if not
 +  * CHECKONLINE=true    -- Check that HBAs are online and alert if not
 +  * CHECKERRS=true      -- Check for link errors and alert if a threshold is reached (see ERR_YELLOW and ERR_RED)
 +  * CHECKMPATH=true     -- Show multi-path info and alert if a path is down (see MPATHFAILCOL)
 +  * LIST_MPATH=true     -- List the multi-path info only, without alerts (ignored if CHECKMPATH is true)
 +  * LIST_SCSI=true      -- List the SCSI devices
 +  * LIST_REMOTE=true    -- List the remote ports
 +  * ERR_YELLOW=3        -- Number of link errors that triggers a yellow
 +  * ERR_RED=100         -- Number of link errors that triggers a red
 +  * MPATHFAILCOL=yellow -- Colour of a multi-path failure
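
To make the two error thresholds concrete, here is a minimal sketch of how a link error count maps to a colour. The function name ''err_colour'' is made up for this example and does not exist in hba.ksh; the comparisons follow the script's logic (yellow at ERR_YELLOW or more, red at ERR_RED or more).

```shell
# Hypothetical helper, not part of hba.ksh: map a link error count to a colour.
ERR_YELLOW=3      # default from the list above
ERR_RED=100       # default from the list above

err_colour() {
    count=$1
    if [ "$count" -ge "$ERR_RED" ]; then
        echo red
    elif [ "$count" -ge "$ERR_YELLOW" ]; then
        echo yellow
    else
        echo green
    fi
}

# Show the colour on either side of each threshold.
for n in 0 3 99 100; do
    echo "$n errors -> $(err_colour "$n")"
done
```

With the defaults, anything below 3 stays green, 3 to 99 goes yellow, and 100 or more goes red.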
 +
 +Some may notice that this script could probably have been much shorter, and simpler to create using Perl, but that would require me to learn enough Perl to do so.
 +Anybody feeling energetic enough to redo it in Perl - knock yourself out.
 +
 +===== Installation =====
 +=== Client side ===
 +1. Copy hba.ksh to $HOBBITCLIENTHOME/ext (usually ~/client/ext for the client user) and make sure it is executable.
 +
 +2. Edit ''client/etc/clientlaunch.cfg'' and insert the following text:
 +
 +  [hba]
 +          ENVFILE $HOBBITCLIENTHOME/etc/hobbitclient.cfg
 +          CMD $HOBBITCLIENTHOME/ext/hba.ksh
 +          LOGFILE $HOBBITCLIENTHOME/logs/hba.log
 +          INTERVAL 5m
 +=== Server side ===
 +If you want to override some of the default variables in the script, add them to client-local.cfg under the appropriate label.
 +  HBA:export CHECKERRS=false
 +  HBA:export ERR_RED=20
 +  HBA:export LIST_SCSI=false
 +
 +See Description above for more.
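
The script picks these lines up on the client from the logfetch configuration file in $BBTMP and simply runs whatever follows the ''HBA:'' prefix. The sketch below shows the same idea on a made-up sample file. It is deliberately stricter than the script: it only accepts ''export NAME=value'' with an upper-case name, which is my own assumption about what you would want to allow, not what hba.ksh does.

```shell
# Sketch only: apply "HBA:export NAME=value" lines from a sample file.
# The file name and its contents are invented for this example.
SAMPLE=${TMPDIR:-/tmp}/logfetch.sample.$$
cat > "$SAMPLE" <<'EOF'
HBA:export CHECKERRS=false
HBA:export ERR_RED=20
OTHER:export ERR_RED=999
EOF

CHECKERRS=true      # defaults, as in the script
ERR_RED=100

# Redirect the file into the loop (instead of piping into it) so the
# assignments survive in shells that run piped loops in a subshell.
while IFS= read -r line; do
    case $line in
        HBA:export\ [A-Z_]*=*)
            def=${line#HBA:export }       # e.g. CHECKERRS=false
            name=${def%%=*}
            value=${def#*=}
            case $name in
                *[!A-Z_]*) ;;             # skip names with unexpected characters
                *) eval "$name=\$value"; export "$name" ;;
            esac
            ;;
    esac
done < "$SAMPLE"
rm -f "$SAMPLE"

echo "CHECKERRS=$CHECKERRS ERR_RED=$ERR_RED"
```

Lines with any other prefix are ignored, so the ''OTHER:'' entry leaves ERR_RED alone.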
 +
 +===== Source =====
 +==== hba.ksh ====
 +
 +<hidden onHidden="​Show Code ⇲" onVisible="​Hide Code ⇱">​
 +<code bash>
 +#!/bin/ksh
 +# HBA monitoring script
 +# Author : Vernon Everett ​ - everett.vernon(at)gmaildotcom
 +# Development History
 +# Date       | Author ​        | Summary
 +#​---------------------------------------------------------------
 +# 10/08/2010 | Vernon Everett | Initial release. ​
 +# 11/08/2010 | Vernon Everett | Added override variables
 +#                             | Added multi-path, SCSI details, remote port info
 +#                             ​| ​
 +#​---------------------------------------------------------------
 +if [ -x /usr/bin/zonename ]
 +then
 +    [ "$(/usr/bin/zonename)" = "global" ] || exit 0  # I only run in the global zone
 +fi
 +TEMPFILE=$BBTMP/​hba.$$
 +FCINFO="/​opt/​csw/​bin/​sudo /​usr/​sbin/​fcinfo"​
 +MPATHADM="/​usr/​sbin/​mpathadm"​ # sudo probably not needed
 +COLOUR=green
 +# Define what to check and default thresholds for the error counts
 +CHECKSPEED=true
 +CHECKONLINE=true
 +CHECKERRS=true
 +CHECKMPATH=true
 +LIST_MPATH=true
 +LIST_SCSI=true
 +LIST_REMOTE=true
 +ERR_YELLOW=3
 +ERR_RED=100
 +MPATHFAILCOL=yellow
 +# Now that the defaults are defined, override them if set in client-local.cfg
 +# Add lines like this to client-local.cfg to override the defaults.
 +# HBA:export CHECKERRS=false
 +# HBA:export ERR_RED=20
 +LOGFETCH=${BBTMP}/​logfetch.$(uname -n).cfg
 +if [ -f $LOGFETCH ]
 +then
 +   grep "​^HBA:"​ $LOGFETCH | cut -d":"​ -f2 \
 +                          | while read NEW_DEF
 +                            do
 +                                $NEW_DEF
 +                            done
 +fi
 +
 +date > $TEMPFILE.out
 +$FCINFO hba-port | grep "No Adapters Found" > /dev/null
 +if [ $? -eq 0 ]
 +then
 +   # There are no adapters to work with.
 +   echo "No Adapters Found" >> $TEMPFILE.out
 +   # Let's skip the rest of the crap
 +else
 +   ​$FCINFO hba-port | grep "HBA Port WWN:" \
 +                    | cut -d":"​ -f2 \
 +                    | while read WWN
 +                      do
 +                         ​$FCINFO hba-port -l $WWN
 +                      done >> $TEMPFILE
 +   if [ "​$CHECKONLINE"​ = "​true"​ ]
 +   then
 +      cat $TEMPFILE | while read LINE
 +                      do
 +                         ​ONLINE=$(echo "​$LINE"​ | grep "​State:"​ | cut -d":"​ -f2 | sed 's/^[ ]*//;s/[ ]*$//' )
 +                         if [ -n "​$ONLINE"​ ]
 +                         then
 +                            if [ "​$ONLINE"​ = "​online"​ ]
 +                            then
 +                               echo "&​green $LINE" >> $TEMPFILE.online
 +                            else
 +                               echo "&​red $LINE" >> $TEMPFILE.online
 +                               ​COLOUR=red
 +                            fi
 +                         else
 +                            echo "​$LINE"​ >> $TEMPFILE.online
 +                         fi
 +                      done
 +                      [ "​$COLOUR"​ = "​red"​ ] && echo "&​red HBA not online"​ >> $TEMPFILE.out
 +      mv $TEMPFILE.online $TEMPFILE
 +   fi
 +
 +   if [ "​$CHECKSPEED"​ = "​true"​ ]
 +   then
 +      cat $TEMPFILE | while read LINE
 +                      do
 +                         echo "​$LINE"​ | grep "​^HBA"​ > /dev/null && MAXSPEED=""​ && CURRSPEED=""​ && SPEEDS=""​
 +                         ​SPEEDS=$(echo "​$LINE"​ | grep "​Supported Speeds:"​)
 +                         [ -n "​$SPEEDS"​ ] && MAXSPEED=$(echo "​$SPEEDS"​ | awk '{ print $NF }')
 +                         ​CURRSPEED=$(echo "​$LINE"​ | grep "​Current Speed:"​ | awk '{ print $NF }')
 +                         if [ -n "​$CURRSPEED"​ -a "​$CURRSPEED"​ != "​$MAXSPEED"​ ]
 +                         then
 +                            [ "​$COLOUR"​ != "​red"​ ] && COLOUR="​yellow"​
 +                            echo "&​yellow Some HBAs not at optimal speed" >> $TEMPFILE.out
 +                            echo "​$LINE"​ | sed "​s/​Current/​\&​yellow Current/​g"​ >> $TEMPFILE.speed
 +                            MAXSPEED=""​
 +                            SPEEDS=""​
 +                            CURRSPEED=""​
 +                         else
 +                            echo "​$LINE"​ | sed "​s/​Current/​\&​green Current/​g"​ >> $TEMPFILE.speed
 +                         fi
 +                      done
 +      mv $TEMPFILE.speed $TEMPFILE
 +   fi
 +
 +   ​TCOLOUR=$COLOUR
 +   ​COLOUR=green
 +   if [ "​$CHECKERRS"​ = "​true"​ ]
 +   then
 +      cat $TEMPFILE | while read LINE
 +      do
 +         ​LCOL=green
 +         ​ERRLINE=$(echo "​$LINE"​ | grep "​Count:"​)
 +         if [ -n "​$ERRLINE"​ ]
 +         then
 +            ERRCOUNT=$(echo "​$ERRLINE"​ | cut -d":"​ -f2)
 +            [ $ERRCOUNT -lt $ERR_YELLOW ] && LCOL=green
 +            [ $ERRCOUNT -ge $ERR_YELLOW ] && LCOL=yellow
 +            [ $ERRCOUNT -ge $ERR_RED ] && LCOL=red
 +            echo "&​$LCOL $LINE" >> $TEMPFILE.err
 +         else
 +            echo "​$LINE"​ >> $TEMPFILE.err
 +         fi
 +         [ "​$LCOL"​ = "​red"​ ]&& COLOUR=red
 +         [ "​$LCOL"​ = "​yellow"​ -a "​$COLOUR"​ != "​red"​ ] && COLOUR=yellow
 +      done
 +      [ "​$COLOUR"​ = "​red"​ ] && echo "&​red Critical error count detected"​ >> $TEMPFILE.out
 +      [ "​$COLOUR"​ = "​yellow"​ ] && echo "&​yellow High error count detected"​ >> $TEMPFILE.out
 +      mv $TEMPFILE.err $TEMPFILE
 +   fi
 +   [ "​$TCOLOUR"​ = "​red"​ ] && COLOUR="​red"​
 +   [ "$TCOLOUR" = "yellow" -a "$COLOUR" != "red" ] && COLOUR="yellow"
 +
 +   if [ "​$CHECKMPATH"​ = "​true"​ -o "​$LIST_MPATH"​ = "​true"​ ]
 +   then
 +      rm $TEMPFILE.badpath 2> /​dev/​null ​ # Make sure it's not there
 +      echo "<​hr width="​50%"​ size="​3"​ />" >> $TEMPFILE.path.out
 +      echo ""​ >> $TEMPFILE.path.out
 +      echo "<​b>​Multi-Pathing</​b>"​ >> $TEMPFILE.path.out
 +      echo ""​ >> $TEMPFILE.path.out
 +      $MPATHADM list lu > $TEMPFILE.path
 +      if [ "​$CHECKMPATH"​ = "​true"​ ]
 +      then
 +         cp $TEMPFILE.path $TEMPFILE.path.colours
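 +         # Keep the value after the last ":" on each line, then join each LU's
 +         # three lines (device, total paths, operational paths) into one.
 +         awk -F":" '{ print $NF }' $TEMPFILE.path \
 +                            | nawk 'ORS=NR%3?" ":"\n"' \
 +                            | while read DEV TOTPATH OPERPATH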
 +                              do
 +                                 ​LCOL=green
 +                                 if [ $TOTPATH -ne $OPERPATH ]
 +                                 then
 +                                    LCOL=$MPATHFAILCOL
 +                                    MPATH=bad
 +                                    [ $COLOUR != "​red"​ ] && COLOUR=$MPATHFAILCOL
 +                                 fi
 +                                 ​SEDDEV=$(echo $DEV | sed "​s/​\//​\\\\\//​g"​)
 +                                 sed "​s/​$SEDDEV/​\&​$LCOL&/​g"​ $TEMPFILE.path.colours > $TEMPFILE.path.tmp
 +                                 mv $TEMPFILE.path.tmp $TEMPFILE.path.colours
 +                              done
 +         mv $TEMPFILE.path.colours $TEMPFILE.path
 +         [ "​$MPATH"​ = "​bad"​ ] && echo "&​$MPATHFAILCOL Multipath error detected"​ >> $TEMPFILE.out
 +      fi
 +      cat $TEMPFILE.path >> $TEMPFILE.path.out
 +      mv $TEMPFILE.path.out $TEMPFILE.path
 +      echo >> $TEMPFILE.path
 +   fi
 +
 +   cat $TEMPFILE | while read LINE
 +   do
 +      echo "​$LINE"​ | grep "HBA Port" > /dev/null
 +      if [ $? -eq 0 ]
 +      then
 +         echo "<​b>"​ >> $TEMPFILE.out
 +         echo "​$LINE</​b>"​ >> $TEMPFILE.out
 +      else
 +         echo "​$LINE"​ >> $TEMPFILE.out
 +      fi
 +   done
 +   [ "​$CHECKMPATH"​ = "​true"​ -o "​$LIST_MPATH"​ = "​true"​ ] && cat $TEMPFILE.path >> $TEMPFILE.out
 +   rm $TEMPFILE.path 2>/​dev/​null
 +
 +
 +   if [ "​$LIST_REMOTE"​ = "​true"​ ]
 +   then
 +      echo "<​hr width="​50%"​ size="​3"​ />" >> $TEMPFILE.out
 +      echo ""​ >> $TEMPFILE.out
 +      echo "<​b>​Remote Port Listing</​b>"​ >> $TEMPFILE.out
 +      echo ""​ >> $TEMPFILE.out
 +      $FCINFO hba-port | grep "HBA Port WWN:" \
 +                       | cut -d":"​ -f2  \
 +                       | while read WWN
 +                         do
 +                            $FCINFO remote-port -p $WWN
 +                         done >> $TEMPFILE.out
 +   fi
 +
 +   if [ "​$LIST_SCSI"​ = "​true"​ ]
 +   then
 +      FIRST=true
 +      echo "<​hr width="​50%"​ size="​3"​ />" >> $TEMPFILE.out
 +      echo "<​b>​SCSI Device Information</​b>"​ >> $TEMPFILE.out
 +      echo ""​ >> $TEMPFILE.out
 +      $FCINFO hba-port | grep "HBA Port WWN:" \
 +                       | cut -d":"​ -f2  \
 +                       | while read WWN
 +                         do
 +                            $FCINFO remote-port -s -p $WWN
 +                         done > $TEMPFILE
 +      cat $TEMPFILE | while read LINE
 +                      do
 +                         echo "$LINE" | grep "^Remote Port WWN" >/dev/null
 +                         ​RES=$?​
 +                         [ $RES -eq 0 -a "​$FIRST"​ = "​false"​ ] && echo "<​hr width="​50%"​ size="​3"​ />" >> $TEMPFILE.out
 +                         [ $RES -eq 0 -a "​$FIRST"​ = "​true"​ ] && FIRST=false
 +                         echo "​$LINE"​ >> $TEMPFILE.out
 +                      done
 +   fi
 +fi
 +
 +$BB $BBDISP "​status $MACHINE.hba $COLOUR $(cat $TEMPFILE.out)"​
 +rm $TEMPFILE $TEMPFILE.out 2> /dev/null
 +
 +</​code>​
 +</​hidden>​
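
The speed check in the script boils down to comparing each port's current speed with the last entry on its "Supported Speeds" line, which the script takes to be the fastest. Here is a compact sketch of that comparison. The sample input is invented and only contains the fields the script greps for, and the function names are hypothetical.

```shell
# Illustrative input only, shaped like the fields hba.ksh looks for.
sample_ports() {
    cat <<'EOF'
HBA Port WWN: 1000000000000001
        State: online
        Supported Speeds: 1Gb 2Gb 4Gb
        Current Speed: 4Gb
HBA Port WWN: 1000000000000002
        State: online
        Supported Speeds: 1Gb 2Gb 4Gb
        Current Speed: 2Gb
EOF
}

# Print one line per port: WWN, current speed, fastest supported speed, colour.
# Assumes, as the script does, that the last supported speed is the fastest.
check_speeds() {
    awk '
        /^HBA Port WWN:/    { wwn = $NF; max = "" }
        /Supported Speeds:/ { max = $NF }
        /Current Speed:/    {
            verdict = ($NF == max) ? "green" : "yellow"
            print wwn, $NF, max, verdict
        }
    '
}

sample_ports | check_speeds
```

On an older Solaris box the default awk may not accept this program; nawk should.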
 +
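
The multi-path check works on groups of three lines per logical unit: the device, the total path count and the operational path count. The sketch below flags any unit where the two counts differ. The device names are invented, the three-line layout is inferred from the way the script joins the output, and ''degraded_lus'' is a hypothetical name.

```shell
# Illustrative input only: three lines per logical unit.
sample_lus() {
    cat <<'EOF'
        /dev/rdsk/c0t0000000000000001d0s2
                Total Path Count: 2
                Operational Path Count: 2
        /dev/rdsk/c0t0000000000000002d0s2
                Total Path Count: 4
                Operational Path Count: 3
EOF
}

# Print "device operational/total" for every unit with a path down.
degraded_lus() {
    awk '
        $1 ~ /^\/dev\//           { dev = $1 }
        /Total Path Count:/       { total = $NF }
        /Operational Path Count:/ { if ($NF + 0 != total + 0) print dev, $NF "/" total }
    '
}

sample_lus | degraded_lus
```

An empty result means every path is operational; in the script, a mismatch sets the status to MPATHFAILCOL unless it is already red.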
 +===== Known Bugs and Issues =====
 +**PUBLIC SERVICE WARNING**
 +I have not had the opportunity to test this script under all fail conditions!
 +
 +This means that there could be bugs that I am not aware of.
 +
 +If you spot any, please let me know, and I will see what I can do about it.
 +
 +
 +No bugs known - at the moment.
 +
 +===== To Do =====
 +Fix any bugs that come along.
 +
 +Features?
 +
 +No. I am done with this one for now. I leave it to the rest of the community to add any more features.
 +
 +
 +===== Credits =====
 +I am going to take the blame for this one.
 +
 +However, I am indebted to Kevin Kelly for giving me the idea.
 +http://www.xymon.com/archive/2010/08/msg00016.html
 +
 +It was a great way to keep myself from going crazy on what was panning out to be a very slow and boring day. :-)
 +
 +===== Changelog =====
 +
 +  * **2010-08-11**
 +    * Initial release
  