monitors:hba.ksh

monitors:hba.ksh [2010/08/24 01:43] (current) – created - external edit 127.0.0.1
 +====== hba.ksh ======
 +
 +^ Author | [[ everett.vernon@gmail.com | Vernon Everett ]] |
 +^ Compatibility | Should work on all versions |
 +^ Requirements | Solaris |
 +^ Download | None |
 +^ Last Update | 2010-08-11 |
 +
 +===== Description =====
 +A monitoring script to keep tabs on your HBAs.
 +It can check that they are:
 +  * Online
 +  * Running at optimal speed
 +  * Not getting errors
 +  * Correctly multi-pathed
 +It can also:
 +  * List SCSI devices
 +  * List remote ports
 +  * List multi-path info, even if you don't want it to generate alerts.
 +Both the alerts and the information displayed are configurable, and can be set per server using client-local.cfg (see the comments on overrides near the top of the script).
 +The override variables, with their default values, are:
 +  * CHECKSPEED=true     -- Check HBAs at optimal speed and alert if not
 +  * CHECKONLINE=true    -- Check HBAs are online and alert if not
 +  * CHECKERRS=true      -- Check for link errors and alert if any exceed the thresholds (See ERR_YELLOW and ERR_RED)
 +  * CHECKMPATH=true     -- Show multipath info and alert if path is down. (See MPATHFAILCOL)
 +  * LIST_MPATH=true     -- List the multi-path info only. (No alerts, but will be ignored if CHECKMPATH is true)
 +  * LIST_SCSI=true      -- List the SCSI devices
 +  * LIST_REMOTE=true    -- List the remote ports
 +  * ERR_YELLOW=3        -- Number of link errors for a yellow
 +  * ERR_RED=100         -- Number of link errors for a red
 +  * MPATHFAILCOL=yellow -- Colour of a multipath fail
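
As a rough illustration of how ERR_YELLOW and ERR_RED turn a link error count into a colour, the sketch below mirrors the comparisons the script makes. The function name ''err_colour'' is hypothetical and does not appear in hba.ksh; the defaults of 3 and 100 are the ones listed above.

```shell
# Hypothetical helper, not part of hba.ksh: map an error count to a colour.
# Usage: err_colour COUNT [YELLOW_THRESHOLD] [RED_THRESHOLD]
err_colour() {
    count=$1
    yellow=${2:-3}      # default taken from ERR_YELLOW
    red=${3:-100}       # default taken from ERR_RED
    if [ "$count" -ge "$red" ]
    then
        echo red
    elif [ "$count" -ge "$yellow" ]
    then
        echo yellow
    else
        echo green
    fi
}
```

With the defaults, a count of 2 stays green, 3 to 99 goes yellow, and 100 or more goes red. Passing ''err_colour 20 3 20'' shows the effect of lowering ERR_RED to 20.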
 +
 +Some may notice that this script could probably have been much shorter, and simpler to create using Perl, but that would require me to learn enough Perl to do so.
 +Anybody feeling energetic enough to redo it in Perl - knock yourself out.
 +
 +===== Installation =====
 +=== Client side ===
 +1. Copy hba.ksh to $HOME/client/ext
 +
 +2. Edit ''client/etc/clientlaunch.cfg'' and insert the following text:
 +
 +  [hba]
 +          ENVFILE $HOBBITCLIENTHOME/etc/hobbitclient.cfg
 +          CMD $HOBBITCLIENTHOME/ext/hba.ksh
 +          LOGFILE $HOBBITCLIENTHOME/logs/hba.log
 +          INTERVAL 5m
 +=== Server side ===
 +If you want to override some of the default variables in the script, add them to client-local.cfg under the appropriate label.
 +  HBA:export CHECKERRS=false
 +  HBA:export ERR_RED=20
 +  HBA:export LIST_SCSI=false
 +
 +See the Description section above for the full list of variables.
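
The script picks these lines up on the client by reading the ''HBA:'' entries from the logfetch config file and running each one as a command. A more defensive variant is sketched below, assuming the same ''HBA:export NAME=value'' line format. The function name ''apply_hba_overrides'' is hypothetical, and unlike the script it only accepts the variable names listed in the Description.

```shell
# Hypothetical helper, not part of hba.ksh: apply "HBA:export NAME=value"
# lines from a config file, accepting only the documented variable names.
apply_hba_overrides() {
    cfg_file=$1
    [ -f "$cfg_file" ] || return 0
    while IFS= read -r line
    do
        # Skip anything that is not an HBA override line
        case $line in
            HBA:export\ *=*) ;;
            *) continue ;;
        esac
        assignment=${line#HBA:export }
        name=${assignment%%=*}
        value=${assignment#*=}
        case $name in
            CHECKSPEED|CHECKONLINE|CHECKERRS|CHECKMPATH|LIST_MPATH|LIST_SCSI|LIST_REMOTE|ERR_YELLOW|ERR_RED|MPATHFAILCOL)
                eval "$name=\$value"
                export "$name"
                ;;
        esac
    done < "$cfg_file"
}
```

Reading the file with a redirect rather than a pipeline keeps the assignments in the current shell in any POSIX shell, not only in ksh. A call such as ''apply_hba_overrides $BBTMP/logfetch.$(uname -n).cfg'' would match the file the script reads.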
 +
 +===== Source =====
 +==== hba.ksh ====
 +
 +<hidden onHidden="Show Code ⇲" onVisible="Hide Code ⇱">
 +<code bash>
 +#!/bin/ksh
 +# HBA monitoring script
 +# Author : Vernon Everett  - everett.vernon(at)gmaildotcom
 +# Development History
 +# Date       | Author         | Summary
 +#---------------------------------------------------------------
 +# 10/08/2010 | Vernon Everett | Initial release. 
 +# 11/08/2010 | Vernon Everett | Added override variables
 +#                             | Added multi-path, SCSI details, remote port info
 +#                             
 +#---------------------------------------------------------------
 +if [ -x /usr/bin/zonename ]
 +then
 +    [ "$(/usr/bin/zonename)" = "global" ] || exit 0  # I only run on global zones
 +fi
 +TEMPFILE=$BBTMP/hba.$$
 +FCINFO="/opt/csw/bin/sudo /usr/sbin/fcinfo"
 +MPATHADM="/usr/sbin/mpathadm" # sudo probably not needed
 +COLOUR=green
 +# Define what to check and default thresholds for the error counts
 +CHECKSPEED=true
 +CHECKONLINE=true
 +CHECKERRS=true
 +CHECKMPATH=true
 +LIST_MPATH=true
 +LIST_SCSI=true
 +LIST_REMOTE=true
 +ERR_YELLOW=3
 +ERR_RED=100
 +MPATHFAILCOL=yellow
 +# Now that the defaults are defined, override them if set in client-local.cfg
 +# Add lines like this in client-local.cfg to override the defaults.
 +# HBA:export CHECKERRS=false
 +# HBA:export ERR_RED=20
 +LOGFETCH=${BBTMP}/logfetch.$(uname -n).cfg
 +if [ -f $LOGFETCH ]
 +then
 +   grep "^HBA:" $LOGFETCH | cut -d":" -f2 \
 +                          | while read NEW_DEF
 +                            do
 +                                $NEW_DEF
 +                            done
 +fi
 +
 +date > $TEMPFILE.out
 +$FCINFO hba-port | grep "No Adapters Found" > /dev/null
 +if [ $? -eq 0 ]
 +then
 +   # There are no adapters to work with.
 +   echo "No Adapters Found" >> $TEMPFILE.out
 +   # Let's skip the rest of the crap
 +else
 +   $FCINFO hba-port | grep "HBA Port WWN:" \
 +                    | cut -d":" -f2 \
 +                    | while read WWN
 +                      do
 +                         $FCINFO hba-port -l $WWN
 +                      done >> $TEMPFILE
 +   if [ "$CHECKONLINE" = "true" ]
 +   then
 +      cat $TEMPFILE | while read LINE
 +                      do
 +                         ONLINE=$(echo "$LINE" | grep "State:" | cut -d":" -f2 | sed 's/^[ ]*//;s/[ ]*$//' )
 +                         if [ -n "$ONLINE" ]
 +                         then
 +                            if [ "$ONLINE" = "online" ]
 +                            then
 +                               echo "&green $LINE" >> $TEMPFILE.online
 +                            else
 +                               echo "&red $LINE" >> $TEMPFILE.online
 +                               COLOUR=red
 +                            fi
 +                         else
 +                            echo "$LINE" >> $TEMPFILE.online
 +                         fi
 +                      done
 +      [ "$COLOUR" = "red" ] && echo "&red HBA not online" >> $TEMPFILE.out
 +      mv $TEMPFILE.online $TEMPFILE
 +   fi
 +
 +   if [ "$CHECKSPEED" = "true" ]
 +   then
 +      cat $TEMPFILE | while read LINE
 +                      do
 +                         echo "$LINE" | grep "^HBA" > /dev/null && MAXSPEED="" && CURRSPEED="" && SPEEDS=""
 +                         SPEEDS=$(echo "$LINE" | grep "Supported Speeds:")
 +                         [ -n "$SPEEDS" ] && MAXSPEED=$(echo "$SPEEDS" | awk '{ print $NF }')
 +                         CURRSPEED=$(echo "$LINE" | grep "Current Speed:" | awk '{ print $NF }')
 +                         if [ -n "$CURRSPEED" -a "$CURRSPEED" != "$MAXSPEED" ]
 +                         then
 +                            [ "$COLOUR" != "red" ] && COLOUR="yellow"
 +                            echo "&yellow Some HBAs not at optimal speed" >> $TEMPFILE.out
 +                            echo "$LINE" | sed "s/Current/\&yellow Current/g" >> $TEMPFILE.speed
 +                            MAXSPEED=""
 +                            SPEEDS=""
 +                            CURRSPEED=""
 +                         else
 +                            echo "$LINE" | sed "s/Current/\&green Current/g" >> $TEMPFILE.speed
 +                         fi
 +                      done
 +      mv $TEMPFILE.speed $TEMPFILE
 +   fi
 +
 +   TCOLOUR=$COLOUR
 +   COLOUR=green
 +   if [ "$CHECKERRS" = "true" ]
 +   then
 +      cat $TEMPFILE | while read LINE
 +      do
 +         LCOL=green
 +         ERRLINE=$(echo "$LINE" | grep "Count:")
 +         if [ -n "$ERRLINE" ]
 +         then
 +            ERRCOUNT=$(echo "$ERRLINE" | cut -d":" -f2)
 +            [ $ERRCOUNT -lt $ERR_YELLOW ] && LCOL=green
 +            [ $ERRCOUNT -ge $ERR_YELLOW ] && LCOL=yellow
 +            [ $ERRCOUNT -ge $ERR_RED ] && LCOL=red
 +            echo "&$LCOL $LINE" >> $TEMPFILE.err
 +         else
 +            echo "$LINE" >> $TEMPFILE.err
 +         fi
 +         [ "$LCOL" = "red" ]&& COLOUR=red
 +         [ "$LCOL" = "yellow" -a "$COLOUR" != "red" ] && COLOUR=yellow
 +      done
 +      [ "$COLOUR" = "red" ] && echo "&red Critical error count detected" >> $TEMPFILE.out
 +      [ "$COLOUR" = "yellow" ] && echo "&yellow High error count detected" >> $TEMPFILE.out
 +      mv $TEMPFILE.err $TEMPFILE
 +   fi
 +   [ "$TCOLOUR" = "red" ] && COLOUR="red"
 +   [ "$TCOLOUR" = "yellow" -a "$COLOUR" != "red" ] && COLOUR="yellow"
 +
 +   if [ "$CHECKMPATH" = "true" -o "$LIST_MPATH" = "true" ]
 +   then
 +      rm $TEMPFILE.badpath 2> /dev/null  # Make sure it's not there
 +      echo '<hr width="50%" size="3" />' >> $TEMPFILE.path.out
 +      echo "" >> $TEMPFILE.path.out
 +      echo "<b>Multi-Pathing</b>" >> $TEMPFILE.path.out
 +      echo "" >> $TEMPFILE.path.out
 +      $MPATHADM list lu > $TEMPFILE.path
 +      if [ "$CHECKMPATH" = "true" ]
 +      then
 +         cp $TEMPFILE.path $TEMPFILE.path.colours
 +         cat $TEMPFILE.path | awk -F":" '{ print $NF }' \
 +                            | nawk 'ORS=NR%3?" ":"\n"' \
 +                            | while read DEV TOTPATH OPERPATH
 +                              do
 +                                 LCOL=green
 +                                 if [ $TOTPATH -ne $OPERPATH ]
 +                                 then
 +                                    LCOL=$MPATHFAILCOL
 +                                    MPATH=bad
 +                                    [ $COLOUR != "red" ] && COLOUR=$MPATHFAILCOL
 +                                 fi
 +                                 SEDDEV=$(echo $DEV | sed "s/\//\\\\\//g")
 +                                 sed "s/$SEDDEV/\&$LCOL&/g" $TEMPFILE.path.colours > $TEMPFILE.path.tmp
 +                                 mv $TEMPFILE.path.tmp $TEMPFILE.path.colours
 +                              done
 +         mv $TEMPFILE.path.colours $TEMPFILE.path
 +         [ "$MPATH" = "bad" ] && echo "&$MPATHFAILCOL Multipath error detected" >> $TEMPFILE.out
 +      fi
 +      cat $TEMPFILE.path >> $TEMPFILE.path.out
 +      mv $TEMPFILE.path.out $TEMPFILE.path
 +      echo >> $TEMPFILE.path
 +   fi
 +
 +   cat $TEMPFILE | while read LINE
 +   do
 +      echo "$LINE" | grep "HBA Port" > /dev/null
 +      if [ $? -eq 0 ]
 +      then
 +         echo "<b>" >> $TEMPFILE.out
 +         echo "$LINE</b>" >> $TEMPFILE.out
 +      else
 +         echo "$LINE" >> $TEMPFILE.out
 +      fi
 +   done
 +   [ "$CHECKMPATH" = "true" -o "$LIST_MPATH" = "true" ] && cat $TEMPFILE.path >> $TEMPFILE.out
 +   rm $TEMPFILE.path 2>/dev/null
 +
 +
 +   if [ "$LIST_REMOTE" = "true" ]
 +   then
 +      echo '<hr width="50%" size="3" />' >> $TEMPFILE.out
 +      echo "" >> $TEMPFILE.out
 +      echo "<b>Remote Port Listing</b>" >> $TEMPFILE.out
 +      echo "" >> $TEMPFILE.out
 +      $FCINFO hba-port | grep "HBA Port WWN:" \
 +                       | cut -d":" -f2  \
 +                       | while read WWN
 +                         do
 +                            $FCINFO remote-port -p $WWN
 +                         done >> $TEMPFILE.out
 +   fi
 +
 +   if [ "$LIST_SCSI" = "true" ]
 +   then
 +      FIRST=true
 +      echo '<hr width="50%" size="3" />' >> $TEMPFILE.out
 +      echo "<b>SCSI Device Information</b>" >> $TEMPFILE.out
 +      echo "" >> $TEMPFILE.out
 +      $FCINFO hba-port | grep "HBA Port WWN:" \
 +                       | cut -d":" -f2  \
 +                       | while read WWN;
 +                         do
 +                            $FCINFO remote-port -s -p $WWN
 +                         done > $TEMPFILE
 +      cat $TEMPFILE | while read LINE
 +                      do
 +                         echo $LINE | grep "^Remote Port WWN" >/dev/null
 +                         RES=$?
 +                         [ $RES -eq 0 -a "$FIRST" = "false" ] && echo '<hr width="50%" size="3" />' >> $TEMPFILE.out
 +                         [ $RES -eq 0 -a "$FIRST" = "true" ] && FIRST=false
 +                         echo "$LINE" >> $TEMPFILE.out
 +                      done
 +   fi
 +fi
 +
 +$BB $BBDISP "status $MACHINE.hba $COLOUR $(cat $TEMPFILE.out)"
 +rm $TEMPFILE $TEMPFILE.out 2> /dev/null
 +
 +</code>
 +</hidden>
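
The multipath check in the script folds every three lines of ''mpathadm list lu'' output into one record with a compact nawk idiom, then compares the two path counts. The sketch below does the same comparison with named patterns, which may be easier to follow. It assumes each logical unit is reported as a device line followed by a "Total Path Count:" line and an "Operational Path Count:" line. The function name ''degraded_paths'' is hypothetical, and on Solaris ''nawk'' may be preferable to ''awk''.

```shell
# Hypothetical helper, not part of hba.ksh: read `mpathadm list lu` style
# output on stdin and print "device total operational" for every logical
# unit whose operational path count differs from its total path count.
degraded_paths() {
    awk '
        /Total Path Count:/       { total = $NF; next }
        /Operational Path Count:/ { if ($NF != total) print dev, total, $NF; next }
        NF                        { dev = $1 }   # any other non-blank line is a device name
    '
}
```

Used as ''/usr/sbin/mpathadm list lu | degraded_paths'', empty output would mean that every path is operational.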
 +
 +===== Known Bugs and Issues =====
 +**PUBLIC SERVICE WARNING**
 +I have not had the opportunity to test this script under all fail conditions!
 +
 +This means that there could be bugs that I am not aware of.
 +
 +If you spot any, please let me know, and I will see what I can do about it.
 +
 +
 +No bugs known - at the moment.
 +
 +===== To Do =====
 +Fix any bugs that come along.
 +
 +Features?
 +
 +No. I am done with this one for now. I leave it to the rest of the community to add any more features.
 +
 +
 +===== Credits =====
 +I am going to take the blame for this one.
 +
 +However, I am indebted to Kevin Kelly for giving me the idea.
 +http://www.xymon.com/archive/2010/08/msg00016.html
 +
 +It was a great way to keep myself from going crazy on what was panning out to be a very slow and boring day. :-)
 +
 +===== Changelog =====
 +
 +  * **2010-08-11**
 +    * Initial release