Author | Vernon Everett |
Compatibility | Should work on all versions |
Requirements | Solaris |
Download | None |
Last Update | 2010-08-11 |
A monitoring script to keep tabs on your HBAs.
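It can check that they are online, running at their highest supported speed, and below the configured error-count thresholds.
It will also check multipath state, and can list multipath, remote port and SCSI device details.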
Both the alerts and the information displayed are configurable, and can be set per server using client-local.cfg (See comments in script, lines 22-25)
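Override variables (with default values) are:
CHECKSPEED=true - alert when a port is not running at its highest supported speed
CHECKONLINE=true - alert when a port is not online
CHECKERRS=true - alert on the link error counters
CHECKMPATH=true - alert when a LUN has fewer operational paths than total paths
LIST_MPATH=true - show the multipath listing
LIST_SCSI=true - show SCSI device information
LIST_REMOTE=true - show the remote port listing
ERR_YELLOW=3 - error count at which a counter goes yellow
ERR_RED=100 - error count at which a counter goes red
MPATHFAILCOL=yellow - colour used for a multipath failure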
Some may notice that this script could probably have been much shorter, and simpler to create using Perl, but that would require me to learn enough Perl to do so.
Anybody feeling energetic enough to redo it in Perl - knock yourself out.
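The error-count alerting described above comes down to two thresholds. The sketch below shows that comparison in isolation; `classify_count` is a hypothetical helper name used only for illustration, and the threshold values are the script's defaults.

```shell
# Sketch only: classify_count is a hypothetical helper, not part of hba.ksh.
ERR_YELLOW=3      # script default
ERR_RED=100       # script default

classify_count() {
    # $1 is one error counter value, e.g. a "Link Failure Count"
    count=$1
    if [ "$count" -ge "$ERR_RED" ]; then
        echo red
    elif [ "$count" -ge "$ERR_YELLOW" ]; then
        echo yellow
    else
        echo green
    fi
}

classify_count 0      # prints green
classify_count 3      # prints yellow
classify_count 250    # prints red
```

Lowering ERR_RED on a given server simply moves the point at which the last branch is taken.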
Client side
1. Copy hba.ksh to $HOME/client/ext
2. Edit client/etc/clientlaunch.cfg and insert the following text:
[hba]
ENVFILE $HOBBITCLIENTHOME/etc/hobbitclient.cfg
CMD $HOBBITCLIENTHOME/ext/hba.ksh
LOGFILE $HOBBITCLIENTHOME/logs/hba.log
INTERVAL 5m
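If you would rather script step 2 than edit the file by hand, something along these lines may do. It is a sketch only: `add_hba_stanza` is a hypothetical helper, and the file you pass to it is whatever path your clientlaunch.cfg actually lives at. It appends the stanza only when no [hba] section exists yet, and is demonstrated here on a scratch file.

```shell
# Sketch only: add_hba_stanza is a hypothetical helper, not part of the add-on.
add_hba_stanza() {
    cfg=$1
    # Leave the file alone if an [hba] section is already there
    if grep '^\[hba\]' "$cfg" > /dev/null 2>&1; then
        return 0
    fi
    # The quoted 'EOF' keeps $HOBBITCLIENTHOME literal in the file
    cat >> "$cfg" <<'EOF'

[hba]
        ENVFILE $HOBBITCLIENTHOME/etc/hobbitclient.cfg
        CMD $HOBBITCLIENTHOME/ext/hba.ksh
        LOGFILE $HOBBITCLIENTHOME/logs/hba.log
        INTERVAL 5m
EOF
}

# Demonstration on a scratch file rather than a live config
demo=/tmp/clientlaunch.demo.$$
: > "$demo"
add_hba_stanza "$demo"
add_hba_stanza "$demo"            # second call changes nothing
grep -c '^\[hba\]' "$demo"        # prints 1
rm -f "$demo"
```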
Server side
If you want to override some of the default variables in the script, add them to client-local.cfg under the appropriate label.
HBA:export CHECKERRS=false
HBA:export ERR_RED=20
HBA:export LIST_SCSI=false
See Description above for more.
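For the curious, the override mechanism can be sketched on its own as follows. This is not the code from the script, which pipes the matching lines into a loop and runs them directly. `apply_overrides` is a hypothetical helper that reads the file by redirection, so it behaves the same in shells that run pipeline stages in a subshell, and it only accepts plain `NAME=value` settings. The sample file contents are made up for the demonstration.

```shell
# Sketch only: apply_overrides is a hypothetical helper, not code from hba.ksh.
apply_overrides() {
    cfg=$1
    [ -f "$cfg" ] || return 0
    while IFS= read -r line; do
        case $line in
            HBA:export\ [A-Z_]*=*)
                def=${line#HBA:export }       # e.g. CHECKERRS=false
                name=${def%%=*}
                value=${def#*=}
                # Skip anything that is not a plain variable name
                case $name in
                    *[!A-Z0-9_]*) continue ;;
                esac
                eval "$name=\$value"
                export "$name"
                ;;
        esac
    done < "$cfg"
}

# Demonstration with a made-up file; the first line stands in for unrelated content
demo=/tmp/logfetch.demo.$$
cat > "$demo" <<'EOF'
log:/var/adm/messages:10240
HBA:export CHECKERRS=false
HBA:export ERR_RED=20
EOF
CHECKERRS=true
ERR_RED=100
apply_overrides "$demo"
echo "CHECKERRS=$CHECKERRS ERR_RED=$ERR_RED"   # prints CHECKERRS=false ERR_RED=20
rm -f "$demo"
```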
#!/bin/ksh
# HBA monitoring script
# Author : Vernon Everett - everett.vernon(at)gmaildotcom
# Development History
# Date | Author | Summary
#---------------------------------------------------------------
# 10/08/2010 | Vernon Everett | Initial release.
# 11/08/2010 | Vernon Everett | Added override variables
# | Added multi-path, SCSI details, remote port info
# |
#---------------------------------------------------------------
if [ -x /usr/bin/zonename ]
then
[ "$(/usr/bin/zonename)" = "global" ] || exit 0 # I only run on global zones
fi
TEMPFILE=$BBTMP/hba.$$
FCINFO="/opt/csw/bin/sudo /usr/sbin/fcinfo"
MPATHADM="/usr/sbin/mpathadm" # sudo probably not needed
COLOUR=green
# Define what to check and default thresholds for the error counts
CHECKSPEED=true
CHECKONLINE=true
CHECKERRS=true
CHECKMPATH=true
LIST_MPATH=true
LIST_SCSI=true
LIST_REMOTE=true
ERR_YELLOW=3
ERR_RED=100
MPATHFAILCOL=yellow
# Now that the defaults are defined, override them if set in client-local.cfg
# Add lines like this in client-local.cfg to override the defaults.
# HBA:export CHECKERRS=false
# HBA:export ERR_RED=20
LOGFETCH=${BBTMP}/logfetch.$(uname -n).cfg
if [ -f $LOGFETCH ]
then
grep "^HBA:" $LOGFETCH | cut -d":" -f2 \
| while read NEW_DEF
do
$NEW_DEF
done
fi
date > $TEMPFILE.out
$FCINFO hba-port | grep "No Adapters Found" > /dev/null
if [ $? -eq 0 ]
then
# There are no adapters to work with.
echo "No Adapters Found" >> $TEMPFILE.out
# Let's skip the rest of the crap
else
$FCINFO hba-port | grep "HBA Port WWN:" \
| cut -d":" -f2 \
| while read WWN
do
$FCINFO hba-port -l $WWN
done >> $TEMPFILE
if [ "$CHECKONLINE" = "true" ]
then
cat $TEMPFILE | while read LINE
do
ONLINE=$(echo "$LINE" | grep "State:" | cut -d":" -f2 | sed 's/^[ ]*//;s/[ ]*$//' )
if [ -n "$ONLINE" ]
then
if [ "$ONLINE" = "online" ]
then
echo "&green $LINE" >> $TEMPFILE.online
else
echo "&red $LINE" >> $TEMPFILE.online
COLOUR=red
fi
else
echo "$LINE" >> $TEMPFILE.online
fi
done
[ "$COLOUR" = "red" ] && echo "&red HBA not online" >> $TEMPFILE.out
mv $TEMPFILE.online $TEMPFILE
fi
if [ "$CHECKSPEED" = "true" ]
then
cat $TEMPFILE | while read LINE
do
echo "$LINE" | grep "^HBA" > /dev/null && MAXSPEED="" && CURRSPEED="" && SPEEDS=""
SPEEDS=$(echo "$LINE" | grep "Supported Speeds:")
[ -n "$SPEEDS" ] && MAXSPEED=$(echo "$SPEEDS" | awk '{ print $NF }')
CURRSPEED=$(echo "$LINE" | grep "Current Speed:" | awk '{ print $NF }')
if [ -n "$CURRSPEED" -a "$CURRSPEED" != "$MAXSPEED" ]
then
[ "$COLOUR" != "red" ] && COLOUR="yellow"
echo "&yellow Some HBAs not at optimal speed" >> $TEMPFILE.out
echo "$LINE" | sed "s/Current/\&yellow Current/g" >> $TEMPFILE.speed
MAXSPEED=""
SPEEDS=""
CURRSPEED=""
else
echo "$LINE" | sed "s/Current/\&green Current/g" >> $TEMPFILE.speed
fi
done
mv $TEMPFILE.speed $TEMPFILE
fi
TCOLOUR=$COLOUR
COLOUR=green
if [ "$CHECKERRS" = "true" ]
then
cat $TEMPFILE | while read LINE
do
LCOL=green
ERRLINE=$(echo "$LINE" | grep "Count:")
if [ -n "$ERRLINE" ]
then
ERRCOUNT=$(echo "$ERRLINE" | cut -d":" -f2)
[ $ERRCOUNT -lt $ERR_YELLOW ] && LCOL=green
[ $ERRCOUNT -ge $ERR_YELLOW ] && LCOL=yellow
[ $ERRCOUNT -ge $ERR_RED ] && LCOL=red
echo "&$LCOL $LINE" >> $TEMPFILE.err
else
echo "$LINE" >> $TEMPFILE.err
fi
[ "$LCOL" = "red" ]&& COLOUR=red
[ "$LCOL" = "yellow" -a "$COLOUR" != "red" ] && COLOUR=yellow
done
[ "$COLOUR" = "red" ] && echo "&red Critical error count detected" >> $TEMPFILE.out
[ "$COLOUR" = "yellow" ] && echo "&yellow High error count detected" >> $TEMPFILE.out
mv $TEMPFILE.err $TEMPFILE
fi
[ "$TCOLOUR" = "red" ] && COLOUR="red"
[ "$TCOLOUR" = "yellow" -a "$COLOUR" != "red" ] && COLOUR="yellow"
if [ "$CHECKMPATH" = "true" -o "$LIST_MPATH" = "true" ]
then
rm $TEMPFILE.badpath 2> /dev/null # Make sure it's not there
echo '<hr width="50%" size="3" />' >> $TEMPFILE.path.out
echo "" >> $TEMPFILE.path.out
echo "<b>Multi-Pathing</b>" >> $TEMPFILE.path.out
echo "" >> $TEMPFILE.path.out
$MPATHADM list lu > $TEMPFILE.path
if [ "$CHECKMPATH" = "true" ]
then
cp $TEMPFILE.path $TEMPFILE.path.colours
# Fold each three-line LU record into one line: device, total paths, operational paths
cat $TEMPFILE.path | awk -F: '{ print $NF }' \
| nawk 'ORS=NR%3?" ":"\n"' \
| while read DEV TOTPATH OPERPATH
do
LCOL=green
if [ "$TOTPATH" -ne "$OPERPATH" ]
then
LCOL=$MPATHFAILCOL
MPATH=bad
[ "$COLOUR" != "red" ] && COLOUR=$MPATHFAILCOL
fi
SEDDEV=$(echo $DEV | sed "s/\//\\\\\//g")
sed "s/$SEDDEV/\&$LCOL&/g" $TEMPFILE.path.colours > $TEMPFILE.path.tmp
mv $TEMPFILE.path.tmp $TEMPFILE.path.colours
done
mv $TEMPFILE.path.colours $TEMPFILE.path
[ "$MPATH" = "bad" ] && echo "&$MPATHFAILCOL Multipath error detected" >> $TEMPFILE.out
fi
cat $TEMPFILE.path >> $TEMPFILE.path.out
mv $TEMPFILE.path.out $TEMPFILE.path
echo >> $TEMPFILE.path
fi
cat $TEMPFILE | while read LINE
do
echo "$LINE" | grep "HBA Port" > /dev/null
if [ $? -eq 0 ]
then
echo "<b>" >> $TEMPFILE.out
echo "$LINE</b>" >> $TEMPFILE.out
else
echo "$LINE" >> $TEMPFILE.out
fi
done
[ "$CHECKMPATH" = "true" -o "$LIST_MPATH" = "true" ] && cat $TEMPFILE.path >> $TEMPFILE.out
rm $TEMPFILE.path 2>/dev/null
if [ "$LIST_REMOTE" = "true" ]
then
echo '<hr width="50%" size="3" />' >> $TEMPFILE.out
echo "" >> $TEMPFILE.out
echo "<b>Remote Port Listing</b>" >> $TEMPFILE.out
echo "" >> $TEMPFILE.out
$FCINFO hba-port | grep "HBA Port WWN:" \
| cut -d":" -f2 \
| while read WWN
do
$FCINFO remote-port -p $WWN
done >> $TEMPFILE.out
fi
if [ "$LIST_SCSI" = "true" ]
then
FIRST=true
echo '<hr width="50%" size="3" />' >> $TEMPFILE.out
echo "<b>SCSI Device Information</b>" >> $TEMPFILE.out
echo "" >> $TEMPFILE.out
$FCINFO hba-port | grep "HBA Port WWN:" \
| cut -d":" -f2 \
| while read WWN
do
$FCINFO remote-port -s -p $WWN
done > $TEMPFILE
cat $TEMPFILE | while read LINE
do
echo "$LINE" | grep "^Remote Port WWN" >/dev/null
RES=$?
[ $RES -eq 0 -a "$FIRST" = "false" ] && echo '<hr width="50%" size="3" />' >> $TEMPFILE.out
[ $RES -eq 0 -a "$FIRST" = "true" ] && FIRST=false
echo "$LINE" >> $TEMPFILE.out
done
fi
fi
$BB $BBDISP "status $MACHINE.hba $COLOUR $(cat $TEMPFILE.out)"
rm $TEMPFILE $TEMPFILE.out 2> /dev/null
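The multipath check in the script relies on folding each three-line LU record onto a single line before comparing the two path counts. A stand-alone sketch of that step is shown here. `list_degraded` is a hypothetical helper, and the sample input is made up to resemble `mpathadm list lu` output rather than captured from a real host.

```shell
# Sketch only: list_degraded is a hypothetical helper, not part of hba.ksh.
list_degraded() {
    # stdin: records of three lines each (device, total count, operational count)
    awk -F: '{ print $NF }' |
    awk 'ORS = NR % 3 ? " " : "\n"' |
    while read -r dev total oper; do
        # Report any device with fewer operational paths than total paths
        [ "$total" -ne "$oper" ] && echo "$dev"
    done
    return 0
}

# Made-up sample: the second LU has lost one of its paths
list_degraded <<'EOF'
        /dev/rdsk/c4t0000000000000001d0s2
                Total Path Count: 2
                Operational Path Count: 2
        /dev/rdsk/c4t0000000000000002d0s2
                Total Path Count: 2
                Operational Path Count: 1
EOF
# prints /dev/rdsk/c4t0000000000000002d0s2
```

The second awk switches its output record separator to a newline on every third input line, which is what joins the three lines of a record together.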
Known Bugs and Issues
PUBLIC SERVICE WARNING
I have not had the opportunity to test this script under all fail conditions!
This means that there could be bugs that I am not aware of.
If you spot any, please let me know, and I will see what I can do about it.
No bugs known - at the moment.
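Anyone wanting to exercise the fail conditions without breaking a real HBA could feed canned text through the same kind of logic. The sketch below is one way to do that, under assumptions: `worst_colour` is a hypothetical helper that mimics the online and speed checks, and the sample text is made up to look like `fcinfo hba-port` output.

```shell
# Sketch only: worst_colour is a hypothetical helper that mimics the
# online and speed checks; it is not code from hba.ksh.
worst_colour() {
    colour=green
    max=""
    while read -r line; do
        case $line in
            "State:"*)
                state=${line#State: }
                [ "$state" = "online" ] || colour=red
                ;;
            "Supported Speeds:"*)
                max=${line##* }          # last speed listed
                ;;
            "Current Speed:"*)
                cur=${line##* }
                if [ "$cur" != "$max" ] && [ "$colour" != "red" ]; then
                    colour=yellow
                fi
                ;;
        esac
    done
    echo "$colour"
}

# Made-up sample: one port below full speed, one port offline
worst_colour <<'EOF'
HBA Port WWN: 210000e08b000001
        State: online
        Supported Speeds: 1Gb 2Gb 4Gb
        Current Speed: 2Gb
HBA Port WWN: 210000e08b000002
        State: offline
        Supported Speeds: 1Gb 2Gb 4Gb
        Current Speed: not established
EOF
# prints red
```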
Fix any bugs that come along.
Features?
No. I am done with this one for now. I leave it to the rest of the community to add any more features.
I am going to take the blame for this one.
However, I am indebted to Kevin Kelly for giving me the idea.
http://www.xymon.com/archive/2010/08/msg00016.html
It was a great way to keep myself from going crazy on what was panning out to be a very slow and boring day.