monitors:hardware_sensors

Differences

This shows you the differences between two versions of the page.

monitors:hardware_sensors [2013/09/27 10:05] – [Hardware monitoring] doctor_madness
monitors:hardware_sensors [2022/12/11 11:08] – old revision restored (2022/12/11 11:04) doktoil_makresh
Line 1: Line 1:
 ====== Hardware monitoring ====== ====== Hardware monitoring ======
  
-^ Author | [[ doctor@makelofine.org | Damien Martins ]] | +^ Author         | [[doctor@makelofine.org| Damien Martins ]]                    |
-^ Compatibility | Xymon 4.2.2/4.3.12 | +^ Compatibility  | Xymon 4.2.2/4.3.12                                            |
-^ Requirements | sh (or bash), hddtemp, smartmontools | +^ Requirements   | sh (or bash), hddtemp, smartmontools                          |
-^ Download | https://www.makelofine.org/xymon-plugins/hobbit-hardware-v0.5.tar.bz2 | +^ Download       | Part of https://github.com/doktoil-makresh/xymon-plugins.git  |
-^ Last Update | 2013-09-27 | +^ Last Update    | 2022-07-13                                                    |
  
 ===== Description ===== ===== Description =====
Line 11: Line 11:
 ===== Installation ===== ===== Installation =====
 === Client side === === Client side ===
-Untar this package, put hobbit-hardware.sh in $BBHOME/ext directory +Untar this package, put hobbit-hardware.sh in $XYMONCLIENTHOME/ext directory 
-Put hobbit-hardware.conf in $BBHOME/etc directory+Put xymon-hardware.cfg in $XYMONCLIENTHOME/etc directory
 Modify variables in both files to fit your needs/system Modify variables in both files to fit your needs/system
 +User 'xymon' should be allowed to use sudo on some commands (check variables including 'sudo' in xymon-hardware.sh)
 === Server side === === Server side ===
-Add hardware to your $BBHOME/server/bb-hosts line for the host running this script+Add hardware to your $XYMONHOME/server/hosts line for the host running this script
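
The sudo requirement above can be sketched as a sudoers entry. This is only an illustration: the command paths are the defaults that appear in the script variables (CMD_HDDTEMP, SMARTCTL, HPACUCLI), the user name 'xymon' comes from the note above, and the helper name build_sudoers_line is made up for this example.

```shell
#!/bin/sh
# Sketch: print a sudoers entry for the commands the script runs through sudo.
# Paths are the script defaults; adjust them to match your own variables.
XYMON_USER="xymon"
SUDO_CMDS="/usr/sbin/hddtemp /usr/sbin/smartctl /usr/sbin/hpacucli"

build_sudoers_line() {
    # Join the command paths with ", " as the sudoers syntax expects.
    list=""
    for cmd in $SUDO_CMDS; do
        if [ -z "$list" ]; then
            list="$cmd"
        else
            list="$list, $cmd"
        fi
    done
    printf '%s ALL=(root) NOPASSWD: %s\n' "$XYMON_USER" "$list"
}

build_sudoers_line
```

One possible way to use it: write the printed line to a file under /etc/sudoers.d/ and check the syntax with `visudo -cf <file>` before relying on it.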
  
 ===== Source ===== ===== Source =====
Line 24: Line 25:
  
 # ALL THIS SCRIPT IS UNDER GPL LICENSE # ALL THIS SCRIPT IS UNDER GPL LICENSE
-# Version 0.4 +# Version 0.6 
-# Title:     hobbit-hardware+# Title:     xymon-hardware
 # Author:    Damien Martins  ( doctor |at| makelofine |dot| org) # Author:    Damien Martins  ( doctor |at| makelofine |dot| org)
-# Date:      2013-06-27+# Date:      2013-09-27
 # Purpose:   Check Uni* hardware sensors # Purpose:   Check Uni* hardware sensors
 # Platforms: Uni* having lm-sensor and hddtemp utilities # Platforms: Uni* having lm-sensor and hddtemp utilities
 # Tested:    Xymon 4.3.4 / hddtemp version 0.3-beta15 (Debian Lenny and Etch packages) / sensors version 3.0.2 with libsensors version 3.0.2 (Debian Lenny package) / sensors version 3.0.1 with libsensors version 3.0.1 (Debian Etch package) # Tested:    Xymon 4.3.4 / hddtemp version 0.3-beta15 (Debian Lenny and Etch packages) / sensors version 3.0.2 with libsensors version 3.0.2 (Debian Lenny package) / sensors version 3.0.1 with libsensors version 3.0.1 (Debian Etch package)
-  
-#TODO for v0.5 
-#       -To be independent of /etc/sensors.conf -> we get raw values, and we set right ones from those, and define thresholds in xymon-hardware.conf file 
-# -Support for multiple sensors 
-# -Support for independent temperature thresholds for each disk 
-# 
-# History : 
-# 27 jun 2013 - Damien Martins and Xavier Carol i Rosell 
-# v0.4 : Fix hddtemp output handling (print last field instead of field N)  
-# 09 sep 2011 - Damien Martins 
-# v0.3 : Add support for OpenManage Physical disks, temps 
-# 17 feb 2010 - Damien Martins 
-# v0.2.2 : Minor code optimizations 
-# 22 jan 2010 - Damien Martins 
-# v0.2.1 : Minor bug fix 
-# 14 nov 2009 - Damien Martins 
-# v0.2 : -Getting sensor probe no more hard coded 
-# -More verbosity when commands fail 
-# -Disk temperature thresholds in xymon-hardware.conf file. 
-# -Support smartctl to replace hddtemp (if needed) 
-# -Possibility to disable lm-sensors 
-# -Possibility to choose smartctl chipset 
-# 25 jun 2009 - Damien Martins 
-#       v0.1.2 : -New error messages (more verbose, more accurate) 
-# 18 jun 2009 - Damien Martins 
-#       v0.1.1 : -Bug fixes 
-# 15 jan 2009 - Damien Martins 
-#        v0.1 : First lines, trying to get : 
-#       -temperature values, and defined thresholds 
-#       -fan rotation speed and threshold 
-#       -voltages and thresholds 
-#       -HDD temperature (threshold is not included, so we set it in this file) 
    
 ################################################################################# #################################################################################
Line 69: Line 38:
    
 #This script should be stored in the ext directory of the Xymon client home (typically ~xymon/client/ext). #This script should be stored in the ext directory of the Xymon client home (typically ~xymon/client/ext).
-#You must configure the xymon-hardware.conf file (or whatever name defined in CONFIG_FILE +#You must configure the xymon-hardware.cfg file (or whatever name defined in CONFIG_FILE)
- +
-#Change to fit your system/wills : +
-TEST="hardware" +
-MSG_FILE="${BBTMP}/xymon-hardware.msg" +
-CONFIG_FILE="${HOBBITCLIENTHOME}/etc/xymon-hardware.conf" +
-TMP_FILE="${BBTMP}/xymon-hardware.tmp" +
-CMD_HDDTEMP="sudo /usr/sbin/hddtemp" +
-SENSORS="/usr/bin/sensors" +
-BC="/usr/bin/bc" +
-SUDO="/usr/bin/sudo" +
-SMARTCTL="/usr/sbin/smartctl" +
-OMREPORT="/opt/dell/srvadmin/sbin/omreport"+
  
 #Debug #Debug
Line 87: Line 44:
  echo "Debug ON"  echo "Debug ON"
         BB=echo         BB=echo
-        HOBBITCLIENTHOME="/usr/local/xymon/client/" +        XYMONCLIENTHOME="/usr/local/Xymon/client/" 
-        BBTMP="$PWD"+        XYMONTMP="$PWD"
         BBDISP=your_xymon_server         BBDISP=your_xymon_server
         MACHINE=$(hostname)         MACHINE=$(hostname)
Line 98: Line 55:
  DATE="/bin/date"  DATE="/bin/date"
  SED="/bin/sed"  SED="/bin/sed"
- CONFIG_FILE="xymon-hardware.conf"+ CONFIG_FILE="xymon-hardware.cfg"
  TMP_FILE="xymon-hardware.tmp"  TMP_FILE="xymon-hardware.tmp"
  MSG_FILE="xymon-hardware.msg"  MSG_FILE="xymon-hardware.msg"
 fi fi
 +
 +#Change to fit your system/wills :
 +TEST="hardware"
 +MSG_FILE="${XYMONTMP}/xymon-hardware.msg"
 +CONFIG_FILE="${XYMONCLIENTHOME}/etc/xymon-hardware.cfg"
 +TMP_FILE="${XYMONTMP}/xymon-hardware.tmp"
 +CMD_HDDTEMP="sudo /usr/sbin/hddtemp"
 +SENSORS="/usr/bin/sensors"
 +BC="/usr/bin/bc"
 +SMARTCTL="sudo /usr/sbin/smartctl"
 +OMREPORT="/opt/dell/srvadmin/sbin/omreport"
 +HPACUCLI="sudo /usr/sbin/hpacucli"
  
 #Don't change anything from here (or assume all responsibility) #Don't change anything from here (or assume all responsibility)
Line 108: Line 77:
  
 #Basic tests : #Basic tests :
-if [ -z "$HOBBITCLIENTHOME" ] ; then +if [ -z "$XYMONCLIENTHOME" ] ; then 
-        echo "HOBBITCLIENTHOME not defined !"+        echo "XYMONCLIENTHOME not defined !"
         exit 1         exit 1
 fi fi
-if [ -z "$BBTMP" ] ; then +if [ -z "$XYMONTMP" ] ; then 
-        echo "BBTMP not defined !"+        echo "XYMONTMP not defined !"
         exit 1         exit 1
 fi fi
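
The two basic tests above repeat the same pattern, so they could also be written as one loop. A minimal sketch, assuming a POSIX sh; the function name require_vars and the demo values are hypothetical and not part of the script:

```shell
#!/bin/sh
# Sketch: check that every named environment variable is non-empty.
require_vars() {
    for name in "$@"; do
        # Indirect lookup: copy the value of the variable called $name.
        eval "value=\${$name}"
        if [ -z "$value" ]; then
            echo "$name not defined !"
            return 1
        fi
    done
    return 0
}

# Demo values only; the Xymon client normally exports these itself.
XYMONCLIENTHOME="/usr/local/xymon/client"
XYMONTMP="/tmp"

if require_vars XYMONCLIENTHOME XYMONTMP; then
    echo "environment looks complete"
fi
```

In the script itself a failed check would be followed by `exit 1`, as in the tests above.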
Line 175: Line 144:
 fi fi
 for DISK in $("$GREP" "^DISK=" "$CONFIG_FILE" | "$SED" s/^DISK=//) ; do for DISK in $("$GREP" "^DISK=" "$CONFIG_FILE" | "$SED" s/^DISK=//) ; do
- HDD_TEMP="$($SUDO $SMARTCTL $SMARTCTL_ARGS $DISK | $GREP "^194" | $AWK '{print $10}')"+ HDD_TEMP="$($SMARTCTL $SMARTCTL_ARGS $DISK | $GREP "^194" | $AWK '{print $10}')"
         if [ ! "$(echo $HDD_TEMP | grep "^[ [:digit:] ]*$")" ] ; then         if [ ! "$(echo $HDD_TEMP | grep "^[ [:digit:] ]*$")" ] ; then
                 RED=1                 RED=1
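
The smartctl branch above reads SMART attribute 194 and takes the tenth field as the temperature, then checks that the result is numeric. A small sketch of that parsing on a canned line; the sample line is illustrative (real `smartctl -A` output varies by drive), and parse_temp and is_number are names invented for this example:

```shell
#!/bin/sh
# Sketch: extract the raw value of SMART attribute 194 and validate it.
parse_temp() {
    # Reads attribute-table text on stdin, prints field 10 of attribute 194.
    awk '$1 == 194 {print $10}'
}

is_number() {
    # Succeeds only for a non-empty string made of digits.
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Illustrative sample line, not captured from a real disk.
SAMPLE='194 Temperature_Celsius     0x0022   036   045   000    Old_age   Always       -       36'

TEMP=$(printf '%s\n' "$SAMPLE" | parse_temp)
if is_number "$TEMP"; then
    echo "disk temperature: $TEMP"
else
    echo "could not read temperature"
fi
```

Using `case` for the numeric check avoids spawning an extra grep for each disk; the behaviour is meant to match the digit test in the script, not to replace it.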
Line 344: Line 313:
 function use_openmanage () function use_openmanage ()
 { {
-rm -f ${BBTMP}/xymon-hardware_volts.tmp ${BBTMP}/xymon-hardware_fans.tmp ${BBTMP}/xymon-hardware_disks.tmp+rm -f ${XYMONTMP}/xymon-hardware_volts.tmp ${XYMONTMP}/xymon-hardware_fans.tmp ${XYMONTMP}/xymon-hardware_disks.tmp
 #Tests temperatures : #Tests temperatures :
  CHASSIS_TEMP=$($OMREPORT chassis temps | grep Reading |awk '{print $3}' | $AWK -F\. '{print $1}')  CHASSIS_TEMP=$($OMREPORT chassis temps | grep Reading |awk '{print $3}' | $AWK -F\. '{print $1}')
Line 356: Line 325:
  CHASSIS_TEMP_STATUS=red  CHASSIS_TEMP_STATUS=red
  echo "&red Chassis temperature is in ALERT !!! :  echo "&red Chassis temperature is in ALERT !!! :
-temperature_chassis: $CHASSIS_TEMP" >> ${BBTMP}/xymon-hardware.msg+temperature_chassis: $CHASSIS_TEMP" >> $MSG_FILE
  RED=1  RED=1
  elif [ $CHASSIS_TEMP -ge $CHASSIS_TEMP_WARNING ] ; then  elif [ $CHASSIS_TEMP -ge $CHASSIS_TEMP_WARNING ] ; then
Line 362: Line 331:
  YELLOW=1  YELLOW=1
  echo "&yellow Chassis temperature is near the limit !!! :  echo "&yellow Chassis temperature is near the limit !!! :
-temperature_chassis: $CHASSIS_TEMP" >> ${BBTMP}/xymon-hardware.msg+temperature_chassis: $CHASSIS_TEMP" >> $MSG_FILE
  elif [ $CHASSIS_TEMP -lt $CHASSIS_TEMP_WARNING ] ; then  elif [ $CHASSIS_TEMP -lt $CHASSIS_TEMP_WARNING ] ; then
  CHASSIS_TEMP_STATUS=green  CHASSIS_TEMP_STATUS=green
- echo "&green Chassis temperature is OK !" >> ${BBTMP}/xymon-hardware.msg+ echo "&green Chassis temperature is OK !" >> $MSG_FILE
  else  else
  echo "Error in temperature values :  echo "Error in temperature values :
Line 379: Line 348:
  VOLT_GLOBAL_STATUS=green  VOLT_GLOBAL_STATUS=green
  else  else
- $OMREPORT chassis volts | grep -A 2 Index  |grep -v Index | grep -v "\-\-" | cut -c 29- > ${BBTMP}/xymon-hardware_volts.tmp+ $OMREPORT chassis volts | grep -A 2 Index  |grep -v Index | grep -v "\-\-" | cut -c 29- > ${XYMONTMP}/xymon-hardware_volts.tmp
  while read LINE ; do  while read LINE ; do
  echo $LINE | grep -q Status | grep -q Ok  echo $LINE | grep -q Status | grep -q Ok
  if [ $ERROR ] ; then  if [ $ERROR ] ; then
  PROBE_IN_ERROR="$LINE"  PROBE_IN_ERROR="$LINE"
- echo "&yellow Voltage of $PROBE_IN_ERROR is incorrect !" >> ${BBTMP}/xymon-hardware.msg+ echo "&yellow Voltage of $PROBE_IN_ERROR is incorrect !" >> $MSG_FILE
  fi  fi
  unset ERROR  unset ERROR
Line 391: Line 360:
  ERROR=1  ERROR=1
  fi  fi
- done < ${BBTMP}/xymon-hardware_volts.tmp+ done < ${XYMONTMP}/xymon-hardware_volts.tmp
  fi  fi
 if [ $VOLT_YELLOW ] ; then if [ $VOLT_YELLOW ] ; then
Line 402: Line 371:
  FANS_GLOBAL_STATUS=green  FANS_GLOBAL_STATUS=green
  else  else
- $OMREPORT chassis fans | grep -A 6 Index  |grep -v Index | grep -v "\-\-" |grep -v "N\/A" | cut -c 29- > ${BBTMP}/xymon-hardware_fans.tmp+ $OMREPORT chassis fans | grep -A 6 Index  |grep -v Index | grep -v "\-\-" |grep -v "N\/A" | cut -c 29- > ${XYMONTMP}/xymon-hardware_fans.tmp
                 while read LINE ; do                 while read LINE ; do
  if [ $NEXT_LINE == FAN_MIN_RPM ] ; then  if [ $NEXT_LINE == FAN_MIN_RPM ] ; then
  FAN_MIN_RPM=$(echo $LINE | awk '{print $1}')  FAN_MIN_RPM=$(echo $LINE | awk '{print $1}')
  echo "&yellow Fan $FAN_NAME is spinning too slowly ($FAN_RPM lower than ${FAN_MIN_RPM}) !!!  echo "&yellow Fan $FAN_NAME is spinning too slowly ($FAN_RPM lower than ${FAN_MIN_RPM}) !!!
-${FAN_NAME}_rpm: $FAN_RPM" >> ${BBTMP}/xymon-hardware_fans.msg+${FAN_NAME}_rpm: $FAN_RPM" >> ${XYMONTMP}/xymon-hardware_fans.msg
  unset NEXT_LINE  unset NEXT_LINE
  fi  fi
Line 419: Line 388:
  if [ $FAN_RPM -le 0 ] ; then  if [ $FAN_RPM -le 0 ] ; then
  FAN_RED=1  FAN_RED=1
- echo "&red Fan $FAN_NAME has stopped spinning !!!" >> ${BBTMP}/xymon-hardware_fans.msg+ echo "&red Fan $FAN_NAME has stopped spinning !!!" >> ${XYMONTMP}/xymon-hardware_fans.msg
  fi  fi
                         unset ERROR                         unset ERROR
Line 429: Line 398:
  NEXT_LINE=FAN_NAME  NEXT_LINE=FAN_NAME
                         fi                         fi
-                        done < ${BBTMP}/xymon-hardware_fans.tmp+                        done < ${XYMONTMP}/xymon-hardware_fans.tmp
         fi         fi
 if [ $FAN_RED ] ; then if [ $FAN_RED ] ; then
  RED=1  RED=1
  echo "&red Problem with fan speeds !  echo "&red Problem with fan speeds !
-$(cat ${BBTMP}/xymon-hardware_fans.msg)" >> ${BBTMP}/xymon-hardware.msg+$(cat ${XYMONTMP}/xymon-hardware_fans.msg)" >> $MSG_FILE
 elif [ $FAN_YELLOW ] ; then elif [ $FAN_YELLOW ] ; then
         YELLOW=1         YELLOW=1
  echo "&yellow Problem with fan speeds !  echo "&yellow Problem with fan speeds !
-$(cat ${BBTMP}/xymon-hardware_fans.msg)" >> ${BBTMP}/xymon-hardware.msg+$(cat ${XYMONTMP}/xymon-hardware_fans.msg)" >> $MSG_FILE
 else else
  FANS_GLOBAL_STATUS=green  FANS_GLOBAL_STATUS=green
- echo "&green All fans are fine" >> ${BBTMP}/xymon-hardware.msg+ echo "&green All fans are fine" >> $MSG_FILE
 fi fi
  
Line 447: Line 416:
 $OMREPORT storage pdisk controller=0 |grep ^Status | grep -q Ok $OMREPORT storage pdisk controller=0 |grep ^Status | grep -q Ok
 if [ $? -eq 0 ] ; then if [ $? -eq 0 ] ; then
- echo "&green Disk status is OK !" >> ${BBTMP}/xymon-hardware.msg+ echo "&green Disk status is OK !" >> $MSG_FILE
 else else
  DISK_COLOR=yellow  DISK_COLOR=yellow
- $OMREPORT storage pdisk controller=0 |grep -A 1 ^Status | grep -v "\-\-" > ${BBTMP}/xymon-hardware_disks.tmp+ $OMREPORT storage pdisk controller=0 |grep -A 1 ^Status | grep -v "\-\-" > ${XYMONTMP}/xymon-hardware_disks.tmp
  while read LINE ; do  while read LINE ; do
  echo $LINE | grep -q Status | grep -q Ok  echo $LINE | grep -q Status | grep -q Ok
  if [ $NEXT_LINE == DISK_NAME ] ; then  if [ $NEXT_LINE == DISK_NAME ] ; then
  DISK_NAME=$(echo $LINE | cut -c 29-)  DISK_NAME=$(echo $LINE | cut -c 29-)
- echo "&yellow Disk $DISK_NAME is in a bad state !" >> ${BBTMP}/xymon-hardware.msg+ echo "&yellow Disk $DISK_NAME is in a bad state !" >> $MSG_FILE
  unset NEXT_LINE  unset NEXT_LINE
  fi  fi
Line 464: Line 433:
  NEXT_LINE=DISK_NAME  NEXT_LINE=DISK_NAME
  fi  fi
- done < ${BBTMP}/xymon-hardware_disks.tmp+ done < ${XYMONTMP}/xymon-hardware_disks.tmp
   
 fi fi
 } }
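
The chassis temperature test in use_openmanage follows a common pattern: compare an integer reading against a warning and an alert threshold and pick a colour. A compact sketch of that comparison; classify_temp and its argument order are assumptions made for this example, and only CHASSIS_TEMP_WARNING is a name taken from the script:

```shell
#!/bin/sh
# Sketch: map a reading to green/yellow/red from two integer thresholds.
classify_temp() {
    # $1 = reading, $2 = warning threshold, $3 = alert threshold
    for v in "$1" "$2" "$3"; do
        case "$v" in
            ''|*[!0-9]*) echo "unknown"; return 1 ;;
        esac
    done
    if [ "$1" -ge "$3" ]; then
        echo "red"
    elif [ "$1" -ge "$2" ]; then
        echo "yellow"
    else
        echo "green"
    fi
}

classify_temp 45 40 50
classify_temp 30 40 50
```

Validating the three values first means a failed omreport call (empty reading) ends up as "unknown" instead of a shell error in the numeric test.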
 +function use_hpacucli ()
 +{
 +$HPACUCLI ctrl all show config | grep drive | while read OUTPUT ; do
 +        TYPE=$(echo $OUTPUT | awk '{print $1}' | sed s/drive//)
 +        SLOT=$(echo $OUTPUT | awk '{print $2}')
 +        STATUS=$(echo $OUTPUT | awk '{print $NF}' | sed s/\)//)
 +        if [ $TYPE == "logical" ] ; then
 +                RAID=$(echo $OUTPUT | awk '{print $6}')
 +                SIZE=$(echo $OUTPUT | awk '{print $3 $4}' | sed s/\(// | sed s/\,//)
 +                if [ "$STATUS" != "OK" ] ; then
 +                        RED=1
 +                        LINE="&red Logical drive $SLOT \(RAID $RAID, size : $SIZE\) status is BAD !!!"
 +                elif [ "$STATUS" == "OK" ] ; then
 +                        LINE="&green Logical drive $SLOT \(RAID $RAID, size : $SIZE\) status is OK"
 +                else
 +                        RED=1
 +                        LINE="&red Unknown status \(or stupid monitoring script\) for logical drive $SLOT \(RAID $RAID, size : $SIZE\) !!!"
 +                fi
 +        elif [ "$TYPE" == "physical" ] ; then
 +                SIZE=$(echo $OUTPUT | awk '{print $8 $9}' | sed s/\,//)
 +                if [ "$STATUS" != "OK" ] ; then
 +                        YELLOW=1
 +                        LINE="&yellow Physical drive in slot $SLOT \(size : $SIZE\) status is BAD !!!"
 +                elif [ "$STATUS" == "OK" ] ; then
 +                        LINE="&green Physical drive in slot $SLOT \(size : $SIZE\) status is OK"
 +                else
 +                        RED=1
 +                        LINE="&red Unknown status \(or stupid monitoring script\) for physical drive in slot $SLOT \(size : $SIZE\) !!!"
 +                fi
 +        fi
 +        echo $LINE >> $MSG_FILE
 +done
 +}
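
One thing to keep in mind with use_hpacucli: in most sh implementations a `while read` loop fed by a pipe runs in a subshell, so flags such as RED or YELLOW set inside it may not be visible once the loop ends. The sketch below shows one way around that, feeding the loop from a here-document. The sample text is an assumed stand-in for `hpacucli ctrl all show config` output, and check_drives and REPORT are hypothetical names:

```shell
#!/bin/sh
# Assumed sample of controller output; real output differs per system.
SAMPLE='logicaldrive 1 (136.7 GB, RAID 1, OK)
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, Failed)'

RED=0
YELLOW=0
REPORT=""

check_drives() {
    # The here-document keeps the loop in the current shell,
    # so RED, YELLOW and REPORT survive after "done".
    while read -r KIND SLOT REST; do
        case "$KIND" in
            logicaldrive|physicaldrive) ;;
            *) continue ;;
        esac
        STATUS=${REST##* }      # last word, e.g. "OK)"
        STATUS=${STATUS%\)}     # drop the closing parenthesis
        if [ "$STATUS" = "OK" ]; then
            COLOR=green
        elif [ "$KIND" = "logicaldrive" ]; then
            COLOR=red
            RED=1
        else
            COLOR=yellow
            YELLOW=1
        fi
        REPORT="${REPORT}&${COLOR} ${KIND} ${SLOT} status is ${STATUS}
"
    done <<EOF
$1
EOF
}

check_drives "$SAMPLE"
printf '%s' "$REPORT"
echo "RED=$RED YELLOW=$YELLOW"
```

The colour mapping mirrors the function above (a bad logical drive is red, a bad physical drive is yellow); redirecting the loop from a temporary file would work just as well as the here-document.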
 +
 +$GREP -q ^HPACUCLI=1 $CONFIG_FILE
 +if [ $? -eq 0 ] ; then
 +        use_hpacucli
 +fi
 $GREP -q ^SMARTCTL=1 $CONFIG_FILE $GREP -q ^SMARTCTL=1 $CONFIG_FILE
 if [ $? -eq 0 ] ; then if [ $? -eq 0 ] ; then
Line 480: Line 487:
  use_openmanage  use_openmanage
 fi fi
- 
 $GREP -q ^SENSOR=1 $CONFIG_FILE $GREP -q ^SENSOR=1 $CONFIG_FILE
 if [ $? -eq 0 ] ; then if [ $? -eq 0 ] ; then
Line 503: Line 509:
  
 ===== To Do ===== ===== To Do =====
-v0.5+v0.6
   * To be independent of /etc/sensors.conf -> we get raw values, and we set right ones from those, and define thresholds in hobbit-hardware.conf file            * To be independent of /etc/sensors.conf -> we get raw values, and we set right ones from those, and define thresholds in hobbit-hardware.conf file         
   * Support for independent temperature thresholds for each disk   * Support for independent temperature thresholds for each disk
Line 532: Line 538:
   * **2013-06-27 v0.4**   * **2013-06-27 v0.4**
     * Fix hddtemp output handling (print last field instead of field N)     * Fix hddtemp output handling (print last field instead of field N)
 +  * **2013-09-27 v0.5**
 +    * Add support for HP monitoring tool (hpacucli)
 +  * **2022-07-13 v0.6**
 +    * Add support for independent temperatures per disk
 +</code>
 +
  • monitors/hardware_sensors.txt
  • Last modified: 2022/12/11 11:12
  • by doktoil_makresh