monitors:hardware_sensors

Differences

This shows you the differences between two versions of the page.

monitors:hardware_sensors [2013/09/27 10:05] – [Hardware monitoring] doctor_madness
monitors:hardware_sensors [2022/12/11 11:12] (current) – [Source] doktoil_makresh
Line 1: Line 1:
 ====== Hardware monitoring ====== ====== Hardware monitoring ======
  
-^ Author | [[ doctor@makelofine.org | Damien Martins ]] |
-^ Compatibility | Xymon 4.2.2/4.3.12 |
-^ Requirements | sh (or bash), hddtemp, smartmontools |
-^ Download | https://www.makelofine.org/xymon-plugins/hobbit-hardware-v0.5.tar.bz2 |
-^ Last Update | 2013-09-27 |
+^ Author         | [[doctor@makelofine.org| Damien Martins ]]                    |
+^ Compatibility  | Xymon 4.2.2/4.3.12                                            |
+^ Requirements   | sh (or bash), hddtemp, smartmontools                          |
+^ Download       | Part of https://github.com/doktoil-makresh/xymon-plugins.git  |
+^ Last Update    | 2022-07-13                                                    |
  
 ===== Description ===== ===== Description =====
Line 11: Line 11:
 ===== Installation ===== ===== Installation =====
 === Client side === === Client side ===
-Untar this package, put hobbit-hardware.sh in $BBHOME/ext directory
-Put hobbit-hardware.conf in $BBHOME/etc directory
+Untar this package, put hobbit-hardware.sh in $XYMONCLIENTHOME/ext directory
+Put xymon-hardware.cfg in $XYMONCLIENTHOME/etc directory
 Modify variables in both files to fit your needs/system Modify variables in both files to fit your needs/system
 +User 'xymon' should be allowed to use sudo on some commands (check variables including 'sudo' in xymon-hardware.sh)
 === Server side === === Server side ===
-Add hardware to your $BBHOME/server/bb-hosts line for the host running this script
+Add hardware to your $XYMONHOME/server/hosts line for the host running this script
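
The client-side steps above can be sketched as a small shell helper. This is a minimal sketch and not part of the package: the function name `install_xymon_hardware` and its two arguments are hypothetical, and it assumes the unpacked directory contains `xymon-hardware.sh` and `xymon-hardware.cfg` (the names used in the newer revision's notes).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the plugin): copy the script and its
# config into a Xymon client tree.
#   $1 = directory holding the unpacked package (assumed layout)
#   $2 = Xymon client home, e.g. the value of $XYMONCLIENTHOME
install_xymon_hardware() {
    src_dir="$1"
    client_home="$2"

    if [ ! -d "$src_dir" ]; then
        echo "source directory not found: $src_dir" >&2
        return 1
    fi

    mkdir -p "$client_home/ext" "$client_home/etc" || return 1

    # The script goes into ext/ and must be executable.
    cp "$src_dir/xymon-hardware.sh" "$client_home/ext/" || return 1
    chmod 755 "$client_home/ext/xymon-hardware.sh" || return 1

    # Keep an existing, locally edited config instead of overwriting it.
    if [ ! -e "$client_home/etc/xymon-hardware.cfg" ]; then
        cp "$src_dir/xymon-hardware.cfg" "$client_home/etc/" || return 1
    fi
}
```

A call such as `install_xymon_hardware ./xymon-plugins "$XYMONCLIENTHOME"` would then leave only the variable edits and the sudo permissions to do by hand.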
  
 ===== Source ===== ===== Source =====
Line 24: Line 25:
  
 # ALL THIS SCRIPT IS UNDER GPL LICENSE # ALL THIS SCRIPT IS UNDER GPL LICENSE
-# Version 0.4
-# Title:     hobbit-hardware
+# Version 0.6
+# Title:     xymon-hardware
 # Author:    Damien Martins  ( doctor |at| makelofine |dot| org) # Author:    Damien Martins  ( doctor |at| makelofine |dot| org)
-# Date:      2013-06-27
+# Date:      2022-07-13
 # Purpose:   Check Uni* hardware sensors # Purpose:   Check Uni* hardware sensors
 # Platforms: Uni* having lm-sensor and hddtemp utilities # Platforms: Uni* having lm-sensor and hddtemp utilities
 # Tested:    Xymon 4.3.4 / hddtemp version 0.3-beta15 (Debian Lenny and Etch packages) / sensors version 3.0.2 with libsensors version 3.0.2 (Debian Lenny package) / sensors version 3.0.1 with libsensors version 3.0.1 (Debian Etch package) # Tested:    Xymon 4.3.4 / hddtemp version 0.3-beta15 (Debian Lenny and Etch packages) / sensors version 3.0.2 with libsensors version 3.0.2 (Debian Lenny package) / sensors version 3.0.1 with libsensors version 3.0.1 (Debian Etch package)
-  
-#TODO for v0.5 
-#       -To be independent of /etc/sensors.conf -> we get raw values, and we set right ones from those, and define thresolds in xymon-hardware.conf file 
-# -Support for multiples sensors 
-# -Support for independant temperatures thresolds for each disk 
-# 
-# History : 
-# 27 jun 2013 - Damien Martins and Xavier Carol i Rosell 
-# v0.4 : Fix hddtemp output handling (print last field instead of field N)  
-# 09 sep 2011 - Damien Martins 
-# v0.3 : Add support for OpenManage Physical disks, temps 
-# 17 feb 2010 - Damien Martins 
-# v0.2.2 : Minor code optimizations 
-# 22 jan 2010 - Damien Martins 
-# v0.2.1 : Minor bug fix 
-# 14 nov 2009 - Damien Martins 
-# v0.2 : -Getting sensor probe no more hard coded 
-# -More verbosity when commands fail 
-# -Disk temperature thresolds in xymon-hardware.conf file. 
-# -Support smartctl to replace hddtemp (if needed) 
-# -Possibility to disable lm-sensors 
-# -Possibility to choose smartctl chipset 
-# 25 jun 2009 - Damien Martins 
-#       v0.1.2 : -New error messages (more verbose, more accurate) 
-# 18 jun 2009 - Damien Martins 
-#       v0.1.1 : -Bug fixes 
-# 15 jan 2009 - Damien Martins 
-#        v0.1 : First lines, trying to get : 
-#       -temperatures value, and defined thresolds 
-#       -fan rotation speed and thresold 
-#       -voltages and thresolds 
-#       -HDD temperature (thresold is not include, so we set it in this file) 
    
 ################################################################################# #################################################################################
Line 69: Line 38:
    
 #This script should be stored in ext directory, located in Xymon/Xymon client home (typically ~xymon/client/ext or ~xymon/client/ext). #This script should be stored in ext directory, located in Xymon/Xymon client home (typically ~xymon/client/ext or ~xymon/client/ext).
-#You must configure the xymon-hardware.conf file (or whatever name defined in CONFIG_FILE
-
-#Change to fit your system/wills :
-TEST="hardware"
-MSG_FILE="${BBTMP}/xymon-hardware.msg"
-CONFIG_FILE="${HOBBITCLIENTHOME}/etc/xymon-hardware.conf"
-TMP_FILE="${BBTMP}/xymon-hardware.tmp"
-CMD_HDDTEMP="sudo /usr/sbin/hddtemp"
-SENSORS="/usr/bin/sensors"
-BC="/usr/bin/bc"
-SUDO="/usr/bin/sudo"
-SMARTCTL="/usr/sbin/smartctl"
-OMREPORT="/opt/dell/srvadmin/sbin/omreport"
+#You must configure the xymon-hardware.cfg file (or whatever name defined in CONFIG_FILE)
  
 #Debug #Debug
Line 87: Line 44:
  echo "Debug ON"  echo "Debug ON"
         BB=echo         BB=echo
-        HOBBITCLIENTHOME="/usr/local/xymon/client/"
-        BBTMP="$PWD"
+        XYMONCLIENTHOME="/usr/local/Xymon/client/"
+        XYMONTMP="$PWD"
         BBDISP=your_xymon_server         BBDISP=your_xymon_server
         MACHINE=$(hostname)         MACHINE=$(hostname)
Line 98: Line 55:
  DATE="/bin/date"  DATE="/bin/date"
  SED="/bin/sed"  SED="/bin/sed"
- CONFIG_FILE="xymon-hardware.conf"
+ CONFIG_FILE="xymon-hardware.cfg"
  TMP_FILE="xymon-hardware.tmp"  TMP_FILE="xymon-hardware.tmp"
  MSG_FILE="xymon-hardware.msg"  MSG_FILE="xymon-hardware.msg"
 fi fi
 +
 +#Change to fit your system/wills :
 +TEST="hardware"
 +MSG_FILE="${XYMONTMP}/xymon-hardware.msg"
 +CONFIG_FILE="${XYMONCLIENTHOME}/etc/xymon-hardware.cfg"
 +TMP_FILE="${XYMONTMP}/xymon-hardware.tmp"
 +CMD_HDDTEMP="sudo /usr/sbin/hddtemp"
 +SENSORS="/usr/bin/sensors"
 +BC="/usr/bin/bc"
 +SMARTCTL="sudo /usr/sbin/smartctl"
 +OMREPORT="/opt/dell/srvadmin/sbin/omreport"
 +HPACUCLI="sudo /usr/sbin/hpacucli"
  
 #Don't change anything from here (or assume all responsibility) #Don't change anything from here (or assume all responsibility)
Line 108: Line 77:
  
 #Basic tests : #Basic tests :
-if [ -z "$HOBBITCLIENTHOME" ] ; then
-        echo "HOBBITCLIENTHOME not defined !"
+if [ -z "$XYMONCLIENTHOME" ] ; then
+        echo "XYMONCLIENTHOME not defined !"
         exit 1         exit 1
 fi fi
-if [ -z "$BBTMP" ] ; then
-        echo "BBTMP not defined !"
+if [ -z "$XYMONTMP" ] ; then
+        echo "XYMONTMP not defined !"
         exit 1         exit 1
 fi fi
Line 175: Line 144:
 fi fi
 for DISK in $("$GREP" "^DISK=" "$CONFIG_FILE" | "$SED" s/^DISK=//) ; do for DISK in $("$GREP" "^DISK=" "$CONFIG_FILE" | "$SED" s/^DISK=//) ; do
- HDD_TEMP="$($SUDO $SMARTCTL $SMARTCTL_ARGS $DISK | $GREP "^194" | $AWK '{print $10}')"
+ HDD_TEMP="$($SMARTCTL $SMARTCTL_ARGS $DISK | $GREP "^194" | $AWK '{print $10}')"
         if [ ! "$(echo $HDD_TEMP | grep "^[ [:digit:] ]*$")" ] ; then         if [ ! "$(echo $HDD_TEMP | grep "^[ [:digit:] ]*$")" ] ; then
                 RED=1                 RED=1
Line 344: Line 313:
 function use_openmanage () function use_openmanage ()
 { {
-rm -f ${BBTMP}/xymon-hardware_volts.tmp ${BBTMP}/xymon-hardware_fans.tmp ${BBTMP}/xymon-hardware_disks.tmp
+rm -f ${XYMONTMP}/xymon-hardware_volts.tmp ${XYMONTMP}/xymon-hardware_fans.tmp ${XYMONTMP}/xymon-hardware_disks.tmp
 #Tests temperatures : #Tests temperatures :
  CHASSIS_TEMP=$($OMREPORT chassis temps | grep Reading |awk '{print $3}' | $AWK -F\. '{print $1}')  CHASSIS_TEMP=$($OMREPORT chassis temps | grep Reading |awk '{print $3}' | $AWK -F\. '{print $1}')
Line 356: Line 325:
  CHASSIS_TEMP_STATUS=red  CHASSIS_TEMP_STATUS=red
  echo "&red La temperature du chassis est en ALERTE !!! :  echo "&red La temperature du chassis est en ALERTE !!! :
-temperature_chassis: $CHASSIS_TEMP" >> ${BBTMP}/xymon-hardware.msg
+temperature_chassis: $CHASSIS_TEMP" >> $MSG_FILE
  RED=1  RED=1
  elif [ $CHASSIS_TEMP -ge $CHASSIS_TEMP_WARNING ] ; then  elif [ $CHASSIS_TEMP -ge $CHASSIS_TEMP_WARNING ] ; then
Line 362: Line 331:
  YELLOW=1  YELLOW=1
  echo "&yellow La temperature du chassis est en LIMITE-LIMITE !!! :  echo "&yellow La temperature du chassis est en LIMITE-LIMITE !!! :
-temperature_chassis: $CHASSIS_TEMP" >> ${BBTMP}/xymon-hardware.msg
+temperature_chassis: $CHASSIS_TEMP" >> $MSG_FILE
  elif [ $CHASSIS_TEMP -lt $CHASSIS_TEMP_WARNING ] ; then  elif [ $CHASSIS_TEMP -lt $CHASSIS_TEMP_WARNING ] ; then
  CHASSIS_TEMP_STATUS=green  CHASSIS_TEMP_STATUS=green
- echo "&green Les voltages sont Ok !" >> ${BBTMP}/xymon-hardware.msg
+ echo "&green Les voltages sont Ok !" >> $MSG_FILE
  else  else
  echo "Erreur dans les valeurs de temperatures :  echo "Erreur dans les valeurs de temperatures :
Line 379: Line 348:
  VOLT_GLOBAL_STATUS=green  VOLT_GLOBAL_STATUS=green
  else  else
- $OMREPORT chassis volts | grep -A 2 Index  |grep -v Index | grep -v "\-\-" | cut -c 29- > ${BBTMP}/xymon-hardware_volts.tmp
+ $OMREPORT chassis volts | grep -A 2 Index  |grep -v Index | grep -v "\-\-" | cut -c 29- > ${XYMONTMP}/xymon-hardware_volts.tmp
  while read LINE ; do  while read LINE ; do
  echo $LINE | grep -q Status | grep -q Ok  echo $LINE | grep -q Status | grep -q Ok
  if [ $ERROR ] ; then  if [ $ERROR ] ; then
  PROBE_IN_ERROR="$LINE"  PROBE_IN_ERROR="$LINE"
- echo "&yellow Le voltage de $PROBE_IN_ERROR est incorrect !" >> ${BBTMP}/xymon-hardware.msg
+ echo "&yellow Le voltage de $PROBE_IN_ERROR est incorrect !" >> $MSG_FILE
  fi  fi
  unset ERROR  unset ERROR
Line 391: Line 360:
  ERROR=1  ERROR=1
  fi  fi
- done < ${BBTMP}/xymon-hardware_volts.tmp
+ done < ${XYMONTMP}/xymon-hardware_volts.tmp
  fi  fi
 if [ $VOLT_YELLOW ] ; then if [ $VOLT_YELLOW ] ; then
Line 402: Line 371:
  FANS_GLOBAL_STATUS=green  FANS_GLOBAL_STATUS=green
  else  else
- $OMREPORT chassis fans | grep -A 6 Index  |grep -v Index | grep -v "\-\-" |grep -v "N\/A" | cut -c 29- > ${BBTMP}/xymon-hardware_fans.tmp
+ $OMREPORT chassis fans | grep -A 6 Index  |grep -v Index | grep -v "\-\-" |grep -v "N\/A" | cut -c 29- > ${XYMONTMP}/xymon-hardware_fans.tmp
                 while read LINE ; do                 while read LINE ; do
  if [ $NEXT_LINE == FAN_MIN_RPM ] ; then  if [ $NEXT_LINE == FAN_MIN_RPM ] ; then
  FAN_MIN_RPM=$(echo $LINE | awk '{print $1}')  FAN_MIN_RPM=$(echo $LINE | awk '{print $1}')
  echo "&yellow Le ventilateur $FAN_NAME tourne trop lentement ($FAN_RPM inferieur a ${FAN_MIN_RPM}) !!!  echo "&yellow Le ventilateur $FAN_NAME tourne trop lentement ($FAN_RPM inferieur a ${FAN_MIN_RPM}) !!!
-${FAN_NAME}_rpm: $FAN_RPM" >> ${BBTMP}/xymon-hardware_fans.msg
+${FAN_NAME}_rpm: $FAN_RPM" >> ${XYMONTMP}/xymon-hardware_fans.msg
  unset NEXT_LINE  unset NEXT_LINE
  fi  fi
Line 419: Line 388:
  if [ $FAN_RPM -le 0 ] ; then  if [ $FAN_RPM -le 0 ] ; then
  FAN_RED=1  FAN_RED=1
- echo "&red Le ventilateur $FAN_NAME ne tourne plus !!!" >> ${BBTMP}/xymon-hardware_fans.msg
+ echo "&red Le ventilateur $FAN_NAME ne tourne plus !!!" >> ${XYMONTMP}/xymon-hardware_fans.msg
  fi  fi
                         unset ERROR                         unset ERROR
Line 429: Line 398:
  NEXT_LINE=FAN_NAME  NEXT_LINE=FAN_NAME
                         fi                         fi
-                        done < ${BBTMP}/xymon-hardware_fans.tmp
+                        done < ${XYMONTMP}/xymon-hardware_fans.tmp
         fi         fi
 if [ $FAN_RED ] ; then if [ $FAN_RED ] ; then
  RED=1  RED=1
  echo "&red Probleme avec les vitesses des ventilateurs !  echo "&red Probleme avec les vitesses des ventilateurs !
-$(cat ${BBTMP}/xymon-hardware_fans.msg)" >> ${BBTMP}/xymon-hardware.msg
+$(cat ${XYMONTMP}/xymon-hardware_fans.msg)" >> $MSG_FILE
 elif [ $FAN_YELLOW ] ; then elif [ $FAN_YELLOW ] ; then
         YELLOW=1         YELLOW=1
  echo "&yellow Probleme avec les vitesses des ventilateurs !  echo "&yellow Probleme avec les vitesses des ventilateurs !
-$(cat ${BBTMP}/xymon-hardware_fans.msg)" >> ${BBTMP}/xymon-hardware.msg
+$(cat ${XYMONTMP}/xymon-hardware_fans.msg)" >> $MSG_FILE
 else else
  VOLT_GLOBAL_STATUS=green  VOLT_GLOBAL_STATUS=green
- echo "&green Tout va bien avec les ventilateurs" >> ${BBTMP}/xymon-hardware.msg
+ echo "&green Tout va bien avec les ventilateurs" >> $MSG_FILE
 fi fi
  
Line 447: Line 416:
 $OMREPORT storage pdisk controller=0 |grep ^Status | grep -q Ok $OMREPORT storage pdisk controller=0 |grep ^Status | grep -q Ok
 if [ $? -eq 0 ] ; then if [ $? -eq 0 ] ; then
- echo "&green Le statut des disques est Ok !" >> ${BBTMP}/xymon-hardware.msg
+ echo "&green Le statut des disques est Ok !" >> $MSG_FILE
 else else
  DISK_COLOR=yellow  DISK_COLOR=yellow
- $OMREPORT storage pdisk controller=0 |grep -A 1 ^Status | grep -v "\-\-" > ${BBTMP}/xymon-hardware_disks.tmp
+ $OMREPORT storage pdisk controller=0 |grep -A 1 ^Status | grep -v "\-\-" > ${XYMONTMP}/xymon-hardware_disks.tmp
  while read LINE ; do  while read LINE ; do
  echo $LINE | grep -q Status | grep -q Ok  echo $LINE | grep -q Status | grep -q Ok
  if [ $NEXT_LINE == DISK_NAME ] ; then  if [ $NEXT_LINE == DISK_NAME ] ; then
  DISK_NAME=$(echo $LINE | cut -c 29-)  DISK_NAME=$(echo $LINE | cut -c 29-)
- echo "&yellow Le disque $DISK_NAME est en mauvaise situation !" >> ${BBTMP}/xymon-hardware.msg
+ echo "&yellow Le disque $DISK_NAME est en mauvaise situation !" >> $MSG_FILE
  unset NEXT_LINE  unset NEXT_LINE
  fi  fi
Line 464: Line 433:
  NEXT_LINE=DISK_NAME  NEXT_LINE=DISK_NAME
  fi  fi
- done < ${BBTMP}/xymon-hardware_disks.tmp
+ done < ${XYMONTMP}/xymon-hardware_disks.tmp
   
 fi fi
 } }
 +function use_hpacucli ()
 +{
 +$HPACUCLI ctrl all show config | grep drive | while read OUTPUT ; do
 +        TYPE=$(echo $OUTPUT | awk '{print $1}' | sed s/drive//)
 +        SLOT=$(echo $OUTPUT | awk '{print $2}')
 +        STATUS=$(echo $OUTPUT | awk '{print $NF}' | sed s/\)//)
 +        if [ $TYPE == "logical" ] ; then
 +                RAID=$(echo $OUTPUT | awk '{print $6}')
 +                SIZE=$(echo $OUTPUT | awk '{print $3 $4}' | sed s/\(// | sed s/\,//)
 +                if [ "$STATUS" != "OK" ] ; then
 +                        RED=1
 +                        LINE="&red Logical drive $SLOT \(RAID $RAID, size : $SIZE\) status is BAD !!!"
 +                elif [ "$STATUS" == "OK" ] ; then
 +                        LINE="&green Logical drive $SLOT \(RAID $RAID, size : $SIZE\) status is OK"
 +                else
 +                        RED=1
 +                        LINE="&red Unknow status \(or stupid monitoring script\) for logical drive $SLOT \(RAID $RAID, size : $SIZE\) !!!"
 +                fi
 +        elif [ "$TYPE" == "physical" ] ; then
 +                SIZE=$(echo $OUTPUT | awk '{print $8 $9}' | sed s/\,//)
 +                if [ "$STATUS" != "OK" ] ; then
 +                        YELLOW=1
 +                        LINE="&yellow Physical drive in slot $SLOT \(size : $SIZE\) status is BAD !!!"
 +                elif [ "$STATUS" == "OK" ] ; then
 +                        LINE="&green Physical drive in slot $SLOT \(size : $SIZE\) status is OK"
 +                else
 +                        RED=1
 +                        LINE="&red Unknow status \(or stupid monitoring script\) for physical drive in slot $SLOT \(size : $SIZE\) !!!"
 +                fi
 +        fi
 +        echo $LINE >> $MSG_FILE
 +done
 +}
 +
 +$GREP -q ^HPACUCLI=1 $CONFIG_FILE
 +if [ $? -eq 0 ] ; then
 +        use_hpacucli
 +fi
 $GREP -q ^SMARTCTL=1 $CONFIG_FILE $GREP -q ^SMARTCTL=1 $CONFIG_FILE
 if [ $? -eq 0 ] ; then if [ $? -eq 0 ] ; then
Line 480: Line 487:
  use_openmanage  use_openmanage
 fi fi
- 
 $GREP -q ^SENSOR=1 $CONFIG_FILE $GREP -q ^SENSOR=1 $CONFIG_FILE
 if [ $? -eq 0 ] ; then if [ $? -eq 0 ] ; then
Line 503: Line 509:
  
 ===== To Do ===== ===== To Do =====
-v0.5
+v0.6
   * Become independent of /etc/sensors.conf: read raw values, derive the corrected ones from them, and define thresholds in the hobbit-hardware.conf file
   * Support for independent temperature thresholds for each disk
Line 532: Line 538:
   * **2013-06-27 v0.4**   * **2013-06-27 v0.4**
     * Fix hddtemp output handling (print last field instead of field N)     * Fix hddtemp output handling (print last field instead of field N)
 +  * **2013-09-27 v0.5**
 +    * Add support for HP monitoring tool (hpacucli)
 +  * **2022-07-13 v0.6**
 +    * Add support for independent temperature thresholds for each disk
 +</code>
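
The `use_hpacucli` function added in this revision sets `RED` and `YELLOW` inside a `while` loop that is fed by a pipe. In bash such a loop normally runs in a subshell, so those flags may not reach the rest of the script. The following is a minimal sketch of one way to keep them, not the author's implementation: `classify_drive_line` and `summarize_drives` are hypothetical names, the output messages are simplified, and the drive-line layout shown in the comments is an assumption about what hpacucli prints.

```shell
#!/bin/sh
# Assumed input lines (not taken from the source), one drive per line:
#   logicaldrive 1 (136.7 GB, RAID 1, OK)
#   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)

# Print "<colour> <type> <slot>" for one drive line.
# Same severity mapping as the script: a bad logical drive is red,
# a bad physical drive is yellow.
classify_drive_line() {
    line="$1"
    type=$(echo "$line" | awk '{print $1}' | sed 's/drive$//')
    slot=$(echo "$line" | awk '{print $2}')
    status=$(echo "$line" | awk '{print $NF}' | sed 's/)$//')
    if [ "$status" = "OK" ]; then
        echo "green $type $slot"
    elif [ "$type" = "logical" ]; then
        echo "red $type $slot"
    else
        echo "yellow $type $slot"
    fi
}

# Read drive lines on stdin and print one status line per drive.
# When stdin comes from a file or here-document (not a pipe), the loop runs
# in the current shell, so RED and YELLOW are still set afterwards.
summarize_drives() {
    RED=0
    YELLOW=0
    while read -r drive_line; do
        result=$(classify_drive_line "$drive_line")
        case "$result" in
            red*) RED=1 ;;
            yellow*) YELLOW=1 ;;
        esac
        echo "&$result"
    done
}
```

With the script's own variables, this could be used as `$HPACUCLI ctrl all show config | grep drive > "$TMP_FILE"` followed by `summarize_drives < "$TMP_FILE" >> "$MSG_FILE"`. Redirecting from a file instead of piping is what keeps the flags visible to the final status calculation.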
 +
  • monitors/hardware_sensors.1380276349.txt.gz
  • Last modified: 2013/09/27 10:05
  • by doctor_madness