====== xymon-SMART.sh ======
^ Author | [[ jlaidman+xymon-smart@rebel-it.com.au | Jeremy Laidman ]] |
^ Compatibility | Xymon 4.3.3 |
^ Requirements | smartmontools, GNU ls, GNU date |
^ Download | None |
^ Last Update | 2012-08-30 |

===== Description =====
This script queries the SMART parameters of the drives on a system, reports the status of each drive, and reports various metrics available from the SMART data.

The script gets its configuration from the environment or from a configuration file.
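
A minimal sketch of that behaviour in POSIX sh, for illustration only. The function name ''load_config'', the rule that values in the file override the environment, the default ''COUNT=0'' and the requirement that ''DEVICE'' be set are all assumptions, not taken from the script. The variable names themselves come from the crontab example in the Installation section.

```shell
#!/bin/sh
# Hypothetical sketch: settings may come from the environment, from an
# optional config file, or both. Values set in the file win.
load_config() {
    cfg=$1
    if [ -n "$cfg" ] && [ -f "$cfg" ]; then
        # "." searches $PATH for names without a slash, so force a path
        case $cfg in
            */*) ;;
            *) cfg=./$cfg ;;
        esac
        # POSIX "." reads the file into the current shell;
        # "source" is not provided by every /bin/sh
        . "$cfg"
    fi
    # Assumed default: query disk 0 unless told otherwise
    : "${COUNT:=0}"
    # Assumed rule: DEVICE has no sensible default, so insist on it
    if [ -z "$DEVICE" ]; then
        echo "DEVICE is not set (environment or ${cfg:-no config file})" >&2
        return 1
    fi
    return 0
}
```

A bare file name is prefixed with ''./'' because ''.'' would otherwise look it up in ''$PATH'' rather than the current directory.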

The script runs in write mode (with a "

The script also runs in read mode (with a "

In read mode, the script builds a Xymon status report that warns if any of the following problems is detected:
  * SMART is not enabled on the drive
  * SMART self-test is not "
  * SMART health status is not "
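
The three checks above can be sketched as a small filter over smartctl text. This is a hedged illustration, not the script's own parser. The function name ''smart_color'' is hypothetical. The ''SMART support is:'' and ''SMART Health Status:'' patterns are assumptions based on smartctl's output for SCSI-type devices such as cciss; ATA disks report health differently, and wording can vary between smartctl versions. The self-test pattern is the one the script's ''get_smart_status'' matches. Treating every failure as red is also an assumption.

```shell
#!/bin/sh
# Hypothetical sketch: decide a Xymon color for one disk from smartctl
# text read on stdin. Any failed check turns the status red.
smart_color() {
    report=$(cat)
    color=green
    # 1) SMART must be enabled on the drive
    printf '%s\n' "$report" | grep -q 'SMART support is:[[:space:]]*Enabled' || color=red
    # 2) Overall health must be reported as OK (SCSI-style wording)
    printf '%s\n' "$report" | grep -q 'SMART Health Status:[[:space:]]*OK' || color=red
    # 3) The self-test must have passed (pattern taken from the script)
    printf '%s\n' "$report" | grep -q 'Self Test returned without error' || color=red
    echo "$color"
}
```

For example, piping the full smartctl report for one disk into ''smart_color'' prints a single word, ''green'' or ''red''.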

The script also sends a data report for Xymon to turn into RRD files for graphing.
  * corrected read errors
  * corrected write errors
  * uncorrected read errors
  * uncorrected write errors
  * non-medium errors
  * disk temperature

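As an illustration of the data report, the sketch below prints one block in Xymon's "trends" format: an ''[file.rrd]'' header followed by ''DS:name:TYPE:heartbeat:min:max value'' lines. It uses the DS names listed under Installation and a file name that matches the ''^smart.(.*).rrd'' FNPATTERN. The function name, the DS types (DERIVE for the cumulative error counters, GAUGE for temperature) and the 600-second heartbeat are assumptions. The script's own output lines are not shown here.

```shell
#!/bin/sh
# Hypothetical sketch: print one Xymon "trends" block for a single disk.
# Usage: smart_trends DEVTAG err_r_c err_w_c err_r_u err_w_u err_nmec temp
smart_trends() {
    dev=$1
    shift
    if [ $# -ne 6 ]; then
        echo "usage: smart_trends DEVTAG <5 error counts> <temperature>" >&2
        return 1
    fi
    # File name chosen to match FNPATTERN ^smart.(.*).rrd in graphs.cfg
    printf '[smart.%s.rrd]\n' "$dev"
    # Cumulative error totals: DERIVE stores them as a per-second rate
    for ds in err_r_c err_w_c err_r_u err_w_u err_nmec; do
        printf 'DS:%s:DERIVE:600:0:U %s\n' "$ds" "$1"
        shift
    done
    # Temperature is a point-in-time reading, so store it as a GAUGE
    printf 'DS:temp:GAUGE:600:0:U %s\n' "$1"
}
```

The script sends its data with ''$XYMON $BBDISP "data $MACHINE.trends${NL}$RRDDATA"''. Output like this would stand in for ''$RRDDATA''. The device tag becomes part of a file name, so it should not contain a slash.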
===== Installation =====
=== Client side ===
1) Copy the script into a suitable location, such as ''/

2) Create a crontab entry (e.g. /

<
*/5 * * * * root ( umask 002; XYMONCLIENTHOME=/
CONTROLLER=cciss COUNT=0 DEVICE=cciss/
/
</

Adjust for your requirements.
find a suitable DEVICE value.

  smartctl -d $CONTROLLER,

For multiple devices, specify a comma-separated list of numbers
in the COUNT variable, such as:
  ... COUNT=0,1 ...
Note: smartctl itself does not accept this list syntax; the script runs smartctl once for each number in the list.
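
A minimal sketch of that per-disk loop, which only prints the commands it would run. The function name is hypothetical. The ''-a'' option is an assumption because the script's own smartctl arguments are not shown here. The ''-d TYPE,N'' form takes one disk number at a time.

```shell
#!/bin/sh
# Hypothetical sketch: expand a comma-separated COUNT list into one
# smartctl command per disk behind the controller (printed, not run).
smartctl_cmds() {
    controller=$1 count=$2 device=$3
    old_ifs=$IFS
    IFS=,
    set -f              # no filename globbing while splitting
    set -- $count       # unquoted on purpose: split on commas
    set +f
    IFS=$old_ifs
    for n in "$@"; do
        # smartctl takes a single disk number per "-d TYPE,N" option
        printf 'smartctl -d %s,%s -a /dev/%s\n' "$controller" "$n" "$device"
    done
}
```

For example, ''smartctl_cmds cciss 0,1 cciss/c0d0'' prints two commands, one for disk 0 and one for disk 1.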

3) Create a Xymon client tasks entry like this:

  [smart]
  
  CMD /
  
  

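The tasks entry runs the script in read mode on a schedule, so read mode must cope with a status file that cron has stopped updating. In the source, read mode refuses a file more than 600 seconds old (the "Stale SMART file" check) and relies on GNU tools to get the file's age. A comparable check, sketched with GNU ''date -r'', which prints a file's modification time. The function name and return codes are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of a read-mode freshness check. Needs GNU date,
# whose -r FILE option reports FILE's last modification time.
# Returns 0 if fresh, 1 if stale or future-dated, 2 if unreadable.
check_fresh() {
    file=$1 maxage=${2:-600}
    [ -f "$file" ] || { echo "File not found: $file" >&2; return 2; }
    mtime=$(date -r "$file" +%s) || return 2
    now=$(date +%s)
    age=$((now - mtime))
    if [ "$age" -lt 0 ]; then
        echo "$file has a timestamp in the future" >&2
        return 1
    fi
    if [ "$age" -gt "$maxage" ]; then
        echo "Stale SMART file is $age seconds old" >&2
        return 1
    fi
    return 0
}
```

Reading the modification time this way avoids parsing ''ls'' output. The script instead takes the time from GNU ''ls --time-style''.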
=== Server side ===
4) Create entries in graphs.cfg like so:

  [smart]
  # total read/write errors
  TITLE S.M.A.R.T. Total Media Errors
  YAXIS errors per second
  FNPATTERN ^smart.(.*).rrd
  DEF:
  DEF:
  DEF:
  DEF:
  CDEF:
  CDEF:
  COMMENT:
  LINE1:
  GPRINT:
  GPRINT:
  GPRINT:
  GPRINT:
  LINE1:
  GPRINT:
  GPRINT:
  GPRINT:
  GPRINT:
  
  [smart_temp]
  TITLE S.M.A.R.T. Disk Temperature
  YAXIS Celsius
  FNPATTERN ^smart.(.*).rrd
  DEF:
  LINE1:
  GPRINT:
  GPRINT:
  GPRINT:
  GPRINT:
  
  [smart_nonmedium]
  TITLE S.M.A.R.T. Non-Medium Errors
  YAXIS errors per second
  FNPATTERN ^smart.(.*).rrd
  DEF:
  LINE1:
  GPRINT:
  GPRINT:
  GPRINT:
  GPRINT:

Add further graph definitions as desired. The RRD files produce the following DS names:
  * err_r_c = corrected read errors
  * err_w_c = corrected write errors
  * err_r_u = uncorrected read errors
  * err_w_u = uncorrected write errors
  * err_nmec = non-medium errors
  * temp = disk temperature
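
Before writing extra graphs.cfg entries, one way to confirm which DS names an RRD file actually contains is to inspect ''rrdtool info'' output on the Xymon server. The helper below is a hypothetical sketch. It reads that output on stdin and prints the name from each ''ds[NAME].type'' line.

```shell
#!/bin/sh
# Hypothetical sketch: list the DS names defined in an RRD file, to check
# them against the DEF: lines in graphs.cfg. Reads "rrdtool info" output
# on stdin.
rrd_ds_names() {
    # Every data source has exactly one line such as:
    #   ds[temp].type = "GAUGE"
    sed -n 's/^ds\[\([^]]*\)]\.type = .*/\1/p'
}
```

For example: ''rrdtool info smart.sda.rrd | rrd_ds_names'' (file name hypothetical; Xymon keeps each host's RRD files in a per-host directory).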

5) Add "

6) Add "

===== Source =====
==== xymon-SMART.sh ====

<hidden onHidden="
<
#!/bin/sh

# SMART disk monitor
# Jeremy Laidman, 2012
#
# Version 0.5 - August 2012
# - initial public release
#
# Initially based on Michael Adelmann'
# (see: http://
# improvements are to support multiple disks, and to
# send error counts for graphing.
#
# This script queries the SMART parameters of the drives
# on a system, and returns the status of those drives
# as well as reporting various metrics available from
# the SMART data.
#
# How it Works
# ------------
#
# The script gets its configuration from the environment
# or from a configuration file.
#
# The script runs in write mode (with a "
# create the status file from the output of the smartctl
# command.
#
# The script also runs in read mode (with a "
# to read in the status file and parse it for sending data
# and status reports to Xymon.
# every 5 minutes from a xymonlaunch configuration file
# (tasks.cfg on a Xymon server, or xymonlaunch.cfg on
# a Xymon client).
#
# In read mode, the script builds a Xymon status report
# that warns if any of the following problems is
# detected:
# - SMART is not enabled on the drive
# - SMART self-test is not "
# - SMART health status is not "
#
# The script also sends a data report for Xymon to turn
# into RRD files for graphing.
# are:
# - corrected read errors
# - corrected write errors
# - uncorrected read errors
# - uncorrected write errors
# - non-medium errors
# - disk temperature
#
#
# To Install
# ----------
#
# Client-side:
#
# 1) Copy the script into a suitable location,
# such as /
#
# 2) Create a crontab entry (e.g. /
#
# */5 * * * * root ( umask 002; XYMONCLIENTHOME=/
#
# /
#
# Adjust for your requirements.
# find a suitable DEVICE value.
#
# smartctl -d $CONTROLLER,
#
# For multiple devices, specify a comma-separated list of numbers
# in the COUNT variable, such as:
# ... COUNT=0,1 ...
# This usage of COUNT is not supported by smartctl.
#
# 3) Create a Xymon client tasks entry like this:
#
# [smart]
#
# CMD /
#
#
#
# Server-side:
#
# 4) Create entries in graphs.cfg like so:
#
# [smart]
# # total read/write errors
# TITLE S.M.A.R.T. Total Media Errors
# YAXIS errors per second
# FNPATTERN ^smart.(.*).rrd
# DEF:
# DEF:
# DEF:
# DEF:
# CDEF:
# CDEF:
# COMMENT:
# LINE1:
# GPRINT:
# GPRINT:
# GPRINT:
# GPRINT:
# LINE1:
# GPRINT:
# GPRINT:
# GPRINT:
# GPRINT:
#
# [smart_temp]
# TITLE S.M.A.R.T. Disk Temperature
# YAXIS Celsius
# FNPATTERN ^smart.(.*).rrd
# DEF:
# LINE1:
# GPRINT:
# GPRINT:
# GPRINT:
# GPRINT:
#
# [smart_nonmedium]
# TITLE S.M.A.R.T. Non-Medium Errors
# YAXIS errors per second
# FNPATTERN ^smart.(.*).rrd
# DEF:
# LINE1:
# GPRINT:
# GPRINT:
# GPRINT:
# GPRINT:
#
# Add further graph definitions as desired.
# The RRD files produce the following DS names:
# - err_r_c
# - err_w_c
# - err_r_u
# - err_w_u
# - err_nmec = non-medium errors
# - temp = disk temperature
#
# 5) Add "
# xymonserver.cfg,
# smart status page and the trends page.
#
# 6) Add "
# entries in hosts.cfg, or the "
#
#
# Troubleshooting
# ---------------
#
# * Check the cron output in /
# for errors that indicate where the problem might be.
#
# * Check that the file /
# If not, ensure that the script is being run by cron.
#
# * Ensure that the crontab entry is being run. On some
#
# not tell crond that there has been a change to its
#
# touch the directory containing the crontabs, such as
#
# sudo touch /
#
# * If the status file appears correct, manually run the
#
#
# xymoncmd /
#
# * Check the Xymon log files, particularly xymonclient.log,
#
#
#
# A note about compatibility
# --------------------------
#
# This script makes use of features of GNU "
# GNU "
# This probably won't work on systems that don't have
# GNU "
# is unlikely on systems where smartctl is functioning.

die() { echo "

VERSION=0.5

NL="
"


if [ "
BB="
[ "
[ "
[ "
fi

[ "
[ "

COLOR="
COLUMN="
CONFIG="
MSG="
RAID=""
RAIDADDR=""
SMARTCTL="/
SUDO="/

setup_config() {
# read config file
if [ -f "$CONFIG" ]; then
. "$CONFIG"   # POSIX "."; "source" is not provided by every /bin/sh
else
[ "
die "
fi

if [ -n "
RAIDADDR="
RAID="
[ 0$DEBUG -gt 1 ] && echo "
fi

[ -b "/

RESULT="
}

get_smart_status() {
# parse the smartctl output and set some flags
echo "
case $LINE in
"
COUNTER=`expr 0$COUNTER + 1`
set - $LINE""
DEVADDR=$3
echo "
echo "
;;
"Self Test returned without error"
echo "
;;
"SMART Health Status:"
set - $LINE""
echo "
;;
"
set - $LINE""
echo "
echo "
;;
esac
done
}

get_rrd_data() {
# parse the smartctl output and show some numbers
echo "
case $LINE in
"
set - $LINE""
[ "
echo "
FIRST=1
[ 0$DEBUG -gt 0 ] && echo "Found device $3" >&2
;;
read:*)
set - $LINE""
echo "
echo "
;;
write:*)
set - $LINE""
echo "
echo "
;;
"
set - $LINE""
echo "
;;
"
set - $LINE""
echo "
;;
esac
done
}

show_version() {
echo "
}

show_usage() {
echo "
show_version;
echo "
echo "
echo "
echo "
echo "
echo "If no switches are given, Xymon must have sudo rights to run the script with no password."
}

# Handle CLI modifiers
while [ "
case "
""
-d|--debug)
test 0$2 -gt 0 2>/
echo "
;;
-q|--quiet)
;;
-r|--read)
[ 0$DEBUG -gt 0 ] && echo "
[ "
READFILE="
shift
if [ -f "
[ -r "
else
[ 0$QUIET -gt 0 ] && exit
die "File not found: $READFILE"
fi
;;
-w|--write)
[ 0$DEBUG -gt 0 ] && echo "
[ "
WRITEFILE="
shift
> "$WRITEFILE"
for C in `IFS=,; set - ""
COUNT=$C setup_config
if [ "
[ "
$SMARTCTL /
else
# assume that $SMARTCTL or ">"
# so we just bail silently with RC=1
{
if [ "
[ -s "$WRITEFILE" ] && echo ""
echo "
fi
$SMARTCTL /
} >> "$WRITEFILE" || exit 1
fi
[ 0$DEBUG -eq 0 -o "
done
exit
;;
-n|--dryrun)
;;
-V|--version)
show_version
exit
;;
-h|--help)
show_usage
exit
;;
*) die "
esac
shift
done

if [ 0$READ -gt 0 ]; then
[ 0$DEBUG -gt 0 ] && echo "
# bail if the file is more than 10 minutes (600 seconds) old
if [ "
FILETIME=`ls -lL --time-style "
else
FILETIME=`ls -lL --time-style "
fi
TIMENOW=`date "
TIMEDIFF=`expr $TIMENOW - $FILETIME`
[ "${TIMEDIFF:-0}" -lt 0 ] && die "
[ 0$TIMEDIFF -gt 600 ] && die "Stale SMART file is $TIMEDIFF seconds old"
if [ "
TMP=`cat`
else
TMP=`cat "$READFILE"`
fi
else
TMP=""
for C in `IFS=,; set - ""
COUNT=$C setup_config
[ "
TMP="
done
fi

SMARTSTATUS=`get_smart_status "
[ 0$DEBUG -gt 1 ] && echo "
eval $SMARTSTATUS

RRDDATA=`get_rrd_data "

[ "

[ "

MSG="
for DEVINDEX in $DEVICES; do
COLOR="

eval DEVNAME=\$DEVADDR_$DEVINDEX
[ 0$DEBUG -gt 0 ] && echo "

eval SMART_ENABLED=\$SMART_ENABLED_$DEVINDEX
if [ "
RESULT="
else
COLOR="
RESULT="
fi

eval SMART_HEALTH=\$SMART_HEALTH_$DEVINDEX
if [ "
RESULT="
else
COLOR="
RESULT="
fi

SELF=`echo "
eval SMART_SELFTEST=\$SMART_SELFTEST_$DEVINDEX
if [ "
RESULT="
else
COLOR="
RESULT="
fi
done

MSG=`echo -e "

if [ 0$DEBUG -gt 0 ]; then
echo "
echo
echo $XYMON $BBDISP "
echo
echo $XYMON $BBDISP "data $MACHINE.trends${NL}$RRDDATA"
fi
if [ 0$DRYRUN -eq 0 ]; then
$XYMON $BBDISP "
$XYMON $BBDISP "data $MACHINE.trends${NL}$RRDDATA"
fi
</
</
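
In read mode, the source keeps one set of values per disk by building variable names such as ''SMART_ENABLED_$DEVINDEX'' and reading them back with ''eval''. This is needed because POSIX sh has no arrays. A minimal sketch of that idiom follows. The helper names ''set_indexed'' and ''get_indexed'' are hypothetical. As a precaution, the index is restricted to digits, because ''eval'' executes its argument as code.

```shell
#!/bin/sh
# Hypothetical sketch of the "NAME_<index>" idiom used in read mode:
# POSIX sh has no arrays, so each disk's values are kept in separately
# named variables and accessed through eval.

# set_indexed NAME INDEX VALUE  ->  sets NAME_INDEX to VALUE
set_indexed() {
    # eval runs its argument as code, so only accept numeric indexes
    case $2 in ''|*[!0-9]*) return 1 ;; esac
    eval "$1_$2=\$3"
}

# get_indexed NAME INDEX  ->  prints the value of NAME_INDEX
get_indexed() {
    case $2 in ''|*[!0-9]*) return 1 ;; esac
    eval "printf '%s\n' \"\${$1_$2}\""
}
```

For example, ''for DEVINDEX in $DEVICES; do get_indexed SMART_HEALTH "$DEVINDEX"; done'' prints each disk's stored health value in turn.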

===== Known Bugs and Issues =====

===== To Do =====

===== Credits =====

===== Changelog =====

  * **2012-08-30**
    * Initial release