====== MySQL-checker ====== ^ Author | [[ cleaver-xymon@terabithia.org | Japheth Cleaver ]] | ^ Compatibility | Xymon 4.2 | ^ Requirements | Perl, MySQL, thread-safe libmysql/DBD::mysql libraries | ^ Download | None | ^ Last Update | 2015-10-16 | ===== Description ===== A high-performance, multi-threaded script for performing a TCP connection check (w/ login) to a very large number of MySQL servers at once. On high-end multi-core systems with sufficient RAM, 1000's of tests can be performed in <15 seconds. Because MySQL in a default configuration will eventually block an IP address that does a "dumb" TCP port check without authenticating, a single username and password (or cert) is used and must be valid on all tested systems. There are a number of tuning parameters that can be used on high-volume installations to affect how many threads to use and how often to send messages to your xymon server. ===== Installation ===== Copy the scripts to an appropriate directory on your xymon server, modifying the credentials and parameters in the perl script, and any needed logic in the wrapper script. Add "mysql" items to your hosts.cfg files as needed: # 192.0.2.11 db10011.example.net # mysql # 192.0.2.12 db10012.example.net # mysql:3306 # 192.0.2.13 db10013.example.net # mysql:5678 If only a subset of your servers are being tested (or have the credentials installed), I suggest making a separate hosts.d/* file for just those servers and adjusting the wrapper script to feed that file into the perl script instead of using `xymongrep`. Add a tasks.cfg section: [mysql-checker] ENVFILE /etc/xymon/xymonserver.cfg CMD /etc/xymon/ext/mysql-checker.sh LOGFILE $XYMONSERVERLOGS/mysql-checker.log INTERVAL 2m MAXTIME 5m === Client side === No client changes are required, but the script requires a single valid user/password (or user/cert) to be present on all servers. This account only needs USAGE and no SQL access. GRANT USAGE ON *.* TO 'youruseraccount'@'10.%' IDENTIFIED BY PASSWORD '*longpasswordhash'; SSL certs can be used instead, but this will increase test overhead. === Server side === Because the script uses perl threads, your DBD::MySQL module MUST be compiled to use the re-entrant, thread-safe libmysql. On RHEL6, this means compiling DBD::mysql with --libs="`mysql_config --libs_r`". We've also found better luck using the official MySQL-server RPMs from Oracle rather than the vendor-distributed versions installed on the monitor server itself. Once you've confirmed that you have re-entrant libraries working, it should be safe to remove the "die" command near the bottom of the script. If your library isn't thread-safe, you get occasional segfaults, 'locale' errors and random red mysql statuses. (Because of the nature of threading on perl, you will occasionally get a segfault once in a while anyway, but if you're getting a segfault in every run, it's probably a library problem.) === Optional === To add some stats graphing support for the mysql test (except the connection time) add to xymonserver.cfg: TEST2RRD+=",mysql=ncv" GRAPHS+=",mysql" SPLITNCV_mysql="Uptime:NONE,Threads:GAUGE,Questions:DERIVE,Slow_queries:DERIVE,Opens:DERIVE,Flush_tables:NONE,Open_tables:GAUGE,Queries_per_second_avg:GAUGE,Seconds:NONE" In graphs.cfg/d: [mysql] TITLE MySQL Questions YAXIS Count DEF:queries=mysql,Questions.rrd:lambda:AVERAGE LINE2:queries#0000FF:Questions per second COMMENT:\n GPRINT:queries:LAST:Questions \: %5.1lf (cur) GPRINT:queries:MAX: \: %5.1lf (max) GPRINT:queries:MIN: \: %5.1lf (min) GPRINT:queries:AVERAGE: \: %5.1lf (avg)\n ===== Source ===== ==== mysql-checker.sh ==== #!/bin/sh # Feed a list of mysql servers to the dedicated mysql checker script. # # We don't do this with a simple port up/down check in xymonnet because we actually # need to successfully authenticate to prevent mysqld from (by default) penalizing our IP. # If you are using a non-default mysql server list, dispense with xymongrep and # just feed it in directly to the script. #export DEBUG=1 #export HOSTSFILE='/etc/xymon/hosts.d/just-my-mysql-servers' #export MAXMSGSPERCOMBO=150 if [ -n "$HOSTSFILE" ] ; then /etc/xymon/ext/mysql-checker.pl <"$HOSTSFILE" else TMPFILE="$XYMONTMP/mysql-checker.tmp.$$" xymongrep mysql\* > "$TMPFILE" if [ -s "$TMPFILE" ] ; then /etc/xymon/ext/mysql-checker.pl <"$TMPFILE" fi rm -f "$TMPFILE" fi # Kill anything that stuck around (after the *previous* run) for some reason # Threaded perl is weird and occasionally pipes don't close. pgrep -P 1 -U xymon mysql-checker | xargs -r kill -SIGKILL exit 0 ==== mysql-checker.pl ==== #!/usr/bin/perl # mysql-checker.pl # # Given a list, check for mysql service on a given port as efficently as possible. # # WARNING: This is a multi-threaded script. If your DBD::mysql was built or linked # against something that wasn't thread safe (or your libmysql subsequently uses stuff # like a openssl that wasn't thread safe) you will have probs. This includes RHEL 6. # # perl-DBD-MySQL needs to be compiled with --libs=`mysql_config --libs_r` and your # mysql-config needs to match. # # We've had better luck with the official Oracle MySQL-shared-5.5 RPMs instead of # the RHEL 6 mysql-libs-5.1 RPM. # # Includes some code from the 'threads' module's pool_reuse.pl script. # # Copyright Japheth Cleaver # Licensed under GPLv2 # # # # Master debug flag... use constant DEBUG => $ENV{'DEBUG'} || 0; # Autoflush DEBUG and $|=1; use threads 1.39; # threading for fun and profit use threads::shared; # ... the right way use Thread::Queue; # ... don't set up and tear down, use queues! use strict; # meh use warnings; # double meh use DBD::mysql; # connect to mysql use Time::HiRes qw(time usleep); # use hi-res timing info use POSIX qw(strftime); # auto-print 'date' -- we could just call `date` once at the # beginning and be done with it, but this is slightly more # accurate (each test comes back with the timestamp it was # actually tested at) and I'm kind of particular like that. # MySQL connect info -- modify as appropriate # host and IP address are added later on... use constant { USER => 'youruseraccount', PASS => 'yourpasswordhere', CERT => '/some/path/here.crt', PRIVKEY => '/some/path/here.key', }; use constant DSN => 'DBI:mysql:mysql_connect_timeout=5;'; # use constant DSN => 'DBI:mysql:mysql_ssl=1;mysql_ssl_client_key=' . PRIVKEY . ';mysql_ssl_client_cert=' . CERT . ';mysql_connect_timeout=5;'; # Hard code these for lookup speed use constant { XYMON => $ENV{'XYMON'}, XYMSRV => $ENV{'XYMSRV'}, OKSTRING => 'MySQL connection ok', COMBO_SIZE => $ENV{'MAXMSGSPERCOMBO'} || 200, COMBO_HEADER => "combo\n", CHECK_INT => (defined $ENV{'mysql_CHECK_INT'} and $ENV{'mysql_CHECK_INT'}) || 1.1, MAX_THREADS => (defined $ENV{'mysql_MAX_THREADS'} and $ENV{'mysql_MAX_THREADS'}) || 84, MAX_LOOPS => (defined $ENV{'mysql_MAX_LOOPS'} and $ENV{'mysql_MAX_LOOPS'}) || 55, STACK_SIZE => (defined $ENV{'mysql_STACK_SIZE'} and $ENV{'mysql_STACK_SIZE'}) || 65536, }; our %downHosts; %downHosts = (); if ($ENV{'EXCLUSIONLIST'} && -f $ENV{'EXCLUSIONLIST'} && -s _ ) { # A list of hosts to exclude from alerting -- typically due to being conn down, but could be for any reason if (open (CONNDOWN, $ENV{'EXCLUSIONLIST'})) { chomp and $downHosts{$_}=1 while (); close CONNDOWN; DEBUG and print " - found ", scalar(keys %downHosts), " hosts to be mooted (ping down)\n"; }; }; ####################################################################### ### Signal Handling ### ####################################################################### # Flag to inform all threads that application is terminating my $TERM :shared = 0; # Threads add their ID to this queue when they are ready for work # Also, when app terminates a -1 is added to this queue my $IDLE_QUEUE :shared = Thread::Queue->new(); # Gracefully terminate application on ^C or command line 'kill' $SIG{'INT'} = $SIG{'TERM'} = sub { warn ">>> Terminating early, per signal <<<\n"; $TERM = 1; # Add -1 to head of idle queue to signal termination $IDLE_QUEUE->insert(0, -1); }; # We open and write in to xymon with perl's open function, meaning we need # to deal with PIPE signals properly. Same with our children. $SIG{PIPE} = 'IGNORE'; $SIG{CHLD} = 'IGNORE'; ####################################################################### ### Begin -- Set up our threads and queues ### ####################################################################### threads->set_stack_size(STACK_SIZE); our $drh = DBI->install_driver('mysql'); our $checkerStart = time; our %workerQueues :shared; our %responses :shared; for (1 .. MAX_THREADS) { # create the queue my $workQueue = Thread::Queue->new(); # create the thread, pointing it to the queue my $thr = threads->create('threadedChecker', $workQueue); DEBUG > 1 and print " - made thread ", $thr->tid(), "\n"; # save the thread's info $workerQueues{$thr->tid()} = $workQueue; }; ####################################################################### ### Main event loop ### ####################################################################### my %seen; my $comboMessage = my $code = ''; my $hosts = my $count = 0; my $nextHosts = COMBO_SIZE; my $nextCheck = time; DEBUG and print "- Work starting at $nextCheck\n"; $nextCheck += CHECK_INT; while () { # We're looking for things like: # 192.0.2.11 db10011.example.net # mysql # 192.0.2.12 db10012.example.net # mysql:3306 # 192.0.2.13 db10013.example.net # mysql:5678 DEBUG > 2 and print " -> saw: " . $_; # Isolate the interesting fields next unless m/^([\d\.]+)\s+(\S+)\s+#.*(mysql:?(\d+)?)/; # my $ip = $1; my $hostname = $2; my $port = $4 || 3306; # "ip \t hostname \t port \t isdownflag" is used as a key value and passed to the thread $code = join("\t", $1, $2, ($4 || 3306), (defined($downHosts{$2}) ? 1 : 0) ); next if $seen{$code}++; # Retrieve the first idle thread my $tid = $IDLE_QUEUE->dequeue(); last if ($tid < 0); # none left, exit # Enqueue this key DEBUG > 1 and print "Submitting: " . $code . " to " . $tid . "...\n"; $workerQueues{$tid}->enqueue($code); $hosts++; # Once every set of COMBO_SIZE hosts and CHECK_INT seconds while # processing STDIN, pick up results. This helps spread out message sending # load to xymon instead of just trying to pound it when we're done. next unless ($hosts > $nextHosts && time > $nextCheck); last if $TERM; # if told to quit DEBUG and print "Checking for thread results...\n"; while ((my @codes = keys %responses) >= COMBO_SIZE) { DEBUG and print "- see " . @codes . " total results, " . scalar threads->list() . " threads active at " . (time - $checkerStart) . "s\n"; $_ and $comboMessage .= $responses{$_} and delete $responses{$_} foreach (@codes[0 .. COMBO_SIZE]); # Anything to send? last unless $comboMessage; # Off to report! if (DEBUG <= 2) { DEBUG and print " -- sending combo message --\n"; # Fork, so that we can send to xymon asynchronously? if (! fork() ) { # in the child... DEBUG and print "Forked child...\n"; close(STDIN) or warn "Can't close STDIN in child ($$): $!\n"; # Send to xymon over its STDIN open (XYMFH, '|-', XYMON, '--timeout=30', XYMSRV, '@') or die "Unable to open xymon in child ($$): $!"; print XYMFH COMBO_HEADER, $comboMessage; close (XYMFH); # or warn "Unable to close xymon in child ($$); message possibly dropped? ($!)"; # Clean up and go away close(STDOUT); close(STDERR); exit 0; }; } else { print "would run XYMON\n==Output==\n$comboMessage\n=== end ===\n"; }; $comboMessage = ''; }; # end WHILE >= COMBO_SIZE # Don't check again until these thresholds are crossed $nextCheck = time + CHECK_INT; $nextHosts = $hosts + COMBO_SIZE; }; # end while STDIN # We've sent all the work off... Begin "final" response collection.. # # Signal all threads that there is no more work $workerQueues{$_}->enqueue('') foreach keys(%workerQueues); # Loop while we wait for threads to complete their processing... if no # responses are present (no @codes), sleep # DEBUG and print "Checking for final results...\n"; while ((my @codes = keys %responses) || (sleep 1 && threads->list()) ) { DEBUG and print "- see " . @codes . " total results, " . scalar threads->list() . " threads active (wait loop $count)\n"; $_ and $comboMessage .= $responses{$_} and delete $responses{$_} foreach (@codes[0 ... (@codes < COMBO_SIZE ? ($#codes + 1) : COMBO_SIZE)]); # Anything to send? (note the continue block below) next unless $comboMessage; # Off to report! if (DEBUG <= 2) { DEBUG and print " -- sending combo message --\n"; # Fork, so that we can send to xymon asynchronously? if (! fork() ) { # in the child... DEBUG and print "Forked child...\n"; close(STDIN) or warn "Can't close STDIN in child ($$): $!\n"; # Send to xymon over its STDIN open (XYMFH, '|-', XYMON, '--timeout=30', XYMSRV, '@') or die "Unable to open xymon in child ($$): $!"; print XYMFH COMBO_HEADER, $comboMessage; close (XYMFH); # or warn "Unable to close xymon in child ($$); message possibly dropped? ($!)"; # Clean up and go away close(STDOUT); close(STDERR); exit 0; }; } else { print "would run XYMON\n==Output==\n$comboMessage\n=== end ===\n"; }; $comboMessage = ''; } continue { # Collect finished threads $_->join() foreach threads->list(threads::joinable); last if $count++ > MAX_LOOPS; }; # One way or another, we're done DEBUG and threads->list(threads::all) and print "- Killing " . threads->list(threads::all) . " thread(s)\n"; $_->kill('KILL')->detach() foreach threads->list(threads::all); print '- Run complete; did ', $hosts, ' hosts in ', (time - $checkerStart), " seconds\n"; exit 0; ########################################## # The Actual MySQL Checker ########################################## sub threadedChecker { # This thread's queue and ID my ($workQueue) = @_; my $tid = threads->tid(); # Allocate memory once for each thread... no need to increase the chance of things going awry my ($ip, $hostname, $port, $down); my ($color, $status, $checks, $data, $stats, $info); my $starttime = my $elapsed = my $date = 0; my $dbh; # Work loop WORK: { do { # Indicate that were are ready to do work #DEBUG > 2 and print "Idle -> $tid\n"; $IDLE_QUEUE->enqueue($tid); # Set variables - by default things are good (we hope) $color = 'green'; $status = OKSTRING; $checks = '&green Connected to a MySQL '; $stats = ''; # pop the stack # we're getting a "ip \t hostname \t port \t down" from the main queue ($ip, $hostname, $port, $down) = split (/\t/, my $work = $workQueue->dequeue(), 4); DEBUG > 1 and print(" -> thread $tid checking $work\n"); # Did we get anything weird or invalid? close out the thread last WORK unless $port; $starttime = time; # If we're not actually doing any work at all (in DEBUG mode), simulate # by adding a small delay here (DEBUG > 2 and usleep(150000)) or do { # Check for a connection error... unless ($dbh = DBI->connect(DSN . "host=$ip;port=$port", USER, PASS, { RaiseError => 0, PrintError => DEBUG ? 1 : 0, mysql_no_autocommit_cmd => 1, mysql_auto_reconnect => 0 }) ) { $elapsed = time - $starttime; $color = 'red'; $checks = '&red ' . $DBI::errstr . "\n"; $status = 'CRITICAL: Unable to connect'; # Comment this line out once you're sure you've got thread-safe libraries die "Got latin1 multi-threading error... $tid is bailing out" if index ($DBI::errstr, "Can't initialize character set latin1", 0) != -1; } else { { $elapsed = time - $starttime; $info = $dbh->{'mysql_serverinfo'}; do { $color='yellow'; $checks = '&yellow ' . $dbh->errstr . "\n"; $status = 'WARNING: Unexpected response getting info'; last; } if defined $dbh->err; $checks .= $info . " server\n"; $stats = $dbh->{'mysql_stat'}; do { $color='yellow'; $checks .= '&yellow ' . $dbh->errstr . "\n"; $status = 'WARNING: Unexpected response getting stats'; last; } if defined $dbh->err; $stats =~ s/\b\s+(\w[a-zA-Z ]+)\:/\n$1:/g; }; # the 'last's above take us here.... this is cheap error exiting $dbh->disconnect(); do { $color='yellow'; $checks .= '&yellow ' . $dbh->errstr . "\n"; $status = 'WARNING: Unexpected response closing connection'; } if defined $dbh->err; }; # end unless connection failed }; # end do unless DEBUG block # Send the result back $date = strftime("%a %b %d %H:%M:%S %Z %Y", localtime($starttime)); $hostname =~ s/\./,/g; # Modify as needed DEBUG > 1 and $down and $color eq 'red' and print(" -> thread $tid checking $work, is down but also ping down ... setting to clear\n"); $color = 'clear' if ($down && $color eq 'red'); $responses{$work} = "status/group:mysql ${hostname}.mysql $color $date - $status\n\n$checks\n$stats\n\nSeconds: $elapsed\n\n\n"; # Loop back to idle state if not told to terminate } while (! $TERM); # end of do block }; # end of WORK (label used to quit for good without modifying $TERM) # All done DEBUG > 1 and print "Finished -> $tid, size " . threads->self()->get_stack_size() . "\n"; }; ===== Known Bugs and Issues ===== Threads on perl involve a slight amount of black magic. Occasional segfaults may occur, but if you're receiving them on every run then there's probably an issue with your mysql libraries or the DBD::mysql module. ===== To Do ===== Rather than generate combo messages textually, a future version should use future xymon capabilities to combo-ize for us. The current script returns the 'STATUS' output in NCV and can be used for basic mysql graphing of queries per second and a few other metrics. Certain commands like 'SHOW VARIABLES' and 'SHOW GLOBAL STATUS' may also be able to be run and, if present, be payloaded back in a separate message set (eg, 'data' or 'client'). This script is primarily designed for fast TCP checking, however, and not full health evaluation. ===== Credits ===== Includes some code from the 'threads' module on CPAN's [[http://cpansearch.perl.org/src/JDHEDDEN/threads-2.02/examples/pool_reuse.pl|pool_reuse.pl]] script, which was a great inspiration. Thank you to [[http://www.servicenow.com/|ServiceNow]] for providing an environment where this could be deployed. ===== Changelog ===== * **2015-10-16** * Initial public release