monitors:mysql-checker

Error loading plugin struct
ParseError: syntax error, unexpected 'fn' (T_STRING), expecting :: (T_PAAMAYIM_NEKUDOTAYIM)
More info is available in the error log.

This is an old revision of the document!


MySQL-checker

Author Japheth Cleaver
Compatibility Xymon 4.2
Requirements Perl, MySQL, thread-safe libmysql/DBD::mysql libraries
Download None
Last Update 2015-10-16

A high-performance, multi-threaded script for performing a TCP connection check (w/ login) to a very large number of MySQL servers at once. On high-end multi-core systems with sufficient RAM, 1000's of tests can be performed in <15 seconds.

Because MySQL in a default configuration will eventually block an IP address that does a “dumb” TCP port check without authenticating, a single username and password (or cert) is used and must be valid on all tested systems.

There are a number of tuning parameters that can be used on high-volume installations to affect how many threads to use and how often to send messages to your xymon server.

Copy the scripts to an appropriate directory on your xymon server, modifying the credentials and parameters in the perl script, and any needed logic in the wrapper script. Add “mysql” items to your hosts.cfg files as needed:

# 192.0.2.11	db10011.example.net	# <whatever> mysql <whatever>
# 192.0.2.12	db10012.example.net	# <whatever> mysql:3306 <whatever>
# 192.0.2.13	db10013.example.net	# <whatever> mysql:5678 <whatever>

If only a subset of your servers are being tested (or have the credentials installed), I suggest making a separate hosts.d/* file for just those servers and adjusting the wrapper script to feed that file into the perl script instead of using `xymongrep`.

Add a tasks.cfg section:

      [mysql-checker]
        ENVFILE /etc/xymon/xymonserver.cfg
        CMD /etc/xymon/ext/mysql-checker.sh
        LOGFILE $XYMONSERVERLOGS/mysql-checker.log
        INTERVAL 2m
        MAXTIME 5m

Client side

No client changes are required, but the script requires a single valid user/password (or user/cert) to be present on all servers. This account only needs USAGE and no SQL access.

 GRANT USAGE ON *.* TO 'youruseraccount'@'10.%' IDENTIFIED BY PASSWORD '*longpasswordhash';

SSL certs can be used instead, but this will increase test overhead.

Server side

Because the script uses perl threads, your DBD::MySQL module MUST be compiled to use the re-entrant, thread-safe libmysql. On RHEL6, this means compiling DBD::mysql with –libs=“`mysql_config –libs_r`”. We've also found better luck using the official MySQL-server RPMs from Oracle rather than the vendor-distributed versions installed on the monitor server itself.

Once you've confirmed that you have re-entrant libraries working, it should be safe to remove the “die” command near the bottom of the script. If your library isn't thread-safe, you get occasional segfaults, 'locale' errors and random red mysql statuses.

(Because of the nature of threading on perl, you will occasionally get a segfault once in a while anyway, but if you're getting a segfault in every run, it's probably a library problem.)

Optional

To add some stats graphing support for the mysql test (except the connection time) add to xymonserver.cfg:

 TEST2RRD+=",mysql=ncv"
 GRAPHS+=",mysql"
 SPLITNCV_mysql="Uptime:NONE,Threads:GAUGE,Questions:DERIVE,Slow_queries:DERIVE,Opens:DERIVE,Flush_tables:NONE,Open_tables:GAUGE,Queries_per_second_avg:GAUGE,Seconds:NONE"

In graphs.cfg/d:

 [mysql]
   TITLE MySQL Questions
   YAXIS Count  
   DEF:queries=mysql,Questions.rrd:lambda:AVERAGE
   LINE2:queries#0000FF:Questions per second
   COMMENT:\n
   GPRINT:queries:LAST:Questions  \: %5.1lf (cur)
   GPRINT:queries:MAX: \: %5.1lf (max)
   GPRINT:queries:MIN: \: %5.1lf (min)
   GPRINT:queries:AVERAGE: \: %5.1lf (avg)\n

Show Code ⇲

Hide Code ⇱

#!/bin/sh

# Feed a list of mysql servers to the dedicated mysql checker script.
# 
# We don't do this with a simple port up/down check in xymonnet because we actually 
# need to successfully authenticate to prevent mysqld from (by default) penalizing our IP.

# If you are using a non-default mysql server list, dispense with xymongrep and
# just feed it in directly to the script.

#export DEBUG=1
#export HOSTSFILE='/etc/xymon/hosts.d/just-my-mysql-servers'
#export MAXMSGSPERCOMBO=150

if [ -n "$HOSTSFILE" ] ; then
	/etc/xymon/ext/mysql-checker.pl <"$HOSTSFILE"
else
	TMPFILE="$XYMONTMP/mysql-checker.tmp.$$"
	xymongrep mysql\* > "$TMPFILE"
	if [ -s "$TMPFILE" ] ; then
		/etc/xymon/ext/mysql-checker.pl <"$TMPFILE"
	fi
	rm -f "$TMPFILE"
fi

# Kill anything that stuck around (after the *previous* run) for some reason
# Threaded perl is weird and occasionally pipes don't close.
pgrep -P 1 -U xymon mysql-checker | xargs -r kill -SIGKILL

exit 0

Show Code ⇲

Hide Code ⇱

#!/usr/bin/perl

# mysql-checker.pl
# 
# Given a list, check for mysql service on a given port as efficently as possible.
# 
# WARNING: This is a multi-threaded script. If your DBD::mysql was built or linked 
# against something that wasn't thread safe (or your libmysql subsequently uses stuff
# like a openssl that wasn't thread safe) you will have probs. This includes RHEL 6.
# 
# perl-DBD-MySQL needs to be compiled with --libs=`mysql_config --libs_r` and your
# mysql-config needs to match.
# 
# We've had better luck with the official Oracle MySQL-shared-5.5 RPMs instead of 
# the RHEL 6 mysql-libs-5.1 RPM.
# 
# Includes some code from the 'threads' module's pool_example.pl script.
# 
# Copyright Japheth Cleaver <cleaver-xymon@terabithia.org>
# Licensed under GPLv2
# 
# 
# 


# Master debug flag...
use constant		DEBUG => $ENV{'DEBUG'} || 0;

# Autoflush
DEBUG and $|=1;

use threads 1.39;	# threading for fun and profit
use threads::shared;	# ... the right way
use Thread::Queue;	# ... don't set up and tear down, use queues!

use strict;		# meh
use warnings;		# double meh
use DBD::mysql;		# connect to mysql
use Time::HiRes qw(time usleep); # use hi-res timing info
use POSIX qw(strftime);	# auto-print 'date' -- we could just call `date` once at the
			# beginning and be done with it, but this is slightly more
			# accurate (each test comes back with the timestamp it was 
			# actually tested at) and I'm kind of particular like that.



# MySQL connect info  -- modify as appropriate
# host and IP address are added later on...
use constant {
	USER => 'youruseraccount',
	PASS => 'yourpasswordhere',
	CERT => '/some/path/here.crt',
	PRIVKEY => '/some/path/here.key',
};
use constant DSN => 'DBI:mysql:mysql_connect_timeout=5;';
# use constant DSN => 'DBI:mysql:mysql_ssl=1;mysql_ssl_client_key=' . PRIVKEY . ';mysql_ssl_client_cert=' . CERT . ';mysql_connect_timeout=5;';

# Hard code these for lookup speed
use constant {
	XYMON => $ENV{'XYMON'},
	XYMSRV => $ENV{'XYMSRV'},
	OKSTRING => 'MySQL connection ok',
	COMBO_SIZE  => $ENV{'MAXMSGSPERCOMBO'} || 200,
	COMBO_HEADER => "combo\n",
	CHECK_INT   => (defined $ENV{'mysql_CHECK_INT'} and $ENV{'mysql_CHECK_INT'}) || 1.1,
	MAX_THREADS => (defined $ENV{'mysql_MAX_THREADS'} and $ENV{'mysql_MAX_THREADS'}) || 84,
	MAX_LOOPS   => (defined $ENV{'mysql_MAX_LOOPS'} and $ENV{'mysql_MAX_LOOPS'}) || 55,
	STACK_SIZE  => (defined $ENV{'mysql_STACK_SIZE'} and $ENV{'mysql_STACK_SIZE'}) || 65536,
};


our %downHosts;
%downHosts = ();
if ($ENV{'EXCLUSIONLIST'} && -f $ENV{'EXCLUSIONLIST'} && -s _ ) {
   # A list of hosts to exclude from alerting -- typically due to being conn down, but could be for any reason
   if (open (CONNDOWN, $ENV{'EXCLUSIONLIST'})) {
	chomp and $downHosts{$_}=1 while (<CONNDOWN>);
	close CONNDOWN;
	DEBUG and print " - found ", scalar(keys %downHosts), " hosts to be mooted (ping down)\n";
   };
};

#######################################################################
### Signal Handling ###
#######################################################################
# Flag to inform all threads that application is terminating
  my $TERM :shared = 0;

# Threads add their ID to this queue when they are ready for work
# Also, when app terminates a -1 is added to this queue
  my $IDLE_QUEUE :shared = Thread::Queue->new();

# Gracefully terminate application on ^C or command line 'kill'
  $SIG{'INT'} = $SIG{'TERM'} = sub {
	warn ">>> Terminating early, per signal <<<\n";
	$TERM = 1;
	# Add -1 to head of idle queue to signal termination
	$IDLE_QUEUE->insert(0, -1);
  };

# We open and write in to xymon with perl's open function, meaning we need
# to deal with PIPE signals properly. Same with our children.
  $SIG{PIPE} = 'IGNORE';
  $SIG{CHLD} = 'IGNORE';


#######################################################################
### Begin -- Set up our threads and queues ###
#######################################################################
  threads->set_stack_size(STACK_SIZE);
  our $drh = DBI->install_driver('mysql');
  our $checkerStart = time;
  our %workerQueues :shared;
  our %responses :shared;

  for (1 .. MAX_THREADS) {
     # create the queue
	my $workQueue = Thread::Queue->new();
     # create the thread, pointing it to the queue
	my $thr = threads->create('threadedChecker', $workQueue);
	DEBUG > 1 and print " - made thread ", $thr->tid(), "\n";
     # save the thread's info
	$workerQueues{$thr->tid()} = $workQueue;
  };


#######################################################################
### Main event loop ###
#######################################################################
  my %seen;
  my $comboMessage = my $code = '';
  my $hosts = my $count = 0;
  my $nextHosts = COMBO_SIZE;
  my $nextCheck = time;

DEBUG and print "- Work starting at $nextCheck\n";
$nextCheck += CHECK_INT;


while (<STDIN>) {

	# We're looking for things like:
	# 192.0.2.11	db10011.example.net	# <whatever> mysql <whatever>
	# 192.0.2.12	db10012.example.net	# <whatever> mysql:3306 <whatever>
	# 192.0.2.13	db10013.example.net	# <whatever> mysql:5678 <whatever>
	  DEBUG > 2 and print " -> saw: " . $_;

	# Isolate the interesting fields
	  next unless m/^([\d\.]+)\s+(\S+)\s+#.*(mysql:?(\d+)?)/;

	# my $ip = $1; my $hostname = $2; my $port = $4 || 3306;
	# "ip \t hostname \t port \t isdownflag" is used as a key value and passed to the thread
	  $code = join("\t", $1, $2, ($4 || 3306), (defined($downHosts{$2}) ? 1 : 0) );
	  next if $seen{$code}++;
	
	# Retrieve the first idle thread
	  my $tid = $IDLE_QUEUE->dequeue();
	  last if ($tid < 0);	# none left, exit

	# Enqueue this key
	  DEBUG > 1 and print "Submitting: " . $code . " to " . $tid . "...\n";
	  $workerQueues{$tid}->enqueue($code);
	  $hosts++;


	# Once every set of COMBO_SIZE hosts and CHECK_INT seconds while 
	# processing STDIN, pick up results. This helps spread out message sending 
	# load to xymon instead of just trying to pound it when we're done.
	  next unless ($hosts > $nextHosts && time > $nextCheck);
	  last if $TERM;	# if told to quit

	DEBUG and print "Checking for thread results...\n";
	while ((my @codes = keys %responses) >= COMBO_SIZE) {
      	    DEBUG and print "- see " . @codes . " total results, " . scalar threads->list() . " threads active at " . (time - $checkerStart) . "s\n";
	    $_ and $comboMessage .= $responses{$_} and delete $responses{$_} foreach (@codes[0 .. COMBO_SIZE]);

	    # Anything to send?
	    last unless $comboMessage;

	    # Off to report!
	    if (DEBUG <= 2) {
		DEBUG and print " -- sending combo message --\n";
		
		# Fork, so that we can send to xymon asynchronously?
		if (! fork() ) {

		     # in the child...
		     DEBUG and print "Forked child...\n";
		     close(STDIN) or warn "Can't close STDIN in child ($$): $!\n";

		     # Send to xymon over its STDIN
		     open (XYMFH, '|-', XYMON, '--timeout=30', XYMSRV, '@') or die "Unable to open xymon in child ($$): $!";
		     print XYMFH COMBO_HEADER, $comboMessage;
		     close (XYMFH); # or warn "Unable to close xymon in child ($$); message possibly dropped? ($!)";

		     # Clean up and go away
		     close(STDOUT); close(STDERR);
		     exit 0;

		};

	    } else {
		print "would run XYMON\n==Output==\n$comboMessage\n=== end ===\n";
	    };
	    $comboMessage = '';

	}; # end WHILE >= COMBO_SIZE

	# Don't check again until these thresholds are crossed
	$nextCheck = time + CHECK_INT;
	$nextHosts = $hosts + COMBO_SIZE;

}; # end while STDIN


# We've sent all the work off... Begin "final" response collection..
# 

# Signal all threads that there is no more work
  $workerQueues{$_}->enqueue('') foreach keys(%workerQueues);


# Loop while we wait for threads to complete their processing... if no
# responses are present (no @codes), sleep 
# 
DEBUG and print "Checking for final results...\n";
while ((my @codes = keys %responses) || (sleep 1 && threads->list()) ) {
    DEBUG and print "- see " . @codes . " total results, " . scalar threads->list() . " threads active (wait loop $count)\n";
    $_ and $comboMessage .= $responses{$_} and delete $responses{$_} foreach (@codes[0 ... (@codes < COMBO_SIZE ? ($#codes + 1) : COMBO_SIZE)]);

    # Anything to send? (note the continue block below)
    next unless $comboMessage;

    # Off to report!
    if (DEBUG <= 2) {
	DEBUG and print " -- sending combo message --\n";

	# Fork, so that we can send to xymon asynchronously?
	if (! fork() ) {

	     # in the child...
	     DEBUG and print "Forked child...\n";
	     close(STDIN) or warn "Can't close STDIN in child ($$): $!\n";

	     # Send to xymon over its STDIN
	     open (XYMFH, '|-', XYMON, '--timeout=30', XYMSRV, '@') or die "Unable to open xymon in child ($$): $!";
	     print XYMFH COMBO_HEADER, $comboMessage;
	     close (XYMFH); # or warn "Unable to close xymon in child ($$); message possibly dropped? ($!)";

	     # Clean up and go away
	     close(STDOUT); close(STDERR);
	     exit 0;

	};

    } else {
	print "would run XYMON\n==Output==\n$comboMessage\n=== end ===\n";
    };
    $comboMessage = '';

   } continue {
      # Collect finished threads
      $_->join() foreach threads->list(threads::joinable);
      last if $count++ > MAX_LOOPS;
};

# One way or another, we're done
  DEBUG and threads->list(threads::all) and print "- Killing " . threads->list(threads::all) . " thread(s)\n";
  $_->kill('KILL')->detach() foreach threads->list(threads::all);

  print '- Run complete; did ', $hosts, ' hosts in ', (time - $checkerStart), " seconds\n";

exit 0;



##########################################
# The Actual MySQL Checker
##########################################

sub threadedChecker {

    # This thread's queue and ID
    my ($workQueue) = @_;
    my $tid = threads->tid();

    # Allocate memory once for each thread... no need to increase the chance of things going awry
    my ($ip, $hostname, $port, $down);
    my ($color, $status, $checks, $data, $stats, $info);
    my $starttime = my $elapsed = my $date = 0;
    my $dbh;

    # Work loop
    WORK: { do {

	# Indicate that were are ready to do work
	#DEBUG > 2 and print "Idle     -> $tid\n";
        $IDLE_QUEUE->enqueue($tid);

	# Set variables - by default things are good (we hope)
	$color = 'green'; $status = OKSTRING; $checks = '&green Connected to a MySQL '; $stats = '';

	# pop the stack
	# we're getting a "ip \t hostname \t port \t down" from the main queue
	($ip, $hostname, $port, $down) = split (/\t/, my $work = $workQueue->dequeue(), 4);
	DEBUG > 1 and print("	-> thread $tid checking $work\n");

	# Did we get anything weird or invalid? close out the thread
	last WORK unless $port;


	$starttime = time;
	# If we're not actually doing any work at all (in DEBUG mode), simulate 
	# by adding a small delay here
	(DEBUG > 2 and usleep(150000)) or do {

	 # Check for a connection error...
	 unless ($dbh = DBI->connect(DSN . "host=$ip;port=$port", USER, PASS, { RaiseError => 0, PrintError => DEBUG ? 1 : 0, mysql_no_autocommit_cmd => 1, mysql_auto_reconnect => 0 }) ) {
		$elapsed = time - $starttime;
		$color = 'red'; $checks = '&red ' . $DBI::errstr . "\n"; $status = 'CRITICAL: Unable to connect';

		# Comment this line out once you're sure you've got thread-safe libraries
		die "Got latin1 multi-threading error... $tid is bailing out" if index ($DBI::errstr, "Can't initialize character set latin1", 0) != -1;

	 } else {
	    {
		$elapsed = time - $starttime;

		$info = $dbh->{'mysql_serverinfo'};
		do { $color='yellow'; $checks = '&yellow ' . $dbh->errstr . "\n"; $status = 'WARNING: Unexpected response getting info'; last; } if defined $dbh->err;
		$checks .= $info . " server\n";

		$stats = $dbh->{'mysql_stat'};
		do { $color='yellow'; $checks .= '&yellow ' . $dbh->errstr . "\n"; $status = 'WARNING: Unexpected response getting stats'; last; } if defined $dbh->err;
		$stats =~ s/\b\s+(\w[a-zA-Z ]+)\:/\n$1:/g;

	    };

	    # the 'last's above take us here.... this is cheap error exiting
	    $dbh->disconnect();
	    do { $color='yellow'; $checks .= '&yellow ' . $dbh->errstr . "\n"; $status = 'WARNING: Unexpected response closing connection'; } if defined $dbh->err;

	 }; # end unless connection failed

	}; # end do unless DEBUG block

	# Send the result back
	$date = strftime("%a %b %d %H:%M:%S %Z %Y", localtime($starttime));
        $hostname =~ s/\./,/g;

	# Modify as needed
	DEBUG > 1 and $down and $color eq 'red' and print("	-> thread $tid checking $work, is down but also ping down ... setting to clear\n");
	$color = 'clear' if ($down && $color eq 'red');
	$responses{$work} = "status/group:mysql ${hostname}.mysql $color $date - $status\n\n$checks\n$stats\n\nSeconds: $elapsed\n\n\n";


	# Loop back to idle state if not told to terminate
    } while (! $TERM);	# end of do block

    }; # end of WORK (label used to quit for good without modifying $TERM)

    # All done
    DEBUG > 1 and print "Finished -> $tid, size " . threads->self()->get_stack_size() . "\n";

};

Threads on perl involve a slight amount of black magic. Occasional segfaults may occur, but if you're receiving them on every run then there's probably an issue with your mysql libraries or the DBD::mysql module.

Rather than generate combo messages textually, a future version should use future xymon capabilities to combo-ize for us.

The current script returns the 'STATUS' output in NCV and can be used for basic mysql graphing of queries per second and a few other metrics.

Certain commands like 'SHOW VARIABLES' and 'SHOW GLOBAL STATUS' may also be able to be run and, if present, be payloaded back in a separate message set (eg, 'data' or 'client'). This script is primarily designed for fast TCP checking, however, and not full health evaluation.

Includes some code from the 'threads' module on CPAN's pool_reuse.pl script, which was a great inspiration.

Thank you to ServiceNow for providing an environment where this could be deployed.

  • 2015-10-16
    • Initial public release
  • monitors/mysql-checker.1446792097.txt.gz
  • Last modified: 2015/11/06 06:41
  • by jccleaver