urlplus.pl

Compatibility	Xymon 4.2+
Requirements	Perl, unix, curl
Download	None
Last Update	2008-04-30
Description

Provides more robust URL content checking than Xymon offers out of the box. Checks are configured per host in cont-check.cfg.
Client side

N/A
Server side

- Copy urlplus.pl to $BBHOME/ext
- Create a new configuration file: $BBHOME/etc/cont-check.cfg
  - Add the “urlplus” option to the applicable hosts in the bb-hosts file
  - Create an entry in cont-check.cfg for each of the hosts to be monitored
- Edit hobbitlaunch.cfg to run urlplus.pl:
    [urlplus]
      ENVFILE $BBHOME/etc/hobbitserver.cfg
      NEEDS hobbitd
      CMD $BBHOME/ext/urlplus.pl
      LOGFILE $BBSERVERLOGS/urlplus.log
      INTERVAL 5m
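As an illustration, a bb-hosts line and its matching cont-check.cfg entry could look like the fragment below. The address 192.0.2.10 and the host name my-contcheck are placeholders, not values from a real installation. The cont-check.cfg line reuses the "Basic content check" example from the script's header comments.

```
# bb-hosts (address and host name are placeholders)
192.0.2.10 my-contcheck # urlplus

# cont-check.cfg: <TEST NAME>;<TYPE>;<OPTIONS>;<URL>[;<CONT_OPT>]
my-contcheck;find;;http://www.testsite.net/index.html;.title.Content.*
```

The first field of the cont-check.cfg entry has to match the host name in bb-hosts. If it does not, the script prints "not defined in bb-hosts - skipping" and moves on to the next entry.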
#!/usr/bin/perl
################################################################################
# Author:       Gary Baluha
# Created On:   1-02-2008
# Description:  Provides a more robust form of URL checking than Hobbit itself
#               does.  Includes HTTP response code checking, simple content and
#               reverse content checking, and form submission content checking.
#
# USAGE AND NOTES:
# ================
#               *) The host must be defined in bb-hosts before any
#               definition defined in cont-check.cfg will be processed.
#
#               *) Hosts defined in bb-hosts with the "urlplus" option, but not
#               configured on cont-check.cfg will be ignored.
#
#               *) Lines starting with "#" in cont-check.cfg are comments.
#
#               *) Adding a URL test
#               --------------------
#               1) Add the following keyword to the appropriate host in
#                  the bb-hosts file: urlplus
#               2) Use the following format in cont-check.cfg
#                  <TEST NAME>;<TYPE>;<OPTIONS>;<URL>[;<CONT_OPT>]
#               Where:
#               <TEST NAME> = The test name (host) as defined in the
#                             bb-hosts file
#               <TYPE>      = find  =Good if <CONTENT> found
#                             nofind=Good if <CONTENT> NOT found
#                             both  =Good if <CONTENT> found AND <NOCONTENT>
#                                    not found
#                             post  =Good if every <CONTENT> check of a form
#                                    submission is found (see <POST> below)
#                             none  =Check only for a valid HTTP response.
#               <URL>       = The URL to test
#               <CONT_OPT>   = One (or more) of the below:
#                       <CONTENT>   = Used with <TYPE>=find,nofind,both.
#                                     The PCRE content to check for
#                       <NOCONTENT> = Used with <TYPE>=both.
#                                     The "good if NOT matched" pattern
#                       <POST>      = Used with <TYPE>=post.
#                                     This is a 3-part field:
#                                     <CONTENT>;<POST>;<CONTENT>
#                                     Where <POST> is the full HTTP post data
#               <OPTIONS>   = Optional flags separated by ,
#                       Available flags are:
#                       *) tNN  <- URL timeout, in seconds
#                       *) p    <- Use the default (http) proxy
#                       *) P    <- Use the default secure (https) proxy
#                       *) g    <- Default <TYPE>=post is to POST.
#                                  This changes it to GET.
#                       *) r    <- Follow page redirects (internal to curl)
#                       *) R    <- Follow page redirects (new curl session)
#                                  (Currently assumes to use the http proxy)
#                       *) u    <- Use an alternate User-Agent string
#                                  (Currently hard-coded for Mozilla/Firefox)
#
# EXAMPLES:
# =========
# () Simple HTTP valid-response check:
#       http-resp;none;;http://www.mytest.com/index.html
# () HTTP valid-response with https proxy:
#       http-resp-proxy;none;P;https://www.mytest.com/index.html
#
# () Basic content check:
#       my-contcheck;find;;http://www.testsite.net/index.html;.title.Content.*
# () Content check with custom timeout and http proxy:
#       my-contcheck2;find;t20,p;http://www.testsite.net/index.html;Good.Content
# () Reverse content check:
#       my-revcont;nofind;;https://www.securetest.org/index.jsp;Bad.If.Found
# () Combined "find" and "nofind" content check, with custom timeout:
#       my-combo;both;t5;http://doubletest.com;Good.If.Found;Bad.If.Found
#
# () Simple one-page form submission content check:
#       my-post;post;;http://www.postme.com;First.Page;name=test&clickhere=submit;All's.OK
# () Multi-page form submission content check:
#       my-post2;post;;https://post.com;First.Page;name=test&click=submit;All's.OK;name=test2&post=submit;Also.OK
#
# REVISION HISTORY:
# =================
# Ver.  Date            Author  Notes
# ----  ----            ------  -----
# v1.0  1-2-2008        GMB     Initial creation.
# v1.1  1-3-2008        GMB     First working version.
# v1.3  1-3-2008        GMB     Fixed a few minor oversights.
# v1.4  1-7-2008        GMB     Modified the output to be more useful.
#                               Added logic to deal with URLs containing ".
# v1.5  1-8-2008        GMB     showUrl() is now called only once per host.
#                               Minor change to the debugging flags.
# v1.6  1-10-2008       GMB     Added better whitespace handling.
#                               Changed the format of cont-check.cfg slightly.
# v1.7  1-16-2008       GMB     Added user-configurable (per host) timeout.
# v1.8  1-18-2008       GMB     Added an option to just check for valid http
#                               responses, ignoring content, and rearranged a
#                               good portion of the code to allow it to work
#                               cleanly with the new feature.
#                               Also added a proxy option.
# v1.9  1-23-2008       GMB     Corrected the proxy option.
# v1.10 1-28-2008       GMB     Added additional information in the output of
#                               decodeErr(). First version posted to The Shire.
# v1.11 1-30-2008       GMB     Added a POST and GET option. Also cleaned up
#                               the above comment section for readability, and
#                               added additional comments to be more friendly
#                               to others using/modifying this script. Added
#                               an "examples" and "todo" comment section.
# v1.12 3-24-2008       GMB     Added two options to follow redirects.
#                               One method has curl follow the redirect itself,
#                               the other opens a new curl session to the
#                               redirected page.
# v1.21 4-29-2008       GMB     Added an option to send a specific
#                               User-Agent string. (Currently hard-coded)
# v1.22 4-30-2008       GMB     Added cookie support, and cleaned up a few
#                               bits of code that I thought ugly ways of doing
#                               things. Also, all constant variables have been
#                               CAPITALIZED.
#
# TODO:
# =====
# 1 -   Currently, the <TYPE>=post feature only works with a 2-level page
#       (i.e. content check initial page, submit data, content check resulting
#       page)
# 2 -   With the addition of the <TYPE>=post feature, it will probably
#       be good to make this script threadable to improve performance.
# 3 -   The User-Agent option should be configurable via the config
#       file.
################################################################################
use strict;
 
# USER-MODIFIABLE CONSTANTS:
# ==========================
# Default Values
use constant DEFAULT_TO => 15;          # Default http timeout in seconds
use constant HTTP_PROXY => "http://http-proxy.cuc.com:80";
use constant HTTPS_PROXY => "https://http-proxy.cuc.com:80";
use constant COOKIE_DIR => "/tmp/cookies";
use constant ALT_USER_AGENT => "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14";
 
# Hobbit Paths And Options
my $BBDISP     = $ENV{'BBDISP'};
my $BBPROG     = $ENV{'BB'};
my $BBHOSTGREP = $ENV{'BBHOME'}."/bin/bbhostgrep";
my $CFGPATH    = $ENV{'BBHOME'}."/etc"; #} It's possible someone might want
my $CFGFILE    = "cont-check.cfg";      #| this config file in a non-standard
                                        #} location...
use constant COLUMN => "URL_Plus";      #} No need to change these. They are
use constant BBTAG => "urlplus*";       #} simply available for reference.
 
################################################################################
# No User-Serviceable Parts Below!
# Removal of seal voids warranty!
# Qualified service personnel only!
################################################################################
# CONSTANTS:
# ==========
# Function Return Codes
use constant BAD_ARGS => -2;    # Bad arguments, or not enough (<TYPE>=post)
use constant PASS => -1;        # The primary matching objective succeeded
use constant UNKNOWN => 0;      # An unknown error occurred
use constant FAIL => 1;         # The primary matching objective failed
use constant PROXY_ERR => 5;    # Error with the proxy
use constant DNS_ERR => 6;      # Err, we couldn't resolve the host...
use constant CONN_ERR => 7;     # Failed to connect to the host
use constant TIMEDOUT => 28;    # Http timeout while retrieving URL
use constant POST_ERR => 34;    # Problem sending a POST to the server
# Bitwise Encoding Kung-Fu
use constant MAX_ERR => 512;    # Plain error codes stay below this (2^9)
use constant SHIFT => 10;       # Shift by 10 bits
use constant MASK => 1023;      # 2^10-1 = mask for the lower 10 bits
use constant BASE => -1;        # Selects the no. held in the lower 10 bits
use constant HIDDEN => 1;       # Selects the no. held above the lower 10 bits
# Urlplus Options
use constant NONE => "none";
use constant FIND => "find";
use constant NOFIND => "nofind";
use constant BOTH => "both";
use constant POST => "post";
#
use constant PROXY => -1;
use constant NO_PROXY => 0;
use constant SECURE_PROXY => 1;
#
use constant POST_FORM => 0;    # Submit form data using "post" method
use constant GET_FORM => 1;     # Submit form data using "get" method
#
use constant SEPREDIR => -1;    # Follow redirects (new session)
use constant NOREDIR => 0;      # Don't follow redirects (default)
use constant REDIRECT => 1;     # Follow redirects (within curl)
# Miscellaneous
use constant TOK_SEP => ";";
use constant TRUE => 0;
use constant FALSE => -1;
 
# GLOBAL VARIABLES:
# =================
# "use constant" variable aliases
my $TOK_SEP=TOK_SEP;
my $HTTP_PROXY=HTTP_PROXY;
my $HTTPS_PROXY=HTTPS_PROXY;
my $BBTAG=BBTAG;
my $COLUMN=COLUMN;
my $COOKIE_DIR=COOKIE_DIR;
 
my $user_agent=undef;   # I'm lazy, so some day I'll make this neater
 
my $hostline = undef;   # The bb-hosts line for the current machine
my $opts = undef;
my $url = undef;
my @post = ();          # <TYPE>=post can have multiple fields
my $cont = undef;
my $nocont = undef;
my $errStep = undef;
#
my $option = undef;     # Current urlplus option
my $timeout = undef;    # Timeout option
my $proxy = undef;      # Proxy option
my $gp_method = undef;  # Get/Post method option
my $redirect = undef;   # Follow/don't follow redirects
#
my $urlhosts;           # List of hosts defined in bb-hosts using urlplus
my $color;              # The color to return to Hobbit for the URL test
my $msg = undef;        # The message to return to the Hobbit server
my $debug1 = undef;     # Debugging enabled?  (defaults to "false")
my $debug2 = undef;     # For verbose debugging (2)
my $debug3 = undef;     # For verbose debugging (3)
 
 
# ==========
# MAIN LOOP:
# ==========
# "secret" simple debugging program arguments.
# These are only useful when modifying this script.
my $arg1 = $ARGV[0];
if ($arg1) {
        if ( $arg1 eq "-v" ) { $debug1 = "true"; }
        elsif ( $arg1 eq "-vv" ) { $debug1 = $debug2 = "true"; }
        elsif ( $arg1 eq "-vvv" ) { $debug1 = $debug2 = $debug3 = "true"; }
        else { print "Unknown argument: $arg1\n"; }
}
 
 
# High-level outline:
# -------------------
# 1) Get a list of the hosts using this script
# 2) Process the hosts using this script
#       2A) Get all the urlplus arguments
#       2B) Make sure the host was defined in bb-hosts
#       2Bb) Parse any optional flags
#       2C) Perform the content/http check
#       2D) Report back to the Hobbit server
 
# 1) Get a list of the hosts using this script
my @hostgrepin = qx($BBHOSTGREP $BBTAG);
foreach $hostline (@hostgrepin)
{
        # Get all the applicable hosts
        my( undef, $machine, undef, $tag ) = split / /, $hostline;
        $urlhosts->{$machine} = $machine;
}
 
# 2) Process the hosts using this script
@hostgrepin = qx(cat $CFGPATH/$CFGFILE);
foreach $hostline (@hostgrepin)
{
        # 2A) Get all the urlplus arguments
        # ---------------------------------
        chomp($hostline);
        my( $machine, $type ) = split /$TOK_SEP/, $hostline;
        if ( length($machine) <= 1 ) { next; }  # ignore blank lines
        if ( $machine =~ /^\s*#/ ) { next; }    # ignore comment lines
 
        # Ignore unknown content check types
        if ($type ne FIND && $type ne NOFIND && $type ne BOTH &&
                $type ne POST && $type ne NONE)
        {
                # ignore unknown content check types
                print "$machine: \"$type\" unknown content check - skipping\n";
                next;
        }
 
        # Get all the tokens, depending on the content check type
        if ( $type eq NONE ) {
                ( undef,undef,$opts,$url ) = split /$TOK_SEP/, $hostline;
        } elsif ( $type eq BOTH ) {
                ( undef,undef,$opts,$url,$cont,$nocont ) = split /$TOK_SEP/, $hostline;
                chomp($cont); chomp($nocont);
        } elsif ( $type eq POST ) {
                ( undef,undef,$opts,$url,@post ) = split /$TOK_SEP/, $hostline;
                chomp(@post);
        } else {
                ( undef,undef,$opts,$url,$cont ) = split /$TOK_SEP/, $hostline;
                chomp($cont);
        }
        chomp($opts); chomp($url); $url = escapeString($url);
 
        # 2B) Make sure the host was defined in bb-hosts
        # ----------------------------------------------
        if (!$urlhosts->{$machine}) {
                print "$machine not defined in bb-hosts - skipping\n";
                next;
        }
 
        # Debugging
        if ( $debug2 ) {
                print "\nTest name=$machine\n\ttype=$type\n\toptions=$opts";
                if ( $debug3 ) { print "\n\tURL = $url"; }
                if ( $type eq FIND ) {
                        print "\n\tcont = $cont";
                } elsif ( $type eq NOFIND ) {
                        print "\n\tnegcont = $cont";
                } else  {
                        print "\n\tcont = $cont\n\tnegcont = $nocont";
                }
                print "\n";
        }
 
        # 2Bb) Parse any optional flags
        # -----------------------------
        # Reset the options to default
        $timeout = DEFAULT_TO;
        $proxy = NO_PROXY;
        $gp_method = POST_FORM;
        $redirect = NOREDIR;
        $user_agent = undef;
 
        # Update the options (if any)
        if ( $opts ) {
                # Get the defined options
                $option = undef;
                my @opts = split /,/, $opts;
                foreach $option (@opts) {
                        # Option tNN <- Timeout in seconds (one or more digits,
                        # so that both "t5" and "t20" are accepted)
                        if ( $option =~ /^t[0-9]+$/ ) {
                                $timeout = substr $option, 1;
                        }
 
                        # Option p/P <- Use the http/https proxy
                        if ( $option =~ /p/ ) {
                                $proxy = PROXY;
                        } elsif ( $option =~ /P/ ) {
                                $proxy = SECURE_PROXY;
                        }
 
                        # Option g <- Form get/post method
                        if ( $option =~ /g/ ) {
                                if ($type ne POST) {
                                        # Only <TYPE>=post can use this option, so
                                        # ignore it and move on to the next option
                                        print "$machine is not a \"post\"-style check, but is using the \"g\" option - ignoring the option\n";
                                        next;
                                }
                                $gp_method = GET_FORM;
                        }
 
                        # Option R <- Follow redirects
                        if ( $option =~ /R/ ) {
                                $redirect = SEPREDIR;
                        } elsif ( $option =~ /r/ ) {
                                $redirect = REDIRECT;
                        }
 
                        # Option u <- Use alternate User-Agent string
                        if ( $option =~ /u/ ) {
                                $user_agent = ALT_USER_AGENT;
                        }
                }
        }
 
        # 2C) Perform the content check
        # -----------------------------
        $color = "green";
        $msg = "<a href='$url'>URL</a>\n\n";
 
 
        # Get the URL output
        my $output = showUrl($url, $timeout, $proxy, $redirect);
        $output = escapeString($output);
        $cont = escapeString($cont);
        my $contRet = undef;
        my $postRet = undef;
 
        # Check for non-green situations
        if ( $type eq FIND ) {
                # --- <TYPE>=FIND ---
                $contRet = contFind($output, $cont);
                if ( $contRet == FAIL ) {
                        $color="red";
                        $msg.="Content match failed: Couldn't find \'$cont\'\n";
                } elsif ( $contRet != PASS ) {
                        $color="red";
                        $msg.=decodeErr($contRet, $output)."\n";
                }
        } elsif ( $type eq NOFIND ) {
                # --- <TYPE>=NOFIND ---
                $contRet = contNofind($output, $cont);
                if ( $contRet == FAIL ) {
                        $color="red";
                        $msg.="Reverse content match failed: Found \'$cont\'\n";
                } elsif ( $contRet != PASS ) {
                        $color="red";
                        $msg.=decodeErr($contRet, $output)."\n";
                }
        } elsif ( $type eq BOTH ) {
                # --- <TYPE>=BOTH ---
                $contRet = contFind($output, $cont);
                if ( $contRet == PASS ) {
                        $contRet = contNofind($output, $nocont);
                        if ( $contRet == FAIL ) {
                                $color="red";
                                $msg.="Content match passed, but reverse content match failed: Found \'$nocont\'\n";
                        } elsif ( $contRet != PASS ) {
                                $color="red";
                                $msg.=decodeErr($contRet, $output)."\n";
                        }
                } elsif ( $contRet == FAIL ) {
                        $color="red";
                        $msg.="Content match failed: Couldn't find \'$cont\'\n";
                } else {
                        $color="red";
                        $msg.=decodeErr($contRet, $output)."\n";
                }
        } elsif ( $type eq POST ) {
                # --- <TYPE>=POST ---
                $postRet = contPost($url, $timeout, $proxy, $redirect, $gp_method, @post);
                if ( $postRet > MAX_ERR ) {
                        # IF we're here, the error code & the test
                        # step the error occurred on are encoded.
                        $contRet = bitwiseDecode($postRet, BASE);
                        $errStep = bitwiseDecode($postRet, HIDDEN);
                } else { $contRet = $postRet; $errStep = 1; }
                if ( $contRet == BAD_ARGS ) {
                        # Yellow, because there wasn't enough
                        # info to actually do the content check.
                        $color="yellow";
                        $msg.="Not enough arguments for form content check!\n";
                } elsif ( $contRet == FAIL ) {
                        $color="red";
                        $msg.="Failure on step $errStep: Content test failed\n";
                } elsif ( $contRet != PASS ) {
                        $color="red";
                        $msg.="Failure on step $errStep: ".decodeErr($contRet, $output)."\n";
                }
        } else {
                # --- <TYPE>=NONE ---
                $contRet = curlError($output);
                if ( $contRet != PASS ) {
                        $color="red";
                        $msg.=decodeErr($contRet, $output)."\n";
                }
        }
 
        # If we enter this block, everything's OK
        if ( $color eq "green" ) {
                if ($type eq FIND) {
                        $msg.="Content match successful: Found \'$cont\'";
                } elsif ($type eq NOFIND) {
                        $msg.="Reverse content match successful: Didn't find \'$cont\'";
                } elsif ($type eq BOTH) {
                        $msg.="Compound content check successful: Found \'$cont\' without finding \'$nocont\'";
                } elsif ($type eq POST) {
                        $msg.="Form submission check successful\n";
                } else { # $type eq NONE
                        $msg.="Http response OK\n";
                }
        }
 
        # 2D) Report back to the Hobbit server
        # ------------------------------------
        qx($BBPROG $BBDISP "status $machine.$COLUMN $color\n$msg\n");
        if ( !$opts ) { $opts = "NONE"; }
        print "status $machine.$COLUMN $color ($opts)\n";
        if ( $debug1 ) { print "\t$msg"; }
}
 
 
# =============
# SUB-ROUTINES:
# =============
# High-level outline:
# -------------------
# GET url, searchString, searchNoString
# output= curlError(escapeString(showUrl(url)))
# SWITCH (<TYPE>)
#       none:   result= output
#       find:   result= contFind(output, searchString)
#       nofind: result= contNofind(output, searchString)
#       both:   IF result= contFind(output, searchString) OKAY THEN
#               result= contNofind(output, searchNoString)
#       post:   ...
# END
# IF result OKAY THEN
#       GREEN-HOBBIT
# ELSE
#       decodeErr(result)
#       RED-HOBBIT
# ENDIF
 
#
# The below are helper/utility functions
#
sub escapeString
# Purpose:      URL encodes the provided string.
# Inputs:       $str  = The string to escape
# Outputs:      $str, url-encoded
{
        my($str) = @_;
 
        # URL Encoding:
        # "     -> %22
        $str =~ s/"/%22/g;
 
return $str;
}
 
 
sub curlError
# Purpose:      Checks if curl returned an error, and if the http response
#               is a valid response code
# Inputs:       $urlout = The URL content to check
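# Outputs:      PASS if $urlout is empty/undef, is curl error 0 or 52, or
#               isn't one of the curl errors checked for below;
#               PROXY_ERR, DNS_ERR, CONN_ERR, TIMEDOUT, POST_ERR or UNKNOWN
#               for the corresponding curl error messages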
{
        my($urlout) = @_;
        my $ret = PASS;
 
        # CURLcode errors that we check for
        # (http://curl.haxx.se/libcurl/c/libcurl-errors.html)
        #
        # 0     <= CURLE_OK
        #               No problems here, carry on
        # 5     <= CURLE_COULDNT_RESOLVE_PROXY
        #               Couldn't resolve the proxy (another DNS error)
        # 6     <= CURLE_COULDNT_RESOLVE_HOST
        #               Yep, couldn't resolve the host (i.e. DNS error)
        # 7     <= CURLE_COULDNT_CONNECT
        #               Curl couldn't connect to the host/proxy
        # 28    <= CURLE_OPERATION_TIMEDOUT
        #               Curl timed out while trying to get the URL. This
        #               probably means we exceeded the time we manually set.
        # 34    <= CURLE_HTTP_POST_ERROR
        #               Something Bad(tm) happened while sending a POST
        #               to the http server
        # 52    <= CURLE_GOT_NOTHING
        #               The web server didn't return any content
        # 56    <= CURLE_RECV_ERROR
        #               Failure with receiving network data (probably
        #               an issue with the proxy).
 
        if ( !$urlout || $urlout =~ /curl: \(0\).*/ ||
                        $urlout =~ /curl: \(52\).*/ ) {
                return PASS;
        } else {
                if    ( $urlout =~ /curl: \(5\).*/ ||
                        $urlout =~ /curl: \(56\).*/ ) { $ret=PROXY_ERR; }
                elsif ( $urlout =~ /curl: \(6\).*/ )  { $ret=DNS_ERR; }
                elsif ( $urlout =~ /curl: \(7\).*/ )  { $ret=CONN_ERR; }
                elsif ( $urlout =~ /curl: \(28\).*/ ) { $ret=TIMEDOUT; }
                elsif ( $urlout =~ /curl: \(34\).*/ ) { $ret=POST_ERR; }
                elsif ( $urlout =~ /curl: no URL specified!/ ) { $ret=UNKNOWN; }
        }
 
return $ret;
}
 
 
sub decodeErr
# Purpose:      Decodes the responses from curlError() to a human-readable
#               string.
# Inputs:       $err    = The error code to decode
#               $urlout = The full error message
# Outputs:      A string explaining the error code
{
        my($err, $urlout) = @_;
 
        if ( $err == PROXY_ERR ) { return "Proxy error: $urlout"; }
        elsif ( $err == DNS_ERR ) { return "DNS error"; }
        elsif ( $err == CONN_ERR ) { return "Timeout or Couldn't connect"; }
        elsif ( $err == TIMEDOUT ) { return "Http request timed out"; }
        elsif ( $err == POST_ERR ) { return "Error during http POST"; }
 
return "Unknown error or problem";      # we should never get here
}
 
 
sub bitwiseEncode
# Purpose:      Bitwise-encodes two values together.  The limitation is
#               that the base (lower bit-boundary) number must be less
#               than MAX_ERR (9-bit number).  Also, this algorithm only
#               works with unsigned integers.
# Inputs:       $base   = The number to encode in the lower 9-bits
#               $hidden = The number to encode starting in the 10th bit
# Outputs:      $base & $hidden bitwise-encoded into a single 32-bit integer.
{
        my($base, $hidden) = @_;
return ($base | ($hidden << SHIFT));
}
 
 
sub bitwiseDecode
# Purpose:      Decodes an integer encoded with bitwiseEncode()
# Inputs:       $encoded = The bitwise-encoded integer.
#               $decPos  = BASE to decode out the base number,
#                          HIDDEN to decode out the "upper" number.
# Outputs:      Either the "lower" or "upper" value decoded.
{
        my($encoded, $decPos) = @_;
        my $base = $encoded & MASK;
 
        if ( $decPos eq HIDDEN ) { return (($encoded - $base) >> SHIFT); }
        # else
return $base;
}
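
# Worked example (added for illustration; not part of the original script).
# With FAIL = 1, TIMEDOUT = 28, SHIFT = 10 and MASK = 1023:
#
#   bitwiseEncode(FAIL, 2)       = 1  | (2 << 10) = 1  | 2048 = 2049
#   bitwiseDecode(2049, BASE)    = 2049 & 1023        = 1   (FAIL)
#   bitwiseDecode(2049, HIDDEN)  = (2049 - 1) >> 10   = 2   (step 2)
#
#   bitwiseEncode(TIMEDOUT, 3)   = 28 | (3 << 10) = 28 | 3072 = 3100
#   bitwiseDecode(3100, BASE)    = 3100 & 1023        = 28  (TIMEDOUT)
#   bitwiseDecode(3100, HIDDEN)  = (3100 - 28) >> 10  = 3   (step 3)
#
# Both encoded values are greater than MAX_ERR (512), which is how the
# main loop tells an encoded result apart from a plain error code.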
 
 
#
# All of the above code is essentially a wrapper
# for the below functions.
#
sub showUrl
# Purpose:      Displays the output of the URL.
# Inputs:       $url    = The URL to display
#               $tout   = Timeout for retrieving the URL
#               $proxy  = The proxy to use (if any)
#               $redir  = Follow redirects?
#               $submit = Optional: Form submission argument string
#               $method = Optional: If set, form method=get
# Outputs:      The output from curl
{
        my($url, $tout, $proxy, $redir, $submit, $method) = @_;
        my $ret = undef;
        my $pstr = "";
        my $form = "";
        my $other = "";
 
        # This technically isn't needed, but
        # it's useful fault-tolerance
        if ( !$tout ) { $tout = DEFAULT_TO; }
 
        # Get the proxy
        # (curl option: -x <proxy>)
        if ( $proxy == PROXY )
                { $pstr="-x $HTTP_PROXY"; }
        elsif ( $proxy == SECURE_PROXY )
                { $pstr="-x $HTTPS_PROXY"; }
 
        # Are we following redirects?
        if ( $redir == REDIRECT ) {
                $other.=" -L";
        }
 
        # Use an alternate User-Agent string?
        if ( $user_agent ) {
                $other.=" -A \"$user_agent\"";
        }
 
        # Get the form submission (if any)
        if ( $submit && $method == GET_FORM ) {
                $form = "-d \"$submit\" -G";
        } elsif ( $submit ) {
                # This does graceful error handling
                # if $method is an invalid value
                $form = "-d \"$submit\"";
        }
 
        #Curl options:
        #=============
        #These are for all tests:
        # -s    <- Don't output a progress bar
        # -k    <- Trust all SSL certificates
        # -S    <- Shows an error message if curl fails
        #          ( 2>&1 : redirect stderr to stdout so we can
        #          actually get whatever error curl returns)
        #These are user-configurable:
        # -m NN <- Don't take longer than $tout seconds to run
        # $pstr <- The proxy option (or blank if no proxy used)
 
        if ( $redir == SEPREDIR ) {
                # -I <- Only display the headers (to get the "Location:" URL)
                $url=qx(curl -S -s -k -m $tout -b $COOKIE_DIR $other -I $pstr $form "$url"|grep Location|sed "s/Location: //" 2>&1);
                $url =~ s/\s+$//;       # Drop the CR/LF that ends the header line
                $ret=qx(curl -S -s -k -m $tout -b $COOKIE_DIR $other -L -x $HTTP_PROXY $form "$url" 2>&1);
                #print "*** $url\n$ret";
        } else {
                $ret=qx(curl -S -s -k -m $tout -b $COOKIE_DIR $other $pstr $form "$url" 2>&1);
        }
 
return $ret;
}
 
 
sub contFind
# Purpose:      Performs a positive content match (i.e. The content
#               must exist for the status to be green).
# Inputs:       $urlout = The URL content to check
#               $cont   = The PCRE content to match for
# Outputs:      PASS if $cont is found in the result of $url,
#               FAIL if $cont isn't found,
#               curlError output otherwise
{
        my($urlout, $cont) = @_;
        my $ceRet = curlError($urlout);
 
        if ( $ceRet == PASS ) {
                # If we're here, we at least didn't have any
                # problems retrieving the URL
                if ( $urlout =~ m/$cont/ ) { return PASS; }
                else { return FAIL; }
        }
 
return $ceRet;
}
 
 
sub contNofind
# Purpose:      Performs a reverse content match (i.e. The content
#               must not exist for the status to be green).
# Inputs:       $urlout = The URL content to check
#               $cont   = The PCRE content to match for
# Outputs:      PASS if $cont ISN'T found in the result of $url,
#               FAIL if $cont IS found,
#               curlError output otherwise
{
        my($urlout, $cont) = @_;
        my $ceRet = curlError($urlout);
 
        if ( $ceRet == PASS ) {
                # If we're here, we at least didn't have any
                # problems retrieving the URL
                if ( $urlout =~ m/$cont/ ) { return FAIL; }
                else { return PASS; }
        }
 
return $ceRet;
}
 
 
sub contPost
# Purpose:      Processes all steps of a post-style check, including any
#               multi-page testing.
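# Inputs:       $url      = The starting URL to check
#               $tout     = Timeout for retrieving the URL
#               $proxy    = The proxy to use (if any)
#               $redirect = Follow redirects?
#               $method   = Are we using the GET or POST method?
#               @cont     = The PCRE contents to match for, alternating with
#                           the form data to submit
#                           (<CONTENT>;<POST>;<CONTENT>[;<POST>;<CONTENT>...])
# Outputs:      PASS if every step matched,
#               BAD_ARGS if @cont has fewer than 3 fields or an even number,
#               the contFind() result if the first page fails, or that
#               result bitwiseEncode()d with the step number for later steps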
{
        my($url, $tout, $proxy, $redirect, $method, @cont) = @_;
        my $len = $#cont + 1;   # Size of @cont
        my $output = undef;
        my $contRet = undef;
        my $submission = undef;
        my $shiftCont;
        my $cont = undef;
 
        # --START-- DEBUG
        if ( $debug2 ) {
                my $temp;
                if ($method eq GET_FORM) { $temp="GET"; } else {$temp="POST";}
                print "method=$temp\n   url=$url\n";
                my $cnt = 0;
                foreach $cont (@cont) {
                        print "   cont #$cnt|$cont|\n";
                        $cnt = $cnt + 1;
                }
        }
        # --END-- DEBUG
 
        # Make sure we have enough arguments:
        # Minimum 3 arguments (<CONTENT>;<POST>;<CONTENT>),
        #   or any even number after that (;<POST>;<CONTENT>;...)
        if ( $len < 3 || ($len-3)%2 != 0 ) { return BAD_ARGS; }
 
        # Do the first-page content check:
        #print "level=1\n";
        #print "   url=$url\n";
        $output = showUrl($url, $tout, $proxy, $redirect);
        $output = escapeString($output);
        $shiftCont = shift @cont;
        $cont = escapeString($shiftCont);
        #print "   content=|$cont|\n\n";
        $contRet = contFind($output, $cont);
        if ( $contRet != PASS ) { return $contRet; };
 
        # Now hit submit on the first page, and check
        # the content of the resulting page:
        my $level = 2;
        #print "level=$level\n";
        $submission = shift @cont;
        #print "   url=$url\n   submit=|$submission|$method\n";
        $output = showUrl($url, $tout, $proxy, $redirect, $submission, $method);
        $output = escapeString($output);
        $shiftCont = shift @cont;
        #print "   content=|$shiftCont|\n\n";
        $cont = escapeString($shiftCont);
        $contRet = contFind($output, $cont);
        if ( $contRet != PASS ) { return bitwiseEncode($contRet, $level); }
 
        # Finally, loop through each additional (optional) page to check:
        for ( $level++, $submission=shift @cont ; $submission ;
        $submission=shift @cont, $level++ ) {
                #print "level=$level\n";
                #print "   submit=|$submission|$method\n";
                $output = showUrl($url, $tout, $proxy, $redirect, $submission, $method);
                $output = escapeString($output);
                $shiftCont = shift @cont;
                #print "   content=|$shiftCont|\n\n";
                $cont = escapeString($shiftCont);
                $contRet = contFind($output, $cont);
 
                if ( $contRet != PASS ) {
                        # No need to keep checking if we've failed
                        return bitwiseEncode($contRet, $level);
                }
        }
 
return $contRet;        # In theory, the only value should be PASS
}
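
The cont-check.cfg line format described in the script's header comments can be tried out in isolation. The following is a minimal Perl sketch, not code taken from urlplus.pl: parse_line() and parse_flags() are hypothetical helper names, and the layout of the returned hash is an assumption made for illustration. The tNN pattern is anchored so that one-digit timeouts such as t5 are accepted.

```perl
#!/usr/bin/perl
# Illustrative sketch only: parse_line() and parse_flags() are hypothetical
# helpers, not functions from urlplus.pl.
use strict;
use warnings;

# Split one cont-check.cfg line into its documented fields:
#   <TEST NAME>;<TYPE>;<OPTIONS>;<URL>[;<CONT_OPT>...]
sub parse_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*(#|$)/;          # skip comments and blank lines
    my ($name, $type, $opts, $url, @cont_opt) = split /;/, $line;
    return {
        name     => $name,
        type     => $type,
        url      => $url,
        flags    => parse_flags($opts),
        cont_opt => \@cont_opt,
    };
}

# Turn the comma-separated <OPTIONS> field into a hash of settings.
sub parse_flags {
    my ($opts) = @_;
    my %flags = (timeout => 15);             # 15 s mirrors DEFAULT_TO
    for my $flag (split /,/, $opts // '') {
        if    ($flag =~ /^t(\d+)$/) { $flags{timeout}    = $1 }
        elsif ($flag eq 'p')        { $flags{proxy}      = 'http' }
        elsif ($flag eq 'P')        { $flags{proxy}      = 'https' }
        elsif ($flag eq 'g')        { $flags{method}     = 'get' }
        elsif ($flag eq 'r')        { $flags{redirect}   = 'curl' }
        elsif ($flag eq 'R')        { $flags{redirect}   = 'new-session' }
        elsif ($flag eq 'u')        { $flags{user_agent} = 1 }
    }
    return \%flags;
}

# Example line taken from the EXAMPLES section of the script header.
my $check = parse_line(
    'my-contcheck2;find;t20,p;http://www.testsite.net/index.html;Good.Content');
printf "%s: type=%s timeout=%d proxy=%s pattern=%s\n",
    $check->{name}, $check->{type}, $check->{flags}{timeout},
    $check->{flags}{proxy}, $check->{cont_opt}[0];
# prints: my-contcheck2: type=find timeout=20 proxy=http pattern=Good.Content
```

Like the script itself, this sketch splits on every ";", so a URL, pattern, or POST string that contains a semicolon would be broken into extra fields.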
Known Bugs and Issues

None at this time
To Do

Need more testing on a larger variety of hosts, mostly for inclusion of new features.
Integrate user-provided changes.
This was supposed to be just a simple script, but it quickly grew to be fairly feature-rich. As such, the code could use a make-over, particularly to support some of the more popular requests.
- WWW::Mech isn't as efficient as 'curl', but I have a separate Perl program that mimics many of the features of URLPlus using a simpler coding style. I hope to essentially rewrite URLPlus in that simpler style.
Credits

Thanks to everyone who has made use of URLPlus and provided feedback on the Xymon mailing list. There are too many of you to name here, but suffice it to say, you are providing inspiration for me to continue the development of this add-on.