Fri Jan 7 23:25:57 PST 2005
Log Message:
-----------
Added Perl/Bash scripts for testing how Slony-I replication is running.
Also added documentation on these scripts to the Admin guide.
Modified Files:
--------------
slony1-engine/doc/adminguide:
maintenance.sgml (r1.7 -> r1.8)
Added Files:
-----------
slony1-engine/tools:
log.pm (r1.1)
run_rep_tests.sh (r1.1)
test_slony_replication.pl (r1.1)
-------------- next part --------------
Index: maintenance.sgml
===================================================================
RCS file: /usr/local/cvsroot/slony1/slony1-engine/doc/adminguide/maintenance.sgml,v
retrieving revision 1.7
retrieving revision 1.8
diff -Ldoc/adminguide/maintenance.sgml -Ldoc/adminguide/maintenance.sgml -u -w -r1.7 -r1.8
--- doc/adminguide/maintenance.sgml
+++ doc/adminguide/maintenance.sgml
@@ -43,7 +43,7 @@
<para>You might want to run them...</para>
</sect2>
-<sect2><title>Alternative to Watchdog: generate_syncs.sh</title>
+<sect2><title>Parallel to Watchdog: generate_syncs.sh</title>
<para>A new script for <productname>Slony-I</productname> 1.1 is
<application>generate_syncs.sh</application>, which addresses the following kind of
@@ -71,6 +71,74 @@
<para>Note that if <command>SYNC</command>s <emphasis>are</emphasis> running
regularly, this script won't bother doing anything.</para>
</sect2>
+
+<sect2><title>Replication Test Scripts</title>
+
+<para> The <filename>tools</filename> directory contains four scripts
+that may be used to monitor <productname>Slony-I</productname>
+instances:
+
+<itemizedlist>
+
+<listitem><para> <command>test_slony_replication.pl</command> is a Perl script
+to which you pass connection information for a
+<productname>Slony-I</productname> node. It then queries <envar>sl_path</envar> and other
+configuration on that node in order to determine the shape of the
+requested replication set.</para>
+
+<para> It then runs test inserts, updates, and deletes against a test
+table called <envar>slony_test</envar>, which is defined as follows and
+which needs to be added to the set of tables being replicated:
+
+<programlisting>
+CREATE TABLE slony_test (
+ description text,
+ mod_date timestamp with time zone,
+ "_Slony-I_testcluster_rowID" bigint DEFAULT nextval('"_testcluster".sl_rowid_seq'::text) NOT NULL
+);
+</programlisting></para>
+
+<para> The last column in that table is the row identifier that
+<productname>Slony-I</productname> adds to a table defined as one
+lacking a primary key...</para>
+
+<para> For each <productname>Slony-I</productname> node that is active
+for the requested replication set, this script writes a status line to
+a per-cluster log file,
+<filename><replaceable>clustername</replaceable>.facts.log</filename>;
+a sample line is shown after this list.</para>
+
+<para> There is an additional <option>finalquery</option> option that allows
+you to pass in an application-specific SQL query reporting something
+about the state of your application; the first two columns of its
+result are appended to each node's status line.</para></listitem>
+
+<listitem><para><command>log.pm</command> is a Perl module that manages logging
+for the Perl scripts.</para></listitem>
+
+<listitem><para><command>run_rep_tests.sh</command> is a <quote>wrapper</quote> script
+that runs <command>test_slony_replication.pl</command>.</para>
+
+<para> If you have several <productname>Slony-I</productname> clusters, you might
+set up configuration in this file to connect to all those clusters.</para></listitem>
+
+<listitem><para><command>nagios_slony_test.pl</command> is a script
+constructed to query the log files. The idea is that you run the
+replication tests every so often (we run them every 6 minutes), and
+a system monitoring tool such as <ulink
+url="http://www.nagios.org/"> <productname>Nagios</productname>
+</ulink> is then set up to use this script to query the state recorded
+in those logs.</para>
+
+<para> It seemed rather more efficient to have a
+<application>cron</application> job run the tests and have
+<productname>Nagios</productname> check the results than to have
+<productname>Nagios</productname> run the tests directly; an example
+cron entry is shown after this list. The tests can then exercise the
+whole <productname>Slony-I</productname> cluster at once rather than
+having <productname>Nagios</productname> invoke the updates over and
+over again.</para></listitem>
+
+</itemizedlist></para>
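+
+<para> The status line recorded for each node might look something like
+the following; the cluster name, node number, lag interval, and the
+trailing values drawn from the <option>finalquery</option> result will,
+of course, differ in your environment:
+
+<programlisting>
+Replication for Cluster: [testcluster] Node: [2] Behind by: [00:00:04] Last Log:[widget 1234] Created: [2005-01-07 22:15:03-08]
+</programlisting></para>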
+
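+<para> A <application>cron</application> entry that runs the tests
+every 6 minutes might look something like this; the path to
+<filename>run_rep_tests.sh</filename> will depend on where you have
+installed the tools:
+
+<programlisting>
+*/6 * * * *   /usr/local/slony1/tools/run_rep_tests.sh
+</programlisting></para>
+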
+</sect2>
+
<sect2><title> Log Files</title>
<para><link linkend="slon"> <application>slon</application></link> daemons
--- /dev/null
+++ tools/run_rep_tests.sh
@@ -0,0 +1,11 @@
+#!/usr/bin/bash
+# $Id: run_rep_tests.sh,v 1.1 2005/01/07 23:25:51 cbbrowne Exp $
+# Run Slony-I Replication Tests
+
+PASS="some secret"
+TESTQUERY="select name, created_on from some_table order by id desc limit 1;"
+PERL=/usr/bin/perl
+SCRIPT=test_slony_replication.pl
+HOST=localhost
+USER=postgres
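+# Every option below is passed straight through to the Perl script;
+# the replication set defaults to set 1 inside test_slony_replication.pl
+# unless a --set option is added here.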
+$PERL $SCRIPT --database=mydb --host=$HOST --user=$USER --cluster=test --port=5432 --password="$PASS" --finalquery="$TESTQUERY"
--- /dev/null
+++ tools/test_slony_replication.pl
@@ -0,0 +1,280 @@
+#!perl # -*- perl -*-
+# $Id: test_slony_replication.pl,v 1.1 2005/01/07 23:25:51 cbbrowne Exp $
+# Christopher Browne
+# Copyright 2004
+# Afilias Canada
+
+# This script, given DSN parameters to access a Slony-I cluster,
+# submits insert, update, and delete requests and sees how they
+# propagate through the system.
+
+use Pg;
+use Getopt::Long;
+#use strict;
+
+my $sleep_seconds = 4;
+
+my $goodopts = GetOptions("help", "database=s", "host=s", "user=s", "cluster=s",
+ "password=s", "port=s", "set=s", "finalquery=s");
+if (defined($opt_help)) {
+ show_usage();
+}
+my ($database,$user, $port, $cluster, $host, $password, $set, $finalquery);
+
+$database = $opt_database if (defined($opt_database));
+$port = 5432;
+$port = $opt_port if (defined($opt_port));
+$user = $opt_user if (defined($opt_user));
+$password = $opt_password if (defined($opt_password));
+$host = $opt_host if (defined($opt_host));
+$cluster = $opt_cluster if (defined($opt_cluster));
+$set = 1;
+$set = $opt_set if (defined($opt_set));
+$finalquery = $opt_finalquery if (defined($opt_finalquery));
+
+require 'log.pm';
+initialize_flog($cluster);
+
+#DBI: my $initialDSN = "dbi:Pg:dbname=$database;host=$host;port=$port";
+my $initialDSN = "dbname=$database host=$host port=$port";
+$initialDSN = $initialDSN . " password=$password" if defined($opt_password);
+
+print "DSN: $initialDSN\n";
+
+# DBI: my $dbh = DBI->connect($initialDSN, $user, $password,
+# {RaiseError => 0, PrintError => 0, AutoCommit => 1});
+# die "connect: $DBI::errstr" if ( !defined($dbh) || $DBI::err );
+my $dbh = Pg::connectdb($initialDSN);
+
+# Query to find the "master" node
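+# The origin ("master") of the set is the provider node that does not
+# itself receive the set from any other provider.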
+my $masterquery = "
+ select sub_provider
+ from _$cluster.sl_subscribe s1
+ where not exists (select * from _$cluster.sl_subscribe s2
+ where s2.sub_receiver = s1.sub_provider and
+ s1.sub_set = $set and s2.sub_set = $set and
+ s1.sub_active = 't' and s2.sub_active = 't')
+ and s1.sub_set = $set
+ group by sub_provider;
+";
+
+my $tq = $dbh->exec($masterquery);
+
+#print "Rummage for master - $masterquery\n";
+my $masternode;
+while (my @row = $tq->fetchrow) {
+ ($masternode) = @row;
+ print "Found master: $masternode\n";
+}
+
+print "Rummage for DSNs\n";
+# Query to find live DSNs
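+# Collect the conninfo for every node that either provides or actively
+# receives this set.  The hard-coded "not like" exclusion below is
+# specific to the environment this was written in; adjust or drop it
+# for your own cluster.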
+my $dsnsquery =
+"
+ select p.pa_server, p.pa_conninfo
+ from _$cluster.sl_path p
+ where exists (select * from _$cluster.sl_subscribe s where
+ s.sub_set = $set and
+ (s.sub_provider = p.pa_server or s.sub_receiver = p.pa_server) and
+ sub_active = 't')
+ and p.pa_conninfo not like '%32.85.68.246%'
+ group by pa_server, pa_conninfo;
+";
+
+print "Query:\n$dsnsquery\n";
+$tq = $dbh->exec($dsnsquery);
+my %DSN;
+while (my @row = $tq->fetchrow) {
+ my ($node, $dsn) = @row;
+ if (($node == $masternode) || check_node_for_subscription($node, $dsn)) {
+ $DSN{$node} = $dsn;
+ print "DSN[$node] = $dsn\n";
+ } else {
+ print "Skip node $node / DSN:$dsn - not yet subscribed\n";
+ }
+}
+#$tq->finish();
+
+my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
+my $time_date = sprintf ("%d-%.2d-%.2d %.2d:%.2d:%.2d", $year+1900, $mon+1, $mday, $hour, $min, $sec);
+
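+# The test cycle: INSERT a timestamped row on the master, wait a few
+# seconds and verify that it arrived on every subscriber, then UPDATE
+# it and verify again, and finally DELETE test rows more than a week old.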
+my $insert_text = "INSERT Replication Test";
+my $insert_query = "insert into slony_test (description, mod_date) values ('$insert_text', '$time_date');";
+submit_to_master($masternode, $insert_query, 1);
+
+sleep $sleep_seconds;
+
+my $select_query = "select description, mod_date from slony_test where mod_date = '$time_date' and description = '$insert_text'";
+foreach my $node (keys %DSN) {
+ if ($node != $masternode) {
+ my $result=submit_to_slave($node, $select_query, 1);
+ }
+}
+
+my $update_text = "UPDATE Replication Test";
+
+my $update_qry = "update slony_test set description = '$update_text' where description = '$insert_text' and mod_date = '$time_date'";
+submit_to_master($masternode, $update_qry, 1);
+
+sleep $sleep_seconds;
+
+$select_query = "select description, mod_date from slony_test where mod_date = '$time_date' and description = '$update_text'";
+
+foreach my $node (keys %DSN) {
+ if ($node != $masternode) {
+ my $result=submit_to_slave($node, $select_query, 1);
+ find_slave_backwardness($node);
+ }
+}
+
+# We delete old data...
+my $delete_qry = "delete from slony_test where mod_date < now() - '7 days'::interval;";
+submit_to_master($masternode, $delete_qry, -1);
+finalize_flog($cluster);
+
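+# Run $query against the master node.  A negative $expected_count marks
+# the final cleanup query, after which the master's own status line is
+# written to the fact log.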
+sub submit_to_master {
+ my ($node, $query, $expected_count) = @_;
+ my $dsn = $DSN{$node};
+ if ($dsn =~ /password=/) {
+ # got a password
+ } else {
+ $dsn .= " password=$password";
+ }
+ print "Connecting to master node $node - DSN:[$dsn]\n";
+ my $master = Pg::connectdb($dsn);
+ my $status_conn = $master->status;
+ if ($status_conn ne PGRES_CONNECTION_OK) {
+ alog($node, "", "Master connection Query failed - " . $master->errorMessage, $status_conn);
+ report_failed_conn($dsn);
+ return -1;
+ }
+ my $result = $master->exec($query);
+ if ($expected_count < 0) {
+ print "Final query to master\n";
+ if (defined($opt_finalquery)) {
+ my $lastlogquery = $opt_finalquery;
+ my $result = $master->exec($lastlogquery);
+ my @row = $result ->fetchrow;
+ my ($name, $creation) = @row;
+ log_fact($cluster, "Replication for Cluster: [$cluster] Node: [$node] Behind by: [00:00:00] Last Log:[$name] Created: [$creation]");
+ } else {
+ log_fact($cluster, "Replication for Cluster: [$cluster] Node: [$node] Behind by: [00:00:00]");
+ }
+ } elsif ($result->cmdTuples != $expected_count) {
+ alog($node, "", "Master $query failed - unexpected tuple count", $result->cmdTuples);
+ return -2;
+ } else {
+ alog($node, "", "Master $query succeeded", 0);
+ return 0;
+ }
+}
+
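+# Estimate how far behind a subscriber is from the age of the newest
+# slony_test row that has replicated to it, and record that in the
+# fact log.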
+sub find_slave_backwardness {
+ my ($node) = @_;
+ my $dsn = $DSN{$node};
+ if ($dsn =~ /password=/) {
+ # got a password
+ } else {
+ $dsn .= " password=$password";
+ }
+ print "Connecting to slave node $node - DSN:[$dsn]\n";
+ my $slave = Pg::connectdb($dsn);
+ my $status_conn = $slave->status;
+ if ($status_conn ne PGRES_CONNECTION_OK) {
+ alog($masternode, $node, "Connection Query failed!", -1);
+ return "Connection Failed";
+ }
+ my $behindnessquery = "select coalesce( date_trunc('seconds', now() - max(mod_date)), '999999h'::interval) from slony_test;";
+ my $result = $slave->exec($behindnessquery);
+ if ($result->resultStatus != PGRES_TUPLES_OK) {
+ alog($masternode, $node, "Slave query $behindnessquery failed", $result->resultStatus);
+ report_failed_conn($dsn);
+ log_fact($cluster, "Replication Test Failed: Cluster: [$cluster] Node: [$node]");
+ return "Query Failed";
+ } else {
+ my $age;
+ while (my @row = $result ->fetchrow) {
+ ($age) = @row;
+ }
+ if (defined($opt_finalquery)) {
+ my $lastlogquery = $opt_finalquery;
+ my $result = $slave->exec($lastlogquery);
+ while (my @row = $result ->fetchrow) {
+ my ($name, $creation) = @row;
+ log_fact($cluster, "Replication for Cluster: [$cluster] Node: [$node] Behind by: [$age] Last Log:[$name] Created: [$creation]");
+ return;
+ }
+ } else {
+ log_fact($cluster, "Replication for Cluster: [$cluster] Node: [$node] Behind by: [$age]");
+ return;
+ }
+ }
+}
+
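+# Run the verification SELECT against a subscriber; an unexpected tuple
+# count is taken to mean that the subscriber has not caught up yet.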
+sub submit_to_slave {
+ my ($node, $query, $expect_count)=@_;
+ my $dsn = $DSN{$node};
+ if ($dsn =~ /password=/) {
+ # got a password
+ } else {
+ $dsn .= " password=$password";
+ }
+ print "Connecting to slave node $node - DSN:[$dsn]\n";
+ my $slave = Pg::connectdb($dsn);
+ my $status_conn = $slave->status;
+ if ($status_conn ne PGRES_CONNECTION_OK) {
+ alog($masternode, $node, "Connection Query failed!", -1);
+ return "Connection Failed";
+ }
+ my $result = $slave->exec($query);
+ if ($result->resultStatus != PGRES_TUPLES_OK) {
+ alog($masternode, $node, "Slave query $query failed", $result->resultStatus);
+ report_failed_conn($dsn);
+ return "Query Failed";
+ } elsif ($result->ntuples != $expect_count) {
+ alog($masternode, $node, "Slave query failed - $query - slave is behind master", -3);
+ # This indicates that the slave is behind - issue message!
+ return "Slave Behind";
+ } else {
+ alog($masternode, $node, "Query $query succeeded", 0);
+ return "OK";
+ }
+}
+
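+# A node only takes part in the tests if it has an active subscription
+# to the requested set.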
+sub check_node_for_subscription {
+ my ($node, $dsn) = @_;
+ if ($dsn =~ /password=/) {
+ # got a password
+ } else {
+ $dsn .= " password=$password";
+ }
+ my $slave = Pg::connectdb($dsn);
+ my $status_conn = $slave->status;
+ if ($status_conn eq PGRES_CONNECTION_BAD) {
+ print "Status: PGRES_CONNECTION_BAD\n";
+ return;
+ }
+ my $livequery = qq{ select * from _$cluster.sl_subscribe s1 where sub_set = $set and sub_receiver = $node and sub_active;};
+ print "Query: $livequery\n";
+ my $result = $slave->exec($livequery);
+ while (my @row = $result->fetchrow) {
+ print "Found live set!\n";
+ return 1;
+ }
+ print "No live set found\n";
+}
+
+sub report_failed_conn {
+ my ($ci) = @_;
+ $ci =~ s/password=.*$//g;
+ print "Failure - connection to $ci\n";
+}
+
+sub show_usage {
+ my ($inerr) = @_;
+ if ($inerr) {
+ chomp $inerr;
+ print $inerr, "\n";
+ }
+ die "$0 --host --database --user --cluster --port=integer --password --set=integer --finalquery=SQLQUERY";
+}
--- /dev/null
+++ tools/log.pm
@@ -0,0 +1,49 @@
+#!/usr/bin/perl
+# $Id: log.pm,v 1.1 2005/01/07 23:25:51 cbbrowne Exp $
+
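+# Logging helpers shared by the replication test scripts.  Per-node
+# "fact" lines are collected in $LOGDIR/<cluster>.facts.log; the file is
+# built as a .tmp file and only renamed into place by finalize_flog(),
+# so readers never see a partially written log.  alog() records
+# per-query results, Common Log Format style, in replication.log.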
+$LOGDIR="/opt/logs/general";
+`mkdir -p $LOGDIR`;
+$HOST = `hostname`;
+$IDENT = "-";
+$USER = `whoami`;
+$SIZE = "-";
+$FLOG = "facts.log";
+chomp($HOST, $IDENT, $USER, $SIZE);
+my $sleep_seconds = 10;
+
+sub initialize_flog {
+ my ($cluster) = @_;
+ open(OUTPUT, ">$LOGDIR/$cluster.$FLOG.tmp");
+ print OUTPUT "# Initialize\n";
+ close OUTPUT;
+}
+sub log_fact {
+ my ($cluster, $fact) = @_;
+ chomp $fact;
+ open(OUTPUT, ">>$LOGDIR/$cluster.$FLOG.tmp");
+ print OUTPUT $fact, "\n";
+ close OUTPUT;
+}
+
+sub finalize_flog {
+ my ($cluster) = @_;
+ `mv $LOGDIR/$cluster.$FLOG.tmp $LOGDIR/$cluster.$FLOG`;
+}
+
+sub alog {
+ my ($source, $dest, $message, $rc) = @_;
+ chomp ($source, $dest, $message, $rc);
+
+ #print"Master DB: $source Slave DB: $dest Message: [$message] RC=$rc\n";
+ apache_log("replication.log", "Master DB: $source Slave DB: $dest Message: [$message]", $rc);
+}
+
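+# Append one Common Log Format style line to the named log file.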
+sub apache_log {
+ my ($logfile, $request, $status) = @_;
+ my $date = `date`;
+ chomp($request, $status, $date);
+ my $LOGENTRY ="$HOST $IDENT $USER [$date] \"$request\" $status $SIZE";
+ open(OUTPUT, ">>$LOGDIR/$logfile");
+ print OUTPUT $LOGENTRY, "\n";
+ close OUTPUT;
+}
+
+1; # a file pulled in via require must return a true value