(Newbie) RE: [Slony1-general] Diagram of internal workings

Thu Sep 30 02:28:20 PDT 2004

> Hi All,
>
> The overview is great!
>
> I am evaluating Slony as a potential replication solution for our
> production environment (Linux / Postgres 7.4.2) and I've been creeping
> around the past couple of weeks reading all the postings.  I have been
> successful (I think) in configuring our environment for a simple master
> --> single slave replication scenario.  We would ultimately want to be
> able to add slave nodes and possibly cascade the replication for disaster
> recovery purposes; most likely replicate to a backup machine also.
>
> Where I'm running into a little trouble is actually having a process for
> starting and monitoring the replication.  Someone had mentioned in a
> previous posting a request for Nagios monitoring, which (coincidentally)
> is what we're using here :-).  If I understand correctly, it is really
> just a matter of returning one of three values indicating service status:
> ok, warning, or fatal results.

I have a trio of scripts that we use to let Nagios monitor a bunch of
Slony-I instances:

1.  There's a process that, given the identity of one Slony node, finds
all the active nodes in that node's set and then injects a test update
that should propagate to all of them.  That runs every few minutes, and
generates a report consisting of a line for each node.

2.  There's a "controller" script that runs #1 several times, as we have
several Slony-I clusters.

3.  Lastly, there's a Nagios-oriented script where you specify the cluster
and the node, and it pulls back results for that node.  You'd set up one
of these for each node you want Nagios to monitor.
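
A minimal sketch of what a script like #3 might look like, in Python.  The
report format (one line per node: node id, then seconds since the last test
update arrived), the thresholds, and every function name here are
assumptions for illustration, not the actual scripts.  The only fixed part
is the Nagios plugin convention: exit status 0 for OK, 1 for WARNING, 2 for
CRITICAL, 3 for UNKNOWN, with a one-line status message on stdout.

```python
# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_report(text):
    """Parse hypothetical report lines of the form '<node_id> <lag_seconds>'."""
    lags = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        try:
            lags[int(fields[0])] = float(fields[1])
        except ValueError:
            continue  # skip lines whose fields are not numeric
    return lags


def check_node(lags, node, warn=300.0, crit=900.0):
    """Return (exit_code, message) for one node; thresholds are assumed."""
    if node not in lags:
        return UNKNOWN, f"SLONY UNKNOWN - no report line for node {node}"
    lag = lags[node]
    if lag >= crit:
        return CRITICAL, f"SLONY CRITICAL - node {node} is {lag:.0f}s behind"
    if lag >= warn:
        return WARNING, f"SLONY WARNING - node {node} is {lag:.0f}s behind"
    return OK, f"SLONY OK - node {node} is {lag:.0f}s behind"


def main(argv):
    """Hypothetical usage: check_slony.py <report_file> <node_id>"""
    if len(argv) != 3:
        print("SLONY UNKNOWN - usage: check_slony.py <report_file> <node_id>")
        return UNKNOWN
    try:
        with open(argv[1]) as fh:
            lags = parse_report(fh.read())
        node = int(argv[2])
    except (OSError, ValueError) as exc:
        print(f"SLONY UNKNOWN - {exc}")
        return UNKNOWN
    code, message = check_node(lags, node)
    print(message)
    return code
```

A real plugin would finish with `sys.exit(main(sys.argv))` so that Nagios
sees the return value as the process exit status.  Reporting UNKNOWN when
the node has no report line keeps a broken report generator from being
mistaken for a healthy replica.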

#1 and #3 would be of general interest; I'll see about adding this to CVS
some time soon.  I have two challenges there, namely to

 a) Clean up the code a bit;

 b) Come up with some alternative to some local application dependencies.

    Specifically, since we're doing domain management, it seems a useful
    thing to report back the latest domain created by our application,
    as that boosts Nagios users' confidence in the output, and gives
    Nagios something honestly useful to display.  For someone at a
    library, it would be logical to report back something
    about the latest lending transaction.  My query won't be your
    query.
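
One way such a dependency could be removed is to treat the site-specific
query as a configuration value rather than hard-coding it.  The sketch
below is an assumption about how that might look, not what the existing
scripts do: `latest_detail`, its `label` parameter, and the convention
that the query returns a single row are all hypothetical.  It accepts any
Python DB-API 2.0 connection.

```python
def latest_detail(conn, query, label="latest entry"):
    """Run a site-specific query and format its first column for Nagios.

    conn is any DB-API 2.0 connection; query is supplied by the local
    site and is expected to return at most one row (assumed convention).
    """
    cur = conn.cursor()
    try:
        cur.execute(query)
        row = cur.fetchone()
    finally:
        cur.close()
    if row is None:
        return f"{label}: none found"
    return f"{label}: {row[0]}"
```

The returned string could be appended to the plugin's status line, so a
domain registry would show its latest domain and a library its latest
lending transaction, each using its own query.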

I'm busy on 1.0.3 matters 'til Friday; hopefully I can touch on this
then, and perhaps get it into 1.0.3.

> Can anyone give me some guidance in configuring the environment from an
> operations point-of-view?

Possibly I can draft a new staff member to do some secretarial work in
documenting some of what we're doing.

I suspect the right answer is for someone to put together an outline for
a document on this and turn it into a Wiki.  That way people can add
questions that need answers, and can contribute tidbits rather than
feeling forced to write an 8 page report that they haven't time for right
now.

I've got a wiki running on my firewall at home that I could point people
to, though that, being an old P200, is probably not the most robust
location for it.  (And the portsentry's pretty paranoid, so you can easily
lock yourself out :-).)