Christopher Browne cbbrowne at ca.afilias.info
Mon Mar 24 08:01:47 PDT 2008
"Henry" <henry at zen.co.za> writes:
> Hello once again,
>
> Let's say you have many slaves being rep'd from a master.  Sometimes, one
> of these slaves will fall behind in a big way.  Even stopping all activity
> on all systems to allow it to catch up doesn't resolve the problem.
>
> My question is the following:  from an admin point of view in trying to
> resolve this kind of issue, what slony tables should I poke around in (and
> what flag/s should I take note of), and what errors/footprints should I
> look for in the slony logs which might be contributing to the node in
> question never catching up?
>
> My (horribly noob) solution so far has been to stop everything, drop
> replication systems from all nodes, and start again (a process which can
> throw a week in the drain).
>
> Pointers and/or suggestions would be welcomed.  I've plodded through the
> docs, but the obvious isn't jumping out at me, and my stupid approach to
> solving the problem is wasting eons of time each time this occurs.

Step 0.

Run test_slony_state.pl / test_slony_state-dbi.pl (depending on
whether you prefer the Perl Pg module or DBI with DBD::Pg).

That does quite a bit of analysis that should be helpful in figuring
out where problems may lie.

[This is step 0, not step 1, because Best Practices indicate that you
should set up every replication cluster to run
test_slony_state-dbi.pl/test_slony_state.pl frequently, likely
hourly.]

If you read through that script and the tests it runs to check the
health of the cluster, that should be at least somewhat helpful in
figuring out some of the things that can go wrong.
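As a starting point for poking around by hand, Slony-I also exposes a
per-cluster sl_status view that summarizes how far each subscriber has
fallen behind its origin. A minimal sketch, assuming a cluster named
"mycluster" (so the schema is _mycluster) and a database "mydb" (both
placeholders; substitute your own):

```shell
# Hedged sketch: quick replication-lag check against Slony-I's sl_status view.
# "_mycluster" and "mydb" are assumptions; adjust to your cluster/database.
psql -d mydb -c "
  SELECT st_origin,            -- node the data originates from
         st_received,          -- subscriber node being measured
         st_lag_num_events,    -- events not yet confirmed by that subscriber
         st_lag_time           -- wall-clock lag behind the origin
    FROM _mycluster.sl_status
   ORDER BY st_lag_num_events DESC;"
```

A node that never catches up will typically show st_lag_num_events
growing steadily from one check to the next, which tells you which
node to focus on before digging into the slon logs for that node.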
-- 
let name="cbbrowne" and tld="acm.org" in String.concat "@" [name;tld];;
http://cbbrowne.com/info/finances.html
Would-be National Mottos:
Switzerland: "You wouldn't hit a country that's neutral, would you?"

