Christopher Browne cbbrowne at afilias.info
Mon Mar 7 09:31:55 PST 2011
On Mon, Mar 7, 2011 at 9:36 AM, Jeff Amiel <jamiel at istreamimaging.com> wrote:
> PostgreSQL 8.4.6 on x86_64-pc-solaris2.11, compiled by GCC gcc (GCC) 3.4.3
> (csl-sol210-3_4-20050802), 64-bit
> Slony version 2.0.6
>
> Re-set up replication from scratch with one master node and 2 subscriber
> nodes.
> All went as expected except that the second subscriber node never lists the
> replication set (nor does it replicate) and the sl_event table has a bunch
> of old entries in it from the time the replication was started.
> The first subscriber node is caught up and replicating nicely...sl_status
> shows it all caught up and up to date...but 21K lag events for the second
> subscriber (node 3).
>
> No errors in the logs....paths all look correct
>
> Any thoughts ?

I'll point you first to test_slony_state...

http://slony.info/documentation/2.0/monitoring.html

http://git.postgresql.org/gitweb?p=slony1-engine.git;a=blob;f=tools/test_slony_state-dbi.pl

This should give some interesting indications about cluster health,
and it sure sounds like something is off.

I'd be suspicious that the third node isn't communicating properly
with the other nodes.  test_slony_state may offer some ideas.

The thing to generally check is that event propagation is working.
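
As a rough sketch of that check (not a replacement for
test_slony_state), one could pull the lag counts out of the sl_status
view on the origin and flag any receiver that is far behind.  The
schema name "_mycluster", the helper name lagging_nodes, and the
threshold of 100 events are all placeholders I made up for
illustration; the sample rows only mimic the situation described
above.

```python
# Hypothetical cluster schema name; substitute "_" + your own cluster name.
CLUSTER_SCHEMA = "_mycluster"

# Query to run against the origin node (for example via psql).
# Assumes the sl_status view with its st_origin, st_received and
# st_lag_num_events columns.
LAG_QUERY = (
    "SELECT st_origin, st_received, st_lag_num_events "
    f"FROM {CLUSTER_SCHEMA}.sl_status"
)


def lagging_nodes(rows, max_lag_events=100):
    """Return (origin, receiver, lag) rows whose lag exceeds the threshold.

    rows is an iterable of (st_origin, st_received, st_lag_num_events)
    tuples, i.e. the result of LAG_QUERY. The threshold is arbitrary.
    """
    return [(origin, receiver, lag)
            for (origin, receiver, lag) in rows
            if lag > max_lag_events]


if __name__ == "__main__":
    # Illustrative rows only: node 2 caught up, node 3 about 21K events behind.
    sample = [(1, 2, 0), (1, 3, 21000)]
    for origin, receiver, lag in lagging_nodes(sample):
        print(f"node {receiver} is {lag} events behind node {origin}")
```

If a receiver shows up here while the others sit at zero, that points
at event propagation to that one node rather than at the origin.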

One place I could see things getting "stuck" is that if there's an
outrageous number of outstanding events, the query that pulls events
might time out before completing, with the consequence that the node
can *never* get through event evaluation, and will never catch up.  A
browse of bugzilla isn't very helpful; the condition that I recall
may not have been documented too clearly :-(.

