Christopher Browne cbbrowne
Wed Feb 22 13:34:38 PST 2006
lkv <lkv at defx.org> writes:
>
> I'm observing something odd: on the master I see a huge chunk of
> st_lag_num_events (~190000), and on the same slave that's behind the
> number is about 3200 and growing. I increased the sync interval to
> about 60 seconds and increased the group max size to about 1000.
>
> These numbers are not going down, even after I restarted the slon daemons.
>
> However, these sync events are still outstanding. Is there any way I can
> flush them? And can anyone give me a hint what might have caused that?
> (all the outstanding events are of type SYNC)

"Flushing" outstanding SYNC events would amount to abandonment of the
node.  You MUST apply ALL (that is, "each and every") SYNC event from
the origin to each of the subscribers.  So unless you're planning on
abandoning replication of that node, you should put "flush" thoughts
out of your head.

The *real* question is whether or not that slave node is actually
processing SYNC events at all.  
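
One way to answer that is to sample the sl_status view on the origin
twice, some time apart, and see whether st_last_received for the
subscriber moves at all.  The following is only a rough Python sketch
of that comparison; the function name and the dict-shaped samples are
hypothetical, not part of Slony-I, and fetching the rows from sl_status
in your cluster's schema is left to whatever client you use.

```python
def sync_progress(before, after):
    """Classify a subscriber from two sl_status samples taken some time apart.

    Each sample is a dict holding two sl_status columns for one subscriber:
      'st_last_received'  - last event the subscriber has confirmed
      'st_lag_num_events' - number of events it is behind the origin
    Hypothetical helper for illustration; not part of Slony-I.
    """
    confirmed = after["st_last_received"] - before["st_last_received"]
    lag_change = after["st_lag_num_events"] - before["st_lag_num_events"]
    if confirmed <= 0:
        # No new events confirmed between samples: nothing is being applied.
        return "stalled"
    if lag_change > 0:
        # Events are applied, but the origin generates them faster.
        return "falling behind"
    return "keeping up"
```

A result of "stalled" points at the "not processing at all" case;
"falling behind" points at the "not fast enough" case.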

If it isn't, then you need to know why it isn't; that's some sort of
problem that is altogether preventing replication.  We don't know what
that problem is...  Perhaps some authentication problem is preventing
connections from going through; that's easy to fix, if you know that's
the case.

If SYNC events *are* being processed, but not fast enough, there are a
few possible causes that might be resolvable (e.g., pg_listener has
grown big, which is solved by doing a VACUUM FULL on it and verifying
that it is small again).  It is also possible that the node is too far
behind, in which case it might be more effective to abandon the node
and recreate it from scratch.
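
To get a feel for whether a node is "too far behind", a
back-of-envelope estimate can help.  The Python sketch below is only an
illustration: the function is hypothetical, and the two rates are
numbers you would have to measure yourself (for instance, from the
logs); they are assumptions here, not values Slony-I reports directly.

```python
def catch_up_seconds(lag_events, applied_per_sec, generated_per_sec):
    """Estimate the seconds until the lag reaches zero.

    lag_events        - current backlog (e.g. st_lag_num_events)
    applied_per_sec   - events/second the subscriber is applying (measured)
    generated_per_sec - events/second the origin is generating (measured)
    Returns None when the subscriber is not gaining on the origin.
    Hypothetical helper for illustration; not part of Slony-I.
    """
    net = applied_per_sec - generated_per_sec
    if net <= 0:
        # The backlog is constant or growing; it will never drain.
        return None
    return lag_events / net
```

If the answer comes back as None, or as a span of time longer than a
fresh subscription would take, recreating the node is likely the
better choice.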

But you'll need to dig into the logs in order to figure any of this
out.
-- 
(format nil "~S@~S" "cbbrowne" "ca.afilias.info")
<http://dba2.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)


