[Slony1-general] Feature Idea: improve performance when a large sl_log backlog exists

Tue Nov 23 06:48:33 PST 2010

On Tue, Nov 23, 2010 at 9:31 AM, Steve Singer <ssinger at ca.afilias.info> wrote:
> Slony can get into a state where it can't keep up/catch up with
> replication because the sl_log table is so large.
>
>
> Does this problem bite people often enough in the real world for us to
> devote effort to fixing?
>

It used to happen to me a lot when I had my origin running on spinning
media.  Ever since I moved to an SSD, it doesn't really happen.  At
worst, when I do a large delete, I fall behind by a few minutes, but it
catches up quickly.  For me, it didn't even require taking the DB down
for any extended period; just running a large update or delete that
touched many, many rows (i.e., generated a lot of events in sl_log) could
send the system into a tailspin that would take hours or even days
(until we hit a weekend) to recover from.

I am not sure it was caused by the log being too big, because
sometimes reindexing the tables on the replica would clear up the
backlog quickly too.  But I may be barking up the wrong tree.