[Slony1-general] Replication problem

Thu Dec 8 14:31:52 PST 2005

> On 12/7/2005 9:23 PM, Peter Davie wrote:
>> Hi All,
>>
>> Using Slony1 version 1.1.0 at a customer site, the customer has had the
>> slon daemons fall over on one of their slave servers (and didn't
>> notice!)  On restarting the slon processes, there is now an error being
>> generated because it is attempting to malloc memory to record all of the
>> outstanding transactions and the slon daemon is running out of memory.
>> Is there any way forward to resolve this, or will I just have to
>> uninstall the slave and resubscribe (which is my current plan)?
>
> This node must have been down for quite some time. A SYNC event in the
> remote_worker queue takes roughly 200 bytes. How many million events
> is this node behind? You could tell from looking at sl_status.
>
> And don't forget to VACUUM FULL ANALYZE that database after you've
> dropped that node.
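To put the ~200 bytes per SYNC event figure into numbers, here's a rough back-of-the-envelope calculation. The 200-byte constant is only the approximation quoted above, and the function names are made up for illustration. The "events behind" count would come from sl_status (the st_lag_num_events column, assuming your version's view has it).

```python
# Rough estimate of the memory slon needs to queue pending SYNC events.
# BYTES_PER_SYNC_EVENT is the ~200 byte approximation from the message above.
BYTES_PER_SYNC_EVENT = 200


def estimate_queue_bytes(events_behind: int,
                         bytes_per_event: int = BYTES_PER_SYNC_EVENT) -> int:
    """Return the approximate number of bytes needed to queue the events."""
    if events_behind < 0:
        raise ValueError("events_behind must be non-negative")
    return events_behind * bytes_per_event


def human_readable(num_bytes: int) -> str:
    """Format a byte count using binary units (B, KiB, MiB, GiB)."""
    value = float(num_bytes)
    for unit in ("B", "KiB", "MiB"):
        if value < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} GiB"


if __name__ == "__main__":
    for millions in (1, 5, 20):
        n = millions * 1_000_000
        print(f"{millions}M events behind -> ~{human_readable(estimate_queue_bytes(n))}")
```

At 200 bytes/event, 20 million events behind works out to 4,000,000,000 bytes, or about 3.7 GiB, just for the queue.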

Based on the symptoms, two things come to mind:

1.  Did the slon controlling the origin die?  That would be the classic
way for a SYNC to encompass a Very Long Period Of Time and hence a LOT of
transactions, since no SYNCs get generated while that slon is down.

There's a script in ~/tools that will generate SYNCs if you run it as a
cron job, even when the origin's slon isn't running.  We run this in
production to avoid exactly this problem...
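A simplified model of both points: each transaction is attributed to the first SYNC at or after its commit time, so if the origin's slon stops generating SYNCs, the next one swallows everything in between, while a cron job that creates a SYNC whenever the newest one is too old keeps each SYNC small. The function names, the timing model, and the one-minute cron schedule are all assumptions for illustration; this doesn't reproduce what the script in ~/tools actually does or how it talks to the database.

```python
from bisect import bisect_left
from collections import Counter


def transactions_per_sync(sync_times, txn_times):
    """Count how many transactions each SYNC covers (simplified model).

    A transaction committed at time t is attributed to the first SYNC at or
    after t; transactions after the last SYNC are counted under None.
    """
    syncs = sorted(sync_times)
    counts = Counter()
    for t in txn_times:
        i = bisect_left(syncs, t)
        counts[syncs[i] if i < len(syncs) else None] += 1
    return counts


def add_cron_syncs(slon_syncs, end, cron_every, max_age):
    """Return slon's SYNC times plus those a hypothetical cron job would add.

    The cron job fires at 0, cron_every, 2*cron_every, ... up to end, and
    creates a SYNC whenever there is none yet or the newest is >= max_age old.
    """
    syncs = sorted(slon_syncs)
    for now in range(0, end + 1, cron_every):
        newest = max((s for s in syncs if s <= now), default=None)
        if newest is None or now - newest >= max_age:
            syncs.append(now)
            syncs.sort()
    return syncs


if __name__ == "__main__":
    txns = range(0, 3600, 2)         # a transaction every 2 seconds for an hour
    slon_syncs = [10, 20, 30, 3600]  # origin's slon dies after t=30, restarts at t=3600
    without_cron = transactions_per_sync(slon_syncs, txns)
    with_cron = transactions_per_sync(add_cron_syncs(slon_syncs, 3600, 60, 60), txns)
    print(max(without_cron.values()), max(with_cron.values()))
```

In this toy run the SYNC generated when slon comes back covers 1784 of the 1800 transactions; with the cron job filling in a SYNC every minute, no SYNC covers more than 45.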

2.  Is it possible that the subscriber is trying to process a whole bunch
of SYNCs in one fell group?

If you start slon with the "-g 1" option, it'll process one SYNC at a
time, which should somewhat alleviate the problem.
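A simplified picture of what the -g limit does: pending SYNCs are processed in batches of at most N, so "-g 1" caps each batch at a single SYNC. The helpers below are hypothetical, slon's real grouping logic is more involved (it may vary the group size up to the limit), and the per-SYNC row counts are invented for illustration.

```python
def group_syncs(pending, max_group):
    """Split pending SYNCs into consecutive batches of at most max_group."""
    if max_group < 1:
        raise ValueError("max_group must be >= 1")
    return [pending[i:i + max_group] for i in range(0, len(pending), max_group)]


def peak_rows(rows_per_sync, max_group):
    """Largest number of log rows handled in a single batch."""
    return max(sum(batch) for batch in group_syncs(rows_per_sync, max_group))


if __name__ == "__main__":
    # Hypothetical backlog: mostly small SYNCs plus one huge one.
    backlog = [100, 100, 50_000, 100, 100]
    for g in (6, 2, 1):
        print(f"-g {g}: peak batch = {peak_rows(backlog, g)} rows")
```

Note that "-g 1" can't shrink a single huge SYNC: in this example the worst batch only drops from 50,400 rows to 50,000, which is why it merely "somewhat" alleviates the problem.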

3.  I guess there are 3 things :-).  If you do 2., then make sure there
are TWO indexes on sl_log_1, not just one.  (See
archives/FAQ/slony1_base.sql for details on the second index...)

If you have just one index on sl_log_1, it often doesn't get used,
because it doesn't have the right "shape" for the query run against
sl_log_1...
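One common way an index ends up with the wrong "shape" is column order: a B-tree on (a, b) can jump straight to the rows with a given a and a range of b, but a query that only constrains b has to check every entry. The sketch models indexes as sorted Python lists; it's a generic illustration, not Slony's actual sl_log_1 index definitions, its queries, or how PostgreSQL's planner makes its choice. The column names and data are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Toy "index" entries: (origin, xid) pairs for 3 origins x 1000 xids.
ENTRIES = [(origin, xid) for origin in (1, 2, 3) for xid in range(1, 1001)]
BY_ORIGIN_XID = sorted(ENTRIES)             # composite index on (origin, xid)
BY_XID = sorted(xid for _, xid in ENTRIES)  # single-column index on xid


def composite_with_origin(origin, lo, hi):
    """Query shape matches the index: origin = ? AND xid BETWEEN lo AND hi.

    Binary search finds both ends, so only matching entries are read.
    """
    start = bisect_left(BY_ORIGIN_XID, (origin, lo))
    stop = bisect_right(BY_ORIGIN_XID, (origin, hi))
    return BY_ORIGIN_XID[start:stop]


def composite_xid_only(lo, hi):
    """Query shape doesn't match: only xid is constrained.

    The leading column is unconstrained, so every entry is checked.
    Returns (matches, entries_checked).
    """
    matches = [e for e in BY_ORIGIN_XID if lo <= e[1] <= hi]
    return matches, len(BY_ORIGIN_XID)


def xid_index_only(lo, hi):
    """A second index led by xid serves the xid-only query directly.

    Returns (matching xids, entries read after the binary search).
    """
    start = bisect_left(BY_XID, lo)
    stop = bisect_right(BY_XID, hi)
    return BY_XID[start:stop], stop - start
```

Same data, same xid range 500..509: the composite index reads 10 entries when origin is given but checks all 3,000 when it isn't, while the xid-led index reads just the 30 matches.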