[Slony1-general] how to troubleshoot slony replication lag?

Mon Nov 15 05:41:17 PST 2010

On Mon, 15 Nov 2010, Aleksey Tsalolikhin wrote:

> On Sun, Nov 14, 2010 at 4:41 PM, Aleksey Tsalolikhin
> <atsaloli.tech at gmail.com> wrote:
>> Dear Steve,
>>
>> Thanks for your reply, here is the information you requested.
>
> message did not go through due to large size (over 40 KB)
>
> Here's a few seconds of master/slave logs (from the same time window):
>
> http://pastebin.com/JxmrPfdY
>
> looks like the slave is not handling the events from the master?

I think your slave is trying to process a very large SYNC from the master. 
We see the remote worker fetch groups of 100 rows, and the remote worker 
appears to be processing them, based on lines like

DEBUG4 remoteWorkerThread_1: returning lines to pool

How many rows would be involved in the largest transaction you (or your 
application) would have executed on the master?
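
One rough way to see whether the remote worker is still making progress 
is to tally lines like the DEBUG4 one quoted above over a stretch of the 
slave's slon log. The following is only a minimal sketch in Python: the 
function name count_pool_returns is made up for illustration, and it 
assumes nothing about the log format beyond the message text shown in 
this thread. It counts messages; it does not tell you how many rows were 
applied.

```python
import re
from collections import Counter

# Matches the slon log message quoted in this thread. The number after
# "remoteWorkerThread_" is captured so counts can be kept per worker.
POOL_LINE = re.compile(
    r"DEBUG4 remoteWorkerThread_(\d+): returning lines to pool"
)


def count_pool_returns(log_lines):
    """Count 'returning lines to pool' messages per remote worker thread.

    log_lines is any iterable of strings, for example an open log file.
    Returns a Counter keyed by the integer that follows
    'remoteWorkerThread_' in the message.
    """
    counts = Counter()
    for line in log_lines:
        match = POOL_LINE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Passing an open file object for the slave's log (whatever path your slon 
writes to) and comparing the totals from two runs a few minutes apart 
should show whether the count is still climbing.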

>
> at this point replication lag is 35+ hours and the sl_log_1 and
> sl_log_2 total 43 GB (on an originally 23 GB database, now 65 GB in
> size including sl_log_1 and sl_log_2)

The fact that the sl_log tables grow to such a large size implies that you 
have a very active database.

>
> we are going to take an emergency maintenance window to upgrade from
> slony-I 1.2.20 to 1.2.21 and resync, see how it looks then.
>

I don't think the upgrade is going to help you per se.  Resyncing will at 
least get your replication caught up again.

> Aleksey