Glyn Astill glynastill at yahoo.co.uk
Thu Aug 4 04:52:04 PDT 2016
> From: Tory M Blue <tmblue at gmail.com>
>To: Jan Wieck <jan at wi3ck.info> 
>Cc: slony <slony1-general at lists.slony.info>
>Sent: Wednesday, 3 August 2016, 21:45
>Subject: Re: [Slony1-general] Slony Tuning, heavy slon backup every AM
>
>On Wed, Aug 3, 2016 at 12:30 PM, Jan Wieck <jan at wi3ck.info> wrote:
>>
>>On Wed, Aug 3, 2016 at 11:14 AM, Tory M Blue <tmblue at gmail.com> wrote:
>>
>>>Hey folks
>>>
>>>Running into some more issues. I'm not finding a definitive tuning guide, nor can I tell why replication is backing up and why it takes so long to catch up (so long is relative).
>>>
>>>Running Slony 2.2.3 (but this has been an issue for a long time).
>>>
>>>Here's an image to show you where I am this AM, but this is a constant every AM. I have large updates, about 2.5 million of them, between 12am and 4am; the job is done, but it will take slon a couple of hours to catch up.
>>
>>Are there long-running transactions that start around the time the sl_log starts growing? Any long-running transaction prevents a log switch from finishing.
>>
>>Jan
>
>Hi Jan,
>
>There really are not many long queries during the big import, but things start taking longer once the slon logs have 10m+ rows in them.
>
>I actually log any transaction that takes over a second, and nothing stands out, nothing that runs all that long. During this period I move my reporting to point to the master because of the delay on the standby, so there are some longer report queries, but the longest is 16 minutes.
>

Do you see any long-running COPY statements from the slon on the subscriber?  If so, one avenue to investigate is whether any replicated tables have sub-optimal indexes on them.
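
As a rough sketch (the one minute threshold is arbitrary), something like this on the subscriber will show anything, including the slon COPY, that has been running for a while; it assumes PostgreSQL 9.2+ where pg_stat_activity has the state and query columns:

    -- Non-idle sessions whose current statement has been running for over a minute
    SELECT pid,
           now() - xact_start  AS xact_age,
           now() - query_start AS query_age,
           state,
           left(query, 60)     AS query
    FROM pg_stat_activity
    WHERE state <> 'idle'
      AND now() - query_start > interval '1 minute'
    ORDER BY query_age DESC;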

I mention it purely because I've been bitten by this myself and seen very similar symptoms to what you're reporting here.  There's every chance this is way off the mark, but I think it's worth mentioning.

The issue I saw came down to a bunch of pretty ugly indexes created by an application; the result was inaccurate selectivity estimates causing the planner to choose one of the ugly indexes instead of the primary key when applying the changes on the subscriber.  Multivariate planner statistics, which PostgreSQL doesn't yet have, might have mitigated this, but that's beside the point.
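
If you want to sanity check which indexes are actually getting hit on the subscriber, something along these lines against the statistics views can help (the table name is just a placeholder):

    -- Per-index usage counts for one replicated table on the subscriber; a heavily
    -- used non-PK index on a table slon updates row by row is a hint the planner
    -- may be avoiding the primary key.
    SELECT relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
    FROM pg_stat_user_indexes
    WHERE relname = 'my_replicated_table'
    ORDER BY idx_scan DESC;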


For example, I'd see something like 200,000 records updated by a single UPDATE statement on the origin in about 20 seconds, but when applied as 200,000 separate UPDATE statements on the subscriber it took more like 200 minutes.  A single-row update using an index scan on the ugly index would take 62ms vs 0.1ms on the primary key, but outwardly all I saw until I enabled auto_explain was a long-running COPY.
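
To give an idea of what I mean by auto_explain, here's a minimal sketch for a single session with illustrative settings; to capture the statements slon itself applies you would load the module for slon's connections instead, e.g. via shared_preload_libraries in postgresql.conf with the same two settings:

    LOAD 'auto_explain';                        -- superuser, current session only
    SET auto_explain.log_min_duration = '50ms'; -- log plans for statements slower than 50ms
    SET auto_explain.log_analyze = on;          -- include actual timings and row counts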

Glyn