Tory M Blue tmblue at gmail.com
Thu Aug 4 12:02:58 PDT 2016
On Thu, Aug 4, 2016 at 10:29 AM, Tory M Blue <tmblue at gmail.com> wrote:
> On Thu, Aug 4, 2016 at 8:39 AM, Jan Wieck <jan at wi3ck.info> wrote:
>> Another problem we recently ran into is that after bulk operations, the
>> queries selecting the log data on the data provider can degrade. A
>> workaround
>> in one case was to ALTER the slony user and set work_mem to 512MB. The
>> work_mem of 100MB, used by the application, was causing the scans on
>> the sl_log table to become sequential scans.
>>
>> The problem is amplified if the tables are spread across many sets.
>> Merging the sets into few larger ones causes fewer scans.
>>
>>
>> Regards, Jan
>>
>
> Unfortunately, I've never been able to migrate to a slony user, so
> slony runs as postgres and my settings are
>
> work_mem = 2GB
>
> maintenance_work_mem = 2GB
>
> So I don't think I can test this theory.
>
> Also of note, we have 3 sets
>
> Same today; I just don't see any long-running queries that would leave
> slony no time to truncate the table, so I'm not clear what is preventing
> the sl_log table from being truncated. (It really sounds like something
> Jan has run into before: just a huge table, and for some reason the
> truncate can't complete before another job or request comes in?) Even
> now the sl_log table is over 12 million rows and I expect it to be able
> to truncate/clear in the next 30 minutes or so, but why? The big bulk
> operation finishes at 5am; not sure why it takes almost another 6 hours
> to truncate that table.
>
> Thanks again!
> Tory
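
(For reference, I think the workaround Jan describes above is just a
per-role setting, so on the provider it would be something along the lines
of the following -- "slony" being whatever role the slons connect as, and
the setting only kicking in for new sessions:

  ALTER ROLE slony SET work_mem = '512MB';

In my case the slons connect as postgres and work_mem is already 2GB, so
that particular knob doesn't look like my problem.)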

Also interesting today: it's getting worse :) but with the same pattern,
other than it's taking longer, and today I saw something really weird.

slon ran a truncate but only cleared 50% of sl_log_1. How is that
possible? I mean, a truncate is a truncate; I have no idea how this table
went from 14M rows to 7M.
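
For what it's worth, this is roughly how I'm checking the counts and which
log table Slony currently thinks is active (the cluster schema name
"_clsdb" is a guess here -- substitute whatever your cluster is called; if
I remember right, sl_log_status is 0/1 for log_1/log_2 active and 2/3
while a switch is in progress):

  SELECT (SELECT count(*) FROM "_clsdb".sl_log_1)        AS log_1_rows,
         (SELECT count(*) FROM "_clsdb".sl_log_2)        AS log_2_rows,
         (SELECT last_value FROM "_clsdb".sl_log_status) AS log_status;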

2016-08-04 10:43:03 PDT clsdb postgres 10.13.200.231(37864) 58874
2016-08-04 10:43:03.237 PDTNOTICE:  Slony-I: could not lock sl_log_2 -
sl_log_2 not truncated

2016-08-04 10:53:43 PDT clsdb postgres 10.13.200.231(37864) 58874
2016-08-04 10:53:43.573 PDTNOTICE:  Slony-I: log switch to sl_log_1
complete - truncate sl_log_2

2016-08-04 11:05:41 PDT clsdb postgres 10.13.200.231(37864) 58874
2016-08-04 11:05:41.750 PDTNOTICE:  Slony-I: Logswitch to sl_log_2 initiated

*2016-08-04 11:16:54 PDT clsdb postgres 10.13.200.231(37864) 58874
2016-08-04 11:16:54.783 PDTNOTICE:  Slony-I: log switch to sl_log_2
complete - truncate sl_log_1   <--- didn't actually "truncate" the table;
there are still 7.5M rows, and yet it's moving ahead with a logswitch to
sl_log_1 (which again was just supposedly truncated: it had 14 million rows
but still has 7.5 million)..*

2016-08-04 11:29:06 PDT clsdb postgres 10.13.200.231(37864) 58874
2016-08-04 11:29:06.332 PDTNOTICE:  Slony-I: Logswitch to sl_log_1 initiated

2016-08-04 11:40:43 PDT clsdb postgres 10.13.200.231(37864) 58874
2016-08-04 11:40:43.048 PDTNOTICE:  Slony-I: log switch to sl_log_1
complete - truncate sl_log_2

2016-08-04 11:51:44 PDT clsdb postgres 10.13.200.231(37864) 58874
2016-08-04 11:51:44.660 PDTNOTICE:  Slony-I: Logswitch to sl_log_2 initiated
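
Given that "could not lock sl_log_2" notice, next time I'll try to catch
whatever is holding or waiting on locks against the log tables when the
switch fires -- roughly like this (assuming 9.2+ pg_stat_activity column
names):

  -- anything holding or waiting on locks on the sl_log tables
  SELECT l.pid, l.mode, l.granted, a.state, a.xact_start, a.query
    FROM pg_locks l
    JOIN pg_class c ON c.oid = l.relation
    JOIN pg_stat_activity a ON a.pid = l.pid
   WHERE c.relname IN ('sl_log_1', 'sl_log_2');

  -- and any long-running transactions that could be pinning things
  SELECT pid, usename, state, xact_start,
         now() - xact_start AS xact_age, query
    FROM pg_stat_activity
   WHERE xact_start IS NOT NULL
   ORDER BY xact_start;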

Something else I'm noticing, and it may not be related: my standby node has
lots of locks and long-running reports, and I'm wondering if that is causing
some of the backup seen on the master.  So many questions still :))
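
If the standby (subscriber) really is the bottleneck, I'd expect it to show
up as lag on the origin -- something like this, again guessing "_clsdb" for
the cluster schema:

  SELECT st_origin, st_received, st_lag_num_events, st_lag_time
    FROM "_clsdb".sl_status;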

Thanks for listening and being my wall to bounce stuff off of

Tory