Bug 167

Summary: Slow down of slon with large backlog
Product: Slony-I
Reporter: Jan Wieck <jan>
Component: slon
Assignee: Jan Wieck <jan>
Status: RESOLVED FIXED
Severity: enhancement
CC: jan.varga, slony1-bugs
Priority: low
Version: devel
Hardware: All
OS: All
URL: https://github.com/wieck/slony1-engine/commit/cfdf5c511e390775fdeb43dc37b07dffc61a3b20
Bug Depends on:
Bug Blocks: 175

Description Jan Wieck 2010-12-02 08:25:37 PST
During catch up of a major backlog, the receiving slon progressively slows
down, sometimes to the point where it can no longer keep up with the rate of
new changes being logged.

This is caused by a problem in the remoteWorker's query selecting from sl_log.
The lower boundary for the log tuples is the min_xid of the first snapshot OR
membership in the xip-vector of the first snapshot. That OR clause prevents
the planner from creating a lower-bound scankey for the index scan against
sl_log. It therefore always scans from the beginning with a filter condition.
At some point this becomes so expensive that it even switches to a seqscan.

The attempted fix for this is to extract those transactions that were in
progress at the first snapshot and are no longer in progress at the second
into a separate UNION branch. This will allow the plan to use a lower-bound
scankey for the bulk of the log rows and to pull the few transactions from the
current OR clause in with a nested loop of index scans.
Comment 1 Jan Wieck 2010-12-08 07:20:44 PST
I have combined the EXPLAIN_INTERVAL feature with the new log selection query
into one commit for review here:

https://github.com/wieck/slony1-engine/commit/cfdf5c511e390775fdeb43dc37b07dffc61a3b20

The new query seems to be completely independent of the sl_log_* size with
respect to the delay for the first row. Since the plan ALWAYS requires a Sort
step at the top and that sort does not include wide rows, we can assume that
the entire result set at that moment is in local memory of the backend, unless
there is a mismatch between the work_mem and max sync group size settings.

The delay for the first row is now consistently only a small fraction of the
entire processing time for a sync group. This completely obviates any need for
sync pipelining.

As a side note, I was curious if Londiste/SkyTools is suffering from the same
problem and was surprised that their logic already resembles what I came up
with. They have a stored PL/pgSQL procedure that assembles the query while we
do it in C, but the query structure is very similar.
Comment 2 Christopher Browne 2010-12-08 12:26:17 PST
Relevant Skytools code:
https://github.com/cbbrowne/skytools-dev/blob/master/sql/pgq/functions/pgq.batch_event_sql.sql
Comment 3 Christopher Browne 2010-12-08 15:08:01 PST
Per the verbal discussion, I'd be keen to see an alternative implementation
that pushes generation of the query(queries?) into a pl/pgsql function.  

That would eliminate a lot of "string munging" on the C side (and be analogous
to what the Skytools code does), which has a fair likelihood of shortening the
resultant code and making it more maintainable.

It's possible it won't be an improvement, but it sure would be nice to try to
head down the "more maintainable" path.
Comment 4 Steve Singer 2010-12-14 10:56:08 PST
(In reply to comment #1)
> I have combined the EXPLAIN_INTERVAL feature with the new log selection query
> into one commit for review here:
> 
> https://github.com/wieck/slony1-engine/commit/cfdf5c511e390775fdeb43dc37b07dffc61a3b20
> 

My initial review is here

remote_worker.c:4008
1) This switch statement, switch(provider->log_status)
a) Inside of it we reach 5 levels of indentation. I think the code
will be easier to read/follow if we move some of this into a separate
function.

b) From what I can tell, we have a switch(provider->log_status) with
the same code for 3 case statements (0, 2, 3), and those are the only
case statements in the switch block that ends at line 4144.

There is then a second switch(provider->log_status) with three more
statements of the same form.

if (provider->log_status == 0 || provider->log_status == 2 ||
    provider->log_status == 3)
would be a lot easier to read/follow. This might even simplify the
formatting enough to avoid (a).

However, I think you can go further and create a function that takes
an argument (1 or 2) along with a pointer to provider_query and
appends the parts to it, using sl_log_%d (sl_log_1 or sl_log_2)
as required. There is a lot of code duplication here that I think can
be eliminated.

Once the duplication has been eliminated I'll take another look.

I agree with Chris that a plpgsql implementation might be worth trying.
Comment 5 Jan Wieck 2010-12-22 13:33:13 PST
I've created a new version of this patch that eliminates the duplicate code by
using a for() loop that processes both log tables.

https://github.com/wieck/slony1-engine/tree/explain_and_bug167_v2
Comment 6 Christopher Browne 2011-01-04 08:13:04 PST
(In reply to comment #5)
> I've created a new version of this patch that eliminates the duplicate code by
> using a for() loop that processes both log tables.
> 
> https://github.com/wieck/slony1-engine/tree/explain_and_bug167_v2

I ran my suite of regression tests against this branch, successfully.

I was *not* testing for backlog; I didn't impose heavy load.

Rather, I was looking to see if any regressions were visible, and, happily,
none were found.
Comment 7 Steve Singer 2011-01-04 12:50:56 PST
I've looked at the 3a7def328cd0c7b9fef956a4d27f12213639de49 version.


The code is much clearer compared to the previous version.


The only issue I've been able to see is:

remote_worker.c:3874
provider_query should be freed in the if block that returns when the tuple
status is not okay. The same applies a few lines below, in the second if
block on PQntuples.
Comment 8 Jan Wieck 2011-01-27 05:46:29 PST
Actually, no: provider_query points to the dstring helper_query inside the
provider struct, which is initialized when the provider struct is allocated
and is only reset for as long as that struct exists. Freeing that dstring
would cause a dangling pointer problem.
Comment 9 Jan Wieck 2011-01-27 06:53:08 PST
Version 2 of the patch including new EXPLAIN feature committed to HEAD.