During catch up of a major backlog, the receiving slon slows down further and further, sometimes to a point where it cannot keep up any longer with the rate of new changes being logged.
This is caused by a problem in the query from the remoteWorker selecting from sl_log. The lower boundary for the log tuples is the min_xid of the first snapshot OR the xip-vector of the first snapshot. That OR clause causes the planner to fail to create a lower scankey for the index scan against sl_log. It therefore always scans from the beginning with a filter condition. At some point this becomes so expensive that it even switches to a seqscan.
The attempted fix for this will be to extract those transactions, that were in progress at the first snapshot and are no longer in progress at the second, out into another UNION. This will allow the plan to use a lower bound scankey for the bulk of the log rows and pull the few transactions from the current OR clause in with a nested loop of index scans.
I have combined the EXPLAIN_INTERVAL feature with the new log selection query into one commit for review here:
The new query seems to be completely independent of the sl_log_* size with respect to the delay for the first row. Since the plan ALWAYS requires a Sort step at the top and that sort does not include wide rows, we can assume that the entire result set at that moment is in local memory of the backend unless there is a mismatch between workmem and max sync group settings.
The delay for the first row is now consistently only a small fraction of the entire processing time for a sync group. This completely obsoletes any need for sync pipelining.
As a side note, I was curious if Londiste/SkyTools is suffering from the same problem and was surprised that their logic already resembles what I came up with. They have a stored PL/pgSQL procedure that assembles the query while we do it in C, but the query structure is very similar.
Relevant Skytools code:
Per the verbal discussion, I'd be keen to see an alternative implementation that pushes generation of the query(queries?) into a pl/pgsql function.
That would eliminate a lot of "string munging" on the C side (and be analagous to what the Skytools code does), which has a fair likelihood of shortening the resultant code and making it more maintainable.
It's possible it won't be an improvement, but it sure would be nice to try to head down the "more maintainable" path.
(In reply to comment #1)
> I have combined the EXPLAIN_INTERVAL feature with the new log selection query
> into one commit for review here:
My initial review is here
1) This switch statement , switch(provider->log_status)
a) Inside of it we reach 5 levels of indentation. I think the code
will be easier to read/follow if we some of this into a separate
b) From what I can tell we have a switch(provier->log_status) with
the same code for 3 case statements (0,2,3) and those are the only
case statements in the switch block that ends at line 4144.
There is then a second switch(provider->log_status) with three more
statements of the same form.
if(provider->log_status==0 || provider->log_status==2 || provider->log_status==3)
would be a lot easier to read/follow. This might even make the
formatting easier to read to avoid (a)
However I thik you can go further and create a function that takes in it
an argument (1 or 2) along with a pointer to provider_query and
appends the parts to it using either sl_log_%d (sl_log_1 or sl_log_2)
as required. There is a lot of code duplication here that I think can
Once the duplication has been eliminated I'll take another look.
I agree with Chris that a plpgsql implementation might be worth trying.
I've created a new version of this patch that eliminates the duplicate code by using a for() loop that processes both log tables.
(In reply to comment #5)
> I've created a new version of this patch that eliminates the duplicate code by
> using a for() loop that processes both log tables.
I ran my suite of regression tests against this branch, successfully.
I was *not* testing for backlog; didn't impose heavy load.
Rather, I was looking to see if any regressions were visible, and, happily, none were found.
I've looked at the 3a7def328cd0c7b9fef956a4d27f12213639de49 version.
The code is much clearer compared to the previous version.
The only issue I've been able to see is
provider_query should be freed in the if that returns on tuples not okay. The same thing a few lines below in the second if on PQntuples if block.
Actually no, the provider_query points to the dstring helper_query inside the provider struct, which is initialized when the provider struct is allocated and only reset as long as that exists. Freeing that dstring would cause a dangling pointer problem.
Version 2 of the patch including new EXPLAIN feature committed to HEAD.