Cédric Villemain cedric.villemain at dalibo.com
Fri Oct 26 07:33:27 PDT 2007
Christopher Browne wrote:
> Simon Riggs <simon at 2ndquadrant.com> writes:
>   
>> On Thu, 2007-10-25 at 19:59 -0700, David Fetter wrote:
>>
>>     
>>> Does Slony-I 2-to-be use INSERT ... VALUES (...), ..., (...); to pass
>>> INSERTs?  It seems to me this would be quicker than a flock of
>>> single-tuple INSERTs, and since we're having flag day with 8.3, we're
>>> guaranteed the feature is there. :)
>>>       
>> It's not easily possible to do this because each invocation of the log
>> trigger has a different set of values and they can't see each others
>> values.
>>
>> It could be possible to do this on the apply side by saving up data from
>> log table row to row until we can issue a single mega-INSERT. But if we
>> were going to do that, why not initiate a COPY, which would handle the
>> bulk cases even better? The difficulty would be in deciding what the
>> logic should be to invoke the special case SQL construction.
>>
>> Another thought: I notice that Slony writes out all of the column names
>> for each insert e.g. (col1, col2, col3) VALUES (x, y, z)
>>     
>
> I put together a rough proposal a while back as to handling this more
> efficiently.
>
> The idea would be to detect consecutive updates to the same table, and
> rewrite INSERTs to use multiple VALUE subclauses, as well as to detect
> multiple UPDATEs involving the same SET clause, and fold together the
> WHERE clauses.  DELETE has an obvious optimization too.
>
> It adds a fair bit of complexity, and probably won't help OLTP traffic
> terribly much (where it is common to intersperse tables heavily).  I
> don't think it's the "big win," though.
>
>   
>> It would be straightforward to remove the (col1, col2, col3) text from
>> each INSERT statement since that is optional. That would reduce the
>> overhead of each INSERT row and reduce the parsing time on the other
>> side also.
>>     
>
> Very unsafe.  What if the subscriber decides to consider the columns
> to be in a different order?  Do we need to go back and ask for the
> "reordering columns" feature that periodically pops up?
>
> I see a much bigger win in Jan's idea to use COPY to get sl_log_n data
> to the subscriber As Fast As Parsing Allows, and then use rules on
> sl_log_n to generate INSERT/UPDATE/DELETE requests on the subscriber
> to do the work.  That would take a lot of the load off the provider,
> and COPY seems likely to be way faster than other rewritings.
>   
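To make the folding idea above concrete, here is a rough sketch (not
Slony code; the log-row format and table names are made up for
illustration) of collapsing consecutive INSERT log rows against the
same table into one multi-row INSERT ... VALUES statement:

```python
def fold_inserts(log_rows):
    """Collapse consecutive INSERT log rows for the same table into
    one multi-row INSERT ... VALUES (...), (...), ... statement.

    Each log row is (table_name, values_sql), where values_sql is the
    already-quoted tuple text, e.g. "(1, 'a')".  This is a hypothetical
    format, for illustration only.
    """
    statements = []
    cur_table, cur_values = None, []
    for table, values in log_rows:
        if table != cur_table and cur_values:
            # Table changed: flush the batch accumulated so far.
            statements.append("INSERT INTO %s VALUES %s;"
                              % (cur_table, ", ".join(cur_values)))
            cur_values = []
        cur_table = table
        cur_values.append(values)
    if cur_values:
        statements.append("INSERT INTO %s VALUES %s;"
                          % (cur_table, ", ".join(cur_values)))
    return statements

# Three log rows become two statements: t1 gets one two-tuple INSERT.
print(fold_inserts([("t1", "(1, 'a')"), ("t1", "(2, 'b')"),
                    ("t2", "(9)")]))
```

As noted above, this only pays off when consecutive log rows hit the
same table, which interleaved OLTP traffic rarely guarantees.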

AFAICS, what really damages replication is big updates: a lot of rows
get added to sl_log, and there is some seqscan=off in the Slony code
(in src/slon/remote_worker.c) which burns the server. (I have also
tried changing the cursor size, i.e. fetching more or fewer lines at
a time; is there some doc about that?)

Would it be possible to detect when an UPDATE/INSERT/DELETE changes
more than XX rows, and then apply a different strategy? (INSERT ...
VALUES (), (), () is not as fast as COPY, but it is several times
quicker than a separate INSERT ... VALUES () per row.)



More information about the Slony1-general mailing list