Christopher Browne cbbrowne at ca.afilias.info
Fri Oct 26 07:44:33 PDT 2007
David Fetter <david at fetter.org> writes:
> On Fri, Oct 26, 2007 at 10:04:31AM -0400, Christopher Browne wrote:
>> Simon Riggs <simon at 2ndquadrant.com> writes:
>> > On Thu, 2007-10-25 at 19:59 -0700, David Fetter wrote:
>> >
>> >> Does Slony-I 2-to-be use INSERT ... VALUES (...), ..., (...); to
>> >> pass INSERTs?  It seems to me this would be quicker than a flock
>> >> of single-tuple INSERTs, and since we're having flag day with
>> >> 8.3, we're guaranteed the feature is there. :)
>> >
>> > It's not easily possible to do this because each invocation of the
>> > log trigger has a different set of values and they can't see each
>> > other's values.
>> >
>> > It could be possible to do this on the apply side by saving up
>> > data from log table row to row until we can issue a single
>> > mega-INSERT. But if we were going to do that, why not initiate a
>> > COPY, which would handle the bulk cases even better? The
>> > difficulty would be in deciding what the logic should be to invoke
>> > the special case SQL construction.
>> >
>> > Another thought: I notice that Slony writes out all of the column
>> > names for each insert, e.g. (col1, col2, col3) VALUES (x, y, z)
>> 
>> I put together a rough proposal a while back as to handling this
>> more efficiently.
>> 
>> The idea would be to detect consecutive updates to the same table,
>> and rewrite INSERTs to use multiple VALUE subclauses, as well as to
>> detect multiple UPDATEs involving the same SET clause, and fold
>> together the WHERE clauses.  DELETE has an obvious optimization too.
>
> You mean piling together the PKs and doing a DELETE .. WHERE (pk,
> fields) = ANY(...) ?

Right.
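
To make the shape concrete (an untested sketch with a made-up table,
and assuming a single-column key; multi-column keys would need
row-wise comparison), instead of

    DELETE FROM foo WHERE id = 17;
    DELETE FROM foo WHERE id = 18;
    DELETE FROM foo WHERE id = 19;

the apply side would emit

    DELETE FROM foo WHERE id = ANY ('{17,18,19}');

and, in the same spirit, fold a run of single-tuple INSERTs against
one table into a single multi-VALUES statement (8.2+ syntax, so safe
given the 8.3 flag day):

    INSERT INTO foo (id, val) VALUES (17, 'a'), (18, 'b'), (19, 'c');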

>> It adds a fair bit of complexity, and probably won't help OLTP
>> traffic much (where writes to different tables are heavily
>> interleaved).  I don't think it's the "big win," though.
>> 
>> > It would be straightforward to remove the (col1, col2, col3) text from
>> > each INSERT statement since that is optional. That would reduce the
>> > overhead of each INSERT row and reduce the parsing time on the other
>> > side also.
>> 
>> Very unsafe.  What if the subscriber keeps the columns in a
>> different order?  Do we need to go back and ask for the "reordering
>> columns" feature that periodically pops up?

FYI, there is a further downside to the removal: it breaks any setup
where you "hack" a subscriber to carry an extra column.  It also
breaks the case where you use log shipping to build a temporal
database, where the tables carry additional temporal columns.
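
A contrived example (hypothetical tables) of why the column list
matters: suppose the origin has

    CREATE TABLE t1 (id integer, val text);

while a subscriber keeps the columns in a different order, or carries
an extra temporal column somewhere other than the very end:

    CREATE TABLE t1 (val text, id integer);
    -- or:
    CREATE TABLE t1 (id integer,
                     valid_from timestamptz DEFAULT now(),
                     val text);

Then

    INSERT INTO t1 (id, val) VALUES (1, 'x');

keeps working everywhere, while the positional form

    INSERT INTO t1 VALUES (1, 'x');

binds values by position and errors out (or, with compatible types,
silently puts values in the wrong columns) as soon as the layouts
diverge.  Only an extra column tacked onto the very end, with a
default, happens to survive the positional form.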

>> I see a much bigger win in Jan's idea to use COPY to get sl_log_n
>> data to the subscriber As Fast As Parsing Allows, and then use rules
>> on sl_log_n to generate INSERT/UPDATE/DELETE requests on the
>> subscriber to do the work.  That would take a lot of the load off
>> the provider, and COPY seems likely to be way faster than other
>> rewritings.
>
> That's a very interesting idea, and kinda orthogonal to the
> INSERT/UPDATE/DELETE speedups above.  How big a change would it be?

Fairly substantial.

It requires a pretty sophisticated stored procedure to run on the
subscriber that does the remapping of sl_log_n data into actual
updates.
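
As a gedanken-level sketch of that remapping (a plain rule can't
EXECUTE dynamic SQL, so it would more likely end up as a trigger; and
note the real sl_log_1 carries a log_tableid that has to be joined
against sl_table, where I'm assuming a hypothetical log_tablename
column to keep it short):

    -- Sketch only: assumes log_cmddata holds the per-command SQL
    -- fragment and log_cmdtype is one of I/U/D, as in sl_log_1.
    CREATE FUNCTION apply_log_row() RETURNS trigger AS $$
    BEGIN
        IF NEW.log_cmdtype = 'I' THEN
            EXECUTE 'INSERT INTO ' || NEW.log_tablename
                 || ' ' || NEW.log_cmddata;
        ELSIF NEW.log_cmdtype = 'U' THEN
            EXECUTE 'UPDATE ' || NEW.log_tablename
                 || ' SET ' || NEW.log_cmddata;
        ELSIF NEW.log_cmdtype = 'D' THEN
            EXECUTE 'DELETE FROM ' || NEW.log_tablename
                 || ' WHERE ' || NEW.log_cmddata;
        END IF;
        RETURN NULL;  -- discard the log row; we only want the effect
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER apply_log BEFORE INSERT ON sl_log_1
        FOR EACH ROW EXECUTE PROCEDURE apply_log_row();

The provider side then only has to COPY the log out, and all of the
per-row SQL construction happens locally on the subscriber.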

It would have the interesting consequence that the configuration
controlling table names lives on the subscriber, not the provider,
which might give some extra flexibility there.

Log shipping nodes would need to be aware of tables (e.g., each would
need to track sl_table).

At this point, it's just a "gedanken experiment."
-- 
"cbbrowne","@","cbbrowne.com"
http://linuxfinances.info/info/sap.html
"The Linux  philosophy is laugh in  the face of  danger.  Oops.  Wrong
One.  'Do it yourself.'  That's it."  -- Linus Torvalds

