Christopher Browne cbbrowne at ca.afilias.info
Tue Jul 3 14:52:33 PDT 2007
Jeff Davis wrote:
> On Tue, 2007-07-03 at 12:33 -0400, Christopher Browne wrote:
>   
>> I'll see about doing an experiment on this to see if, for the DELETE
>> case, it seems to actually help.  It may be that the performance
>> effects are small to none, so that the added code complication isn't
>> worthwhile.
>>
>>     
>
> In a simple test I ran, DELETE of the entire 5M record table using
> sequential scan was MUCH faster (9.41s) than 5M individual DELETE
> statements in a single transaction (552.49s).
>
> 5M records is small enough to fit into memory. I expect the difference
> would be even greater when the index and table can't both fit into
> memory and the deletes are distributed randomly over the table.
>
> I think it is worth exploring ways of solving this problem. Right now
> slony is great for small inserts, updates, and deletes. But any large
> update/delete on the origin can cause the subscribers to fall way
> behind.
>   
The "handy" alternative test (which would be a good "smoke test" for 
whether it's worth bothering to put *any* effort into this) would be to 
try to do some partial groupings to see if they'd help.

Thus, if the whole delete covers the range id = 0 through id = 
5000000, then things to try would be (each case involving 1 transaction):

1.  Delete with the 5M individual DELETE statements.  (Which you found 
took 552.49s)
2.  Delete with 50K DELETE statements, each having a WHERE clause with 
100 items in it.
3.  Delete with 5K DELETE statements, each having a WHERE clause with 1K 
items in it.

If 2. or 3. come *way* closer to 9.41s, then it may be worth exploring 
the complexity of folding together adjacent deletes on the same table.  
There could also be a case made for trying sequential versus random 
orderings (in the former case, each DELETE statement takes on a 
specific contiguous range of ids, whereas in the latter, each selects 
ids more or less at random); a sketch of the random variant is further down.

I'll see about constructing a series of tests like this; they won't be 
running before I send this :-).  If you have time to generate 2. and/or 
3. on your system and get timings there, I'd be much obliged.

;;;; Here's some relevant code :-)
;;;; Emits one transaction of 50K DELETE statements, each covering a
;;;; contiguous block of 100 ids (case 2. above).
(format t "begin;~%")
(loop for i from 0 to 49999
      do (format t "delete from foo where ")
         ;; the first 99 ids of the block, each followed by "or"...
         (loop for j from 0 to 98
               do (format t "id=~D or " (+ j (* i 100))))
         ;; ...and the 100th id closes off the statement
         (format t "id=~D;~%" (+ 99 (* i 100))))
(format t "commit;~%")

I don't think we can save the full 543 seconds, but if we could recover 
a good portion of it, it would be worth pursuing...
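
As for the random ordering, one cheap way to get it (again, an untested 
sketch with placeholder names) is to shuffle the full id list first and 
then batch it up, so that each DELETE hits rows scattered all over the 
table:

;;;; Untested sketch: shuffle the ids, then emit them in batches, so each
;;;; DELETE statement touches more or less random rows.
(defun emit-shuffled-deletes (batch-size &optional (total 5000000))
  (let ((ids (make-array total)))
    (dotimes (i total) (setf (aref ids i) i))
    ;; Fisher-Yates shuffle of the id array
    (loop for i from (1- total) downto 1
          do (rotatef (aref ids i) (aref ids (random (1+ i)))))
    (format t "begin;~%")
    (loop for start from 0 below total by batch-size
          do (format t "delete from foo where ")
             (loop for k from start below (+ start batch-size -1)
                   do (format t "id=~D or " (aref ids k)))
             (format t "id=~D;~%" (aref ids (+ start batch-size -1))))
    (format t "commit;~%")))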

>> Remember, the thought we started with was:
>>    "What if we could do something that would make mass operations less
>>     expensive?"
>>
>> I don't want to introduce anything that can materially increase
>> processing costs.
>>
>> The more intelligent we try to get, the more expensive the
>> logtrigger() function gets, and if the price is high enough, then we
>> gain nothing.
>>
>> The only "win" I see is if we can opportunistically join some
>> statements together.  If we have to make the log trigger function
>> universally *WAY* more expensive, well, that's a performance loss :-(.
>>     
>
> Is there any way that we could detect (even without 100% confidence)
> that a transaction is "big" and we should spend more effort trying to
> optimize it?
>
>   
Regrettably, no.  Switching over to a sort of log trigger that 
supports "doing something smarter" requires adding logic that will have 
some (definitely non-zero) cost every time it *isn't* worthwhile.  And 
the usual sorts of OLTP activity will fall into the category where 
performance would suffer.
> I know that's a big project (in the general case), but there might be
> some simple things that would work.
>   
Well, the cases I suggested (2. and 3.) would fall into the category of 
"simple things".

