[Slony1-general] delete behavior

Sat Jul 29 21:23:06 PDT 2006

> Hi Everyone,
>
> Has any thought had gone into improving performance with large table
> deletes?  Let me explain our daily operation:
>
> 1)  Run throughout the day accumulating data in several tables that are
> replicated for disaster recovery.
> 2)  After the system shuts down, the data is archived.
> 3)  The tables are then cleared out, i.e. delete from table;
>
> Currently, because of delete being triggered on each row, you end up with
> a ton of individual row deletes that need to propagate.  Is there any way
> to detect a large group of deletes from the sl_log tables and possibly
> propagate the delete with a range?
>
> Thanks!

That sounds like a pretty neat idea.  I wouldn't think we could readily
get *ALL* the way there, e.g. for a deletion to become, on the subscriber,
"delete from foo where id in ([full set of values]);"

Supposing we had a single column primary key, it ought to be possible for
the slon to recognize:

 - I'm issuing "delete from foo where id = x;"

 - Hmm.  The next statement is "delete from foo where id = y;"

 - That could be folded into the statement:
     delete from foo where id in (x, y);

 - If we keep deferring, we might discover that the next statement is
     delete from foo where id = z, which could turn that into...

     delete from foo where id in (x,y,z);

Note that none of that requires *enormous* intelligence, and it does NOT
assume that x, y, and z are consecutive values, just that they were
requested consecutively.

Consider also that the "win" gets smaller each time.  Grouping 2 instead
of 1 should be HIGHLY helpful.  The "wins" would get more marginal each
time; for a big benefit, you have to double again.  This would suggest
that if the set size were allowed to "max out" at 32 or 64, that's
probably harvesting most of the possible benefit from the technique.

Alas, I'm not sure I know how to do this if a composite primary key is in
use, or, more particularly, if the big series of OR clauses that would
result will "play well" the way the "in clause" should.

e.g...  Is it any use to submit:

  delete from foo where (id1 = x1 and id2 = y1) or (id1 = x2 and id2 = y2)
or (id1 = x3 and id2 = y3) or ...
???

This would be a fairly significant restructuring of the update submission
loop, as this needs to defer all deletes.  Are you keen on prototyping
that?