Jacques Caron jc at oxado.com
Mon Nov 26 08:35:30 PST 2007
At 16:55 26/11/2007, Christopher Browne wrote:
>I also think it would be a necessary thing to, *before* adding it,
>have an outline of a test case where it allows things to work where,
>at present, things are broken.

At the moment, locking order is purely random in many cases (in the 
"EXECUTE SCRIPT" case in particular), as there is no ORDER BY clause 
whatsoever. And since it takes a very a heavy-duty lock on all 
tables, anything doing a select on two or more tables while the locks 
are being taken has a 50% chance of generating a deadlock.

A simple test case is probably: define a set with two tables and a 
few lines in each, replicate them, run two scripts that continuously 
do "BEGIN; SELECT * FROM table1; SELECT * FROM table2; COMMIT" and 
"BEGIN; SELECT * FROM table2; SELECT * FROM table1; COMMIT", and 
attempt an EXECUTE SCRIPT to add a column to table1. Repeat until it 
fails. If you don't get a deadlock at some point you're pretty darn 
lucky (or I'm pretty much unlucky, and so are a few other people who 
posted the same kind of issues on the list).

The patch only makes the order be at least defined (tab_id), or 
user-definable (tab_lock_order), so that one can control it, and make 
sure it is compatible with the applications lock order (the above 
test would still fail, but the two selects in reverse order are just 
there because one can't know which of the two orders fails: only one 
is actually needed to generate the deadlock).

>The downside to it, as I see it, is that it adds a fair bit of
>additional complexity when we don't have clear cases (heading back to
>the "test case" thing) where it is known to help.

I'm not sure I understand how a bunch of "ORDER BY"s in existing 
queries represent "a fair bit of additional complexity"?

>My sense is that this may be fixing a really narrow problem, and it
>would be unfortunate to add the fair bit of complexity to the codebase
>to solve a truly rare problem.  I may be wrong; that's why "broad
>consensus" represents more than just two opinions :-).

I think anyone who has attempted to use EXECUTE SCRIPT on a fairly 
busy database (even if read-only) can testify that it's not an easy task.

One thing that would actually add more complexity, but would probably 
be quite useful: add options to EXECUTE SCRIPT to only lock (and 
un-trigger/re-trigger) specific tables (i.e. the ones that are 
actually affected, but manually specified in the options of EXECUTE 
SCRIPT). No sense locking everybody, removing and restoring trigger 
all over the place, if you only change one table with no side effects 
(no constraints, triggers, etc.).

As I understand it, the real issue is just that HEAD is geared 
towards Postgresql 8.3 which has some feature that allows slony to be 
"more intelligent" regarding how schema updates are performed (though 
I haven't quite looked at how this is done yet, I really should), and 
Jan just is a bit concerned about changing the current release branch 
which he considers "stable" and hence should not get additional features.

My concern is that I feel Postgresql 8.3 is still weeks/months away 
from being stable enough to put in critical production environments 
(those where you actually need Slony), and that not doing any further 
developments other than bug fixes in the current release branch may 
be problematic for many people in the same situation.

Also, I consider adding missing "order by"s a bug fix rather than a 
new feature :-)

Jacques.



More information about the Slony1-general mailing list