[Slony1-general] Patch to have a defined table lock order

Mon Nov 26 09:31:04 PST 2007

Jacques Caron <jc at oxado.com> writes:
> At 16:55 26/11/2007, Christopher Browne wrote:
>>I also think it would be a necessary thing to, *before* adding it,
>>have an outline of a test case where it allows things to work where,
>>at present, things are broken.
>
> At the moment, locking order is purely random in many cases (in the
> "EXECUTE SCRIPT" case in particular), as there is no ORDER BY clause
> whatsoever. And since it takes a very a heavy-duty lock on all tables,
> anything doing a select on two or more tables while the locks are
> being taken has a 50% chance of generating a deadlock.

I changed this in CVS HEAD last month; I had indeed noticed the
absence of an ORDER BY clause...

http://lists.slony.info/pipermail/slony1-commit/2007-October/002059.html

http://main.slony.info/viewcvs/viewvc.cgi/slony1-engine/src/backend/slony1_funcs.sql?r1=1.121&r2=1.122

> A simple test case is probably: define a set with two tables and a few
> lines in each, replicate them, run two scripts that continuously do
> "BEGIN; SELECT * FROM table1; SELECT * FROM table2; COMMIT" and
> "BEGIN; SELECT * FROM table2; SELECT * FROM table1; COMMIT", and
> attempt an EXECUTE SCRIPT to add a column to table1. Repeat until it
> fails. If you don't get a deadlock at some point you're pretty darn
> lucky (or I'm pretty much unlucky, and so are a few other people who
> posted the same kind of issues on the list).
>
> The patch only makes the order be at least defined (tab_id), or
> user-definable (tab_lock_order), so that one can control it, and make
> sure it is compatible with the applications lock order (the above test
> would still fail, but the two selects in reverse order are just there
> because one can't know which of the two orders fails: only one is
> actually needed to generate the deadlock).

I have no disagreements with this...

>>The downside to it, as I see it, is that it adds a fair bit of
>>additional complexity when we don't have clear cases (heading back to
>>the "test case" thing) where it is known to help.
>
> I'm not sure I understand how a bunch of "ORDER BY"s in existing
> queries represent "a fair bit of additional complexity"?

Ah, but the thing is, the patch is only about 1/3 done, possibly less
than that.

-> The new column needs to be added to src/backend/slony1_base.sql so
   that a new instance gets created with the new column.

-> It's not obvious how this column is supposed to get configured.
   We presumably need to extend the slonik grammar to accomodate it in
   STORE TABLE.

-> There isn't any existing Slonik command to *alter* table
   configuration; at present, there isn't anything meaningful to do
   manipulate a table once it is replicating aside from moving the
   table from one set to another.  This seems to require a new command
   altogether.

-> It may be arguable that this could be manipulated totally via
   hand-crafted SQL, and therefore no modifications to Slonik are
   necessary.

-> Documentation?

   How the facility is intended to be used needs to be documented
   somewhere.

To some degree, I'm nit-picking on the src/backend/slony1_base.sql
part, but we can't just let half-done things into the code base.
Features do have to be sufficiently complete.  That's part of where
"peer review" comes in; the above 5 things are 5 places where I see
something missing that may be necessary.

- This is a "user-serviced" set of functionality, so we definitely do
  need to expose the functionality to users.  Gotta figure out how.

- We need to figure out whether slonik changes are necessary or not.

- And the functionality won't be visible to users until it is added to
  documentation.  I'll be "anal retentive" and claim that it's not
  done until it's documented.

>>My sense is that this may be fixing a really narrow problem, and it
>>would be unfortunate to add the fair bit of complexity to the codebase
>>to solve a truly rare problem.  I may be wrong; that's why "broad
>>consensus" represents more than just two opinions :-).
>
> I think anyone who has attempted to use EXECUTE SCRIPT on a fairly
> busy database (even if read-only) can testify that it's not an easy
> task.
>
> One thing that would actually add more complexity, but would probably
> be quite useful: add options to EXECUTE SCRIPT to only lock (and
> un-trigger/re-trigger) specific tables (i.e. the ones that are
> actually affected, but manually specified in the options of EXECUTE
> SCRIPT). No sense locking everybody, removing and restoring trigger
> all over the place, if you only change one table with no side effects
> (no constraints, triggers, etc.).

It would be a plenty interesting idea to try to make EXECUTE SCRIPT
'more intelligent' so that you could configure it to only lock/modify
those tables that need modifying.  Unfortunately, I'm not at all sure
how to describe this "intelligence," and getting from that to a usable
way of describing the necessary configuration.

The *real* answer to this is the move to PostgreSQL 8.3, with "Slony-I
CVS HEAD / Version 2," because it eliminates the need for EXECUTE
SCRIPT to drop/re-add the replication triggers.

Yes, that predicates a migration to PG 8.3, but in the absence of a
specific description of the EXECUTE SCRIPT options, it is not at all
obvious that getting to a "smart enough Slony-I" will be ready sooner
than getting to the "new, better way of Slony-I 2.0 + PG 8.3."

I don't have a good model in mind yet as to how to describe how to
control this "less-locking EXECUTE SCRIPT;" by the time we get there,
it may be way simpler to just say "let's migrate to 8.3 + 2.0."

> As I understand it, the real issue is just that HEAD is geared towards
> Postgresql 8.3 which has some feature that allows slony to be "more
> intelligent" regarding how schema updates are performed (though I
> haven't quite looked at how this is done yet, I really should), and
> Jan just is a bit concerned about changing the current release branch
> which he considers "stable" and hence should not get additional
> features.
>
> My concern is that I feel Postgresql 8.3 is still weeks/months away
> from being stable enough to put in critical production environments
> (those where you actually need Slony), and that not doing any further
> developments other than bug fixes in the current release branch may be
> problematic for many people in the same situation.

> Also, I consider adding missing "order by"s a bug fix rather than a
> new feature :-)

If you think so, then I'd call that an argument in favor of my
applying the following to the 1.2 branch...

<http://lists.slony.info/pipermail/slony1-commit/2007-October/002059.html>

I'd certainly be prepared to go that far in 1.2 :-).
-- 
let name="cbbrowne" and tld="linuxfinances.info" in String.concat "@" [name;tld];;
http://linuxfinances.info/info/sap.html
There are no "civil aviation for  dummies" books out there and most of
you would probably  be scared and spend a lot of  your time looking up
if there was one. :-) -- Jordan Hubbard in c.u.b.f.m