Stuart Bishop stuart at stuartbishop.net
Thu Aug 19 23:57:40 PDT 2010
> Robert Collins wrote
>
>> To provide a little more context here; we take up to an hour to
>> gracefully quiesce all our DB clients, apply a DDL change, and bring
>> our systems back up. Thats a lot of downtime for something that we'd,
>> in principle, like to do when code changes are ready; because it takes
>> so long (vs say, 5-10 seconds) we need to schedule downtime weeks in
>> advance, and that pushes complexity into our development process as
>> well because we have to choose between several week latency, large
>> branches, or having two trunks (the one with the next rollout schema,
>> and the one with the current schema).
>
> I think you first need to figure out what slony version you're going to be
> using.  If you're using 2.0.x then you are not in bad shape.  If you are
> using 1.2 then I think your efforts are better spent thinking about what can
> get you onto 2.0.

We just finished getting ourselves to 1.2.21 (from 1.2.15) as a step
in our Ubuntu 10.04 LTS (Lucid) and PostgreSQL 8.4 upgrades. Once we
are on Lucid, we can look at 2.0. Back at the dawn of time when this
process started, 2.0 wasn't stable enough for me, and adopting it
would have involved backporting Debian experimental packages to both
Lucid and Ubuntu 8.04 LTS (Hardy).

> With 2.0.x you can do DDL changes to tables via execute script without any
> Slony-introduced locking.  The caveat is that if your application is
> performing insert/update/delete operations on the tables your DDL is
> changing, then you might be in trouble (see the bug # Chris referenced for
> details).

We previously had much worse locking problems, which we eventually
fixed by migrating one of our replication sets to separate hardware.
Slony 1.2 locking all replicated tables, rather than just the tables
in the replication set I was changing, caused me great trauma (much
more than just letting me shoot myself in the foot if I was silly
enough to touch tables I hadn't told Slony I was touching).


> Slony 2.0.x has no safeguard to prevent/detect this situation.  For 2.1 we
> would like to introduce something to either detect the conflict, or at least
> make it more obvious to the DBA, but this hasn't been thought through yet.

So the problem with needing locks on busy tables is that they might
never be granted. A busy table will always have several connections
with open transactions that have read from it, and the exclusive lock
will never be granted because there are always conflicting locks in
the way. I suspect this needs extensions to PostgreSQL, so we can
request an exclusive lock in such a way that requests for other locks
are blocked. This isn't even a Slony issue - the same problem applies
when I ALTER TABLE in a busy non-replicated environment.
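
To make that concrete, a minimal illustration of the conflict (the
table name here is hypothetical):

    -- Session 1: a long-running application transaction holds ACCESS SHARE
    BEGIN;
    SELECT count(*) FROM accounts;
    -- ... transaction left open ...

    -- Session 2: ALTER TABLE wants ACCESS EXCLUSIVE, which conflicts with
    -- ACCESS SHARE, so it sits waiting until every open reader finishes
    ALTER TABLE accounts ADD COLUMN flagged boolean;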

As a workaround, maybe there should be some way for my application to
detect that Slony wants to take its locks, and block? If Slony grabbed
an exclusive advisory lock before grabbing its exclusive table locks,
and released it once the table locks were granted, my application
could simply grab the same exclusive advisory lock at the start of
each transaction and release it straight away.
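
Something like this, with 42 standing in for whatever advisory lock
key gets agreed on (the key is a placeholder, not anything Slony
actually does today):

    -- Slony, before taking its exclusive table locks:
    SELECT pg_advisory_lock(42);
    -- ... acquire the exclusive table locks ...
    SELECT pg_advisory_unlock(42);  -- release once the table locks are held

    -- My application, at the start of every transaction:
    SELECT pg_advisory_lock(42);    -- blocks only while Slony is waiting
    SELECT pg_advisory_unlock(42);  -- then release and carry on as normal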

I do like the idea (later in this thread) of me explicitly listing
which resources to lock. Pulling the list from the top of the DDL or
specifying it separately are both fine by me.
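
For example, with an entirely made-up header convention at the top of
the DDL file:

    -- slony-lock-tables: public.accounts, public.account_history
    ALTER TABLE public.accounts ADD COLUMN flagged boolean;
    ALTER TABLE public.account_history ADD COLUMN flagged boolean;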

If you want to make my life easier, let me specify the DDL inline in
the slonik script rather than needing a separate file. Oh - and an
option to only echo the DDL to stdout if it failed (my daily staging
rebuild logs are very noisy). And a pony.


-- 
Stuart Bishop <stuart at stuartbishop.net>
http://www.stuartbishop.net/

