Hannu Krosing hannu
Wed Jul 20 18:55:50 PDT 2005
On Tue, 2005-07-19 at 14:59 -0400, cbbrowne at ca.afilias.info wrote:
> > On Tuesday 19 July 2005 08:04, Hannu Krosing wrote:
> >> can I downgrade slony 1.1 to slony 1.0.5 just by installing 1.0.5
> >> and running "update functions"?
> >
> > That is a process I have never tried; my gut tells me that the only way to
> > downgrade is to uninstall the node(s) and redo the whole shebang.
> 
> Actually, I can't think of a specific reason why it wouldn't be possible.
> The use of data structures hasn't changed in any material way.
> 
> I'd certainly want to test it before trying it "for real," but I don't
> remember anything making it downright implausible.
> 
> >> I'm getting massive lock contention on pg_listener on some busy
> >> databases.
> >
> > Can you provide a bit more detail on this?  If this is a problem I'd like
> > to see this resolved properly, as 1.1 should be an improvement all the
> > way around.
> 
> That's an interesting claim, indeed.
> 
> I can't think of a reason for 1.1 to be way more aggressive at grabbing
> locks on pg_listener than 1.0.5.
> 
> The only thing that comes to mind is the "adaptive grouping," where, if
> things are running behind, things start with fairly small groups of SYNCs
> and work their way up.  That would lead to there being somewhat more
> events generated on subscriber nodes.  On the other hand, the smaller
> grouping should mean that locks are applied for shorter periods of time,
> which should make that more or less a "wash."
> 
> I'd be very interested in hearing how changes in 1.1 lead to "massive lock
> contention" that didn't take place in 1.0.5.

This is the situation at the moment:

db=# select relation,count(*) from pg_locks where not granted group by 1;
 relation  | count
-----------+-------
           |     1
 201110517 |     1
     16414 |    44
(3 rows)

Relation 16414 is pg_listener.
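
(You can map the oids from pg_locks back to table names with a pg_class
lookup along these lines:)

db=# select oid, relname from pg_class where oid in (16414, 201110517);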

I have two slony daemons servicing this node. One of them does simple
there-and-back replication (set1 is replicated from db1 to db2 and set2
from db2 to db1); the other does more complicated things, replicating
about 10 sets between different configurations of the five nodes in
that cluster.

This happens only after a certain load level is reached; on the other
nodes slony 1.1 runs mostly fine.

There are other factors causing this load, but it manifests itself as
lots of locks on pg_listener, and the longest-running queries at the
top are:

                                      query                                       |     runtime
----------------------------------------------------------------------------------+-----------------
 listen "_balance_cluster_Event"; listen "_balance_cluster_Confirm"; listen "_bal  | 00:01:40.370531
 notify "_balance_cluster_Event"; notify "_balance_cluster_Confirm"; insert into   | 00:00:45.452506
 notify "_balance_cluster_Event"; notify "_balance_cluster_Confirm"; insert into   | 00:00:43.700555
 notify "_balance_cluster_Event"; notify "_balance_cluster_Confirm"; insert into   | 00:00:10.078385
 notify "_balance_cluster_Event"; notify "_balance_cluster_Confirm"; insert into   | 00:00:07.619348
 select "_balance_cluster".createEvent('_balance_cluster', 'SYNC', NULL);          | 00:00:06.956092
 commit transaction;                                                               | 00:00:06.941277

This is from pg_stat_activity, ordered by now() - query_start.
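
For reference, the listing above is roughly what a query like this
produces (a sketch assuming the 8.x pg_stat_activity columns
current_query and query_start, not the exact statement I ran):

db=# select current_query, now() - query_start as runtime
db-#   from pg_stat_activity
db-#  where current_query <> '<IDLE>'
db-#  order by runtime desc;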

It is possible that slons servicing several sets (or several slon
daemons) somehow block each other when doing listen/notify together
with something else in the same transaction.

-- 
Hannu Krosing <hannu at skype.net>


