Mon Mar 3 05:02:20 PST 2008
- Previous message: [Slony1-general] After patching, still getting compilation error
- Next message: [Slony1-general] failover problems with 3 nodes
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hello.

> >> [...]
> >> Hey, I should test failover before updating to 1.2.13...
> >
> > I have some strange periodic problems with 'ACCEPT_SET - MOVE_SET or
> > FAILOVER_SET not received yet - sleep' on 1.2.12 and 1.2.13. Looks
> > similar to this one.
> >
> > I should try to downgrade to 1.2.11 and try if my 'move set' problems
> > will disappear. Here is the initial problem description:
> > http://lists.slony.info/pipermail/slony1-general/2008-February/007445.html
>
> There's something about this that isn't making sense...
>
> I just did a CVS diff between 1.2.11 and REL_1_2_STABLE, and didn't
> see anything that ought to have anything to do with this.
>
> I haven't yet done any testing of this case, out of the samples
> described; I intend to do so; but it's not making sense that changing
> between 1.2.11 and 1.2.13 should make any difference in this...

Sorry, I should have checked more carefully.
I think this problem is caused not by the difference between versions
but by the "remoteWorkerThread" threads.

When the 'ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep'
problem occurs, pg_locks looks like this:

----
testdb=# SELECT relname,granted,pid,mode from pg_locks as l , pg_class as c where c.oid = l.relation and locktype='relation';
          relname           | granted |  pid  |        mode
----------------------------+---------+-------+---------------------
 pg_class_oid_index         | t       | 15778 | AccessShareLock
 pg_class_relname_nsp_index | t       | 15778 | AccessShareLock
 pg_locks                   | t       | 15778 | AccessShareLock
 pg_class                   | t       | 15778 | AccessShareLock
 sl_event                   | t       | 15771 | AccessShareLock
 sl_event-pkey              | t       | 15771 | AccessShareLock
 sl_config_lock             | f       | 15770 | AccessExclusiveLock  <-- attention!
 sl_config_lock             | t       | 15771 | AccessExclusiveLock
----

Next, I examined why "lock table sl_config_lock" was executed twice.

In the case of failover or move set, two events are generated.
One is "FAILOVER/MOVE_SET", the other is "ACCEPT_SET".
Furthermore, the "FAILOVER/MOVE_SET" event is processed by
remoteWorkerThread_1, which does an INSERT INTO the sl_event table,
and the "ACCEPT_SET" event is processed by remoteWorkerThread_2,
which does a SELECT ev_type FROM sl_event.

Both events lock the sl_config_lock table as follows.

----
begin transaction; set transaction isolation level serializable;
lock table "_testdbcluster".sl_config_lock;
----

If they are executed in the order remoteWorkerThread_1 (INSERT) and then
remoteWorkerThread_2 (SELECT), the problem does not occur, as shown below.

---- this is the PostgreSQL SQL log, SUCCESS CASE: attention pid=15407 ----
2008-03-03 18:56:15 JST[15407]LOG: statement: begin transaction;
    set transaction isolation level serializable; /* FAILOVER_SET */
    lock table "_testdbcluster".sl_config_lock;
2008-03-03 18:56:15 JST[15408]LOG: statement: begin transaction;
    set transaction isolation level serializable; /* ACCEPT_SET */
    lock table "_testdbcluster".sl_config_lock;
2008-03-03 18:56:15 JST[15407]LOG: statement:
    select "_testdbcluster".failoverSet_int(1, 2, 1, 16);
    notify "_testdbcluster_Event";
    insert into "_testdbcluster".sl_event
        (ev_origin, ev_seqno, ev_timestamp, ev_minxid, ev_maxxid, ev_xip,
         ev_type , ev_data1, ev_data2, ev_data3 )
        values ('1', '16', '2008-03-03 18:56:14.173481', '798269', '798271',
                '''798270''', 'FAILOVER_SET', '1', '2', '1');
    insert into "_testdbcluster".sl_confirm
        (con_origin, con_received, con_seqno, con_timestamp)
        values (1, 3, '16', now());
    commit transaction;
----

But if they are executed in the order remoteWorkerThread_2 (SELECT) and then
remoteWorkerThread_1 (INSERT), we get the
'ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep' loop.
---- this is the PostgreSQL SQL log, FAILED CASE: attention pid=15771 ----
2008-03-03 19:13:51 JST[15771]LOG: statement: begin transaction;
    set transaction isolation level serializable; /* ACCEPT_SET */
    lock table "_testdbcluster".sl_config_lock;
2008-03-03 19:13:51 JST[15770]LOG: statement: begin transaction;
    set transaction isolation level serializable; /* FAILOVER_SET */
    lock table "_testdbcluster".sl_config_lock;
2008-03-03 19:13:51 JST[15771]LOG: statement:
    select 1 from "_testdbcluster".sl_event
    where (ev_origin = 1 and ev_seqno = 22 and ev_type = 'MOVE_SET'
           and ev_data1 = '1' and ev_data2 = '1' and ev_data3 = '2')
       or (ev_origin = 1 and ev_seqno = 22 and ev_type = 'FAILOVER_SET'
           and ev_data1 = '1' and ev_data2 = '2' and ev_data3 = '1');
----

Here pid 15771 (ACCEPT_SET) holds the lock on sl_config_lock while it looks
for the event, and pid 15770 (FAILOVER_SET) is still waiting for that lock,
which matches the pg_locks output above.
Because of "lock table sl_config_lock", remoteWorkerThread_1 cannot insert
the "FAILOVER/MOVE_SET" event into sl_event!!

I think this is a big bug.

My environment is CentOS x86_64 with a dual-core CPU.

Regards,
--
SRA OSS, Inc. Japan
Yoshiharu Mori <y-mori at sraoss.co.jp>
http://www.sraoss.co.jp/
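
The ordering problem described in this message can be modeled outside the
database. The following is a minimal Python sketch, not Slony-I code: a
threading.Lock stands in for sl_config_lock, a set stands in for sl_event, and
the function name simulate and its parameters are hypothetical. It assumes, as
the pg_locks output suggests, that the ACCEPT_SET side keeps the lock while it
looks for the event. The real slon keeps sleeping and retrying; the sketch
gives up after a few polls so that it terminates.

```python
import threading
import time


def simulate(first, polls=3, poll_interval=0.01):
    """Model the two remote worker threads contending for one lock.

    first: "FAILOVER_SET" or "ACCEPT_SET", the side that locks first.
    Returns True if ACCEPT_SET found the FAILOVER_SET event, else False.
    """
    config_lock = threading.Lock()       # stands in for sl_config_lock
    sl_event = set()                     # stands in for the sl_event table
    first_has_lock = threading.Event()   # used only to force the ordering
    seen = []

    def failover_set():                  # models remoteWorkerThread_1 (INSERT)
        if first != "FAILOVER_SET":
            first_has_lock.wait()
        with config_lock:                # blocks while ACCEPT_SET holds it
            first_has_lock.set()
            sl_event.add("FAILOVER_SET")

    def accept_set():                    # models remoteWorkerThread_2 (SELECT)
        if first != "ACCEPT_SET":
            first_has_lock.wait()
        with config_lock:
            first_has_lock.set()
            # Assumption: the lock stays held while polling for the event.
            for _ in range(polls):
                if "FAILOVER_SET" in sl_event:
                    seen.append(True)
                    return
                time.sleep(poll_interval)    # "not received yet - sleep"
            seen.append(False)               # bounded here; slon keeps retrying

    threads = [threading.Thread(target=failover_set),
               threading.Thread(target=accept_set)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen[0]


if __name__ == "__main__":
    print("FAILOVER_SET locks first:", simulate("FAILOVER_SET"))
    print("ACCEPT_SET locks first:  ", simulate("ACCEPT_SET"))
```

In this model the ACCEPT_SET side finds the event only when the FAILOVER_SET
side takes the lock first. When ACCEPT_SET locks first, every poll comes back
empty, because the insert is waiting for the same lock.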