Thu Mar 9 12:51:37 PST 2006
- Previous message: [Slony1-general] recommended version (postgres)
- Next message: [Slony1-general] Slon not catching up
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
slony1-1.1.5-rc2 postgres 8.1.2 With multiple replication sets defined (typically 2-4), in testing failover, I am consistently getting deadlocks on the sl_setsync table between the slonik failover connection and the local connection used by remote_worker.c in the sync_event method. I have node A as the origin of, say, 3 sets, and node B has a subscription to each set. The deadlock errors are showing up in the db log on node B after I attempt run a slonik failover on node B. Is anybody else using multiple replication sets? We have to dynamically add schemas to our database, which need to be replicated. Rather than do a MERGE SET for each schema added, we created a separate replication set for each one. This now appears to have been a big mistake, and we are re-working the code to do a MERGE SET when we add new tables, but it seems that multiple replication sets should work. Here is yet another example from the log: 2006-03-09 10:01:36 EST-[19314] LOG: duration: 0.148 ms statement: begin transaction; set transaction isolation level serializable; 2006-03-09 10:01:36 EST-[19314] LOG: duration: 0.534 ms statement: select SSY.ssy_setid, SSY.ssy_seqno, SSY.ssy_minxid, SSY.ssy_maxxid, SSY.ssy_xip, SSY.ssy_action_list from "_tzreplic".sl_setsync SSY where SSY.ssy_seqno < '180' and SSY.ssy_setid in (271,388,400,412); 2006-03-09 10:01:36 EST-[19314] LOG: duration: 1.144 ms statement: select T.tab_id, T.tab_set, "_tzreplic".slon_quote_input('"' || PGN.nspname || '"."' || PGC.relname || '"') as tab_fqname from "_tzreplic".sl_table T, "pg_catalog".pg_class PGC, "pg_catalog".pg_namespace PGN where T.tab_set = 388 and PGC.oid = T.tab_reloid and PGC.relnamespace = PGN.oid; 2006-03-09 10:01:36 EST-[19314] LOG: duration: 0.908 ms statement: select T.tab_id, T.tab_set, "_tzreplic".slon_quote_input('"' || PGN.nspname || '"."' || PGC.relname || '"') as tab_fqname from "_tzreplic".sl_table T, "pg_catalog".pg_class PGC, "pg_catalog".pg_namespace PGN where T.tab_set = 400 and PGC.oid = T.tab_reloid and PGC.relnamespace = PGN.oid; 2006-03-09 10:01:36 EST-[19314] LOG: duration: 0.886 ms statement: select T.tab_id, T.tab_set, "_tzreplic".slon_quote_input('"' || PGN.nspname || '"."' || PGC.relname || '"') as tab_fqname from "_tzreplic".sl_table T, "pg_catalog".pg_class PGC, "pg_catalog".pg_namespace PGN where T.tab_set = 412 and PGC.oid = T.tab_reloid and PGC.relnamespace = PGN.oid; 2006-03-09 10:01:36 EST-[19314] LOG: duration: 4.225 ms statement: select T.tab_id, T.tab_set, "_tzreplic".slon_quote_input('"' || PGN.nspname || '"."' || PGC.relname || '"') as tab_fqname from "_tzreplic".sl_table T, "pg_catalog".pg_class PGC, "pg_catalog".pg_namespace PGN where T.tab_set = 271 and PGC.oid = T.tab_reloid and PGC.relnamespace = PGN.oid; 2006-03-09 10:01:36 EST-[19316] LOG: duration: 0.246 ms statement: start transaction;set transaction isolation level serializable;select last_value from "_tzreplic".sl_action_seq; 2006-03-09 10:01:36 EST-[19316] LOG: duration: 0.047 ms statement: rollback transaction; 2006-03-09 10:01:37 EST-[19316] LOG: duration: 0.267 ms statement: start transaction;set transaction isolation level serializable;select last_value from "_tzreplic".sl_action_seq; 2006-03-09 10:01:37 EST-[19316] LOG: duration: 0.050 ms statement: rollback transaction; 2006-03-09 10:01:37 EST-[19445] NOTICE: failedNode: set 388 has no other direct receivers - move now 2006-03-09 10:01:37 EST-[19445] STATEMENT: select "_tzreplic".failedNode(1, 2); 2006-03-09 10:01:38 EST-[19316] LOG: duration: 0.320 ms statement: start transaction;set transaction isolation level serializable;select last_value from "_tzreplic".sl_action_seq; 2006-03-09 10:01:38 EST-[19316] LOG: duration: 0.049 ms statement: rollback transaction; 2006-03-09 10:01:38 EST-[19445] ERROR: deadlock detected 2006-03-09 10:01:38 EST-[19445] DETAIL: Process 19445 waits for ShareLock on transaction 5609; blocked by process 19314. Process 19314 waits for ShareLock on transaction 5596; blocked by process 19445. 2006-03-09 10:01:38 EST-[19445] CONTEXT: SQL statement "delete from "_tzreplic".sl_setsync where ssy_setid = $1 " PL/pgSQL function "failednode" line 115 at SQL statement 2006-03-09 10:01:38 EST-[19445] STATEMENT: select "_tzreplic".failedNode(1, 2); 2006-03-09 10:01:38 EST-[19314] LOG: duration: 2599.338 ms statement: update "_tzreplic".sl_setsync set ssy_seqno = '180', ssy_minxid = '6111', ssy_maxxid = '6112', ssy_xip = '', ssy_action_list = '' where ssy_setid in (271,388,400,412) and ssy_seqno < '180'; 2006-03-09 10:01:38 EST-[19445] LOG: duration: 0.064 ms statement: rollback transaction; 2006-03-09 10:01:38 EST-[19314] LOG: duration: 0.238 ms statement: insert into "_tzreplic".sl_seqlog (seql_seqid, seql_origin, seql_ev_seqno, seql_last_value) values (0, 1, '180', '1000000000000000'); 2006-03-09 10:01:38 EST-[19314] LOG: duration: 3.842 ms statement: notify "_tzreplic_Event"; notify "_tzreplic_Confirm"; insert into "_tzreplic".sl_event (ev_origin, ev_seqno, ev_timestamp, ev_minxid, ev_maxxid, ev_xip, ev_type ) values ('1', '180', '2006-03-09 10:01:36.232092', '6111', '6112', '', 'SYNC'); insert into "_tzreplic".sl_confirm (con_origin, con_received, con_seqno, con_timestamp) values (1, 2, '180', now()); commit transaction; My interpretation of this log is that the sync_event in the remote_worker thread started running, then a slonik failover command came in, deadlocked and got kicked out, and the sync_event transaction was allowed to finish. Does that make sense? Thanks. - DAP ====================================================== David Parker Tazz Networks -------------- next part -------------- An HTML attachment was scrubbed... URL: http://gborg.postgresql.org/pipermail/slony1-general/attachments/20060309/85d3ad8e/attachment.html
- Previous message: [Slony1-general] recommended version (postgres)
- Next message: [Slony1-general] Slon not catching up
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list