Tue Mar 7 14:30:37 PST 2006
- Previous message: [Slony1-general] "_Slony-I_test_cluster_rowID" column does not exist
- Next message: [Slony1-general] slon process fail to restart when dropping other node.
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Using slony1-1.1.5-RC2, in testing failover, we are getting deadlock errors like the following from the db log:

[ copy table statements ]
2006-03-07 15:06:28 EST-[29406] NOTICE: truncate of "monkeys"."session" succeeded
2006-03-07 15:06:28 EST-[29406] STATEMENT: select "_tzreplic".prepareTableForCopy(151);
        copy "monkeys"."session" ("id","accessadress","accessid","entryaddress","entryid","properties","starttime","updatetime") from stdin;
2006-03-07 15:06:28 EST-[29406] NOTICE: truncate of "monkeys"."vc" succeeded
2006-03-07 15:06:28 EST-[29406] STATEMENT: select "_tzreplic".prepareTableForCopy(152);
        copy "monkeys"."vc" ("id","used","capacity") from stdin;
2006-03-07 15:06:28 EST-[29406] NOTICE: truncate of "monkeys"."vp" succeeded
2006-03-07 15:06:28 EST-[29406] STATEMENT: select "_tzreplic".prepareTableForCopy(153);
        copy "monkeys"."vp" ("id","used","capacity") from stdin;
2006-03-07 15:06:31 EST-[29438] NOTICE: failedNode: set 1 has no other direct receivers - move now
2006-03-07 15:06:31 EST-[29438] STATEMENT: select "_tzreplic".failedNode(2, 3);
2006-03-07 15:06:37 EST-[29438] NOTICE: failedNode: set 118 has no other direct receivers - move now
2006-03-07 15:06:37 EST-[29438] STATEMENT: select "_tzreplic".failedNode(2, 3);
2006-03-07 15:06:38 EST-[29438] ERROR: deadlock detected
2006-03-07 15:06:38 EST-[29438] DETAIL: Process 29438 waits for ShareLock on transaction 3856; blocked by process 29406.
        Process 29406 waits for ShareLock on transaction 3851; blocked by process 29438.
2006-03-07 15:06:38 EST-[29438] CONTEXT: SQL statement "delete from "_tzreplic".sl_setsync where ssy_setid = $1 "
        PL/pgSQL function "failednode" line 115 at SQL statement
2006-03-07 15:06:38 EST-[29438] STATEMENT: select "_tzreplic".failedNode(2, 3);

So, we establish replication for 2 sets, then we issue a slonik FAILOVER (pid 29438). The deadlock seems to be between the slonik connection and the remote_worker connection in the slon.
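For illustration, the lock cycle reported above can be sketched as two concurrent sessions. This is a hypothetical reconstruction based only on the statements visible in the log (prepareTableForCopy, failedNode, the sl_setsync delete), not Slony-I's actual code paths:

```sql
-- Session A: slon remote_worker (pid 29406), copying the set.
BEGIN;
SELECT "_tzreplic".prepareTableForCopy(151);  -- truncates "monkeys"."session",
                                              -- acquiring strong locks on it
COPY "monkeys"."session" (...) FROM stdin;    -- still in progress

-- Session B: slonik FAILOVER (pid 29438), running concurrently.
BEGIN;
SELECT "_tzreplic".failedNode(2, 3);
-- internally issues:
--   delete from "_tzreplic".sl_setsync where ssy_setid = $1
-- and blocks waiting on a row touched by session A's open transaction
-- (the "waits for ShareLock on transaction 3856" in the log).

-- When session A in turn needs something session B already holds,
-- the wait graph becomes a cycle and PostgreSQL's deadlock detector
-- aborts one of the two backends:
--   ERROR: deadlock detected
```

A ShareLock wait on a *transaction* ID, as in the DETAIL lines, is how PostgreSQL reports one backend waiting for another's uncommitted row change to commit or roll back, which fits the copy-in-progress vs. failover-cleanup interpretation.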
Another example of the same type of problem:

2006-03-07 14:26:39 EST-[2629] NOTICE: failedNode: set 74 has no other direct receivers - move now
2006-03-07 14:26:39 EST-[2629] STATEMENT: select "_tzreplic".failedNode(1, 2);
2006-03-07 14:26:42 EST-[2629] NOTICE: failedNode: set 111 has no other direct receivers - move now
2006-03-07 14:26:42 EST-[2629] STATEMENT: select "_tzreplic".failedNode(1, 2);
2006-03-07 14:26:42 EST-[2629] NOTICE: failedNode: set 123 has no other direct receivers - move now
2006-03-07 14:26:42 EST-[2629] STATEMENT: select "_tzreplic".failedNode(1, 2);
2006-03-07 14:26:43 EST-[2629] NOTICE: failedNode: set 135 has no other direct receivers - move now
2006-03-07 14:26:43 EST-[2629] STATEMENT: select "_tzreplic".failedNode(1, 2);
2006-03-07 14:26:43 EST-[2635] NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=974
2006-03-07 14:26:43 EST-[2635] STATEMENT: select "_tzreplic".cleanupNodelock();
        insert into "_tzreplic".sl_nodelock values ( 2, 0, "pg_catalog".pg_backend_pid());
2006-03-07 14:26:44 EST-[2629] NOTICE: failedNode2(): faking FAILOVER_SET event
2006-03-07 14:26:44 EST-[2629] STATEMENT: select "_tzreplic".failedNode2(1,2,74,'650','651');
2006-03-07 14:26:46 EST-[2629] NOTICE: failedNode2(): faking FAILOVER_SET event
2006-03-07 14:26:46 EST-[2629] STATEMENT: select "_tzreplic".failedNode2(1,2,111,'650','652');
2006-03-07 14:26:46 EST-[2629] NOTICE: failedNode2(): faking FAILOVER_SET event
2006-03-07 14:26:46 EST-[2629] STATEMENT: select "_tzreplic".failedNode2(1,2,123,'650','653');
2006-03-07 14:26:46 EST-[2629] NOTICE: failedNode2(): faking FAILOVER_SET event
2006-03-07 14:26:46 EST-[2629] STATEMENT: select "_tzreplic".failedNode2(1,2,135,'650','654');
2006-03-07 14:26:48 EST-[2636] ERROR: deadlock detected
2006-03-07 14:26:48 EST-[2636] DETAIL: Process 2636 waits for ShareLock on transaction 613918; blocked by process 2629.
        Process 2629 waits for ExclusiveLock on relation 20809 of database 16384; blocked by process 2636.
2006-03-07 14:26:48 EST-[2636] STATEMENT: notify "_tzreplic_Event"; notify "_tzreplic_Confirm";
        insert into "_tzreplic".sl_event (ev_origin, ev_seqno, ev_timestamp, ev_minxid, ev_maxxid, ev_xip, ev_type ) values ('1', '651', '2006-03-07 14:26:33.770196', '867522', '867523', '', 'SYNC');
        insert into "_tzreplic".sl_confirm (con_origin, con_received, con_seqno, con_timestamp) values (1, 2, '651', now());
        commit transaction;
2006-03-07 14:26:48 EST-[2636] LOG: unexpected EOF on client connection

We didn't have these sorts of deadlock issues with slony1-1.0.5, but at that point we were also using only one replication set, and PostgreSQL 7.4.5, so too many variables have changed, obviously. The common theme appears to be the interaction of the slonik FAILOVER command and the slon running on the node being failed-over-to.

Has anybody else run into this kind of behavior?

- DAP
======================================================
David Parker
Tazz Networks