A tester has reported in 2.1.0 beta3: a slonik script such as

    create set (id=2, origin=2, comment='second set');
    create set (id=3, origin=3, comment='second set');
    set add table (set id=2, origin=2, tables='disorder.do_item_*');

can hang slonik, where slonik keeps waiting for an event confirmation that it never sees. The issue seems to be that the add table code will:

1. Open a transaction and obtain a lock on sl_config_lock on the origin node (node 2).
2. Perform an implicit 'wait for event' to ensure that the last event submitted to node 3 (the previous event origin) has propagated.

However, because slonik holds a lock on sl_config_lock on node 2, the slon for node 2 (the remote worker) might not be able to replicate events to node 2 (the store node), because applying many events to node 2 requires obtaining sl_config_lock (and sometimes sl_event_lock). slonik should not invoke the implicit 'wait for' behaviour while it is holding an open transaction.
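If this diagnosis is right, making the wait explicit before the locking transaction begins should sidestep the hang. A sketch of such a workaround script (untested against this bug; set and node ids follow the script above):

```
create set (id=2, origin=2, comment='second set');
create set (id=3, origin=3, comment='second set');

# Explicitly confirm node 3's events have propagated BEFORE
# 'set add table' opens its transaction and takes sl_config_lock.
sync (id = 3);
wait for event (origin = 3, confirmed = all, wait on = 3);

set add table (set id=2, origin=2, tables='disorder.do_item_*');
```

The explicit wait runs while slonik holds no open transaction, so the slons can still apply events and the confirmation can actually arrive.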
Created attachment 118 [details]: test case for the bug
Created attachment 119 [details] fix for bug
Please arrange for a review of this fix.
I don't know if this is related to this change or to another recent change, but I am occasionally (as in one or two tests will hit this through a full disorder run) getting something like:

    @5ef4b65d - 2011-08-08 14:37:58 EDT DEBUG2 remoteWorkerThread_4: Received event #4 from 5000000002 type:STORE_PATH
    2011-08-08 14:37:58,028 [slonik add paths stderr] ERROR info.slony.clustertest.testcoordinator.slony.SlonikScript slonik add paths:java.lang.UNIXProcess@5d79a22d - <stdin>:19: PGRES_FATAL_ERROR lock table "_disorder_replica".sl_event_lock, "_disorder_replica".sl_config_lock; select "_disorder_replica".storePath(5, 2, 'dbname=test5 host=localhost port=5432 user=slon password=slon', 10); - ERROR: duplicate key value violates unique constraint "sl_node-pkey"
    2011-08-08 14:37:58,028 [slonik add paths stderr] ERROR info.slony.clustertest.testcoordinator.slony.SlonikScript slonik add paths:java.lang.UNIXProcess@5d79a22d - CONTEXT: SQL statement "insert into "_disorder_replica".sl_node (no_id, no_active, no_comment) values ( $1 , 'f', $2 )"
    2011-08-08 14:37:58,028 [slonik add paths stderr] ERROR info.slony.clustertest.testcoordinator.slony.SlonikScript slonik add paths:java.lang.UNIXProcess@5d79a22d - PL/pgSQL function "storenode_int" line 22 at SQL statement

as I test this change.
Unless the whole sequence is done in a try block, I am wondering if the automatic wait caused by switching event nodes is happening at the wrong moment here. It appears that it takes place AFTER the transaction for the "set add table" has taken out the lock.
(In reply to comment #5)
> Unless the whole sequence is done in a try block, I am wondering if the
> automatic wait caused by switching event nodes is happening at the wrong moment
> here. It appears that it takes place AFTER the transaction for the "set add
> table" has taken out the lock.

The point of the patch is to avoid exactly that. With this patch the intended behaviour is:

1. Open a transaction against the new origin and lock sl_event_lock.
2. Get ready to submit a SYNC event to the new origin. This triggers an implicit 'wait for event', waiting on the old origin until slonik sees the sl_confirm row.
3. Once the old origin is caught up, the SYNC is submitted to the new origin.
4. The transaction commits.
5. Open a new transaction, getting ready for the 'set add table'.

Steps 1-4 should ensure that everything is caught up before the transaction in step 5 is opened. Does my patch not do what I think? Or should I commit it?
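Expressed as the SQL slonik would issue, the intended ordering looks roughly like the following sketch (the exact statements are assumptions; the cluster schema name and function names follow the _disorder_replica log earlier in this thread):

```
-- Steps 1-4: one short transaction on the NEW origin.
BEGIN;
LOCK TABLE "_disorder_replica".sl_event_lock;
-- (implicit wait happens here, OUTSIDE any lock on sl_config_lock:
--  poll the old origin's sl_confirm until its last event is confirmed)
SELECT "_disorder_replica".createEvent('_disorder_replica', 'SYNC');
COMMIT;

-- Step 5: only now open the transaction for 'set add table',
-- which takes sl_config_lock as well.
BEGIN;
LOCK TABLE "_disorder_replica".sl_event_lock, "_disorder_replica".sl_config_lock;
-- ... setAddTable(...) and the rest of the command ...
COMMIT;
```

Because the wait completes and the first transaction commits before sl_config_lock is ever taken, the slons remain free to apply events while slonik is waiting.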
I'll take a peek.
Having some trouble "taking a look" - the patch refers to clustertest/disorder/tests/WaitForTest.js, which doesn't seem to be in any tree I can see. Possibly you've got a private branch that's not visible?
Created attachment 122 [details] create the WaitForTest
Sorry, I have added the missing patch. You can also see this at https://github.com/ssinger/slony1-engine/tree/bug229
(In reply to comment #10)
> Sorry I have added the missing patch.
> You can also see this at https://github.com/ssinger/slony1-engine/tree/bug229

I had success with that:
- The test fails without commit 7a8aa60e9a881e7df337ddf6c662a87801f6372b.
- The test succeeds with that patch.
This was fixed in 2.1.0 with commit 95d82dc387deaaa6d47bf97087dac32176abf648