Glyn Astill glynastill at yahoo.co.uk
Mon Apr 28 06:56:16 PDT 2014
Hi All,

I'm testing the failover changes in Slony-I 2.2.2 and seem to be running into issues when passing multiple nodes to FAILOVER.  In the following four-node scenario, node 2 is the origin of all sets and node 3 is a forwarding provider to node 4, i.e.

1 <---- 2 ----> 3 ----> 4

I'm attempting to fail over in a scenario where both nodes 2 and 3 have failed, so PostgreSQL is stopped on both of those nodes.  I'm running the following script:

CLUSTER NAME = test_replication;
NODE 1 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5432 user=slony';
NODE 2 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5433 user=slony';
NODE 3 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5434 user=slony';
NODE 4 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5435 user=slony';
FAILOVER (
    NODE = (ID = 2, BACKUP NODE = 1),
    NODE = (ID = 3, BACKUP NODE = 1)
);
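
To make the situation easier to picture, here is a rough Python sketch of the topology above. It is only an illustration and not anything from Slony itself: the PROVIDER map and the stranded_subscribers helper are names made up for this example. It lists the surviving nodes whose direct provider is among the failed nodes.

```python
# Hypothetical model of the topology 1 <---- 2 ----> 3 ----> 4.
# Maps each subscriber node to the node it receives data from.
PROVIDER = {1: 2, 3: 2, 4: 3}


def stranded_subscribers(provider, failed):
    """Return the surviving nodes whose direct provider has failed."""
    failed = set(failed)
    return sorted(
        node
        for node, prov in provider.items()
        if node not in failed and prov in failed
    )


if __name__ == "__main__":
    # Nodes 2 and 3 are both down, as in the script above.
    print(stranded_subscribers(PROVIDER, failed=[2, 3]))
```

With nodes 2 and 3 both down, node 4 is left with no live provider, since its only path to the origin ran through node 3.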

However, it appears that slonik waits indefinitely for node 4 to catch up via the failed node 3:

$ slonik test.scr
test.scr:3: could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5433?
test.scr:4: could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5434?
executing preFailover(2,1) on 1
NOTICE: executing "_test_replication".failedNode2 on node 1
test.scr:6: NOTICE:  calling restart node 2
test.scr:6: waiting for event (1,5000000157).  node 4 only on event 5000000156
test.scr:6: waiting for event (1,5000000157).  node 4 only on event 5000000156
test.scr:6: waiting for event (1,5000000157).  node 4 only on event 5000000156
test.scr:6: waiting for event (1,5000000157).  node 4 only on event 5000000156
test.scr:6: waiting for event (1,5000000157).  node 4 only on event 5000000156
test.scr:6: waiting for event (1,5000000157).  node 4 only on event 5000000156
test.scr:6: waiting for event (1,5000000157).  node 4 only on event 5000000156
test.scr:6: waiting for event (1,5000000157).  node 4 only on event 5000000156
test.scr:6: waiting for event (1,5000000157).  node 4 only on event 5000000156
test.scr:6: waiting for event (1,5000000157).  node 4 only on event 5000000156
test.scr:6: waiting for event (1,5000000157).  node 4 only on event 5000000156
test.scr:6: waiting for event (1,5000000157).  node 4 only on event 5000000156
test.scr:6: waiting for event (1,5000000157).  node 4 only on event 5000000156
test.scr:6: waiting for event (1,5000000157).  node 4 only on event 5000000156
test.scr:6: waiting for event (1,5000000157).  node 4 only on event 5000000156


It only completes if I bring node 3 back up, which of course I couldn't do if it were really dead:

NOTICE: executing "_test_replication".failedNode3 on node 1

Have I totally got the wrong end of the stick here?

Thanks
Glyn