Fri Oct 3 10:40:45 PDT 2014
- Previous message: [Slony1-general] Slony 2.1.4 - Issues re-subscribing provider when origin down
- Next message: [Slony1-general] Slony 2.1.4 - Issues re-subscribing provider when origin down
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 10/03/2014 08:27 AM, Glyn Astill wrote:
> Hi All,
>
> I'm looking at a slony setup using 2.1.4, with 4 nodes in the following
> configuration:
>
> Node 1 --> Node 2
> Node 1 --> Node 3 --> Node 4
>
> Node 1 is the origin of all sets, and node 3 is a provider of all to
> node 4. What I'm looking to do is fail over to node 2 when both nodes 1
> and 3 have gone down.
>
> Is this possible?
Improved handling of multiple nodes failing at once was one of the
big changes in 2.2.
You might want to try something like:
NODE 1 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5432';
NODE 2 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5433';
NODE 3 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5434';
FAILOVER (
ID = 1, BACKUP NODE = 2);
SUBSCRIBE SET (ID = 1, PROVIDER = 2, RECEIVER = 4, FORWARD = YES);
DROP NODE (ID = 3, EVENT NODE = 2);
DROP NODE (ID = 1, EVENT NODE = 2);
But I haven't tried to set up a cluster in this configuration, so I can't
say for sure whether it will work. As a general comment, I think
trying to reshape the cluster before the FAILOVER command will be
problematic.
When I started doing a lot of failover tests with 2.1 I discovered many
cases that wouldn't work, or wouldn't work reliably. That led to
major changes to failover in 2.2.
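If you do move to 2.2, my reading of the slonik reference for that release is that FAILOVER can name several failed nodes in one statement, which is closer to what you're trying to do here. A rough sketch (node numbers are from your setup; please check the exact syntax against the 2.2 docs before relying on it):

```
# Hypothetical 2.2-style failover of nodes 1 and 3 together,
# promoting node 2 -- verify against the slonik reference you deploy.
FAILOVER (
    NODE = (ID = 1, BACKUP NODE = 2),
    NODE = (ID = 3, BACKUP NODE = 2)
);
```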
>
> In both a live environment that I've not had chance to move to 2.2 and
> my test environment I'm seeing the same issues, for my test environment
> the slonik script is:
>
> CLUSTER NAME = test_replication;
>
> NODE 1 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5432
> user=slony';
> NODE 2 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5433
> user=slony';
> NODE 3 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5434
> user=slony';
> NODE 4 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5435
> user=slony';
>
> SUBSCRIBE SET (ID = 1, PROVIDER = 2, RECEIVER = 4, FORWARD = YES);
> WAIT FOR EVENT (ORIGIN = 2, CONFIRMED = 4, WAIT ON = 2);
> SUBSCRIBE SET (ID = 2, PROVIDER = 2, RECEIVER = 4, FORWARD = YES);
> WAIT FOR EVENT (ORIGIN = 2, CONFIRMED = 4, WAIT ON = 2);
> SUBSCRIBE SET (ID = 3, PROVIDER = 2, RECEIVER = 4, FORWARD = YES);
> WAIT FOR EVENT (ORIGIN = 2, CONFIRMED = 4, WAIT ON = 2);
>
> DROP NODE (ID = 3, EVENT NODE = 2);
>
> FAILOVER (
> ID = 1, BACKUP NODE = 2
> );
>
> DROP NODE (ID = 1, EVENT NODE = 2);
>
> slonik is failing at the first subscribe set line as follows:
>
> $ slonik test.scr
> test.scr:8: could not connect to server: Connection refused
> Is the server running on host "localhost" (127.0.0.1) and accepting
> TCP/IP connections on port 5432?
> test.scr:8: could not connect to server: Connection refused
> Is the server running on host "localhost" (127.0.0.1) and accepting
> TCP/IP connections on port 5434?
> test.scr:8: could not connect to server: Connection refused
> Is the server running on host "localhost" (127.0.0.1) and accepting
> TCP/IP connections on port 5432?
> Segmentation fault
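One thing worth checking before running the script is which of the admin conninfo endpoints are actually reachable, since slonik is clearly dying after the failed connections. A quick sketch of a hypothetical helper (nothing Slony-specific; the node/port numbers are the ones from your test script):

```python
import socket

# Hypothetical probe, not part of slonik: report whether each admin
# conninfo endpoint accepts a TCP connection.
def node_reachable(host, port, timeout=2.0):
    """Return True if host:port accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # The four test-environment nodes from the slonik script above.
    nodes = {1: 5432, 2: 5433, 3: 5434, 4: 5435}
    for node_id, port in nodes.items():
        status = "up" if node_reachable("localhost", port) else "DOWN"
        print(f"node {node_id} (localhost:{port}): {status}")
```

That at least separates "node is down as intended" from "node is down unexpectedly" before slonik gets involved.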
>
> I get the same behaviour until I bring node 1 back up; then the script
> almost succeeds, apart from an error stating that a record in sl_event
> already exists:
>
> $ slonik ~/test.scr
> ~/test.scr:8: could not connect to server: Connection refused
> Is the server running on host "localhost" (127.0.0.1) and accepting
> TCP/IP connections on port 5434?
> waiting for events (1,5000000172) only at (1,5000000162) to be
> confirmed on node 4
> executing failedNode() on 2
> ~/test.scr:17: NOTICE: failedNode: set 1 has no other direct
> receivers - move now
> ~/test.scr:17: NOTICE: failedNode: set 2 has no other direct
> receivers - move now
> ~/test.scr:17: NOTICE: failedNode: set 3 has no other direct
> receivers - move now
> ~/test.scr:17: NOTICE: failedNode: set 1 has other direct
> receivers - change providers only
> ~/test.scr:17: NOTICE: failedNode: set 2 has other direct
> receivers - change providers only
> ~/test.scr:17: NOTICE: failedNode: set 3 has other direct
> receivers - change providers only
> NOTICE: executing "_test_replication".failedNode2 on node 2
> ~/test.scr:17: waiting for event (1,5000000175). node 4 only on
> event 5000000162
> NOTICE: executing "_test_replication".failedNode2 on node 2
> ~/test.scr:17: PGRES_FATAL_ERROR lock table
> "_test_replication".sl_event_lock,
> "_test_replication".sl_config_lock;select
> "_test_replication".failedNode2(1,2,2,'5000000174','5000000176'); -
> ERROR: duplicate key value violates unique constraint "sl_event-pkey"
> DETAIL: Key (ev_origin, ev_seqno)=(1, 5000000176) already exists.
> CONTEXT: SQL statement "insert into "_test_replication".sl_event
> (ev_origin, ev_seqno, ev_timestamp,
> ev_snapshot,
> ev_type, ev_data1, ev_data2, ev_data3)
> values
> (p_failed_node, p_ev_seqfake, CURRENT_TIMESTAMP,
> v_row.ev_snapshot,
> 'FAILOVER_SET', p_failed_node::text, p_backup_node::text,
> p_set_id::text)"
> PL/pgSQL function
> _test_replication.failednode2(integer,integer,integer,bigint,bigint)
> line 14 at SQL statement
> NOTICE: executing "_test_replication".failedNode2 on node 2
> ~/test.scr:17: waiting for event (1,5000000177). node 4 only on
> event 5000000175
> ~/test.scr:21: begin transaction; -
>
> After this sl_set on node 4 still has node 1 as the origin for one of
> the sets
> (Is this possibly because I'm not waiting properly or waiting on the
> wrong node?):
>
> TEST=# table _test_replication.sl_set;
> set_id | set_origin | set_locked | set_comment
> --------+------------+------------+-------------------
> 2 | 1 | | Replication set 2
> 1 | 2 | | Replication set 1
> 3 | 2 | | Replication set 3
> (3 rows)
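For what it's worth, one way to rule out the waiting question before inspecting sl_set is to follow the FAILOVER with a blanket wait on the backup node. A hedged sketch (my reading of the slonik docs is that WAIT FOR EVENT accepts ORIGIN = ALL, but verify this for 2.1.4):

```
# Hypothetical: wait for all outstanding events to be confirmed
# everywhere, using node 2 as the admin connection.
WAIT FOR EVENT (ORIGIN = ALL, CONFIRMED = ALL, WAIT ON = 2);
```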
>
> I've attached the slon logs if that would provide any better insight.
>
> Any help would be greatly appreciated.
>
> Thanks
> Glyn
>