Jan Wieck
Tue Nov 21 14:57:03 PST 2006
On 11/21/2006 5:06 PM, gurkan at resolution.com wrote:
> Hi,
> I am having a problem with failover.
> This is my initial slonik script for Slony-I replication (node1 master, node2
> and node3 slaves):
> 
> ----------------------
> cluster name = $CLUSTERNAME;
> node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST
> port=$MASTERDBPORT user=$REPLICATIONUSER';
> node 2 admin conninfo = 'dbname=$SLAVEDBNAME1 host=$SLAVEHOST1
> port=$SLAVEDBPORT1 user=$REPLICATIONUSER';
> node 3 admin conninfo = 'dbname=$SLAVEDBNAME2 host=$SLAVEHOST2
> port=$SLAVEDBPORT2 user=$REPLICATIONUSER';
> init cluster ( id=1, comment = 'Master Node');
> create set (id=1, origin=1, comment='All development tables');
> --set tables here
> store node (id=2, comment = 'Slave node 1');
> store path (server = 1, client = 2, conninfo='dbname=$MASTERDBNAME
> host=$MASTERHOST port=$MASTERDBPORT user=$REPLICATIONUSER');
> store path (server = 2, client = 1, conninfo='dbname=$SLAVEDBNAME1
> host=$SLAVEHOST1 port=$SLAVEDBPORT1  user=$REPLICATIONUSER');
> store listen (origin=1, provider = 1, receiver =2);
> store listen (origin=2, provider = 2, receiver =1);
> 
> store node (id=3, comment = 'Slave node 2');
> store path (server = 1, client = 3, conninfo='dbname=$MASTERDBNAME
> host=$MASTERHOST port=$MASTERDBPORT user=$REPLICATIONUSER');
> store path (server = 3, client = 1, conninfo='dbname=$SLAVEDBNAME2
> host=$SLAVEHOST2 port=$SLAVEDBPORT2  user=$REPLICATIONUSER');
> store listen (origin=1, provider = 1, receiver =3);
> store listen (origin=3, provider = 3, receiver =1);
> -----------------------
> 
> ------slave node 2 and vice versa node3 -- subscriber
> cluster name = $CLUSTERNAME;
> node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST
> user=$REPLICATIONUSER';
> node 2 admin conninfo = 'dbname=$SLAVEDBNAME1 host=$SLAVEHOST1
> user=$REPLICATIONUSER';
> subscribe set ( id = 1, provider = 1, receiver = 2, forward = yes);
> -------------------------
> I figured the subscribe should use forward=yes.
> Initial replication works fine, but I cannot fail over (node1, the master,
> crashed: edb stopped, disk died, ...).
> 
> 
> -----failover running on node2
> cluster name = $CLUSTERNAME;
> node 2 admin conninfo = 'dbname=$SLAVEDBNAME1 host=$SLAVEHOST1
> port=$SLAVEDBPORT1 user=$REPLICATIONUSER';
> node 3 admin conninfo = 'dbname=$SLAVEDBNAME2 host=$SLAVEHOST2
> port=$SLAVEDBPORT2 user=$REPLICATIONUSER';
> 
> store path (server = 2, client = 3, conninfo='dbname=$SLAVEDBNAME1
> host=$SLAVEHOST1 port=$SLAVEDBPORT1 user=$REPLICATIONUSER');
> store path (server = 3, client = 2, conninfo='dbname=$SLAVEDBNAME2
> host=$SLAVEHOST2 port=$SLAVEDBPORT2 user=$REPLICATIONUSER');
> store listen (origin=2, provider = 2, receiver =3);
> store listen (origin=3, provider = 3, receiver =2);
> 
> failover (id=1, backup node=2);
> -------------------------------
> 
> Could you give detailed steps using my example, in case I am still missing something?
> When I run the failover above I got the message below.
> ---------------------
> [enterprisedb at baba2 bin]$ ./failover_on_node1.sh
> <stdin>:19: PGRES_FATAL_ERROR select "_edb_replication_example".failedNode(1,
> 2);  - ERROR:  Slony-I: cannot failover - node 3 has no path to the backup node

Your script appears to run too fast ;-)

Since you don't have the "emergency" paths between 2 and 3 in place but 
rather store them only at the moment you want to fail over, you'd need to 
wait until the slons have restarted and actually replicated all those 
store path events. The stored procedure initiating the failover runs on 
node 2, but the store path event for (server=2, client=3) is executed on 
node 3. So when that store path completes, the new path exists only on 
node 3; it still has to be replicated to node 2 before the failover 
procedure can succeed.

I suggest you instead store the paths between nodes 2 and 3 right from 
the start. This makes sense because if node 2 is your designated backup 
server, there has to be a possible communication path between 2 and 3.
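
A minimal sketch of what that could look like in your setup script, reusing your own variable names and the same store path / store listen pattern you already use between nodes 1 and 2. Placing it right after "store node (id=3, ...)" is an assumption, and the conninfo strings are kept on one line each.

```slonik
# Hypothetical addition to the initial setup script, after node 3 is stored:
# direct paths between the designated backup (node 2) and node 3.
store path (server = 2, client = 3, conninfo='dbname=$SLAVEDBNAME1 host=$SLAVEHOST1 port=$SLAVEDBPORT1 user=$REPLICATIONUSER');
store path (server = 3, client = 2, conninfo='dbname=$SLAVEDBNAME2 host=$SLAVEHOST2 port=$SLAVEDBPORT2 user=$REPLICATIONUSER');
# Let nodes 2 and 3 listen for each other's events directly.
store listen (origin = 2, provider = 2, receiver = 3);
store listen (origin = 3, provider = 3, receiver = 2);
```

With these stored up front, the failover script itself only needs the cluster name, the admin conninfo lines for nodes 2 and 3, and the failover command.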

If you can't or don't want to do this, then you definitely have to 
insert some "wait for event" commands into your failover script, so that 
the failover does not start before those store path events have been 
replicated.
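
A sketch of that second option, assuming slonik's WAIT FOR EVENT command with ORIGIN, CONFIRMED and WAIT ON parameters (check the slonik reference for your Slony-I version). Slonik remembers the last event it generated on each node during the script, and events from one origin are processed in order, so waiting for those last events also covers the store path events issued before them.

```slonik
cluster name = $CLUSTERNAME;
node 2 admin conninfo = 'dbname=$SLAVEDBNAME1 host=$SLAVEHOST1 port=$SLAVEDBPORT1 user=$REPLICATIONUSER';
node 3 admin conninfo = 'dbname=$SLAVEDBNAME2 host=$SLAVEHOST2 port=$SLAVEDBPORT2 user=$REPLICATIONUSER';

# Emergency paths and listens between nodes 2 and 3 (same as in your script).
store path (server = 2, client = 3, conninfo='dbname=$SLAVEDBNAME1 host=$SLAVEHOST1 port=$SLAVEDBPORT1 user=$REPLICATIONUSER');
store path (server = 3, client = 2, conninfo='dbname=$SLAVEDBNAME2 host=$SLAVEHOST2 port=$SLAVEDBPORT2 user=$REPLICATIONUSER');
store listen (origin = 2, provider = 2, receiver = 3);
store listen (origin = 3, provider = 3, receiver = 2);

# Block until node 2 has confirmed the last event slonik generated on node 3
# (so the path server=2, client=3 is known on node 2), and vice versa.
wait for event (origin = 3, confirmed = 2, wait on = 2);
wait for event (origin = 2, confirmed = 3, wait on = 3);

failover (id = 1, backup node = 2);
```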


Jan


> ---------------------
> 
> Are my scripts OK?
> 
> Thanks
> 


-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #


