Andrew Sullivan ajs
Wed Sep 7 12:05:35 PDT 2005
I'm going to forward this to our internal QA folks, who are currently
doing extended tests on the 1.1 release.
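
In the meantime, one possible manual workaround (a sketch only, untested, and
assuming the cluster's Slony schema is "_gb_cluster" per the usual underscore
naming) might be to remove the stale sl_setsync row on node30 before retrying
the drop:

```sql
-- On node30: the leftover sl_setsync row still points at the failed
-- origin (node 10) and is what makes "drop node" fail.  Deleting it by
-- hand is a sketch of a workaround, not an endorsed procedure; take a
-- backup of the schema first.
DELETE FROM _gb_cluster.sl_setsync
 WHERE ssy_origin = 10;  -- stale reference to the failed node
```

That should let the drop of node10 proceed without rebuilding node30 from
scratch, but I'd want QA to confirm before anyone relies on it.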

A

On Tue, Sep 06, 2005 at 02:29:17PM -0700, Darcy Buskermolen wrote:
> On Monday 22 August 2005 17:12, elein wrote:
> > Slony 1.1.  Three nodes. 10 set(1) => 20 => 30.
> >
> > I ran failover from node10 to node20.
> >
> > On node30, the origin of the set was changed
> > from 10 to 20, however, drop node10 failed
> > because of the row in sl_setsync.
> >
> > This causes slon on node30 to quit and the cluster
> > to become unstable, which in turn prevents putting
> > node10 back into the mix.
> >
> > Please tell me I'm not the first one to run into
> > this...
> >
> > The only clean workaround I can see is to drop
> > node30, re-add it, and then re-add node10.  This
> > leaves us without a backup for the duration.
> >
> >
> > This is what is in some of the tables for node20:
> >
> > gb2=# select * from sl_node;
> >  no_id | no_active |       no_comment        | no_spool
> > -------+-----------+-------------------------+----------
> >     20 | t         | Node 20 - gb2@localhost | f
> >     30 | t         | Node 30 - gb3@localhost | f
> > (2 rows)
> >
> > gb2=# select * from sl_set;
> >  set_id | set_origin | set_locked |     set_comment
> > --------+------------+------------+----------------------
> >       1 |         20 |            | Set 1 for gb_cluster
> > (1 row)
> >
> > gb2=# select * from sl_setsync;
> >  ssy_setid | ssy_origin | ssy_seqno | ssy_minxid | ssy_maxxid | ssy_xip | ssy_action_list
> > -----------+------------+-----------+------------+------------+---------+-----------------
> > (0 rows)
> >
> > This is what I have for node30:
> >
> > gb3=# select * from sl_node;
> >  no_id | no_active |       no_comment        | no_spool
> > -------+-----------+-------------------------+----------
> >     10 | t         | Node 10 - gb@localhost  | f
> >     20 | t         | Node 20 - gb2@localhost | f
> >     30 | t         | Node 30 - gb3@localhost | f
> > (3 rows)
> >
> > gb3=# select * from sl_set;
> >  set_id | set_origin | set_locked |     set_comment
> > --------+------------+------------+----------------------
> >       1 |         20 |            | Set 1 for gb_cluster
> > (1 row)
> >
> > gb3=# select * from sl_setsync;
> >  ssy_setid | ssy_origin | ssy_seqno | ssy_minxid | ssy_maxxid | ssy_xip | ssy_action_list
> > -----------+------------+-----------+------------+------------+---------+-----------------
> >          1 |         10 |       235 | 1290260    | 1290261    |         |
> > (1 row)
> >
> > frustrated,
> > --elein
> Elein,
> I share your frustration. I have just started investigating failover for the 
> first time, and I have yet to see a clean failover happen; no matter how I do 
> it, I end up with nodes that are no longer in sync with the other nodes.  My 
> time is fairly short this week, but I hope to be able to spend some time on 
> it. I've pushed all my other Slony work to the back burner until this has a 
> solid resolution.
> 
> Jan/Chris, are either of you able to reproduce a stable failover in a 
> multi-node setup (more than a single origin/subscriber pair)?
> 
> > _______________________________________________
> > Slony1-general mailing list
> > Slony1-general at gborg.postgresql.org
> > http://gborg.postgresql.org/mailman/listinfo/slony1-general
> 
> -- 
> Darcy Buskermolen
> Wavefire Technologies Corp.
> 
> http://www.wavefire.com
> ph: 250.717.0200
> fx: 250.763.1759

-- 
Andrew Sullivan  | ajs at crankycanuck.ca
A certain description of men are for getting out of debt, yet are
against all taxes for raising money to pay it off.
		--Alexander Hamilton

