Joe Conway mail at joeconway.com
Mon Oct 15 18:55:36 PDT 2012
I have a client who is seeing something just like:
  http://www.slony.info/bugzilla/show_bug.cgi?id=130
which is a duplicate of
  http://www.slony.info/bugzilla/show_bug.cgi?id=80
The latter apparently was never fixed.

The comments in the bug say:

  "recommend not rushing to drop the node out of the
   cluster until you actually get the failover completed.

   As a first response, that's definitely what I'd
   recommend.

  When you drop it "too quickly," that introduces the
  risk, which you ran into, that some later node gets
  the DROP NODE event before receiving the FAILOVER
  event."

Here's what we do in a nutshell:
-----------------------
A == original master
B == slave1
C == new master
D == slave2

All commands are run from C.

* switchover from A to B
* clone A to make C
* switchback from B to A
* failover from A to C
* drop A
-----------------------
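The bug comment quoted above amounts to an ordering rule: do not drop A until the failover has actually completed everywhere. As a rough sketch only, the Python below encodes the sequence as plain data and flags any drop of a failed-over node that is not preceded by an explicit confirmation step. The step names (including "wait_confirmed") are hypothetical labels, not slonik syntax.

```python
from typing import List, Tuple

Step = Tuple[str, ...]

# The sequence above, as hypothetical labels (not slonik commands).
PLAN: List[Step] = [
    ("switchover", "A", "B"),
    ("clone", "A", "C"),
    ("switchback", "B", "A"),
    ("failover", "A", "C"),
    ("drop", "A"),
]


def unsafe_drops(plan: List[Step]) -> List[int]:
    """Return indices of 'drop' steps targeting a node that was failed
    over from but never followed by a hypothetical 'wait_confirmed' step."""
    pending = set()  # old origins whose failover is not yet confirmed
    bad: List[int] = []
    for i, step in enumerate(plan):
        kind = step[0]
        if kind == "failover":
            pending.add(step[1])
        elif kind == "wait_confirmed":
            pending.discard(step[1])
        elif kind == "drop" and step[1] in pending:
            bad.append(i)
    return bad
```

With the plan as written, `unsafe_drops(PLAN)` flags the final drop (index 4); inserting `("wait_confirmed", "A")` before it clears the warning. What "confirmed" should mean in practice (which events, on which nodes) is exactly the open question here.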

This works fine 90% of the time (using some scripts to ensure we are
doing it exactly the same each time).

When we do the failover (run on/from C), slonik completes the failover
"successfully" (at least slonik reports no errors), but hours later the
original master is still the set_origin in the Slony catalog of the new
master. This is on a test cluster with no activity, so I don't think it
is simply a matter of not waiting long enough. Consequently, when we try
to drop the old master it fails (which is probably a good thing, since
the failover was not really successful).

 sl_path looks correct
 sl_subscribe has an extra row marked active=false with
   B as the provider (leftover from the switchback?)
 sl_set still has set_origin pointing to A
 sl_node still shows all 4 nodes as active=true
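To make the symptoms above easy to recheck after each attempt, here is a minimal sketch that takes rows already fetched from the catalog (set_origin from sl_set, and sub_set, sub_provider, sub_active from sl_subscribe) and lists anything that looks like an incomplete failover. Letters stand in for the integer node IDs Slony actually uses, and the set ID 1 in the usage line is an assumption; `Catalog` and `failover_problems` are hypothetical helpers, not part of Slony.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Catalog:
    set_origin: Dict[int, str]   # set_id -> origin node (from sl_set)
    subscriptions: List[dict]    # rows from sl_subscribe: set, provider, active


def failover_problems(cat: Catalog, new_origin: str) -> List[str]:
    """Describe catalog state inconsistent with a completed failover."""
    problems: List[str] = []
    # Every set should now originate on the new master.
    for set_id, origin in sorted(cat.set_origin.items()):
        if origin != new_origin:
            problems.append(
                f"set {set_id}: set_origin is {origin}, expected {new_origin}")
    # Inactive subscription rows are suspicious leftovers.
    for sub in cat.subscriptions:
        if not sub["active"]:
            problems.append(
                f"set {sub['set']}: inactive subscription via provider {sub['provider']}")
    return problems
```

Feeding it the state described above, `Catalog({1: "A"}, [{"set": 1, "provider": "B", "active": False}])` with new origin "C", reports both the stale set_origin and the leftover inactive subscription.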

So questions:
1) Is bug 80 still open?
2) Any plan to fix it or even ideas how to fix it?
3) Anything obvious we are missing?
4) Is there a better/more reliable way to get C stood
   up as the new master without taking down the cluster
   longer than the sequence above would do?

Thanks,

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support