Mon Oct 15 18:55:36 PDT 2012
- Next message: [Slony1-hackers] Failover never completes
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I have a client which is seeing something just like:

  http://www.slony.info/bugzilla/show_bug.cgi?id=130

which is a duplicate of

  http://www.slony.info/bugzilla/show_bug.cgi?id=80

The latter apparently was never fixed. The comments in the bug say:

  "recommend not rushing to drop the node out of the cluster until
  you actually get the failover completed. As a first response,
  that's definitely what I'd recommend. When you drop it "too
  quickly," that introduces the risk, which you ran into, that some
  later node gets the DROP NODE event before receiving the FAILOVER
  event."

Here's what we do in a nutshell:

-----------------------
A == original master
B == slave1
C == new master
D == slave2

all commands run from C

* switchover from A to B
* clone A to make C
* switchback from B to A
* failover from A to C
* drop A
-----------------------

This works fine 90% of the time (using some scripts to ensure we are
doing it exactly the same each time).

When we do the failover (which is run on/from C), slonik completes the
failover "successfully" (at least no errors reported by slonik), but
hours later (i.e. it is not a matter of not waiting long enough I
think) the original master is still the set_origin in the slony
catalog of the new master (this is on a test cluster with no
activity). Consequently when we try to drop the old master it fails
(which is probably a good thing since the failover was not really
successful).

* sl_path looks correct
* sl_subscribe has an extra row marked active=false with B as the
  provider (leftover from the switchback?)
* sl_set still has set_origin pointing to A
* sl_node still shows all 4 nodes as active=true

So questions:

1) Is bug 80 still open?
2) Any plan to fix it or even ideas how to fix it?
3) Anything obvious we are missing?
4) Is there a better/more reliable way to get C stood up as the new
   master without taking down the cluster longer than the sequence
   above would do?
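For reference, the catalog symptoms above (set_origin still pointing at A, an inactive sl_subscribe row left behind) can be checked mechanically before attempting the drop. What follows is only a minimal sketch in Python, not something from our scripts: the function name failover_problems and the node numbering (A=1, B=2, C=3, D=4) are made up for illustration, and the column names (set_id, set_origin, sub_set, sub_provider, sub_receiver, sub_active) are assumed to match the Slony-I catalog. The rows are assumed to have been fetched already from sl_set and sl_subscribe on the new master and passed in as dicts.

```python
def failover_problems(sl_set_rows, sl_subscribe_rows, old_origin):
    """Return a list of reasons the failover still looks unfinished.

    sl_set_rows / sl_subscribe_rows are lists of dicts, one per catalog
    row (hypothetical shape; how they are fetched is left to the caller).
    old_origin is the node id of the failed master.
    """
    problems = []

    # A set whose origin is still the old master has not been moved.
    for row in sl_set_rows:
        if row["set_origin"] == old_origin:
            problems.append(
                f"set {row['set_id']} still has origin {old_origin}"
            )

    # Inactive subscriptions are suspicious leftovers worth reporting.
    for row in sl_subscribe_rows:
        if not row["sub_active"]:
            problems.append(
                f"set {row['sub_set']}: inactive subscription "
                f"{row['sub_provider']} -> {row['sub_receiver']}"
            )

    return problems


# Illustrative state resembling what we see on C after the failover
# (node ids and the receiver of the leftover row are assumptions).
example_sets = [{"set_id": 1, "set_origin": 1}]
example_subs = [
    {"sub_set": 1, "sub_provider": 2, "sub_receiver": 3, "sub_active": False},
    {"sub_set": 1, "sub_provider": 3, "sub_receiver": 4, "sub_active": True},
]

for line in failover_problems(example_sets, example_subs, old_origin=1):
    print(line)
```

An empty result would be a precondition for running the drop, not proof that the failover finished everywhere, since each node has its own copy of the catalog.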
Thanks,

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support