Wed Feb 2 18:35:52 PST 2005
- Previous message: [Slony1-general] Switching master-slave roles after a failover
- Next message: [Slony1-general] redundent conninfo
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
k-ohara at excite.co.jp wrote:
>How can I manage to implement Step 4 in the following scenario:
>
>  step #1. A was master; B was slave.
>  step #2. B detects a failure; promotes itself to a master.
>  step #3. The cause of the failure is resolved and removed by admin.
>  step #4. A becomes a new slave manually or automatically.
>
>I know I should rebuild server A `from scratch'
>if the cause is disk error or something.
>But in some cases (e.g. NIC error), A's disk is safe and sound.
>
>In such cases, I thought I could switch master-slave roles
>even after the failover command, but manuals and mailing list
>archives seemingly suggest not to do that.
>
>My idea was to kill all slon daemons, drop all slony schemata from
>both servers, pg_dump/undump from B to A if needed,
>then to re-install the schemata with reversed roles.
>

The problem with NOT rebuilding A from scratch is that you may get
things into an inconsistent state.

Consider in a little more detail:

Step 1. A was "origin"; B was a subscriber.

Step 2. A network failure takes place, so B decides to take over
via FAILOVER.

Conditions at the time of step 2: the database on A has 25 committed
replicable transactions that never made it to B.

FAILOVER treats those transactions as lost. But in fact they are
sitting on A, committed.

You may resolve the cause of the failure, but that does not resolve
those 25 transactions, which are in a sort of "limbo": committed on A,
but not replicated anywhere else.

Indeed, users may have re-attempted the transactions on B, so that
there are logical equivalents waiting to be replicated to subscribers.

The systems are out of sync in a way that Slony-I is not equipped to
rectify. At that point, you have a conflict that the replication
system cannot correct.

The only safe thing to do is to reconstruct A from scratch. That is
why FAILOVER _abandons_ the failed node.
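To make the divergence concrete, here is a toy model in Python. It is only a sketch and has nothing to do with Slony-I's internals: the function name, the transaction labels, and the figures of 100 commits on A and 3 later writes on B are all assumptions for illustration; only the 25 stranded transactions come from the scenario above.

```python
# Toy model of the failover gap. This is NOT Slony-I code; every name and
# every count other than the 25 stranded transactions is hypothetical.

def stranded_transactions(committed_on_a, replicated_to_b):
    """Return transactions committed on A that B never received, in commit order."""
    seen = set(replicated_to_b)
    return [tx for tx in committed_on_a if tx not in seen]

# Assume A committed 100 transactions and only the first 75 reached B
# before the network failed.
committed_on_a = [f"a-{n}" for n in range(1, 101)]
replicated_to_b = committed_on_a[:75]

stranded = stranded_transactions(committed_on_a, replicated_to_b)

# After FAILOVER, B accepts new writes, some of them user retries of lost work.
b_history = replicated_to_b + ["b-1", "b-2", "b-3"]

# If one history were a prefix of the other, the lagging node could catch up
# by replaying the remainder. Here neither is, so the histories have forked.
a_is_prefix_of_b = b_history[:len(committed_on_a)] == committed_on_a
b_is_prefix_of_a = committed_on_a[:len(b_history)] == b_history

print(len(stranded))
print(a_is_prefix_of_b, b_is_prefix_of_a)
```

Because the two histories share only their first 75 entries, there is no ordering in which A could replay B's changes and end up consistent without first discarding its own 25.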
If those 25 transactions represent business promises (e.g., they
promised to ship products to customers), then you need to resolve this
by examining what was outstanding on node A when it failed. Some of
the 25 transactions may be irrelevant; others might be Really
Important. Evaluating that isn't something Slony-I can do.

I have added some further discussion of this in the appropriate places
in the Admin Guide; it is checked into CVS, and probably soon to be
published on some web site near you...

--
<http://cbbrowne.com/info/failover.html>