[Slony1-general] Master or Slave Down

Thu Sep 22 22:21:47 PDT 2005

Ujwal S. Setlur wrote:

>>The fact that it requires discarding and
>>reinitializing the failed node
>>makes FAILOVER a pretty undesirable operation.
>>    
>>
>
>I have heard this mentioned a few times now, and I
>have always wondered about it...
>
>While replication has many uses, such as backup and
>disaster recovery, failover is IMHO also a legitimate
>use. Nobody wants it to happen, of course, but it
>will, and it does. In fact, I consider failover part of
>disaster recovery.
>
>So what does "undesirable operation" actually mean?
>  
>
It is a "last resort" for two reasons:

1.  You may LOSE DATA, because any updates that have been committed on
the origin node but which have not yet been replicated to another node
will be lost.

This is a particularly bad thing if your applications have, on the basis
of the COMMIT on the origin, reported business transactions as COMMITted
to outside customers.

2.  Node abandonment.  After you perform...

 FAILOVER (id = 1, backup node = 2);

the node which was formerly the origin must be ABANDONED as far as
replication is concerned.

The failed node cannot be brought back into replication without
reinitializing it from scratch as a fresh, new Slony-I node, which
involves deleting all of its data and setting up a subscription to one
of the surviving nodes.
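A rough slonik sketch of that rebuild, assuming a single set 1, node 1 as the abandoned former origin, and node 2 as the surviving new origin. The cluster name, database name, hosts, and user are hypothetical placeholders, and the exact statement options should be checked against the docs for the Slony-I version in use. Between the two parts, node 1's database has to be wiped by hand (its data and the old Slony-I schema), and its slon daemon restarted afterwards.

 # Part 1: forget the abandoned node; hypothetical names throughout.
 cluster name = mycluster;
 node 1 admin conninfo = 'dbname=mydb host=db1 user=slony';
 node 2 admin conninfo = 'dbname=mydb host=db2 user=slony';
 # Announce the drop via the surviving node 2.
 drop node (id = 1, event node = 2);

 # Part 2 (after wiping node 1): re-add node 1 as a brand-new node.
 cluster name = mycluster;
 node 1 admin conninfo = 'dbname=mydb host=db1 user=slony';
 node 2 admin conninfo = 'dbname=mydb host=db2 user=slony';
 store node (id = 1, comment = 'rebuilt former origin', event node = 2);
 # How node 2's slon reaches node 1, and vice versa.
 store path (server = 1, client = 2, conninfo = 'dbname=mydb host=db1 user=slony');
 store path (server = 2, client = 1, conninfo = 'dbname=mydb host=db2 user=slony');
 # Node 1 copies all of set 1's data from node 2, as any new subscriber does.
 subscribe set (id = 1, provider = 2, receiver = 1, forward = yes);

The full copy in SUBSCRIBE SET is what makes this costly on a large database.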

If you can get the database back up and accessible long enough to run
LOCK SET and MOVE SET, neither of these problems needs to occur.
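For comparison, a controlled switchover with MOVE SET might look roughly like this, assuming set 1 currently originates on node 1 and node 2 subscribes to it. The cluster name and connection strings are hypothetical placeholders.

 cluster name = mycluster;
 node 1 admin conninfo = 'dbname=mydb host=db1 user=slony';
 node 2 admin conninfo = 'dbname=mydb host=db2 user=slony';

 # Block further updates to set 1's tables on the current origin.
 lock set (id = 1, origin = 1);
 # Wait until node 2 has confirmed everything up to the lock.
 wait for event (origin = 1, confirmed = 2);
 # Hand the origin role to node 2; node 1 becomes a subscriber.
 move set (id = 1, old origin = 1, new origin = 2);
 wait for event (origin = 1, confirmed = 2);

Because node 2 has confirmed every event before the move, no committed transaction is discarded, and node 1 stays in the cluster instead of being abandoned.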

If the disaster is "bad enough," failover is an entirely legitimate tool.

But you have to be prepared to accept both of the above caveats.