[Slony1-general] what to consider for failover policy?

Tue Jun 21 22:18:44 PDT 2005

OK, first I'd like to thank all of you for your input; you have been
a great help so far. We have discussed the Linux-HA project as a
solution, but they started complaining about a whole server just
sitting there doing nothing until a failure. I see many people talking
about network connections failing. Once we start getting more users,
the network connections will all be redundant too. The power supplies
are redundant (triple setup), and so are the hard drives, the UPSes, etc.
As for clients, a total of 2 other local servers will be accessing
the database, and the clients log into those terminal servers. So all
the database traffic is local. They asked me to make this pretty much
completely automated, as I am only here for another week or so. I have
started telling them that with network monitoring we will be able to
notify someone of a failure, but only a human will be able to make the
final decision to fail over. I may implement something to do a
switchover if the load stays at a set value for a set time, or
something along those lines. I am now confirming my own suspicion that
automatic failover is going to be a HUGE task, and I have started to
tell them that it is not a good idea. We (or rather I) could just miss
too many things.
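
A rough sketch of the "set value for a set time" check, in Python, might
look like the following. The class name, the threshold of 8.0, and the
300-second hold time are made up for illustration, and the sketch only
raises an alarm for a person to act on; it does not fail over by itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SustainedLoadAlarm:
    """Fires only after load has stayed at or above a threshold for a set time."""
    threshold: float      # hypothetical load value regarded as too high
    hold_seconds: float   # how long the load must stay that high
    _above_since: Optional[float] = field(default=None, init=False, repr=False)

    def observe(self, load: float, now: float) -> bool:
        """Record one sample; return True when a human should be notified."""
        if load < self.threshold:
            # Any dip below the threshold resets the timer.
            self._above_since = None
            return False
        if self._above_since is None:
            self._above_since = now
        return now - self._above_since >= self.hold_seconds


# Illustrative samples as (seconds, load) pairs; on a real host the load
# could come from something like os.getloadavg().
alarm = SustainedLoadAlarm(threshold=8.0, hold_seconds=300)
samples = [(0, 9.0), (120, 9.5), (240, 3.0), (360, 9.0), (600, 9.2), (660, 9.1)]
fired = [alarm.observe(load, now=t) for t, load in samples]
print(fired)  # [False, False, False, False, False, True]
```

The dip at 240 seconds resets the timer, so the alarm only fires once the
load has been high for a full 300 seconds starting from the 360-second
sample.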

I was just wondering what you other guys were doing for failover setups.

Thanks a TON for the input!!
~Tyler

On 6/21/05, Daniel P. Berrange <dan at berrange.com> wrote:
> On Tue, Jun 21, 2005 at 10:04:13AM -0400, Vivek Khera wrote:
> >
> > On Jun 20, 2005, at 2:21 PM, 31337 .. wrote:
> >
> > >Are there any other easier ways to detect when the master node has
> > >gone down?
> >
> > Your first step is to define *precisely* what you consider "down".
> > Try enumerating all scenarios relative to each machine that could be
> > connecting to any of your db servers, and how you would notify all of
> > those hosts to switch to another "master", and how you would tell the
> > master it is no longer the master should it become undead.
> >
> > This will be very hard.
> 
> You might want to take a look at the capabilities offered by a project
> such as Linux HA (www.linux-ha.org) or the Red Hat Cluster Suite. There
> are many failure & failover scenarios & you don't really want to have
> to think of them all yourself, so better to leverage existing code. Ultimately
> the real key to a reliable failover is some sort of STONITH (Shoot The
> Other Node In The Head) capability to ensure that, when a failover
> occurs, there is absolutely no way the original master can come back
> to life. Hardware power switches are preferable, but software
> NMI watchdogs could be used to do an automatic reboot of the failed
> node. The Linux-HA / RH Cluster Suite agents also take care of issues
> such as quorum & split-brain to ensure optimal choice of slave to fail
> over to.
> 
> Regards,
> Dan.
> --
> |=-            GPG key: http://www.berrange.com/~dan/gpgkey.txt       -=|
> |=-       Perl modules: http://search.cpan.org/~danberr/              -=|
> |=-           Projects: http://freshmeat.net/~danielpb/               -=|
> |=-   berrange at redhat.com  -  Daniel Berrange  -  dan at berrange.com    -=|
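
To make the quorum idea from the quoted reply a bit more concrete, here
is a minimal Python sketch of a majority vote on whether the master is
"down". It is not how Linux-HA or the Red Hat Cluster Suite implement
quorum; the function name and the monitor names (two application servers
plus the standby) are assumptions for illustration. A monitor that does
not report is not counted as a "down" vote, so a monitor that is itself
cut off from the network cannot declare the master dead on its own.

```python
from typing import Mapping


def master_considered_down(reports: Mapping[str, bool], total_monitors: int) -> bool:
    """Return True only if a strict majority of ALL monitors say the master is unreachable.

    reports maps a monitor name to True if that monitor could reach the
    master and False if it could not. Monitors missing from reports are
    treated as "no vote", never as a "down" vote.
    """
    down_votes = sum(1 for reachable in reports.values() if not reachable)
    return down_votes * 2 > total_monitors


# Hypothetical monitors: two application servers and the standby database.
two_say_down = {"app1": False, "app2": True, "standby": False}
only_one_reported = {"app1": False}

print(master_considered_down(two_say_down, total_monitors=3))       # True
print(master_considered_down(only_one_reported, total_monitors=3))  # False
```

Even when the vote says "down", this only covers detection. Some form of
fencing (STONITH) is still needed before a slave is promoted, so that
the old master cannot come back and accept writes.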