cbbrowne at ca.afilias.info cbbrowne
Tue May 17 13:55:49 PDT 2005
>
> I've gotten the proof-of-concept stage for our replication system set up,
> and now I'm looking at going to production.  Just a few questions:
>
> 1. How does one typically start the slon daemons?  I would prefer to have
> a chkconfig-standard file that I could stick into /etc/init.d, and I'll
> build one if I have to, but I'm sure someone out there has something that
> will help.

On some of our systems, we have cron scripts that periodically (roughly
every 15 minutes) "thump and restart" the watchdog processes.

That has three notable effects:
 - If the system reboots, the slons will only be down until the next
   cron run, a matter of minutes.  DBAs may not need to be paged in the
   middle of the night if a non-critical replica goes down.

 - We found that some condition could leave the watchdogs snarled up;
   in effect, the "thump/restart" is a watchdog on the watchdog.

 - This "thumping" doesn't involve "thumping" the slons themselves.
   If all is going well, the slons are never aware of the "Changing
   of the Guard."  (I see a renaming of the script in the near
   future, and a move to bright red uniforms for the documentation,
   though there probably won't be too much pomp or
   circumstance to this changing of the guard...)
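
A minimal sketch of the "thump and restart" idea, in Python purely for
illustration; it is not the actual cron script described above.  It
assumes the watchdog records its PID in a file, and `start_watchdog` is a
hypothetical callback that launches a fresh, detached watchdog and returns
its PID.  The old watchdog is retired unconditionally, because a snarled
watchdog can still be alive.  Only the watchdog's PID is signalled, so
this assumes the watchdog does not take its slon down with it.

```python
import os
import signal
from typing import Callable, Optional, Tuple


def read_pid(pid_file: str) -> Optional[int]:
    """Return the PID recorded in pid_file, or None if missing or garbled."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def is_alive(pid: int) -> bool:
    """Signal 0 probes for the process without actually signalling it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists, but is owned by another user
    return True


def thump(pid_file: str,
          start_watchdog: Callable[[], int]) -> Tuple[Optional[int], int]:
    """Changing of the Guard: retire the old watchdog, start a new one.

    Only the watchdog PID is signalled; the slon is not touched (assuming
    the hypothetical watchdog leaves its slon running when terminated).
    Returns (old_pid, new_pid).
    """
    old_pid = read_pid(pid_file)
    if old_pid is not None and is_alive(old_pid):
        # Even a watchdog that looks alive may be snarled, so replace it.
        os.kill(old_pid, signal.SIGTERM)
    new_pid = start_watchdog()
    with open(pid_file, "w") as f:
        f.write(f"{new_pid}\n")
    return old_pid, new_pid
```

From cron this would run at about the 15-minute interval mentioned above
(e.g. a `*/15 * * * *` crontab entry).  A production version should also
confirm that the PID still belongs to a watchdog before signalling it,
since a PID file left over from before a reboot may name some unrelated
process.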

> 2. What are good "best practices" for dealing with failure modes?  I've
> recreated a cluster with a failed master by removing the slony instance
> from the slave, and then re-creating the whole system, but is there a
> better way to promote a surviving slave to take over for a dead master?

That's highly policy-driven, and somewhat dangerous to prescribe.

If you can describe the failure mode more precisely, there may be some
suggestions to offer.

But failover policy does need to be individualized to the specific
environment.

> 3. Anyone have any good scripts for encapsulating the slonik commands?

Well, there's the altperl stuff...

What we have found is that the altperl tools have been very useful for
generating slonik scripts.  After that, there's little substitute for
reviewing (and perhaps revising) and running the resulting scripts.

Tucking them away inside wrapper scripts isn't that helpful.  If they
disappear from view, then you don't know what they're doing, and as a
result, you don't know what you're doing to your replication environment.

I don't think that's likely to be much of a "best practice."
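
A sketch of the "generate, review, then run" workflow, again in Python and
under stated assumptions: the generated text stands in for whatever an
altperl tool printed, the file name is hypothetical, and slonik is assumed
to be on the PATH and to read its script from standard input.  The point
is that the script stays on disk, in view, and nothing reaches slonik
until someone has approved the exact text.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional, Sequence


def stage_script(generated: str, path: Path) -> Path:
    """Save a generated slonik script where a person can read and edit it."""
    path.write_text(generated)
    return path


def ask_operator(script: str) -> bool:
    """Print the script and require an explicit 'y' before running it."""
    print(script)
    return input("Run this with slonik? [y/N] ").strip().lower() == "y"


def run_reviewed(path: Path,
                 approve: Callable[[str], bool] = ask_operator,
                 slonik_cmd: Sequence[str] = ("slonik",)) -> Optional[int]:
    """Feed the (possibly hand-revised) script to slonik on stdin, if approved.

    Returns slonik's exit status, or None if the operator declined.
    """
    script = path.read_text()  # re-read: the file may have been revised
    if not approve(script):
        return None
    return subprocess.run(list(slonik_cmd), input=script, text=True).returncode
```

`slonik_cmd` is a parameter only so the runner can be swapped out for
testing.  The file is re-read at run time, so any hand revisions are what
actually get executed.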