Christopher Browne cbbrowne
Mon Sep 19 04:59:48 PDT 2005
"Deon van der Merwe" <dvdm at truteq.co.za> writes:
> We have a master and 4 slave servers running Slony1 on a LAN.  Everything is
> working great, except for one thing:
> - each of the slon processes runs on its own server
> - they each run in an endless loop, so that they can always start again for
> whatever reason
> - we had to do a reboot of the master server
> - after the reboot, all the slaves reconnected
> - the problem is this: the actual replication of data stopped.  With a
> restart of the slon process on every slave the replication started to work
> again.
>
> My questions are these:
> - what is the expected behavior for the above scenario?
> - I need to investigate some more... What can/should/must I check in order
> to find out why this happened?  That is, if I am able to repeat it!
> - I will need to find out if I can repeat what happened...
>
> We are running on FC4 (so that is PostgreSQL 8.0.3) on all the servers using
> Slony-I 1.1.0.
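For concreteness, the "endless loop" wrapper described above might look
something like the sketch below.  The cluster name, conninfo, and restart
delay are placeholders, not the poster's actual setup, and the bounded
restart count exists only so the sketch terminates when run; a real
wrapper would loop forever.

```shell
#!/bin/sh
# Sketch of a slon restart wrapper: run a command, and whenever it
# exits (for whatever reason) start it again after a short pause.
run_with_restarts() {
    max=$1; shift
    n=0
    while [ "$n" -lt "$max" ]; do
        "$@" || echo "exited with status $?; restarting" >&2
        n=$((n + 1))
        sleep 1
    done
}

# Real usage would look something like (hypothetical cluster/conninfo):
#   run_with_restarts 1000000 slon mycluster \
#       "dbname=mydb host=slave1 user=slony"

# Demonstration with a stand-in command that always fails:
run_with_restarts 3 sh -c 'echo "slon run"; exit 1'
```

With a wrapper like this, the slon gets restarted automatically after a
crash; the behaviour discussed below is what happens when the process
stays up but its connection to the origin goes away.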

So, the only database that "fell over" was the master?

It sounds like what happened is that, on each slave, the remote worker
thread pointed at the "master" saw that DB go away and shut itself
down.

This left all the other threads up and running, which would have been
OK had subscriptions been provided by the other threads...

I have to call this behaviour "not unexpected."

An interesting experiment would be to retry this with one or more
cascaded subscribers.

Expected result there: If you restart the slons for the direct
subscribers, that should suffice to get all the subscribers back
going.  The cascaded subscribers should pick up once the direct
subscribers have their slons restarted.
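As a rough sketch of that recovery (the cluster name "mycluster", the
database name, and the use of pkill are all assumptions, not details
from the poster's setup):

```shell
# On each direct subscriber, restart its slon.  With a wrapper loop of
# the kind described above, killing the process is enough: the loop
# starts a fresh slon, which reconnects to the origin and resumes.
# (Cluster name "mycluster" is a placeholder.)
pkill -f 'slon mycluster'

# Then watch the backlog drain via Slony-I's sl_status view, which
# lives in the cluster's namespace ("_mycluster" here):
psql -d mydb -c \
  'SELECT st_origin, st_received, st_lag_num_events, st_lag_time
     FROM _mycluster.sl_status;'
```

Once st_lag_num_events falls back toward zero on the direct
subscribers, the cascaded subscribers should catch up on their own.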
-- 
"cbbrowne","@","ca.afilias.info"
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)


More information about the Slony1-general mailing list