Mon Sep 19 10:19:20 PDT 2005
- Previous message: [Slony1-general] Master server reboot
- Next message: [Slony1-general] con_timestamp default problem
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi Christopher,

At 05:59 AM 9/19/2005, Christopher Browne wrote:
>"Deon van der Merwe" <dvdm at truteq.co.za> writes:
> > We have a master and 4 slave servers running Slony1 on a LAN.
> > Everything is working great, except for one thing:
> > - each of the slon processes runs on its own server
> > - they each run in an endless loop, so that they can always restart for
> >   whatever reason
> > - we had to reboot the master server
> > - after the reboot, all the slaves reconnected
> > - the problem is this: the actual replication of data stopped. Only after
> >   a restart of the slon process on every slave did replication start to
> >   work again.
> >
> > My questions are these:
> > - what is the expected behaviour in the above scenario?
> > - I need to investigate some more... What can/should/must I check in
> >   order to find out why this happened? That is, if I am able to repeat it!
> > - I will need to find out whether I can repeat what happened...
> >
> > We are running FC4 (so that is PostgreSQL 8.0.3) on all the servers,
> > using Slony-I 1.1.0.
>
>So, the only database that "fell over" was the master?

Correct. All 4 slaves were untouched; we rebooted only the master.

>It sounds like what happened is that the remote worker threads that
>pointed to the "master" saw that DB go away, and shut down the one
>relevant remote worker thread.
>
>This left all the other threads up and running, which would have been
>OK had subscriptions been provided by the other threads...

From what I could see (with the little that I know of Slony1), they did reconnect.

>I have to call this behaviour "not unexpected."
>An interesting retry would be to have one or more cascaded
>subscribers.

I will try to set this up on the test system, as the above happened on the live system.

>Expected result there: If you restart the slons for the direct
>subscribers, that should suffice to get all the subscribers back
>going. The cascaded subscribers should pick up once the direct
>subscribers have their slons restarted.

Restarting the slons on each slave did restart the actual replication without any delay. I really want to investigate this further, but I need to know what to check, and where, in order to provide more and better detailed information. Any suggestions?

-Deon

_____________________________________________________
TruTeq Wireless (Pty) Ltd. | Tel: +27 (0)12 667 1530
http://www.truteq.co.za    | Fax: +27 (0)12 667 1531
Wireless communications for remote machine management
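The "endless loop" wrapper the slaves run, and the kind of lag check that answers "what should I look at when replication appears stalled", can be sketched in shell. This is a minimal illustration, not Slony-I's own tooling: the cluster name `mycluster`, the conninfo string, and the 5-second backoff are assumed values. The `sl_status` view queried at the end (with its `st_lag_num_events` and `st_lag_time` columns) does exist in the replicated cluster's `_clustername` schema and is the usual first thing to check.

```shell
#!/bin/sh
# Hedged sketch of a slon supervisor loop; the cluster name, conninfo,
# and backoff interval below are illustrative, not from the posts above.
CLUSTER=mycluster
CONNINFO="dbname=mydb host=slave1 user=slony"
BACKOFF=5   # seconds to wait before restarting slon after it exits

run_slon_forever() {
    # Re-run the given command every time it exits, so an event such as
    # a master reboot (which makes slon's remote worker thread shut down)
    # is followed by a fresh start rather than a dead daemon.
    while :; do
        "$@"
        echo "command exited with status $?; restarting in ${BACKOFF}s" >&2
        sleep "$BACKOFF"
    done
}

# The real invocation on each slave would look like:
#   run_slon_forever slon "$CLUSTER" "$CONNINFO"

# To see whether replication is actually advancing, query the cluster's
# sl_status view on the origin; st_lag_num_events should stay near zero
# and st_lag_time should not keep growing:
#   psql -d mydb -c "SELECT st_received, st_lag_num_events, st_lag_time
#                    FROM _mycluster.sl_status;"
```

A wrapper like this restarts slon after any exit, but as the thread shows, a restart only helps once the daemon actually exits; if slon stays up with a dead remote worker, only an external check such as the `sl_status` query will reveal that replication has stopped.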
More information about the Slony1-general mailing list