Thu Jan 26 11:29:32 PST 2006
- Previous message: [Slony1-general] Replication fails after network outage
- Next message: [Slony1-general] Slony-I 1.1.5 Released
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Glen Eustace wrote:
> I am using slony-1.1.0 with postgresql-8.0.6 and have a situation that I
> hope has a better resolution than I am currently using.
>
> One of my 2 slaves is some distance away and over the last 6 months or
> so we have had quite a few network brown- or black-outs between it and
> the master. After such an event, replication fails, and the only way I am
> managing to get it going again is to drop the node and database and
> start again. I have done this so many times now that I have scripted it,
> so that I can get the slave back online relatively quickly.
>
> I get errors like the following in the slony log:
>
> 2006-01-26 08:11:13 NZDT ERROR remoteWorkerThread_1: "start transaction; set enable_seqscan = off; set enable_indexscan = on; " PGRES_FATAL_ERROR
> 2006-01-26 08:11:13 NZDT ERROR remoteWorkerThread_1: "close LOG; " PGRES_FATAL_ERROR
> 2006-01-26 08:11:13 NZDT ERROR remoteWorkerThread_1: "rollback transaction; set enable_seqscan = default; set enable_indexscan = default; " PGRES_FATAL_ERROR
> 2006-01-26 08:11:13 NZDT ERROR remoteWorkerThread_1: helper 1 finished with error
> 2006-01-26 08:11:13 NZDT ERROR remoteWorkerThread_1: SYNC aborted
>
> Stopping and restarting all the various slony processes doesn't seem to
> clear things.
>
> NB: It only ever seems to happen after a network event. Any advice on
> how to get replication started again without rebuilding would be
> appreciated.

One thought... You might want to turn the logging up to a higher level; it looks as though it's at level 1, and I'd expect "-d 2" to give more useful information.

Another notion... My suspicion is that the connection between the slon and the database it is managing was broken by the network event. Higher debug levels might display a message like "a slon is already servicing node #2"; that would be a good tell-tale sign...
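To make that first suggestion concrete, a restart at debug level 2 might look like the sketch below. The cluster name, database, and host are hypothetical placeholders, not details from this thread, and the commands are echoed rather than executed so they can be reviewed first:

```shell
# Dry-run sketch: commands are built and echoed, not executed.
# "mycluster" and the conninfo string are hypothetical placeholders --
# substitute your own cluster name and connection parameters.

# Stop the slon currently (mis)managing the remote node, then restart
# it with "-d 2" for more detailed log output.
STOP_CMD='pkill -f "slon mycluster"'
RESTART_CMD='slon -d 2 mycluster "dbname=mydb host=slave.example.com user=slony"'

echo "$STOP_CMD"
echo "$RESTART_CMD"
```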
The next time this happens, connect to the database and look at pg_stat_activity to see what slony-related backends are in use. My suspicion is that you'll see several of them, possibly (if statement logging is on) indicating "<IDLE> in transaction".

Solution #1... Those idle-in-transaction backends are, in effect, 'zombies' of sorts. They haven't yet figured out that the network connection has died and won't be coming back, and they could persist (depending on TCP/IP configuration) for up to a couple of hours. Kill them off, and see if starting new slon processes works out better.

Solution #2... It is preferable if each slon lives on the same network as the database it is managing. That would prevent some of the above from happening, notably in that restarting slons would do some good.
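Solution #1 can be sketched as a couple of commands. The database name below is a hypothetical placeholder; the procpid and current_query column names match pg_stat_activity as it looked in the PostgreSQL 8.0 era (later releases renamed them pid and query). Again the commands are echoed so they can be inspected before running:

```shell
# Dry-run sketch (commands echoed, not executed).  "mydb" is a
# hypothetical placeholder database name.

# 1. List backends stuck "<IDLE> in transaction"; current_query is only
#    populated when command-string statistics logging is enabled.
LIST_CMD="psql -d mydb -c \"SELECT procpid, usename FROM pg_stat_activity WHERE current_query = '<IDLE> in transaction';\""

# 2. Terminate each zombie backend with SIGTERM, a clean shutdown
#    signal for a single backend (avoid SIGKILL, which can force a
#    full server restart).
KILL_CMD='kill -TERM 12345   # substitute each procpid from step 1'

echo "$LIST_CMD"
echo "$KILL_CMD"
```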
More information about the Slony1-general mailing list