Tue Apr 11 02:03:29 PDT 2006
Hi Guys,

Thanks for the replies...

Jan Wieck wrote:
> On 4/7/2006 2:16 PM, Christopher Browne wrote:
>> Aaron Randall <aaron.randall at visionoss.com> writes:
>>
>>> Hi all!
>>>
>>> I am seeing a problem occurring after a few days of replication
>>> between two of my servers - they replicate fine and then suddenly
>>> the slon process stops on the slave.
>>
>> Does the slon start back up happily after this?

Yes it does. I get a message something like "cleaning up old slon
process" (sorry I can't give the exact message, it is on a live system
so I cannot reproduce the messages). But yes, the whole process starts
up nicely again.

>>
>>> The log file gives good information... I just need help in
>>> understanding it. Here is the point in the slave logs where the
>>> slon process shuts down:
>>>
>>> "2006-03-31 12:47:40 GMT DEBUG2 remoteHelperThread_1_1: 0.007 seconds until close cursor
>>> 2006-03-31 12:47:40 GMT DEBUG2 remoteWorkerThread_1: new sl_rowid_seq value: 1000000000000000
>>> 2006-03-31 12:47:40 GMT DEBUG2 remoteWorkerThread_1: SYNC 244391 done in 0.034 seconds
>>> 2006-03-31 12:47:47 GMT DEBUG2 syncThread: new sl_action_seq 1 - SYNC 230540
>>> 2006-03-31 12:47:47 GMT DEBUG2 localListenThread: Received event 2,230540 SYNC
>>> 2006-03-31 12:47:47 GMT DEBUG2 remoteWorkerThread_1: forward confirm 2,230540 received by 1
>>> 2006-03-31 12:47:50 GMT DEBUG2 remoteListenThread_1: queue event 1,244392 SYNC
>>> 2006-03-31 12:47:50 GMT DEBUG2 remoteWorkerThread_1: Received event 1,244392 SYNC
>>> 2006-03-31 12:47:50 GMT DEBUG2 remoteWorkerThread_1: SYNC 244392 processing
>>> 2006-03-31 12:47:50 GMT DEBUG2 remoteWorkerThread_1: syncing set 1 with 250 table(s) from mytable 1
>>> 2006-03-31 12:47:50 GMT DEBUG2 remoteHelperThread_1_1: 0.006 seconds delay for first row
>>> 2006-03-31 12:47:50 GMT DEBUG2 remoteHelperThread_1_1: 0.007 seconds until close cursor
>>> 2006-03-31 12:47:50 GMT DEBUG2 remoteWorkerThread_1: new sl_rowid_seq value: 1000000000000000
>>> 2006-03-31 12:47:50 GMT DEBUG2 remoteWorkerThread_1: SYNC 244392 done in 0.032 seconds
>>> 2006-03-31 12:47:56 GMT FATAL syncThread: "start transaction;set transaction isolation level serializable;select last_value from "_my_replication".sl_action_seq;" - FATAL: terminating connection due to administrator command
>>> server closed the connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>
> There must be a) something in the postmaster log explaining why the
> postmaster killed the backend and b) probably a core dump somewhere
> lying around in $PGDATA, explaining in more detail what happened.
>
>
> Jan

I will take a look next time I have access and post the results if
needed, thanks for the tips!
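For the archives: a minimal sketch of how one might check both of the
places Jan mentions. The log path and the postgres binary location are
assumptions - adjust them for your installation, and note that the
postmaster may be logging to syslog instead of a file:

    # Look in the postmaster log for the reason the backend was terminated
    # (path is an assumption; many installs log under $PGDATA/pg_log,
    # others go to syslog)
    grep -iE 'terminat|out of memory|kill' "$PGDATA"/pg_log/*.log

    # Look for a core dump left behind in the data directory
    find "$PGDATA" -name 'core*' -type f

    # If a core file turns up, a backtrace shows where the backend died;
    # point gdb at your actual postgres binary, then type "bt" at the prompt
    gdb /usr/local/pgsql/bin/postgres "$PGDATA"/core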
>
>
>>> 2006-03-31 12:47:56 GMT DEBUG1 slon: shutdown requested
>>> 2006-03-31 12:47:56 GMT DEBUG2 slon: notify worker process to shutdown
>>> 2006-03-31 12:47:56 GMT DEBUG2 slon: wait for worker process to shutdown
>>> 2006-03-31 12:47:56 GMT INFO remoteListenThread_1: disconnecting from 'host=1.1.1.2 dbname=mydb user=slonyuser port=5432'
>>> 2006-03-31 12:47:56 GMT DEBUG1 remoteListenThread_1: thread done
>>> 2006-03-31 12:47:56 GMT DEBUG1 localListenThread: thread done
>>> 2006-03-31 12:47:56 GMT DEBUG1 cleanupThread: thread done
>>> 2006-03-31 12:47:56 GMT DEBUG1 main: scheduler mainloop returned
>>> 2006-03-31 12:47:56 GMT DEBUG2 main: wait for remote threads
>>> 2006-03-31 12:47:56 GMT DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
>>> 2006-03-31 12:47:56 GMT DEBUG1 remoteWorkerThread_1: helper thread for provider 1 terminated
>>> 2006-03-31 12:47:56 GMT DEBUG1 remoteWorkerThread_1: disconnecting from data provider 1
>>> 2006-03-31 12:47:56 GMT DEBUG1 remoteWorkerThread_1: thread done
>>> 2006-03-31 12:47:56 GMT DEBUG2 main: notify parent that worker is done
>>> 2006-03-31 12:47:56 GMT DEBUG1 main: done
>>> 2006-03-31 12:47:56 GMT DEBUG2 slon: worker process shutdown ok
>>> 2006-03-31 12:47:56 GMT DEBUG2 slon: exit(-1)
>>> "
>>
>> Something sent a SIGTERM signal to the backend supporting the
>> syncThread, which, if memory serves, could mean that *any* of the
>> backends that slon is listening to were terminated.
>>
>> You should figure out why something is sending SIGTERM signals to your
>> databases; this isn't a Slony-I issue per se.
>>
>> Out of memory problems have historically caused this; you should check
>> database logs to see what's up. Slony-I won't fix your database
>> problems; it is simply vulnerable to them :-(.
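Likewise, a hedged sketch of where one might look for the source of the
SIGTERMs Chris describes, assuming a Linux host - syslog paths vary by
distribution:

    # The kernel OOM killer announces its victims in the ring buffer/syslog
    dmesg | grep -iE 'oom|out of memory|killed process'
    grep -iE 'oom|out of memory' /var/log/messages

    # Check whether a cron job or init script is in the habit of
    # signalling postgres backends
    grep -rilE 'pg_ctl|pkill|killall' /etc/cron* /etc/init.d 2>/dev/null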