Mon Apr 10 06:55:31 PDT 2006
- Previous message: [Slony1-general] Slave server dies after a few days of replication
- Next message: [Slony1-general] Slave server dies after a few days of replication
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 4/7/2006 2:16 PM, Christopher Browne wrote:
> Aaron Randall <aaron.randall at visionoss.com> writes:
>
>> Hi all!
>>
>> I am seeing a problem occurring after a few days of replication between
>> two of my servers - they replicate fine and then suddenly the slon
>> process stops on the slave.
>
> Does the slon start back up happily after this?
>
>> The log file gives good information...I
>> just need help in understanding it. Here is the point in the slave logs
>> where the slon process shuts down:
>>
>> "2006-03-31 12:47:40 GMT DEBUG2 remoteHelperThread_1_1: 0.007 seconds
>> until close cursor
>> 2006-03-31 12:47:40 GMT DEBUG2 remoteWorkerThread_1: new sl_rowid_seq
>> value: 1000000000000000
>> 2006-03-31 12:47:40 GMT DEBUG2 remoteWorkerThread_1: SYNC 244391 done in
>> 0.034 seconds
>> 2006-03-31 12:47:47 GMT DEBUG2 syncThread: new sl_action_seq 1 - SYNC 230540
>> 2006-03-31 12:47:47 GMT DEBUG2 localListenThread: Received event
>> 2,230540 SYNC
>> 2006-03-31 12:47:47 GMT DEBUG2 remoteWorkerThread_1: forward confirm
>> 2,230540 received by 1
>> 2006-03-31 12:47:50 GMT DEBUG2 remoteListenThread_1: queue event
>> 1,244392 SYNC
>> 2006-03-31 12:47:50 GMT DEBUG2 remoteWorkerThread_1: Received event
>> 1,244392 SYNC
>> 2006-03-31 12:47:50 GMT DEBUG2 remoteWorkerThread_1: SYNC 244392 processing
>> 2006-03-31 12:47:50 GMT DEBUG2 remoteWorkerThread_1: syncing set 1 with
>> 250 table(s) from mytable 1
>> 2006-03-31 12:47:50 GMT DEBUG2 remoteHelperThread_1_1: 0.006 seconds
>> delay for first row
>> 2006-03-31 12:47:50 GMT DEBUG2 remoteHelperThread_1_1: 0.007 seconds
>> until close cursor
>> 2006-03-31 12:47:50 GMT DEBUG2 remoteWorkerThread_1: new sl_rowid_seq
>> value: 1000000000000000
>> 2006-03-31 12:47:50 GMT DEBUG2 remoteWorkerThread_1: SYNC 244392 done in
>> 0.032 seconds
>> 2006-03-31 12:47:56 GMT FATAL syncThread: "start transaction;set
>> transaction isolation level serializable;select last_value from
>> "_my_replication".sl_action_seq;" - FATAL: terminating connection due
>> to administrator command
>> server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.

There must be a) something in the postmaster log explaining why the
postmaster killed the backend and b) probably a coredump somewhere laying
around in $PGDATA, explaining in more detail what happened.


Jan


>> 2006-03-31 12:47:56 GMT DEBUG1 slon: shutdown requested
>> 2006-03-31 12:47:56 GMT DEBUG2 slon: notify worker process to shutdown
>> 2006-03-31 12:47:56 GMT DEBUG2 slon: wait for worker process to shutdown
>> 2006-03-31 12:47:56 GMT INFO remoteListenThread_1: disconnecting from
>> 'host=1.1.1.2 dbname=mydb user=slonyuser port=5432'
>> 2006-03-31 12:47:56 GMT DEBUG1 remoteListenThread_1: thread done
>> 2006-03-31 12:47:56 GMT DEBUG1 localListenThread: thread done
>> 2006-03-31 12:47:56 GMT DEBUG1 cleanupThread: thread done
>> 2006-03-31 12:47:56 GMT DEBUG1 main: scheduler mainloop returned
>> 2006-03-31 12:47:56 GMT DEBUG2 main: wait for remote threads
>> 2006-03-31 12:47:56 GMT DEBUG2 sched_wakeup_node(): no_id=1 (0 threads +
>> worker signaled)
>> 2006-03-31 12:47:56 GMT DEBUG1 remoteWorkerThread_1: helper thread for
>> provider 1 terminated
>> 2006-03-31 12:47:56 GMT DEBUG1 remoteWorkerThread_1: disconnecting from
>> data provider 1
>> 2006-03-31 12:47:56 GMT DEBUG1 remoteWorkerThread_1: thread done
>> 2006-03-31 12:47:56 GMT DEBUG2 main: notify parent that worker is done
>> 2006-03-31 12:47:56 GMT DEBUG1 main: done
>> 2006-03-31 12:47:56 GMT DEBUG2 slon: worker process shutdown ok
>> 2006-03-31 12:47:56 GMT DEBUG2 slon: exit(-1)
>> "
>
> Something sent a SIGTERM signal to the backend supporting the
> syncThread, which, if memory serves, could mean that *any* of the
> backends that slon is listening to were terminated.
>
> You should figure out why something is sending SIGTERM signals to your
> databases; this isn't a Slony-I issue per se.
>
> Out of memory problems have historically caused this; you should check
> database logs to see what's up. Slony-I won't fix your database
> problems; it is simply vulnerable to them :-(.


--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #
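[Editor's note] The diagnostic advice above boils down to three checks: look in the postmaster log for why the backend was terminated, check whether the kernel's OOM killer was involved, and look for a core file under $PGDATA. A minimal shell sketch of those checks follows; the log path /tmp/postmaster.log and its sample line are stand-ins for illustration, not output from the poster's system, and the real-system commands are left commented out.

```shell
# Stand-in postmaster log fragment (assumption: your real log lives
# wherever your installation writes it, e.g. under $PGDATA or syslog).
cat > /tmp/postmaster.log <<'EOF'
2006-03-31 12:47:56 GMT FATAL:  terminating connection due to administrator command
EOF

# a) Why did the postmaster kill the backend?  Grep the log for the
#    same message the slon saw on its connection:
grep "terminating connection due to administrator command" /tmp/postmaster.log

# Was the kernel's OOM killer involved?  (Run on the real system.)
# dmesg | grep -i "out of memory"

# b) Any core dump laying around in the data directory?  (Real system.)
# find "$PGDATA" -name 'core*' -ls
```

Matching the FATAL message in the postmaster log against the timestamp of the slon's shutdown is what ties the two logs together.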