Richard Yen dba at richyen.com
Mon Aug 23 12:23:29 PDT 2010
Hi everyone,

Wondering why the following would happen (running Slony 2.0.3 on PostgreSQL 8.4.2, on a CentOS box with a 2.6.18 kernel):

I have 1 provider (node ID 5) and 3 subscribers (node IDs 2, 3, and 4).  Sorry, long story, but that's just how the node IDs ended up being assigned.

Node 2 generates the log-shipping files (its slon was started with `slon -a <blah> ...`) and has been running for well over 6 months now, I think.  When I went to do a dump/reload of node 3 today, node 2 stopped generating files as soon as I performed the DROP NODE for node 3.
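
In case the exact commands matter, the DROP NODE was a small slonik script along the lines of the one below (the admin conninfo is a placeholder and this is from memory, so treat it as a sketch of what I ran rather than a verbatim copy):

    cluster name = slony_schema;
    node 5 admin conninfo = 'dbname=tii host=... port=5432 user=slony';  # placeholder
    # drop the subscriber I was rebuilding; event node assumed to be the origin (node 5)
    drop node (id = 3, event node = 5);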

In the logs for Node 2, I find the following:
> Aug 23 10:52:32 shipper_db slon[17121]: [1747401-1] 2010-08-23 10:52:32 PDT WARN   serialization problem updating sl_archive_counter: restarting slon
> Aug 23 10:52:32 shipper_db slon[17121]: [1747402-1] 2010-08-23 10:52:32 PDT CONFIG disableNode: no_id=3
> Aug 23 10:52:32 shipper_db slon[17121]: [1747403-1] 2010-08-23 10:52:32 PDT CONFIG storeListen: li_origin=4 li_receiver=2 li_provider=4
> Aug 23 10:52:32 shipper_db slon[17121]: [1747404-1] 2010-08-23 10:52:32 PDT CONFIG storeListen: li_origin=4 li_receiver=2 li_provider=5
> Aug 23 10:52:32 shipper_db slon[17121]: [1747405-1] 2010-08-23 10:52:32 PDT CONFIG storeListen: li_origin=6 li_receiver=2 li_provider=4
> Aug 23 10:52:32 shipper_db slon[17121]: [1747406-1] 2010-08-23 10:52:32 PDT CONFIG storeListen: li_origin=6 li_receiver=2 li_provider=5
> Aug 23 10:52:32 shipper_db slon[17121]: [1747407-1] 2010-08-23 10:52:32 PDT INFO   localListenThread: got restart notification
> Aug 23 10:52:32 shipper_db slon[17121]: [1747408-1] 2010-08-23 10:52:32 PDT CONFIG storeListen: li_origin=5 li_receiver=2 li_provider=5
> Aug 23 10:52:32 shipper_db slon[17121]: [1747409-1] 2010-08-23 10:52:32 PDT CONFIG remoteWorkerThread_5: update provider configuration
> Aug 23 10:52:32 shipper_db slon[25593]: [5-1] 2010-08-23 10:52:32 PDT INFO   slon: restart requested
> Aug 23 10:52:32 shipper_db slon[17121]: [1747410-1] 2010-08-23 10:52:32 PDT CONFIG remoteWorkerThread_5: added active set 1 to provider 5
> Aug 23 10:52:32 shipper_db slon[25593]: [6-1] 2010-08-23 10:52:32 PDT INFO   slon: notify worker process to shutdown
> Aug 23 10:52:32 shipper_db slon[17121]: [1747411-1] 2010-08-23 10:52:32 PDT INFO   remoteListenThread_4: disconnecting from 'dbname=tii host=db5.xxx.xxxxxxx.com port=5432
> Aug 23 10:52:32 shipper_db slon[17121]: [1747411-2]  user=slony password=xxxxxxx'
> Aug 23 10:52:32 shipper_db slon[17121]: [1747412-1] 2010-08-23 10:52:32 PDT CONFIG remoteWorkerThread_4: update provider configuration
> Aug 23 10:52:32 shipper_db slon[17121]: [1747413-1] 2010-08-23 10:52:32 PDT INFO   remoteWorkerThread_4: thread done
> Aug 23 10:52:32 shipper_db slon[17121]: [1747414-1] 2010-08-23 10:52:32 PDT DEBUG1 cleanupThread: thread done
> Aug 23 10:52:32 shipper_db slon[17121]: [1747415-1] 2010-08-23 10:52:32 PDT INFO   remoteListenThread_5: disconnecting from 'dbname=tii host=db6.xxx.xxxxxxx.com port=5432
> Aug 23 10:52:32 shipper_db slon[17121]: [1747415-2]  user=slony password=xxxxxxx'
> Aug 23 10:52:32 shipper_db slon[17121]: [1747416-1] 2010-08-23 10:52:32 PDT INFO   remoteListenThread_3: disconnecting from 'dbname=tii host=db4.xxx.xxxxxxx.com port=5432
> Aug 23 10:52:32 shipper_db slon[17121]: [1747416-2]  user=slony password=xxxxxxx'
> Aug 23 10:52:32 shipper_db slon[17121]: [1747417-1] 2010-08-23 10:52:32 PDT INFO   syncThread: thread done
> Aug 23 10:52:32 shipper_db slon[17121]: [1747418-1] 2010-08-23 10:52:32 PDT CONFIG remoteWorkerThread_5: update provider configuration
> Aug 23 10:52:32 shipper_db slon[17121]: [1747419-1] 2010-08-23 10:52:32 PDT CONFIG remoteWorkerThread_3: update provider configuration
> Aug 23 10:52:32 shipper_db slon[17121]: [1747420-1] 2010-08-23 10:52:32 PDT DEBUG1 remoteListenThread_4: thread done
> Aug 23 10:52:32 shipper_db slon[17121]: [1747421-1] 2010-08-23 10:52:32 PDT DEBUG1 remoteListenThread_5: thread done
> Aug 23 10:52:32 shipper_db slon[17121]: [1747422-1] 2010-08-23 10:52:32 PDT DEBUG1 remoteListenThread_3: thread done
> Aug 23 10:52:32 shipper_db slon[17121]: [1747423-1] 2010-08-23 10:52:32 PDT INFO   remoteWorkerThread_3: thread done
> Aug 23 10:52:32 shipper_db slon[17121]: [1747424-1] 2010-08-23 10:52:32 PDT CONFIG remoteWorkerThread_5: helper thread for provider 5 terminated
> Aug 23 10:52:32 shipper_db slon[17121]: [1747425-1] 2010-08-23 10:52:32 PDT CONFIG remoteWorkerThread_5: disconnecting from data provider 5
> Aug 23 10:52:32 shipper_db slon[17121]: [1747426-1] 2010-08-23 10:52:32 PDT INFO   remoteWorkerThread_5: thread done
> Aug 23 10:52:32 shipper_db slon[17121]: [1747427-1] 2010-08-23 10:52:32 PDT INFO   main: scheduler mainloop returned
> Aug 23 10:52:32 shipper_db slon[17121]: [1747428-1] 2010-08-23 10:52:32 PDT CONFIG main: wait for remote threads
> Aug 23 10:52:32 shipper_db slon[17121]: [1747429-1] 2010-08-23 10:52:32 PDT CONFIG main: done
> Aug 23 10:52:32 shipper_db slon[25593]: [7-1] 2010-08-23 10:52:32 PDT CONFIG slon: child terminated status: 0; pid: 17121, current worker pid: 17121
> Aug 23 10:52:32 shipper_db slon[25593]: [1-1] 2010-08-23 10:52:32 PDT CONFIG main: slon version 2.0.3 starting up
> Aug 23 10:52:32 shipper_db slon[25593]: [2-1] 2010-08-23 10:52:32 PDT INFO   slon: watchdog process started
> Aug 23 10:52:32 shipper_db slon[25593]: [3-1] 2010-08-23 10:52:32 PDT CONFIG slon: watchdog ready - pid = 25593

Once I did an ADD NODE to put Node 3 back in as a subscriber (a rough sketch of the slonik commands follows the excerpt below), Node 2 generated two logfiles and then stopped generating again.  Perhaps this is because no new events are being generated from Node 3 while the COPY SET is finishing up.  Also, these two logfiles were empty:

> ------------------------------------------------------------------
> -- Slony-I log shipping archive
> -- Node 3, Event 5000000004
> ------------------------------------------------------------------
> set session_replication_role to replica;
> start transaction;
> select "_slony_schema".archiveTracking_offline('13801039', '2010-08-23 11:44:31.541321');
> -- end of log archiving header
> ------------------------------------------------------------------
> -- start of Slony-I data
> ------------------------------------------------------------------
> 
> ------------------------------------------------------------------
> -- End Of Archive Log
> ------------------------------------------------------------------
> commit;
> vacuum analyze "_slony_schema".sl_archive_tracking;
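
For completeness, the ADD NODE step mentioned above was a slonik script roughly like the following; the conninfo strings are placeholders, the forward flag is a guess, and I've left out the paths to/from the other subscribers, so it's a sketch rather than the exact script:

    cluster name = slony_schema;
    node 3 admin conninfo = 'dbname=tii host=... port=5432 user=slony';  # placeholder
    node 5 admin conninfo = 'dbname=tii host=... port=5432 user=slony';  # placeholder

    # re-create node 3 and its paths to/from the origin (node 5);
    # paths to/from the other subscribers are omitted here
    store node (id = 3, comment = 'rebuilt subscriber', event node = 5);
    store path (server = 5, client = 3, conninfo = 'dbname=tii host=... port=5432 user=slony');
    store path (server = 3, client = 5, conninfo = 'dbname=tii host=... port=5432 user=slony');

    # re-subscribe the active set (set 1, per the "added active set 1" line in the log above)
    subscribe set (id = 1, provider = 5, receiver = 3, forward = no);  # forward = no is a guess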


Would anyone be able to explain why dropping a subscriber from replication would cause another subscriber to stop generating logfiles?  Or, for that matter, why actions on one subscriber affect log shipping on another?

Regards,
--Richard

