Thu Jan 14 05:30:23 PST 2010
- Previous message: [Slony1-general] drop node not working correctly
- Next message: [Slony1-general] drop node not working correctly
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi,

I have attempted to investigate further into why the failover/drop node is not being picked up on node 3. Here is the actual output of the slonik script in my original post:

[oper at backup slonik]$ slonik forceProviderChangeToBackup.sk
INFO: calling failedNode(1,2) on node 1
forceProviderChangeToBackup.sk:9: NOTICE: failedNode: set 1 has other direct receivers - change providers only
forceProviderChangeToBackup.sk:9: NOTICE: failedNode: set 2 has no other direct receivers - move now
forceProviderChangeToBackup.sk:9: NOTICE: failedNode: set 3 has no other direct receivers - move now
INFO: calling failedNode(1,2) on node 3
forceProviderChangeToBackup.sk:9: NOTICE: failedNode: set 1 has other direct receivers - change providers only
forceProviderChangeToBackup.sk:9: NOTICE: failedNode: set 2 has no other direct receivers - move now
forceProviderChangeToBackup.sk:9: NOTICE: failedNode: set 3 has no other direct receivers - move now
INFO: Waiting for slon engines to restart
IMPORTANT: Last known SYNC for set 1 = 383
INFO: Node with highest sync for set 1 is 2
INFO: Node with highest sync for set 2 is 2
INFO: Node with highest sync for set 3 is 2

After inspecting the logfile generated by the slon process at node 3, it seems to pick up on the fact that the set has been moved to node 2, but it does not remove node 1:
DEBUG2 remoteWorkerThread_2: Received event 2,180 ACCEPT_SET
DEBUG2 start processing ACCEPT_SET
DEBUG2 ACCEPT: set=1
DEBUG2 ACCEPT: old origin=1
DEBUG2 ACCEPT: new origin=2
DEBUG2 ACCEPT: move set seq=384
DEBUG2 got parms ACCEPT_SET
DEBUG2 ACCEPT_SET - node not origin
DEBUG2 remoteListenThread_2: queue event 2,183 SYNC
DEBUG2 remoteListenThread_2: queue event 2,184 DROP_NODE
DEBUG2 remoteListenThread_2: queue event 2,185 SYNC
DEBUG2 ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep
ERROR slon_connectdb: PQconnectdb("dbname=db host=node1 port=5432 user=postgres") failed - could not connect to server: Connection refused
        Is the server running on host "node 1" and accepting TCP/IP connections on port 5432?
WARN remoteListenThread_1: DB connection failed - sleep 10 seconds
DEBUG2 syncThread: new sl_action_seq 1 - SYNC 181
DEBUG2 remoteListenThread_2: LISTEN
DEBUG2 remoteListenThread_2: queue event 2,186 SYNC
DEBUG2 remoteListenThread_2: UNLISTEN
DEBUG2 ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep
DEBUG2 localListenThread: Received event 3,181 SYNC
ERROR slon_connectdb: PQconnectdb("dbname=db host=node1 port=5432 user=postgres") failed - could not connect to server: Connection refused
        Is the server running on host "node 1" and accepting TCP/IP connections on port 5432?

Does the line below mean it is waiting for some kind of notification from somewhere?

DEBUG2 ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep

Additionally, does anyone know how to make the slon logs contain a timestamp (e.g. DEBUG2 [2009-01-14 12:12] syncThread)? I find it pretty hard to follow what is going on when comparing the log files of multiple nodes.

Cheers,

Andy

2010/1/13 Andy Dale <andy.dale at gmail.com>

> Hi,
>
> I have set up a simple 3 node slony cluster, and everything works pretty
> much as I would expect; however I am running into a few issues when using
> the drop node command (in a failover scenario).
>
> I have a simple slonik script to perform a failover as follows (node 1 is
> the old master node to be removed):
>
> #!/usr/bin/slonik
>
> include <preamble.sk>;
>
> # hard failover to the backup system
> failover (id = 1, backup node = 2);
>
> # purge out the opersystem node
> drop node (id = 1, event node = 2);
>
> This purges node 1 from the current cluster, and the updated cluster is
> correct at node 2 (sl_node has nodes 2 and 3), but node 3 does not
> get/apply the drop node command (sl_node still has nodes 1, 2 and 3).
>
> Looking at the slon log file on node 3, it is still trying to connect to
> node 1, and I do not understand why the cluster topology change has not
> been detected on node 3.
>
> Does anyone have any suggestions as to what the problem might be? I am
> using Slony 1.2.16.
>
> Many thanks,
>
> Andy
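P.S. The sl_node comparison described above can be scripted so the view from each surviving node is easy to diff side by side. A minimal sketch, assuming the Slony cluster is named "mycluster" (so the catalog schema is "_mycluster") and reusing the connection settings visible in the logs; substitute your own cluster name, hosts and credentials:

```
# Show which nodes each surviving node still knows about.
# The cluster name "mycluster" (=> schema "_mycluster") is an assumption.
for host in node2 node3; do
  echo "== sl_node as seen from $host =="
  psql -h "$host" -p 5432 -U postgres -d db \
       -c "SELECT no_id, no_active, no_comment FROM _mycluster.sl_node ORDER BY no_id;"
done
```

If node 3 still lists node 1 here after the DROP_NODE event was queued, that would confirm the event was never applied there.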
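P.P.S. Partly answering my own timestamp question: as far as I remember, slon in the 1.2 series can read a runtime configuration file (started as "slon -f slon.conf ..."), and it has logging options along these lines. The option names below are from memory, so please double-check them against the slon documentation for 1.2.16:

```
# slon.conf fragment - logging settings (names as I recall them; verify
# against the docs for your exact version)
log_timestamp = true
log_timestamp_format = '%Y-%m-%d %H:%M:%S %Z'
```

With something like that in place each log line should start with a timestamp, which makes comparing logs across nodes much easier.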