Andy Dale andy.dale at gmail.com
Thu Jan 14 05:30:23 PST 2010
Hi,

I have investigated further why the failover/drop node is not being picked
up on node 3.  Here is the actual output of the slonik script from my
original post:

[oper at backup slonik]$ slonik forceProviderChangeToBackup.sk
INFO: calling failedNode(1,2) on node 1
forceProviderChangeToBackup.sk:9: NOTICE:  failedNode: set 1 has other
direct receivers - change providers only
forceProviderChangeToBackup.sk:9: NOTICE:  failedNode: set 2 has no other
direct receivers - move now
forceProviderChangeToBackup.sk:9: NOTICE:  failedNode: set 3 has no other
direct receivers - move now
INFO: calling failedNode(1,2) on node 3
forceProviderChangeToBackup.sk:9: NOTICE:  failedNode: set 1 has other
direct receivers - change providers only
forceProviderChangeToBackup.sk:9: NOTICE:  failedNode: set 2 has no other
direct receivers - move now
forceProviderChangeToBackup.sk:9: NOTICE:  failedNode: set 3 has no other
direct receivers - move now
INFO: Waiting for slon engines to restart
IMPORTANT: Last known SYNC for set 1 = 383
INFO: Node with highest sync for set 1 is 2
INFO: Node with highest sync for set 2 is 2
INFO: Node with highest sync for set 3 is 2

After inspecting the logfile generated by the slon process at node 3, it
seems to pick up on the fact that the set has been moved to node 2, but it
does not remove node 1.

DEBUG2 remoteWorkerThread_2: Received event 2,180 ACCEPT_SET
DEBUG2 start processing ACCEPT_SET
DEBUG2 ACCEPT: set=1
DEBUG2 ACCEPT: old origin=1
DEBUG2 ACCEPT: new origin=2
DEBUG2 ACCEPT: move set seq=384
DEBUG2 got parms ACCEPT_SET
DEBUG2 ACCEPT_SET - node not origin
DEBUG2 remoteListenThread_2: queue event 2,183 SYNC
DEBUG2 remoteListenThread_2: queue event 2,184 DROP_NODE
DEBUG2 remoteListenThread_2: queue event 2,185 SYNC
DEBUG2 ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep
ERROR  slon_connectdb: PQconnectdb("dbname=db host=node1 port=5432
user=postgres") failed - could not connect to server: Connection refused
        Is the server running on host "node 1" and accepting
        TCP/IP connections on port 5432?
WARN   remoteListenThread_1: DB connection failed - sleep 10 seconds
DEBUG2 syncThread: new sl_action_seq 1 - SYNC 181
DEBUG2 remoteListenThread_2: LISTEN
DEBUG2 remoteListenThread_2: queue event 2,186 SYNC
DEBUG2 remoteListenThread_2: UNLISTEN
DEBUG2 ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep
DEBUG2 localListenThread: Received event 3,181 SYNC
ERROR  slon_connectdb: PQconnectdb("dbname=db host=node1 port=5432
user=postgres") failed - could not connect to server: Connection refused
        Is the server running on host "node 1" and accepting
        TCP/IP connections on port 5432?


Does the line below mean that it is waiting for some kind of notification
from somewhere?
    DEBUG2 ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep
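
To check my reading of the log above, here is a minimal Python sketch that
lists events queued by a remoteListenThread but never reported as received
by a remoteWorkerThread.  The regular expressions are based only on the
DEBUG2 lines in this excerpt, the name pending_events is my own, and
"pending" is just my interpretation, not something slon itself reports.

```python
import re

# Patterns match the DEBUG2 line shapes seen in the excerpt above (assumption:
# other slon versions or log levels may format these lines differently).
QUEUED = re.compile(r"remoteListenThread_\d+: queue event (\d+),(\d+) (\w+)")
RECEIVED = re.compile(r"remoteWorkerThread_\d+: Received event (\d+),(\d+) (\w+)")


def pending_events(lines):
    """Return (origin, seqno, type) for events queued but not seen by a worker."""
    queued = {}
    received = set()
    for line in lines:
        m = QUEUED.search(line)
        if m:
            queued[(int(m.group(1)), int(m.group(2)))] = m.group(3)
            continue
        m = RECEIVED.search(line)
        if m:
            received.add((int(m.group(1)), int(m.group(2))))
    return [(o, s, t) for (o, s), t in sorted(queued.items())
            if (o, s) not in received]


# A few lines copied from the node 3 log in this message.
SAMPLE = """\
DEBUG2 remoteWorkerThread_2: Received event 2,180 ACCEPT_SET
DEBUG2 remoteListenThread_2: queue event 2,183 SYNC
DEBUG2 remoteListenThread_2: queue event 2,184 DROP_NODE
DEBUG2 remoteListenThread_2: queue event 2,185 SYNC
DEBUG2 remoteListenThread_2: queue event 2,186 SYNC
""".splitlines()

if __name__ == "__main__":
    for origin, seqno, ev_type in pending_events(SAMPLE):
        print(f"{origin},{seqno} {ev_type}")
```

On the excerpt this lists 2,183 SYNC, 2,184 DROP_NODE, 2,185 SYNC and
2,186 SYNC, which would be consistent with the DROP_NODE sitting in the
queue behind the ACCEPT_SET that is still sleeping.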

Additionally, does anyone know how to make the slon logs contain a timestamp
(e.g. DEBUG2 [2009-01-14 12:12] syncThread)?  I find it pretty hard to
follow what is going on when comparing the log files at multiple nodes.
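
As a stopgap, a small filter could stamp each line as it is written.  This
is only a sketch under assumptions: that the slon output can be piped
through a script, and that every log line starts with its level followed by
a space.  The names stamp and stamp_stream and the --stdin switch are my
own, and the time recorded is when the filter reads the line, not when slon
produced it.

```python
import sys
from datetime import datetime


def stamp(line, now=None):
    """Insert a timestamp after the log level of one slon log line."""
    now = now or datetime.now()
    ts = now.strftime("[%Y-%m-%d %H:%M:%S]")
    text = line.rstrip("\n")
    level, sep, rest = text.partition(" ")
    if not level or not sep:
        # Continuation or blank line: keep the text intact, just prefix the time.
        return f"{ts} {text}"
    return f"{level} {ts} {rest.lstrip()}"


def stamp_stream(src, dst):
    """Copy src to dst line by line, stamping each line as it arrives."""
    for raw in src:
        dst.write(stamp(raw) + "\n")
        dst.flush()


# Only act as a filter when explicitly asked to, so importing is side-effect free.
if __name__ == "__main__" and sys.argv[1:] == ["--stdin"]:
    stamp_stream(sys.stdin, sys.stdout)
```

Saved under a hypothetical name such as stamp_slon_log.py, it would be used
by piping the slon output (stdout and stderr) into
"python stamp_slon_log.py --stdin" and redirecting the result to the log
file.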

Cheers,

Andy

2010/1/13 Andy Dale <andy.dale at gmail.com>

> Hi,
>
> I have set up a simple 3 node slony cluster, and everything works pretty
> much as I would expect; however, I am running into a few issues when using
> drop node (in a failover scenario).
>
> I have a simple slonik script to perform a failover as follows (node 1 is
> the old master node to be removed):
>
> #!/usr/bin/slonik
>
> include <preamble.sk>;
>
> # hard failover to the backup system
> failover (id = 1, backup node = 2);
>
> # purge out the opersystem node
> drop node (id = 1, event node = 2);
>
>
> This purges node 1 from the current cluster. The updated cluster is
> correct at node 2 (sl_node has nodes 2 and 3), but node 3 does not get/apply
> the drop node command (sl_node still has nodes 1, 2, 3).
>
> Looking at the slon log file on node 3 it is still trying to connect to
> node 1, and I do not understand why the cluster topology change has not been
> detected on node 3.
>
> Does anyone have any suggestions as to what the problem might be? I am
> using Slony 1.2.16.
>
> Many thanks,
>
> Andy
>