Steve Singer ssinger at ca.afilias.info
Tue Aug 6 08:47:16 PDT 2013
On 08/06/2013 11:33 AM, Glyn Astill wrote:


> Hi Guys,
>
> We're running slony 2.1.3, and one of my slaves has failed. The issue is
> that the failed slave node is a provider to another downstream slave; am
> I right in thinking I have to drop both the failed node and the
> downstream subscriber slave?
>
> My setup basically looks like this, where subscriber2 has failed:
>
> origin ---> subscriber1
>        ---> subscriber2 ---> subscriber3
>
>
> First I tried to reshape the subscription on subscriber3, but this
> didn't work:
>
> SUBSCRIBE SET ( ID=@my_set, PROVIDER = @origin, RECEIVER = @subscriber3,
> FORWARD = YES);
>
> This failed with the following message:
>
> glyn@x:/usr/share/slonik$ slonik reshape_provider.scr
> reshape_provider.scr:3: could not connect to server: Connection refused
> Is the server running on host "10.16.10.101" and accepting
> TCP/IP connections on port 5432?

You need to get the resubscribe (the SUBSCRIBE SET) working before doing
the DROP NODE; you can't drop a node that is still acting as a provider.

It isn't obvious to me why slonik is trying to connect to node 2. 
Which command is line 3 of that script?  What is on lines 1 and 2?  Are 
the conninfo lines correct for nodes 1 and 3?
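
For reference, a reshape script of this kind generally has the shape
sketched below. This is only an illustration: the cluster name, the node
and set IDs, and the conninfo strings are placeholders rather than values
taken from this cluster, and the preamble here lists admin conninfo only
for the two nodes the command names. Whether that alone avoids the
connection attempt to the failed node is not something the sketch settles.

```
# Hypothetical preamble; every value below is a placeholder.
define my_set 1;
define origin 1;
define subscriber3 3;

cluster name = my_cluster;

# One admin conninfo line per node that slonik should talk to.
node @origin admin conninfo = 'dbname=mydb host=origin.example user=slony';
node @subscriber3 admin conninfo = 'dbname=mydb host=subscriber3.example user=slony';

# Point the existing subscriber at the origin instead of the failed provider.
subscribe set (id = @my_set, provider = @origin, receiver = @subscriber3,
               forward = yes);
```

With a layout like this the SUBSCRIBE SET would be on a later line than 3,
which is why the contents of lines 1 to 3 of the real script matter.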




>
> Where 10.16.10.101 is the IP of subscriber2. So I tried to just drop the
> node:
>
> DROP NODE ( ID = @subscriber2, EVENT NODE = @origin );
>
> And the following happened:
>
> glyn@x:/usr/share/slonik$ slonik drop_node.scr
> drop_node.scr:3: could not connect to server: Connection refused
> Is the server running on host "10.16.10.101" and accepting
> TCP/IP connections on port 5432?
> waiting for events (7,5014269532) only at (7,5014260307) to be confirmed
> on node 5
> waiting for events (7,5014269532) only at (7,5014260307) to be confirmed
> on node 5
> waiting for events (7,5014269532) only at (7,5014260307) to be confirmed
> on node 5
> waiting for events (7,5014269532) only at (7,5014260307) to be confirmed
> on node 5
> waiting for events (7,5014269532) only at (7,5014260307) to be confirmed
> on node 5
>
> Where "node 5" is subscriber3.
>
> So now slonik is waiting on subscriber3 to come in sync, but it's just
> trying to sync from subscriber2 which has gone. Here's the log from
> subscriber3:
>
> 2013-08-06_163034 BSTERROR slon_connectdb: PQconnectdb("dbname=SEE
> host=10.16.10.101 user=slony") failed - could not connect to server:
> Connection refused
> Is the server running on host "10.16.10.101" and accepting
> TCP/IP connections on port 5432?
> 2013-08-06_163034 BSTWARN remoteListenThread_4: DB connection failed -
> sleep 10 seconds
> 2013-08-06_163034 BSTDEBUG2 remoteWorkerThread_7: SYNC 5014260308 processing
> 2013-08-06_163034 BSTERROR slon_connectdb: PQconnectdb("dbname=SEE
> host=10.16.10.101 user=slony") failed - could not connect to server:
> Connection refused
> Is the server running on host "10.16.10.101" and accepting
> TCP/IP connections on port 5432?
> 2013-08-06_163034 BSTERROR remoteWorkerThread_7: cannot connect to data
> provider 4 on 'dbname=SEE host=10.16.10.101 user=slony'
> 2013-08-06_163034 BSTDEBUG2 remoteListenThread_7: queue event
> 7,5014270211 SYNC
> 2013-08-06_163034 BSTDEBUG2 remoteWorkerThread_8: forward confirm
> 7,5014270210 received by 8
> 2013-08-06_163036 BSTDEBUG2 syncThread: new sl_action_seq 1 - SYNC
> 5005139878
> 2013-08-06_163036 BSTDEBUG2 remoteListenThread_7: queue event
> 7,5014270212 SYNC
> 2013-08-06_163036 BSTDEBUG2 remoteListenThread_8: queue event
> 8,5013135166 SYNC
> 2013-08-06_163036 BSTDEBUG2 remoteWorkerThread_8: Received event #8 from
> 5013135166 type:SYNC
> 2013-08-06_163036 BSTDEBUG1 calc sync size - last time: 1 last length:
> 10069 ideal: 5 proposed size: 3
> 2013-08-06_163036 BSTDEBUG2 remoteWorkerThread_8: SYNC 5013135166 processing
> 2013-08-06_163036 BSTDEBUG1 remoteWorkerThread_8: no sets need syncing
> for this event
> 2013-08-06_163036 BSTDEBUG2 remoteWorkerThread_8: forward confirm
> 7,5014270211 received by 8
> 2013-08-06_163042 BSTDEBUG2 localListenThread: Received event
> 5,5005139878 SYNC
> 2013-08-06_163042 BSTDEBUG2 remoteListenThread_7: queue event
> 7,5014270213 SYNC
> 2013-08-06_163042 BSTDEBUG2 remoteListenThread_7: queue event
> 7,5014270214 SYNC
> 2013-08-06_163042 BSTDEBUG2 remoteListenThread_7: queue event
> 7,5014270215 SYNC
> 2013-08-06_163042 BSTDEBUG2 remoteWorkerThread_8: forward confirm
> 5,5005139878 received by 8
> 2013-08-06_163042 BSTDEBUG2 remoteWorkerThread_8: forward confirm
> 7,5014270214 received by 8
>
>
> So what do I do? I presume I'll be waiting forever, so do I kill slonik
> and drop subscriber3 too?


