Tue Aug 6 08:47:16 PDT 2013
- Previous message: [Slony1-general] Removing dead provider node gone wrong?
- Next message: [Slony1-general] Removing dead provider node gone wrong?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 08/06/2013 11:33 AM, Glyn Astill wrote:
> Hi Guys,
>
> We're running slony 2.1.3, and one of my slaves has failed. The issue is
> that the failed slave node is a provider to another downstream slave; am
> I right in thinking I have to drop both the failed node and the
> downstream subscriber slave?
>
> My setup basically looks like this, where subscriber2 has failed:
>
> origin ---> subscriber1
>        ---> subscriber2 ---> subscriber3
>
>
> First I tried to reshape the subscription on subscriber3, but this
> didn't work:
>
> SUBSCRIBE SET ( ID=@my_set, PROVIDER = @origin, RECEIVER = @subscriber3,
> FORWARD = YES);
>
> This failed with the following message:
>
> glyn at x:/usr/share/slonik$ slonik reshape_provider.scr
> reshape_provider.scr:3: could not connect to server: Connection refused
> Is the server running on host "10.16.10.101" and accepting
> TCP/IP connections on port 5432?

You need to make the resubscribe set work before doing the DROP NODE;
you can't drop a provider node.

It isn't obvious to me why slonik is trying to connect to node 2.
Which command is line 3 of that script? What is on lines 1 and 2? Are
the conninfo lines correct for nodes 1 and 3?

>
> Where 10.16.10.101 is the IP of subscriber2. So I tried to just drop the
> node:
>
> DROP NODE ( ID = @subscriber2, EVENT NODE = @origin );
>
> And the following happened:
>
> glyn at x:/usr/share/slonik$ slonik drop_node.scr
> drop_node.scr:3: could not connect to server: Connection refused
> Is the server running on host "10.16.10.101" and accepting
> TCP/IP connections on port 5432?
> waiting for events (7,5014269532) only at (7,5014260307) to be confirmed
> on node 5
> waiting for events (7,5014269532) only at (7,5014260307) to be confirmed
> on node 5
> waiting for events (7,5014269532) only at (7,5014260307) to be confirmed
> on node 5
> waiting for events (7,5014269532) only at (7,5014260307) to be confirmed
> on node 5
> waiting for events (7,5014269532) only at (7,5014260307) to be confirmed
> on node 5
>
> Where "node 5" is subscriber3.
>
> So now slonik is waiting on subscriber3 to come in sync, but it's just
> trying to sync from subscriber2 which has gone. Here's the log from
> subscriber3:
>
> 2013-08-06_163034 BSTERROR slon_connectdb: PQconnectdb("dbname=SEE
> host=10.16.10.101 user=slony") failed - could not connect to server:
> Connection refused
> Is the server running on host "10.16.10.101" and accepting
> TCP/IP connections on port 5432?
> 2013-08-06_163034 BSTWARN remoteListenThread_4: DB connection failed -
> sleep 10 seconds
> 2013-08-06_163034 BSTDEBUG2 remoteWorkerThread_7: SYNC 5014260308 processing
> 2013-08-06_163034 BSTERROR slon_connectdb: PQconnectdb("dbname=SEE
> host=10.16.10.101 user=slony") failed - could not connect to server:
> Connection refused
> Is the server running on host "10.16.10.101" and accepting
> TCP/IP connections on port 5432?
> 2013-08-06_163034 BSTERROR remoteWorkerThread_7: cannot connect to data
> provider 4 on 'dbname=SEE host=10.16.10.101 user=slony'
> 2013-08-06_163034 BSTDEBUG2 remoteListenThread_7: queue event
> 7,5014270211 SYNC
> 2013-08-06_163034 BSTDEBUG2 remoteWorkerThread_8: forward confirm
> 7,5014270210 received by 8
> 2013-08-06_163036 BSTDEBUG2 syncThread: new sl_action_seq 1 - SYNC
> 5005139878
> 2013-08-06_163036 BSTDEBUG2 remoteListenThread_7: queue event
> 7,5014270212 SYNC
> 2013-08-06_163036 BSTDEBUG2 remoteListenThread_8: queue event
> 8,5013135166 SYNC
> 2013-08-06_163036 BSTDEBUG2 remoteWorkerThread_8: Received event #8 from
> 5013135166 type:SYNC
> 2013-08-06_163036 BSTDEBUG1 calc sync size - last time: 1 last length:
> 10069 ideal: 5 proposed size: 3
> 2013-08-06_163036 BSTDEBUG2 remoteWorkerThread_8: SYNC 5013135166 processing
> 2013-08-06_163036 BSTDEBUG1 remoteWorkerThread_8: no sets need syncing
> for this event
> 2013-08-06_163036 BSTDEBUG2 remoteWorkerThread_8: forward confirm
> 7,5014270211 received by 8
> 2013-08-06_163042 BSTDEBUG2 localListenThread: Received event
> 5,5005139878 SYNC
> 2013-08-06_163042 BSTDEBUG2 remoteListenThread_7: queue event
> 7,5014270213 SYNC
> 2013-08-06_163042 BSTDEBUG2 remoteListenThread_7: queue event
> 7,5014270214 SYNC
> 2013-08-06_163042 BSTDEBUG2 remoteListenThread_7: queue event
> 7,5014270215 SYNC
> 2013-08-06_163042 BSTDEBUG2 remoteWorkerThread_8: forward confirm
> 5,5005139878 received by 8
> 2013-08-06_163042 BSTDEBUG2 remoteWorkerThread_8: forward confirm
> 7,5014270214 received by 8
>
>
> So what do I do? I presume I'll be waiting forever, so do I kill slonik
> and drop subscriber3 too?
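
For reference, the ordering described in the reply (reshape the subscription first, drop the failed provider only afterwards) might look roughly like the slonik sketch below. This is not taken from the thread. The cluster name, set ID, and host names are placeholders. The node IDs are only inferred from the quoted logs: node 5 is stated to be subscriber3, "data provider 4" appears to be subscriber2, and the "(7,...)" events appear to come from the origin. Whether the preamble can leave out an admin conninfo line for the failed node is also an assumption; the thread does not confirm it.

```
# Sketch only; names and IDs below are assumptions, see the note above.
# Hypothetical cluster name and set ID:
define cluster mycluster;
define my_set 1;
# Node IDs inferred from the quoted logs:
define origin 7;
define subscriber2 4;
define subscriber3 5;

cluster name = @cluster;
# Placeholder hosts; only the reachable nodes are listed here.
node @origin admin conninfo = 'dbname=SEE host=ORIGIN_HOST user=slony';
node @subscriber3 admin conninfo = 'dbname=SEE host=SUBSCRIBER3_HOST user=slony';

# Step 1: point subscriber3 at the origin instead of the failed provider.
subscribe set (id = @my_set, provider = @origin, receiver = @subscriber3, forward = yes);

# Step 2: only after step 1 has succeeded, drop the failed node.
drop node (id = @subscriber2, event node = @origin);
```

If step 1 still reports a connection error for 10.16.10.101, the questions in the reply about the first lines of the script and the conninfo entries are the place to look before attempting step 2.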