John Sidney-Woollett johnsw
Sat Aug 13 08:22:37 PDT 2005
Can anyone explain why slon on node #1 (the master) stopped when a MOVE 
SET command was issued on that node? See the FATAL error notice below. 
When slon on node 1 was restarted, it processed the pending MOVE SET 
correctly.

The slonik script below was executed on fs01b, node #1, and this is the 
server where the slon process died. The slon process on db01a, node #2 
stayed up fine during all the move set operations.

No applications were running against either database during the 
switchover. I did have one psql session open against each db so that I 
could check each move by issuing SQL statements against the appropriate 
tables afterwards.

We had 6 sets to move, and the first move (set #6) was the only one that 
didn't terminate the slon process. Even so, all the moves seem to have 
worked OK.

I checked the _bpreplicate2.sl_set table on both nodes: set_origin is 
now 2, and all the tables are unlocked on node #2. All sequences and 
tables appear to be replicating correctly (now from node #2 to 
node #1).
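
For reference, the sl_set check described above can be sketched roughly 
as follows. This is an illustration rather than the exact commands that 
were run: the helper name origin_check_sql, the RUN_CHECK switch and the 
psql path are assumptions, and set_locked is included on the assumption 
that it is empty once a set's lock has been released.

```shell
#!/bin/bash

# Hypothetical helper: print the SQL that lists each set's origin and lock
# state for the given Slony-I cluster (schema name is "_" + cluster name).
origin_check_sql() {
    local cluster="$1"
    printf 'SELECT set_id, set_origin, set_locked FROM _%s.sl_set ORDER BY set_id;\n' "$cluster"
}

# Only contact the databases when explicitly asked to (RUN_CHECK=1).
# The psql location is assumed to match the slonik install directory.
if [ "${RUN_CHECK:-0}" = "1" ]; then
    for host in fs01b db01a; do
        echo "== $host =="
        /usr/local/pgsql/bin/psql -h "$host" -U postgres -d bp_live \
            -c "$(origin_check_sql bpreplicate2)"
    done
fi
```

After a successful move, both nodes should report the same set_origin 
for every set.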

I'm using Slony-I 1.1 with PostgreSQL 7.4.6.

Any ideas?

John

slonik move set script
======================
#!/bin/bash

/usr/local/pgsql/bin/slonik << _END_

# define the cluster namespace
cluster name = bpreplicate2;

# define two nodes connection information
node 1 admin conninfo = 'dbname=bp_live host=fs01b user=postgres';
node 2 admin conninfo = 'dbname=bp_live host=db01a user=postgres';

# move set from node 1 to node 2
lock set (id = 5, origin = 1);
wait for event (origin = 1, confirmed = 2);
move set (id = 5, old origin = 1, new origin = 2);
wait for event (origin = 1, confirmed = 2);

_END_
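
Since there were six sets to move, the same script can be parameterised 
by set id. The following is only a sketch of that idea, not a fix for 
the FATAL error: the function name move_set_script and the SET_IDS and 
RUN_SLONIK variables are made up for illustration, while the slonik 
statements are the ones from the script above. By default it only prints 
the generated script for review.

```shell
#!/bin/bash

# Hypothetical helper: emit the slonik commands that move one set from
# node 1 to node 2. Only the set id varies between invocations.
move_set_script() {
    local set_id="$1"
    cat << _END_
cluster name = bpreplicate2;
node 1 admin conninfo = 'dbname=bp_live host=fs01b user=postgres';
node 2 admin conninfo = 'dbname=bp_live host=db01a user=postgres';
lock set (id = ${set_id}, origin = 1);
wait for event (origin = 1, confirmed = 2);
move set (id = ${set_id}, old origin = 1, new origin = 2);
wait for event (origin = 1, confirmed = 2);
_END_
}

# SET_IDS is a space-separated list, e.g. SET_IDS="5 6".
# Dry run by default; set RUN_SLONIK=1 to pipe each script into slonik
# and stop at the first failure.
for set_id in ${SET_IDS:-}; do
    if [ "${RUN_SLONIK:-0}" = "1" ]; then
        move_set_script "$set_id" | /usr/local/pgsql/bin/slonik || exit 1
    else
        move_set_script "$set_id"
    fi
done
```

Running it with SET_IDS="5" and no RUN_SLONIK prints the same statements 
as the script above, without the comments.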



output from slonlog for node 1
==============================
2005-08-13 05:56:21 GMT FATAL  localListenThread: MOVE_SET but no provider found for set 5
2005-08-13 05:56:21 GMT DEBUG1 slon: shutdown requested
2005-08-13 05:56:21 GMT DEBUG2 slon: notify worker process to shutdown
2005-08-13 05:56:21 GMT DEBUG2 slon: wait for worker process to shutdown
2005-08-13 05:56:21 GMT DEBUG1 syncThread: thread done
2005-08-13 05:56:21 GMT DEBUG1 cleanupThread: thread done
2005-08-13 05:56:21 GMT INFO   remoteListenThread_2: disconnecting from 'dbname=bp_live host=db01a user=postgres'
2005-08-13 05:56:21 GMT DEBUG2 remoteWorkerThread_2: forward confirm 1,10810 received by 2
2005-08-13 05:56:21 GMT DEBUG1 remoteListenThread_2: thread done
2005-08-13 05:56:21 GMT DEBUG1 main: scheduler mainloop returned
2005-08-13 05:56:21 GMT DEBUG2 main: wait for remote threads
2005-08-13 05:56:21 GMT DEBUG2 sched_wakeup_node(): no_id=2 (0 threads + worker signaled)
2005-08-13 05:56:21 GMT DEBUG1 remoteWorkerThread_2: helper thread for provider 2 terminated
2005-08-13 05:56:21 GMT DEBUG1 remoteWorkerThread_2: disconnecting from data provider 2
2005-08-13 05:56:21 GMT DEBUG1 remoteWorkerThread_2: thread done
2005-08-13 05:56:21 GMT DEBUG2 main: notify parent that worker is done
2005-08-13 05:56:21 GMT DEBUG1 main: done
2005-08-13 05:56:21 GMT DEBUG2 slon: worker process shutdown ok
2005-08-13 05:56:21 GMT DEBUG2 slon: exit(-1)
[snipped]

[slon restarted on node 1]

[snipped]
2005-08-13 05:58:07 GMT DEBUG2 start processing ACCEPT_SET
2005-08-13 05:58:07 GMT DEBUG2 ACCEPT: set=5
2005-08-13 05:58:07 GMT DEBUG2 ACCEPT: old origin=1
2005-08-13 05:58:07 GMT DEBUG2 ACCEPT: new origin=2
2005-08-13 05:58:07 GMT DEBUG2 ACCEPT: move set seq=7304
2005-08-13 05:58:07 GMT DEBUG2 got parms ACCEPT_SET
2005-08-13 05:58:07 GMT DEBUG2 ACCEPT_SET - on origin node...
2005-08-13 05:58:07 GMT DEBUG2 ACCEPT_SET - done...


