Michael Lorenz mlorenz at evryx.com
Thu Oct 23 12:38:54 PDT 2008
Hi all,

I ran into a problem with Slony yesterday, and I'm hoping someone can
help me resolve it.  I've checked the mailing lists and searched the
web, but I didn't find a definitive answer anywhere.  The closest
thread I've found is at:
http://www.mail-archive.com/pgadmin-support@postgresql.org/msg07704.html

I was adding a new table to the schema and setting up replication for
it.  I did this twice:  once for a test table (which worked), then a
second time with a different table (which didn't).  The script I used
looked like this:

	CREATE SET ( ID = 2, ORIGIN = 1, COMMENT = 'set comment' );
	SET ADD TABLE ( SET ID = 2, ORIGIN = 1, ID = 39, FULLY QUALIFIED NAME = 'public.testtable', COMMENT = 'table comment' );
	SUBSCRIBE SET (ID = 2, PROVIDER = 1, RECEIVER = 2);
	SYNC ( ID = 1 );
	WAIT FOR EVENT (ORIGIN = 1, CONFIRMED = 2);
	MERGE SET ( ID = 1, ADD ID = 2, ORIGIN = 1 );

Now, I'm not sure if this is relevant, but I tried to add the second  
table about 20-30 minutes after the first one.  As far as I can tell,  
the MERGE SET from the first table had completed, since there was no  
set 2 in the sl_set table anymore.
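
A check of this kind can be sketched with the queries below.  This is
only an illustration:  "_mycluster" is a placeholder for the real
cluster schema (Slony keeps its tables in a schema named after the
cluster, with a leading underscore), and the queries only read data.

	-- "_mycluster" is a hypothetical schema name; substitute your own.
	-- Sets known to this node.  Once MERGE SET has completed, the
	-- temporary set 2 should no longer be listed.
	SELECT set_id, set_origin, set_comment
	  FROM _mycluster.sl_set
	 ORDER BY set_id;

	-- Subscriptions known to this node, with their active flag.
	SELECT sub_set, sub_provider, sub_receiver, sub_active
	  FROM _mycluster.sl_subscribe
	 ORDER BY sub_set, sub_receiver;

Running the same queries on both the origin and the subscriber would
show whether the two nodes agree about set 2.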

Replication has been stopped since the second attempt failed, and
this is what keeps showing up in the log:

	2008-10-23 13:26:47 GMT DEBUG2 localListenThread: Received event 2,1082313 SYNC
	2008-10-23 13:26:50 GMT DEBUG2 remoteListenThread_1: LISTEN
	2008-10-23 13:26:50 GMT DEBUG1 copy_set 2
	2008-10-23 13:26:50 GMT ERROR  remoteWorkerThread_1: set 2 not found in runtime configuration
	2008-10-23 13:26:50 GMT WARN   remoteWorkerThread_1: data copy for set 2 failed - sleep 60 seconds

Short of unsubscribing and resubscribing the node (which I wouldn't
want to do if this happened in production), is there some way to get
this cleaned up and replication started again?  If it means deleting
one or more records from sl_event, is there a good way to identify
the offending records?  Or, since replication seems to be frozen,
could I just blow away all records in sl_event on the slave,
or...?
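
To make the sl_event part of the question concrete, a read-only query
along these lines would list the configuration events (everything
except SYNCs) held on a node, which is where a stuck event for set 2
would presumably appear.  It is only a sketch:  "_mycluster" is again
a placeholder schema name, and what the ev_data columns mean depends
on the event type.

	-- "_mycluster" is a hypothetical schema name; substitute your own.
	-- Read-only:  lists non-SYNC events in origin/sequence order.
	SELECT ev_origin, ev_seqno, ev_timestamp, ev_type,
	       ev_data1, ev_data2, ev_data3, ev_data4
	  FROM _mycluster.sl_event
	 WHERE ev_type <> 'SYNC'
	 ORDER BY ev_origin, ev_seqno;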

Thanks,
     Michael


