Michael Lorenz mlorenz at evryx.com
Mon Oct 27 11:04:13 PDT 2008
OK, so blowing away all of the records in sl_event *did* get  
replication started again.

However, as I said in the original post, this is not something I want
to try in production.  Can anyone explain how to get Slony back on
track when replication gets stuck as I described below?  It would be
especially useful to know how to identify the problem sl_event records
(if that is the best approach).

Thanks,
     Michael

On Oct 23, 2008, at 12:38 PM, Michael Lorenz wrote:

> Hi all,
>
> I ran into a problem with Slony yesterday, and I'm hoping someone  
> can help me resolve it.  I've checked the mailing lists, searched  
> the web, etc., but I didn't find a definitive answer anywhere.  The  
> closest set of messages I've found on this list can be found at:  http://www.mail-archive.com/pgadmin-support@postgresql.org/msg07704.html
>
> What I was doing was adding a new table to the schema and trying to  
> set up replication for it.  I did it twice;  once for a test table  
> (which worked), then a second time with a different table (which  
> didn't).  The script I used looked like this:
>
> 	CREATE SET ( ID = 2, ORIGIN = 1, COMMENT = 'set comment' );
> 	SET ADD TABLE ( SET ID = 2, ORIGIN = 1, ID = 39, FULLY QUALIFIED NAME = 'public.testtable', COMMENT = 'table comment' );
> 	SUBSCRIBE SET (ID = 2, PROVIDER = 1, RECEIVER = 2);
> 	SYNC ( ID = 1 );
> 	WAIT FOR EVENT (ORIGIN = 1, CONFIRMED = 2);
> 	MERGE SET ( ID = 1, ADD ID = 2, ORIGIN = 1 );
>
> Now, I'm not sure if this is relevant, but I tried to add the second  
> table about 20-30 minutes after the first one.  As far as I can  
> tell, the MERGE SET from the first table had completed, since there  
> was no set 2 in the sl_set table anymore.
>
> Replication has stopped since the second attempt failed, and here's
> what keeps showing up in the log now:
>
> 	2008-10-23 13:26:47 GMT DEBUG2 localListenThread: Received event 2,1082313 SYNC
> 	2008-10-23 13:26:50 GMT DEBUG2 remoteListenThread_1: LISTEN
> 	2008-10-23 13:26:50 GMT DEBUG1 copy_set 2
> 	2008-10-23 13:26:50 GMT ERROR  remoteWorkerThread_1: set 2 not found in runtime configuration
> 	2008-10-23 13:26:50 GMT WARN   remoteWorkerThread_1: data copy for set 2 failed - sleep 60 seconds
>
> Short of unsubscribing and resubscribing the node (which I don't want
> to try if this happens in production), is there some way to get this
> cleaned up and replication started again?  If it means deleting some
> record(s) from sl_event, is there a good way to identify the offending
> record(s)?  Or, since replication seems to be frozen, could I just
> blow away all records in sl_event on the slave, or...?
>
> Thanks,
>    Michael

Michael Lorenz

evryx technologies inc.
412 w. broadway, suite 201
glendale, california 91204
818.552.3568 x117            www.evryx.com