Casey Duncan casey
Wed Oct 4 11:21:35 PDT 2006
I am working on a schema upgrade script for a simple two-node Slony
cluster (Slony version 1.1.5, pg 8.1.4). Along with the secondary, we
also use log shipping to forward to other nodes; in my test, however,
that is not important. I start by upgrading the schema and executing
slonik commands like so:

CREATE SET (ID = 9999, ORIGIN = 1, COMMENT = 'Temporary set for add and merge');
[..Lots of SET ADD TABLE and SET ADD SEQUENCE commands...]
SUBSCRIBE SET (ID = 9999, PROVIDER = 1, RECEIVER = 2, FORWARD = yes);
MERGE SET (ID = 1, ADD ID = 9999, ORIGIN = 1);

This executes happily.

The secondary slon is run from a service script using the following  
command:

/usr/lib/postgresql/bin/slon -d 2 -a ${SPOOL_DIRECTORY} radio "${SECONDARY_CONNINFO}"

The spool directory exists and the secondary conninfo is correct.
After running for a few seconds, slon blows up with the following error:

2006-10-02 16:25:41 PDT ERROR  remoteWorkerThread_1: "delete from  
"_radio".sl_setsync_offline   where ssy_setid= 9999;notify  
"_radio_Event"; notify "_radio_Confirm"; insert into  
"_radio".sl_event     (ev_origin, ev_seqno, ev_timestamp,       
ev_minxid, ev_maxxid, ev_xip, ev_type , ev_data1    ) values ('1',  
'51', '2006-10-02 16:06:46.377823', '41823619', '69491150',  
'''41823619'',''41823624'',''41823629''', 'DROP_SET', '9999'); insert  
into "_radio".sl_confirm  (con_origin, con_received, con_seqno,  
con_timestamp)    values (1, 2, '51', now()); commit transaction;"  
PGRES_FATAL_ERROR ERROR:  relation "_radio.sl_setsync_offline" does  
not exist

Of course neither the secondary nor the primary has a table named
_radio.sl_setsync_offline, and as near as I can tell only a log
shipping subscriber node ever would. In the code this table is created
only by tools/slony1_dump.sh, which is not run for "live" nodes in the
cluster AFAIK.

In remote_worker.c I see code like so (starting line 774):

else if (strcmp(event->ev_type, "MERGE_SET") == 0)
{
	int set_id = (int)strtol(event->ev_data1, NULL, 10);
	int add_id = (int)strtol(event->ev_data2, NULL, 10);
	rtcfg_dropSet(add_id);

	slon_appendquery(&query1,
			 "select %s.mergeSet_int(%d, %d); ",
			 rtcfg_namespace,
			 set_id, add_id);
	
	/* Log shipping gets the change here
	 * that we need to delete the table
	 * being merged from the set being
	 * maintained. */
	if (archive_dir) {
		rc = open_log_archive(rtcfg_nodeid, seqbuf);
		rc = generate_archive_header(rtcfg_nodeid, seqbuf);
		rc = slon_mkquery(&query1,
				  "delete from %s.sl_setsync_offline "
				  "  where ssy_setid= %d;",
				  rtcfg_namespace, add_id);
		rc = submit_query_to_archive(&query1);
		rc = close_log_archive();
	}
}


AFAICS, this is where the 'delete from "_radio".sl_setsync_offline
where ssy_setid= 9999;' query is generated. It looks like it should
only be written to the archive file, but from what I can tell slon is
trying to execute the query on the secondary as well.
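
One way the behaviour described above could arise, sketched under
assumptions rather than taken from the Slony source: if slon_mkquery
overwrites the buffer it is given (instead of appending, as
slon_appendquery does), then reusing query1 for the archive statement
would leave the delete in query1, and whatever later executes query1
on the node would run the delete there. The names QueryBuf, buf_reset,
buf_append, buf_set, build_shared and build_separate below are
hypothetical and exist only for this illustration; the schema prefix
is omitted for brevity.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for slon's query buffer (not the real type). */
typedef struct {
    char data[256];
} QueryBuf;

/* Empty the buffer. */
static void buf_reset(QueryBuf *b)
{
    b->data[0] = '\0';
}

/* Append formatted text; ASSUMED to behave like slon_appendquery. */
static void buf_append(QueryBuf *b, const char *fmt, ...)
{
    size_t used = strlen(b->data);
    va_list ap;

    assert(used < sizeof(b->data));
    va_start(ap, fmt);
    vsnprintf(b->data + used, sizeof(b->data) - used, fmt, ap);
    va_end(ap);
}

/* Overwrite the buffer; ASSUMED to behave like slon_mkquery. */
static void buf_set(QueryBuf *b, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(b->data, sizeof(b->data), fmt, ap);
    va_end(ap);
}

/* Shared-buffer pattern: the archive statement is built in node_query. */
void build_shared(QueryBuf *node_query, QueryBuf *archive,
                  int archiving, int set_id, int add_id)
{
    buf_reset(node_query);
    buf_reset(archive);
    buf_append(node_query, "select mergeSet_int(%d, %d); ", set_id, add_id);
    if (archiving) {
        /* Overwrites the mergeSet_int call with the offline delete. */
        buf_set(node_query,
                "delete from sl_setsync_offline where ssy_setid= %d;",
                add_id);
        buf_append(archive, "%s", node_query->data);
        /* node_query still holds the delete after this point. */
    }
}

/* Separate-buffer pattern: the delete only ever reaches the archive. */
void build_separate(QueryBuf *node_query, QueryBuf *archive,
                    int archiving, int set_id, int add_id)
{
    buf_reset(node_query);
    buf_reset(archive);
    buf_append(node_query, "select mergeSet_int(%d, %d); ", set_id, add_id);
    if (archiving)
        buf_append(archive,
                   "delete from sl_setsync_offline where ssy_setid= %d;",
                   add_id);
}
```

With archiving enabled, build_shared leaves the delete in node_query,
while build_separate keeps the mergeSet_int call there and puts the
delete only in archive. Whether 1.1.5 really behaves like the first
variant would need to be checked against the actual slon_mkquery
implementation.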

Perhaps this has been addressed in 1.2, though it's not really an  
option for me to upgrade to that within the release schedule we're  
under. Any suggestions for a workaround or an obvious error on my  
part? Seems like I could temporarily run slon without -a, but then  
the log shipping secondaries won't get updated properly.

Thanks.

-Casey



