Casey Duncan
Mon Jan 29 17:37:08 PST 2007
I recently upgraded to Slony-I 1.2.6. We were rehearsing a database
schema upgrade for a two-node Slony cluster and came across an error
at the end. We need to keep EXECUTE SCRIPT from taking its exclusive
locks while the application is talking to the database, so we do it
using the following dance:

1. Turn off slon daemons
2. Switch application to use the secondary (read-only of course)
3. Run the upgrade script on the primary using EXECUTE SCRIPT, add the
new tables & sequences to a new set, and merge that set after waiting
for the subscription to be confirmed (the wait blocks).
4. Switch the application back to the primary db
5. Turn the slon daemons back on (which unblocks #3).

All was well until step #5 when we got this error:
<stdin>:12: PGRES_FATAL_ERROR select "_radio".mergeSet(1, 9999);  -  
ERROR:  Slony-I: set 9999 has subscriptions in progress - cannot merge

To provide more details, here's what we actually run in step 3 (minus  
the ddl):

CREATE SET (ID = 9999, ORIGIN = 1, COMMENT = 'Temporary set for add  
and merge');
...Execute DDL via EXECUTE SCRIPT...
SET ADD TABLE (SET ID = 9999, ORIGIN = 1, ID = 39, FULLY QUALIFIED  
NAME = 'public.new_table1');
SET ADD SEQUENCE (SET ID = 9999, ORIGIN = 1, ID = 18, FULLY QUALIFIED  
NAME = 'public.new_seq');
SET ADD TABLE (SET ID = 9999, ORIGIN = 1, ID = 40, FULLY QUALIFIED  
NAME = 'public.new_table2');
WAIT FOR EVENT (ORIGIN = ALL, CONFIRMED = ALL, TIMEOUT = 0);
SUBSCRIBE SET (ID = 9999, PROVIDER = 1, RECEIVER = 2, FORWARD = yes);
WAIT FOR EVENT (ORIGIN = ALL, CONFIRMED = ALL, TIMEOUT = 0);
MERGE SET (ID = 1, ADD ID = 9999, ORIGIN = 1);

The cluster we are upgrading has two nodes, id 1 (origin) and id 2.

So we explicitly wait for the ADDs to complete before the SUBSCRIBE,
and we also wait for the SUBSCRIBE to complete before the MERGE
(assuming I understand WAIT properly, which I may not). Yet the error
seems to indicate that the SUBSCRIBE was not in fact complete when
the MERGE was executed. Does that make sense to anyone? Note that
step #3 above did indeed block, as I would expect, until we turned
the slon daemons back on, so the WAITs were doing something.
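
One way to check this directly is to query the subscription state on
the origin just before the MERGE. This is only a sketch: it assumes
the cluster schema's sl_subscribe table records each subscription
with a sub_active flag, and that mergeSet refuses to run while that
flag is still false for set 9999.

```sql
-- Assumes the Slony-I cluster schema "_radio" (as in the error above)
-- and that sl_subscribe.sub_active stays false while a subscription
-- is still being set up.
select sub_set, sub_provider, sub_receiver, sub_active
  from "_radio".sl_subscribe
 where sub_set = 9999;
```

If sub_active comes back false for receiver 2 at that point, that
would suggest the WAIT only covered confirmation of the subscribe
event and not completion of the subscription itself.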

The slon logs look relatively uninteresting. Predictably, they show
trouble after this point because the new tables in the unmerged set
cannot be found on the subscriber:

2007-01-29 16:41:56 PST ERROR  remoteWorkerThread_1: Could not find  
table "public"."new_table1" on subscriber

Any insights are appreciated. I'll be trying to reproduce this in a  
bit more isolated environment too.

-Casey


