Sven Willenberger sven
Fri Jan 26 06:55:55 PST 2007
I came across an issue with 1.2.6 when using EXECUTE SCRIPT to make a
basic DDL change, namely adding a column to an existing replicated
table. In a nutshell, a node that is not part of the set in question
was asked to make the DDL change to the [non-existent] table, and the
subsequent call to alterTableForReplication then failed.

The setup: three servers, A, B and C, are set up with different SETs
containing different nodes. For example:
NODE 1 = server A
NODE 2 = server B
NODE 3 = server C

SET 1 does some replication between A and B
SET 2 does some replication between B and C
SET 3 does some replication between C and A, B

Now there is a [tablename] on B and C to which I added a column and then
subsequently created a default value for.

Then:

EXECUTE SCRIPT (SET ID = 2, FILENAME = 'path/to/DDL_script', EVENT NODE = 2);

What ended up happening was that the tables were locked on B and C, the
table was updated with the new changes, and then the tables were made
ready for replication again.
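
To make the expectation explicit, here is a minimal sketch of which nodes
I would expect to be touched by a script run against a given set. It only
models the set membership described above; the dictionary and function
names are made up for illustration and are not part of Slony-I, and the
list of nodes that ran the script is simply what was observed here.

```python
# Set membership as described in this post (node IDs: 1 = A, 2 = B, 3 = C).
# These names are illustrative only; nothing here is a Slony-I API.
SET_MEMBERS = {
    1: {1, 2},      # SET 1: A and B
    2: {2, 3},      # SET 2: B and C
    3: {1, 2, 3},   # SET 3: C and A, B
}


def nodes_expected_to_apply_ddl(set_id):
    """Return the nodes that hold the tables of the given set."""
    return sorted(SET_MEMBERS[set_id])


def unexpected_nodes(set_id, nodes_that_ran_script):
    """Return nodes that ran the script without being part of the set."""
    return sorted(set(nodes_that_ran_script) - SET_MEMBERS[set_id])


# Nodes that should be affected by EXECUTE SCRIPT against SET 2.
print(nodes_expected_to_apply_ddl(2))

# Observed: all three nodes attempted the change, so node 1 is the odd one out.
print(unexpected_nodes(2, [1, 2, 3]))
```

Under that model only nodes 2 and 3 should have been asked to make the
change for SET 2.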

Problem: Server A (node 1) ended up trying to make the change to the
same table (which doesn't exist there), and ultimately the slon process
died when it had cycled through enough pids to hit a duplicate entry
while "cleaning stale entries". I added the table to A just to eliminate
that error, at which point alterTableForReplication() errored out,
stating that 'Table "public"."<first table in sl_tables>" is already in
altered state'. (I ended up getting past this by altering the function
to bypass the guts and just return the expected value, which finally
allowed replication to continue.)

Sven



