[Slony1-general] slon won't start after EXECUTE QUERY

Thu Nov 11 13:04:24 PST 2004

Hello,

I'm brand new to Slony-I. Someone else set it up with a master node (node 1) 
and two slaves (node 2 and node 3). I needed to change the schema, and have 
successfully managed to break node 2 in the process (happily this is still in 
the development stage). Here's what happened. Hopefully someone can tell me 
what I did wrong:

1. First, I should mention that node 1 and node 2 are on the same machine 
(Linux), with node 3 on a separate machine. I needed to change the data type 
of a column, using SQL like this:
ALTER TABLE table ADD COLUMN field_new new_type;
UPDATE table SET field_new = field;
ALTER TABLE table DROP COLUMN field;
ALTER TABLE table RENAME COLUMN field_new TO field;

2. I ran this script using the EXECUTE QUERY command in slonik. It failed 
initially, because I forgot that the schema containing the table I needed to 
modify was not in the search path for the 'slony' user. It failed on node 1, 
and appeared to be isolated there (i.e. the event did not get sent to the 
other two nodes). I've checked the Schemadoc, and this seems to be what 
happens. I also double checked the process list at that point, and verified 
that two slon processes were still running (for nodes 1 and 2).

3. I fixed the script and ran it a second time. It succeeded on node 1, and on 
node 3. But node 2 was unchanged, and further investigation showed that the 
corresponding slon process was dead. I tried restarting it, and it complained 
a few times about there being no remote worker thread for node 1, and died 
with an empty error message.

4. I manually fixed the schema on node 2, and started slon again. Slon died in 
the same way.
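
One way to make a script like the one in step 1 independent of the search path is to schema-qualify the table name. This is only a sketch: the schema `myschema`, the table `my_table`, and the `bigint` target type are hypothetical placeholders, not names from the actual setup.

```sql
-- Hypothetical names: myschema, my_table, and the bigint target type are
-- placeholders for illustration only.
ALTER TABLE myschema.my_table ADD COLUMN field_new bigint;
-- Explicit cast; assumes the existing values convert cleanly to the new type.
UPDATE myschema.my_table SET field_new = field::bigint;
ALTER TABLE myschema.my_table DROP COLUMN field;
ALTER TABLE myschema.my_table RENAME COLUMN field_new TO field;
```

Because every statement names the schema explicitly, the script should behave the same regardless of the search path of the user that runs it.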

I checked the Slony-I tables, and it appears that node 2 confirmed the SYNC 
event sent by node 1 just before the DDL_SCRIPT event (the timestamps of both 
events match). This suggests that the script killed node 2, and a quick glance 
at the remote worker thread source code suggests that if a script were to 
fail, the thread would immediately die. But I can't figure out why the slon 
process refuses to restart.
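
A check of this kind can be sketched with two queries against the Slony-I tables `sl_event` and `sl_confirm`. The cluster schema name `_mycluster` is an assumption (Slony-I keeps its tables in a schema named after the cluster, prefixed with an underscore), and the queries are assumed to be run on node 1.

```sql
-- _mycluster is a placeholder for the real cluster schema.

-- Most recent events that originated on node 1, newest first.
SELECT ev_origin, ev_seqno, ev_timestamp, ev_type
  FROM _mycluster.sl_event
 WHERE ev_origin = 1
 ORDER BY ev_seqno DESC
 LIMIT 10;

-- Highest event from node 1 that node 2 has confirmed.
SELECT con_origin, con_received, max(con_seqno) AS last_confirmed
  FROM _mycluster.sl_confirm
 WHERE con_origin = 1
   AND con_received = 2
 GROUP BY con_origin, con_received;
```

If `last_confirmed` is the sequence number of the SYNC immediately before the DDL_SCRIPT event, then node 2 stopped processing at the script.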

Does anyone have any thoughts?

David Pitkin