Jan Wieck JanWieck
Sat Nov 13 23:03:04 PST 2004
On 11/13/2004 4:42 PM, David Pitkin wrote:
> I guess my little problem was a bit vague, but I'm somewhat surprised no-one 
> thought of what now seems like the obvious reason that slon wouldn't start. I 
> was right in that node 2 died on the DDL_SCRIPT. Then, every time I tried to 
> restart slon, it re-received the event, and died again. I'm still quite 
> confused as to why the script fails on only this node (especially since the 
> exact same script succeeds when run from psql).

On the same node? Hmmm ... it might have to do with different 
permissions or search paths. What is the exact error message you get in 
the postmaster log ... the one that causes the script to abort the 
transaction?
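
As a rough sketch of how to compare the two environments: run something
like the following in psql on node 2, once connected as the user slon uses
and once as the user you ran the script with by hand. The role name 'slony'
is taken from your original message; 'myschema' is only a placeholder for
the schema that holds the table.

```sql
-- Sketch only: 'myschema' is a placeholder, not a name from this thread.
SELECT current_user;   -- which role is this session really running as?
SHOW search_path;      -- does it include the schema that holds the table?
-- Can the 'slony' role use that schema at all?
SELECT has_schema_privilege('slony', 'myschema', 'USAGE');
```

If the two sessions give different answers, that would point to the cause.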

> 
> Anyways, I need to get this problem fixed, which means one of three things: 
> deleting the DDL_SCRIPT event from node 1, adding a fake confirmation from 
> node 2, or changing the script itself (the one saved in sl_event) to something 
> that will definitely succeed.

What about fixing the root of the problem? And without really knowing 
what causes the script to fail, the next question is a bit hard to answer.


Jan

> 
> Can anyone tell me which would be the best solution, and more importantly, how 
> to do it safely?
> 
> David Pitkin
> 
> 
> -------Original Message-------------------------
> Hello,
> 
> I'm brand new to Slony-I. Someone else set it up with a master node (node 1)
> and two slaves (node 2 and node 3). I needed to change the schema, and have
> successfully managed to break node 2 in the process (happily this is still in
> the development stage). Here's what happened. Hopefully someone can tell me
> what I did wrong:
> 
> 1. First, I should mention that node 1 and node 2 are on the same machine
> (Linux), with node 3 on a separate machine. I needed to change the data type
> of a column, using sql like this:
> ALTER TABLE table ADD COLUMN field_new;
> UPDATE table SET field_new = field;
> ALTER TABLE table DROP COLUMN field;
> ALTER TABLE table RENAME COLUMN field_new TO field;
> 
> 2. I ran this script using the EXECUTE SCRIPT command in slonik. It failed
> initially, because I forgot that the schema containing the table I needed to
> modify was not in the search path for the 'slony' user. It failed on node 1,
> and appeared to be isolated there (i.e. the event did not get sent to the
> other two nodes). I've checked the Schemadoc, and this seems to be what
> happens. I also double checked the process list at that point, and verified
> that two slon processes were still running (for nodes 1 and 2).
> 
> 3. I fixed the script and ran it a second time. It succeeded on node 1, and on
> node 3. But node 2 was unchanged, and further investigation showed that the
> corresponding slon process was dead. I tried restarting it, and it complained
> a few times about there being no remote worker thread for node 1, and died
> with an empty error message.
> 
> 4. I manually fixed the schema on node 2, and started slon again. Slon died in
> the same way.
> 
> I checked the Slony-I tables, and it appears that node 2 confirmed the SYNC
> event sent by node 1 just before the DDL_SCRIPT event (the timestamps of both
> events match). This suggests that the script killed node 2, and a quick glance
> at the remote worker thread source code suggests that if a script were to
> fail, the thread would immediately die. But I can't figure out why the slon
> process refuses to restart.
> 
> Does anyone have any thoughts?
> 
> David Pitkin
> 
> 
> _______________________________________________
> Slony1-general mailing list
> Slony1-general at gborg.postgresql.org
> http://gborg.postgresql.org/mailman/listinfo/slony1-general


-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #

