Bug 47 - EXECUTE SCRIPT runs on ALL nodes
Status: RESOLVED WORKSFORME
Alias: None
Product: Slony-I
Classification: Unclassified
Component: stored procedures
Version: 1.2
Hardware: PC Linux
Importance: medium normal
Assignee: Slony Bugs List
Reported: 2008-04-02 08:23 UTC by Johan Ström
Modified: 2010-06-18 09:31 UTC
Description Johan Ström 2008-04-02 08:23:38 UTC
Hello

I'm running 1.2.9 (although this seems to apply to CVS too) against pg 1.2. I just did an EXECUTE SCRIPT with a simple ALTER command. I have two sets and three nodes: two masters (1 and 2), with one set each, each replicating its set to the single slave (101).
The ALTER was performed on set 1, with no "EXECUTE ONLY ON" parameter.
So what I expected was that the ALTER command would be executed on node 1 and node 101, and node 2 left untouched. But instead the slon running against node 2 borked. Looking at the logs, it showed that it was trying to run the ALTER TABLE command against that database, which of course failed since that database did not have set 1 (this one was master for set 2).

Looking closer I found http://slony.info/documentation/function.ddlscript-prepare-int-integer-integer.html and in particular this part:

	if v_set_origin <> v_no_id
			and not exists (select 1 from sl_subscribe
						where sub_set = p_set_id
						and sub_receiver = v_no_id)
	then
		return 0;
	end if;


Now if I'm right here, that would be, in pseudocode:

IF the origin node for this set is not me, AND I don't subscribe to this set, then abort without errors.

Now, that sounds wrong? Shouldn't that be:


IF the origin node for this set is not me, OR I don't subscribe to this set, then abort without errors.


Do I just need to sleep and check my subscription setup (which looks fine to me in sl_subscribe; node 2 is not a subscriber of ANYTHING), or is this wrong?

In the meantime I guess one can run with EXECUTE ONLY ON = 101, which isn't much of a problem here in my dev setup, but is more of a problem when you have many slaves, I guess.

Otherwise, thanks for the product, even if it is somewhat tricky to administer! :)

Regards
Johan
Comment 1 M O'Shea 2008-04-07 20:18:08 UTC
I don't have any more information about your problem. However, I can answer your question about whether an OR or an AND is required in the code you quoted. It is definitely AND that you want there. Using OR would lead to the query never being executed on any of the slaves (as they would all answer yes to "Am I not the origin of this set?").
Comment 2 Johan Ström 2008-04-07 22:46:45 UTC
Yes, good point. I guess if it should be an OR, it should read:

IF the origin node for this set IS me, OR I don't subscribe to this set,
then abort without errors.

or in SQL:

        if v_set_origin = v_no_id
                        or not exists (select 1 from sl_subscribe
                                                where sub_set = p_set_id
                                                and sub_receiver = v_no_id)
        then
                return 0;
        end if;
Comment 3 M O'Shea 2008-04-07 23:36:04 UTC
No, that logic would lead to execution only on the slaves that are subscribed to that set. The origin would exit, as would any other nodes in the cluster.

I believe that the original logic is correct. So that means that either this piece of code is not the cause of your problem or there is another bug in there somewhere.

If you are still not sure, can you try executing some of the code in ddlscript_prepare_int (with your real values) and check that it all follows?
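
To make the comparison concrete, here is a minimal Python sketch that evaluates the three conditions discussed in this thread against the reporter's topology (sets 1 and 2 originating on nodes 1 and 2, both subscribed to by node 101). It is only a model of the quoted check, not of ddlscript_prepare_int as a whole; the dictionaries and function names are hypothetical and stand in for v_set_origin, v_no_id and the sl_subscribe lookup.

```python
# Hypothetical model of the cluster described in this bug.
# SET_ORIGIN maps set id -> origin node; SUBSCRIPTIONS holds
# (sub_set, sub_receiver) pairs as they would appear in sl_subscribe.
SET_ORIGIN = {1: 1, 2: 2}
SUBSCRIPTIONS = {(1, 101), (2, 101)}
NODES = [1, 2, 101]


def is_subscribed(set_id, node_id):
    """Stand-in for the 'exists (select 1 from sl_subscribe ...)' test."""
    return (set_id, node_id) in SUBSCRIPTIONS


def skips_original(set_id, node_id):
    """Quoted check: not the origin AND not a subscriber -> return 0."""
    return SET_ORIGIN[set_id] != node_id and not is_subscribed(set_id, node_id)


def skips_or_variant(set_id, node_id):
    """Variant from the description: not the origin OR not a subscriber."""
    return SET_ORIGIN[set_id] != node_id or not is_subscribed(set_id, node_id)


def skips_comment2_variant(set_id, node_id):
    """Variant from Comment 2: is the origin OR not a subscriber."""
    return SET_ORIGIN[set_id] == node_id or not is_subscribed(set_id, node_id)


def executing_nodes(skips, set_id):
    """Nodes that would NOT bail out early for the given set."""
    return [node for node in NODES if not skips(set_id, node)]


if __name__ == "__main__":
    for name, skips in [
        ("original (AND)", skips_original),
        ("description (OR)", skips_or_variant),
        ("comment 2 (OR)", skips_comment2_variant),
    ]:
        print(name, executing_nodes(skips, 1))
```

Under these assumptions, only the original AND form lets both the origin (node 1) and the subscriber (node 101) run the script for set 1 while node 2 returns early. That is the behaviour the reporter expected, so the failure on node 2 would have to come from somewhere other than this check.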
Comment 4 Johan Ström 2008-04-08 00:07:11 UTC
Ah, the masters execute this way too; then it makes sense. I guess I just needed to sleep, hehe.

Well, I tried ddlscript_prepare_int(1, -1); and it gave the expected results, so I guess the problem lies somewhere else. I don't really have time to dig too deep into this right now though :/
Maybe someone else could try, or do you have some quick pointers to what I could test?
Comment 5 Steve Singer 2010-06-18 09:31:53 UTC
We can't replicate this and the original reporter hasn't provided additional details in years.
Closing