Bug 133 - DROP set in the middle of a subscribe to the same set confuses slon
Status: NEW
Alias: None
Product: Slony-I
Classification: Unclassified
Component: slon
Version: 2.0
Hardware: PC Linux
Importance: low normal
Assignee: Slony Bugs List
Reported: 2010-06-01 12:01 UTC by Steve Singer
Modified: 2010-07-27 11:55 UTC

Description Steve Singer 2010-06-01 12:01:13 UTC
Consider a setup such as

1==>3===>4

Node 1 is the origin for a replication set (set 2).

This set is subscribed to on node 2.

We then concurrently issue the following (from two different slonik instances):

1:  subscribe set (id=2, provider=3, receiver=4);
2:  drop set (id=2, origin=1);

It is possible for the slons at various nodes to get confused and start logging things like

db5 - 2010-06-01 14:51:02 EDTERROR  remoteWorkerThread_3: "select "_disorder_replica".subscribeSet_int(2, 3, 4, 't', 'f'); insert into "_disorder_replica".sl_event     (ev_origin, ev_seqno, ev_timestamp,      ev_snapshot, ev_type , ev_data1, ev_data2, ev_data3, ev_data4, ev_data5    ) values ('3', '5000000005', '2010-06-01 14:50:36.965441', '690742:690742:', 'SUBSCRIBE_SET', '2', '3', '4', 't', 'f'); insert into "_disorder_replica".sl_confirm     (con_origin, con_received, con_seqno, con_timestamp)    values (3, 5, '5000000005', now()); commit transaction;" PGRES_FATAL_ERROR ERROR:  Slony-I: subscribeSet_int(): set 2 not found



Note that the slon complaining is for node 5, and is not directly involved in the actions on set 2 (but other slons log similar errors).


I realize that I am trying to shoot myself in the foot with this example, but I can see how this could happen in real life (two admins who don't talk, or misbehaving scripts), and it shouldn't be that easy to corrupt my replication cluster.  I would think that subscribeSet_int should log an error but mark the event as processed?
Comment 1 Christopher Browne 2010-06-15 15:27:04 UTC
(In reply to comment #0)
Yep, there should be something here to behave less badly.
Comment 2 Steve Singer 2010-07-27 11:55:26 UTC
I am worried that we can't just ignore the subscribe set event and mark it confirmed on nodes where the set does not exist in sl_set.

Node 5 would have no way of knowing whether there is no row in sl_set because a DROP SET has already been processed or because the CREATE SET has not yet been received by node 5.   Both the create set and the drop set will be coming from the origin, but the subscribe set could be coming from a receiver.
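
To make the ambiguity concrete, here is a minimal toy model in Python. It is not Slony code: the Node class and its methods are hypothetical, and the only thing taken from the report is that a node decides based on whether the set has a row in sl_set.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one node's local view of sl_set."""
    sl_set: set = field(default_factory=set)

    def apply(self, event: str, set_id: int) -> None:
        # Only the two events that add or remove a row in sl_set are modelled.
        if event == "CREATE_SET":
            self.sl_set.add(set_id)
        elif event == "DROP_SET":
            self.sl_set.discard(set_id)

    def knows_set(self, set_id: int) -> bool:
        # This is all the node can check when SUBSCRIBE_SET arrives.
        return set_id in self.sl_set


# History A: CREATE SET and DROP SET were both processed before SUBSCRIBE_SET.
already_dropped = Node()
already_dropped.apply("CREATE_SET", 2)
already_dropped.apply("DROP_SET", 2)

# History B: CREATE SET has not reached this node yet.
not_yet_created = Node()

# Both nodes hold exactly the same local state, so neither can tell
# which history it is in when the SUBSCRIBE_SET for set 2 shows up.
print(already_dropped.sl_set == not_yet_created.sl_set)
print(already_dropped.knows_set(2), not_yet_created.knows_set(2))
```

In history A, dropping the event would be harmless; in history B, it would lose a subscription that is about to become valid.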

We could try to 'remember' the set after a DROP SET for some period of time/messages, but what would that period be? An arbitrary value would only delay the issue.   You could remember all old sets, 

We could try to make the subscribe set come from the origin, not from the provider. This would mean that it comes from the same place as the create/drop set commands, but that will have implications elsewhere.
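
A rough sketch of why the event's source matters, again as a toy Python model rather than anything from the Slony source. It assumes that events from a single origin are applied in sequence order on every node, while events from different origins can interleave arbitrarily; the function names are made up for illustration.

```python
from typing import Iterator, List, Tuple

Event = Tuple[int, str]  # (origin node id, event type)


def interleavings(a: List[Event], b: List[Event]) -> Iterator[List[Event]]:
    """Yield every merge of a and b that keeps each queue's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def subscribe_fails(history: List[Event]) -> bool:
    """True if SUBSCRIBE_SET is applied while the set has no row."""
    known = False
    for _origin, kind in history:
        if kind == "CREATE_SET":
            known = True
        elif kind == "DROP_SET":
            known = False
        elif kind == "SUBSCRIBE_SET" and not known:
            return True
    return False


# Today: SUBSCRIBE_SET is generated on node 3, CREATE/DROP on the origin (node 1).
from_provider = list(interleavings(
    [(1, "CREATE_SET"), (1, "DROP_SET")],
    [(3, "SUBSCRIBE_SET")],
))

# Proposal: all three events come from the origin, in whatever order it chose.
from_origin = list(interleavings(
    [(1, "CREATE_SET"), (1, "SUBSCRIBE_SET"), (1, "DROP_SET")],
    [],
))

print(sum(subscribe_fails(h) for h in from_provider), "of", len(from_provider))
print(sum(subscribe_fails(h) for h in from_origin), "of", len(from_origin))
```

With a single source, the origin would have to accept or reject the subscribe up front, and every other node would then see one consistent order. The sketch says nothing about the "implications elsewhere".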

Ideas welcome