Wed Dec 22 13:30:21 PST 2010
- Previous message: [Slony1-hackers] automatic WAIT FOR proposal
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 10-12-22 10:45 AM, Steve Singer wrote: > On 10-12-10 04:54 PM, Christopher Browne wrote: > > I will write a design proposal that explores what things might look like > if we went that route. > First I will define some notation. Let an event with sequence number s originating at node n be known as E(n,s) (ie 1,1234) An event E(n,s) has been confirmed by node a if the function C(a,E(n,s)) returns true. If a direct path exists between two nodes a,b then the function P(a,b) returns true. Note P(a,b) != P(b,a) I will now define a function that returns the events from other nodes confirmed on node a when E(a,s) is generated. Visible(a,E(a,s) = [E(1,s1),E(2,s2)....E(z,sz)] For example: If when event 1,100 was generated on node 1 node 1 had sl_confirm entires indicating that it has confirmed events 2,5 3,10 and 4,50. Visible(1,E(1,100)) = [E(1,100), E(2,5), E(3,10), E(4,50) ] The order in which a set of events leading up to an Event E(n,s) is processed on a node can be expressed as an ordered list returned by the Order(a,n,s) function: Order(3,2,2)= E(1,1), E(2,1),E(1,2),E(3,1),E(2,2) This means that on node 3, the events leading up to processing of event E(2,2) were E(2,1) then E(1,2) then E(3,1). A race condition is said to exists when ever the system allows two different orderings for Order(a,n,s) that produce a different resulting action. Two different orderings leading to an event being processed might not lead to a race condition if their are no dependencies between the events that were processed in a different order. Two events conflict Conflict(E(n1,s1),E(n2,s2))=TRUE If a different result is produced by Order(n3,E(n1,s1)=E(n2,s2),E(n1,s1) compared with Order(n3,E(n2,s2)=E(n1,s1),E(n2,s2) Whether two events conflict is dependent on the event type and the arguments to those events. Proposition P1: If an event E(1,1) followed by an event E(1,1+x) is submitted to node 1 by slonik then these two events will not cause a race condition with each other. Proof: The createEvent() function obtains an exclusive lock on sl_event and gets the next sequence number for the event id. Because the exclusive lock on sl_event is held until the transaction is committed it is impossible for a row to be visible in sl_event with a higher sequence number than another row that will be committed later. (assumes all events get generated while holding the lock on sl_event). This means that on the event node E(1,1+x) will always happen after E(1,1). It also means that if a remote node sees E(1,1+x) in the sl_event table then it will also see (or have already processed) E(1,1) because E(1,1) would have committed before and we don't delete sl_event rows that have not been processed. Therefore as long as the remote nodes process the events from a particular node in event id order (which they do) then the remote node will always process E(1,1) before it processes E(1,1+x). Proposition 2: If an Event (n1,s1) is only applied on node n3 when Visible(n1,E(n1,s1)) CONTAINED IN Order(n3,E(n1,s1)) then no race condition is possible. Counter Proof: Consider the following sequence of events: E(1,2) : Visible(1,2)=>E(2,1) E(3,1) E(2,2): Visible(2,2)=>E(1,1) E(3,1) gives Order(1,E(2,2)=> E(1,2), E(2,2) Order(2,E(1,2)=> E(2,2), E(1,2) At node three E(2,2) might get applied before or after E(1,2). The second part of the counter proof is to show that there exist event types that can be generated on nodes 1,2 such that Conflict(E(1,2),E(2,2)=TRUE. create set(id=1,origin=1); #E(1,2) create set(id=1,origin=2); #E(2,2) The behavior on node 3 is clearly differ based on which command arrives at node 3 first. Furthermore node 1 and node 2 will be left in different states as well. Proposition 3: If an event E(n1,s1) is not applied on node n3 until all events in Visible(n1,E(n1,s1) have all been confirmed on n3 then no event E(n2,s1) exists that produces a race condition on n3 unless the conflict existed on the event nodes n1 and n2. Proof: Assume Visible(1,E(1,1)) does not include an event E(2,1) such that there exists a race condition at node 3. This means that Order(3,E(1,1)==> E(2,1),E(1,1) produces different result than Order(3,E(1,1)===> E(1,1),E(2,1) this means that Conflict(E(1,1), E(2,1))=TRUE and that conflict would exist on node 1 and node 2. If E(2,1) had reached node 1 before E(1,1) was created then E(2,1) would be part of Visible(1,E(1,1)) and would always be confirmed on node 3 before E(2,1) gets applied the assumptions in the proposition. If E(1,1) had reached node 2 before E(2,1) was generated then Visible(2,E(2,1)) would include E(1,1) and E(1,1) would always get applied on node 3 before E(2,1) because of the assumptions we made in our proposition. This means that if IF a) we can somehow avoid race conditions at the event nodes AND b) implement the rule "apply all events that were visible on the event origin before applying the event" then we can eliminate race conditions. i) I feel (a) needs to be handled by slonik and can be handled by slonik waiting for the next event node to be caught up to previous event nodes before submitting an event to it (discussed in a previous email on this thread, though I have not formally tried to prove this ii) can be implemented by adding a new column to sl_event that stores an array of event tuples - which consist of the highest events from each node confirmed on the node at the time sl_event is generated. The a remoteWorker won't apply an event to its local node until the pre-conditions have been met. Questions: What is the performance impact of getting the highest confirmed event id values? This is unknown - but will probably involve querying sl_event and/or sl_confirm. Is it possible for an event to be included in Visible(1,E(1,1)) that might never make it to a node n3 that is trying to process E(1,1)? I think no. Every event E(x,y) that is in Visible(1,E(1,1)) is in the sl_event table on node 1 by virtue of it having been confirmed on node 1 but not confirmed elsewhere. If slon always pulls events (no matter what the event origin) into node 3 from the node 1 sl_event table then it will include E(x,y). Even if node 3 isn't directly talking to node 1, any sl_event table in the cluster that contains E(1,1) would also contain E(x,y) by virtue of that event being on node 1 at the time. I feel this produces a solution to wait for where slonik only needs to wait to make sure the next event node has received all events so far. This does NOT deal with race conditions that can be created by multiple concurrent slonik scripts using different event nodes but I might be able to solve this with an sl_sloniklock type of table. Issues relating to TRY blocks are still somewhat outstanding - as discussed in the other email.
- Previous message: [Slony1-hackers] automatic WAIT FOR proposal
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-hackers mailing list