Jan Wieck JanWieck
Fri Dec 23 16:05:39 PST 2005
On 12/23/2005 4:57 AM, Florian G. Pflug wrote:
> Jan Wieck wrote:
>> On 12/21/2005 8:18 PM, Florian G. Pflug wrote:
>> 
>>> Florian G. Pflug wrote:
>>> <snipped my own mail>
>>>
>>> Can anyone confirm that this is actually a bug? I'm pretty sure it is
>>> (I did multiple setups of my cluster, and the problem persisted.
>>> I used the altperl scripts for setting up the cluster, so I
>>> see no way I could have caused this).
>>>
>>> If it's really a bug, it would at least be worth a note in
>>> the docs or in the 1.1.5 release notes - it took me hours to
>>> nail down the problem, and it wasn't fun, so preventing
>>> others from having to do the same would be a good thing.
>> 
>> Rebuild listen entries is indeed broken. This is a show stopper for 
>> 1.1.5 ... I am working on it.
> 
> Is there a reason for not generating all "sensible" sl_listen entries?
> I didn't find any documentation on the performance overhead a
> sl_listen entry causes.

The "sensible" part is exactly what matters here. Problems arise 
when a node receives an event from a set origin that has not yet 
been processed by its data provider for that set. For example:

     1 -> 2

       3

1 is the origin, 2 is a subscriber, and 3 is a new node that is not 
subscribed yet. 3 has paths for 1 and 2, so naturally it would listen 
on each of them for their events. If we now subscribe 3 as a cascaded 
node with 2 as its data provider, the ENABLE_SUBSCRIPTION event that 
follows from node 1, on which node 3 will start copy_set, must be 
received by 3 via 2. That is the only way to guarantee that 2 actually 
has the data itself at the moment 3 starts to copy. Otherwise 2 could 
still be busy with its own copy_set, meaning that not only is the data 
in the tables missing, the tables themselves aren't in sl_table yet 
either.
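
To make the constraint concrete, here is a minimal sketch in Python. It is
not Slony code: the dictionaries standing in for sl_path and the
subscription state, and the function name allowed_listen_providers, are
hypothetical and exist only for illustration. It shows that once node 3
subscribes with 2 as its data provider, 2 is the only node it may ask for
events from origin 1.

```python
from typing import Dict, Set, Tuple

# Hypothetical in-memory stand-ins, not the real Slony catalog layout.
# paths[receiver] = set of nodes the receiver has a path to.
Paths = Dict[int, Set[int]]
# providers[(receiver, origin)] = data provider the receiver uses for
# sets originating on `origin`.
Providers = Dict[Tuple[int, int], int]


def allowed_listen_providers(receiver: int, origin: int,
                             paths: Paths, providers: Providers) -> Set[int]:
    """Return the nodes `receiver` may ask for events from `origin`."""
    reachable = paths.get(receiver, set())
    provider = providers.get((receiver, origin))
    if provider is not None:
        # Subscribed: events of the set origin must arrive via the data
        # provider only, so the provider has always processed them first.
        return {provider} & reachable
    # Not subscribed: in this sketch every node with a path is a candidate.
    return set(reachable)


paths = {3: {1, 2}}

# Node 3 before subscribing: it may listen on 1 and on 2.
before = allowed_listen_providers(3, 1, paths, {})

# Node 3 subscribed to the set from origin 1 with provider 2.
after = allowed_listen_providers(3, 1, paths, {(3, 1): 2})
```

Here `before` contains both nodes and `after` contains only node 2, which
is the ordering guarantee described above.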

And to spice this up a little more, events are read asynchronously in 
the remote_listen thread. They are queued, and the remote_worker thread 
processes them from the queue. At the moment node 3 gets the 
SUBSCRIBE_SET event, it will already have a lot of stuff queued, so it 
had better restart ASAP to throw that away and listen again, this time 
for all of node 1's events on 2.
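
The queueing problem can be sketched the same way. This is a toy model in
Python under stated assumptions, not how slon is implemented: the function
run_worker, the event tuples and the `listen` dictionary are hypothetical,
and the real daemon restarts rather than clearing a queue in place. The
point is only that whatever was queued before the subscribe event was read
under the old listen configuration and has to be discarded.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# Hypothetical event shape: (sequence number, event type).
Event = Tuple[int, str]


def run_worker(queue: Deque[Event], listen: Dict[int, int],
               set_origin: int, new_provider: int) -> Tuple[List[int], int]:
    """Process queued events; on SUBSCRIBE_SET, drop the rest and re-listen.

    Returns the sequence numbers processed and the number of events dropped.
    """
    processed: List[int] = []
    while queue:
        seqno, ev_type = queue.popleft()
        processed.append(seqno)
        if ev_type == "SUBSCRIBE_SET":
            # Everything still queued was read under the old listen
            # configuration, so throw it away ("restart").
            dropped = len(queue)
            queue.clear()
            # From now on, ask the data provider for the origin's events.
            listen[set_origin] = new_provider
            return processed, dropped
    return processed, 0


# Node 3 has been listening for node 1's events directly on node 1.
listen = {1: 1}
queue = deque([(100, "SYNC"), (101, "SUBSCRIBE_SET"),
               (102, "SYNC"), (103, "SYNC")])

processed, dropped = run_worker(queue, listen, set_origin=1, new_provider=2)
```

After the call, events 102 and 103 are gone from the queue and `listen`
maps origin 1 to provider 2, so they would be fetched again from node 2.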


Jan

> 
> With "sensible" I mean: Telling node X via sl_listen to ask neighbour-nodes
> (Those for which a sl_path entry exists) for events from all other nodes,
> apart from those for which the events must have travelled via node X to
> reach the neighbour of X in question.
> 
> I tried writing an algorithm to do that, but it turned out that it isn't
> quite as easy as I initially believed, because all "iterative" algorithms
> I could think of (which were all based basically on the idea that
> if X receives events from Y, and Y from Z, then X can receive events from Z
> via Y) failed because there is not enough information in sl_listen to figure
> out if Y already needs X to receive events from Z.
> 
> greetings, Florian Pflug


-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #

