Cyril Scetbon cscetbon.ext at orange-ftgroup.com
Sat May 22 14:13:36 PDT 2010

Jan Wieck wrote:
> On 5/21/2010 10:42 AM, Cyril Scetbon wrote:
>   
>> Jan Wieck wrote:
>>     
>>> I don't care much about that one old event. It does no harm other than 
>>> currently confusing test_slony_state. What I worry about is attempting 
>>> to fail over in an emergency with an only half-functioning path 
>>> network.
>>>   
>>>       
>> I don't really understand the issue you're talking about... Admittedly, 
>> I only have a weak knowledge of your code :)
>> Are you talking about network errors going unnoticed because no SYNC is 
>> generated on a receiver? If so, isn't the fact that it confirms events 
>> from other nodes enough to say that everything works?
>>     
>
> Let me try to explain the problem.
>
> In a multi node cluster, not every node necessarily needs to be able to 
> talk to every other node. Let us just look at a cascaded 3 node cluster:
>
>      1 - 2 - 3
>
> This setup requires 4 sl_path entries to work:
>
>      server=1, client=2
>      server=2, client=1
>      server=2, client=3
>      server=3, client=2
>
> And it is supposed to generate the following sl_listen rows:
>
>      origin=1, receiver=2, provider=1
>      origin=1, receiver=3, provider=2
>      origin=2, receiver=1, provider=2
>      origin=2, receiver=3, provider=2
>      origin=3, receiver=1, provider=2
>      origin=3, receiver=2, provider=3
>
> It does not matter at all which node is currently the origin of any set. 
> All of these paths and connections are important for the health and 
> well-being of the Slony cluster. If, for example, the listen entry 
> origin=2, receiver=3 were broken, node 3 would still replicate data 
> originating from 1 perfectly well. But as soon as you move the set to 
> node 2, node 3 would start falling behind and you would effectively lose 
> your second-level backup.
>
> This is why Slony originally created a SYNC on EVERY node at least every 
> 10 seconds, just so that some harmless event traffic is always flowing, 
> giving us something to monitor and keeping sl_status looking good.
>
> That is what got removed and that is what I think we should put back.
>   
Thanks for the explanation! I agree too.
>
> Jan
>
>   
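
To make the 1 - 2 - 3 example concrete, here is a rough Python sketch of how the six sl_listen rows follow from the four sl_path entries, and of which receiver starves when one listen entry is broken. It is only an illustration: the function names derive_listen and starved_receivers are made up for this example, and the breadth-first "nearest provider" rule is an assumption that happens to reproduce the rows listed above, not Slony's actual listen-generation code.

```python
from collections import deque

# sl_path entries from the example, as (server, client) pairs:
# the client node knows how to connect to the server node.
PATHS = {(1, 2), (2, 1), (2, 3), (3, 2)}


def derive_listen(paths):
    """Return {(origin, receiver): provider} via breadth-first search.

    Illustrative shortest-path rule only (an assumption), not the
    logic Slony itself uses to build sl_listen.
    """
    nodes = sorted({n for pair in paths for n in pair})
    listen = {}
    for origin in nodes:
        # Events spread outward from the origin. A receiver can pull from
        # a provider only if a path (server=provider, client=receiver) exists.
        seen = {origin}
        queue = deque([origin])
        while queue:
            provider = queue.popleft()
            for server, client in sorted(paths):
                if server == provider and client not in seen:
                    seen.add(client)
                    listen[(origin, client)] = provider
                    queue.append(client)
    return listen


def starved_receivers(listen, origin, broken):
    """List receivers that no longer get events from `origin` once the
    listen entries in `broken` (a set of (origin, receiver) keys) are lost."""
    working = {k: v for k, v in listen.items() if k not in broken}
    receivers = {r for (o, r) in listen if o == origin}
    fed = {origin}
    changed = True
    while changed:
        changed = False
        for (o, r), provider in working.items():
            # A receiver is fed if its provider already has the events.
            if o == origin and provider in fed and r not in fed:
                fed.add(r)
                changed = True
    return sorted(receivers - fed)


if __name__ == "__main__":
    listen = derive_listen(PATHS)
    for (origin, receiver), provider in sorted(listen.items()):
        print(f"origin={origin}, receiver={receiver}, provider={provider}")
    # Break origin=2, receiver=3 and compare both possible origins.
    print(starved_receivers(listen, 1, {(2, 3)}))
    print(starved_receivers(listen, 2, {(2, 3)}))
```

With the entry origin=2, receiver=3 broken, nothing is starved while events originate on node 1, but node 3 is starved for events originating on node 2. That is the failure that stays invisible until the set is moved, unless every node generates events regularly.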

-- 
Cyril SCETBON


