Jan Wieck JanWieck at Yahoo.com
Mon Jul 12 08:03:21 PDT 2010
On 7/12/2010 10:38 AM, Steve Singer wrote:
> Jan Wieck wrote:
>> On 7/9/2010 9:21 AM, Steve Singer wrote:
>>> Steve Singer wrote:
>> 
>> Methinks this is possibly a slightly stale sl_confirm, resulting in 
>> the clone trying to re-apply events that the source had already 
>> processed.
>> 
>> This would very much depend on how the clone was actually created. 
>> Assume you create a clone of node 2 as node 3. If there is a gap between 
>> the prepare clone and when the actual copy is made, a running slon for 
>> node 2 has time to process more events. IIRC it is the clone preparation 
>> that creates the configuration copy including the content of sl_confirm.
>> 
>> Can you make sure that in your tests you stop the slon before doing the 
>> clone prepare and restart it only after the pg_dump transaction for the 
>> cloning has taken its snapshot?
> 
> I'm thinking that this was actually being caused by an issue in my test 
> that was leaving an old version of the 'clone' database around from the 
> previous run.  Since addressing that issue I've been unable to reproduce 
> those duplicates in my log.
> 
> That doesn't really answer the question of whether you need to stop the 
> slons between the CLONE PREPARE and taking the pg_dump.  If you do, then 
> we should update the documentation, but I'm not yet convinced that this 
> is the case.

If my assumption is right and the sl_confirm data for the new node is 
generated at CLONE PREPARE time, then you definitely need to stop at 
least the slon that serves the node you are cloning, from before the 
CLONE PREPARE until the pg_dump has taken its snapshot. Otherwise the 
clone believes it has not yet processed events that the clone source in 
fact processed and confirmed in between.
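
To make that reasoning concrete, here is a toy model in Python. It is not 
Slony-I code: the function name, parameters and event numbers are made up 
for illustration, and it rests on the same unverified assumption that the 
new node's sl_confirm content is fixed at CLONE PREPARE time.

```python
# Toy model, not Slony-I code. All names and numbers are hypothetical.

def reapplied_events(confirmed_at_prepare, applied_in_dump):
    """Return the event numbers the clone would apply a second time.

    confirmed_at_prepare: last event the source had confirmed when
        CLONE PREPARE copied sl_confirm for the new node (assumption).
    applied_in_dump: last event whose effects are already contained
        in the pg_dump snapshot the clone is built from.
    """
    # The clone resumes right after what its sl_confirm copy says,
    # but its data already includes everything up to applied_in_dump.
    return list(range(confirmed_at_prepare + 1, applied_in_dump + 1))


# slon for the source kept running between prepare and dump:
print(reapplied_events(100, 102))   # [101, 102]

# slon stopped before prepare, restarted after the dump's snapshot:
print(reapplied_events(100, 100))   # []
```

In the first call the source processed two more events in the gap, so the 
clone would see them as unprocessed even though its data already contains 
them. With the slon stopped across the gap, the two numbers match and 
nothing is replayed.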


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

