[Slony1-general] Issue when adding node to replication

Thu Sep 27 12:26:17 PDT 2012

On 9/27/2012 2:34 PM, Brian Fehrle wrote:
> Hi all,
>
> PostgreSQL v 9.1.5 - 9.1.6
> Slony version 2.1.0
>
> I'm having an issue that's occurred twice now. I have a 4-node Slony
> cluster, and one of the operations is to drop a node from replication,
> do maintenance on it, then add it back to replication.
>
> Node 1 = master
> Node 2 = slave
> Node 3 = slave  -> dropped then re-added
> Node 4 = slave

First, why is the node dropped and re-added so quickly, instead of
just doing the maintenance while it falls behind and then letting it
catch up?

You apparently have a full-blown path network, with paths from every
node to every other node. That is not good under normal circumstances,
since the automatic listen generation will cause every node to listen
on every other node for events, including events from non-origins.
The result is far too many useless database connections.

What seems to be happening here is a race condition. The node is
dropped, and when it is added back, some third node still hasn't
processed the DROP NODE. When node 4 then looks for events from node 3,
it finds old ones somewhere else (for example on node 1 or 2). When
node 3 later comes around to using those event IDs again, you get the
dupkey error.

If you really need to drop and re-add the node, you could use an
explicit WAIT FOR EVENT after the DROP NODE to make sure all traces of
that node are gone from the whole cluster before adding it back.
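
A rough, untested slonik sketch of that drop-and-wait step might look
like the following. The cluster name and conninfo strings are made up
for illustration, and I am assuming node 1 (the master) is used as the
event node; adjust all of that to the real cluster.

```slonik
# Hypothetical cluster name and connection strings; substitute your own.
cluster name = mycluster;
node 1 admin conninfo = 'dbname=mydb host=node1 user=slony';
node 2 admin conninfo = 'dbname=mydb host=node2 user=slony';
node 4 admin conninfo = 'dbname=mydb host=node4 user=slony';

# Drop node 3, submitting the DROP NODE event on node 1.
drop node (id = 3, event node = 1);

# Block until all remaining nodes have confirmed the events that
# originated on node 1, which includes the DROP NODE above.
# timeout = 0 waits indefinitely instead of giving up after a while.
wait for event (origin = 1, confirmed = all, wait on = 1, timeout = 0);
```

Only once this script returns would the steps to add node 3 back be
run, so that no node is still holding old events from the dropped node.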

Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin