This bug is present in 2.2.0 beta 4.

Consider a 6 node cluster with two sets, where the subscribe network for the two sets looks like:

set 1

        1 ---------> 2
       / \          / \
      3   4        5   6

set 2

        3 ---------> 5
       / \          / \
      1   4        2   6

If nodes 1, 3 and 4 all fail at the same time (for example, because they are in the same location), then

    FAILOVER( node=(id=1, backup node=2),
              node=(id=3, backup node=5));

doesn't work, because slonik still thinks that node 4 is part of the cluster and will wait for it.

Requiring the user to drop node 4 before the failover, like

    drop node (id=4, event node=2);

is also a problem: slonik will still think that nodes 1 and 3 are valid nodes, and if any ev_origin=4 events have made it to some nodes but not others, slonik will wait for those events to be replicated everywhere.
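For concreteness, a minimal slonik script for this scenario might look like the sketch below; the cluster name and conninfo strings are placeholders I am assuming, not taken from an actual test:

    cluster name = testcluster;

    # admin conninfo is only given for the surviving nodes, since 1, 3 and 4 are down
    node 2 admin conninfo = 'dbname=slonytest host=node2';
    node 5 admin conninfo = 'dbname=slonytest host=node5';
    node 6 admin conninfo = 'dbname=slonytest host=node6';

    # this hangs: slonik still treats node 4 as a live member of the cluster
    failover ( node = (id = 1, backup node = 2),
               node = (id = 3, backup node = 5));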
If one tries the FAILOVER command like:

    FAILOVER(node=(id=1, backup node=2),
             node=(id=3, backup node=5),
             node=(id=4, backup node=2));

slonik segfaults.

This turns out to reveal a more general problem: if you try a FAILOVER when no valid failover targets exist for a node (in sl_failover_targets), slonik segfaults. We can avoid this with a check on num_nodes in fail_node_promote, but I am wondering if that just masks the symptom. If a node really has no failover targets, what should happen?

If the node is an origin node, then I think we need to log an error. Slonik currently has an error message "error no failover candidates for %d", but it isn't working properly (requires a 2 line fix).

If the node is not an origin, then we could treat the node as failed so we don't consider it as a failover candidate for another node, but not actually call the failover functions for it. This still leaves us with the problem of what to do with the sl_subscribe entries with sl_receiver=this_node when sl_provider=another_failed_node. Currently slony tries to set sl_provider=backup_node, but if the paths are missing this fails. We could create 'fake paths', or remove the sl_subscribe rows whose sl_receiver is the failed node (with no backup) that is going away.

Other ideas are also welcome.
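For what it's worth, the 'fake path' idea would presumably amount to something like the following slonik command (a sketch only: the node numbers follow the example above, and the conninfo is a throwaway placeholder since the failed receiver is unreachable anyway):

    # hypothetical: give the backup node (2) a path to the failed receiver (4)
    # purely so that repointing sl_subscribe's provider to the backup node does
    # not trip over a missing sl_path row; the connection would never be used
    store path (server = 2, client = 4, conninfo = 'dbname=unused', connretry = 10);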
My proposal is that we should do the following:

1. Require users to list all of the failed nodes on the FAILOVER command, even non-origin nodes.

2. If a node isn't listed in the FAILOVER command as a failed node, and it is a RECEIVER of a set from a failed node, then it MUST have a direct PATH to the backup node. We will check this early on and give an error; users can then do a STORE PATH or add that node to the list of failed nodes (see the sketch below).

3. If a node is a non-origin and is listed as a failed node, we redirect it to the backup node by manipulating sl_subscribe, and we post a FAILOVER event on the backup node so all other nodes can update sl_subscribe as needed. We avoid issues with foreign key violations between sl_subscribe -> sl_path and sl_listen -> sl_path because the direct path exists.
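To make point 2 concrete: suppose (hypothetically, since in the example cluster above every surviving receiver already feeds from a backup node) a surviving node 7 received a set from failed node 3 and had no direct path to the backup node 5. Under this proposal slonik would stop early with an error, and the user would either add the missing path before retrying the failover, roughly:

    # hypothetical node 7 and conninfo, shown only to illustrate the remedy
    store path (server = 5, client = 7, conninfo = 'dbname=slonytest host=node7', connretry = 10);

or list node 7 as a failed node in the FAILOVER command.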
Created attachment 165: proposed fixes for 296

I have pushed a set of fixes for these issues at https://github.com/ssinger/slony1-engine/commits/bug296, or in the attached patch.

This set of changes follows points 1 and 2 from my previous comment but doesn't exactly follow point 3. We pass the complete list of failed nodes in the FAILOVER_NODE event, so in the above example we would do

    FAILOVER(node=(id=1, backup node=2),
             node=(id=3, backup node=5),
             node=(id=4, backup node=2));

and slonik will generate two FAILOVER_NODE events:

    FAILOVER_NODE (backup_node=2, failed_node=1, seq_no=XXXXXXX, failed_node_list=[1,3,4])
    FAILOVER_NODE (backup_node=5, failed_node=3, seq_no=XXXXXXX, failed_node_list=[1,3,4])

Each node that processes these events will know that the other two nodes have also failed, and will not repoint the sl_subscribe line for node 4 to a backup node.
A version of this patch, along with additional failover fixes, has been pushed to master for 2.2.0 b5.