Bug 296 - FAILOVER doesn't work when non-origin nodes fail at the same time
Status: RESOLVED FIXED
Alias: None
Product: Slony-I
Classification: Unclassified
Component: slonik
Version: devel
Hardware: PC Linux
Importance: low enhancement
Assignee: Slony Bugs List
Reported: 2013-06-21 09:03 UTC by Steve Singer
Modified: 2013-07-10 10:30 UTC

Attachments
proposed fixes for 296 (40.63 KB, patch)
2013-06-27 13:49 UTC, Steve Singer

Description Steve Singer 2013-06-21 09:03:14 UTC
This bug is present in 2.2.0 beta 4

Consider a six-node cluster with two sets, where the subscription network for the two sets looks like this:

 set 1
    1 --------> 2
   /\          /\ 
  3  4        5  6


 set 2 
   3-----------> 5
  /\            /\
 1  4          2  6
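
For reference, this topology corresponds roughly to subscriptions like the following sketch (the set IDs and forward=yes on every subscriber are assumptions for illustration):

---
# set 1, originating on node 1
subscribe set (id=1, provider=1, receiver=2, forward=yes);
subscribe set (id=1, provider=1, receiver=3, forward=yes);
subscribe set (id=1, provider=1, receiver=4, forward=yes);
subscribe set (id=1, provider=2, receiver=5, forward=yes);
subscribe set (id=1, provider=2, receiver=6, forward=yes);

# set 2, originating on node 3
subscribe set (id=2, provider=3, receiver=1, forward=yes);
subscribe set (id=2, provider=3, receiver=4, forward=yes);
subscribe set (id=2, provider=3, receiver=5, forward=yes);
subscribe set (id=2, provider=5, receiver=2, forward=yes);
subscribe set (id=2, provider=5, receiver=6, forward=yes);
---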

If nodes 1, 3, and 4 all fail at the same time (because they are in the same location), then

---
FAILOVER( node=(id=1,backup node=2), node=(id=3,backup node=5));
---

doesn't work, because slonik still thinks that node 4 is part of the cluster and will wait for it.

Requiring the user to drop node 4 before the failover, like

--
drop node (id=4, event node=2);
--

is problematic because slonik will still think that nodes 1 and 3 are valid, and if any ev_origin=4 events have made it to some nodes but not others, slonik will wait for those events to be replicated everywhere.
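
For illustration, here is that workaround as a complete script; the cluster name and conninfo strings are hypothetical. Even with the DROP NODE issued first, slonik can stall at the FAILOVER, waiting for ev_origin=4 events that not all surviving nodes have seen:

---
cluster name = mycluster;                          # hypothetical cluster name
node 2 admin conninfo = 'dbname=mydb host=node2';  # hypothetical conninfo
node 5 admin conninfo = 'dbname=mydb host=node5';  # hypothetical conninfo

# drop the failed non-origin node first
drop node (id=4, event node=2);

# then fail over the two origins; slonik still treats nodes 1 and 3 as
# valid and can hang waiting for outstanding ev_origin=4 events
failover (node=(id=1, backup node=2), node=(id=3, backup node=5));
---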
Comment 1 Steve Singer 2013-06-21 12:28:53 UTC
If one tries the FAILOVER command like:

----
FAILOVER(node=(id=1,backup node=2), node=(id=3,backup node=5), node=(id=4,backup node=2));   
-----

slonik segfaults.

This turns out to reveal a more general problem: if you try a FAILOVER when no valid failover targets exist for the node (in sl_failover_targets), slonik segfaults. We can avoid this with a check on num_nodes in fail_node_promote, but I am wondering if that merely masks the symptoms.

If an origin node really has no failover targets, then what should happen?

If the node is an origin node, then I think we need to log an error. Slonik currently has an error message, "error no failover candidates for %d", but that isn't working properly (it requires a two-line fix).

If the node is not an origin, then we could treat the node as failed, so we don't consider it as a failover candidate for another node, but not actually call the failover functions for it.

This can still leave us with the problem of what to do with the sl_subscribe entries where sl_receiver is this node and sl_provider is another failed node. Currently Slony tries to set sl_provider to the backup node, but if the paths are missing then this fails. We could create 'fake' paths, or remove the sl_subscribe rows whose sl_receiver is a failed node with no backup, since that node is going away. Other ideas are also welcome.
Comment 2 Steve Singer 2013-06-24 12:56:59 UTC
My proposal is that we should do the following:

1. Require users to list all of the failed nodes on the FAILOVER command, even non-origin nodes.

2. If a node isn't listed in the FAILOVER command as a failed node, and it is a RECEIVER of a set from a failed node, then it MUST have a direct PATH to the backup node. We will check this early on and give an error. Users can then do a STORE PATH (see the sketch after this list) or add that node to the list of failed nodes.

3. If a node is a non-origin and is listed as a failed node, we redirect it to the backup node by manipulating sl_subscribe, and we post a FAILOVER event on the backup node so all other nodes can update sl_subscribe as needed. We avoid foreign key violations between sl_subscribe -> sl_path and sl_listen -> sl_path because the direct path exists.
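
As a sketch of the remedy in point 2 (the node choice and conninfo are hypothetical, not drawn from the example topology): if a surviving receiver, say node 6, were subscribed through a failed node and had no direct path to its backup node 2, the user could add one before retrying the FAILOVER:

--
# hypothetical: give receiver node 6 a direct path to backup node 2,
# so the early check in point 2 passes without listing node 6 as failed
store path (server=2, client=6, conninfo='dbname=mydb host=node2', connect retry=10);
--

Alternatively, node 6 could simply be added to the FAILOVER command's list of failed nodes.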
Comment 3 Steve Singer 2013-06-27 13:49:23 UTC
Created attachment 165
proposed fixes for 296

I have pushed a set of fixes for these issues to https://github.com/ssinger/slony1-engine/commits/bug296; the same changes are in the attached patch.

This set of changes follows points 1 and 2 from my previous comment but doesn't exactly follow point 3.

We pass the complete list of failed nodes in the FAILOVER_NODE event, so in the above example we would do:

FAILOVER(node=(id=1, backup node=2), node=(id=3, backup node=5), node=(id=4, backup node=2));

slonik will generate two FAILOVER_NODE events:

FAILOVER_NODE (backup_node=2, failed_node=1, seq_no=XXXXXXX, failed_node_list=[1,3,4])

FAILOVER_NODE (backup_node=5, failed_node=3, seq_no=XXXXXXX, failed_node_list=[1,3,4])

Each node that processes these events will know that the other two nodes have also failed, and will know not to repoint the sl_subscribe line for node 4 to a backup node.
Comment 4 Steve Singer 2013-07-10 10:30:59 UTC
A version of this patch, along with additional failover fixes, has been pushed to master for 2.2.0 beta 5.