Summary: | More sophisticated FAILOVER | ||
---|---|---|---|
Product: | Slony-I | Reporter: | Christopher Browne <cbbrowne> |
Component: | slonik | Assignee: | Jan Wieck <jan> |
Status: | RESOLVED FIXED | ||
Severity: | enhancement | CC: | slony1-bugs |
Priority: | high | ||
Version: | devel | ||
Hardware: | PC | ||
OS: | Linux | ||
Bug Depends on: | 179 | ||
Bug Blocks: | 261 | ||
Attachments: |
patch for proposed multi-node failover
patch for proposed multi-node failover v2
proposed multi-node failover v3 |
Description
Christopher Browne
2010-12-07 12:12:06 UTC
Note that this requires the WAIT FOR EVENT changes of Bug #179.

Since no patch was forthcoming for the original description, I propose an alternative.

I propose only to address the problem of a cluster like this:

    Node 1-------------->Node 2
    (set 1)              (set 2)
       |                    |
       |                    |
       V                    V
    Node 3--------------> Node 4

if node 1 and node 2 are both lost at the same time.

Assuming:
- There are no outstanding configuration/subscription events at the time of failure.
- No additional failures happen while the FAILOVER command is executing.

This patch proposes

    FAILOVER ( NODE=(ID=1,BACKUP NODE=2), NODE=(ID=3, BACKUP NODE=4));

It also includes a multi-node DROP NODE

    DROP NODE( id='1,2', event node=3);

https://github.com/ssinger/slony1-engine/tree/multi_node_failover_steve

Please review and comment on the syntax.

Created attachment 135 [details]
patch for proposed multi-node failover
Created attachment 136 [details]
patch for proposed multi-node failover v2
Created attachment 137 [details]
proposed multi-node failover v3
The failover procedure (at a high level) is as follows:

1. Get a list of failover candidates for each failed node.
2. Validate that we have conninfo to all of them.
3. Blank communications paths to the failed nodes.
4. Wait for slons to restart (implies need to tell slons to restart).
5. For each failed node, get the highest xid for each candidate.
6. Execute FAILOVER on the highest candidate.
7. MOVE SET to the backup node.

This work was completed in the 2.2 development cycle and was primarily committed as part of
http://git.postgresql.org/gitweb/?p=slony1-engine.git;a=commit;h=5e625828d1aefdeabd4ac1e138f54f8aae686f2
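For reviewers who want to reason about the candidate selection in the procedure above, here is a rough sketch in Python of steps 1, 2, 5 and 6 only. It is not the slonik implementation (which is C). The names `Candidate`, `pick_backup` and `plan_failover` are hypothetical, as is the tie-break on lowest node id. Steps 3, 4 and 7 (blanking paths, restarting slons, MOVE SET) act on the live cluster and are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Candidate:
    """A surviving node that could take over for a failed node (hypothetical model)."""
    node_id: int
    has_conninfo: bool   # step 2: do we have admin conninfo for this node?
    highest_xid: int     # step 5: how far this node got in replicating the failed node


def pick_backup(failed_id: int, candidates: List[Candidate]) -> Candidate:
    """Validate the candidates for one failed node and return the most advanced one."""
    if not candidates:
        raise ValueError(f"node {failed_id}: no surviving failover candidate")
    missing = [c.node_id for c in candidates if not c.has_conninfo]
    if missing:
        raise ValueError(f"node {failed_id}: no conninfo for candidates {missing}")
    # Step 6: the most advanced candidate wins.
    # Assumption: ties are broken in favour of the lowest node id.
    return max(candidates, key=lambda c: (c.highest_xid, -c.node_id))


def plan_failover(failed: Dict[int, List[Candidate]]) -> Dict[int, int]:
    """Map each failed node id to the node id chosen to replace it.

    `failed` maps a failed node id to its failover candidates (step 1).
    Candidates that are themselves in the failed set are ignored, since
    several nodes may be lost at the same time.
    """
    plan = {}
    for failed_id, candidates in failed.items():
        surviving = [c for c in candidates if c.node_id not in failed]
        plan[failed_id] = pick_backup(failed_id, surviving).node_id
    return plan


if __name__ == "__main__":
    # The four-node cluster from the description: nodes 1 and 2 are lost.
    lost = {
        1: [Candidate(2, True, 900), Candidate(3, True, 880)],
        2: [Candidate(4, True, 870)],
    }
    print(plan_failover(lost))
```

With the example input, node 2 is discarded as a candidate for node 1 because it has also failed, so the plan comes out as node 3 for node 1 and node 4 for node 2.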