Tue May 3 09:22:20 PDT 2005
- Previous message: [Slony1-general] Working out who is master after failover
- Next message: [Slony1-general] Working out who is master after failover
>> However, after a failover, there's a problem. Both nodes think
>> they're the master node according to the criteria above, because the
>> old master hasn't received the news of his demotion. Is there a
>> simple way of working out on the "abandoned" node that it has indeed
>> been abandoned?

> Consider some scenarios:
>
> 1. Node #1 is in Ottawa; other nodes are in Toronto. Failover due to
> persistent network failure.
>
> The network falls over, and we decide that the Ottawa data source must
> be abandoned. The database host and its database are undamaged, and
> since we had no way to communicate with node #1 that it got stepped on,
> it thinks it's running fine.
>
> Note that in this case, no data was ever corrupted in any way.
>
> Note also that since the network was dead, we had no way to tell node #1
> that it has been abandoned.
>
> Supposing there are some client machines in Ottawa, they might be able
> to talk to node #1 even after it is abandoned, as they were on the
> subnet there. Could be trouble...

Yup, that's the scenario I'm worried about.

My application has a somewhat special view of what "corrupted" means. We
have a constant stream of write-once-read-many perishable data. If I lose
10 minutes of data, the entire database might as well be corrupted,
because it's incomplete (which is in many ways worse than unavailable).

If I fail over from node 1 to node 2 as soon as I've detected a loss of
connectivity to node 1, node 2 can take over as master with no data loss.
I want my client applications that feed the data in to *realise* that
node 2 is now master and that the data must be fed to it, not to node 1.
I'd prefer that realisation to be stateless -- i.e. they don't have to
remember who is currently master.

The nightmare is that node 1 suddenly comes back online, and the
applications start feeding data to node 1 instead of node 2 because
node 1 then looks like a master node. I then have *two* incomplete
databases!
One workaround would be for the applications feeding the data to check
that there is one and only one master by executing the query

    select "_T1".getlocalnodeid('_T1') =
        (select set_origin from "_T1".sl_set where set_id = 1)

on *all* the hosts every time they want to write data. If more than one
claims to be master, something's wrong. But that's a little inefficient
when the first host they try is usually the master.

Julian
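The "one and only one master" check described above could be sketched roughly as follows. This is only an illustration, not anything shipped with Slony-I: `find_sole_master`, `SplitBrainError` and the `claims_master` probe are hypothetical names, and the probe is assumed to run the query from the message against a single host, return its boolean result, and raise `ConnectionError` when the host cannot be reached. Skipping unreachable hosts is also an assumption; it matches the scenario where node 1 is cut off and node 2 has taken over.

```python
from typing import Callable, Iterable, Optional

# The query from the message, for cluster "_T1" and replication set 1.
# A real probe would execute this on one host and return the boolean.
IS_ORIGIN_SQL = (
    "select \"_T1\".getlocalnodeid('_T1') = "
    "(select set_origin from \"_T1\".sl_set where set_id = 1)"
)


class SplitBrainError(Exception):
    """Raised when more than one reachable host claims to be master."""


def find_sole_master(
    hosts: Iterable[str],
    claims_master: Callable[[str], bool],
) -> Optional[str]:
    """Ask every host whether it believes it is the origin of set 1.

    `claims_master` is a hypothetical probe supplied by the caller.
    Returns the single claimant, or None if no reachable host claims
    to be master. Raises SplitBrainError if several hosts claim it.
    """
    claimants = []
    for host in hosts:
        try:
            if claims_master(host):
                claimants.append(host)
        except ConnectionError:
            # Assumption: an unreachable host is skipped, not trusted.
            continue
    if len(claimants) > 1:
        raise SplitBrainError(
            "multiple hosts claim to be master: " + ", ".join(claimants)
        )
    return claimants[0] if claimants else None
```

A feeder would call this before each write and refuse to write when `SplitBrainError` is raised or `None` is returned. It stays stateless, at the cost of one query per host per write, which is the inefficiency noted above.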