Tue May 3 09:22:20 PDT 2005
- Previous message: [Slony1-general] Working out who is master after failover
>> However, after a failover, there's a problem. Both nodes think
>> they're the master node according to the criteria above, because the
>> old master hasn't received the news of his demotion. Is there a
>> simple way of working out on the "abandoned" node that it has indeed
>> been abandoned?
> Consider some scenarios:
>
> 1. Node #1 is in Ottawa; other nodes are in Toronto. Failover due to
> persistent network failure.
>
> The network falls over, and we decide that the Ottawa data source must
> be abandoned. The database host and its database are undamaged, and
> since we had no way to communicate with node #1 that it got stepped on,
> it thinks it's running fine.
>
> Note that in this case, no data was ever corrupted in any way.
>
> Note also that since the network was dead, we had no way to tell node #1
> that it has been abandoned.
>
> Supposing there are some client machines in Ottawa, they might be able
> to talk to node #1 even after it is abandoned, as they were on the
> subnet there. Could be trouble...
Yup, that's the scenario I'm worried about.
My application has a somewhat special view of what "corrupted" means. We
have a constant stream of write-once-read-many perishable data. If I lose
10 minutes of data, the entire database might as well be corrupted, because
it's incomplete (which is in many ways worse than unavailable).
If I fail over from node 1 to node 2 as soon as I've detected a failure of
connectivity of node 1, node 2 can take over as master with no data loss. I
want my client applications that feed the data in to *realise* that node 2
is now master and the data can be fed to it, not node 1. I'd prefer that
realisation to be stateless -- i.e. they don't have to remember who is
currently master.
The nightmare is that node 1 suddenly comes back online, and the
applications start feeding data to node 1 instead of node 2 because node 1
then looks like a master node. I then have *two* incomplete databases!
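To make the hazard concrete, here is a minimal Python sketch of the naive stateless lookup a feeding client might use: walk a configured host list and send writes to the first host that claims to be the set origin. The host names and the `claims_master` mapping are hypothetical stand-ins for running a "am I the origin?" query against each node; nothing here is Slony-I API.

```python
from typing import Mapping, Optional, Sequence


def first_claimed_master(hosts: Sequence[str],
                         claims_master: Mapping[str, bool]) -> Optional[str]:
    """Naive stateless lookup: return the first host that claims mastership.

    claims_master is a hypothetical stand-in for asking each node whether
    its local node id equals the origin of the replication set.
    Hosts missing from the mapping are treated as not claiming the role.
    """
    for host in hosts:
        if claims_master.get(host, False):
            return host
    return None


# After the failover, node 1 was never told it was abandoned,
# so both nodes still claim to be master.
hosts = ["node1", "node2"]
claims = {"node1": True, "node2": True}
print(first_claimed_master(hosts, claims))  # node1: writes go to the abandoned node
```

The answer depends only on the order of the host list, so whenever the abandoned node is listed first, the clients silently resume feeding it.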
One workaround would be for the applications feeding the data to check that
there is one and only one master by executing the query
    select "_T1".getlocalnodeid('_T1') =
           (select set_origin from "_T1".sl_set where set_id = 1)
on *all* the hosts every time they want to write data. If more than one
claims to be master, something's wrong. But that's a little inefficient,
given that the first host they try is usually the master.
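A minimal sketch of that workaround, assuming each host's answer to the query above has already been collected into a hypothetical `answers` mapping from host name to `True` (claims master), `False` (not master), or `None` (unreachable). It names a master only when exactly one host claims the role. The function and exception names are illustrative, not part of Slony-I.

```python
from typing import Mapping, Optional


class SplitBrainError(RuntimeError):
    """Raised when zero hosts, or more than one, claim to be the set origin."""


def sole_master(answers: Mapping[str, Optional[bool]]) -> str:
    """Return the single host claiming mastership, or raise SplitBrainError.

    answers maps host -> True (claims master), False (not master),
    or None (unreachable). Unreachable hosts are not counted.
    """
    # Sorted so the error message is deterministic.
    claimants = sorted(host for host, is_master in answers.items() if is_master)
    if len(claimants) != 1:
        raise SplitBrainError(f"expected exactly one master, got {claimants}")
    return claimants[0]


print(sole_master({"node1": False, "node2": True}))  # node2

try:
    sole_master({"node1": True, "node2": True})
except SplitBrainError as err:
    print("refusing to write:", err)
```

Note that this only detects the conflict when a client can reach both nodes. A client on the Ottawa subnet that sees only node 1 records node 2 as `None` and finds exactly one claimant, so it would still write to the abandoned node.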
Julian
More information about the Slony1-general mailing list