Christopher Browne cbbrowne
Wed Oct 5 21:08:31 PDT 2005
tgoodair at ca.afilias.info (Tim Goodaire) writes:
> On Wed, Oct 05, 2005 at 10:50:44AM -0400, Andrew Sullivan wrote:
>> On Tue, Oct 04, 2005 at 04:27:47PM -0700, elein wrote:
>> > Yes, it should. But it doesn't.  I don't believe any message is ever
>> > sent to the 3rd node. This is the same in my example.  See also the sl_setsync
>> > table.  It has a reference to node 1 (or 10).
>> 
>> Do we have a reproducible case yet?  It sure looks like a bug, but
>> without being able to isolate it, I dunno how we're going to fix it.
>
> Chris has been able to reproduce this:
> http://gborg.postgresql.org/pipermail/slony1-general/2005-September/003051.html

I reproduced it once, out of half a dozen failover runs, but don't yet
have anything I can call a "why."

That is why 1.1.1 didn't have any failover-related changes; not gonna
change things until there is a good answer "why."

- If we can track down why it's happening, that should lead to a remedy;
- If we can find some clearly-definable heuristic that helps, which can be
  clearly shown to be a "right thing" to do, I'd be fine with adding that.

My best guess is that some of the updates submitted as part of the
slonik FAILOVER are failing, and that this leaves some inconsistency
behind.  Just what that inconsistency is hasn't emerged yet.

Yes, of course, the contents of some of the tables wind up looking
broken _afterwards_; that's not the problem I'm thinking of; the
*real* problem is that something breaks as FAILOVER tries to take
place.
-- 
let name="cbbrowne" and tld="ca.afilias.info" in name ^ "@" ^ tld;;
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)

