elein elein
Thu Sep 29 20:28:39 PDT 2005
I believe I waited a while for the failover to get
over to the third node.  Should I have killed and
restarted the slon processes?  

If I get a chance today, I will try to reconstruct
my test cases.

--elein

On Wed, Sep 28, 2005 at 06:08:34PM -0400, Christopher Browne wrote:
> elein wrote:
> 
> >Chris Browne has been helpful but busy.
> >Can someone else try to look into this problem?
> >
> >It seems to be a basic failure on failover when
> >there are cascaded replicas.  You are left unable
> >to add back in the failed node.
> >  
> >
> I am getting convinced that the problem (a problem?) lies in trying to
> delete the dead node too soon.
> 
> If I work slowly enough, it all "works out;" I can drop the failed node
> and add it back in.
> 
> If I wait (oh, say 10 minutes) between FAILOVER and dropping node 1,
> then that's enough time for references to node 1 to get purged out of
> event logs and sl_setsync and such.
> 
> If I don't wait, the "unrelated" nodes wind up falling over, unhappy
> about the attempt to delete a node that they still are referencing.
> 
> This seems a nicely evil bit of log...
> 
> ERROR  remoteWorkerThread_2: "begin transaction; set transaction
> isolation level serializable; lock table "_failtest".sl_config_lock;
> select "_failtest".dropNode_int(1); notify "_failtest_Restart"; notify
> "_failtest_Event"; notify "_failtest_Confirm"; insert into
> "_failtest".sl_event     (ev_origin, ev_seqno, ev_timestamp,     
> ev_minxid, ev_maxxid, ev_xip, ev_type , ev_data1    ) values ('2',
> '102', '2005-09-28 21:55:53.176607', '462075', '462076', '',
> 'DROP_NODE', '1'); insert into "_failtest".sl_confirm     (con_origin,
> con_received, con_seqno, con_timestamp)    values (2, 3, '102',
> CURRENT_TIMESTAMP); commit transaction;" PGRES_FATAL_ERROR ERROR: 
> update or delete on "sl_node" violates foreign key constraint
> "ssy_origin-no_id-ref" on "sl_setsync"
> DETAIL:  Key (no_id)=(1) is still referenced from table "sl_setsync".
> 
> This bit of log file gets generated on node #3 (the extra subscriber)
> when I submit the DROP NODE request.
> 
> It's not suggesting an analytical fix to me just yet...  I'm not sure I
> want to automagically delete the sl_setsync entry just yet at the point
> that this happens...
> 
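
A minimal sketch of the "wait before DROP NODE" idea described above, in
Python. It polls until sl_setsync no longer references the failed node,
which is the reference the foreign key error complains about. The table and
column names (sl_setsync, ssy_origin) and the "_failtest" schema come from
the log; the function names, the run_query callable, and the timeout values
are hypothetical. Whether an empty sl_setsync result is sufficient on its
own is an assumption, since references in the event logs are mentioned as
well, and the check would presumably need to run against each surviving
node.

```python
import time
from typing import Callable


def references_remaining(run_query: Callable[[str], int],
                         cluster: str, node_id: int) -> int:
    """Count sl_setsync rows that still point at the failed node.

    run_query is a hypothetical callable that executes one SQL statement
    against a surviving node and returns the single integer result.
    """
    schema = f'"_{cluster}"'
    sql = (f"select count(*) from {schema}.sl_setsync "
           f"where ssy_origin = {int(node_id)}")
    return run_query(sql)


def wait_until_droppable(run_query: Callable[[str], int],
                         cluster: str, node_id: int,
                         timeout_s: float = 900.0, poll_s: float = 10.0,
                         sleep: Callable[[float], None] = time.sleep,
                         clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until no sl_setsync row references node_id, or give up.

    Returns True once the count reaches zero, False if timeout_s elapses
    first. The defaults are arbitrary assumptions, not Slony-I settings.
    """
    deadline = clock() + timeout_s
    while True:
        if references_remaining(run_query, cluster, node_id) == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

If this returns True for every remaining node, the DROP NODE request could
then be submitted; if it returns False, the node is still referenced and the
drop would likely fail in the way shown in the log.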

