Tue Oct 16 09:56:40 PDT 2012
- Previous message: [Slony1-hackers] Failover never completes
- Next message: [Slony1-hackers] Failover never completes
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 10/16/2012 05:50 AM, Steve Singer wrote:
> On 12-10-15 11:20 PM, Joe Conway wrote:
>> We are using 2.1.0. We tried upgrading to 2.1.2 but got stuck because we
>> cannot have a mixed 2.1.0/2.1.2 cluster. We have constraints that do not
>> allow for upgrade-in-place of existing nodes, which is why we want to
>> add a new node and failover to it (to facilitate upgrades of components
>> other than slony, e.g. postgres itself).
>
> So your
> 1. Adding a new node
> 2. Stopping the old node
> 3. Running UPGRADE FUNCTIONS on the new node
> 4. Starting up the new slon and running 'FAILOVER' ?

No, as I understand it from
http://slony.info/documentation/slonyupgrade.html
we would need to:

1) Stop the slon processes on all nodes. (e.g. - old version of slon)
2) Install the new version of slon software on all nodes.
3) Execute a slonik script containing the command
   update functions (id = [whatever]);
   for each node in the cluster.

We are trying to avoid #1, and in any case cannot easily do #2 (no
upgrade in place).

At the moment we are testing with clusters that are all running 2.1.0.
It is in this configuration where failover is failing. We *attempted* to
run a mixed 2.1.0/2.1.2 cluster so that we could failover to the new
version, but slon refused to start up in a mixed cluster.

We could possibly test a cluster with all 2.1.2, which might be
instructive, especially if it turns out that the problem we are running
into is solved in 2.1.2. However we would still have the challenge of
getting from existing 2.1.0 clusters to 2.1.2 clusters without excessive
downtime.

>> Is bug 260 issue #2 deterministic or a race condition? Our current
>> process works 9 out of 10 times...
>
> My recollection was that #260 usually tended to happen, but there are a
> lot of other rare race conditions I had occasionally hit which led to
> the failover changes in 2.2
>
> Does your sl_listen table have any cycles in it, ie
> a-->b
> b--->a
> (or even cycles through a third node)

I assume you mean provider->receiver? If so, tons of cycles:

A->C  C->A
C->B  B->C
C->D  D->C
A->B  B->A
...and more...

> Which nodes have processed the FAILOVER_SET event? Which (if any)
> nodes have processed the ACCEPT_SET? Which node is the 'most ahead
> node', I think slonik reports this on stdout when it runs. Are the
> remoteWorkerThread_'A' threads running on the other nodes and what are
> they doing?

I am not seeing any events in the slony tables now except SYNC events --
does that mean slon has cleaned out the ones from yesterday when I ran
into this?

> I'm asking these questions to try and get a sense of what the cluster
> state is and where the problem might be.

Node D (slave2) has processed the failover and shows node C (new master)
as the set origin. It also seems to have correct/expected rows in the
other tables (based on comparison with a run that was successful).

Node B (slave1) shows node A (original master) as the set origin.
However sl_subscribe is correct (provider is C, B and D as the
receivers, no extra rows), sl_path looks correct, sl_node looks correct.

Node C (new master) shows node A (original master) as the set origin.
sl_subscribe has two correct rows (provider is C, B and D as the
receivers) and one extra row (provider B, subscriber C, active false).
sl_path looks correct, sl_node looks correct.

Node A (orig master) shows node A (original master) as the set origin.
sl_subscribe has three incorrect rows (provider is A, B and D as the
receivers; and provider B, subscriber C, active true). The sl_path table
has "Event Pending" in the path rows for B->C and D->C.
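To make the per-node disagreement above easier to see at a glance, here is a minimal Python sketch that groups the nodes by the set origin each one reports. The values are transcribed by hand from the description above, not queried from Slony; the names reported_origin, group_by_origin and lagging_nodes are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# Set origin as reported by each node, transcribed by hand from the
# description above (assumption: these are not live query results).
reported_origin = {
    "A": "A",  # original master still sees itself as origin
    "B": "A",  # slave1
    "C": "A",  # new master
    "D": "C",  # slave2, the only node that processed the failover
}


def group_by_origin(reported):
    """Map each reported origin to the sorted list of nodes reporting it."""
    groups = defaultdict(list)
    for node, origin in sorted(reported.items()):
        groups[origin].append(node)
    return dict(groups)


def lagging_nodes(reported, expected_origin):
    """Return the nodes whose reported origin differs from the expected one."""
    return sorted(n for n, o in reported.items() if o != expected_origin)


groups = group_by_origin(reported_origin)
stale = lagging_nodes(reported_origin, expected_origin="C")
print(groups)
print(stale)
```

With the values above, only node D lands in the group for origin C; nodes A, B and C are all reported as not yet having moved the set origin.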

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
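The question about cycles in sl_listen can also be checked mechanically. The following is a rough Python sketch, assuming the provider->receiver pairs have already been copied out of the table by hand (the list below contains only the pairs quoted in the message, not the "...and more" ones). The function names two_node_cycles and has_cycle are hypothetical helpers, not part of Slony.

```python
# provider -> receiver pairs quoted in the message (assumption: copied by
# hand; the real sl_listen table has more rows than these).
edges = [
    ("A", "C"), ("C", "A"),
    ("C", "B"), ("B", "C"),
    ("C", "D"), ("D", "C"),
    ("A", "B"), ("B", "A"),
]


def two_node_cycles(edges):
    """Return sorted node pairs (x, y) where both x->y and y->x exist."""
    edge_set = set(edges)
    return sorted(
        {tuple(sorted((a, b))) for a, b in edge_set
         if a != b and (b, a) in edge_set}
    )


def has_cycle(edges):
    """Detect a directed cycle of any length with a three-color DFS."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])

    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:
                return True  # back edge: nxt is on the current DFS path
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)


pairs = two_node_cycles(edges)
print(pairs)
print(has_cycle(edges))
```

two_node_cycles covers the direct a->b, b->a case from the question, while has_cycle also catches cycles that pass through a third node.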