Tue Jun 27 08:59:56 PDT 2017
- Previous message: [Slony1-general] Wrongly configured trigger when upgrading slony from 2.0.7 to 2.2.5
- Next message: [Slony1-general] failover failure and mysterious missing paths
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hello Slony-I community,

Hoping someone can advise on a strange and serious problem. We performed a slony service failover yesterday. For the first time ever, our slony service FAILOVER op errored out. We recently expanded our cluster to 7 consumers from a single provider. There are no load issues during normal operations. As the error output below shows, though, our node 4 and node 5 consumers never got the events they needed.

Here’s where it gets weird: closer inspection has shown that node 2->4 and node 2->5 path data went missing out of the service at some point. It seems clear that’s the main issue, but in spite of that, both node 4 and node 5 continued to find and process node 2 SYNC events for a full week! The logs show this happened in spite of multiple restarts.

How can this happen? If missing path data stymies the failover, wouldn’t it also prevent normal SYNC processing?

In the case where a failover is begun with inadequate path data, what’s the best resolution? Can path data be quickly applied to allow failover to succeed?

Thanks in advance for any insights.

---- failover error ----

/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: NOTICE: calling restart node 1
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:55: 2017-06-26 18:33:02
executing preFailover(1,1) on 2
executing preFailover(1,1) on 3
executing preFailover(1,1) on 4
executing preFailover(1,1) on 5
executing preFailover(1,1) on 6
executing preFailover(1,1) on 7
executing preFailover(1,1) on 8
NOTICE: executing "_ams_cluster".failedNode2 on node 2
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 8 only on event 5000061654, node 4 only on event 5000061654, node 5 only on event 5000061655, node 3 only on event 5000061662, node 6 only on event 5000061654, node 7 only on event 5000061656
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061657, node 5 only on event 5000061663, node 3 only on event 5000061663, node 6 only on event 5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663, node 6 only on event 5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663

---- node 4 log archive ----

bos-mpt5c:odin-9353 ttignor$ egrep 'disableNode: no_id=2|storePath: pa_server=2 pa_client=4|restart notification' prod4/node4-pathconfig.out
2017-06-15 15:14:00 UTC [5688] INFO localListenThread: got restart notification
2017-06-15 15:14:10 UTC [8431] CONFIG storePath: pa_server=2 pa_client=4 pa_conninfo="dbname=ams
2017-06-15 15:53:00 UTC [8431] INFO localListenThread: got restart notification
2017-06-15 15:53:10 UTC [23701] CONFIG storePath: pa_server=2 pa_client=4 pa_conninfo="dbname=ams
2017-06-16 17:29:13 UTC [10253] CONFIG storePath: pa_server=2 pa_client=4 pa_conninfo="dbname=ams
2017-06-16 20:43:42 UTC [2707] CONFIG storePath: pa_server=2 pa_client=4 pa_conninfo="dbname=ams
2017-06-19 15:11:45 UTC [2707] CONFIG disableNode: no_id=2
2017-06-19 15:11:45 UTC [2707] INFO localListenThread: got restart notification
2017-06-20 18:40:15 UTC [31224] INFO localListenThread: got restart notification
2017-06-21 14:31:42 UTC [6253] INFO localListenThread: got restart notification
2017-06-21 14:35:26 UTC [32367] INFO localListenThread: got restart notification
2017-06-26 18:21:25 UTC [9278] INFO localListenThread: got restart notification
2017-06-26 18:33:04 UTC [28839] INFO localListenThread: got restart notification
2017-06-26 18:33:30 UTC [1785] INFO localListenThread: got restart notification
bos-mpt5c:odin-9353 ttignor$

---- node 5 log archive ----

bos-mpt5c:odin-9353 ttignor$ egrep 'disableNode: no_id=2|storePath: pa_server=2 pa_client=5|restart notification' prod5/node5-pathconfig.out
2017-06-15 15:13:56 UTC [20700] INFO localListenThread: got restart notification
2017-06-15 15:14:06 UTC [20374] CONFIG storePath: pa_server=2 pa_client=5 pa_conninfo="dbname=ams
2017-06-15 15:53:01 UTC [20374] INFO localListenThread: got restart notification
2017-06-15 15:53:11 UTC [2859] CONFIG storePath: pa_server=2 pa_client=5 pa_conninfo="dbname=ams
2017-06-16 17:28:19 UTC [2859] INFO localListenThread: got restart notification
2017-06-16 17:28:29 UTC [10753] CONFIG storePath: pa_server=2 pa_client=5 pa_conninfo="dbname=ams
2017-06-19 15:11:40 UTC [10753] CONFIG disableNode: no_id=2
2017-06-19 15:11:40 UTC [10753] INFO localListenThread: got restart notification
2017-06-20 18:40:11 UTC [450] INFO localListenThread: got restart notification
2017-06-21 14:31:41 UTC [22300] INFO localListenThread: got restart notification
2017-06-21 14:35:28 UTC [26777] INFO localListenThread: got restart notification
2017-06-26 18:21:27 UTC [28366] INFO localListenThread: got restart notification
2017-06-26 18:33:04 UTC [29345] INFO localListenThread: got restart notification
2017-06-26 18:33:27 UTC [1299] INFO localListenThread: got restart notification
bos-mpt5c:odin-9353 ttignor$

Tom ☺
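
For anyone comparing a similar trace: the "waiting for event" lines in the failover output can be reduced to a per-node lag figure. What follows is only a minimal sketch. It assumes the message shape seen in that output ("waiting for event (origin,seqno). node N only on event M, ..."), and lagging_nodes is a hypothetical helper name, not part of Slony-I or slonik.

```python
import re

# Assumed shape of a slonik wait message, taken from the failover output:
#   "waiting for event (<origin>,<seqno>). node <id> only on event <seqno>, ..."
WAIT_RE = re.compile(r"waiting for event \((\d+),(\d+)\)")
NODE_RE = re.compile(r"node (\d+) only on event (\d+)")


def lagging_nodes(line):
    """Return (origin, wanted_seqno, {node_id: events_behind}).

    Returns None when the line is not a wait message.
    Hypothetical helper for illustration; not a Slony-I API.
    """
    wait = WAIT_RE.search(line)
    if wait is None:
        return None
    origin, wanted = int(wait.group(1)), int(wait.group(2))
    # Only scan the text after the "(origin,seqno)" part for node entries.
    behind = {
        int(node): wanted - int(seen)
        for node, seen in NODE_RE.findall(line, wait.end())
    }
    return origin, wanted, behind


if __name__ == "__main__":
    sample = (
        "waiting for event (2,5000061664). "
        "node 4 only on event 5000061663, node 5 only on event 5000061663"
    )
    print(lagging_nodes(sample))
```

Run against the final wait line of the failover output, this reports nodes 4 and 5 each one event short of (2,5000061664), which matches the point where the failover stopped making progress.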