Tue Jul 23 12:22:39 PDT 2013
- Previous message: [Slony1-general] Slony Watchdog failed starting up the child process
- Next message: [Slony1-general] Slony Watchdog failed starting up the child process
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 07/23/2013 03:08 PM, Christopher Browne wrote:
> My intuition from seeing it say "FATAL" is that that's indicating "death
> of process," and that there's not much coming back from it.
>
> This behaviour is pretty consistent with what happens with a Postgres
> postmaster; if the attempt to start up fails due to seeming already to
> have a postmaster, it doesn't retry, pg_ctl immediately gives up.
>

This came up a few years ago with bug #132.

http://git.postgresql.org/gitweb/?p=slony1-engine.git;a=commit;h=acd46819bad1613764708b138ebcfa895467ac51

Changed slon to behave as Rose expected, retrying to get the node lock
every few seconds.

A few weeks later we modified this to retry getting the node lock only
in response to a slon-requested restart, and not to retry if the initial
start fails.

http://git.postgresql.org/gitweb/?p=slony1-engine.git;a=commit;h=7d3e6659542ad337feb2fbe39f05b780c37afe97

I don't really remember the discussion around this change or exactly why
we didn't like my original patch. Possibly it was for reasons like the
one you argue above: if slon keeps looping it never really 'starts', and
that is hard to detect.

> By the way, is this possibly because of a zombied old connection that
> got disconnected due to firewall glitch or such? If so, you should
> probably see about lowering the TCP keepalive parameters both in the
> slon.conf file and in postgresql.conf
>
> (On postgresql.conf, see tcp_keepalives_(idle|interval|count), and on
> slon.conf, see tcp_keepalive, tcp_keepalive_(idle|interval|count).)
>

No matter how low you make the postgresql.conf settings, it is always
possible for the replacement slon to start before PostgreSQL detects the
timeout. I don't know how low you can make the TCP timeout settings
before they have other side effects.

One option is to push the issue to whatever is starting the slon and let
it retry (which is what we do now). Another option is to let slon loop x
times trying to get the node lock before giving up, but we didn't seem
to like that 3 years ago.
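To illustrate the first option (letting whatever starts the slon do the retrying), here is a rough Python sketch of a launcher that retries a bounded number of times. It is not how slon or its watchdog is implemented. The function name `start_with_retry` and the parameters `max_attempts`, `delay`, and `min_uptime` are all made up for this example, and "the child exited within `min_uptime` seconds" is only an assumed stand-in for "the initial start failed".

```python
import subprocess
import sys
import time


def start_with_retry(cmd, max_attempts=5, delay=2.0, min_uptime=1.0):
    """Start cmd, retrying if it exits within min_uptime seconds.

    Hypothetical helper: returns the running subprocess.Popen once the
    child has survived min_uptime seconds, or None after max_attempts
    quick exits. Any quick exit (even with status 0) counts as a failed
    start, since a daemon is expected to keep running.
    """
    for attempt in range(1, max_attempts + 1):
        proc = subprocess.Popen(cmd)
        try:
            # wait() returns only if the child exits before the timeout.
            proc.wait(timeout=min_uptime)
        except subprocess.TimeoutExpired:
            # Still alive after min_uptime: treat it as started.
            return proc
        print(f"attempt {attempt}: child exited with code {proc.returncode}",
              file=sys.stderr)
        if attempt < max_attempts:
            # Give the server time to notice the stale connection.
            time.sleep(delay)
    return None
```

A caller would pass its own slon command line, for example `start_with_retry(["slon", "-f", "/etc/slon.conf"])` with a placeholder config path, and treat a `None` result as a hard failure. Keeping the retry count bounded keeps a start failure detectable, which was the concern with having slon loop indefinitely.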