Christopher Browne cbbrowne at afilias.info
Wed May 25 09:03:35 PDT 2011
On Wed, May 25, 2011 at 11:37 AM, Dan Goodliffe
<dan at randomdan.homeip.net> wrote:
> I've never had an actual problem caused by this. I regularly just
> hibernate my laptop (which I replicate live data to) and when it wakes
> up again, it logs a network error, reconnects and carries on. Very
> reliable and safe in my experience.

It's still possible for you to run into "problem b)" here...

If your laptop was offline for a Remarkably Long Time, and tens of
thousands of events were logged in the interim, then the 300 second
timeout is liable to be a problem.

The next time the laptop reconnects, it tries to pull in the list of
newer events, and if it takes 15 minutes for that query to pull the
list of events, you'll encounter this error, and it's liable to occur
repeatedly, because the retry will, again, take considerably more than
300 seconds.

That can be remedied by increasing, at least temporarily, the value
of the slon configuration parameter remote_listen_timeout.
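
As a rough sketch, assuming your slon reads its settings from a config
file passed via "slon -f", the change might look something like the
fragment below. The value 1800 is purely illustrative (pick something
comfortably longer than the event query actually takes), and the
seconds unit is taken from the 300 second figure above.

```
# slon config file fragment (illustrative value, not a recommendation)
# Give the remote listener up to 30 minutes to pull the event backlog,
# instead of the 300 second timeout discussed above.
remote_listen_timeout=1800
```

Once the subscriber has caught up, it should be reasonable to put the
value back to what it was and restart slon.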

But as has been commented, the work Slony does always takes place
inside the context of a transaction, so that in the vast majority of
cases, it's reasonable to expect that "kill slon, kill all the DB
connections, restart slon" will drop any half-baked updates, and
*cleanly* retry things in the scope of nice, fresh, new transactions.

In the early days, the solution to a whole lot of problems was "kill
off the slons and let them restart," and that wouldn't ever corrupt
data.

In any case, I think the existing documentation is pretty valid.  It
indicates the most likely causes, points to reasonable diagnoses, and
to the configuration fiddling that *might* be needful if the cause was
the more serious one.

