Matthew Horoschun matthew
Mon Oct 31 00:15:21 PST 2005
Hi All,

We're now seeing the following message:

	ERROR remoteListenThread_1: timeout for event selection

At the same time as:

	sched_mainloop: select(): Bad file descriptor

appears on standard error.
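For reference, select() fails with EBADF ("Bad file descriptor") when one of the descriptor sets passed to it contains a descriptor that is no longer open. Below is a minimal Python sketch of that failure mode. The helper name `select_after_close()` is hypothetical, and a pipe stands in for a database connection. It is not slon's actual `sched_mainloop` code, and it is only an assumption that a connection closed while still registered with the scheduler is what is happening inside slon here.

```python
import errno
import os
import select


def select_after_close() -> int:
    """Return the errno select() reports when its read set holds a closed fd.

    Hypothetical helper: a pipe stands in for a database connection.
    """
    r, w = os.pipe()   # two valid, open descriptors
    os.close(w)
    os.close(r)        # r is now stale, like a connection closed elsewhere
    try:
        # Same kind of call a scheduler main loop would make.
        select.select([r], [], [], 0)
    except OSError as exc:
        return exc.errno
    return 0           # select() unexpectedly succeeded


if __name__ == "__main__":
    code = select_after_close()
    print(errno.errorcode.get(code, code), os.strerror(code))
```

If slon's select() error and the listener timeout do share a cause, that cause would be a descriptor being closed while still registered with the scheduler. This sketch only shows that the error message itself means exactly that.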

This appears to be very similar to a problem reported on the list last year:

http://gborg.postgresql.org/pipermail/slony1-general/2004-October/000747.html

In a reply, Jan suggested that this locking problem would be fixed in 1.0.3.

We're running 1.1.0. Can anybody comment on whether this has really
been fixed, or should we be looking at the suggested patch?

Matthew.

> Oct 25 17:11:55 radius2 slon_radius2[33490]: [28-1] 2005-10-25 17:11:55 
> EST [33490] ERROR  remoteListenThread_2: timeout for event selection
> Oct 25 17:17:49 radius2 slon_radius2[33490]: [29-1] 2005-10-25 17:17:49 
> EST [33490] ERROR  remoteListenThread_2: timeout for event selection
> Oct 25 17:23:34 radius2 slon_radius2[33490]: [30-1] 2005-10-25 17:23:34 
> EST [33490] ERROR  remoteListenThread_2: timeout for event selection
> 
> Obviously, at this stage, replication fails.
> 
> So far, our investigation has found that:
> 
> * It appears to fail only on clusters with more than one slave.
> * It isn't periodic as far as we can tell (it can take a week or two to 
> fail).
> * We haven't had it fail under load (it's currently failing when 
> completely idle -- no changes being submitted to the master).
> 
> All nodes are running slony1-1.1.0 on FreeBSD 4.11.
> 
> Any suggestions on what might be causing this, or where I should look 
> for more useful debugging information?
> 


-- 
Matthew Horoschun
Internet Development
Telstra Internet Direct
Ph: +61 2 6208 1929
Fax: + 61 2 6248 6165

matthew at telstra.net

