cbbrowne at ca.afilias.info cbbrowne
Tue Mar 28 06:52:30 PST 2006
> Thanks for all the great help, Chris. One last one for now: how about the
> timeout issue? Do you consider this a problem? They seem to be coming at a
> steady pace in the logs. Here's a small log grab from this morning:
>
> 2006-03-28 08:54:29 AST ERROR  remoteListenThread_1: timeout for event selection
> 2006-03-28 08:54:43 AST DEBUG1 remoteListenThread_1: connected to 'dbname=order_lookup host=192.168.20.5 user=postgres'
> 2006-03-28 08:54:58 AST DEBUG2 syncThread: new sl_action_seq 1 - SYNC 1534261
> 2006-03-28 08:54:58 AST DEBUG2 localListenThread: Received event 2,1534261 SYNC
> 2006-03-28 08:55:59 AST DEBUG2 syncThread: new sl_action_seq 1 - SYNC 1534262
> 2006-03-28 08:55:59 AST DEBUG2 localListenThread: Received event 2,1534262 SYNC
> 2006-03-28 08:56:52 AST DEBUG1 cleanupThread:    2.960 seconds for cleanupEvent()
> 2006-03-28 08:56:57 AST DEBUG1 cleanupThread:    5.211 seconds for delete logs
> 2006-03-28 08:56:58 AST DEBUG2 syncThread: new sl_action_seq 1 - SYNC 1534263
> 2006-03-28 08:56:59 AST DEBUG2 localListenThread: Received event 2,1534263 SYNC
> 2006-03-28 08:57:59 AST DEBUG2 syncThread: new sl_action_seq 1 - SYNC 1534264
> 2006-03-28 08:57:59 AST DEBUG2 localListenThread: Received event 2,1534264 SYNC
> 2006-03-28 08:58:59 AST DEBUG2 syncThread: new sl_action_seq 1 - SYNC 1534265
> 2006-03-28 08:58:59 AST DEBUG2 localListenThread: Received event 2,1534265 SYNC
> 2006-03-28 08:59:43 AST ERROR  remoteListenThread_1: timeout for event selection

Yes, I do; this looks to me as though it may be the "key" problem :-(.

That error message indicates that selecting the pending events is taking
so long that Slony-I "gives up", declares a timeout, and then tries
again.
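
To make the "give up and try again" behaviour concrete, here is a minimal
sketch in Python of a wait that is bounded by a deadline. It is not the C
code from slon; the names wait_with_timeout and listen_once, the polling
interval, and the returned strings are all made up for illustration.

```python
import time


def wait_with_timeout(is_ready, timeout_s, poll_s=0.01):
    """Poll is_ready() until it returns True or timeout_s seconds elapse.

    Hypothetical helper: it only illustrates a deadline-bounded wait.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(poll_s)
    return False


def listen_once(is_ready, timeout_s):
    """One pass of a listener: wait for the result or report a timeout."""
    if not wait_with_timeout(is_ready, timeout_s):
        # The caller would log this, reconnect, and start over.
        return "timeout for event selection"
    return "events received"
```

If the work behind is_ready() always needs longer than timeout_s, every
pass ends in the timeout branch, which would match the steady pace of
errors in the log.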

Unfortunately, with either 200K or 1M outstanding events (not sure which),
the selection is taking more than the 5 minutes (300 seconds) that is the
timeout interval in the code.
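
The quoted log is consistent with that interval. As a quick check, this
small Python sketch takes three lines from the log above (with the wrapped
lines rejoined) and computes the gaps between them; the helper names stamp
and seconds_between are only for illustration.

```python
from datetime import datetime

# Three entries copied from the quoted log, wrapped lines rejoined.
LOG = [
    "2006-03-28 08:54:29 AST ERROR  remoteListenThread_1: timeout for event selection",
    "2006-03-28 08:54:43 AST DEBUG1 remoteListenThread_1: connected to "
    "'dbname=order_lookup host=192.168.20.5 user=postgres'",
    "2006-03-28 08:59:43 AST ERROR  remoteListenThread_1: timeout for event selection",
]


def stamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' of a log line (zone ignored)."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def seconds_between(earlier, later):
    """Whole seconds elapsed between two log lines."""
    return int((stamp(later) - stamp(earlier)).total_seconds())


connect_to_timeout = seconds_between(LOG[1], LOG[2])
error_to_error = seconds_between(LOG[0], LOG[2])
print(connect_to_timeout, error_to_error)  # 300 314
```

The second timeout arrives exactly 300 seconds after the reconnect, so the
listener appears to be hitting the limit on every attempt.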

I think changing this would require editing the timeout interval in the
source to something much higher, and then recompiling slon.

It's in remote_listen.c; look for the "timeout" error message to find
where...

Practically, this will be more trouble to "fix" than it's worth; you'd be
better off, I think, reinitializing Slony-I.  A couple of "UNINSTALL NODE"
requests can clear things out...

It is "academically" interesting to see what happens if the timeout gets
changed so that the query can complete and you can move on to whatever
issue comes up next :-(.



