Jacques Caron jc at oxado.com
Wed Apr 23 09:42:43 PDT 2008
Hi all,

There is a bit of an issue in the way slon gets events from the
master when there is a (very) large backlog: the remote listener
thread tries to read all available events into memory, which means:
- the process can grow quite a lot, eat useful cache memory from the
DB (if slon is running on the same machine as the DB, which is the
usual case), eventually start swapping, which worsens the backlog
- and/or the process can actually exceed memory limits, exit and
restart, only to fetch the same events all over again
- the initial load of all available events may itself never complete
(if it exceeds memory limits), and thus no replication happens,
since slon won't start working until this initial load is complete

A first and easy fix for the last problem is to add a simple "LIMIT
x" to the query in remoteListen_receive_events. This at least allows
slon to start handling events while more are being loaded. In
situations where events can still be read a lot faster than they are
handled (which is usually the case), forcing a sleep in the loop
helps too, but I'm not sure how this could be made to work in the
general case.
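To illustrate what I have in mind, here is a standalone sketch using
libpq -- not a patch against the actual remoteListen_receive_events;
the table/column names, the batch size and the sleep times are just
placeholders:

#include <libpq-fe.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BATCH_SIZE 1000         /* the "LIMIT x" */

/* Fetch at most BATCH_SIZE events past *last_seqno, advance
 * *last_seqno, and return the number of rows fetched (-1 on error). */
static int
fetch_event_batch(PGconn *conn, long *last_seqno)
{
    char      query[256];
    PGresult *res;
    int       n;

    snprintf(query, sizeof(query),
             "SELECT ev_seqno, ev_type FROM sl_event "
             "WHERE ev_seqno > %ld ORDER BY ev_seqno LIMIT %d",
             *last_seqno, BATCH_SIZE);

    res = PQexec(conn, query);
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "fetch failed: %s", PQerrorMessage(conn));
        PQclear(res);
        return -1;
    }
    n = PQntuples(res);
    if (n > 0)
        *last_seqno = atol(PQgetvalue(res, n - 1, 0));
    /* ... queue the n rows for the worker thread here ... */
    PQclear(res);
    return n;
}

static void
listen_loop(PGconn *conn)
{
    long last_seqno = 0;

    for (;;)
    {
        int n = fetch_event_batch(conn, &last_seqno);

        if (n == BATCH_SIZE)
            sleep(1);     /* backlog: let the worker drain a bit */
        else
            sleep(10);    /* caught up (or error): poll again later */
    }
}

int
main(void)
{
    PGconn *conn = PQconnectdb("");  /* connect info from environment */

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connect failed: %s", PQerrorMessage(conn));
        return 1;
    }
    listen_loop(conn);
    return 0;
}

The point is simply that each fetch is bounded and that a full batch
triggers a short pause before the next fetch.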

A further and better fix would be to also keep a count of
"outstanding" events (incremented when new events are loaded,
decremented once they have been handled), and to have the listener
thread sleep a bit whenever that count exceeds a given threshold.
There is no need to hold tens of millions of events in memory (with
the possible complications given above) if we handle at most a few
thousand at a time...
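Something along these lines is what I have in mind for the counter
(pure illustration, the names and the threshold are made up, this is
not slon code):

#include <pthread.h>

#define MAX_OUTSTANDING 5000

static long            outstanding = 0;
static pthread_mutex_t outstanding_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  outstanding_drained = PTHREAD_COND_INITIALIZER;

/* Called by the remote listener after queueing n freshly fetched
 * events; blocks while too many events are still waiting. */
void
events_queued(int n)
{
    pthread_mutex_lock(&outstanding_lock);
    outstanding += n;
    while (outstanding > MAX_OUTSTANDING)
        pthread_cond_wait(&outstanding_drained, &outstanding_lock);
    pthread_mutex_unlock(&outstanding_lock);
}

/* Called by the remote worker once it has handled n events. */
void
events_handled(int n)
{
    pthread_mutex_lock(&outstanding_lock);
    outstanding -= n;
    if (outstanding <= MAX_OUTSTANDING)
        pthread_cond_signal(&outstanding_drained);
    pthread_mutex_unlock(&outstanding_lock);
}

Waiting on a condition variable rather than sleeping for a fixed time
means the listener resumes as soon as the worker has drained below
the threshold, but a plain timed sleep would do the job as well.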

I also found out that setting desired_sync_time to 0 and
significantly increasing sync_group_maxsize helps a lot when
catching up. Is there a specific reason to have a low default value
for sync_group_maxsize? Since the group size is bounded by the
number of available events anyway, I'm not sure how low values
actually help anything -- at least when desired_sync_time=0.
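For reference, this is roughly what I ended up with in slon.conf
while catching up (the group size is just what happened to work for
me, not a recommendation):

# catch-up experiment -- values purely illustrative
desired_sync_time = 0
sync_group_maxsize = 100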

Finally, in some situations fetching from the log is slow (this can
happen when trying to fetch log rows that were written during a long
transaction, as the bounds for the index search are quite large),
and I am not sure that the logic behind desired_sync_time and such
works very well there: the time a group takes is not proportional to
the number of events (the time per event actually decreases as the
number of events handled at once increases, since most of the time
is spent (wasted?) in the initial fetch).

Obviously, if there were a way to build an index that better matches
these fetches it would help, but I'm not quite sure that is possible
(I haven't quite figured out the whole minxid/maxxid/xip etc. thing
yet).

Comments?

Jacques.


