Wed Apr 23 11:56:13 PDT 2008
- Previous message: [Slony1-general] Catching up a large backlog: a few observations
- Next message: [Slony1-general] Catching up a large backlog: a few observations
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Jacques Caron <jc at oxado.com> writes:

> Hi all,
>
> There is a bit of an issue in the way slon gets events from the master
> when there is a (very) large backlog: the remote listener thread tries
> to read all the available events into memory, which means:
> - the process can grow quite a lot, eat useful cache memory from the
>   DB (if slon is running on the same instance as the DB, which is the
>   usual case), eventually start swapping, and that worsens the backlog
> - and/or the process can actually exceed memory limits, exit and
>   restart, fetching the events all over again
> - the initial load of all available events may itself never complete
>   (if it exceeds memory limits), and thus no replication happens,
>   since slon won't start working until this initial load is complete
>
> One first and easy fix for the last problem is to add a simple
> "LIMIT x" in remoteListen_receive_events. This will at least allow
> slon to start handling events while more are loaded. In situations
> where events can still be read a lot faster than they are handled
> (which is usually the case), forcing a sleep in the loop helps, but
> I'm not sure how this could be made to work in the general case.

That sounds pretty plausible. I don't see any reason in the code why
limiting the number of events processed should break anything.
I think I'd want to set the limit based on a configuration parameter,
but at first blush, the following seems reasonable:

Index: remote_listen.c
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/src/slon/remote_listen.c,v
retrieving revision 1.40
diff -u -r1.40 remote_listen.c
--- remote_listen.c	6 Feb 2008 20:20:50 -0000	1.40
+++ remote_listen.c	23 Apr 2008 18:29:14 -0000
@@ -697,7 +697,7 @@
 	{
 		slon_appendquery(&query, ")");
 	}
-	slon_appendquery(&query, " order by e.ev_origin, e.ev_seqno");
+	slon_appendquery(&query, " order by e.ev_origin, e.ev_seqno limit 2000");
 
 	rtcfg_unlock();

> A further and better fix would be to also add a count of "outstanding"
> events (that would be incremented when new events are loaded and
> decremented once they have been handled), and to have the listener
> thread sleep a bit when that count exceeds a given threshold. No need
> to have tens of millions of events in memory (with the possible
> complications given above) if we handle at most a few thousand at a
> time...

That's not a bad thought... This would *definitely* point at adding a
config parameter or two... Yes, indeed, we could maintain a variable on
the queue (or perhaps across queues?), so that we have a count of the
number of outstanding messages.
- Every time a message is added to the queue in remote_listen.c, we
  increment the counter
- Every time a message is processed from the queue in remote_worker.c,
  we decrement the counter
- In remote_listen.c, any time the size of the queue is larger than
  "os_event_threshold" (defaulting to the LIMIT used in the query in
  remote_listen.c), we sleep for "os_event_sleep" milliseconds before
  processing another iteration of the "event search loop."

Alternatively, this could get more sophisticated, with some extra
config parms:

* os_event_limit - how many events to pull at a time, and the
  threshold for further throttling
* os_event_initialsleep - if os_events > os_event_limit, then,
  initially, sleep this many ms
* os_event_increment - while os_events continues to be >
  os_event_limit, add this to the sleep time
* os_event_maxsleep - don't let the sleep time exceed this

With defaults...

os_event_limit = 2000
os_event_initialsleep = 2000
os_event_increment = 500     # add 0.5s each time
os_event_maxsleep = 15000

Any time the queue shrinks below os_event_limit, we reset the sleep
time back to os_event_initialsleep.

> I also found out that setting desired_sync_time to 0 and
> significantly increasing sync_group_maxsize helps a lot when catching
> up. Is there a specific reason to have a low default value for this?
> Since it's bounded by the number of available events anyway, I'm not
> sure how low values actually help anything -- at least when
> desired_sync_time=0.

There is a reason, if you're running log shipping: you might want to be
sure that each SYNC is kept separate, so that you can most closely
associate each set of data with its SYNC time.
> Finally, in some situations fetching from the log is slow (that can
> happen when trying to fetch log items that happened during a long
> transaction, as the bounds for the index search are quite large), and
> I am not sure that the logic behind desired_sync_time and such works
> very well there: the time it takes is not proportional to the number
> of events (the time per event actually decreases as the number of
> events handled at once increases, since most of the time is spent
> -- wasted? -- in the initial fetch).
>
> Obviously if there was a way to build an index that better matched
> the fetches it would help, but I'm not quite sure that is possible
> (I haven't quite figured out the whole minxid/maxxid/xip etc. thing
> yet).
>
> Comments?

There are some improvements in 2.0 to the query on the log table,
notably for the special case where you have a really long-running
transaction.

For sure, "desired_sync_time" is only an approximation. It carries the
implicit assumption that the run time for a set of SYNCs is roughly
proportional to the number of SYNCs, which isn't always true.

Any policy applied here is necessarily an approximation, so I don't
think it is too likely that we'd see *huge* improvements from a
substitute policy. If you can describe one that is readily coded, I'll
certainly listen :-).

-- 
let name="cbbrowne" and tld="linuxdatabases.info" in
  String.concat "@" [name;tld];;
http://linuxfinances.info/info/x.html
"When campaigning, be swift as the wind; in leisurely march, majestic
as the forest; in raiding and plundering, like fire; in standing, firm
as the mountains. As unfathomable as the clouds, move like a
thunderbolt." -- Sun Tzu, "The Art of War"