Philip Warner pjw
Mon Sep 19 08:23:41 PDT 2005
Thanks for the reply.

Christopher Browne wrote:

> Hey, if we can change the behaviour of slon so that it consumes Very
> Little Memory, or, rather, to prevent it from Behaving Badly, that
> hardly seems a bad thing.

That's what I'd like to do. If a slon process is killed, or a link goes
down for too long, I end up with a DB that cannot be made 'current'.
Sometimes I just get a big update, and slon dies (and will not restart).

> (e.g. - when sl_log_1 entries are
>being applied, they are done in sets of 10) involves some records that
>are Very Large.
Indeed; we can have a single record of 37MB (max).

>--> If size of current data being processed > 10MB then don't
>    bother moving on from record 7 to record 8; push the 
>    data out NOW and clear the data structure...
This is the kind of approach I was thinking of; just stop fetching, and
wake up the code that does the replication.

>The only downside that I see is the cost of searching through the
>strings to see how big they are.
Shouldn't be a problem; AFAICT the rows are fetched and stored in a
queue which does a realloc to store the data, so it must already know
the size.

>That's with the "FETCH 100 FROM LOG" queries.  That's going to draw in
>100 rows from sl_log_1 each time, and if some are Pathological Big
>Records, _there_ lies the problem.
I think that's part of the problem; we could fetch 10 at a time, but the
way I read the code, it will just loop and fetch the next 10. So either
way we'll consume lots of memory.

My proposed solution is to:

(a) use the 'max buffer space' as a guideline only
(b) use fetch 1 or fetch 10 (they both seem fast)
(c) when the current *used* buffer space (ie bytes in the
'to-be-processed' queue) exceeds the buffer limit (checked after the
fetch-10 completes), don't fetch the next 10; wait for the used buffer
space to drop below an 'empty/restart' limit.
(d) dealloc/free memory from the queues once processed (if not already done)

This means that the buffer limit could be exceeded, but by at most one
batch of 10 fetches. If every one of those 10 log rows is 37MB, then
I'll blow my process address space again, but that is *very* unlikely.

Running a query to get the string sizes (as you may have suggested
above, now I think about it) could/would prevent problems: "select
sum(length(log_cmddata)) from...". We could then drop down to 'fetch 1'
if the total size would blow the current buffer limit.


>This is NOT easily changeable at runtime, but if you're running into
>these problems, then I'd suggest decreasing these values in slon.h.
Why is that? There are a lot of places that reference the values, but is
there a real problem with making them variable?

>Ideally, it would be nice for slon to figure out suitable values by
>itself.  How to do that when a bad choice would lead to slon running
>out of memory and falling over seems, erm, troublesome :-(.
Using the "select sum(length(log_cmddata)) from..." approach, then using
either fetch-100 or fetch-1 based on the result, seems like a good first
pass. It also has the advantage of no change from the user's perspective
unless they have huge queries -- and then the change improves reliability.


To summarize, the basic approach would be:

 - prior to doing an open cursor, run "select sum(length(log_cmddata))
from...limit 100".
 - use this to determine fetch size (1 or 100).
 - open cursor/fetch etc. At the end of the fetch loop, see how much
buffered data we have in the queues. If there is too much, pause and
store it in the replicated DB.
 - repeat.

Does this sound broadly OK? Not sure how best to do the pause/restart.
Just sleep/wake? Use thread events? etc etc

Later, this could be refined by selecting lengths of next 100 and tuning
how many we fetch based on this knowledge. But as a first pass it seems
to satisfy the requirements.





More information about the Slony1-general mailing list