Jan Wieck
Mon Sep 19 18:40:41 PDT 2005
On 9/19/2005 10:31 AM, Philip Warner wrote:
> Christopher Browne wrote:
> 
>>The thought is to store record size in a new field on sl_log_1, call
>>it "log_cmdsize", and populate it at insert time.
>>  
>>
> May not need to store it; I would expect that the size of a text field
> could be determined cheaply (Jan would know - he wrote TOAST).
> 
>>We declare LOG cursor as being ...
>>
>>declare LOG cursor for select log_origin, log_xid, log_tableid, 
>>       log_actionseq, log_cmdtype, log_cmdsize
>>from sl_log_1 [with various other criteria] order by log_actionseq;
>>  
>>
> ...etc. Looks good to me, *except*, my reading of remote_worker.c made
> me believe it would loop retrieving 100 rows repeatedly, while another
> thread sends to the replicated db. If I am right, we would still need
> some way of pausing the 'pull' part of the pull->push mechanism. Or I
> misread the code -- having just looked at it again, it may only read 100
> at a time.

You didn't misread the code. It indeed buffers based on a compiled-in 
number of rows only and doesn't take the row size into account at all. 
So yes, the fetching thread needs to stop if the buffer grows too large. 
Since it already blocks when all buffers are filled, that part wouldn't 
be too complicated.
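
Just to illustrate the direction (a minimal sketch only, with made-up 
names - not the actual remote_worker.c structures): instead of counting 
rows, the provider thread would account for bytes and wait on a 
condition variable once a size cap is hit, while the apply thread wakes 
it up again as it drains rows:

#include <pthread.h>
#include <stddef.h>

#define MAX_BUFFERED_BYTES (10 * 1024 * 1024)    /* tuning knob */

static pthread_mutex_t buf_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  buf_drained = PTHREAD_COND_INITIALIZER;
static size_t          buffered_bytes = 0;

/* fetch thread: called before queueing one row of cmdsize bytes */
static void
reserve_row(size_t cmdsize)
{
    pthread_mutex_lock(&buf_lock);
    /* the buffered_bytes > 0 test lets a single oversized row through
     * even when it alone exceeds the cap, so we can't deadlock */
    while (buffered_bytes > 0 &&
           buffered_bytes + cmdsize > MAX_BUFFERED_BYTES)
        pthread_cond_wait(&buf_drained, &buf_lock);
    buffered_bytes += cmdsize;
    pthread_mutex_unlock(&buf_lock);
}

/* apply thread: called after a row has been pushed to the subscriber */
static void
release_row(size_t cmdsize)
{
    pthread_mutex_lock(&buf_lock);
    buffered_bytes -= cmdsize;
    pthread_cond_broadcast(&buf_drained);
    pthread_mutex_unlock(&buf_lock);
}

The log_cmdsize column from the cursor above is what would feed 
reserve_row(), so the throttling decision never has to touch the 
detail data itself.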

What gets complicated is the fact that the buffer never shrinks! All the 
buffer lines stay allocated and eventually get enlarged until slon 
exits. So even if you stop fetching after you hit large rows, over time 
all buffer lines will get adjusted to that huge size. On some operating 
systems (libc implementations, to be precise) free() isn't a solution 
here, as it never returns memory to the OS but keeps the pages around 
for future malloc()s. The best way to tackle that would IMHO be to allow 
only certain buffer lines to be used for huge rows and to block if none 
of them is available.
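
Something like this (again just a sketch, with hypothetical names and 
sizes): only a fixed number of lines may ever grow past the normal 
size, and a huge line gets shrunk back to normal as soon as its row has 
been applied, so the high-water mark stays bounded:

#include <pthread.h>
#include <stdlib.h>

#define LINE_SIZE  8192    /* normal, permanently allocated line */
#define HUGE_SLOTS 4       /* lines allowed to grow past LINE_SIZE */

static pthread_mutex_t slot_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  slot_free = PTHREAD_COND_INITIALIZER;
static int             huge_in_use = 0;

/* fetch thread: a row bigger than LINE_SIZE needs one of the huge slots */
static void
acquire_huge_slot(void)
{
    pthread_mutex_lock(&slot_lock);
    while (huge_in_use >= HUGE_SLOTS)
        pthread_cond_wait(&slot_free, &slot_lock);
    huge_in_use++;
    pthread_mutex_unlock(&slot_lock);
}

/* apply thread: after sending, shrink the line back and release the slot */
static void
release_huge_slot(char **line)
{
    free(*line);                   /* drop the enlarged chunk ...        */
    *line = malloc(LINE_SIZE);     /* ... and restore a normal-size line */
    pthread_mutex_lock(&slot_lock);
    huge_in_use--;
    pthread_cond_signal(&slot_free);
    pthread_mutex_unlock(&slot_lock);
}

Whether the free()/malloc() pair actually gives the pages back is up to 
the libc, as said above, but capping the number of enlarged lines bounds 
the damage either way.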


Jan

> 
> You may even be able to refine this (depending on cost of such things)
> by selecting substring(log_cmddata from 1 for 1024) so that small
> commands require no extra IO.
> 
>>The mechanism for efficiently pulling the detail data from sl_log_1
>>based on a set of keys probably requires generating a 2D array of
>>index values to pass in to a set-returning stored procedure.
>>  
>>
> You may find that reiterating the cursor is simplest, especially if we
> have the length stored: you can select where log_cmdsize > 1024.
> 
>>1.  Set the FETCH value(s) to 1 rather than 100 at compile time if you
>>know you have problems with Fat Rows.
>>  
>>
> See above. I have not tried recompiling with a value of 1, but I thought
> it would just loop. Maybe I misread the code.
> 
>>2.  We have to do something quite a bit cleverer, probably similar to
>>what I outlined, if we don't want to injure users that don't use Fat
>>Rows.
>>  
>>
> If we don't need to change the schema (e.g. if length(log_cmddata) is
> cheap) then would 1.1 be possible?
> 
> 
> Thanks for the continuing help.
> 


-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #

