Chris Browne cbbrowne
Thu Dec 1 16:29:53 PST 2005
We were discussing on IRC a notion for "saving memory" (recall the
various posts where people have seen slon processes bloat to ginormous
size due to enormous tuples); I thought I'd sum up some of the
thoughts (the really good ones being Jan's :-)) as well as add a bit
of "aftermath thoughts."

The trouble is that, at present, things work by setting up a cursor
like so (presented in pseudocode just to give the idea...):

declare LOG cursor for select * from sl_log_1 where (plenty of
parameters!!!);

loop until done
  fetch 100 from LOG;
  for each entry
    Apply to subscriber
  done
done
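
In libpq terms (slon being written in C), that amounts to something
like the following.  This is just an illustrative sketch, not the
actual slon code: the real query, the transaction handling, and the
"apply" step are all elided or assumed.

#include <libpq-fe.h>

/* Sketch of the current scheme: fetch 100 full rows at a time; libpq
 * materializes every row, however large.  Assumes the caller already
 * opened a transaction, since DECLARE CURSOR requires one. */
static int sync_naive(PGconn *conn)
{
    PGresult *res;

    res = PQexec(conn,
                 "declare LOG cursor for select * from sl_log_1 "
                 "where true  /* plenty of parameters!!! */");
    if (PQresultStatus(res) != PGRES_COMMAND_OK) { PQclear(res); return -1; }
    PQclear(res);

    for (;;)
    {
        int i, ntuples, cmd_col;

        res = PQexec(conn, "fetch 100 from LOG");
        if (PQresultStatus(res) != PGRES_TUPLES_OK) { PQclear(res); return -1; }

        ntuples = PQntuples(res);
        if (ntuples == 0) { PQclear(res); break; }   /* done */

        cmd_col = PQfnumber(res, "log_cmddata");
        for (i = 0; i < ntuples; i++)
        {
            /* "Apply to subscriber" would go here; with 100 rows of 5MB
             * apiece, this one PGresult is already ~500MB. */
            const char *cmddata = PQgetvalue(res, i, cmd_col);
            (void) cmddata;
        }
        PQclear(res);
    }
    return 0;
}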

The trouble with this is that if you have 100 entries in sequence that
are 5MB in size, because you have some table storing enormous
BLOB-like stuff, all 100 have to be loaded into memory at once, and
copied several times along the way.

100 x 5MB = 500MB, which probably becomes over 1GB by the time you're
through.

The alternative discussed:

declare LOG cursor for select [various fields],
  case when octet_length(log_cmddata) < LIMIT then log_cmddata else NULL end
from sl_log_1 where (same plenty of
parameters as before);

loop until done
  fetch RECCOUNT from LOG;
  for each entry
    if log_cmddata is not null then
      Apply to subscriber
    else
      select log_cmddata into temporary memory buffer and apply to subscriber
    endif
  done
done
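
Again in libpq terms, as a sketch only (the column names are from
sl_log_1 as I recall them, the re-fetch key is simplified to
log_actionseq alone, and the "apply" step is elided):

#include <stdio.h>
#include <libpq-fe.h>

#define LIMIT    5000      /* max size, in bytes, of a row fetched inline */
#define RECCOUNT 1000      /* rows per FETCH                              */

/* Sketch of the alternative: rows whose log_cmddata would exceed LIMIT
 * come back as NULL and get re-fetched one at a time.  Assumes an open
 * transaction, as before. */
static int sync_capped(PGconn *conn)
{
    PGresult *res;
    char query[512];

    snprintf(query, sizeof(query),
             "declare LOG cursor for select log_actionseq, "
             "case when octet_length(log_cmddata) < %d then log_cmddata "
             "else NULL end as log_cmddata "
             "from sl_log_1 where true  /* same plenty of parameters */",
             LIMIT);
    res = PQexec(conn, query);
    if (PQresultStatus(res) != PGRES_COMMAND_OK) { PQclear(res); return -1; }
    PQclear(res);

    for (;;)
    {
        int i, ntuples;

        snprintf(query, sizeof(query), "fetch %d from LOG", RECCOUNT);
        res = PQexec(conn, query);
        if (PQresultStatus(res) != PGRES_TUPLES_OK) { PQclear(res); return -1; }

        ntuples = PQntuples(res);
        if (ntuples == 0) { PQclear(res); break; }   /* done */

        for (i = 0; i < ntuples; i++)
        {
            if (!PQgetisnull(res, i, 1))
            {
                /* small row: apply PQgetvalue(res, i, 1) to the subscriber */
            }
            else
            {
                /* oversized row: pull just this one tuple, apply it, and
                 * free it right away, so only ONE big row is ever held */
                PGresult *one;

                snprintf(query, sizeof(query),
                         "select log_cmddata from sl_log_1 "
                         "where log_actionseq = %s "
                         "/* plus the rest of the key, in real life */",
                         PQgetvalue(res, i, 0));
                one = PQexec(conn, query);
                if (PQresultStatus(one) == PGRES_TUPLES_OK &&
                    PQntuples(one) == 1)
                {
                    /* apply PQgetvalue(one, 0, 0) to the subscriber */
                }
                PQclear(one);
            }
        }
        PQclear(res);
    }
    return 0;
}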

octet_length() evidently can draw the size of the column out of the
TOAST table without needing to actually read in the data.

Memory usage is then based on three things:
 1.  LIMIT - what is the maximum size of a record that will be drawn in by the fetch?
 2.  RECCOUNT - number of records drawn in...
 3.  Well, when the size overflows LIMIT, you'll have ONE large record in memory at
     a time.  No way around that...

The practical upshot is that memory consumption is roughly [LIMIT x
RECCOUNT], plus the occasional single large record.

5000 bytes x 1000 records = 5MB, which might still need doubling, but
which is, regardless, rather less than 1GB :-)

A thought came to me this morning as a followup.  The above approach
has two "free parameters," LIMIT and RECCOUNT, which need to get
configured.

A thought is to have the slon control this via one input parameter,
MAXMEM, which would be the desired product of the two values.

Then we start by setting RECCOUNT to 100 and LIMIT to MAXMEM /
RECCOUNT, and can then adapt after each FETCH has been processed.
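
As a sketch (the names are just for illustration, with MAXMEM in
bytes), the starting point is trivial:

/* Derive the two free parameters from the single MAXMEM knob: start with
 * a fixed RECCOUNT and give LIMIT whatever that leaves. */
static void init_fetch_params(long maxmem, long *reccount, long *limit)
{
    *reccount = 100;                   /* initial rows per FETCH          */
    *limit    = maxmem / *reccount;    /* initial inline-size cap, bytes  */
}

With MAXMEM of 5,000,000 bytes, that starts off at RECCOUNT = 100 and
LIMIT = 50,000 bytes.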

Then some rules can automatically modify those limits...  Here's
something arbitrarily made up that is probably not outrageous :-); a
code sketch of these rules follows the list.

-> If we discover that "too many" records are being drawn in as
   singletons, say, > 10%, then we need to increase LIMIT

   LIMIT = LIMIT * 1.1
   if LIMIT > MAXMEM / 20 then  -- Surely we don't want a fetch < 20
      LIMIT = MAXMEM / 20
   endif
   RECCOUNT = MAXMEM / LIMIT
   

-> If we discover that "slightly too many" records are being drawn in as
   singletons, say, > 5%, then we need to increase LIMIT

   LIMIT = LIMIT * 1.05
   if LIMIT > MAXMEM / 20 then  -- Surely we don't want a fetch < 20
      LIMIT = MAXMEM / 20
   endif
   RECCOUNT = MAXMEM / LIMIT
   
-> If we discover that < 1% of records (far too few) are drawn in as singletons,
   then reduce LIMIT

   LIMIT = LIMIT / 1.1
   if LIMIT < 500 then
     LIMIT = 500
   end if
   RECCOUNT = MAXMEM / LIMIT

-> If we discover that < 3% of records (somewhat too few) are drawn in as singletons,
   then reduce LIMIT

   LIMIT = LIMIT / 1.05
   if LIMIT < 500 then
     LIMIT = 500
   end if
   RECCOUNT = MAXMEM / LIMIT
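
Rolled together, the four rules above might look something like this
(a sketch only; singleton_pct is the percentage of rows in the
just-processed FETCH that overflowed LIMIT and had to be fetched as
singletons, and the thresholds are the arbitrary ones above):

/* Adjust LIMIT (and hence RECCOUNT) after each FETCH has been processed,
 * based on how many rows overflowed LIMIT and came back as singletons. */
static void adapt_fetch_params(double singleton_pct, long maxmem,
                               long *limit, long *reccount)
{
    if (singleton_pct > 10.0)             /* far too many singletons       */
        *limit = (long) (*limit * 1.10);
    else if (singleton_pct > 5.0)         /* slightly too many             */
        *limit = (long) (*limit * 1.05);
    else if (singleton_pct < 1.0)         /* far too few: LIMIT too large  */
        *limit = (long) (*limit / 1.10);
    else if (singleton_pct < 3.0)         /* somewhat too few              */
        *limit = (long) (*limit / 1.05);

    /* Surely we don't want a fetch of fewer than 20 records... */
    if (*limit > maxmem / 20)
        *limit = maxmem / 20;
    /* ...nor an inline cap below 500 bytes. */
    if (*limit < 500)
        *limit = 500;

    *reccount = maxmem / *limit;
}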

That'll tune towards some form of "reasonableness."  I don't think
"optimality" is the right word; the goal is to have LIMIT set high
enough that you normally process *nearly* all of the records without
needing the extra bit of "singleton" logic.
-- 
output = reverse("moc.enworbbc" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/lisp.html
Q: How many Newtons does it take to change a light bulb?
A: Faux!  There to eat lemons, axe gravy soup!

