Chris Browne cbbrowne at acm.org
Wed Mar 14 13:20:25 PDT 2007
People have observed that the table sl_seqlog can grow to be rather
large, if you have a lot of sequences.  It, in effect, generates one
tuple per sequence per SYNC.

If you have 30 sequences, and generate a SYNC per second, then you
will have 1800 records generated per minute, irrespective of whether
or not any given sequence value changed during that period of time.
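
To get a concrete sense of the growth, a query along these lines
(using the same @NAMESPACE@ placeholder as the schema below) shows how
many log rows each sequence is responsible for; it's just a sanity
check, not anything Slony-I itself runs:

-- rows per sequence; today this grows by one row per sequence per SYNC
select seql_seqid, count(*) as log_rows
  from @NAMESPACE@.sl_seqlog
 group by seql_seqid
 order by log_rows desc;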

create table @NAMESPACE@.sl_seqlog (
	seql_seqid			int4,		-- sequence id
	seql_origin			int4,		-- origin node id
	seql_ev_seqno		int8,		-- SYNC event sequence number
	seql_last_value		int8		-- sequence value as of that SYNC
) WITHOUT OIDS;

create index sl_seqlog_idx on @NAMESPACE@.sl_seqlog
	(seql_origin, seql_ev_seqno, seql_seqid);
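
For each SYNC, slon in effect issues one insert per sequence along
these lines (a sketch only; the values are made up, and the real
statement is built in slon's C code):

-- one such row appears per sequence, per SYNC; values purely illustrative
insert into @NAMESPACE@.sl_seqlog
	(seql_seqid, seql_origin, seql_ev_seqno, seql_last_value)
	values (7, 1, 240931, 1500042);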

There have been vague mutterings, now and again, for quite some time,
to the effect that we really ought to find some way of cutting down on
the number of tuples, at least for cases where a sequence isn't being
updated regularly.

My inchoate thoughts finally coalesced into something today.  The
thought...

- Augment sl_seqlog so each row covers a range of SYNC events rather
  than a single one, thus...

alter table sl_seqlog rename column seql_ev_seqno to seql_min_seqno;
alter table sl_seqlog add column seql_max_seqno int8;

drop index sl_seqlog_idx;
create index sl_seqlog_idx1 on sl_seqlog(seql_origin, seql_seqid);
create index sl_seqlog_idx2 on sl_seqlog(seql_min_seqno);
create index sl_seqlog_idx3 on sl_seqlog(seql_max_seqno);
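
For reference, the table would then look something like this (just a
sketch of the resulting layout, not further DDL to run):

create table @NAMESPACE@.sl_seqlog (
	seql_seqid			int4,	-- which sequence
	seql_origin			int4,	-- origin node id
	seql_min_seqno		int8,	-- first SYNC event covered by this row
	seql_max_seqno		int8,	-- last SYNC event covered by this row
	seql_last_value		int8	-- sequence value across that whole range
) WITHOUT OIDS;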

- Rather than doing straight inserts, we do one of two things (a
  combined sketch follows this list):

   - If the current sequence value <> the most recent value, then

     insert into sl_seqlog (seql_seqid, seql_origin, seql_min_seqno, seql_max_seqno, seql_last_value)
       values (seql_seqid, node->no_id, [seqbuf], [seqbuf], seql_last_value);

    In this case, our tuple carries an extra column, but nothing is
    materially worse than what we have today.

   - If the current sequence value is equal to the previous one, then...

    update sl_seqlog set seql_max_seqno = [seqbuf] where seql_seqid = [seql_seqid] 
          and seql_origin = node->no_id  and seql_last_value = [seql_last_value] and
          seql_max_seqno = (select max(seql_max_seqno) from sl_seqlog 
                                           where [match sequence, origin]);

   In this case, we wind up with a dead tuple for each update, but
   that isn't really worse than what happens now: no extra tuples are
   generated, and, with this revision, only one live tuple needs to be
   kept around for a sequence that is updated very infrequently.
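
Put together, the per-sequence logic at SYNC time would be roughly the
following.  This is only a sketch: the function name and parameters
are invented, and the real work would presumably stay in slon's C
code; it simply tries the range-extending update first and falls back
to a fresh insert.

create or replace function @NAMESPACE@.sequenceLog (p_seqid int4, p_origin int4,
	p_ev_seqno int8, p_last_value int8) returns void as $$
begin
	-- Case 2: value unchanged; extend the range of the most recent row.
	update @NAMESPACE@.sl_seqlog
	   set seql_max_seqno = p_ev_seqno
	 where seql_seqid = p_seqid
	   and seql_origin = p_origin
	   and seql_last_value = p_last_value
	   and seql_max_seqno = (select max(seql_max_seqno) from @NAMESPACE@.sl_seqlog
	                          where seql_seqid = p_seqid and seql_origin = p_origin);
	if not found then
		-- Case 1: value changed (or no prior row); start a new range.
		insert into @NAMESPACE@.sl_seqlog
			(seql_seqid, seql_origin, seql_min_seqno, seql_max_seqno, seql_last_value)
			values (p_seqid, p_origin, p_ev_seqno, p_ev_seqno, p_last_value);
	end if;
end;
$$ language plpgsql;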

Overall, this is no worse than the current scheme, since we aren't
generating any more tuples.  It is *somewhat* better because we wind
up "killing" obsolete tuples (i.e. - ones where a SYNC does not lead
to a change in value) immediately.

It's not a LOT better, but I think it's at least *somewhat* better.
Perhaps someone can point out a way of doing better still, such as
not bothering to generate some of the tuples at all.

Thoughts?
-- 
"cbbrowne","@","linuxdatabases.info"
http://www3.sympatico.ca/cbbrowne/multiplexor.html
Rules of the Evil Overlord #206. "When my Legions of Terror park their
vehicle  to do  reconnaissance on  foot,  they will  be instructed  to
employ The Club." <http://www.eviloverlord.com/>

