Christopher Browne cbbrowne
Wed Dec 15 22:29:27 PST 2004
After going through a fair bit of non-fun tuning watchdog processes
to be more delicate, I have noticed one place where I'd like to tune
slon's behaviour a bit.

There's the "-g" option, which sets sync_group_maxsize, the maximum
number of SYNCs that will get grouped together.
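
For reference, that's a knob on the slon command line; for instance
(the cluster name and conninfo here are made up):

    slon -g 50 mycluster "dbname=mydb host=somehost"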

I'd really like to handle that more conservatively in a couple of
ways:

 1.  I'd like to have a freshly-started slon process work its way up
     to sync_group_maxsize, as opposed to starting there.

     Reasoning:

     Suppose we have a slon that "goes bump in the night" for a while,
     leaving its node 800 syncs behind its provider.  And the "-g"
     option is trying to group together 50 syncs at a time.

     It is possible that what 'went bump' had something to do with the
     last slon run, which got 30 syncs in, and then fell over.  And
     when we try to do 50 syncs, it'll fall over for the very same
     reason.

     In such a case, perhaps sync #30 is an atrociously large one, big
     enough to make either slon or a postmaster run out of memory and
     fall over.

     In any of these cases, it would be nice if we started by doing
     just 1 SYNC, and working our way up to 50 gradually.

     Thus, in remote_worker.c, instead of 

     while (sync_group_size < sync_group_maxsize && node->message_head != NULL) {
        stuff...
     }

     we'd use...

     /* Start small and ramp up: the first group after a (re)start
      * contains just 1 SYNC, and each subsequent group may be one
      * SYNC larger, up to sync_group_maxsize. */
     static int our_group_max_size = 1;

     while (sync_group_size < our_group_max_size && node->message_head != NULL) {
        stuff...
     }

     /* Grow the cap once per completed group, not per SYNC, so the
      * ramp-up is gradual across groups. */
     if (our_group_max_size < sync_group_maxsize) {
        our_group_max_size++;
     }

     This has the effect that if there's one Really Big SYNC that is
     taking something down (a postmaster process somewhere runs out of
     memory?), it'll get the best chance possible of getting through
     that Really Big SYNC without falling into the rut of falling
     over, over and over.  Since our_group_max_size resets to 1
     whenever slon restarts, the problem SYNC gets retried in a group
     all by itself.

     Coding this seems pretty trivial; I thought I'd solicit comments
     in case anyone has further thoughts.
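
     To illustrate the intended behaviour, here's a small standalone
     sketch (not real remote_worker.c code; the 800-SYNC backlog and
     variable names are made up for illustration) that simulates the
     group sizes a freshly-started slon would use:

     #include <stdio.h>

     #define SYNC_GROUP_MAXSIZE 50          /* stand-in for "-g" */

     int
     main(void)
     {
         int our_group_max_size = 1;    /* static in the real proposal */
         int syncs_pending = 800;       /* pretend we're 800 SYNCs behind */
         int group;

         for (group = 1; syncs_pending > 0; group++)
         {
             int sync_group_size = 0;

             /* consume up to our_group_max_size SYNCs in this group */
             while (sync_group_size < our_group_max_size
                    && syncs_pending > 0)
             {
                 sync_group_size++;
                 syncs_pending--;
             }
             printf("group %d: %d SYNC(s)\n", group, sync_group_size);

             /* grow the cap by one per completed group */
             if (our_group_max_size < SYNC_GROUP_MAXSIZE)
                 our_group_max_size++;
         }
         return 0;
     }

     The output starts at "group 1: 1 SYNC(s)" and ramps one SYNC per
     group, so a SYNC that killed the previous run is first retried
     nearly alone.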

 2.  Further, it would be rather nice to be able to say:

     "Hey!  We have just finished our 4th sync in this group, and it
     turns out that #4 contained 280,000 updates, meaning that that
     SYNC took 2 minutes to take effect.  This is more than plenty to
     get efficiencies by grouping work together.  No sense in going on
     to 50 in the group. Let's stop and COMMIT now."

     I'm not so sure that is possible/practical.  It looks as though
     once you decide how many SYNCs you're grouping together, that
     commits you up front to a query that might pull more data than
     you bargained for :-(.
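
     One way around that: instead of cutting a group short in the
     middle, time each group after the fact and use that to throttle
     the *next* one.  A rough sketch (the 2-minute threshold, the
     function name, and where it hooks into remote_worker.c are all
     made up for illustration):

     #include <time.h>

     #define SYNC_GROUP_MAXSIZE    50
     #define GROUP_TIME_THRESHOLD 120   /* seconds; "plenty" is a guess */

     static int our_group_max_size = 1;

     /* Wrapped around the existing "group SYNCs, apply, COMMIT"
      * logic; called once per group. */
     static void
     process_one_group(void)
     {
         time_t start = time(NULL);

         /* ... group up to our_group_max_size SYNCs and COMMIT,
          * as in the loop above ... */

         if (difftime(time(NULL), start) > GROUP_TIME_THRESHOLD)
         {
             /* that group took plenty long; back off hard */
             our_group_max_size = 1;
         }
         else if (our_group_max_size < SYNC_GROUP_MAXSIZE)
         {
             /* quick group; ramp back up toward "-g" */
             our_group_max_size++;
         }
     }

     That doesn't stop an oversized group already in flight, but it
     does keep one slow group from being followed by another of the
     same size.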
-- 
"cbbrowne","@","ca.afilias.info"
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)

